rake task for dumping, restoring and anonymizing user data #1013
base: master
@dkuku looks good! Just the usual nitpickings. I'm gonna try it out on my machine now and send you an anonymized dump.
One question: Don't you think that keeping the ids will allow for de-anonymization? Maybe we should shuffle the user ids?
user.email = "user#{user.id}@rundfunk.com"
user.save!
end
@dkuku you can probably update all the users in one SQL statement: https://apidock.com/rails/ActiveRecord/Base/update_all/class
User.update_all(encrypted_password: 'xxxxxx')
User.update_all('latitude = 50 + 0.001 * users.id')
# ... etc
@dkuku there are very similar rake tasks for dumping and restoring already 😲
@dkuku okay 👍 I ran the task on my machine. The
Apparently IFAD's chronomodel does not like the db tasks? 😢 @dkuku shall I send you the dump via Slack?
I had no idea how to do that, but now I think that the ids can be dumped to
an array, shuffled and assigned to the users. What do you think?
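The idea of dumping the ids into an array, shuffling them and reassigning them could look roughly like the sketch below. It is plain Ruby with no database access: `build_id_mapping` and `remap_foreign_keys` are hypothetical helper names, and the rows are plain hashes standing in for records. The sketch assumes that one shuffled permutation is used as a lookup table, so every reference to the same old id ends up at the same new id.

```ruby
# Build a one-to-one mapping from each original id to a shuffled id.
# `rng` is injectable so the result can be reproduced in a test.
def build_id_mapping(ids, rng: Random.new)
  ids.zip(ids.shuffle(random: rng)).to_h
end

# Rewrite a foreign-key column on hash-like rows using the mapping.
# Rows whose key is not in the mapping are left unchanged.
def remap_foreign_keys(rows, key, mapping)
  rows.map { |row| row.merge({ key => mapping.fetch(row[key], row[key]) }) }
end

user_ids   = [1, 2, 3, 4, 5]
broadcasts = [
  { id: 10, creator_id: 1 },
  { id: 11, creator_id: 1 },
  { id: 12, creator_id: 3 }
]

mapping  = build_id_mapping(user_ids, rng: Random.new(42))
remapped = remap_foreign_keys(broadcasts, :creator_id, mapping)

# Broadcasts 10 and 11 shared a creator before and still share one after.
puts remapped.inspect
```

Because the mapping is a permutation, relationships between records survive the remapping. Drawing an independent random id per record would not preserve which records belong to the same user.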
…On Wed, 9 Jan 2019, 17:08 Robert Schäfer ***@***.*** wrote:
***@***.**** requested changes on this pull request.
------------------------------
In backend/lib/tasks/db.rake:
> @@ -0,0 +1,54 @@
+namespace :db do
+ desc 'replace user sensitive data with placeholders'
+ task anonymize_user: :environment do
+ if ENV['RAILS_ENV'] == 'production'
+ puts 'You do not want to anonymize production data'
+ else
+ puts 'Anonymizing user data'
+
+ User.all.each do |user|
⬇️ Suggested change
- User.all.each do |user|
+ User.find_each do |user|
------------------------------
In backend/lib/tasks/db.rake:
> @@ -0,0 +1,54 @@
+namespace :db do
+ desc 'replace user sensitive data with placeholders'
+ task anonymize_user: :environment do
+ if ENV['RAILS_ENV'] == 'production'
+ puts 'You do not want to anonymize production data'
+ else
+ puts 'Anonymizing user data'
+
+ User.all.each do |user|
+ user.encrypted_password = user.encrypted_password.truncate(8)
+ user.latitude = 50 + 0.001 * user.id
+ user.longitude = 10 - 0.001 * user.id
+ user.city = "city#{user.id}"
+
+ user.email = "user#{user.id}@rundfunk.com"
⬇️ Suggested change
- user.email = "user#{user.id}@rundfunk.com"
+ user.email = "user#{user.id}@example.org"
The domain rundfunk.com exists. The domain http://example.org/ is
intended to be used for such purposes.
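A minimal sketch of deriving the placeholder values from the user id, using the reserved example.org domain mentioned in the comment above. `placeholder_attributes` is a hypothetical helper and not part of the PR; the field names and formulas are taken from the diff, and the rounding is an addition made only in this sketch.

```ruby
# Compute placeholder values for one user from its id alone.
# Mirrors the formulas in the rake task, with example.org as the email domain.
def placeholder_attributes(id)
  {
    email:     "user#{id}@example.org",
    city:      "city#{id}",
    # Rounded to three decimals to avoid floating-point noise
    # (the rounding is an assumption of this sketch, not in the diff).
    latitude:  (50 + 0.001 * id).round(3),
    longitude: (10 - 0.001 * id).round(3)
  }
end

attrs = placeholder_attributes(7)
puts attrs.inspect
```

Keeping the computation in one small function makes it easy to check the placeholders without touching a database.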
------------------------------
In backend/lib/tasks/db.rake:
> + desc 'replace user sensitive data with placeholders'
+ task anonymize_user: :environment do
+ if ENV['RAILS_ENV'] == 'production'
+ puts 'You do not want to anonymize production data'
+ else
+ puts 'Anonymizing user data'
+
+ User.all.each do |user|
+ user.encrypted_password = user.encrypted_password.truncate(8)
+ user.latitude = 50 + 0.001 * user.id
+ user.longitude = 10 - 0.001 * user.id
+ user.city = "city#{user.id}"
+
+ user.email = "user#{user.id}@rundfunk.com"
+ user.save!
+ end
@dkuku <https://github.com/dkuku> you can probably update all the users
in one SQL statement:
https://apidock.com/rails/ActiveRecord/Base/update_all/class
User.update_all(encrypted_password: 'xxxxxx')
User.update_all('latitude = 50 + 0.001 * users.id')
# ... etc
@roschaefer you can send it to me via Slack.
Co-Authored-By: dkuku <[email protected]>
@user_ids = User.ids.shuffle
Broadcast.find_each do |broadcast|
broadcast.creator_id = @user_ids.sample
Hm, I would not do that because it gets confusing. E.g. @ciremoussadia is working on a PR where we update the user role if you create a broadcast.
I don't know how to do proper anonymization 😆 maybe you can do some research if the user id
itself is a means of de-anonymization and if yes, what are the counter-measures?
Maybe it's not necessary after all? 🤷‍♂️
@dkuku that gem looks OK to me at first glance. Last commit not too old, a couple of stars and a focused use case. Does it do something to the
When playing with it yesterday I found that we really want to anonymize the
impressions table so it's not possible to recover what you have voted for.
But the foreign key constraints need to be removed temporarily, and I need to
find where it is stored (probably a join table), because I can't update
Impression.first.creator_id even though I can see it in the Rails console.
…On Thu, 10 Jan 2019, 11:10 Robert Schäfer ***@***.*** wrote:
***@***.**** commented on this pull request.
------------------------------
In backend/lib/tasks/db.rake:
> @@ -5,16 +5,25 @@ namespace :db do
puts 'You do not want to anonymize production data'
else
puts 'Anonymizing user data'
-
+ @user_ids = User.ids.shuffle
+ Broadcast.find_each do |broadcast|
+ broadcast.creator_id = @user_ids.sample
@dkuku the foreign key constraint comes from the database migrations. Ugh, good luck getting rid of them.
@dkuku could you please:
We could namespace the rake tasks
Created rake task for anonymizing user data. Fixes #987
Description
Also adds tasks for dumping and restoring the development database.
Currently, to use it you need to dump your data with
rake db:dump
and back up the file db/backend.dump.
Then you can run
rake db:anonymize_user
which changes the current user table. Now you can run
rake db:dump
again to dump the anonymized data. To restore the original file, move it to
db/
and run rake db:restore
Waiting for suggestions to extend this.
Motivation and Context
To be in compliance with current European law we can't give developers access to user data. This rake task anonymizes the user table.
How Has This Been Tested?
Tested locally on seeded data. It might need some adjustments; I anonymized only the data I currently have access to.
Alternative solution using the shell and a temporary database
http://www.michaelkrenz.de/2012/08/05/how-to-anonymize-data-in-a-postgresql-database/
dump data
pg_dump database > original_datadump
create temp database
createdb tempDB
import data to temp
psql tempDB < ./original_datadump
run anonymize script - at the bottom
psql tempDB < ./anonymize_db.sql
dump anonymized data
pg_dump tempDB > anon_dump.sql
delete temporary database
dropdb tempDB
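
If the shell steps above were to be driven from Ruby, for example from inside a rake task, they could be assembled as in the sketch below. `anonymize_via_temp_db_commands` is a hypothetical helper name, and the database and file names are just the ones used in the steps above. The function only builds the command strings and does not execute anything.

```ruby
require 'shellwords'

# Return the shell commands for the temp-database workflow, in order.
# Nothing is executed here; the caller decides how to run them.
def anonymize_via_temp_db_commands(source_db:, temp_db:, script:, original_dump:, anon_dump:)
  s = ->(value) { Shellwords.escape(value) }
  [
    "pg_dump #{s[source_db]} > #{s[original_dump]}", # dump data
    "createdb #{s[temp_db]}",                        # create temp database
    "psql #{s[temp_db]} < #{s[original_dump]}",      # import data to temp
    "psql #{s[temp_db]} < #{s[script]}",             # run anonymize script
    "pg_dump #{s[temp_db]} > #{s[anon_dump]}",       # dump anonymized data
    "dropdb #{s[temp_db]}"                           # delete temporary database
  ]
end

commands = anonymize_via_temp_db_commands(
  source_db: 'database',
  temp_db: 'tempDB',
  script: './anonymize_db.sql',
  original_dump: 'original_datadump',
  anon_dump: 'anon_dump.sql'
)
puts commands
```

One way to run them would be `commands.each { |cmd| system(cmd) or abort("failed: #{cmd}") }`, which stops at the first failing step so the temporary database is not dumped half-anonymized.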