Data exports #101

Merged 107 commits on Mar 4, 2020
Changes from 12 commits
dbb0dd2
new AzureAdapter class, add azure storage creds
Dec 27, 2019
d44b986
outlined class methods for azure adapter
Dec 30, 2019
44fd4f2
add zip file generator class
nciemniak Jan 16, 2020
9c5bf04
renaming azure adapter as azureblobstorage for now
nciemniak Jan 17, 2020
c5c7588
add azure-storage-blob gem
nciemniak Jan 17, 2020
9e58e7a
start building out transcription file generator
nciemniak Jan 17, 2020
0d85a2b
add file for data export spec
nciemniak Jan 17, 2020
1bc1cd0
sketching out TranscripFileGen methods
nciemniak Jan 17, 2020
f847861
flesh out TranFileGen methods
nciemniak Jan 17, 2020
26ec578
flesh out TranFileGen methods part2
nciemniak Jan 17, 2020
340d033
add method to put multiple blobs on azure
nciemniak Jan 22, 2020
306f0fa
generating files and exporting to azure
nciemniak Jan 22, 2020
dcd1072
delete unused spec for now bc its breaking build
nciemniak Jan 22, 2020
b874514
new AzureAdapter class, add azure storage creds
Dec 27, 2019
36f6bf4
outlined class methods for azure adapter
Dec 30, 2019
24aebbd
add zip file generator class
nciemniak Jan 16, 2020
afbad86
renaming azure adapter as azureblobstorage for now
nciemniak Jan 17, 2020
73d47eb
add azure-storage-blob gem
nciemniak Jan 17, 2020
93aada8
start building out transcription file generator
nciemniak Jan 17, 2020
86b7b80
add file for data export spec
nciemniak Jan 17, 2020
3d93d29
sketching out TranscripFileGen methods
nciemniak Jan 17, 2020
262abf9
flesh out TranFileGen methods
nciemniak Jan 17, 2020
21e5ff0
flesh out TranFileGen methods part2
nciemniak Jan 17, 2020
2950521
add method to put multiple blobs on azure
nciemniak Jan 22, 2020
8633e85
generating files and exporting to azure
nciemniak Jan 22, 2020
9795cd1
delete unused spec for now bc its breaking build
nciemniak Jan 22, 2020
5974fc2
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Jan 22, 2020
0fbf985
trigger data save to azure on approval
nciemniak Jan 23, 2020
7291519
move storage directory method out of GenTranFile
nciemniak Jan 23, 2020
e2853a9
add delete file functionality
nciemniak Jan 23, 2020
6b48e4c
fix arguments error
nciemniak Jan 24, 2020
9209d02
add metadata files to export process
nciemniak Jan 24, 2020
76c1ec8
delete stored files on tran unapproval
nciemniak Jan 24, 2020
054e68e
add route for export
nciemniak Jan 27, 2020
27411af
placeholder method for transcription export
nciemniak Jan 27, 2020
f139efa
azure blob: get file
nciemniak Jan 27, 2020
862fa9a
download files method - initial commit
nciemniak Jan 27, 2020
5e5a09c
add baseline active_storage setup
nciemniak Jan 28, 2020
7f9ab6a
add active_storage to transcription
nciemniak Jan 28, 2020
6fa9710
development credentials file update
nciemniak Jan 29, 2020
f88ace7
update the way azure credentials are accessed
nciemniak Jan 30, 2020
47a6c8a
set up working placeholder method for export
nciemniak Jan 30, 2020
917d99e
progress on file zipping and downloading
nciemniak Jan 30, 2020
2a8a664
delete unneeded azure_blob_storage file
nciemniak Jan 30, 2020
d577f71
update file saving process and export process
nciemniak Jan 30, 2020
0922d80
extract export to common app controller method
nciemniak Jan 31, 2020
c2c3a51
create different paths for zipping proj/workflow
nciemniak Jan 31, 2020
d65991a
start adding tests for this big ol castle of code
nciemniak Jan 31, 2020
423217d
reorg DataStorage to have method for each resource
nciemniak Feb 5, 2020
496132e
use send_data to allow for deleting file
nciemniak Feb 5, 2020
c04e9d4
add export method to project
nciemniak Feb 5, 2020
ba39380
add functionality for group export to controller
nciemniak Feb 5, 2020
a586a21
start DataStorage spec
nciemniak Feb 5, 2020
945d96b
add tests for exporting transcription collections
nciemniak Feb 5, 2020
33efb80
Merge branch 'master' into data-exports
nciemniak Feb 5, 2020
6314143
separate method for export_group; apply roles
nciemniak Feb 6, 2020
dee0193
add authorization for project and workflow export
nciemniak Feb 6, 2020
92c690a
tests for project and workflow exports
nciemniak Feb 6, 2020
91a9ada
zip file generator spec
nciemniak Feb 7, 2020
eeb3631
zip generator spec: remove file created by test
nciemniak Feb 7, 2020
c06c185
use tempfile when generating approved tran files
nciemniak Feb 7, 2020
07286a4
remove :focus => true, whoops
nciemniak Feb 7, 2020
6852ffa
add missing data pieces to file generator
nciemniak Feb 10, 2020
4f96013
split export method up by proj/wf/group/transcrip
nciemniak Feb 11, 2020
9747c64
aggregate all metadata into single file
nciemniak Feb 13, 2020
2e94a24
put data exports classes into their own files
nciemniak Feb 13, 2020
cf0666b
fill out agg file generator spec
nciemniak Feb 13, 2020
df9f733
switch to using tempdir in data_storage
nciemniak Feb 14, 2020
38869d3
add spec for TranscriptionFileGenerator
nciemniak Feb 14, 2020
b32966d
staging creds: include new credentials from master
nciemniak Feb 14, 2020
72353f2
Merge branch 'master' into data-exports
nciemniak Feb 14, 2020
70689e4
remove :focus => true
nciemniak Feb 14, 2020
d19e1d7
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Feb 14, 2020
74c967a
memoize update_attrs
nciemniak Feb 17, 2020
f38efe4
make the code more stylish
nciemniak Feb 17, 2020
faf0b84
clean up status_has_changed? method
nciemniak Feb 17, 2020
bcfc440
don't include metadata file in individual t folder
nciemniak Feb 17, 2020
148841a
update format of consensus text file
nciemniak Feb 18, 2020
d9f28b9
change error thrown for export group
nciemniak Feb 18, 2020
27ca166
style update: { } vs do/end
nciemniak Feb 18, 2020
2bdc134
use 'approve?' method in update_tr_exports
nciemniak Feb 18, 2020
17de9a5
Avoid multi line ternaries for readability
nciemniak Feb 18, 2020
9d2f54c
refactor is_text_edited? and add tests for it
nciemniak Feb 18, 2020
c3f6443
refactor how code checks for status change
nciemniak Feb 19, 2020
3f48bcc
clearer wording, eliminating unnecessary code
nciemniak Feb 19, 2020
8c3f386
rename file variable for readability
nciemniak Feb 19, 2020
1dae2bc
rename approve to approving for clarity
nciemniak Feb 19, 2020
6e2b5a1
reorganize methods around storing files
nciemniak Feb 19, 2020
2acbbeb
remove unnecessary var
nciemniak Feb 19, 2020
f7a819e
change methods in agg file gen to class methods
nciemniak Feb 20, 2020
f202769
construct regex prior to loop
nciemniak Feb 20, 2020
85623f1
move guard clause to beginning of method
nciemniak Feb 20, 2020
3b944e6
use full names for variable
nciemniak Feb 20, 2020
30b9e59
Merge branch 'master' into data-exports
nciemniak Feb 20, 2020
208a7c5
use namespacing for export policy scope
nciemniak Feb 20, 2020
b82225d
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Feb 20, 2020
da5a3fd
add test for generating edited consensus text
nciemniak Feb 20, 2020
3152050
pass tempfile directly rather than file.open
nciemniak Feb 21, 2020
3c872a4
update how export_group is authorized
nciemniak Feb 25, 2020
ffe9af4
refactoring: move methods from controller to model
nciemniak Mar 2, 2020
0ad3929
switch from instance var to regular var
nciemniak Mar 2, 2020
85223f4
stylistic reworking
nciemniak Mar 2, 2020
2de969e
add test for exporting group with no transcrips
nciemniak Mar 3, 2020
4fdb0cd
remove class instance vars for thread safety
nciemniak Mar 3, 2020
e9bfd31
rename transcription.files to export_files
nciemniak Mar 3, 2020
2f43d0d
rename transcription.files to export_files p2
nciemniak Mar 3, 2020
52be865
DRY up the code
nciemniak Mar 3, 2020
2 changes: 2 additions & 0 deletions Gemfile
@@ -7,6 +7,8 @@ gem 'puma', '~> 4.3'

gem 'panoptes-client'

gem 'azure-storage-blob'

# jsonapi.rb is a bundle that incorporates fast_jsonapi (serialization),
# ransack (filtration), and some RSpec matchers along with some
# boilerplate for pagination and error handling
12 changes: 12 additions & 0 deletions Gemfile.lock
@@ -56,6 +56,17 @@ GEM
      minitest (~> 5.1)
      tzinfo (~> 1.1)
      zeitwerk (~> 2.2)
    azure-core (0.1.15)
      faraday (~> 0.9)
      faraday_middleware (~> 0.10)
      nokogiri (~> 1.6)
    azure-storage-blob (1.1.0)
      azure-core (~> 0.1.13)
      azure-storage-common (~> 1.0)
      nokogiri (~> 1.6, >= 1.6.8)
    azure-storage-common (1.1.0)
      azure-core (~> 0.1.13)
      nokogiri (~> 1.6, >= 1.6.8)
    bootsnap (1.4.5)
      msgpack (~> 1.0)
    builder (3.2.3)
@@ -245,6 +256,7 @@ PLATFORMS
  ruby

DEPENDENCIES
  azure-storage-blob
  bootsnap (>= 1.4.2)
  coveralls
  factory_bot_rails
41 changes: 41 additions & 0 deletions app/services/azure_blob_storage.rb
@@ -0,0 +1,41 @@
require 'azure/storage/blob'

class AzureBlobStorage
  def initialize
    # login
    @blob_client = Azure::Storage::Blob::BlobService.create(
      storage_account_name: Rails.application.credentials.storage_account_name,
      storage_access_key: Rails.application.credentials.storage_access_key
    )

    # This is the equivalent of an S3 bucket. Where have we stored bucket names
    # in the past? I don't want to keep it hardcoded here.
    # Maybe put it in the Rails credentials file? Not sure.
    @container_name = 'data-exports'
  end

  def put_file(blob_path, file)
    # read in binary mode so the bytes are uploaded unchanged
    content = ::File.open(file, 'rb') { |f| f.read }
    @blob_client.create_block_blob(@container_name, blob_path, content)
  end

  # receive a list of hashes formatted as
  # { blob_path: <path_to_blob>, file: <file_name> }
  def put_files_multiple(file_list)
    file_list.each do |f|
      put_file(f[:blob_path], f[:file])
    end
  end

  def get_file(path)
    # to do
  end

  def delete_file
    # to do
  end

  def get_files(prefix)
    # to do
  end
end
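The three `to do` stubs above could plausibly be filled in along the following lines. This is only a sketch, not the PR's implementation: `BlobReader` is a hypothetical class, and the client is injected so the example runs without Azure credentials. It assumes the client responds to `get_blob`, `delete_blob` and `list_blobs` the way `Azure::Storage::Blob::BlobService` does. Paging through large listings with a continuation token is left out.

```ruby
# Sketch only: BlobReader is a hypothetical name, not part of the PR.
class BlobReader
  # client is assumed to respond to get_blob, delete_blob and list_blobs,
  # as Azure::Storage::Blob::BlobService does.
  def initialize(client, container_name)
    @client = client
    @container_name = container_name
  end

  # Returns the blob body as a String.
  def get_file(blob_path)
    # get_blob returns a [blob, content] pair; only the content is needed here
    _blob, content = @client.get_blob(@container_name, blob_path)
    content
  end

  # Deletes a single blob by its path.
  def delete_file(blob_path)
    @client.delete_blob(@container_name, blob_path)
  end

  # Returns the names of all blobs whose path starts with prefix.
  # Assumption: a single page of results is enough; no continuation handling.
  def get_files(prefix)
    @client.list_blobs(@container_name, prefix: prefix).map(&:name)
  end
end
```

Injecting the client also makes the class easy to exercise in a spec with a small fake, which may help with the spec that was removed for breaking the build.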
94 changes: 94 additions & 0 deletions app/services/data_exports.rb
@@ -0,0 +1,94 @@
require 'fileutils'
require 'securerandom'

module DataExports

  class DataExporter
    def export_transcription(transcription_id)
      file_generator = TranscriptionFileGenerator.new(transcription_id)
      file_list = file_generator.generate_transcription_files

      azure = AzureBlobStorage.new
      azure.put_files_multiple(file_list)

      # we are done with the files, delete the temp directory and its contents
      file_generator.delete_temp_directory
    end
  end

  class TranscriptionFileGenerator
    def initialize(transcription_id)
      @transcription = Transcription.find(transcription_id)
      @directory_path = File.expand_path("~/transcription_files_temp/t#{transcription_id}_#{SecureRandom.uuid}")

      FileUtils.mkdir_p @directory_path
    end

    # returns a list of hashes formatted as
    # { blob_path: <path_to_blob>, file: <file_name> }
    def generate_transcription_files
      file_list = []
      blob_directory = generate_blob_directory(@transcription)

      # raw data file
      file_path = write_raw_data_to_file(@directory_path)
      blob_path = File.join(blob_directory, "raw_data_#{@transcription.id}.json")
      file_list.append({ :blob_path => blob_path, :file => file_path })

      # consensus text file
      file_path = write_consensus_text_to_file(@directory_path)
      blob_path = File.join(blob_directory, "consensus_text_#{@transcription.id}.txt")
      file_list.append({ :blob_path => blob_path, :file => file_path })

      file_list
    end

    def delete_temp_directory
      FileUtils.remove_dir(@directory_path)
    end

    private

    # creates raw data file
    # returns location of the file
    def write_raw_data_to_file(directory_path)
      file_path = File.join(directory_path, "raw_data_#{@transcription.id}.json")

      File.open(file_path, 'w') { |f|
        f.puts @transcription.text
      }
      file_path
    end

    # creates consensus text file
    # returns location of the file
    def write_consensus_text_to_file(directory_path)
      file_path = File.join(directory_path, "consensus_text_#{@transcription.id}.txt")

      File.open(file_path, 'w') { |f|
        f.puts consensus_text
      }

      file_path
    end

    # joins the consensus text of every line in every frame, separated by spaces
    def consensus_text
      consensus_text = ""
      @transcription.text.each do |key, value|
        # if we find a frame, iterate through the lines of the frame
        if /^frame/.match(key)
          value.each do |line|
            consensus_text.concat "#{line["consensus_text"]} "
          end
        end
      end

      consensus_text
    end

    def generate_blob_directory(transcription)
      workflow_id = transcription.workflow_id
      project_id = Workflow.find(workflow_id).project_id

      "approved-transcriptions/#{project_id}/#{workflow_id}/#{transcription.group_id}/#{transcription.id}"
    end
  end

end
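The generator above creates its working directory under the home directory and relies on the caller to invoke `delete_temp_directory`. One alternative is `Dir.mktmpdir` with a block, which removes the directory when the block exits, even if an upload raises. The following is a minimal sketch of that pattern rather than the code in this PR: `with_transcription_files` and its parameters are hypothetical, and writing the raw data with `JSON.generate` is an assumption about the intended file contents.

```ruby
require 'tmpdir'
require 'json'

# Hypothetical helper, not from the PR: writes the export files for one
# transcription into a throwaway directory and yields their paths.
def with_transcription_files(transcription_id, raw_data, consensus_text)
  Dir.mktmpdir("t#{transcription_id}_") do |dir|
    raw_path = File.join(dir, "raw_data_#{transcription_id}.json")
    text_path = File.join(dir, "consensus_text_#{transcription_id}.txt")

    # Assumption: the raw data should be serialized as JSON
    File.write(raw_path, JSON.generate(raw_data))
    File.write(text_path, consensus_text)

    # The caller uploads or inspects the files inside this block
    yield [raw_path, text_path]
  end
  # Dir.mktmpdir has removed the directory by this point, even if the block raised
end
```

With this shape the upload step would sit inside the block, so no separate cleanup call is needed.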
11 changes: 11 additions & 0 deletions app/services/zip_file_generator.rb
@@ -0,0 +1,11 @@
class ZipFileGenerator
  # Initialize with the directory to zip and the location of the output archive.
  def initialize(input_dir, output_file)
    @input_dir = input_dir
    @output_file = output_file
  end

  def write_to_file
    # TO DO
  end
end
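`write_to_file` is still a stub here. Whatever zip library ends up being used, it will first need the list of files under the input directory together with the name each file should have inside the archive. A minimal standard-library sketch of that step follows. `zip_entries` is a hypothetical helper, not part of the PR, and it does not write an archive itself.

```ruby
# Hypothetical helper, not from the PR: lists every regular file under
# input_dir as an [entry_name, path_on_disk] pair, where entry_name is the
# path relative to input_dir (the name the file would have in the archive).
def zip_entries(input_dir)
  Dir.glob('**/*', base: input_dir)
     .select { |entry| File.file?(File.join(input_dir, entry)) } # skip directories
     .sort                                                       # stable ordering
     .map { |entry| [entry, File.join(input_dir, entry)] }
end
```

Note that `Dir.glob` skips dotfiles by default, which is probably acceptable for generated export files but would need a flag if hidden files had to be included.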
2 changes: 1 addition & 1 deletion config/credentials/development.yml.enc
@@ -1 +1 @@
BDFosYTQ4IYJUiWlYbHhn1pF7w615FMHDFWk0HdaLpIsMlnrRF9j8Jz7M1T8f6KyLqMA4IalPb8ZMNDsgjYPPUgVsvPcxnO27fU3T45iR0SFdXu2yZj8J/KCabg/TZC7zK5tz0AwKQKDJbZq4QgRAZJ0GjcLCuRT22EO9bQjj00xhEvKRPESIvLIkfbc9+xVpafrp7ugpuIU1QEe8w40VNSgGuVdcKSysosskUAx5Me7AluxWt2AmmFhfK9D5cWYi5hfSHsg8FadDTYbpuVdIU3I4sTUBGvb336Tnp83VHZSsQ61DN8IVF9yivBPGq4xmAayp7MIBLgg0JQLQ/jCnsDOGdTMnAtnV5Bn+p5WmGFyexx8Obis5frc5LP7dfwsI3PE46HNbifGsxe5O3/VvgRpJzrUrWcmU9njYmbQ5Lor48L5Ffytfw8xgc91Dl/xhJvAVl7UHduAU7YfmLbLqYixtTOgtJ1s3I8oM0W6B3xf+X1Xh3LlPg839Nu5kGEPUIrLeXJfFOCXxoyichzei8agAnYY7n8hm2dYnQQ8fUFuPy0qPkjMzI1XYOKXLD+cHNXF0bmHuKmSutg2IgfuTz2TWIbZvQz05RAKPb111MFqJTYRYDcRaxI2kMbEpTogKWSXV2Co8A==--8m86VjL9ErzM0pYo--lDHySsl4pCt2k5vxlc8nTQ==
xLnPctk/D5tdYwgubgdaXFZynrG9FOSowt/3aR4AdC9Up6Kf/xeNX4+J3829RxmWdj3HnL90vK9QEP+LZq0U8fRHGfB3Om5G2KPeYQHxCcwVh4zKmi2BzcbWpt0+3vmhhmjU/iSwOH1I9y8Ary0tKUzbIATIB+wb+f/dEorWkxe7YU9YT3+Br3FCqTvVnxjuBVy+NxFTFSeIGWTMycTUXlOUmrE2cijIKx5AGXbN2h1x8BOTJehtzGrqQFi/9gMUrYMWOmqHPr3WvWQ8Kb/SqQRWiFxYWYIbNQK5x3KUpISJSI03NvA1QbMxXQl8PKDOFOfp215fyHg4bxT9zSe2o6P9sABx90CEnt1HYYgv+Qgqe1j+SA/qjx6seQ58y2UQpO1xiVhff4BLFJI+Gic4tN/aPtR7z/zxSEpHRcKDKXyVXKXSsxWZxG7/NbIF/9QBdpf/RD9qJJN8v5s7tWGGW0x4vCTkB+121HG2u+PSbDPYkKdhUY9XYCjm1ULx9eYDwrYLVZ7sLd/Mta3I1iC6gOZC80jtFpM8CSOs36Qhuhv1lkZCaWDlVViHp/RGQjS8hLQ39JKscrIj4VIxgqGsfNdBGAF8NpW+uoFircuvfMXaA8Ed0UwE2nJ4pQkPhG9jBiIfzbjb/mcyqwEP6SuHbLO5P+6mPM+NXOtv+DqQ7meuieVVf4Cgs88VDOp2UqlB3T0ECF1/CjEL5ToKIANuBA5S/ijqlXzgTR5qhthhmepSrMes9SvlICzPUsOqzGwgo9DWZBSnGRMlBqdR9oj5ZX5JkjPPzyE+Dmpo3gfSNJhekv7yIghgzPG4R34XllznewFe6k5U0W4SCjMHWjTJ+g0=--l7vzkTuEoY0PFhLC--ox5cxai5P9nHkH2B6OBwRw==
2 changes: 1 addition & 1 deletion config/credentials/staging.yml.enc
@@ -1 +1 @@
dbVsO7tt8C7NtzVaASevcKoqzdud39C0UwQcPbAvovNBODYd6sSJOpQWBs+DcA8ASX6ZMoRmHEmYmXb8ShQOpF1kDNXdylETILmVts5FBs4xmKguqWsYlNNYqqGagrw1NoZDWUt6TQotM6Rol2B1rQsdVdRbjGUr9NdnKy7liLpZolF9gBIRByHUZXjI/waOzkmJE62EhgDaqWCUdmlCNqk+zw/q0nw83ZxCjwngOKAObEcrGwsrxdIbhHbIWeQAUprZuWdIcvVtajrjul22zyWE31dqCD7L58jFxKnVMiv9a4N28aINAT4D5gnDl7gtmWMZEhSI0Oof0a8rdyCnaNQSjOGCNyPnvF175L+SVsnyH/jrEQy/z4jB0wOH8Hol9sh30k0PlCuNgcP8yzkpmfvuXeklFDueImuDBqJcnpDnYgN4GyhuaZCJ3eVC4s8M8NNcLfEtEkbXi5pJ3jlBLWdMM2nxZjotwhEbvhtAE8dOZ8jhayO15ScH05XVx50A/Ca/cNL4Mcfy/NzbwSr+yXL1HznliC5oxSCe37ENVBe9UifxAsKhbBYTJSrcUew66jZRv81rSKFpAOqSe0PolipJT/VDgHjtusXF7TUw1INgRIdVnNBcas8KAco2CtQnBtr6XGB8uPz+m+yJ6Q3Yp8D1e5Q2h4ygQLVd/pXvYwFNELnQQPgY1Xv5bF5q4NFy9zZfQMhZnjnW2ylvquY95l3ObtdVDqQ05n3rO8bm//IZJPBqDVo62b3UnRa5OUM0+Nb7H772xR5MAgUWL5u9zThA9idcBLpypkwM5++TGWATgDwuTOJB9WNq+JTHYRVo1LL++8m5sMfSQkn7WH2Oi5svhRyFxoy/8gvINbi3FxfQf0Jq7pP0zcKLyYW96JeWcYM/sIgMC78Btl58LoSuIxktiO8EQevXc3zGz38OIGhKeEzf--IRKoGrapAb6uBc5o--jPNtDC+nLdKsVYTodvFa9g==
b5Viyt9F/180us5F0aH4lWUYleVhmOeiNoEcPCKJuOidIIAwPbVecvRyFL7r6nDFF/4r8nV/dJ2vO9OI5Idbkz9mXHAw6w+Cd4w5MdbqLzyREn8VU5a/XXY4uHKTjyVlLKb+E7RzlJy3YeXAl1rrjGmvsFNcMVtYzCZtV2CwZdd5BHrCisZygXMz7EnbYPRd3GSQRMYC24utj25nXGQwkv5Bkv6PfpiF1rAZ5bwDqdMCJTV4HQhn+5SXxtyeJZZ7SUE3gmF5SEkYKq+akUuOTxJZRLZylIG0OFaiG/dWHX5U5oCU7MiEt1gm3Z2/iDbBi7veziskzyDyYZrp7iIQIHVG2LGDHXU2lqcPg1ayYiaVMfB4xfaPLbTIX7sIt6ysGKOw7KySNFaak8ZAoUXlkAeCqlj3noru69th6ph375kn2F+R2/6Dxymfk3l7+NSPCCFrMLBM8KYQOuJ4G38/Gj+l+s2l2+WkyZ5J/Wj0YENniAHwRk4TUSII76gV6dCr90s8Fd2hRkAGvKFyn9ztco+/5QO5e95JsyeJU7fPtnYzei3agSdrf6IC15l86p9wnuVL/G/w6fECP00M2GobED/COnObU/4H8u0EfZtysWZ0bUpnvBviJMtY1cSy+vCegrM1srSW/6SiBuJ6jovyYE8UhzI0ebgF/FWH45xXiGSQjRkj/JX67vqDEY/X07D24f6OgaTfv+NSWfA6b82gANdvXZAfyxhCIxV9XbEoF+jWw1m48iNO57KbbcuvWQWbcW3z5FgJT9jJ9pKNZ4T6KJL8y3njkG3ffud0qkGReKpvhsRrCVMnB4kBSlNkVuBPSd22boAH2USW/+jumJE0Tdjxo/gZZxljswmL6zqo8yWEJsTqhir0VtOzhSzPVTjX3guEUW7velTqi4JclqdWgziKYyFIZJu+BjWYdFuGVA6dL9oApnzhIJj35INQ1ICK+lQbGQs6p/RsI3wCqBQHPg23AHOIgI6G2Bz1tidEyVmIyUaozPzvSOIdaxaWoGDQzDKidY0ZPXQhnJlYXaXGZCJ1+rodgnUYbvgyxRyJK0j+CYRJIkzrR7mlKU6BW0KWSpfNbe/YqpIw53oILpyi9lXUXhHUCsaefENGkVFqFGsc55Yi0hkElu8doxW8xg==--s7gLpzyjZZ/Uj1qh--DwPwGF7dD3xgAfT71g5+HQ==
9 changes: 9 additions & 0 deletions spec/services/data_exports_spec.rb
@@ -0,0 +1,9 @@
RSpec.describe DataExports::TranscriptionFileGenerator, type: :service do
  describe '#generate_transcription_files' do
    let(:transcription) { create :transcription }

    it 'does something' do

    end
  end
end