Data exports #101

Merged 107 commits on Mar 4, 2020.

Commits
dbb0dd2
new AzureAdapter class, add azure storage creds
Dec 27, 2019
d44b986
outlined class methods for azure adapter
Dec 30, 2019
44fd4f2
add zip file generator class
nciemniak Jan 16, 2020
9c5bf04
renaming azure adapter as azureblobstorage for now
nciemniak Jan 17, 2020
c5c7588
add azure-storage-blob gem
nciemniak Jan 17, 2020
9e58e7a
start building out transcription file generator
nciemniak Jan 17, 2020
0d85a2b
add file for data export spec
nciemniak Jan 17, 2020
1bc1cd0
sketching out TranscripFileGen methods
nciemniak Jan 17, 2020
f847861
flesh out TranFileGen methods
nciemniak Jan 17, 2020
26ec578
flesh out TranFileGen methods part2
nciemniak Jan 17, 2020
340d033
add method to put mutiple blobs on azure
nciemniak Jan 22, 2020
306f0fa
generating files and exporting to azure
nciemniak Jan 22, 2020
dcd1072
delete unused spec for now bc its breaking build
nciemniak Jan 22, 2020
b874514
new AzureAdapter class, add azure storage creds
Dec 27, 2019
36f6bf4
outlined class methods for azure adapter
Dec 30, 2019
24aebbd
add zip file generator class
nciemniak Jan 16, 2020
afbad86
renaming azure adapter as azureblobstorage for now
nciemniak Jan 17, 2020
73d47eb
add azure-storage-blob gem
nciemniak Jan 17, 2020
93aada8
start building out transcription file generator
nciemniak Jan 17, 2020
86b7b80
add file for data export spec
nciemniak Jan 17, 2020
3d93d29
sketching out TranscripFileGen methods
nciemniak Jan 17, 2020
262abf9
flesh out TranFileGen methods
nciemniak Jan 17, 2020
21e5ff0
flesh out TranFileGen methods part2
nciemniak Jan 17, 2020
2950521
add method to put mutiple blobs on azure
nciemniak Jan 22, 2020
8633e85
generating files and exporting to azure
nciemniak Jan 22, 2020
9795cd1
delete unused spec for now bc its breaking build
nciemniak Jan 22, 2020
5974fc2
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Jan 22, 2020
0fbf985
trigger data save to azure on approval
nciemniak Jan 23, 2020
7291519
move storage directory method out of GenTranFile
nciemniak Jan 23, 2020
e2853a9
add delete file functionality
nciemniak Jan 23, 2020
6b48e4c
fix arguments error
nciemniak Jan 24, 2020
9209d02
add metadata files to export process
nciemniak Jan 24, 2020
76c1ec8
delete stored files on tran unapproval
nciemniak Jan 24, 2020
054e68e
add route for export
nciemniak Jan 27, 2020
27411af
placeholder method for transcription export
nciemniak Jan 27, 2020
f139efa
azure blob: get file
nciemniak Jan 27, 2020
862fa9a
download files method - initial commit
nciemniak Jan 27, 2020
5e5a09c
add baseline active_storage setup
nciemniak Jan 28, 2020
7f9ab6a
add active_storage to transcription
nciemniak Jan 28, 2020
6fa9710
development credentials file update
nciemniak Jan 29, 2020
f88ace7
update the way azure credentials are accessed
nciemniak Jan 30, 2020
47a6c8a
set up working placeholder method for export
nciemniak Jan 30, 2020
917d99e
progress on file zipping and downloading
nciemniak Jan 30, 2020
2a8a664
delete unneeded azure_blob_storage file
nciemniak Jan 30, 2020
d577f71
update file saving process and export process
nciemniak Jan 30, 2020
0922d80
extract export to common app controller method
nciemniak Jan 31, 2020
c2c3a51
create different paths for zipping proj/workflow
nciemniak Jan 31, 2020
d65991a
start adding tests for this big ol castle of code
nciemniak Jan 31, 2020
423217d
reorg DataStorage to have method for each resource
nciemniak Feb 5, 2020
496132e
use send_data to allow for deleting file
nciemniak Feb 5, 2020
c04e9d4
add export method to project
nciemniak Feb 5, 2020
ba39380
add functionality for group export to controller
nciemniak Feb 5, 2020
a586a21
start DataStorage spec
nciemniak Feb 5, 2020
945d96b
add tests for exporting transcription collections
nciemniak Feb 5, 2020
33efb80
Merge branch 'master' into data-exports
nciemniak Feb 5, 2020
6314143
separate method for export_group; apply roles
nciemniak Feb 6, 2020
dee0193
add authorization for project and workflow export
nciemniak Feb 6, 2020
92c690a
tests for project and workflow exports
nciemniak Feb 6, 2020
91a9ada
zip file generator spec
nciemniak Feb 7, 2020
eeb3631
zip generator spec: remove file created by test
nciemniak Feb 7, 2020
c06c185
use tempfile when generating approved tran files
nciemniak Feb 7, 2020
07286a4
remove :focus => true, whoops
nciemniak Feb 7, 2020
6852ffa
add missing data peices to file generator
nciemniak Feb 10, 2020
4f96013
split export method up by proj/wf/group/transcrip
nciemniak Feb 11, 2020
9747c64
aggregate all metadata into single file
nciemniak Feb 13, 2020
2e94a24
put data exports classes into their own files
nciemniak Feb 13, 2020
cf0666b
fill out agg file generator spec
nciemniak Feb 13, 2020
df9f733
switch to using tempdir in data_storage
nciemniak Feb 14, 2020
38869d3
add spec for TranscriptionFileGenerator
nciemniak Feb 14, 2020
b32966d
staging creds: include new credentials from master
nciemniak Feb 14, 2020
72353f2
Merge branch 'master' into data-exports
nciemniak Feb 14, 2020
70689e4
remove :focus => true
nciemniak Feb 14, 2020
d19e1d7
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Feb 14, 2020
74c967a
memoize update_attrs
nciemniak Feb 17, 2020
f38efe4
make the code more stylish
nciemniak Feb 17, 2020
faf0b84
clean up status_has_changed? method
nciemniak Feb 17, 2020
bcfc440
don't include metadata file in individual t folder
nciemniak Feb 17, 2020
148841a
update format of consensus text file
nciemniak Feb 18, 2020
d9f28b9
change error thrown for export group
nciemniak Feb 18, 2020
27ca166
style update: { } vs do/end
nciemniak Feb 18, 2020
2bdc134
use 'approve?' method in update_tr_exports
nciemniak Feb 18, 2020
17de9a5
Avoid multi line ternaries for readability
nciemniak Feb 18, 2020
9d2f54c
refactor is_text_edited? and add tests for it
nciemniak Feb 18, 2020
c3f6443
refactor how code checks for status change
nciemniak Feb 19, 2020
3f48bcc
clearer wording, eliminating unnecessary code
nciemniak Feb 19, 2020
8c3f386
rename file variable for readability
nciemniak Feb 19, 2020
1dae2bc
rename approve to approving for clarity
nciemniak Feb 19, 2020
6e2b5a1
reorganize methods around storing files
nciemniak Feb 19, 2020
2acbbeb
remove unnecessary var
nciemniak Feb 19, 2020
f7a819e
change methods in agg file gen to class methods
nciemniak Feb 20, 2020
f202769
construct regex prior to loop
nciemniak Feb 20, 2020
85623f1
move guard clause to beginning of method
nciemniak Feb 20, 2020
3b944e6
use full names for variable
nciemniak Feb 20, 2020
30b9e59
Merge branch 'master' into data-exports
nciemniak Feb 20, 2020
208a7c5
use namespacing for export policy scope
nciemniak Feb 20, 2020
b82225d
Merge branch 'data-exports' of github.com:zooniverse/tove into data-e…
nciemniak Feb 20, 2020
da5a3fd
add test for generating edited consensus text
nciemniak Feb 20, 2020
3152050
pass tempfile directly rather than file.open
nciemniak Feb 21, 2020
3c872a4
update how export_group is authorized
nciemniak Feb 25, 2020
ffe9af4
refactoring: move methods from controller to model
nciemniak Mar 2, 2020
0ad3929
switch from instance var to regular var
nciemniak Mar 2, 2020
85223f4
stylistic reworking
nciemniak Mar 2, 2020
2de969e
add test for exporting group with no transcrips
nciemniak Mar 3, 2020
4fdb0cd
remove class instance vars for thread safety
nciemniak Mar 3, 2020
e9bfd31
rename transcription.files to export_files
nciemniak Mar 3, 2020
2f43d0d
rename transcription.files to export_files p2
nciemniak Mar 3, 2020
52be865
DRY up the code
nciemniak Mar 3, 2020
6 changes: 6 additions & 0 deletions Gemfile
@@ -8,6 +8,12 @@ gem 'puma', '~> 4.3'
 gem 'panoptes-client'
 gem 'pundit'

+# Connect to Azure Storage with Rails Active Storage
+gem 'azure-storage'
+gem 'azure-storage-blob'
+
+gem 'rubyzip'
+
 # jsonapi.rb is a bundle that incorporates fast_jsonapi (serialization),
 # ransack (filtration), and some RSpec matchers along with some
 # boilerplate for pagination and error handling

20 changes: 20 additions & 0 deletions Gemfile.lock
@@ -56,6 +56,22 @@ GEM
       minitest (~> 5.1)
       tzinfo (~> 1.1)
       zeitwerk (~> 2.2)
+    azure-core (0.1.15)
+      faraday (~> 0.9)
+      faraday_middleware (~> 0.10)
+      nokogiri (~> 1.6)
+    azure-storage (0.15.0.preview)
+      azure-core (~> 0.1)
+      faraday (~> 0.9)
+      faraday_middleware (~> 0.10)
+      nokogiri (~> 1.6, >= 1.6.8)
+    azure-storage-blob (1.1.0)
+      azure-core (~> 0.1.13)
+      azure-storage-common (~> 1.0)
+      nokogiri (~> 1.6, >= 1.6.8)
+    azure-storage-common (1.1.0)
+      azure-core (~> 0.1.13)
+      nokogiri (~> 1.6, >= 1.6.8)
     bootsnap (1.4.5)
       msgpack (~> 1.0)
     builder (3.2.3)
@@ -214,6 +230,7 @@ GEM
       rspec-mocks (~> 3.9.0)
       rspec-support (~> 3.9.0)
     rspec-support (3.9.0)
+    rubyzip (2.1.0)
     sentry-raven (2.13.0)
       faraday (>= 0.7.6, < 1.0)
     simplecov (0.17.1)
@@ -251,6 +268,8 @@ PLATFORMS
   ruby

 DEPENDENCIES
+  azure-storage
+  azure-storage-blob
   bootsnap (>= 1.4.2)
   coveralls
   factory_bot_rails
@@ -267,6 +286,7 @@ DEPENDENCIES
   rack-cors
   rails (~> 6.0.1)
   rspec-rails
+  rubyzip
   sentry-raven
   simplecov
   spring

10 changes: 8 additions & 2 deletions app/controllers/application_controller.rb
@@ -5,8 +5,8 @@ class ApplicationController < ActionController::Base

   attr_reader :current_user, :auth_token
   before_action :set_user
-  after_action :verify_authorized, except: :index
-  after_action :verify_policy_scoped, only: :index
+  after_action :verify_authorized, except: [:index]
+  after_action :verify_policy_scoped, only: [:index]

   include ErrorExtender
   include JSONAPI::Pagination
@@ -85,4 +85,10 @@ def jsonapi_meta(resources)
     pagination = jsonapi_pagination_meta(resources)
     { pagination: pagination } if pagination.present?
   end
+
+  def send_export_file(zip_file)
+    File.open(zip_file, 'r') do |f|
+      send_data f.read, filename: 'export.zip', type: 'application/zip'
+    end
+  end
 end

10 changes: 10 additions & 0 deletions app/controllers/projects_controller.rb
@@ -10,6 +10,16 @@ def show
     render jsonapi: @project
   end

+  def export
+    @project = Project.find(params[:id])
+    authorize @project
+
+    data_storage = DataExports::DataStorage.new
+    data_storage.zip_project_files(@project) do |zip_file|
+      send_export_file zip_file
+    end
+  end
+
   private

   def allowed_filters

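Each export action calls `send_export_file` from inside the block passed to a `zip_*_files` method, and the commit history mentions "switch to using tempdir in data_storage" and "use send_data to allow for deleting file". `DataExports::DataStorage` itself is not part of this diff, so the following stdlib-only sketch assumes it builds the archive in `Dir.mktmpdir` and yields the zip path; `with_export_file` is a hypothetical name. It shows why the bytes have to be read while the block is still running.

```ruby
require 'tmpdir'

# Hypothetical stand-in for DataExports::DataStorage#zip_project_files (not
# shown in this diff): assumed to build the archive inside Dir.mktmpdir and
# yield its path, so the directory is deleted as soon as the block returns.
def with_export_file(contents)
  Dir.mktmpdir do |dir|
    zip_path = File.join(dir, 'export.zip')
    File.binwrite(zip_path, contents)
    yield zip_path
  end
end

# Stand-in for send_export_file/send_data: the bytes are copied into memory
# while the block is still running, which is the only time the file exists.
response_body = nil
yielded_path = nil
with_export_file("PK\x03\x04placeholder bytes") do |zip_file|
  yielded_path = zip_file
  response_body = File.binread(zip_file)
end

puts response_body.bytesize
puts File.exist?(yielded_path) # false: the temporary directory is already gone
```

Rails' `send_file`, by contrast, hands the response a body that reads the file later, when the server streams it after the action has returned; with a block-scoped temporary directory that read would likely find nothing, which is consistent with the switch to `send_data`.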
41 changes: 39 additions & 2 deletions app/controllers/transcriptions_controller.rb
@@ -1,6 +1,8 @@
 class TranscriptionsController < ApplicationController
   include JSONAPI::Deserialization

+  class NoExportableTranscriptionsError < StandardError; end
+
   before_action :status_filter_to_int, only: :index

   def index
@@ -19,17 +21,52 @@ def update
     raise ActionController::BadRequest if type_invalid?
     raise ActionController::BadRequest unless whitelisted_attributes?

-    if approve?
+    if approving?
       authorize @transcription, :approve?
     else
       authorize @transcription
     end

     update_attrs['updated_by'] = current_user.login
     @transcription.update!(update_attrs)
+
+    if @transcription.status_previously_changed?
+      if approving?
+        @transcription.upload_files_to_storage
+      else
+        @transcription.remove_files_from_storage
+      end
+    end

     render jsonapi: @transcription
   end
+
+  def export
+    @transcription = Transcription.find(params[:id])
+    authorize @transcription
+
+    data_storage = DataExports::DataStorage.new
+    data_storage.zip_transcription_files(@transcription) do |zip_file|
+      send_export_file zip_file
+    end
+  end
+
+  def export_group
+    workflow = Workflow.find(params[:workflow_id])
+    authorize workflow
+
+    @transcriptions = Transcription.where(group_id: params[:group_id], workflow_id: params[:workflow_id])
+
+    if @transcriptions.empty?
+      raise NoExportableTranscriptionsError.new("No exportable transcriptions found for group id '#{params[:group_id]}'")
+    end
+
+    data_storage = DataExports::DataStorage.new
+    data_storage.zip_group_files(@transcriptions) do |zip_file|
+      send_export_file zip_file
+    end
+  end
+
   private

   def update_attrs
@@ -73,7 +110,7 @@ def whitelisted_attributes?
     update_attrs.keys.all? { |key| update_attr_whitelist.include? key }
   end

-  def approve?
+  def approving?
     update_attrs["status"] == "approved"
   end

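The storage branch added to `update` only acts when the save actually changed the status, and then chooses between uploading and purging based on `approving?`. As a plain-Ruby restatement of that decision table, here is a sketch in which the hypothetical `storage_action` takes booleans in place of ActiveRecord's `status_previously_changed?` and the controller's `approving?`:

```ruby
# Hypothetical pure-Ruby restatement of the storage branch added to
# TranscriptionsController#update. In the controller, status_changed comes
# from ActiveRecord's status_previously_changed? (true after update! saved a
# new status) and approving comes from approving?.
def storage_action(status_changed:, approving:)
  return :none unless status_changed # e.g. a text-only edit

  approving ? :upload_files_to_storage : :remove_files_from_storage
end

[
  [true,  true],  # status just changed to "approved"
  [true,  false], # status changed to something other than "approved"
  [false, true],  # "approved" sent again for an already approved transcription
  [false, false]  # no status change at all
].each do |changed, approving|
  action = storage_action(status_changed: changed, approving: approving)
  puts "changed=#{changed} approving=#{approving} -> #{action}"
end
```

Under this reading, moving an approved transcription to any other status purges its stored export files, which matches the "delete stored files on tran unapproval" commit.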
10 changes: 10 additions & 0 deletions app/controllers/workflows_controller.rb
@@ -10,6 +10,16 @@ def show
     render jsonapi: @workflow
   end

+  def export
+    @workflow = Workflow.find(params[:id])
+    authorize @workflow
+
+    data_storage = DataExports::DataStorage.new
+    data_storage.zip_workflow_files(@workflow) do |zip_file|
+      send_export_file zip_file
+    end
+  end
+
   private

   def allowed_filters

19 changes: 18 additions & 1 deletion app/models/transcription.rb
@@ -1,5 +1,6 @@
 class Transcription < ApplicationRecord
   belongs_to :workflow
+  has_many_attached :export_files

   validates :status, presence: true
   validates :group_id, presence: true
@@ -10,9 +11,25 @@ class Transcription < ApplicationRecord
     in_progress: 1,
     ready: 2, # ready as in "ready for approval"
     unseen: 3
-
   }

+  def upload_files_to_storage
+    file_generator = DataExports::TranscriptionFileGenerator.new(self)
+    file_generator.generate_transcription_files.each do |temp_file|
+      # get filename without the temfile's randomly generated unique string
+      basename = File.basename(temp_file)
+      filename = basename.split('-').first + File.extname(basename)
+      export_files.attach(io: temp_file, filename: filename)
+
+      temp_file.close
+      temp_file.unlink
+    end
+  end
+
+  def remove_files_from_storage
+    export_files.map(&:purge)
+  end
+
   private
   def text_json_is_not_nil
     if text.nil?

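`upload_files_to_storage` recovers a readable attachment name by keeping the part of the tempfile's basename before the first hyphen. Ruby's `Tempfile` names files roughly `<prefix><date>-<pid>-<random><suffix>`, so this only works if the prefix ends with a hyphen and contains no other hyphens. `TranscriptionFileGenerator` is not in this diff, so that is an assumption; `export_filename` and the sample filenames are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper reproducing the filename logic in
# Transcription#upload_files_to_storage: keep the part of the basename
# before the first hyphen and re-append the extension.
def export_filename(path)
  basename = File.basename(path)
  basename.split('-').first + File.extname(basename)
end

# Assumes the generator's prefixes end in '-', keeping the readable name
# separate from the "<date>-<pid>-<random>" part Tempfile appends.
temp_file = Tempfile.new(['consensus_text_42-', '.txt'])
begin
  puts File.basename(temp_file.path) # random part differs on every run
  puts export_filename(temp_file.path)
ensure
  temp_file.close
  temp_file.unlink
end

# Without the trailing hyphen the date stays glued to the name, and a
# hyphen inside the prefix truncates it.
puts export_filename('/tmp/consensus_text_4220200303-1234-abc123.txt') # => consensus_text_4220200303.txt
puts export_filename('/tmp/my-notes-20200303-1234-abc123.txt')          # => my.txt
```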
8 changes: 8 additions & 0 deletions app/policies/application_policy.rb
@@ -16,6 +16,14 @@ def show?
     admin? || (logged_in? && viewer?)
   end

+  def export?
+    admin? || (logged_in? && editor?)
+  end
+
+  def export_group?
+    admin? || (logged_in? && editor?)
+  end
+
   def admin?
     logged_in? && user.admin
   end

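The new `export?` and `export_group?` rules require editor-level access, whereas `show?` accepts viewers, so read access alone is not enough to download an export. The following plain-Ruby model of that boolean rule is a sketch only: `SketchUser`, the `'editor'` role string and the predicate bodies are hypothetical, since the real `logged_in?`, `admin?` and `editor?` are defined elsewhere in `ApplicationPolicy` and are not shown here.

```ruby
# Hypothetical user record; the real policy gets its user and roles elsewhere.
SketchUser = Struct.new(:admin, :roles, keyword_init: true)

class ExportPolicySketch
  def initialize(user)
    @user = user
  end

  # Same boolean shape as the diff: admins always pass, otherwise the
  # caller must be logged in and hold an editor role.
  def export?
    admin? || (logged_in? && editor?)
  end
  alias export_group? export?

  private

  def logged_in?
    !@user.nil?
  end

  def admin?
    logged_in? && @user.admin
  end

  # Only called after logged_in? has short-circuited a nil user.
  def editor?
    @user.roles.include?('editor')
  end
end

[
  ['anonymous', nil],
  ['admin', SketchUser.new(admin: true, roles: [])],
  ['editor', SketchUser.new(admin: false, roles: ['editor'])],
  ['viewer', SketchUser.new(admin: false, roles: ['viewer'])]
].each do |label, user|
  puts "#{label}: #{ExportPolicySketch.new(user).export?}"
end
```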
2 changes: 1 addition & 1 deletion app/policies/transcription_policy.rb
@@ -35,4 +35,4 @@ def viewer_policy_scope
       end
     end
   end
-end
+end
78 changes: 78 additions & 0 deletions app/services/data_exports/aggregate_metadata_file_generator.rb
@@ -0,0 +1,78 @@
+require 'csv'
+
+module DataExports
+  # Helper class for aggregating metadata from individual transcriptions
+  # within a group/workflow/project into a single csv file
+  class AggregateMetadataFileGenerator
+    class << self
+      # Public: add metadata csv file to group folder
+      def generate_group_file(transcriptions, output_folder)
+        metadata_rows = compile_transcription_metadata(transcriptions)
+        generate_csv(output_folder, metadata_rows)
+      end
+
+      # Public: add metadata csv file to workflow folder
+      def generate_workflow_file(workflow, output_folder)
+        metadata_rows = compile_workflow_metadata(workflow)
+        generate_csv(output_folder, metadata_rows)
+      end
+
+      def generate_project_file(project, output_folder)
+        metadata_rows = []
+        project.workflows.each do |w|
+          metadata_rows += compile_workflow_metadata(w)
+        end
+
+        generate_csv(output_folder, metadata_rows)
+      end
+
+      private
+
+      # Private: for each transcription, extracts transcription metadata from metadata
+      # storage file, adds it to the metadata_rows array, which will be passed to a
+      # csv file generator.
+      # @param metadata_rows [Array]: collection of metadata rows for the current
+      # group/workflow/project being processed
+      # returns updated metadata_rows array
+      def compile_transcription_metadata(transcriptions)
+        metadata_rows = []
+        metadata_file_regex = /^transcription_metadata_.*\.csv$/
+
+        transcriptions.each do |transcription|
+          transcription.export_files.each do |storage_file|
+            is_transcription_metadata_file = metadata_file_regex.match storage_file.filename.to_s
+            if is_transcription_metadata_file
+              rows = CSV.parse(storage_file.download)
+
+              # add header if it's the first transcription being added
+              metadata_rows << rows[0] if metadata_rows.empty?
+              # add content regardless
+              metadata_rows << rows[1]
+            end
+          end
+        end
+
+        metadata_rows
+      end
+
+      def compile_workflow_metadata(workflow)
+        metadata_rows = []
+
+        workflow.transcription_group_data.each_key do |group_key|
+          transcriptions = Transcription.where(group_id: group_key)
+          metadata_rows += compile_transcription_metadata(transcriptions)
+        end
+
+        metadata_rows
+      end
+
+      def generate_csv(output_folder, metadata_rows)
+        metadata_file = File.join(output_folder, 'transcriptions_metadata.csv')
+
+        CSV.open(metadata_file, 'wb') do |csv|
+          metadata_rows.each { |row| csv << row }
+        end
+      end
+    end
+  end
+end
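`compile_transcription_metadata` keeps the header row from the first metadata CSV it sees and one data row from each file. Because it starts from a fresh array on every call, concatenating its per-group results (as `compile_workflow_metadata` and `generate_project_file` do) appears to repeat the header once per group. The following stdlib-only approximation instead passes one shared accumulator across groups. It assumes each per-transcription file holds exactly a header and one data row; the in-memory `filename => contents` hashes, column names and helper names are hypothetical stand-ins for `export_files`.

```ruby
require 'csv'
require 'tmpdir'

# Same filename pattern as the diff; \A and \z anchor the whole string
# rather than a single line.
METADATA_FILE_REGEX = /\Atranscription_metadata_.*\.csv\z/

# Appends each metadata file's data row to rows, adding the header row
# only while rows is still empty.
def append_metadata_rows(export_files, rows)
  export_files.each do |filename, contents|
    next unless METADATA_FILE_REGEX.match?(filename)

    header, data = CSV.parse(contents)
    rows << header if rows.empty?
    rows << data
  end
  rows
end

# Writes the aggregated rows to transcriptions_metadata.csv, as generate_csv does.
def write_metadata_csv(output_folder, rows)
  path = File.join(output_folder, 'transcriptions_metadata.csv')
  CSV.open(path, 'wb') { |csv| rows.each { |row| csv << row } }
  path
end

# Two hypothetical groups, each a list of transcriptions' export files.
groups = [
  [{ 'transcription_metadata_1.csv' => "id,group_id\n1,a\n", 'consensus_text_1.txt' => 'text' }],
  [{ 'transcription_metadata_2.csv' => "id,group_id\n2,b\n" }]
]

# Sharing one accumulator across groups keeps a single header row.
rows = []
groups.each do |transcriptions|
  transcriptions.each { |files| append_metadata_rows(files, rows) }
end

Dir.mktmpdir do |dir|
  puts File.read(write_metadata_csv(dir, rows))
end
```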