The Data Exports functionality is responsible for:
* Generating transcription data files and saving them to Azure Blob Storage upon transcription approval. This includes a raw data file (.json), a consensus text file (.txt), a metadata file (.csv), and a line metadata file (.csv)
* Removing transcription data files from storage if a transcription is unapproved
* Downloading all transcription files pertaining to a requested project, workflow, group, or single transcription, zipping the files, and sending them to the user
* Generating a single csv file containing the metadata for all transcriptions included in the collection (handled by the `AggregateMetadataFileGenerator` class), which is included in the zip file
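The aggregation described in the last bullet can be sketched in isolation. This is a minimal illustration rather than the commit's implementation: `merge_metadata_csvs` is a hypothetical helper, the column names are invented, and it assumes every per-transcription metadata file shares the same header row.

```ruby
require 'csv'

# Hypothetical helper (not part of the commit): merges several per-transcription
# metadata CSV strings into one table, keeping the header row only once.
def merge_metadata_csvs(csv_strings)
  merged = []
  csv_strings.each do |csv_string|
    header, *data_rows = CSV.parse(csv_string)
    next if header.nil? # skip empty files

    merged << header if merged.empty?
    merged.concat(data_rows)
  end
  merged
end

# Invented sample data; the real column names are not shown in the commit.
first  = "transcription_id,status\n1,approved\n"
second = "transcription_id,status\n2,approved\n"

merged_rows = merge_metadata_csvs([first, second])
puts merged_rows.map(&:to_csv).join
```

Keeping the header from the first file only is what lets the combined file be read as a single table.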
Showing 32 changed files with 1,934 additions and 43 deletions.
@@ -35,4 +35,4 @@ def viewer_policy_scope
end
end
end
end
end
78 changes: 78 additions & 0 deletions
app/services/data_exports/aggregate_metadata_file_generator.rb
require 'csv'

module DataExports
  # Helper class for aggregating metadata from individual transcriptions
  # within a group/workflow/project into a single csv file
  class AggregateMetadataFileGenerator
    class << self
      # Public: add metadata csv file to group folder
      def generate_group_file(transcriptions, output_folder)
        metadata_rows = compile_transcription_metadata(transcriptions)
        generate_csv(output_folder, metadata_rows)
      end

      # Public: add metadata csv file to workflow folder
      def generate_workflow_file(workflow, output_folder)
        metadata_rows = compile_workflow_metadata(workflow)
        generate_csv(output_folder, metadata_rows)
      end

      # Public: add metadata csv file to project folder
      def generate_project_file(project, output_folder)
        metadata_rows = []
        project.workflows.each do |w|
          metadata_rows = append_rows(metadata_rows, compile_workflow_metadata(w))
        end

        generate_csv(output_folder, metadata_rows)
      end

      private

      # Private: for each transcription, extracts transcription metadata from the
      # metadata storage file and adds it to the metadata_rows array, which will be
      # passed to a csv file generator.
      # @param transcriptions [Enumerable]: transcriptions in the
      # group/workflow/project being processed
      # returns metadata_rows array: one header row followed by one row per metadata file
      def compile_transcription_metadata(transcriptions)
        metadata_rows = []
        # \A and \z anchor to the whole filename (^ and $ only anchor to lines)
        metadata_file_regex = /\Atranscription_metadata_.*\.csv\z/

        transcriptions.each do |transcription|
          transcription.export_files.each do |storage_file|
            next unless metadata_file_regex.match?(storage_file.filename.to_s)

            header, content = CSV.parse(storage_file.download)
            # skip files that are empty or contain only a header row
            next if content.nil?

            # add header if it's the first transcription being added
            metadata_rows << header if metadata_rows.empty?
            # add content regardless
            metadata_rows << content
          end
        end

        metadata_rows
      end

      # Private: compiles the metadata rows for every transcription group in a workflow
      def compile_workflow_metadata(workflow)
        metadata_rows = []

        workflow.transcription_group_data.each_key do |group_key|
          transcriptions = Transcription.where(group_id: group_key)
          metadata_rows = append_rows(metadata_rows, compile_transcription_metadata(transcriptions))
        end

        metadata_rows
      end

      # Private: appends new_rows to metadata_rows. Every compiled batch starts with
      # its own header row, so the header is dropped from all but the first batch
      # to avoid repeating it in the aggregated file.
      def append_rows(metadata_rows, new_rows)
        return new_rows if metadata_rows.empty?

        metadata_rows + new_rows.drop(1)
      end

      # Private: writes the compiled rows to transcriptions_metadata.csv in output_folder
      def generate_csv(output_folder, metadata_rows)
        metadata_file = File.join(output_folder, 'transcriptions_metadata.csv')

        CSV.open(metadata_file, 'wb') do |csv|
          metadata_rows.each { |row| csv << row }
        end
      end
    end
  end
end
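The generator picks out metadata files by matching each export filename against a regex. In Ruby, `^` and `$` anchor to line boundaries, while `\A` and `\z` anchor to the whole string, so the two forms can disagree on names that contain a newline. The sketch below compares them; the filenames are invented for illustration, and only the `transcription_metadata_*.csv` pattern comes from the diff.

```ruby
# Line-anchored pattern versus a whole-string variant of the same pattern.
LINE_ANCHORED   = /^transcription_metadata_.*\.csv$/
STRING_ANCHORED = /\Atranscription_metadata_.*\.csv\z/

# Hypothetical filenames, not taken from the commit.
filenames = [
  'transcription_metadata_42.csv',
  'transcription_line_metadata_42.csv',
  'raw_data_42.json',
  "notes.txt\ntranscription_metadata_42.csv"
]

line_matches   = filenames.select { |name| LINE_ANCHORED.match?(name) }
string_matches = filenames.select { |name| STRING_ANCHORED.match?(name) }

puts "line-anchored matches:   #{line_matches.length}"
puts "string-anchored matches: #{string_matches.length}"
```

Both patterns skip the line metadata and raw data files; only the line-anchored one accepts the multi-line name.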