Skip to content

Commit

Permalink
Add ability to import FileSet metadata using the CSV Importer (#388)
Browse files Browse the repository at this point in the history
* set up initial logic for importing file sets

* add file set entry counter, and file set entries to importer show view

* guard migrations against "duplicate column name" errors

* add ApplicationParser#file_sets_total fallback

* add #create_file_set to ObjectFactory, extract dynamic record lookup into module

* case statements don't work as expected, change #create_file_set to more closely mirror Hyrax's AttachFilesToWorkJob

* wait for the file set's work to exist before attempting to import

* relationships jobs should not run in validate only mode

* FileSets can have multiple files, better error messages for FileSet entries

* add ability to update file set metadata

* more accurate error message, remove unused comment

* enqueued_records counts all entries, index is set per entry type

* fix importer run counters and split them up by type

* fix specs, ignore class length warning for now

* add shared specs for DynamicRecordLookup#find_record

* add validations to ImportFileSetJob, use descriptive params

* specs for ImportFileSetJob and DynamicRecordLookup

* bundler version and db/schema updates

* include DynamicRecordLookup specs in ObjectFactory spec

* add specs for #file_sets and #create_file_sets
  • Loading branch information
bkiahstroud authored Feb 2, 2022
1 parent 14e4992 commit 0919acf
Show file tree
Hide file tree
Showing 34 changed files with 748 additions and 72 deletions.
16 changes: 8 additions & 8 deletions .rubocop_todo.yml
Original file line number Diff line number Diff line change
@@ -1,34 +1,34 @@
# This configuration was generated by
# `rubocop --auto-gen-config`
# on 2021-12-14 15:16:54 -0800 using RuboCop version 0.85.1.
# on 2022-01-26 14:39:48 -0800 using RuboCop version 0.85.1.
# The point is for the user to remove these configuration records
# one by one as the offenses are removed from the code base.
# Note that changes in the inspected code, or installation of new
# versions of RuboCop, may require this file to be generated again.

# Offense count: 6
# Offense count: 7
# Cop supports --auto-correct.
# Configuration parameters: AutoCorrect, AllowHeredoc, AllowURI, URISchemes, IgnoreCopDirectives, IgnoredPatterns.
# URISchemes: http, https
Layout/LineLength:
Max: 301

# Offense count: 17
# Offense count: 20
# Configuration parameters: IgnoredMethods.
Metrics/AbcSize:
Max: 42

# Offense count: 6
# Configuration parameters: CountComments.
Metrics/ClassLength:
Max: 197
Max: 202

# Offense count: 10
# Offense count: 11
# Configuration parameters: IgnoredMethods.
Metrics/CyclomaticComplexity:
Max: 12

# Offense count: 26
# Offense count: 30
# Configuration parameters: CountComments, ExcludedMethods.
Metrics/MethodLength:
Max: 26
Expand All @@ -38,7 +38,7 @@ Metrics/MethodLength:
Metrics/ModuleLength:
Max: 131

# Offense count: 10
# Offense count: 11
# Configuration parameters: IgnoredMethods.
Metrics/PerceivedComplexity:
Max: 13
Expand Down Expand Up @@ -66,7 +66,7 @@ RSpec/MessageChain:
- 'spec/parsers/bulkrax/csv_parser_spec.rb'
- 'spec/parsers/bulkrax/xml_parser_spec.rb'

# Offense count: 80
# Offense count: 78
# Configuration parameters: .
# SupportedStyles: have_received, receive
RSpec/MessageSpies:
Expand Down
2 changes: 1 addition & 1 deletion Gemfile.lock
Original file line number Diff line number Diff line change
Expand Up @@ -928,4 +928,4 @@ DEPENDENCIES
willow_sword!

BUNDLED WITH
1.17.3
2.1.4
1 change: 1 addition & 0 deletions app/controllers/bulkrax/importers_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ def show

@work_entries = @importer.entries.where(type: @importer.parser.entry_class.to_s).page(params[:work_entries_page]).per(30)
@collection_entries = @importer.entries.where(type: @importer.parser.collection_entry_class.to_s).page(params[:collections_entries_page]).per(30)
@file_set_entries = @importer.entries.where(type: @importer.parser.file_set_entry_class.to_s).page(params[:file_set_entries_page]).per(30)
end
end

Expand Down
58 changes: 51 additions & 7 deletions app/factories/bulkrax/object_factory.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,13 @@ module Bulkrax
class ObjectFactory
extend ActiveModel::Callbacks
include Bulkrax::FileFactory
include DynamicRecordLookup

define_model_callbacks :save, :create
attr_reader :attributes, :object, :source_identifier_value, :klass, :replace_files, :update_files, :work_identifier, :collection_field_mapping
attr_reader :attributes, :object, :source_identifier_value, :klass, :replace_files, :update_files, :work_identifier, :collection_field_mapping, :related_parents_parsed_mapping

# rubocop:disable Metrics/ParameterLists
def initialize(attributes:, source_identifier_value:, work_identifier:, collection_field_mapping:, replace_files: false, user: nil, klass: nil, update_files: false)
def initialize(attributes:, source_identifier_value:, work_identifier:, collection_field_mapping:, related_parents_parsed_mapping: nil, replace_files: false, user: nil, klass: nil, update_files: false)
ActiveSupport::Deprecation.warn(
'Creating Collections using the collection_field_mapping will no longer be supported as of Bulkrax version 3.0.' \
' Please configure Bulkrax to use related_parents_field_mapping and related_children_field_mapping instead.'
Expand All @@ -19,6 +21,7 @@ def initialize(attributes:, source_identifier_value:, work_identifier:, collecti
@user = user || User.batch_user
@work_identifier = work_identifier
@collection_field_mapping = collection_field_mapping
@related_parents_parsed_mapping = related_parents_parsed_mapping
@source_identifier_value = source_identifier_value
@klass = klass || Bulkrax.default_work_type.constantize
end
Expand All @@ -33,7 +36,7 @@ def run
arg_hash = { id: attributes[:id], name: 'UPDATE', klass: klass }
@object = find
if object
object.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX
object.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX if object.respond_to?(:reindex_extent)
ActiveSupport::Notifications.instrument('import.importer', arg_hash) { update }
else
ActiveSupport::Notifications.instrument('import.importer', arg_hash.merge(name: 'CREATE')) { create }
Expand All @@ -51,10 +54,16 @@ def run!

def update
raise "Object doesn't exist" unless object
destroy_existing_files if @replace_files && klass != Collection
destroy_existing_files if @replace_files && ![Collection, FileSet].include?(klass)
attrs = attribute_update
run_callbacks :save do
klass == Collection ? update_collection(attrs) : work_actor.update(environment(attrs))
if klass == Collection
update_collection(attrs)
elsif klass == FileSet
update_file_set(attrs)
else
work_actor.update(environment(attrs))
end
end
log_updated(object)
end
Expand Down Expand Up @@ -90,10 +99,16 @@ def search_by_identifier
def create
attrs = create_attributes
@object = klass.new
object.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX
object.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX if object.respond_to?(:reindex_extent)
run_callbacks :save do
run_callbacks :create do
klass == Collection ? create_collection(attrs) : work_actor.create(environment(attrs))
if klass == Collection
create_collection(attrs)
elsif klass == FileSet
create_file_set(attrs)
else
work_actor.create(environment(attrs))
end
end
end
log_created(object)
Expand Down Expand Up @@ -150,6 +165,35 @@ def update_collection(attrs)
object.save!
end

# This method is heavily inspired by Hyrax's AttachFilesToWorkJob
def create_file_set(attrs)
work = find_record(attributes[related_parents_parsed_mapping].first)
work_permissions = work.permissions.map(&:to_hash)
file_set_attrs = attrs.slice(*object.attributes.keys)
object.assign_attributes(file_set_attrs)

attrs['uploaded_files'].each do |uploaded_file_id|
uploaded_file = ::Hyrax::UploadedFile.find(uploaded_file_id)
next if uploaded_file.file_set_uri.present?

actor = ::Hyrax::Actors::FileSetActor.new(object, @user)
uploaded_file.update(file_set_uri: actor.file_set.uri)
actor.file_set.permissions_attributes = work_permissions
actor.create_metadata
actor.create_content(uploaded_file)
actor.attach_to_work(work)
end

object.save!
end

def update_file_set(attrs)
file_set_attrs = attrs.slice(*object.attributes.keys)
actor = ::Hyrax::Actors::FileSetActor.new(object, @user)

actor.update_metadata(file_set_attrs)
end

# Add child to parent's #member_collections
# Add parent to child's #member_of_collections
def persist_collection_memberships(parent:, child:)
Expand Down
35 changes: 2 additions & 33 deletions app/jobs/bulkrax/create_relationships_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ module Bulkrax
# NOTE: In the context of this job, "identifier" is used to generically refer
# to either a record's ID or an Bulkrax::Entry's source_identifier.
class CreateRelationshipsJob < ApplicationJob
include DynamicRecordLookup

queue_as :import

attr_accessor :base_entry, :child_record, :parent_record, :importer_run
Expand Down Expand Up @@ -74,39 +76,6 @@ def create_relationship
end
end

# This method allows us to create relationships with preexisting records (by their ID) OR
# with records that are concurrently being imported (by their Bulkrax::Entry source_identifier).
#
# @param identifier [String] Work/Collection ID or Bulkrax::Entry source_identifier
# @return [Work, Collection, nil] Work or Collection if found, otherwise nil
def find_record(identifier)
record = Entry.find_by(identifier: identifier)
record ||= ::Collection.where(id: identifier).first
if record.blank?
available_work_types.each do |work_type|
record ||= work_type.where(id: identifier).first
end
end
record = record.factory.find if record.is_a?(Entry)

record
end

# Check if the record is a Work
def curation_concern?(record)
available_work_types.include?(record.class)
end

# @return [Array<Class>] list of work type classes
def available_work_types
# If running in a Hyku app, do not reference disabled work types
@available_work_types ||= if defined?(::Hyku)
::Site.instance.available_works.map(&:constantize)
else
::Hyrax.config.curation_concerns
end
end

def user
@user ||= importer_run.importer.user
end
Expand Down
2 changes: 2 additions & 0 deletions app/jobs/bulkrax/import_collection_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,11 @@ def perform(*args)
entry.build
entry.save
add_user_to_permission_template!(entry)
ImporterRun.find(args[1]).increment!(:processed_records)
ImporterRun.find(args[1]).increment!(:processed_collections)
ImporterRun.find(args[1]).decrement!(:enqueued_records)
rescue => e
ImporterRun.find(args[1]).increment!(:failed_records)
ImporterRun.find(args[1]).increment!(:failed_collections)
ImporterRun.find(args[1]).decrement!(:enqueued_records)
raise e
Expand Down
69 changes: 69 additions & 0 deletions app/jobs/bulkrax/import_file_set_job.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# frozen_string_literal: true

module Bulkrax
class MissingParentError < ::StandardError; end
class ImportFileSetJob < ApplicationJob
include DynamicRecordLookup

queue_as :import

def perform(entry_id, importer_run_id)
entry = Entry.find(entry_id)
parent_identifier = entry.raw_metadata[entry.related_parents_raw_mapping]&.strip

validate_parent!(parent_identifier)

entry.build
if entry.succeeded?
# rubocop:disable Rails/SkipsModelValidations
ImporterRun.find(importer_run_id).increment!(:processed_records)
ImporterRun.find(importer_run_id).increment!(:processed_file_sets)
else
ImporterRun.find(importer_run_id).increment!(:failed_records)
ImporterRun.find(importer_run_id).increment!(:failed_file_sets)
# rubocop:enable Rails/SkipsModelValidations
end
ImporterRun.find(importer_run_id).decrement!(:enqueued_records) # rubocop:disable Rails/SkipsModelValidations
entry.save!

rescue MissingParentError => e
# try waiting for the parent record to be created
entry.import_attempts += 1
entry.save!
if entry.import_attempts < 5
ImportFileSetJob
.set(wait: (entry.import_attempts + 1).minutes)
.perform_later(entry_id, importer_run_id)
else
ImporterRun.find(importer_run_id).decrement!(:enqueued_records) # rubocop:disable Rails/SkipsModelValidations
entry.status_info(e)
end
end

private

attr_reader :parent_record

def validate_parent!(parent_identifier)
# if parent_identifier is missing, it will be caught by #validate_presence_of_parent!
return if parent_identifier.blank?

find_parent_record(parent_identifier)
check_parent_exists!(parent_identifier)
check_parent_is_a_work!(parent_identifier)
end

def check_parent_exists!(parent_identifier)
raise MissingParentError, %(Unable to find a record with the identifier "#{parent_identifier}") if parent_record.blank?
end

def check_parent_is_a_work!(parent_identifier)
error_msg = %(A record with the ID "#{parent_identifier}" was found, but it was a #{parent_record.class}, which is not an valid/available work type)
raise ::StandardError, error_msg unless curation_concern?(parent_record)
end

def find_parent_record(parent_identifier)
@parent_record ||= find_record(parent_identifier)
end
end
end
2 changes: 2 additions & 0 deletions app/jobs/bulkrax/import_work_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,13 @@ def perform(*args)
entry.build
if entry.status == "Complete"
ImporterRun.find(args[1]).increment!(:processed_records)
ImporterRun.find(args[1]).increment!(:processed_works)
ImporterRun.find(args[1]).decrement!(:enqueued_records) unless ImporterRun.find(args[1]).enqueued_records <= 0 # rubocop:disable Style/IdenticalConditionalBranches
else
# do not retry here because whatever parse error kept you from creating a work will likely
# keep preventing you from doing so.
ImporterRun.find(args[1]).increment!(:failed_records)
ImporterRun.find(args[1]).increment!(:failed_works)
ImporterRun.find(args[1]).decrement!(:enqueued_records) unless ImporterRun.find(args[1]).enqueued_records <= 0 # rubocop:disable Style/IdenticalConditionalBranches
end
entry.save!
Expand Down
2 changes: 2 additions & 0 deletions app/jobs/bulkrax/importer_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ def import(importer, only_updates_since_last_import)

importer.import_collections
importer.import_works
importer.import_file_sets
end

def unzip_imported_file(parser)
Expand All @@ -31,6 +32,7 @@ def unzip_imported_file(parser)
def update_current_run_counters(importer)
importer.current_run.total_work_entries = importer.limit || importer.parser.works_total
importer.current_run.total_collection_entries = importer.parser.collections_total
importer.current_run.total_file_set_entries = importer.parser.file_sets_total
importer.current_run.save!
end

Expand Down
11 changes: 9 additions & 2 deletions app/models/bulkrax/csv_entry.rb
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,9 @@ def build_metadata

self.parsed_metadata = {}
add_identifier
add_metadata_for_model
add_visibility
add_ingested_metadata
add_metadata_for_model
add_rights_statement
add_collections
add_local
Expand All @@ -57,6 +57,9 @@ def add_identifier
def add_metadata_for_model
if factory_class == Collection
add_collection_type_gid
elsif factory_class == FileSet
add_path_to_file
validate_presence_of_parent!
else
add_file unless importerexporter.metadata_only?
add_admin_set_id
Expand Down Expand Up @@ -85,7 +88,11 @@ def add_file
elsif record['file'].is_a?(Array)
self.parsed_metadata['file'] = record['file']
end
self.parsed_metadata['file'] = self.parsed_metadata['file'].map { |f| path_to_file(f.tr(' ', '_')) }
self.parsed_metadata['file'] = self.parsed_metadata['file'].map do |f|
next if f.blank?

path_to_file(f.tr(' ', '_'))
end.compact
end

def build_export_metadata
Expand Down
Loading

0 comments on commit 0919acf

Please sign in to comment.