🐛 Fix rerun for entries that came from zips

Previously, when rerunning an entry that came from a zip file, the rerun would fail because it would look for the CSV in an assumed location which is based on it's last importer ID. Since this was a rerun, it does not do an unzip into the assumed location so the directory does not exist. This commit will first check if the assumed location exists, and if not, it will look for the location of the last unizpped files and use that for the rerun. This does cause an interesting behavior where if the entry is a work with a file attached, it will add the file again resulting in duplicate files. I feel this is such an edge case though because typically if the entry is successful, the user will not rerun it. I added a hint text to the importer to let the user know this is a possibility. Ref: - notch8/palni_palci_knapsack#210
samvera · Jan 28, 2025 · 3b89489 · 3b89489
1 parent 0e8de61
commit 3b89489
Show file tree

Hide file tree

Showing 2 changed files with 17 additions and 1 deletion.
diff --git a/app/parsers/bulkrax/csv_parser.rb b/app/parsers/bulkrax/csv_parser.rb
@@ -360,6 +360,8 @@ def path_to_files(**args)
       @path_to_files = File.join(
           zip? ? importer_unzip_path : File.dirname(import_file_path), 'files', filename
         )
+
+      Dir.exist?(@path_to_files) ? @path_to_files : File.join(real_importer_unzip_path, 'files', filename)
     end
 
     private
@@ -379,7 +381,7 @@ def unique_collection_identifier(collection_hash)
     # We expect a single CSV at the top level of the zip in the CSVParser
     # but we are willing to go look for it if need be
     def real_import_file_path
-      return Dir["#{importer_unzip_path}/**/*.csv"].reject { |path| in_files_dir?(path) }.first if file? && zip?
+      return Dir["#{real_importer_unzip_path}/**/*.csv"].reject { |path| in_files_dir?(path) }.first if file? && zip?
 
       parser_fields['import_file_path']
     end
@@ -389,5 +391,18 @@ def real_import_file_path
     def in_files_dir?(path)
       File.dirname(path).ends_with?('files')
     end
+
+    # If we don't have an existing unzip path, we'll try and find it.
+    # Just in case there are multiple paths, we sort by the number at the end of the path and get the last one
+    def real_importer_unzip_path
+      return importer_unzip_path if Dir.exist?(importer_unzip_path)
+
+      Dir.glob(base_importer_unzip_path + '*').sort_by { |path| path.split(base_importer_unzip_path).last[1..-1].to_i }.last
+    end
+
+    def base_importer_unzip_path
+      # turns "tmp/imports/tenant/import_1_20250122035229_1" to "tmp/imports/tenant/import_1_20250122035229"
+      importer_unzip_path.split('_')[0...-1].join('_')
+    end
   end
 end
diff --git a/app/views/bulkrax/importers/_edit_item_buttons.html.erb b/app/views/bulkrax/importers/_edit_item_buttons.html.erb
@@ -5,6 +5,7 @@
         <h5>Options for Updating an Entry</h5>
         <hr />
         <p>Rebuild metadata and files.</p>
+        <p class="text-warning">Files may be duplicated if this option is used on a successful entry. Consider using <em>Remove and then Build</em> instead.</p>
         <%= link_to 'Build', item_entry_path(item, e), method: :patch, class: 'btn btn-primary'  %>
         <hr />
         <p>Remove existing work and then recreate the works metadata and files.</p>