Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add a new PDF splitter option that wraps the DerivativeRodeo's PdfSplitGenerator. It handles, in theory, PDF splitting and the derivatives generated in the DerivativeRodeo.

Related to:
- #220

Co-authored-by: LaRita Robinson <[email protected]>
Co-authored-by: Shana Moore <[email protected]>
Showing 8 changed files with 173 additions and 41 deletions.
    @@ -0,0 +1,71 @@
    module IiifPrint
      module SplitPdfs
        ##
        # This class wraps the DerivativeRodeo::Generators::PdfSplitGenerator to find preprocessed
        # images, or split a PDF if there are no preprocessed images.
        #
        # We have already attached the original file to the file_set. We want to convert that
        # attached original file into an input_uri (e.g. "file://path/to/original-file", as in what
        # we have written to Fedora as the PDF).
        #
        # @see .call
        class DerivativeRodeoSplitter
          ##
          # @param path [String] the local file location
          # @param file_set [FileSet] file set containing a PDF file to split
          #
          # @return [Array] paths to images split from each page of PDF file
          def self.call(path, file_set:)
            new(path, file_set: file_set).split_files
          end

          def initialize(path, file_set:, output_tmp_dir: Dir.tmpdir)
            @input_uri = "file://#{path}"

            # We are writing the images to a location that CarrierWave can upload.
            #
            # https://github.com/scientist-softserv/iiif_print/blob/b969541de1a0526305b54de37bf7cf100289f088/lib/iiif_print/jobs/child_works_from_pdf_job.rb#L108
            output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

            @output_location_template = "file://#{output_template_path}"
            # NOTE: the original line passed `filename: filename`, but no `filename` is defined in
            # this scope; `path` is assumed here to be the intended value.
            @preprocessed_location_template = IiifPrint::DerivativeRodeoService.derivative_rodeo_input_uri(file_set: file_set, filename: path)
          end

          ##
          # This is where, in "Fedora", we have the original file. This is not the original file in the
          # pre-processing location but instead the long-term location of the file in the application
          # that mounts IIIF Print.
          #
          # @return [String]
          attr_reader :input_uri

          ##
          # This is the location where we're going to write the derivatives that will "go into Fedora".
          #
          # @return [String]
          attr_reader :output_location_template

          ##
          # Where we can find, in the DerivativeRodeo's storage, what has already been done regarding
          # derivative generation.
          #
          # For example, SpaceStone::Serverless will pre-process derivatives and write them into an S3
          # bucket that we then use for IIIF Print.
          #
          # @return [String]
          #
          # @see https://github.com/scientist-softserv/space_stone-serverless/blob/7f46dd5b218381739cd1c771183f95408a4e0752/awslambda/handler.rb#L58-L63
          attr_reader :preprocessed_location_template

          ##
          # @return [Array<String>] the paths to each of the images split off from the PDF.
          def split_files
            DerivativeRodeo::Generators::PdfSplitGenerator.new(
              input_uris: [@input_uri],
              output_location_template: output_location_template,
              preprocessed_location_template: preprocessed_location_template
            ).generated_files.map(&:file_path)
          end
        end
      end
    end
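
The output template above combines two placeholders, `{{ dir_parts[-1..-1] }}` and `{{ filename }}`. As a rough illustration of how such a template could resolve against an input URI, here is a minimal sketch. The helper `expand_location_template` and the sample URI are hypothetical and are not DerivativeRodeo's implementation; how the real generator names per-page images is not shown in this diff.

```ruby
# Hypothetical helper for illustration only; NOT DerivativeRodeo's implementation.
# It substitutes the two placeholders used by the splitter's output template.
def expand_location_template(template, input_uri)
  path = input_uri.sub(%r{\Afile://}, '')
  dir_parts = File.dirname(path).split('/').reject(&:empty?)
  filename = File.basename(path)

  template
    # `last(1)` matches `[-1..-1]` for non-empty arrays and is safe for empty ones.
    .gsub(/\{\{\s*dir_parts\[-1\.\.-1\]\s*\}\}/) { dir_parts.last(1).join('/') }
    .gsub(/\{\{\s*filename\s*\}\}/) { filename }
end

output_tmp_dir = '/tmp' # stands in for Dir.tmpdir
template = "file://#{File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')}"

# The sample input URI is an assumption made up for this sketch.
expanded = expand_location_template(template, 'file:///data/abc123/original.pdf')
puts expanded # prints "file:///tmp/abc123/original.pdf"
```

Keeping the last directory segment in the output path means files with the same name from different source directories do not collide in the shared temporary directory.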
33 changes: 33 additions & 0 deletions
spec/iiif_print/split_pdfs/derivative_rodeo_splitter_spec.rb
    @@ -0,0 +1,33 @@
    # frozen_string_literal: true

    require 'spec_helper'

    RSpec.describe IiifPrint::SplitPdfs::DerivativeRodeoSplitter do
      let(:path) { nil }
      let(:work) { double(MyWork, aark_id: '12345') }
      let(:file_set) { FileSet.new.tap { |fs| fs.save!(validate: false) } }

      describe 'class' do
        subject { described_class }

        it { is_expected.to respond_to(:call) }
      end

      describe "instance" do
        subject { described_class.new(path, file_set: file_set) }

        before do
          allow(file_set).to receive(:parent).and_return(work)
          # TODO: This is a hack that leverages the internals of Hydra::Works; not excited about it but
          # this part is only one piece of the overall integration.
          allow(file_set).to receive(:original_file).and_return(double(original_filename: __FILE__))
        end

        it { is_expected.to respond_to :split_files }

        it 'uses the rodeo to split' do
          expect(DerivativeRodeo::Generators::PdfSplitGenerator).to receive(:new)
          described_class.call(path, file_set: file_set)
        end
      end
    end
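
The 'uses the rodeo to split' example exercises the chain `.call` → `new` → `split_files` → `generated_files.map(&:file_path)`. The plain-Ruby sketch below walks that chain with stand-in objects. `FakeSplitGenerator`, `FakeGeneratedFile`, `SketchSplitter`, the two-page assumption, and the template strings are all hypothetical and are not part of DerivativeRodeo or IIIF Print.

```ruby
# Stand-ins for illustration; these are NOT DerivativeRodeo classes.
FakeGeneratedFile = Struct.new(:file_path)

class FakeSplitGenerator
  # Accepts the same keywords the splitter passes; only input_uris is used here.
  def initialize(input_uris:, output_location_template:, preprocessed_location_template:)
    @input_uris = input_uris
  end

  # Pretend every input PDF has exactly two pages.
  def generated_files
    @input_uris.flat_map do |uri|
      base = File.basename(uri, '.pdf')
      (1..2).map { |page| FakeGeneratedFile.new("/tmp/#{base}/page-#{page}.tiff") }
    end
  end
end

# Simplified analogue of the splitter with an injectable generator class.
class SketchSplitter
  def self.call(path, generator_class: FakeSplitGenerator)
    new(path, generator_class: generator_class).split_files
  end

  def initialize(path, generator_class:)
    @input_uri = "file://#{path}"
    @generator_class = generator_class
  end

  def split_files
    @generator_class.new(
      input_uris: [@input_uri],
      output_location_template: 'file:///tmp/{{ filename }}',
      preprocessed_location_template: 'file:///preprocessed/{{ filename }}'
    ).generated_files.map(&:file_path)
  end
end

paths = SketchSplitter.call('/data/abc123/original.pdf')
puts paths
```

The same shape matters when stubbing in RSpec: whatever the stubbed `new` returns has to respond to `generated_files`, because `split_files` calls it on the result.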