CsvBooks doesn't like periods in filenames #498
I think this is more to do with using Just to be sure, can you change the filenames back to using periods as separators but leave the (currently invalid until we merge #496)
Sorry, my logic is wrong. Change the extension to
OK, a couple of tests:
So it looks like the uppercase/lowercase extension is the culprit. But interestingly, it behaves very unpredictably in my set, because the first record is not The presence of this file results in weird behaviours. Typically directory 0 just isn't generated, but it can also cause MIK to skip, say, directory 2, or more directories.
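If the culprit is a case-sensitive extension comparison (an assumption: the thread doesn't show the code that matches extensions), `.TIF` would fail to match a configured `tif`. Here is a minimal sketch of a case-insensitive check. `has_extension()` is a hypothetical helper, not a MIK function.

```php
<?php
// Hypothetical helper (not MIK's code): does $filename end in the $wanted
// extension, ignoring case?
function has_extension(string $filename, string $wanted): bool
{
    // pathinfo() returns only the text after the final period, e.g. "TIF".
    $ext = pathinfo($filename, PATHINFO_EXTENSION);
    // strcasecmp() returns 0 when the strings match ignoring ASCII case.
    return strcasecmp($ext, $wanted) === 0;
}

var_dump(has_extension('1987.0019.0039.0001.TIF', 'tif')); // bool(true)
var_dump(has_extension('0000.tif', 'tif'));                // bool(true)
var_dump(has_extension('0000.tiff', 'tif'));               // bool(false)
```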
@bondjimbond when you say "can also", are you saying that it behaves differently across runs of MIK with the same input data?
Yes. I was running with a test directory of just four images (0000.tif through 0003.tif). On the first few runs, it produced directories 1, 2, and 3; on later runs, just 1 and 3; on another later run, just 3. After removing 0000.tif, it consistently produced 1, 2, and 3.
Could you zip up all your data and config files and send them to me so I can try to replicate that?
Sure, here's my test directory and ini file: https://vault.sfu.ca/index.php/s/ChGaez7NLOygY3w
OK, got it, I'll give it a try this evening.
Can you send me your mappings file?
Ack, of course, sorry.
@bondjimbond Strangely, I can't replicate the behavior you are seeing. I ran MIK about 10 times and always got the same thing: page objects for pages 1-3 and an error indicating a problem with the
The problem comes from https://github.com/MarcusBarnes/mik/blob/master/src/writers/CsvBooks.php#L132-L134: since we trim all left-padding 0s, a page number of 0000 ends up as an empty string, so we need something other than 0000 as the page number. I'm not sure a fix to allow
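Although a check to see if Something like:

```php
// Strip left-padding zeros from the last filename segment, e.g. "0003" -> "3".
$page_number = ltrim(end($filename_segments), '0');
// "0000" trims to an empty string, so fall back to page "0".
if (strlen($page_number) === 0) {
    $page_number = '0';
}
$page_level_output_dir = $book_level_output_dir . DIRECTORY_SEPARATOR . $page_number;
mkdir($page_level_output_dir);
```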
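The fallback above can be checked in isolation. This is a sketch that pulls the page-number logic into a hypothetical pure function, `page_number_from_segments()` (not a MIK function), so it can run without touching the filesystem. Without the fallback, `'0000'` becomes `''` and the page-level path would point back at the book-level directory.

```php
<?php
// Hypothetical helper mirroring the snippet above: takes the segments of a
// filename (extension already removed, split on page_sequence_separator)
// and returns the page number used for the page-level output directory.
function page_number_from_segments(array $filename_segments): string
{
    // Strip left-padding zeros, e.g. "0003" -> "3", "0010" -> "10".
    $page_number = ltrim(end($filename_segments), '0');
    // An all-zero segment trims to '', so map it back to page "0".
    if (strlen($page_number) === 0) {
        $page_number = '0';
    }
    return $page_number;
}

foreach (['0000', '0001', '0010', '0100'] as $segment) {
    // Prints 0000 -> 0, 0001 -> 1, 0010 -> 10, 0100 -> 100
    echo $segment, ' -> ', page_number_from_segments(['1987.0019.0039', $segment]), PHP_EOL;
}
```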
Just tried that, it worked:

MODS.xml for page 0 is:

```xml
<titleInfo>
  <title>This is a title, page 0</title>
</titleInfo>
</mods>
```

MODS.xml for page 1 is:

```xml
<titleInfo>
  <title>This is a title, page 1</title>
</titleInfo>
</mods>
```
That's exactly what I need! :)
OK, I can open a PR for this if you want.
Please do!
I've made the same change to the CsvNewspapers writer and pushed up the issue-498 branch. I'll need to assemble some test data later, but once I do that, I'll open a PR.
I got a bunch of files with names like this:
1987.0019.0039.0001.TIF
I set
page_sequence_separator = .
Result:
If I change the filename to:
1987.0019.0039_0001.tif
and set
page_sequence_separator = _
it works.

Hypothesis: using a period as a page sequence separator doesn't work because MIK reads ".tif" as the page number rather than as the extension.

Alternate hypothesis: MIK is reading
.0019.0039.0001.tif
as a file extension rather than as part of a filename followed by a page number followed by an extension.

Is it possible to fix this?
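To illustrate the first hypothesis, here is a sketch (not MIK's actual parsing code) of what the last segment looks like when a period is the separator. If the full filename is split on `.`, the extension is the last segment. If the extension is removed first with PHP's `pathinfo()`, which drops only the text after the final period, the page number is the last segment.

```php
<?php
// Illustration only: splitting on '.' with and without removing the extension.
$filename = '1987.0019.0039.0001.TIF';

// Splitting the full name makes the extension the last segment.
$naive_segments = explode('.', $filename);
echo 'Naive last segment: ', end($naive_segments), PHP_EOL; // TIF

// Removing the extension first leaves the page number as the last segment.
$base = pathinfo($filename, PATHINFO_FILENAME); // 1987.0019.0039.0001
$segments = explode('.', $base);
echo 'Page segment: ', end($segments), PHP_EOL; // 0001
```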