Digitize with a reusable Makefile #143

Merged
merged 8 commits into from
Mar 20, 2022

@saraedum (Member) commented Mar 19, 2022

To digitize all data:
```
cd data
make
```

To run the svgdigitizer in parallel on 8 cores, use instead:
```
make -j8
```

We can also digitize only a single data set:
```
make generated/svgdigitizer/mello_2018_understanding_J3045/mello_2018_understanding_J3045_p1_f1H_black.csv
```

To digitize data coming from a source directory other than
`literature/`, use:
```
make SOURCE_DIR=/another/path
```
TODO

@saraedum (Member Author) commented

@DunklesArchipel, I believe that this is more flexible and easier than echemdb/svgdigitizer#138. I am not sure if it works well on Windows, so I'll see what the CI thinks about this.

@saraedum (Member Author) commented Mar 19, 2022

Btw., this is not software from the 80s but from the 70s ;) [though of course it did not have all these fancy features back then; doing things in parallel was probably not really a thing.]

@DunklesArchipel (Member) commented

> @DunklesArchipel, I believe that this is more flexible and easier than echemdb/svgdigitizer#138. I am not sure if it works well on Windows, so I'll see what the CI thinks about this.

Looks great!
Can this file/approach also be used in a purely Pythonic workflow, as in echemdb/svgdigitizer#138?

For example, if you want to convert a file and subsequently create a database with these fictitious functions:

```python
filename = "generated/svgdigitizer/mello_2018_understanding_J3045/mello_2018_understanding_J3045_p1_f1H_black.csv"
make(filename)  # calls the Makefile (fictitious)
create_db(filename)  # creates the db from converted files and bibfiles (fictitious)
```

@DunklesArchipel (Member) commented

Also some questions/suggestions:

  • Can you add an issue to include documentation for the Makefile?
  • Will the Makefile simply skip files that are not convertible?
  • Is there a check that SVGs are only converted when there is a YAML file with the same name?
  • Later on, we will also want to check whether the JSON matches a schema. Can this also be included?

@saraedum (Member Author) commented

> filename = `generated/svgdigitizer/mello_2018_understanding_J3045/mello_2018_understanding_J3045_p1_f1H_black.csv`
> make(filename) # calls the makefile
> create_db(filename) # creates the db from converted files and bibfiles.

Sure. We'd have to work out the details, but assuming that `make` is installed, this is going to work.
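A minimal sketch of what such a Python entry point could look like, assuming it shells out to `make` via `subprocess` and is run from the repository root (so the Makefile lives in `data/`). The names `make_command` and `make` are hypothetical; `-jN` and `SOURCE_DIR=...` are the same command-line options described in the PR:

```python
import subprocess


def make_command(target=None, jobs=1, source_dir=None):
    """Build the argument list for a make call (hypothetical helper)."""
    command = ["make", f"-j{jobs}"]
    if source_dir is not None:
        # a variable assignment on the command line overrides the Makefile default
        command.append(f"SOURCE_DIR={source_dir}")
    if target is not None:
        # target paths are relative to the data/ directory
        command.append(str(target))
    return command


def make(target=None, jobs=1, source_dir=None, cwd="data"):
    """Run make in cwd; raises subprocess.CalledProcessError if make fails."""
    subprocess.run(make_command(target, jobs, source_dir), cwd=cwd, check=True)
```

With this, the snippet from the question becomes `make(filename)` followed by the still-fictitious `create_db(filename)`. Because of `check=True`, a failed conversion raises instead of silently continuing to the database step.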

@saraedum (Member Author) commented

> can you add an issue that we include the documentation on the makefile

Yes, it should be part of the documentation. I'll add that here.

> Will the makefile simply skip files that are not convertible?

No. When something is not convertible, the whole process will fail. However, we could change it so that it ignores files that cannot be converted. (I don't recommend that though.)
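For reference, GNU make's `-k` (`--keep-going`) flag gives a similar effect on the command line: it keeps building unrelated targets after one fails but still exits with an error. The (not recommended) "ignore failures" alternative could also be sketched in Python, assuming one make call per target; `digitize_each` is a hypothetical helper, and `run` is injectable only so the logic can be exercised without make:

```python
import subprocess


def digitize_each(targets, run=subprocess.run, cwd="data"):
    """Build targets one by one and return those that failed, instead of stopping."""
    failed = []
    for target in targets:
        result = run(["make", str(target)], cwd=cwd)
        # a non-zero exit status means this data set could not be converted
        if result.returncode != 0:
            failed.append(target)
    return failed
```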

> Is there a check that, SVGs are only converted when there is a YAML with the same name

Yes. Both files need to be present. When a file is missing, make complains that a prerequisite of the .json/.csv file is missing.
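The Makefile enforces this through prerequisites. To list the offending files up front instead, a small Python sketch could look like this, assuming the metadata uses the `.yaml` suffix (not `.yml`) and sits next to its SVG; `svgs_without_yaml` is a hypothetical helper:

```python
from pathlib import Path


def svgs_without_yaml(source_dir="literature"):
    """List SVG files that have no YAML file with the same stem next to them."""
    return [
        svg
        for svg in sorted(Path(source_dir).rglob("*.svg"))
        # e.g. figure.svg needs figure.yaml in the same folder
        if not svg.with_suffix(".yaml").exists()
    ]
```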

> Later on we will also want to check if the JSON matches a schema. Can this also be included?

Yes. That's trivial to add.

2 participants