harmonise function incorrectly adds LaTeX escaping to BibTeX fields #74

cpmpercussion · 2024-09-09T10:40:58Z

The harmonise function wraps every capital letter in titles in {}, which I have not found in 2023.bib or earlier files. It also replaces all non-ASCII characters with LaTeX code (also not found in previous .bib files). It also breaks URLs: for example, {http://nime.org/proceedings/2024/nime2024_11.pdf} becomes {http://nime.org/proceedings/2024/nime2024\_11.pdf}, which is fatal for the Zenodo upload tool.

This is incorrect behaviour:

- Special characters in the .bib file should be written as UTF-8 (not as LaTeX symbol representations).
- URLs need to contain the actual URL, not an escaped LaTeX representation.
- Titles should have their normal text representation, not an escaped LaTeX representation.

This is because the .bib file is in bibtex format but used to create other text representations of the papers (e.g., NIME individual paper webpages and Zenodo entries). So we need the text in the bibtex fields to be a "plain" UTF-8 representation of the text that could go into an HTML document or an API call, not something tuned to show up correctly in a LaTeX document.

The todo here is:

Test and update the harmonise function so that it doesn't do the above bad things.
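For the testing part, a minimal regression check could encode the three rules above as patterns. This is a sketch, not the project's test suite. `find_latex_artifacts` and `URL_FIELDS` are hypothetical names. Each entry is assumed to be a dict mapping field names to strings, which is the shape bibtexparser 1.x produces. The regexes only catch the artifacts reported in this issue plus common accent commands, not every possible LaTeX escape.

```python
import re

# Hypothetical regression check for the three rules in this issue.
# Assumes each entry is a dict of field name -> string, as bibtexparser 1.x yields.

URL_FIELDS = {"url"}  # assumption: fields that must hold a literal URL

# A LaTeX-escaped character inside a URL, e.g. "\_" instead of "_".
URL_ESCAPE = re.compile(r"\\[_&%#$]")
# A single capital letter wrapped in braces, e.g. "{N}ew".
BRACED_CAPITAL = re.compile(r"\{[A-Z]\}")
# Common LaTeX accent commands, e.g. {\'e}, \"{o} or \v{c}, instead of UTF-8.
LATEX_ACCENT = re.compile(r"\\(?:[`'^\"~=.]|[uvHckbdrt]\{)")


def find_latex_artifacts(entry):
    """Return (field, problem) pairs for fields that look LaTeX-escaped."""
    problems = []
    for field, value in entry.items():
        if field in URL_FIELDS and URL_ESCAPE.search(value):
            problems.append((field, "escaped URL"))
        if field == "title" and BRACED_CAPITAL.search(value):
            problems.append((field, "braced capitals"))
        if LATEX_ACCENT.search(value):
            problems.append((field, "LaTeX accent instead of UTF-8"))
    return problems


good = {
    "title": "Café Interfaces",
    "url": "http://nime.org/proceedings/2024/nime2024_11.pdf",
}
bad = {
    "title": r"{C}af{\'e} {I}nterfaces",
    "url": r"http://nime.org/proceedings/2024/nime2024\_11.pdf",
}
print(find_latex_artifacts(good))  # []
print(find_latex_artifacts(bad))
# [('title', 'braced capitals'), ('title', 'LaTeX accent instead of UTF-8'), ('url', 'escaped URL')]
```

Running a check like this over every entry after harmonising would make any regression visible before the .bib file reaches the webpage or Zenodo tooling.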

Ultimately we may want to move away from .bib files as a storage system, but they have an advantage of ubiquity within academic publishing and if the processes here break down at some point, the .bib files could easily be used in a different ad hoc system by other future maintainers.


stefanofasciani · 2024-09-09T20:41:52Z

It seems that the harmonise function is doing what we are asking it to do.

The current version of the harmonise function uses BibTexParser (at line 36) with customization=homogenize_latex_encoding. So the behavior with respect to character encoding is correct, while what happens to the title and URL is weird. Apparently BibTexParser only has two built-in encoding customizations: homogenize_latex_encoding and convert_to_unicode. If we use the latter, the strange behaviors disappear, and there are no apparent changes in the .bib file, since the text is already Unicode.

So we either need to develop a 'custom' customization (is that possible?), or see whether migrating from BibTexParser 1.4 to 2.0 is a viable way to get UTF-8 output.
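On whether a 'custom' customization is possible: in bibtexparser 1.x, the `customization` argument of `BibTexParser` appears to accept any callable that takes one entry dict and returns it, so writing our own looks feasible. Below is a minimal sketch under that assumption. `undo_latex_artifacts` and `URL_FIELDS` are hypothetical names. The function only reverts the two non-encoding artifacts from this issue (`\_` in URLs and `{X}`-wrapped capitals in titles) in entries that already contain them.

```python
import re

# Hypothetical bibtexparser 1.x customization: takes one entry (a dict of
# field name -> string) and returns it with this issue's artifacts removed.
# Assumed wiring: BibTexParser(customization=undo_latex_artifacts)

URL_FIELDS = {"url"}  # assumption: fields that must hold a literal URL
ESCAPED_CHAR = re.compile(r"\\([_&%#$])")    # "\_" -> "_"
BRACED_CAPITAL = re.compile(r"\{([A-Z])\}")  # "{N}" -> "N"


def undo_latex_artifacts(record):
    # Unescape LaTeX-escaped characters in URL fields only.
    for field in URL_FIELDS:
        if field in record:
            record[field] = ESCAPED_CHAR.sub(r"\1", record[field])
    # Remove braces around single capital letters in the title.
    if "title" in record:
        record["title"] = BRACED_CAPITAL.sub(r"\1", record["title"])
    return record


entry = {
    "title": "{N}ew {I}nterfaces for Café Music",
    "url": r"http://nime.org/proceedings/2024/nime2024\_11.pdf",
}
cleaned = undo_latex_artifacts(entry)
print(cleaned["title"])  # New Interfaces for Café Music
print(cleaned["url"])    # http://nime.org/proceedings/2024/nime2024_11.pdf
```

This does not convert LaTeX accent commands back to UTF-8, which is what convert_to_unicode covers. It would also strip braces that someone added on purpose around a single capital letter.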

cpmpercussion added the bug label Sep 9, 2024
