parsing GEDCOM file is not very stable #4

Open
hartenthaler opened this issue Dec 13, 2020 · 1 comment

Comments

@hartenthaler

After installing Python and the missing modules (networkx, gedcompy, six), I was able to use gedcom2gexf.py for one of my GEDCOM files. None of my other files could be processed, though: reading them fails with parsing errors because they contain "wrong" UTF-8 characters. Is it possible to make the parsing a bit more robust? Does this error originate in the gedcompy module?
Traceback (most recent call last):
  File "C:\Users\herma\AppData\Local\Programs\Python\Python39\gedcom2gexf.py", line 48, in <module>
    gedcom2gephi(gedcomFilename=args.gedcom, gephiFilename=args.outputGexf)
  File "C:\Users\herma\AppData\Local\Programs\Python\Python39\gedcom2gexf.py", line 24, in gedcom2gephi
    g = gedcom.parse(gedcomFilename)
  File "C:\Users\herma\AppData\Local\Programs\Python\Python39\lib\site-packages\gedcom\__init__.py", line 739, in parse
    return parse_filename(obj)
  File "C:\Users\herma\AppData\Local\Programs\Python\Python39\lib\site-packages\gedcom\__init__.py", line 704, in parse_filename
    return __parse(fp.readlines())
  File "C:\Users\herma\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1222: character maps to <undefined>

@Gjacquenot
Owner

Hello @hartenthaler

Thanks for your interest in these tools.

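Parsing is done by the gedcompy module, which does not specify a file encoding when it opens the file, as one can see here. Python therefore falls back to the platform default encoding, which is cp1252 on your machine according to the traceback.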
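
As a minimal illustration of this failure mode (a sketch, not code taken from gedcompy): the byte 0x8d reported in the traceback has no mapping in cp1252, but it is a valid continuation byte in UTF-8. The sample character and the helper name `try_decode` are made up for the example.

```python
# Hypothetical sample: "Í" (U+00CD) encoded as UTF-8 is the two bytes C3 8D.
raw = "Í".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# cp1252 leaves 0x8d undefined, so decoding fails just like in the traceback.
as_cp1252 = try_decode(raw, "cp1252")

# The same bytes decode cleanly when UTF-8 is requested explicitly.
as_utf8 = try_decode(raw, "utf-8")
```

This suggests that the failing files may already be UTF-8 and are only being read with the wrong codec, although that cannot be confirmed from the traceback alone.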

To work around this error quickly, I would recommend changing the file encoding with Notepad++, since you are working on Windows. Converting your input file to UTF-8 should do the trick.

I hope to have some time in the next week to provide a more appropriate solution for the encoding.
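
One possible shape for such a solution, given only as a hedged sketch: read the raw bytes, try a few candidate encodings in order, and fall back to replacement characters instead of failing. The function names `read_gedcom_text` and `convert_to_utf8` and the candidate list are assumptions for illustration, they are not part of gedcompy or gedcom2gexf.py. GEDCOM files can also use other encodings that this sketch does not cover.

```python
def read_gedcom_text(path, encodings=("utf-8-sig", "cp1252")):
    """Decode a file by trying each candidate encoding in order.

    Falls back to UTF-8 with replacement characters, so a single bad byte
    never aborts the read. The candidate list is an assumption.
    """
    with open(path, "rb") as fp:
        data = fp.read()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD instead of raising.
    return data.decode("utf-8", errors="replace")


def convert_to_utf8(src, dst):
    """Rewrite `src` as UTF-8 at `dst`, similar to the Notepad++ conversion."""
    text = read_gedcom_text(src)
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as fp:
        fp.write(text)
```

Calling `convert_to_utf8("input.ged", "input_utf8.ged")` (hypothetical file names) would produce a UTF-8 copy without opening an editor. Trying UTF-8 first matters because almost any byte sequence decodes under cp1252, so the stricter codec has to be checked before the more permissive one.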
