Error when loading MINT-1T-PDF-2023-06 #13
Comments
Hi Han-Cheol! Thanks for your interest. I am not sure why the default Hugging Face way of loading the dataset returns this error. It seems that some samples may be missing some metadata. I managed to load the data using the following code; I hope this is helpful! Feel free to reopen if the issue persists.
@anas-awadalla Thank you for the fast reply :-) It is quite strange.
Ah ok I was using
@anas-awadalla Hi, I can finally load the data by following your code. p.s.
Hi,
First of all, thank you for releasing the MINT-1T dataset :-)
I loaded one of the MINT-1T datasets (MINT-1T-PDF-2023-06) but encountered the following error.
The error message shows that the output data has two additional fields: language_id_whole_page_fasttext and previous_word_count.
Do you have any idea how to fix it?
Best regards,
Han-Cheol
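For readers hitting the same error, here is a minimal pure-Python sketch of the suspected cause: some records carry the two extra fields and others do not, so a loader expecting one fixed schema fails. This is not the maintainer's loading code and does not call the Hugging Face `datasets` library. The helper `normalize_records` and the sample field names `texts` and `images` are hypothetical, chosen only for illustration; the two metadata field names come from the error described above.

```python
from typing import Any, Dict, Iterable, List

# Field names reported in the error message above.
EXTRA_FIELDS = ("language_id_whole_page_fasttext", "previous_word_count")


def normalize_records(
    records: Iterable[Dict[str, Any]], fill_value: Any = None
) -> List[Dict[str, Any]]:
    """Give every record the same keys (the union of all keys seen),
    filling fields a record lacks with fill_value.

    Hypothetical helper for illustration only.
    """
    records = list(records)
    all_keys: List[str] = []
    for record in records:
        for key in record:
            if key not in all_keys:
                all_keys.append(key)  # keep first-seen key order
    return [{key: record.get(key, fill_value) for key in all_keys} for record in records]


# Made-up samples: the second one carries the two extra metadata fields.
samples = [
    {"texts": ["page one"], "images": [None]},
    {
        "texts": ["page two"],
        "images": [None],
        "language_id_whole_page_fasttext": {"en": 0.9},
        "previous_word_count": 120,
    },
]

normalized = normalize_records(samples)
for record in normalized:
    # After normalization every record exposes both extra fields.
    print([field in record for field in EXTRA_FIELDS])
```

Filling missing fields with `None` is one possible design choice; dropping the extra fields from every record would be an equally reasonable alternative if the metadata is not needed.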