Error #1

Open
Nithyashreem opened this issue Dec 25, 2018 · 6 comments

Comments

@Nithyashreem

/papers directory does not exist

@olekscode
Member

Sorry, this repository is no longer maintained.
But if you tell us what you are trying to do and where you get this error, we might be able to help you.

@Nithyashreem
Author

Hey, thanks for replying. In line number 8, can you tell me what the papers/ directory is?
screenshot 105

@olekscode
Member

We downloaded thousands of papers from https://arxiv.org/ as PDF files named {paper-id}.pdf, for example 1409.1259.pdf, and stored them in the papers/ directory.

The code you are looking at gets all file names from the papers/ directory and removes the last 4 characters (.pdf) from each name, which gives us the list of paper ids. It then uses BeautifulSoup to parse the HTML of the arXiv page for each paper and extract its abstract.

What you need for this code to work is an array of paper ids. We extracted it from the papers/ directory because we had a pre-defined set of papers that we wanted to work on. But you can use your own list of paper ids.
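
For illustration, the steps described above could look roughly like the sketch below. This is not the repository's actual code: the function names (`paper_ids_from_dir`, `extract_abstract`, `fetch_abstract`) are made up for this example, and it assumes that the abstract on an arXiv page sits inside a `<blockquote>` element with the class `abstract`, which may change if arXiv alters its page layout.

```python
from pathlib import Path
from urllib.request import urlopen


def paper_ids_from_dir(directory):
    """Return the paper id for every {paper-id}.pdf file in `directory`."""
    # Path.stem drops only the final suffix: "1409.1259.pdf" -> "1409.1259"
    return sorted(p.stem for p in Path(directory).glob("*.pdf"))


def extract_abstract(html):
    """Extract the abstract text from the HTML of an arXiv abstract page.

    Assumption: the abstract is inside <blockquote class="abstract ...">.
    Returns None when no such element is found.
    """
    # Third-party dependency (pip install beautifulsoup4), imported here so
    # that the id-listing helper works without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("blockquote", class_="abstract")
    if block is None:
        return None
    # Collapse runs of whitespace and newlines into single spaces.
    text = " ".join(block.get_text().split())
    # Drop the leading "Abstract:" label if the page includes one.
    label = "Abstract:"
    if text.startswith(label):
        text = text[len(label):]
    return text.strip()


def fetch_abstract(paper_id):
    """Download https://arxiv.org/abs/{paper_id} and return its abstract."""
    with urlopen(f"https://arxiv.org/abs/{paper_id}") as response:
        html = response.read().decode("utf-8")
    return extract_abstract(html)


if __name__ == "__main__":
    # Either read the ids from a local papers/ directory...
    # paper_ids = paper_ids_from_dir("papers")
    # ...or supply your own list of ids, as suggested above.
    paper_ids = ["1409.1259"]
    for paper_id in paper_ids:
        print(paper_id, fetch_abstract(paper_id))
```

If you do not have a papers/ directory, skip `paper_ids_from_dir` and pass your own list of ids, as in the `__main__` section.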

@Nithyashreem
Author

Got it. Thank you

@Nithyashreem
Author

screenshot 106
Can you tell me what is stored in data.csv (line 4)?
Also, are the papers in the papers/ directory similar to the ones in abstracts-2k.csv and papers-2k.csv?

@1848798517

Excuse me, may I ask where your meta file is?
