tweetset_loader tweet count is misleading #169

Open
kerchner opened this issue Apr 20, 2022 · 0 comments
kerchner commented Apr 20, 2022

tweetset_loader looks at every file in the folder, counts the lines in each one, and prints a message to the console such as:

INFO:__main__:Counting tweets in 34 files.
INFO:__main__:191,631 total tweets

Following our documentation for loading into TweetSets creates other files in the same folder that should not be counted, such as files containing the concatenated contents of all of the tweet ID files. As a result, tweetset_loader counts lines in more files than it should, which produces a wildly inaccurate tweet count.

Relevant code is here:
https://github.com/gwu-libraries/TweetSets/blob/master/tweetset_loader.py#L319-L322
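
To illustrate how the overcount arises, here is a minimal sketch. It is not the actual tweetset_loader code: the file names (`tweets-001.txt`, `all_ids.txt`) and the helpers `count_lines` and `demo` are hypothetical, chosen only to show how counting lines in every file in a folder differs from counting lines in the tweet files alone.

```python
import tempfile
from pathlib import Path


def count_lines(paths):
    """Count lines across the given files (one record per line assumed)."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += sum(1 for _ in f)
    return total


def demo():
    """Build a throwaway folder and compare the two counting strategies."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Two hypothetical tweet files holding 3 and 2 records.
        (folder / "tweets-001.txt").write_text("1\n2\n3\n")
        (folder / "tweets-002.txt").write_text("4\n5\n")
        # Hypothetical by-product of the loading steps: all IDs concatenated.
        (folder / "all_ids.txt").write_text("1\n2\n3\n4\n5\n")

        # Counting every file in the folder includes the by-product.
        naive = count_lines(p for p in folder.iterdir() if p.is_file())
        # Counting only files that match the (assumed) tweet file pattern.
        filtered = count_lines(folder.glob("tweets-*.txt"))
        return naive, filtered


if __name__ == "__main__":
    naive, filtered = demo()
    print(f"all files: {naive:,} lines; tweet files only: {filtered:,} lines")
```

In this toy folder the naive count is double the real one, because the concatenated file repeats every ID.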

Since this is a back-end function, I would suggest simply making the message less specific rather than spending effort to make the count more precise. This would at least avoid giving the person running the load the impression that something has gone wrong.
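
One possible shape for a less specific message, as a sketch only. The function name `format_count_message` and the wording are assumptions for illustration, not existing tweetset_loader code; the point is to report lines and files rather than claim an exact tweet total.

```python
import logging

log = logging.getLogger(__name__)


def format_count_message(file_count, line_count):
    """Describe what was actually counted (lines), not an exact tweet total."""
    return (
        "Counted {:,} lines in {:,} files "
        "(the folder may include non-tweet files).".format(line_count, file_count)
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Values taken from the console output quoted in this issue.
    log.info(format_count_message(34, 191631))
```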
