Add Date Filtering and Character Merging to Twitter Scraper #6
base: main
- Add date filtering for tweet collection
- Add character merging capability
- Add merge-characters npm script
- Update README with new features

- Add GenerateMergedCharacter.js for merged Virtuals character cards
- Update TwitterPipeline.js with merge functionality
- Add generate-merged-virtuals script to package.json
- Update README with merged character generation docs
Summary
This PR adds two key enhancements to the Twitter scraper pipeline:
Date Range Filtering: Users can specify --start-date and --end-date to collect only tweets within a given time window.
Character Merging: Allows merging tweets from multiple accounts into a single “character,” with options to filter retweets, sort by engagement, and more.
These changes integrate the new functionality into the existing pipeline and preserve backward compatibility.
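As a rough illustration of the date-window behavior, here is a minimal sketch. It is not the code from TwitterPipeline.js: the helper names `parseDateBound` and `filterTweetsByDateRange`, the `timestamp` field (milliseconds since the epoch), and the choice to treat both bounds as inclusive UTC calendar days are all assumptions made for the example.

```javascript
// Hypothetical helpers, not taken from TwitterPipeline.js.
// Assumption: each tweet has a numeric `timestamp` in milliseconds since the epoch.

// Convert a YYYY-MM-DD string into a UTC millisecond bound.
// A missing value means "no bound on this side".
function parseDateBound(value, endOfDay = false) {
  if (value === undefined || value === null) return null;
  const suffix = endOfDay ? "T23:59:59.999Z" : "T00:00:00.000Z";
  const ms = Date.parse(`${value}${suffix}`);
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid date: ${value}`);
  }
  return ms;
}

// Keep only tweets whose timestamp falls inside the (inclusive) window.
function filterTweetsByDateRange(tweets, { startDate, endDate } = {}) {
  const start = parseDateBound(startDate, false);
  const end = parseDateBound(endDate, true);
  return tweets.filter((tweet) => {
    if (start !== null && tweet.timestamp < start) return false;
    if (end !== null && tweet.timestamp > end) return false;
    return true;
  });
}
```

With neither bound supplied the function returns every tweet, which matches the intent that existing usage keeps collecting the full history.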
Changes
src/twitter/TwitterPipeline.js
src/twitter/merge_characters.js
General
Testing
Local Tests
Ran npm run twitter -- <username> --start-date 2024-10-01 --end-date 2025-01-30 to confirm date filtering.
Merging Characters
Used npm run merge_characters (or equivalent script) to merge multiple accounts.
Checked that merge_stats.json and merged_tweets.json were correctly generated.
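To make the merge step more concrete, the sketch below shows one way tweets from several accounts could be combined into a single list with accompanying statistics. It is only an illustration: `mergeCharacters`, the tweet fields (`id`, `isRetweet`, `likes`, `retweetCount`), and the shape of the stats object are assumptions, and the actual contents of merged_tweets.json and merge_stats.json may differ.

```javascript
// Hypothetical sketch, not the code from merge_characters.js.
// Assumptions: tweets have `id`, optional `isRetweet`, `likes`, `retweetCount`.
function mergeCharacters(tweetsByAccount, { excludeRetweets = true } = {}) {
  const seen = new Set(); // tweet ids already merged, to drop duplicates
  const merged = [];
  const perAccount = {};

  for (const [username, tweets] of Object.entries(tweetsByAccount)) {
    let kept = 0;
    for (const tweet of tweets) {
      if (excludeRetweets && tweet.isRetweet) continue;
      if (seen.has(tweet.id)) continue;
      seen.add(tweet.id);
      merged.push({ ...tweet, username }); // remember the source account
      kept += 1;
    }
    perAccount[username] = { total: tweets.length, kept };
  }

  // Sort by a simple engagement score, highest first.
  const engagement = (t) => (t.likes ?? 0) + (t.retweetCount ?? 0);
  merged.sort((a, b) => engagement(b) - engagement(a));

  return { tweets: merged, stats: { totalTweets: merged.length, perAccount } };
}
```

In this sketch, `tweets` would correspond to something like merged_tweets.json and `stats` to merge_stats.json, so checking the generated files amounts to confirming that the counts in the stats agree with the merged list.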
No Breaking Changes
Existing usage (without date arguments) still collects full tweet history.
If the fallback isn’t triggered, the primary flow remains the same.
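The backward-compatibility claim can be pictured with a small argument-parsing sketch. The real script's parsing is not shown in this description, so `parseTwitterArgs` and the returned option names are hypothetical; the point is only that omitting the date flags leaves both bounds unset.

```javascript
// Hypothetical argument parser; not the parsing used by the actual script.
// Takes the arguments that follow `npm run twitter --`.
function parseTwitterArgs(argv) {
  const options = { username: null, startDate: null, endDate: null };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg === "--start-date" || arg === "--end-date") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`${arg} requires a value (YYYY-MM-DD)`);
      }
      if (arg === "--start-date") options.startDate = value;
      else options.endDate = value;
      i += 1; // skip the value that was just consumed
    } else if (!arg.startsWith("--") && options.username === null) {
      // The first bare argument is treated as the username.
      options.username = arg;
    }
  }
  return options;
}
```

Called with only a username, the parser returns null for both dates, which a caller can interpret as "collect the full history", the same as before the date flags existed.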
Additional Context