Idea: Given a list of "seed users", get all users within N links #1

Open
DonaldTsang opened this issue Sep 9, 2020 · 4 comments

@DonaldTsang

If I had a list of people from a single community, would it be possible to find everyone within N links of those users and save the result to a file (the follow list of everyone involved)? That would be useful for community detection.

@narendraj9
Owner

I think a breadth-first search from each user, while maintaining the set of members in the overall community, should work.

@DonaldTsang
Author

DonaldTsang commented Sep 10, 2020

How do you stop it at a certain depth (e.g. A-B-C-D, or three degrees)?

Also, how would de-duplication and distance updating work (e.g. W is 1 away from X, 2 away from Y and 3 away from Z, and X, Y and Z are all part of the seed list)?

Lastly, how do you export that into JSON or some other data format (dictionary of user-id as key, to list of user-id as value)?

Regarding visualization, see https://github.com/jvallyea/Mapping-Social-Media, https://github.com/timbennett/twitter-chat-networks, https://github.com/mgmacias95/TwitterFriends and https://github.com/SadeghHayeri/Twitter-Friend-Connections
Regarding storing data, someone made a thing for CSV (but only a single layer): https://github.com/ian-nai/Twitter-Friends-Scraper
and maybe multiple layers: https://github.com/DocNow/foaf

@narendraj9
Owner

narendraj9 commented Sep 10, 2020

> How do you stop it at a certain depth (e.g. A-B-C-D, or three degrees)?

Breadth-first search can be used to compute the shortest distance from a starting node to every other node in an unweighted graph. So, as you traverse the graph, you can track each node's distance from the starting node and stop following outgoing edges once you reach a node that is N links away from it.
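
A minimal sketch of that idea in Python, not code from this repository. It assumes a hypothetical `get_follows(user)` callable that returns the accounts a user follows; here it is backed by a toy dictionary instead of the Twitter API.

```python
from collections import deque

def bfs_within(get_follows, seed, max_depth):
    """Return {user: distance} for every user within max_depth links of seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            # At the depth limit: keep the user, but do not follow their edges.
            continue
        for neighbour in get_follows(user):
            if neighbour not in distance:  # first visit is the shortest path
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    return distance

# Toy follow graph (hypothetical data): A follows B, B follows C, and so on.
FOLLOWS = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}

def get_follows(user):
    return FOLLOWS.get(user, [])

result = bfs_within(get_follows, "A", 3)
print(result)
```

With a limit of three links, the chain A-B-C-D is collected and E is left out, because D is recorded but never expanded.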

> Also, how would de-duplication and distance updating work (e.g. W is 1 away from X, 2 away from Y and 3 away from Z, and X, Y and Z are all part of the seed list)?

If I understand you correctly about de-duping the members, I think a set data structure will take care of that. I assumed that the final output of the algorithm would be this set, which contains the members of the community.
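
One possible way to handle both de-duplication and distances at once is to start a single breadth-first search from all seeds together, so each user is stored once with the distance to the nearest seed. This is only a sketch under that assumption; `get_follows` and the toy graph are hypothetical stand-ins for real follow data.

```python
from collections import deque

def community_within(get_follows, seeds, max_depth):
    """Return {user: distance to the nearest seed} within max_depth links."""
    # Every seed starts at distance 0; the dict also removes duplicate seeds.
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            continue  # do not expand users that sit at the depth limit
        for neighbour in get_follows(user):
            if neighbour not in distance:  # already-seen users are skipped
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    return distance

# Toy graph (hypothetical): W is 1 link from X, 2 from Y and 3 from Z.
FOLLOWS = {"X": ["W"], "Y": ["P"], "P": ["W"], "Z": ["Q"], "Q": ["R"], "R": ["W"]}

def get_follows(user):
    return FOLLOWS.get(user, [])

distances = community_within(get_follows, ["X", "Y", "Z"], 2)
members = set(distances)  # the de-duplicated community
print(distances["W"], sorted(members))
```

Because the seeds are all queued at distance 0, W is reached first through X and keeps distance 1; the later paths from Y and Z find it already recorded and never overwrite it.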

> Lastly, how do you export that into JSON or some other data format (dictionary of user-id as key, to list of user-id as value)?

Not sure what exactly you mean here.

@DonaldTsang
Author

> Not sure what exactly you mean here.

I am using this to scrape a Twitter follow network for Community Detection and Role Similarity/Discovery/Detection. So for every user I need their follow links saved to some kind of file, for ease of storage and use in NetworkX or iGraph.
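
To make the requested format concrete, here is a rough sketch of saving follow links as a JSON dictionary that maps each user ID to a list of followed user IDs. The helper names and the toy data are hypothetical, and `get_follows` again stands in for whatever actually fetches the follow list.

```python
import json
import os
import tempfile

def collect_follow_lists(get_follows, users):
    """Build {user_id: [followed_user_id, ...]} with string IDs for JSON."""
    return {str(user): [str(f) for f in get_follows(user)] for user in users}

def save_follow_lists(adjacency, path):
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(adjacency, fh, indent=2, sort_keys=True)

def load_follow_lists(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Toy data (hypothetical): X follows W, W follows nobody.
FOLLOWS = {"X": ["W"], "W": []}

def get_follows(user):
    return FOLLOWS.get(user, [])

adjacency = collect_follow_lists(get_follows, ["X", "W"])

# Round-trip through a temporary file to show the saved form is reloadable.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "follows.json")
    save_follow_lists(adjacency, path)
    restored = load_follow_lists(path)

print(restored)
```

A dictionary of lists in this shape should be accepted directly by the NetworkX graph constructors, for example `networkx.DiGraph(restored)`, which treats each key as a node and each list entry as an outgoing edge.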
