Aishwarya22-06/Distributed-System-Using-Spark
-- ==============================================================
-- Albert-Ludwigs-Universitaet Freiburg                        --
-- Database and Information Systems group                      --
-- Georges-Koehler-Allee 51, 79110 Freiburg, Germany           --
-- email: [email protected]                                     --
-- ==============================================================

This dataset contains information collected from the CiteULike website (http://www.citeulike.org), a website that helps researchers keep track of relevant scientific papers. Users build personalized libraries by adding selected papers and annotating them with personalized tags.

This dataset records information about a set of users, their libraries, and a set of scientific publications (papers).

# users = 28416
# papers = 172079
# min library size = 10
# max library size = 2000
# min paper popularity = 3

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
File structures:
----------------------------------------------------------------------------------------------------------------------------
- papers.csv
This file records paper information.
Format:
- Comma delimited (CSV file)
- No header
Fields:
1 - paper_id
2 - type
3 - journal
4 - book_title
5 - series
6 - publisher
7 - pages
8 - volume
9 - number
10 - year
11 - month
12 - postedat
13 - address
14 - title
15 - abstract
----------------------------------------------------------------------------------------------------------------------------
- users_libraries.txt
This file records user ratings (libraries). Each line lists one user and that user's paper library: a semicolon separates the user hash from the library, and commas separate the paper IDs within the library.
Format:
- No header
- user_hash_id; comma-separated list of paper_ids
Fields:
1 - user_hash_id
2 - user library: comma-separated list of paper_ids
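
The two file layouts described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset distribution: the helper names `parse_paper_row` and `parse_library_line` are hypothetical, the sample values are made up, and it assumes that papers.csv uses standard CSV quoting for fields that contain commas. IDs are kept as strings because the README does not state their type.

```python
import csv
import io

# Column order as listed for papers.csv (the file has no header row).
PAPER_FIELDS = [
    "paper_id", "type", "journal", "book_title", "series", "publisher",
    "pages", "volume", "number", "year", "month", "postedat",
    "address", "title", "abstract",
]


def parse_paper_row(row):
    """Map one already-split CSV row (15 values) to a dict keyed by field name."""
    if len(row) != len(PAPER_FIELDS):
        raise ValueError(f"expected {len(PAPER_FIELDS)} fields, got {len(row)}")
    return dict(zip(PAPER_FIELDS, row))


def parse_library_line(line):
    """Split 'user_hash_id;id1,id2,...' into (user_hash_id, [paper_id, ...])."""
    user_hash, _, library = line.strip().partition(";")
    paper_ids = [p.strip() for p in library.split(",") if p.strip()]
    return user_hash.strip(), paper_ids


# Made-up sample values, for illustration only (not taken from the dataset).
sample_csv = (
    '1,article,Some Journal,,,,1-10,5,2,2005,jan,2005-01-01,,'
    '"A title, with comma",An abstract\n'
)
paper = parse_paper_row(next(csv.reader(io.StringIO(sample_csv))))
user, library = parse_library_line("abc123;10,20,30")
```

Using the `csv` module rather than a plain `split(",")` matters for the title and abstract fields, which are likely to contain commas of their own.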
----------------------------------------------------------------------------------------------------------------------------
- stopwords_en.txt
This file contains the list of English stop words.
Format:
- No header
Fields:
- Single column containing the stop word
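
A single-column stop word file like the one described above is typically loaded into a set for fast membership checks. The sketch below is only an illustration under stated assumptions: the function names `load_stopwords` and `remove_stopwords` are hypothetical, matching is assumed to be case-insensitive, and the README does not say how the stop words are meant to be applied (filtering tokens from paper titles or abstracts is one plausible use).

```python
def load_stopwords(lines):
    """Build a set of stop words from an iterable of lines, one word per line."""
    # Blank lines are skipped; lowercasing is an assumption, not stated in the README.
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not stop words (case-insensitive comparison)."""
    return [t for t in tokens if t.lower() not in stopwords]


# Made-up sample lines standing in for the contents of stopwords_en.txt.
stopwords = load_stopwords(["the\n", "of\n", "\n", "and\n"])
filtered = remove_stopwords(["The", "theory", "of", "graphs"], stopwords)
```

With the real file, an open file handle can be passed directly, for example `load_stopwords(open("stopwords_en.txt", encoding="utf-8"))`; the UTF-8 encoding is an assumption.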