-- ==============================================================
-- Albert-Ludwigs-Universitaet Freiburg --
-- Database and Information Systems group --
-- Georges-Koehler-Allee 51, 79110 Freiburg, Germany --
-- email: [email protected] --
-- ==============================================================
This dataset contains information collected from the CiteULike website (http://www.citeulike.org), a website that helps researchers
keep track of relevant scientific papers. Users build personalized libraries by adding selected papers to them and can
annotate those papers with personalized tags.
This dataset records information about a set of users, their libraries, and a set of scientific publications (papers).
# users = 28416
# papers = 172079
# min library size = 10
# max library size = 2000
# min paper popularity = 3
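The figures above can be re-checked after loading users_libraries.txt (format described below). The sketch assumes the
libraries are held as a mapping from user hash to a list of paper IDs, that "paper popularity" means the number of user
libraries containing a paper, and that a paper listed twice in one library counts once. The function name dataset_stats
is hypothetical, and "papers" here counts only papers that appear in at least one library.

```python
from collections import Counter
from typing import Dict, List


def dataset_stats(libraries: Dict[str, List[str]]) -> Dict[str, int]:
    """Recompute the summary statistics for a non-empty user -> paper-ID mapping.

    Assumption: "paper popularity" = number of distinct user libraries
    that contain the paper (duplicates inside one library counted once).
    """
    # Library size = number of distinct papers in each user's library.
    sizes = [len(set(ids)) for ids in libraries.values()]
    # Count, for every paper, how many libraries include it.
    popularity = Counter(pid for ids in libraries.values() for pid in set(ids))
    return {
        "users": len(libraries),
        "papers": len(popularity),
        "min_library_size": min(sizes),
        "max_library_size": max(sizes),
        "min_paper_popularity": min(popularity.values()),
    }
```

On the full dataset the result could be compared against the numbers listed above; any mismatch would point to a parsing
problem or to papers in papers.csv that appear in no library.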
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
File structures:
----------------------------------------------------------------------------------------------------------------------------
- papers.csv
This file records information about each paper:
Format:
- Comma delimited (CSV file)
- No header
Fields:
1 - paper_id
2 - type
3 - journal
4 - book_title
5 - series
6 - publisher
7 - pages
8 - volume
9 - number
10 - year
11 - month
12 - postedat
13 - address
14 - title
15 - abstract
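Because papers.csv has no header row, a reader has to supply the column names itself. A minimal sketch using Python's
standard csv module, assuming the column order listed above, UTF-8 encoding, and standard CSV quoting for fields that
contain commas (such as titles and abstracts). The names PAPER_FIELDS, parse_papers and load_papers are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Column order as listed in this README (the file itself has no header).
PAPER_FIELDS: List[str] = [
    "paper_id", "type", "journal", "book_title", "series", "publisher",
    "pages", "volume", "number", "year", "month", "postedat",
    "address", "title", "abstract",
]


def parse_papers(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse CSV lines into a dict keyed by paper_id.

    Assumes quoted fields follow standard CSV rules, so commas inside
    a quoted title or abstract do not split the field.
    """
    papers: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(lines):
        if len(row) != len(PAPER_FIELDS):
            # Skip malformed rows rather than assign values to the wrong columns.
            continue
        record = dict(zip(PAPER_FIELDS, row))
        papers[record["paper_id"]] = record
    return papers


def load_papers(path: str, encoding: str = "utf-8") -> Dict[str, Dict[str, str]]:
    # newline="" lets csv handle line breaks embedded in quoted fields.
    with open(path, newline="", encoding=encoding) as f:
        return parse_papers(f)
```

All values are kept as strings; converting year, volume or pages to numbers is left to the caller since some fields may
be empty.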
----------------------------------------------------------------------------------------------------------------------------
- users_libraries.txt
This file records users' ratings (libraries): each line reports one user and the papers in that user's library.
A semicolon separates the user hash from the library, and commas separate the paper IDs within the library.
Format:
- No header
- user_hash_id; comma-separated list of paper_ids
Fields:
1 - user_hash_id
2 - user library: comma-separated list of paper_ids
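Each line can be split first on the semicolon and then on commas. A minimal sketch, assuming one user per line and that
whitespace around the separators may be ignored; the function names parse_libraries and load_libraries are hypothetical.
Paper IDs are kept as strings so they match the paper_id values read from papers.csv.

```python
from typing import Dict, Iterable, List


def parse_libraries(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each user_hash_id to the list of paper_ids in that user's library."""
    libraries: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user_hash, sep, library = line.partition(";")
        if not sep:
            continue  # no semicolon: not a valid record
        # Split the library on commas and drop empty entries.
        ids = [pid.strip() for pid in library.split(",") if pid.strip()]
        libraries[user_hash.strip()] = ids
    return libraries


def load_libraries(path: str, encoding: str = "utf-8") -> Dict[str, List[str]]:
    with open(path, encoding=encoding) as f:
        return parse_libraries(f)
```

The resulting mapping is the implicit-feedback user/item data; each (user, paper) pair can be treated as one positive rating.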
----------------------------------------------------------------------------------------------------------------------------
- stopwords_en.txt
This file contains the list of stop words in English
Format:
- No header
Fields:
- Single column containing one stop word per line
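The stop word list is typically applied when turning paper titles or abstracts into word features. A minimal sketch,
assuming case-insensitive matching and a deliberately simple tokenizer that keeps only ASCII letters and digits; both
choices, and the names parse_stopwords and tokenize, are assumptions rather than part of the dataset.

```python
import re
from typing import Iterable, List, Set


def parse_stopwords(lines: Iterable[str]) -> Set[str]:
    """Read one stop word per line, lowercased, skipping blank lines."""
    return {w.strip().lower() for w in lines if w.strip()}


def tokenize(text: str, stopwords: Set[str]) -> List[str]:
    """Lowercase, split into ASCII alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def load_stopwords(path: str, encoding: str = "utf-8") -> Set[str]:
    with open(path, encoding=encoding) as f:
        return parse_stopwords(f)
```

For abstracts with accented or non-Latin characters a Unicode-aware pattern such as r"\w+" may be more suitable.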