- Projects for each language
- Frequency distribution of n-grams for each language, with n = 1 to 10
- Code to calculate cross-entropy. If you prefer not to compute the cross-entropy values yourself, follow this link to get the values we calculated.
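The sketch below illustrates the two artifacts above; it is not the authors' code. It counts contiguous n-grams with `collections.Counter` and computes the per-token cross-entropy of a test token sequence under an n-gram model estimated from a training sequence, H = -(1/N) Σ log2 p(t_i | t_{i-n+1} … t_{i-1}). The function names `ngram_frequencies` and `cross_entropy` are hypothetical, and add-one (Laplace) smoothing is an assumption made to keep the example short. The released code may tokenize, smooth, and split the data differently.

```python
import math
from collections import Counter


def ngram_frequencies(tokens, n):
    """Frequency distribution of all contiguous n-grams (as tuples) in tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cross_entropy(train_tokens, test_tokens, n):
    """Average bits per token needed to encode test_tokens with an n-gram model
    estimated from train_tokens. Add-one (Laplace) smoothing is an assumption
    of this sketch, not necessarily what the released code does."""
    if len(test_tokens) < n:
        raise ValueError("test sequence is shorter than n")
    ngram_counts = ngram_frequencies(train_tokens, n)
    # How often each (n-1)-token context is followed by another token;
    # for n == 1 the empty context is followed by every training token.
    context_counts = ngram_frequencies(train_tokens[:-1], n - 1) if n > 1 else Counter()
    vocab_size = len(set(train_tokens) | set(test_tokens))
    total_bits = 0.0
    positions = range(n - 1, len(test_tokens))
    for i in positions:
        gram = tuple(test_tokens[i - n + 1:i + 1])
        context_total = context_counts[gram[:-1]] if n > 1 else len(train_tokens)
        prob = (ngram_counts[gram] + 1) / (context_total + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(positions)


if __name__ == "__main__":
    tokens = "for ( int i = 0 ; i < n ; i ++ )".split()
    print(ngram_frequencies(tokens, 1)[("i",)])  # 3
    train = ["a", "b"] * 10
    test = ["a", "b"] * 5
    print(cross_entropy(train, test, 2))
```

Lower cross-entropy means the test tokens are more predictable from the training tokens; a highly repetitive sequence such as the alternating example yields a value well below one bit per token.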
RQ2, Artificial Repetition: how repetitive and predictable is code once we remove language-specific tokens?
- Frequency distribution for each language with all language-specific tokens (separators, operators, and keywords) removed
- List of separators, operators, and keywords for each language
- Code to calculate entropy
- Frequency distribution of n-grams for each language, keeping only Java API elements, with n = 1 to 10
- List of Java API elements considered
- Code to calculate entropy
- List of tokens used to ensure that n-grams and graphs contain the same set of elements (i.e., control flow and Java API elements)
- Frequency distribution of Java graphs with n = 2 to 4 nodes
- Frequency distribution of n-grams, with n = 2 to 4, over the Java tokens that are included in graphs
- The graph database is available, but the tools to extract graphs must be obtained from Dr. Tien Nguyen
- Code to calculate occurrence frequencies of graphs
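For RQ2 and the Java API analysis above, the token stream is filtered before n-grams are counted. The sketch below assumes that filtering is applied to the token sequence first (so n-grams are formed from the surviving tokens) and that "entropy" means the Shannon entropy, in bits, of an n-gram frequency distribution; both are assumptions, not a description of the released code. The sets `JAVA_SEPARATORS`, `JAVA_OPERATORS`, and `JAVA_KEYWORDS`, the helpers `remove_tokens`, `keep_tokens`, and `entropy`, and the API allow-list in the usage example are all hypothetical, heavily abbreviated stand-ins for the lists shipped with the artifact.

```python
import math
from collections import Counter

# Hypothetical, heavily abbreviated Java token lists; the artifact ships full lists.
JAVA_SEPARATORS = {"(", ")", "{", "}", "[", "]", ";", ",", "."}
JAVA_OPERATORS = {"=", "==", "<", ">", "+", "-", "++", "--", "&&", "||", "!"}
JAVA_KEYWORDS = {"for", "int", "if", "else", "return", "new", "while", "class"}
LANGUAGE_SPECIFIC = JAVA_SEPARATORS | JAVA_OPERATORS | JAVA_KEYWORDS


def ngram_frequencies(tokens, n):
    """Frequency distribution of all contiguous n-grams (as tuples) in tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def remove_tokens(tokens, excluded):
    """Drop excluded tokens (e.g. separators, operators, keywords) before
    n-grams are formed, so n-grams span the gaps left by removed tokens."""
    return [t for t in tokens if t not in excluded]


def keep_tokens(tokens, allowed):
    """Keep only allow-listed tokens (e.g. Java API elements)."""
    return [t for t in tokens if t in allowed]


def entropy(frequencies):
    """Shannon entropy, in bits, of a frequency distribution."""
    total = sum(frequencies.values())
    return -sum((c / total) * math.log2(c / total)
                for c in frequencies.values() if c > 0)


if __name__ == "__main__":
    tokens = "for ( int i = 0 ; i < list . size ( ) ; i ++ )".split()
    # RQ2-style filtering: only identifiers and literals remain.
    content = remove_tokens(tokens, LANGUAGE_SPECIFIC)
    print(content)  # ['i', '0', 'i', 'list', 'size', 'i']
    print(entropy(ngram_frequencies(content, 1)))

    # API-only view using a hypothetical allow-list of Java API elements.
    api = keep_tokens("names . add ( s ) ; names . size ( )".split(), {"add", "size"})
    print(api)  # ['add', 'size']
```

Filtering before counting means that removing a separator such as `;` joins its neighbours into a new n-gram; counting n-grams first and then discarding those that contain removed tokens would be a different, equally plausible design.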