Dataset: Python-code-docstring

The original code in the Python dataset does not run under Python 3. One way to deal with this is to rebuild runnable Python code from the raw data.
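To see what "not runnable in Python 3" can look like in practice, here is a minimal sketch that checks whether a snippet at least parses under the running Python 3 interpreter. It is only an illustration and not the logic of `dataset.python_wan.clean`: the helper name `parses_under_python3` and the two sample snippets are hypothetical, and the assumption that failures come from Python 2 syntax (such as the `print` statement) is ours.

```python
import ast


def parses_under_python3(source: str) -> bool:
    """Return True if `source` is syntactically valid for the running Python 3 interpreter."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # SyntaxError: invalid Python 3 syntax; ValueError: e.g. null bytes on some versions.
        return False


# Hypothetical snippets, not taken from the dataset.
py2_snippet = 'print "hello"'
py3_snippet = 'print("hello")'

if __name__ == "__main__":
    for name, snippet in [("py2_snippet", py2_snippet), ("py3_snippet", py3_snippet)]:
        print(name, parses_under_python3(snippet))
```

Parsing only verifies syntax. A snippet that parses may still fail at run time, so this check is a lower bar than "runnable".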

Step 1: Download the pre-processed and raw (python_wan) datasets.

bash dataset/python_wan/download.sh

Step 2: Clean the raw code files.

python -m dataset.python_wan.clean

Step 3: Move the code, code_tokens, docstring, and docstring_tokens attributes to ~/python_wan/flatten/*.

python -m dataset.python_wan.attributes_cast

Step 4 (optional): Alternatively, download our processed Python (Wan) dataset directly.

bash dataset/python_wan/lazy_download.sh