Jupyter Notebooks used to compute and visualize data used in the "How I Learned to Stop Worrying and Love ChatGPT" paper submitted and accepted for MSR'24 Mining Challenge https://2024.msrconf.org/track/msr-2024-mining-challenge
This directory includes the following notebooks:
-
analyze_commit_sharings_agg.ipynb
includes simple statistical analysis of the results of the 'commit_agg' stage in DVC pipeline, saved in../data/interim/commit_sharings_df.csv
file. Not used directly by the paper. -
analyze_changes_survival.ipynb
performs survival analysis of changed lines (including separately for changed lines with change inspired1 by ChatGPT conversation), where line "survives" if it is present in current (HEAD) state of the project. The Fig. 1(c) comes from this notebook. -
repositories.ipynb
does the statistical analysis (which includes computing confidence intervals using bootstrapping) of the results of 'repo_stats_git' and 'repo_stats_github' stages in DVC pipeline. Used to create Table 2. -
DevGPT_conversations_stats.ipynb
does the statistical analysis (with bootstrap) of the results of various '*_survival' stages in DVC pipeline, and computes various statistics of the DevGPT dataset. Used to create Table 1. -
compare.ipynb
computes similarities between lines in either pre-image (+context) or post-image of the relevant changeset2, and either prompt, answer, or blocks of code in ChatGPT conversation (via DevGPT dataset). The Fig. 1(a) and the Mermaid source for base of Fig. 1(b) come from this notebook.
Footnotes
-
The changed line is considered "inspired" by ChatGPT conversation if it is similar to some line either in DevGPT answer, or in DevGPT code block. ↩
-
Relevant changeset is the diff of commit in commit sharings, and changes brought by the pull request in PR sharings; issue sharings are handled like commit or pull request closing them. ↩