Commit

Merge pull request #84 from ncusi/documentation - docs/contributors_graph.md

Documentation: Add docs/contributors_graph.md, enhance README.md
jnareb authored Dec 24, 2024
2 parents 6d710d0 + b16359f commit e184deb
Showing 26 changed files with 487 additions and 94 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -116,6 +116,11 @@ See the basic [demo on Heroku](https://patchscope-9d05e7f15fec.herokuapp.com/):
- <img height="20" src="favicon.png" width="20" />&nbsp;[Contributors Graph app](https://patchscope-9d05e7f15fec.herokuapp.com/contributors)
- <img height="20" src="favicon-author.png" width="20" />&nbsp;[Author Statistics app](https://patchscope-9d05e7f15fec.herokuapp.com/author)

You can find descriptions of those two apps, with screenshots, in:

- `docs/author_statistics.md` (**TODO**)
- [`docs/contributors_graph.md`](docs/contributors_graph.md)

[Panel]: https://panel.holoviz.org/


15 changes: 9 additions & 6 deletions data/examples/README.md
@@ -3,8 +3,8 @@
The `stats/` subdirectory contains JSON files generated with the `diff-gather-stats`
script and its various subcommands, processing the output of running
`diff-annotate from-repo` command, found in `annotations/` subdirectory.
Those files were generated by running the [DVC][] [pipeline][[dvc-pipelines]]
(which is defined in the [`/dvc.yaml`](../../dvc.yaml) file) with the `dvc repro` command.
Those files were generated by running the [DVC][] [pipeline][dvc-pipelines]
(which is defined in the [`/dvc.yaml`](../../dvc.yaml) file), with the `dvc repro` command.

Using a DVC pipeline makes it possible to regenerate only those files
that are out of date, re-running only the stages that need it.
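
The idea of regenerating only what is out of date can be illustrated with a minimal sketch. This is not how DVC is implemented; it only shows the general technique of comparing content hashes of a stage's dependencies against previously recorded values. The names `file_digest`, `stale_stages`, and the shape of the `stages` and `recorded` mappings are hypothetical, and the sketch ignores propagation of changes to downstream stages.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_stages(
    stages: dict[str, list[Path]],
    recorded: dict[str, dict[str, str]],
) -> list[str]:
    """List stages whose dependency digests differ from the recorded ones.

    `stages` maps a (hypothetical) stage name to its dependency files;
    `recorded` maps a stage name to the digests saved after its last run.
    """
    stale = []
    for name, deps in stages.items():
        current = {str(dep): file_digest(dep) for dep in deps}
        # A stage with no record, or with any changed dependency, must be re-run.
        if current != recorded.get(name):
            stale.append(name)
    return stale
```

In the actual project this bookkeeping is handled by DVC itself when `dvc repro` is run.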
@@ -33,6 +33,9 @@ Variables in the DAG of DVC stages above:
- 1: [qtile](https://github.com/qtile/qtile) repository
- 1.c: all authors in qtile repository, no merge commits

You can also see the whole up-to-date _interactive_ graph of stages
and their dependencies at <https://dagshub.com/ncusi/PatchScope#repo-graph-view>.

Those files are being analyzed by Jupyter notebooks in the
[`/notebooks/`](../../notebooks) directory,
see [`/notebooks/README.md`](../../notebooks/README.md).
@@ -52,11 +55,11 @@ Other repositories were selected by authors of this project:
- [Qtile](https://github.com/qtile/qtile): A full-featured, hackable tiling window manager written and configured in Python<br>
This repo is a medium-sized, but quite active project.

Repositories are cloned into `~/example_repositories/`
which links to `/mnt/data/python-diff-annotator/example_repositories/`
on `przybysz` (access via SSH using VPN).
Repositories are cloned into `~/example_repositories/`.
On the authors' workstation this directory is a symbolic link to
`/mnt/data/python-diff-annotator/example_repositories/` directory.

This can be done by running the "clone" stage of the DVC pipeline.
This operation can be done by running the "clone" stage of the DVC pipeline.

**NOTE:** all commands are assumed to be run from the _top directory_
of the project, not from its `examples/stats/` subdirectory.
4 changes: 4 additions & 0 deletions docs/assets/export-icon.svg
399 changes: 399 additions & 0 deletions docs/contributors_graph.md

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion mkdocs.yml
@@ -7,7 +7,8 @@ edit_uri: "" # turn off the feature
nav:
- home: index.md
#- installation: installation.md
#- usage: usage.md
- usage:
- contributors_graph.md
#- modules: api.md
#- contributing: contributing.md
#- authors: authors.md
@@ -80,6 +81,7 @@ markdown_extensions:
custom_checkbox: true # replace native checkbox styles with icons
- admonition # admonitions (or call-outs), !!! note "optional title"...
- attr_list # add HTML attributes and CSS classes to almost every element
- footnotes # footnoted[^1] ... [^1]: footnote text
- toc: # automatically generate a table of contents
baselevel: 2
permalink: true
39 changes: 39 additions & 0 deletions notebooks/README.md
@@ -11,6 +11,45 @@ This directory includes the following notebooks:
and its various subcommands, using annotations generated with
`diff-annotate from-repo`.

## `panel` and `bokeh` subdirectories

The **`bokeh/`** subdirectory contains some Python scripts that
create various interactive plots to visualize annotation results.
Plots are created using the [Bokeh][][^bokeh] library.

[^bokeh]: Bokeh is an interactive visualization library for modern web browsers.

The **`panel/`** subdirectory contains Jupyter notebooks (`*.ipynb`)
and Python scripts (`*.py`), which use [Panel][][^panel] (member of the [HoloViz][] ecosystem)
to explore various ways of interactively visualizing and analysing
annotation results; though some are there just to explore [Panel][] features.
See its [`panel/README.md`](./panel/README.md).

[^panel]: Panel is an open-source Python library designed to streamline the development of robust tools, dashboards, and complex applications entirely within Python.

[Bokeh]: https://bokeh.org/
[Panel]: https://panel.holoviz.org/
[HoloViz]: https://holoviz.org/

## `experiments` subdirectory

The **`experiments/`** subdirectory contains Jupyter notebooks that are part
of comparing automatic line annotations from this tool (PatchScope) with
different datasets that include manual line annotations.

- [`00-HaPy_Bug-Paper.ipynb`](./experiments/00-HaPy_Bug-Paper.ipynb)
reproduces results in the HaPy-Bug paper[^hapy-bug].
- [`01-compare_annotations.ipynb`](./experiments/01-compare_annotations.ipynb)
compares automatic annotations from PatchScope with manual annotations
in the BugsInPy subset of the HaPy-Bug dataset.
- [`02-compare_annotations_Herbold.ipynb`](./experiments/02-compare_annotations_Herbold.ipynb)
compares automatic annotations from PatchScope with manual annotations
from the Herbold et al. paper[^herbold].

[^hapy-bug]: Piotr Przymus, Mikołaj Fejzer, Jakub Narębski, Radosław Woźniak, Łukasz Halada, Aleksander Kazecki, Mykhailo Molchanov and Krzysztof Stencel _"HaPy-Bug – Human Annotated Python Bug Resolution Dataset"_ (2024)

[^herbold]: Steffen Herbold et al. _"A fine-grained data set and analysis of tangling in bug fixing commits"_ https://doi.org/10.1007/s10664-021-10083-5

## Running notebooks

If needed, install required packages with
115 changes: 28 additions & 87 deletions src/diffinsights_web/README.md
@@ -1,106 +1,47 @@
# DiffInsights - web interface for analyzing DiffAnnotator results

This directory includes various web dashboards
that demonstrate how one can use the **`diffanotator`** project.
The `src/diffinsights_web/` subdirectory in PatchScope sources
includes various web dashboards that demonstrate
how one can use the **`PatchScope`** project.

All web applications in this directory use
the [HoloViz Panel][Panel] framework.

In all cases, plots and diagrams shown in those web apps
are created from files generated with PatchScope scripts
from a selected repository.

## Contributors graph
[Panel]: https://panel.holoviz.org/ "Panel: The Powerful Data Exploration & Web App Framework for Python"

## Contributors Graph

You can run this app with `panel serve src/diffinsights_web/apps/contributors.py`
from the top directory of PatchScope sources.

The demo of this app is also available at
<https://patchscope-9d05e7f15fec.herokuapp.com/contributors>.

This dashboard is meant to be
an enhanced version of the Contributors subpage
in the Insights tab
for the GitHub repository
(example: <https://github.com/qtile/qtile/graphs/contributors>)
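
A contributors view of this kind is built on per-period commit counts, for the whole repository and for its most active authors. The following is a minimal standard-library sketch of that aggregation, not the app's actual code: the function names and the `(author, date)` input shape are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_commit_counts(commits):
    """Count commits per week, overall and per author.

    `commits` is an iterable of (author, date) pairs (hypothetical input shape).
    """
    total = Counter()
    per_author = {}
    for author, day in commits:
        week = week_start(day)
        total[week] += 1
        per_author.setdefault(author, Counter())[week] += 1
    return total, per_author


def top_authors(per_author, n):
    """Return the `n` authors with the most commits, most active first."""
    ranked = sorted(
        per_author,
        key=lambda author: sum(per_author[author].values()),
        reverse=True,
    )
    return ranked[:n]
```

The `total` counter would feed a repository-wide plot, while `top_authors` selects which per-author counters to plot individually.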

Below there is a
simplified graph of dependencies between
- functions (rounded rectangle),
- widgets (hexagons, in green), and
- outputs ("subroutine" shape, in blue)
in `02-contributors_graph.py`:
```mermaid
flowchart TD
classDef widgetClass fill:#9f8;
classDef finalClass fill:#cef;
select_file_widget{{"input JSON file"}}
select_repo_widget{{"repository"}}
resample_frequency_widget{{"frequency"}}
select_period_from_widget{{"Period:"}}
select_contribution_type_widget{{"Contributions:"}}
It provides plots (like weekly number of commits) for the whole selected repository,
and individually for each of the top-N most active authors.

class select_file_widget widgetClass
class select_repo_widget widgetClass
class resample_frequency_widget widgetClass
class select_period_from_widget widgetClass
class select_contribution_type_widget widgetClass
## Author statistics

find_dataset_dir("`find_dataset_dir()`")
find_timeline_files("`find_timeline_files(dataset_dir)`")
get_timeline_data("`get_timeline_data(json_path)`")
find_repos("`find_repos(timeline_data)`")
get_timeline_df("`get_timeline_df(timeline_data, repo)`")
authors_info_df("`authors_info_df(timeline_df, column, from_date)`")
resample_timeline("`resample_timeline(timeline_df, resample_rate, group_by)`")
%% add_pm_count_perc("`add_pm_count_perc(resampled_df)`")
%% filter_df_by_from_date("`filter_df_by_from_date(resampled_df, from_date, date_column)`")
get_date_range("`get_date_range(timeline_df, from_date)`")
get_value_range("`get_value_range(resampled_df, column)`")
%% head_info(["`head_info(repo, resample_rate)`"])
%% sampling_info(["`sampling_info(resample_rate, column, date_range)`"])
%% author_info(["`author_info(authors_df, author)`"])
plot_commits[["`plot_commits(resampled_df, column, from_date)`"]]
authors_cards[["`authors_cards(authors_df, resample_by_author_df, top_n)`"]]
You can run this app with `panel serve src/diffinsights_web/apps/author.py`
from the top directory of PatchScope sources.

class sampling_info finalClass
class head_info finalClass
class plot_commits finalClass
class authors_cards finalClass
The demo of this app is also available at
<https://patchscope-9d05e7f15fec.herokuapp.com/author>.

resample_frequency_widget -.-> resample_timeline
This dashboard is currently a cross between plots from GitHub Insights,
limited to a selected user, and some extra plots that make sense
only for an individual author.

find_dataset_dir --> find_timeline_files
get_timeline_data --> find_repos
get_timeline_data --> get_timeline_df
get_timeline_df --> resample_timeline
get_timeline_df --> authors_info_df
%% get_timeline_df --> get_date_range
%% resample_timeline --> add_pm_count_perc
resample_timeline --> plot_commits
%% resample_timeline --> filter_df_by_from_date
resample_timeline --> get_date_range
resample_timeline --> get_value_range
resample_timeline --> authors_cards
%% get_date_range --> plot_commits
get_date_range --> authors_cards
%% get_date_range --> sampling_info
%% get_value_range --> plot_commits
get_value_range --> authors_cards
%% authors_info_df --> author_info
authors_info_df --> authors_cards
find_timeline_files ---> select_file_widget
find_repos ---> select_repo_widget
select_file_widget -.-> get_timeline_data
%% select_repo_widget -.-> head_info
select_repo_widget -.-> get_timeline_df
%% resample_frequency_widget -.-> head_info
%% resample_frequency_widget -.-> sampling_info
select_period_from_widget -.-> authors_info_df
select_period_from_widget -.-> get_date_range
select_period_from_widget -.-> plot_commits
select_contribution_type_widget -.-> authors_info_df
select_contribution_type_widget -.-> get_value_range
%% select_contribution_type_widget -.-> sampling_info
select_contribution_type_widget -.-> plot_commits
plot_commits ---o authors_cards
linkStyle 23 stroke:#ff3,stroke-width:4px,color:red;
```

[Panel]: https://panel.holoviz.org/ "Panel: The Powerful Data Exploration & Web App Framework for Python"
An example of the latter is the heatmap plot, which examines
which days of the week and which hours of the day dominate
in the commit author dates of a given author's contributions.
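
The data behind such a day-of-week by hour-of-day heatmap can be sketched as a 7x24 grid of counts. This is an illustrative sketch only, not the app's implementation: `activity_heatmap` is a hypothetical name, and it assumes the author dates are already available as `datetime` objects in the time zone of interest.

```python
from datetime import datetime


def activity_heatmap(author_dates):
    """Return a 7x24 grid of commit counts.

    Rows are weekdays (0 = Monday, 6 = Sunday), columns are hours (0-23).
    """
    grid = [[0] * 24 for _ in range(7)]
    for stamp in author_dates:
        grid[stamp.weekday()][stamp.hour] += 1
    return grid
```

A plotting library can then render `grid` directly as a heatmap, with the largest cells marking the author's dominant working hours.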
