Adding more details and data to cat plots #21

Merged
merged 1 commit into from
Feb 28, 2024
1 change: 1 addition & 0 deletions docs/01-introduction.md
@@ -140,6 +140,7 @@ Further reading on the use of colour:
- [Coloring for Colorblindness](https://davidmathlogic.com/colorblind/)
- [The misuse of colour in science communication](https://www.nature.com/articles/s41467-020-19160-7)
- [Scientific colour maps user guide](https://www.fabiocrameri.ch/ws/media-library/ce2eb6eee7c345f999e61c02e2733962/readme_scientificcolourmaps.pdf)
- [Seaborn colour palettes guide](https://seaborn.pydata.org/tutorial/color_palettes.html)


## Rule 7: Do Not Mislead the Reader
4 changes: 2 additions & 2 deletions docs/02-first-plot.md
@@ -364,9 +364,9 @@ We will look at this in more detail in a [later session](https://arctraining.git

### Colour

Matplotlib automatically assigns a different colour to the different data series, but this is only one of the ways data can be encoded. We will delve into using colour more deeply over the next few sessions, but there are a few key points to note now:
Matplotlib automatically assigns a different colour to the different data series, but this is only one of the ways data can be encoded. The Seaborn documentation provides a really useful [introduction to colour](https://seaborn.pydata.org/tutorial/color_palettes.html). We will delve into using colour more deeply over the next few sessions, but there are a few key points to note now:

- Hue shouldn't be the *only* way you visually encode data; it can be difficult to distinguish on some screens or when printed, might be illegible if printed in greyscale, and could be inaccessible to someone with a colour vision deficiency. In the session ["Making comparisons: line plots"](https://arctraining.github.io/data-vis/03-comparison-line-plots.html#part-2-changing-default-settings) we will look more closely at colour choices.
- Hue usually shouldn't be the *only* way you visually encode data; it can be difficult to distinguish on some screens or when printed, might be illegible if printed in greyscale, and could be inaccessible to someone with a colour vision deficiency. In the session ["Making comparisons: line plots"](https://arctraining.github.io/data-vis/03-comparison-line-plots.html#part-2-changing-default-settings) we will look more closely at colour choices.
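
As a rough illustration of encoding data with more than hue, the sketch below pairs each series with its own line style and marker so the lines remain distinguishable in greyscale. The data values and series names are hypothetical, chosen only for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical data series, invented purely for illustration
x = [1, 2, 3, 4]
series = {
    "Results A": [1, 27, 32, 15],
    "Results B": [2, 35, 15, 18],
    "Results C": [12, 30, 20, 27],
}

# Give each series its own line style and marker, so colour is
# not the only thing that tells the lines apart
styles = [("-", "o"), ("--", "s"), (":", "^")]

fig, ax = plt.subplots()
for (label, values), (linestyle, marker) in zip(series.items(), styles):
    ax.plot(x, values, linestyle=linestyle, marker=marker, label=label)
ax.legend()
```

Matplotlib still assigns a different colour to each line here; the line style and marker simply add a second, redundant encoding.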

### Shape

118 changes: 115 additions & 3 deletions docs/05-composition-bar-charts.md
@@ -337,8 +337,6 @@ Using the colour palette in the plot above reinforces stereotyping. From the Ame

![alt text](image-1.png)

![alt text](image-2.png)

```{admonition} Political associations
:class: dropdown
Colour has strong associations with different political parties, which can vary with geographical location.
@@ -383,13 +381,127 @@ Stacked bar charts are a popular way of representing compositional data and expl
- Are your data really served by stacking the bars or by finding a different way to represent this data?
- If your data are not absolute values and instead are the result of statistical calculations (e.g. they are the mean value for a certain category), look at plots such as [boxplots](https://seaborn.pydata.org/examples/grouped_boxplot.html) or [violin plots](https://seaborn.pydata.org/examples/grouped_violinplots.html).

## Using Seaborn to build barplots

The built-in plotting options in pandas are a little bit limited, so let's see what seaborn offers for [barplots](https://seaborn.pydata.org/generated/seaborn.barplot.html).

```python
fig, ax = plt.subplots()
sns.barplot(data[["Results A", "Results B","Results C"]], ax=ax)
```

Using the default settings creates a pretty basic barplot, with one main difference when compared to the pandas barplot: the seaborn version includes an error bar. If we look at the [documentation](https://seaborn.pydata.org/generated/seaborn.barplot.html), we can see that the default setting for the `errorbar` argument is `('ci', 95)`: a 95% confidence interval. You can read more about the different error bar options [here](https://seaborn.pydata.org/tutorial/error_bars.html).
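
To get a feel for what a bootstrapped 95% confidence interval represents, here is a minimal sketch using NumPy. It is only an illustration of the general idea (resample with replacement, then take percentiles of the resampled means), not seaborn's actual implementation, and the sample values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Hypothetical sample, standing in for one column of results
values = np.array([1, 27, 32, 15])

# Resample with replacement many times and record the mean of each resample
n_boot = 1000
boot_means = np.array([
    rng.choice(values, size=len(values), replace=True).mean()
    for _ in range(n_boot)
])

# The 95% interval spans the 2.5th to 97.5th percentiles of those means
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {values.mean():.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With only a handful of widely spread values, the interval is wide, which is why the error bars on a small dataset can look so large.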

We can easily change the error bars to show standard deviation:

```python
sns.barplot(data[["Results A", "Results B","Results C"]], ax=ax, errorbar="sd")
```

Note that the error estimates for this dataset are very large due to the random nature of the dataset! The error bars can also be switched off by passing `errorbar=None`:

```python
sns.barplot(data[["Results A", "Results B","Results C"]], ax=ax, errorbar=None)
```

We can pass in a seaborn colour palette, or use one of the custom ones we built:

```python
sns.barplot(data[["Results A", "Results B","Results C"]], ax=ax, errorbar="sd", palette="rocket")
```

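Another thing you'll notice is that seaborn isn't picking up on the categories as pandas did. This is because the table is "wide-form" data: seaborn treats each column as a single group and aggregates over its rows. To split the bars by category, seaborn needs "long-form" data, with one row per observation.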

This requires a little bit of dataframe wrangling.

First, let's have a look at the shape of the dataframe by displaying just the columns we are plotting:
```python
data[["Results A", "Results B","Results C"]]
```

It should look something like this:

```bash
       Results A  Results B  Results C
Cat 1          1          2         12
Cat 2         27         35         30
Cat 3         32         15         20
Cat 4         15         18         27
```

We can now use the `unstack` and `reset_index` methods to reshape it:

```python
new_data = data[["Results A", "Results B","Results C"]].unstack().reset_index()
```

This `new_data` dataframe will look something like this:

```bash
      level_0 level_1   0
0   Results A   Cat 1   1
1   Results A   Cat 2  27
2   Results A   Cat 3  32
3   Results A   Cat 4  15
4   Results B   Cat 1   2
5   Results B   Cat 2  35
6   Results B   Cat 3  15
7   Results B   Cat 4  18
8   Results C   Cat 1  12
9   Results C   Cat 2  30
10  Results C   Cat 3  20
11  Results C   Cat 4  27
```
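
As an aside, the same wide-to-long reshape can also be sketched with `melt`, which names the new columns in the same call. The block below rebuilds the table from the values shown above so that it runs on its own; the column names passed to `melt` are illustrative choices for this sketch.

```python
import pandas as pd

# Wide-form table rebuilt from the values shown above, so the sketch is self-contained
data = pd.DataFrame(
    {
        "Results A": [1, 27, 32, 15],
        "Results B": [2, 35, 15, 18],
        "Results C": [12, 30, 20, 27],
    },
    index=["Cat 1", "Cat 2", "Cat 3", "Cat 4"],
)

# Route 1: unstack() moves the column labels into the index, giving one row
# per (column, index) pair; reset_index() turns that index back into columns
via_unstack = data.unstack().reset_index()

# Route 2: name the index, turn it into a column, then melt the result
# columns into one label column and one value column
via_melt = data.rename_axis("Category").reset_index().melt(
    id_vars="Category", var_name="Result Group", value_name="Value"
)

print(via_melt.head())
```

Both routes give one row per observation in the same order; they differ only in column order and in how the columns are named.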

We can give the new columns more sensible names:

```python
new_data = new_data.rename(columns={"level_0": "Result Group", "level_1": "Category", 0: "Value"})
```

Which should now give us a dataframe that looks like this:

```bash
   Result Group Category  Value
0     Results A    Cat 1      1
1     Results A    Cat 2     27
2     Results A    Cat 3     32
3     Results A    Cat 4     15
4     Results B    Cat 1      2
5     Results B    Cat 2     35
6     Results B    Cat 3     15
7     Results B    Cat 4     18
8     Results C    Cat 1     12
9     Results C    Cat 2     30
10    Results C    Cat 3     20
11    Results C    Cat 4     27
```

We can then plot our new long-form data using seaborn:

```python
sns.barplot(new_data, palette="rocket", y="Value", x="Category", hue="Result Group")
```

![alt text](image-12.png)

Although this is a lot of wrangling to essentially reproduce the plots we made above, it is useful to understand how to reshape your dataframes to work with seaborn. The new dataframe is now easy to use with a wide range of seaborn plots, such as a [box plot](https://seaborn.pydata.org/tutorial/categorical.html):


```python
sns.catplot(new_data, kind="box", y="Value", x="Category", hue="Category", palette="rocket")
```

![alt text](image-13.png)
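
To illustrate how reusable long-form data is, the sketch below draws two other categorical plot types from one dataframe by changing only the `kind` argument. The dataframe is built by hand so the example is self-contained, and its column names are illustrative assumptions.

```python
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per observation
long_data = pd.DataFrame(
    {
        "Category": ["Cat 1", "Cat 2", "Cat 3", "Cat 4"] * 3,
        "Result Group": ["Results A"] * 4 + ["Results B"] * 4 + ["Results C"] * 4,
        "Value": [1, 27, 32, 15, 2, 35, 15, 18, 12, 30, 20, 27],
    }
)

# Only `kind` changes between plot types; the column assignments stay the same
grids = []
for kind in ["strip", "point"]:
    grid = sns.catplot(data=long_data, kind=kind, x="Category", y="Value")
    grids.append(grid)
```

Each call to `catplot` creates its own figure, so the two plots appear separately rather than on shared axes.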


```{admonition} Key Points
:class: tip
- Absolute and proportional bar charts can highlight and disguise different relationships between data
- Bar charts are useful when the value zero is important in comparing groups - use a different visualisation type if you feel the need to move the y-limit above zero to highlight your results
- Do not imply order in unordered variables through the use of colour or non-strategic spatial ordering on the page (use alphabetical or numerical ordering to avoid biases)

Further reading: The American Chemical Society [Data visualisation inclusivity style guide](https://www.acs.org/about/diversity/inclusivity-style-guide/data-visualization.html)
Further reading: The American Chemical Society [Data visualisation inclusive style guide](https://www.acs.org/about/diversity/inclusivity-style-guide/data-visualization.html)
```


Binary file added docs/image-12.png
Binary file added docs/image-13.png