refactor stats modules to allow inheritance (#4)
* refactor stats modules to allow inheritance

* updates after arviz-base and recent releases

* add histogram and more flexibility in make_ufunc

* add documentation start

* update datatree accessor

* prepare release
OriolAbril authored Jun 20, 2024
1 parent a40917b commit fdc4eb5
Showing 33 changed files with 2,399 additions and 4,875 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -26,7 +26,7 @@ repos:
rev: v0.3.1
hooks:
- id: absolufy-imports
args: ["--never", "--application-directories", "src"]
args: ["--application-directories", "src"]
files: ^src/arviz_stats/.+\.py$

- repo: https://github.com/MarcoGorelli/madforhooks
3 changes: 2 additions & 1 deletion .readthedocs.yaml
@@ -5,7 +5,7 @@ version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
python: "3.11"

sphinx:
fail_on_warning: true
@@ -16,3 +16,4 @@ python:
path: .
extra_requirements:
- doc
- xarray
Binary file added docs/source/_static/ArviZ.png
Binary file added docs/source/_static/ArviZ_white.png
9 changes: 9 additions & 0 deletions docs/source/_static/custom.css
@@ -0,0 +1,9 @@
html[data-theme="light"] {
--pst-color-primary: rgb(11 117 145);
--pst-color-secondary: rgb(238 144 64);
}

html[data-theme="dark"] {
--pst-color-primary: rgb(0 192 191);
--pst-color-secondary: rgb(238 144 64);
}
Binary file added docs/source/_static/favicon.ico
Binary file not shown.
5 changes: 5 additions & 0 deletions docs/source/_templates/name.html
@@ -0,0 +1,5 @@
<div class="sd-d-flex-row sd-align-major-spaced sd-align-minor-center">
<a href="https://arviz-base.readthedocs.io"><div class="sd-fs-6 sd-font-weight-lighter sd-text-muted">BASE</div></a>
<a href="{{ pathto(root_doc) }}"><div class="sd-fs-3 sd-font-weight-bold sd-text-primary">STATS</div></a>
<a href="https://arviz-plots.readthedocs.io"><div class="sd-fs-6 sd-font-weight-lighter sd-text-muted">PLOTS</div></a>
</div>
81 changes: 68 additions & 13 deletions docs/source/api/index.md
@@ -1,35 +1,90 @@
# API reference

## Accessors
Currently, using accessors is the recommended way to call functions from `arviz_stats`.

### Dataset accessors

```{eval-rst}
.. autosummary::
:toctree: generated/
:template: autosummary/accessor_attribute.rst
xarray.Dataset.azstats.ds
.. autosummary::
:toctree: generated/
:template: autosummary/accessor_method.rst
xarray.Dataset.azstats.filter_vars
xarray.Dataset.azstats.eti
xarray.Dataset.azstats.hdi
xarray.Dataset.azstats.ess
xarray.Dataset.azstats.rhat
xarray.Dataset.azstats.mcse
xarray.Dataset.azstats.kde
xarray.Dataset.azstats.histogram
xarray.Dataset.azstats.ecdf
```



## DataArray facing functions

### Base submodule

```{eval-rst}
.. autosummary::
:toctree: generated/
arviz_base.convert_to_dataset
arviz_base.convert_to_datatree
arviz_base.dict_to_dataset
arviz_base.extract
arviz_base.generate_dims_coords
arviz_base.make_attrs
arviz_stats.base.dataarray_stats.eti
arviz_stats.base.dataarray_stats.hdi
arviz_stats.base.dataarray_stats.ess
arviz_stats.base.dataarray_stats.rhat
arviz_stats.base.dataarray_stats.mcse
arviz_stats.base.dataarray_stats.histogram
arviz_stats.base.dataarray_stats.kde
```

### Numba submodule
The numba accelerated computations are available as the same methods,
but on the `arviz_stats.numba.dataarray_stats` class.
Both their API and implementation are the same as for the base module;
the only difference is that one calls `arviz_stats.base.array_stats`
for array facing functions whereas the other calls `arviz_stats.numba.array_stats`.

Implementation differences are thus documented below, at the array facing classes.


## Array facing functions

## Example datasets
### Base submodule

```{eval-rst}
.. autosummary::
:toctree: generated/
arviz_base.load_arviz_data
arviz_base.list_datasets
arviz_base.get_data_home
arviz_base.clear_data_home
arviz_stats.base.array_stats.eti
arviz_stats.base.array_stats.hdi
arviz_stats.base.array_stats.ess
arviz_stats.base.array_stats.rhat
arviz_stats.base.array_stats.mcse
arviz_stats.base.array_stats.get_bins
arviz_stats.base.array_stats.histogram
arviz_stats.base.array_stats.kde
```

## Configuration
### Numba submodule
Some functions are accelerated internally without changes to the public API,
others are purely inherited from the base backend, and a last group is partially
or completely reimplemented. This last group is documented here:

```{eval-rst}
.. autosummary::
:toctree: generated/
arviz_base.rc_context
arviz_stats.numba.array_stats.quantile
arviz_stats.numba.array_stats.histogram
arviz_stats.numba.array_stats.kde
```
37 changes: 37 additions & 0 deletions docs/source/background.md
@@ -0,0 +1,37 @@
# Architecture
Currently `arviz_stats` has some top level general functionality, and then submodules
that take care of the actual computations. Submodules can be completely independent or
build on top of one another.

## Top level functionality
This includes the accessors for DataArray, Dataset and DataTree objects,
the dispatcher mechanism and some general dataclasses and utilities like `ELPDData`.

## Computation submodules
Computation submodules are structured into two main classes: an array facing class and a dataarray facing class. All submodules should in principle have both available.

The array facing class takes array_like inputs, and aims to have an API similar to NumPy/SciPy.
It can be used independently of the dataarray class (to the point of not needing to have
`arviz_base` or `xarray` installed), but it is consequently a lower level interface:
there are more required arguments and there is no `rcParams` integration.

To make integration with the dataarray facing class easier, all functions should take
`axes` (or equivalent arguments) which should allow either integers or sequences of axes
for the functions to work over.
It is also imperative that whenever new axes are added to an array,
these are added as ending dimensions.
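
A minimal sketch of this convention, written with plain NumPy. The helper `quantiles_over_axes` is hypothetical and is not a function from `arviz_stats`; it only illustrates accepting either an integer or a sequence of axes and appending the newly created axis as the last dimension.

```python
import numpy as np


def quantiles_over_axes(ary, probs, axes=-1):
    """Hypothetical helper: quantiles over one or several axes.

    The reduced axes are removed and the new "quantile" axis is
    appended as the last dimension of the result.
    """
    ary = np.asarray(ary)
    # accept a single integer or a sequence of integers
    axes = (axes,) if isinstance(axes, int) else tuple(axes)
    # np.quantile places the quantile axis first, so move it to the end
    result = np.quantile(ary, probs, axis=axes)
    return np.moveaxis(result, 0, -1)


# assumed layout for the example: (chain, draw, variable)
samples = np.random.default_rng(0).normal(size=(4, 100, 3))
out = quantiles_over_axes(samples, [0.05, 0.95], axes=(0, 1))
print(out.shape)  # (3, 2)
```

Keeping new axes at the end means a labeled wrapper can name the extra dimension without having to track where each function decided to put it.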

The dataarray facing class builds on top of the array facing one, and takes
{class}`~xarray.DataArray` inputs aiming to have a more xarray-like API.
As the array facing class API is defined and should be common between submodules,
that means that this class can very often be limited to an instance of the base array facing class.
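
The layering described here can be sketched as follows. All class and method names (`BaseArrayStats`, `LabeledStats`, `mean_and_sd`) are hypothetical, and plain lists of dimension names stand in for `xarray.DataArray` objects so the example runs with NumPy alone. It is an illustration of the idea, not the code in `arviz_stats`.

```python
import numpy as np


class BaseArrayStats:
    """Hypothetical array facing class: plain arrays in, plain arrays out."""

    def mean_and_sd(self, ary, axes=-1):
        ary = np.asarray(ary)
        axes = (axes,) if isinstance(axes, int) else tuple(axes)
        mean = ary.mean(axis=axes)
        sd = ary.std(axis=axes)
        # the new "statistic" axis is appended as the last dimension
        return np.stack([mean, sd], axis=-1)


class LabeledStats:
    """Hypothetical labeled class: translates dimension names into axes."""

    def __init__(self, array_class):
        self.array_class = array_class

    def mean_and_sd(self, values, dims, reduce_dims):
        axes = [dims.index(name) for name in reduce_dims]
        kept = [name for name in dims if name not in reduce_dims]
        result = self.array_class.mean_and_sd(values, axes=axes)
        return result, kept + ["statistic"]


# another backend would only need to pass a different array facing instance
base_stats = LabeledStats(BaseArrayStats())

values = np.arange(24.0).reshape(2, 3, 4)
result, dims = base_stats.mean_and_sd(
    values, ["chain", "draw", "var"], ["chain", "draw"]
)
print(result.shape, dims)  # (4, 2) ['var', 'statistic']
```

Because the labeled class only depends on the shared array facing API, an accelerated backend can subclass `BaseArrayStats`, override the methods it speeds up, and reuse `LabeledStats` unchanged.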


### Base (aka numpy+scipy)
This is the core backend which should have most functionality available and that defines
the general API for both array and dataarray facing classes.

### Numba
The numba submodule builds on top of the base submodule, using numba to both
accelerate computations and generate better behaved ufuncs, ensuring compatibility
with Dask for example.
35 changes: 29 additions & 6 deletions docs/source/conf.py
@@ -2,9 +2,11 @@
import os
from importlib.metadata import metadata

import sphinx_autosummary_accessors

# -- Project information

_metadata = metadata("arviz-base")
_metadata = metadata("arviz-stats")

project = _metadata["Name"]
author = _metadata["Author-email"].split("<", 1)[0].strip()
@@ -35,9 +37,10 @@
"sphinx_copybutton",
"sphinx_design",
"jupyter_sphinx",
"sphinx_autosummary_accessors",
]

templates_path = ["_templates"]
templates_path = ["_templates", sphinx_autosummary_accessors.templates_path]

exclude_patterns = [
"Thumbs.db",
@@ -54,10 +57,13 @@
# -- Options for extensions

extlinks = {
"issue": ("https://github.com/arviz-devs/arviz-base/issues/%s", "GH#%s"),
"pull": ("https://github.com/arviz-devs/arviz-base/pull/%s", "PR#%s"),
"issue": ("https://github.com/arviz-devs/arviz-stats/issues/%s", "GH#%s"),
"pull": ("https://github.com/arviz-devs/arviz-stats/pull/%s", "PR#%s"),
}

copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
copybutton_prompt_is_regexp = True

nb_execution_mode = "auto"
nb_execution_excludepatterns = ["*.ipynb"]
nb_kernel_rgx_aliases = {".*": "python3"}
@@ -91,5 +97,22 @@

# -- Options for HTML output

html_theme = "furo"
# html_static_path = ["_static"]
html_theme = "sphinx_book_theme"
html_theme_options = {
"logo": {
"image_light": "_static/ArviZ.png",
"image_dark": "_static/ArviZ_white.png",
}
}
html_favicon = "_static/favicon.ico"
html_static_path = ["_static"]
html_css_files = ["custom.css"]
html_sidebars = {
"**": [
"navbar-logo.html",
"name.html",
"icon-links.html",
"search-button-field.html",
"sbt-sidebar-nav.html",
]
}
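
To see which prompts the `copybutton_prompt_text` pattern added in `conf.py` recognizes, the regular expression can be tried directly with the standard library. The `strip_prompt` helper below is hypothetical and only approximates the idea of removing a leading prompt; how sphinx-copybutton applies the pattern internally may differ in detail.

```python
import re

# pattern copied from the conf.py change in this commit
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
prompt = re.compile(copybutton_prompt_text)


def strip_prompt(line):
    """Hypothetical helper: drop a leading prompt if the line starts with one."""
    match = prompt.match(line)
    return line[match.end():] if match else line


lines = [
    ">>> import arviz_stats",
    "... x = 1",
    "$ pip install arviz-stats",
    "In [3]: data.azstats.ess()",
    "plain output line",
]
for line in lines:
    print(repr(strip_prompt(line)))
```

The pattern covers Python REPL prompts, shell prompts and IPython input prompts, while lines without a recognized prompt are left untouched.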
40 changes: 15 additions & 25 deletions docs/source/index.md
@@ -1,42 +1,32 @@
# arviz-base
ArviZ base features and converters.
# arviz-stats
ArviZ statistics and diagnostics functions.

## Installation

It can currently only be installed with pip from GitHub:

```bash
pip install arviz-base @ git+https://github.com/arviz-devs/arviz-base
pip install "arviz-stats[xarray] @ git+https://github.com/arviz-devs/arviz-stats"
```

Note that `arviz-base` is a minimal package, which only depends on
xarray (and xarray-datatree), numpy and typing-extensions.
Everything else (netcdf, zarr, dask...) are optional dependencies.
This allows installing only those that are needed, e.g. if you
only plan to use zarr, there is no need to install netcdf.
Note that it is also possible to install `arviz-stats` without the `[xarray]` extra.
Doing that will install a minimal package, which only depends on numpy and scipy.
Consequently, the functions that take arrays as inputs will be available,
but many other features won't be. This is only recommended for libraries
that want to use diagnostics and statistical summaries without
depending on xarray.

For convenience, some bundles are available to be installed with:
```{toctree}
:hidden:
```bash
pip install "arviz-base[<option>] @ git+https://github.com/arviz-devs/arviz-base"
api/index
```

where `<option>` can be one of:

* `netcdf`
* `h5netcdf`
* `zarr`
* `test` (for developers)
* `doc` (for developers)


You can install multiple bundles of optional dependencies separating them with commas.
Thus, to install all user facing optional dependencies you should use `xarray-einstats[einops,numba]`

```{toctree}
:caption: Background
:hidden:
api/index
background
```

```{toctree}
@@ -45,5 +35,5 @@ api/index
Twitter <https://twitter.com/arviz_devs>
Mastodon <https://bayes.club/@ArviZ>
GitHub repository <https://github.com/arviz-devs/xarray-einstats>
GitHub repository <https://github.com/arviz-devs/arviz-stats>
```
22 changes: 12 additions & 10 deletions pyproject.toml
@@ -26,12 +26,7 @@ classifiers = [
dynamic = ["version", "description"]
dependencies = [
"numpy>=1.23",
"xarray>=2022.6.0",
"xarray-datatree",
"arviz-base @ git+https://github.com/arviz-devs/arviz-base",
"xarray-einstats",
"scipy",
"numba",
"scipy>=1.10",
]

[tool.flit.module]
@@ -44,24 +39,31 @@ documentation = "https://arviz-stats.readthedocs.io"
funding = "https://opencollective.com/arviz"

[project.optional-dependencies]
xarray = [
"xarray>=2022.6.0",
"xarray-datatree",
"arviz-base==0.2",
"xarray-einstats",
"numba",
]
test = [
"hypothesis",
"pytest",
"pytest-cov",
"h5netcdf",
]
doc = [
"furo",
"sphinx-book-theme",
"myst-parser[linkify]",
"myst-nb",
"sphinx-copybutton",
"numpydoc",
"sphinx>=5",
"sphinx-design",
"jupyter-sphinx",
"netcdf4"
"netcdf4",
"sphinx_autosummary_accessors",
]
accel = [
numba = [
"numba",
"xarray_einstats[einops]",
]
8 changes: 6 additions & 2 deletions src/arviz_stats/__init__.py
@@ -1,4 +1,8 @@
# pylint: disable=wildcard-import
"""Statistical computation and diagnostics for ArviZ."""
from .utils import *
from .accessors import *

try:
from arviz_stats.utils import *
from arviz_stats.accessors import *
except ModuleNotFoundError:
pass
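
The `try`/`except` above makes the xarray-dependent imports optional. A different way to express the same intent is to check for the dependency explicitly before importing, as in the following sketch. It is an alternative illustration, not what the commit does; the helper `has_optional_dependency` and the flag `HAS_XARRAY` are hypothetical names.

```python
import importlib.util


def has_optional_dependency(name):
    """Return True if the top-level module `name` can be found."""
    return importlib.util.find_spec(name) is not None


# "xarray" is the extra named in this commit; the flag name is hypothetical
HAS_XARRAY = has_optional_dependency("xarray")

if HAS_XARRAY:
    print("xarray found: accessors could be registered")
else:
    print("xarray missing: only array facing functions would be usable")
```

An explicit check of this kind would only skip the imports when the named dependency itself is absent, whereas catching `ModuleNotFoundError` also hides any other module that fails to be found during the import.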
2 changes: 1 addition & 1 deletion src/arviz_stats/_version.py
@@ -1,2 +1,2 @@
"""ArviZ version."""
__version__ = "0.1.0"
__version__ = "0.2.0"