Skip to content

Commit

Permalink
Effective samples (#285)
Browse files Browse the repository at this point in the history
* changed channel_capacity() to effective_samples() to allow for kish estimate

* bumped beta version number

* updating doc string

* updated tests for effective samples code

* flak8 on tests

* updating effective samples function

* docstyle testing and flake8

* updating entropy calcualtion for effective samples

* overloading gamma and updating tests

* flake8

* pydocstyle

* Update version in README.rst

* Update _version.py

* updating treatment of gamma="kish", changing gamma=1 treatment and bug fixing sample_compression_1d test

* bug fixing in tests and propagating gamma in utils.py

* Update README.rst

* Update _version.py

* gamma --> beta

* propagating up beta parameter for functions that call smaple_sompression_1d and compress_weights

* running flake8

* allow `str(float)` input for `beta`, useful as optional input to `ncompress` in other functions to be piped through to `effective_samples`

* change `utils.compress_weights` to only use `ncompress` kwarg (not `beta`), but if `ncompress` is a string pass on as `ncompress=effective_samples(w, beta=ncompress)`

* update `effective_samples` docstring that `beta` also takes string input

* use channel capacity (aka `'entropy'`) for compression in `compress_weights`

* use `ncompress='entropy'` as default
* map `ncompress=True` to `ncompress='entropy'`

* change `utils.sample_compression_1d` to only use `ncompress` kwarg (not `beta`)

* if `ncompress` is a string pass it on as
  `ncompress=effective_samples(w, beta=ncompress)`
* map `ncompress=True` to `ncompress='entropy'`
* update docstring accordingly

* fix typos in `effective_samples` docstring

* adapt changes to compression in `utils.py` also in `weighted_pandas.py`

* use only `ncompress` (not `beta`) in `WeightedDataFrame.compress` and
  `WeightedSeries.compress`, but allow string input for `ncompress`
  which gets piped into `beta`
* update docstrings accordingly

* remove no longer used `beta` kwarg in `sample_compression_1d` in `kde_plot_1d`

* update docstring for `ncompress` parameter in `plot.py` for new optional string input

* allow string input for `n` in `triangular_sample_compression_2d` to determine `n` through `effective_samples`

* add check to `test_triangular_sample_compression_2d` for string input to `n`

* change default back to `ncompress=True`

* expand `test_effective_samples` to also check for `str(int)` and `str(float)` input

* rewrite string option checks for `beta==1` in `effective_samples` to also work for `'1.00'`

* add tests to `test_effective_samples` for `beta='1.'` and `beta='1.00'`

* changed effective_samples -> neff

* Completed coverage

* reverting change to test_sample_compression_1d

* properly reverting change to test_sample_compression_1d

* clarify error message for `ncompress<0` and make more in line with similar error messages about anesthetic 1.0 behaviour

* unify and clean up `neff` docstring

* further simplify neff docstring by reducing repetition

* change defaults for `ncompress` to `'equal'`

`'equal'` corresponds to a less arbitrary value than 1000 and a smaller
value than for `'entropy'` ensuring speedy explorative plotting.

* adapt `test_weighted_pandas` to new `ncompress` defaults for scatter plots

* version bump

* add test for `ncompress>len(w)` in `test_triangular_sample_compression_2d` to complete coverage

---------

Co-authored-by: Lukas Hergt <[email protected]>
Co-authored-by: Lukas Hergt <[email protected]>
Co-authored-by: Will Handley <[email protected]>
Co-authored-by: AdamOrmondroyd <[email protected]>
  • Loading branch information
5 people authored Jun 14, 2023
1 parent 7a64948 commit 8a93e0b
Show file tree
Hide file tree
Showing 9 changed files with 187 additions and 56 deletions.
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
anesthetic: nested sampling post-processing
===========================================
:Authors: Will Handley and Lukas Hergt
:Version: 2.0.0-beta.33
:Version: 2.0.0-beta.34
:Homepage: https://github.com/handley-lab/anesthetic
:Documentation: http://anesthetic.readthedocs.io/

Expand Down
2 changes: 1 addition & 1 deletion anesthetic/_version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = '2.0.0b33'
__version__ = '2.0.0b34'
42 changes: 33 additions & 9 deletions anesthetic/plot.py
Original file line number Diff line number Diff line change
Expand Up @@ -808,11 +808,16 @@ def kde_plot_1d(ax, data, *args, **kwargs):
weights : np.array, optional
Sample weights.
ncompress : int, default=False
ncompress : int, str, default=False
Degree of compression.
If int: number of samples returned.
If True: compresses to the channel capacity.
If False: no compression.
* If ``False``: no compression.
* If ``True``: compresses to the channel capacity, equivalent to
``ncompress='entropy'``.
* If ``int``: desired number of samples after compression.
* If ``str``: determine number from the Huggins-Roy family of
effective samples in :func:`anesthetic.utils.neff`
with ``beta=ncompress``.
nplot_1d : int, default=100
Number of plotting points to use.
Expand All @@ -837,6 +842,9 @@ def kde_plot_1d(ax, data, *args, **kwargs):
bw_method : str, scalar or callable, optional
Forwarded to :class:`scipy.stats.gaussian_kde`.
beta : int, float, default = 1
The value of beta used to calculate the number of effective samples
Returns
-------
lines : :class:`matplotlib.lines.Line2D`
Expand Down Expand Up @@ -1103,11 +1111,16 @@ def kde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs):
Has to be ordered from outermost to innermost contour.
Default: [0.95, 0.68]
ncompress : int, default=1000
ncompress : int, str, default='equal'
Degree of compression.
If int: number of samples returned.
If True: compresses to the channel capacity.
If False: no compression.
* If ``int``: desired number of samples after compression.
* If ``False``: no compression.
* If ``True``: compresses to the channel capacity, equivalent to
``ncompress='entropy'``.
* If ``str``: determine number from the Huggins-Roy family of
effective samples in :func:`anesthetic.utils.neff`
with ``beta=ncompress``.
nplot_2d : int, default=1000
Number of plotting points to use.
Expand All @@ -1133,7 +1146,7 @@ def kde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs):
data_y = data_y[weights != 0]
weights = weights[weights != 0]

ncompress = kwargs.pop('ncompress', 1000)
ncompress = kwargs.pop('ncompress', 'equal')
nplot = kwargs.pop('nplot_2d', 1000)
bw_method = kwargs.pop('bw_method', None)
label = kwargs.pop('label', None)
Expand Down Expand Up @@ -1294,6 +1307,17 @@ def scatter_plot_2d(ax, data_x, data_y, *args, **kwargs):
data_x, data_y : np.array
x and y coordinates of uniformly weighted samples to plot.
ncompress : int, str, default='equal'
Degree of compression.
* If ``int``: desired number of samples after compression.
* If ``False``: no compression.
* If ``True``: compresses to the channel capacity, equivalent to
``ncompress='entropy'``.
* If ``str``: determine number from the Huggins-Roy family of
effective samples in :func:`anesthetic.utils.neff`
with ``beta=ncompress``.
Returns
-------
lines : :class:`matplotlib.lines.Line2D`
Expand Down
2 changes: 1 addition & 1 deletion anesthetic/plotting/_matplotlib/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ def _get_xticks(self, convert_period: bool = False):

def _compress_weights(kwargs, data):
if isinstance(data, _WeightedObject):
return data.compress(kwargs.pop('ncompress', True))
return data.compress(kwargs.pop('ncompress', 'equal'))
else:
return data

Expand Down
2 changes: 1 addition & 1 deletion anesthetic/samples.py
Original file line number Diff line number Diff line change
Expand Up @@ -1100,7 +1100,7 @@ def live_points(self, logL=None):

def posterior_points(self, beta=1):
"""Get equally weighted posterior points at temperature beta."""
return self.set_beta(beta).compress(-1)
return self.set_beta(beta).compress('equal')

def prior_points(self, params=None):
"""Get equally weighted prior points."""
Expand Down
116 changes: 93 additions & 23 deletions anesthetic/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import pandas
from scipy import special
from scipy.interpolate import interp1d
from scipy.stats import kstwobign
from scipy.stats import kstwobign, entropy
from matplotlib.tri import Triangulation
import contextlib
import inspect
Expand Down Expand Up @@ -32,21 +32,77 @@ def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
return_sign=return_sign)


def channel_capacity(w):
r"""Channel capacity (effective sample size).
def neff(w, beta=1):
r"""Calculate effective number of samples.
.. math::
Using the Huggins-Roy family of effective samples
(https://aakinshin.net/posts/huggins-roy-ess/).
Parameters
----------
beta : int, float, str, default = 1
The value of beta used to calculate the number of effective samples
according to
.. math::
N_{eff} &= \bigg(\sum_{i=0}^n w_i^\beta \bigg)^{\frac{1}{1-\beta}}
w_i &= \frac{w_i}{\sum_j w_j}
Beta can take any positive value. Larger beta corresponds to a greater
compression such that:
.. math::
\beta_1 < \beta_2 \Rightarrow N_{eff}(\beta_1) > N_{eff}(\beta_2)
Alternatively, beta can take one of the following strings as input:
* If 'inf' or 'equal' is supplied (equivalent to beta=inf), then the
resulting number of samples is the number of samples when compressed
to equal weights, and given by:
.. math::
w_i &= \frac{w_i}{\sum_j w_j}
H = \sum_i p_i \ln p_i
N_{eff} &= \frac{1}{\max_i[w_i]}
p_i = \frac{w_i}{\sum_j w_j}
* If 'entropy' is supplied (equivalent to beta=1), then the estimate
is determined via the entropy based calculation, also referred to as
the channel capacity:
.. math::
H &= -\sum_i p_i \ln p_i
p_i &= \frac{w_i}{\sum_j w_j}
N_{eff} &= e^{H}
* If 'kish' is supplied (equivalent to beta=2), then a Kish estimate
is computed (Kish, Leslie (1965). Survey Sampling.
New York: John Wiley & Sons, Inc. ISBN 0-471-10949-5):
.. math::
N_{eff} = \frac{(\sum_i w_i)^2}{\sum_i w_i^2}
* str(float) input gets converted to the corresponding float value.
N = e^{-H}
"""
with np.errstate(divide='ignore', invalid='ignore'):
W = np.array(w)/sum(w)
H = np.nansum(np.log(W)*W)
return np.exp(-H)
w = w / np.sum(w)
if beta == np.inf or beta == 'inf' or beta == 'equal':
return 1 / np.max(w)
elif beta == 'entropy' or beta != 'kish' and str(float(beta)) == '1.0':
return np.exp(entropy(w))
else:
if beta == 'kish':
beta = 2
elif isinstance(beta, str):
beta = float(beta)
return np.sum(w**beta)**(1/(1-beta))


def compress_weights(w, u=None, ncompress=True):
Expand All @@ -57,15 +113,21 @@ def compress_weights(w, u=None, ncompress=True):
if w is None:
w = np.ones_like(u)

if ncompress is True:
ncompress = channel_capacity(w)
if ncompress is True or isinstance(ncompress, str):
if ncompress is True:
ncompress = 'entropy'
ncompress = neff(w, beta=ncompress)
elif ncompress is False:
return w

if ncompress <= 0:
W = w/w.max()
else:
W = w * ncompress / w.sum()
# TODO: remove this in version >= 2.1
if ncompress < 0:
raise ValueError(
"ncompress<0 is anesthetic 1.0 behaviour. For equally weighted "
"samples you should now use ncompress='inf' or ncompress='equal'."
)

W = w * ncompress / w.sum()

fraction, integer = np.modf(W)
extra = (u < fraction).astype(int)
Expand Down Expand Up @@ -328,6 +390,9 @@ def triangular_sample_compression_2d(x, y, cov, w=None, n=1000):
if w is None:
w = pandas.Series(index=x.index, data=np.ones_like(x))

if isinstance(n, str):
n = int(neff(w, beta=n))

# Select samples for triangulation
if (w != 0).sum() < n:
i = x.index
Expand Down Expand Up @@ -366,9 +431,12 @@ def sample_compression_1d(x, w=None, ncompress=True):
ncompress : int, default=True
Degree of compression.
* If int: number of samples returned.
* If True: compresses to the channel capacity.
* If False: no compression.
* If ``int``: number of samples returned.
* If ``True``: compresses to the channel capacity
(same as ``ncompress='entropy'``).
* If ``False``: no compression.
* If ``str``: determine number from the Huggins-Roy family of effective
samples in :func:`neff` with ``beta=ncompress``.
Returns
-------
Expand All @@ -377,16 +445,18 @@ def sample_compression_1d(x, w=None, ncompress=True):
"""
if ncompress is False:
return x, w
elif ncompress is True:
ncompress = channel_capacity(w)
elif ncompress is True or isinstance(ncompress, str):
if ncompress is True:
ncompress = 'entropy'
ncompress = neff(w, beta=ncompress)
x = np.array(x)
if w is None:
w = np.ones_like(x)
w = np.array(w)

# Select inner samples for triangulation
if len(x) > ncompress:
x_ = np.random.choice(x, size=ncompress, replace=False)
x_ = np.random.choice(x, size=int(ncompress), replace=False)
else:
x_ = x.copy()
x_.sort()
Expand Down
34 changes: 24 additions & 10 deletions anesthetic/weighted_pandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
from pandas.util._exceptions import find_stack_level
from pandas.util import hash_pandas_object
from numpy.ma import masked_array
from anesthetic.utils import (compress_weights, channel_capacity, quantile,
from anesthetic.utils import (compress_weights, neff as neff_, quantile,
temporary_seed, adjust_docstrings)
from pandas.core.dtypes.missing import notna

Expand Down Expand Up @@ -225,7 +225,7 @@ def reset_index(self, level=None, drop=False, inplace=False,
def neff(self, axis=0):
"""Effective number of samples."""
if self.isweighted(axis):
return channel_capacity(self.get_weights(axis))
return neff_(self.get_weights(axis))
else:
return self.shape[axis]

Expand Down Expand Up @@ -326,10 +326,17 @@ def compress(self, ncompress=True):
Parameters
----------
ncompress : int, optional
effective number of samples after compression. If not supplied
(or True), then reduce to the channel capacity (theoretical optimum
compression). If <=0, then compress so that all weights are unity.
ncompress : int, str, default=True
Degree of compression.
* If ``True`` (default): reduce to the channel capacity
(theoretical optimum compression), equivalent to
``ncompress='entropy'``.
* If ``> 0``: desired number of samples after compression.
* If ``<= 0``: compress so that all remaining weights are unity.
* If ``str``: determine number from the Huggins-Roy family of
effective samples in :func:`anesthetic.utils.neff`
with ``beta=ncompress``.
"""
i = compress_weights(self.get_weights(), self._rand(), ncompress)
Expand Down Expand Up @@ -559,10 +566,17 @@ def compress(self, ncompress=True, axis=0):
Parameters
----------
ncompress : int, optional
effective number of samples after compression. If not supplied
(or True), then reduce to the channel capacity (theoretical optimum
compression). If <=0, then compress so that all weights are unity.
ncompress : int, str, default=True
Degree of compression.
* If ``True`` (default): reduce to the channel capacity
(theoretical optimum compression), equivalent to
``ncompress='entropy'``.
* If ``> 0``: desired number of samples after compression.
* If ``<= 0``: compress so that all remaining weights are unity.
* If ``str``: determine number from the Huggins-Roy family of
effective samples in :func:`anesthetic.utils.neff`
with ``beta=ncompress``.
"""
if self.isweighted(axis):
Expand Down
Loading

0 comments on commit 8a93e0b

Please sign in to comment.