Add initial documentation #4

Merged
merged 5 commits into from
Sep 10, 2024
22 changes: 22 additions & 0 deletions .github/workflows/readthedocs.yml
@@ -0,0 +1,22 @@
# .github/workflows/readthedocs.yml

name: readthedocs/actions
on:
  pull_request_target:
    types:
      - opened
    # Execute this action only on PRs that touch
    # documentation files.
    # paths:
    #   - "doc/**"

permissions:
  pull-requests: write

jobs:
  documentation-links:
    runs-on: ubuntu-latest
    steps:
      - uses: readthedocs/actions/preview@v1
        with:
          project-slug: "xarray-ms"
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
  jobs:
    post_install:
      - pip install poetry==1.8.3
      - poetry config virtualenvs.create false
      - poetry install --with doc

# Build documentation in the "doc/source/" directory with Sphinx
sphinx:
  configuration: doc/source/conf.py
116 changes: 116 additions & 0 deletions README.rst
@@ -0,0 +1,116 @@
xarray-ms
=========

xarray-ms presents a Measurement Set v4 (MSv4) view over
`CASA Measurement Sets <https://casa.nrao.edu/Memos/229.html>`_ (MSv2).
It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications
to be developed on well-understood MSv2 data.

.. code-block:: python

   >>> import xarray_ms
   >>> import xarray
   >>> ds = xarray.open_dataset("/data/L795830_SB001_uv.MS/",
   ...                          chunks={"time": 2000, "baseline": 1000})
   >>> ds
   <xarray.Dataset> Size: 70GB
   Dimensions:                     (time: 28760, baseline: 2775, frequency: 16,
                                    polarization: 4, uvw_label: 3)
   Coordinates:
       antenna1_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
       antenna2_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
       baseline_id                 (baseline) int64 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
     * frequency                   (frequency) float64 128B 1.202e+08 ... 1.204e+08
     * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
     * time                        (time) float64 230kB 1.601e+09 ... 1.601e+09
   Dimensions without coordinates: baseline, uvw_label
   Data variables:
       EFFECTIVE_INTEGRATION_TIME  (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
       FLAG                        (time, baseline, frequency, polarization) uint8 5GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
       TIME_CENTROID               (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
       UVW                         (time, baseline, uvw_label) float64 2GB dask.array<chunksize=(2000, 1000, 3), meta=np.ndarray>
       VISIBILITY                  (time, baseline, frequency, polarization) complex64 41GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
       WEIGHT                      (time, baseline, frequency, polarization) float32 20GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
   Attributes:
       antenna_xds:         <xarray.Dataset> Size: 4kB\nDimensions: (...
       version:             0.0.1
       creation_date:       2024-09-10T14:29:22.587984+00:00
       data_description_id: 0

Measurement Set v4
------------------

NRAO_/SKAO_ are developing a new xarray-based `Measurement Set v4 specification <msv4-spec_>`_.
While there are many changes, some of the major highlights are:

* xarray_ is used to define the specification.
* MSv4 data consists of Datasets of ndarrays on a regular time-channel grid.
  MSv2 data is tabular and, while in many instances the time-channel grid is regular,
  this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.


xarray_ Datasets are self-describing and they are therefore easier to reason about and work with.
Additionally, the regularity of data will make writing MSv4-based software less complex.
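
The difference between the two layouts can be illustrated with a small NumPy
sketch. It is not xarray-ms's implementation: the helper ``rows_to_grid`` and
its inputs (parallel per-row ``time``, ``antenna1``, ``antenna2`` and ``value``
arrays) are hypothetical. Unique timestamps form the time axis, unique antenna
pairs form the baseline axis, and any (time, baseline) cell without a
corresponding row is padded with a fill value (NaN here, an arbitrary choice).

.. code-block:: python

   import numpy as np


   def rows_to_grid(time, antenna1, antenna2, value, fill=np.nan):
       """Scatter tabular (one row per sample) data onto a regular grid.

       Hypothetical helper for illustration only. Assumes at most one
       row per (time, baseline) cell; cells without a row get ``fill``.
       """
       value = np.asarray(value, dtype=np.float64)
       # Sorted unique timestamps define the time axis
       utime, time_idx = np.unique(np.asarray(time), return_inverse=True)
       # Sorted unique (antenna1, antenna2) pairs define the baseline axis
       pairs = np.stack([np.asarray(antenna1), np.asarray(antenna2)], axis=1)
       baselines, bl_idx = np.unique(pairs, axis=0, return_inverse=True)
       # Flatten the inverse indices, whose shape differs across NumPy versions
       bl_idx = bl_idx.reshape(-1)
       grid = np.full((utime.size, baselines.shape[0]), fill, dtype=np.float64)
       grid[time_idx, bl_idx] = value
       return utime, baselines, grid


   utime, baselines, grid = rows_to_grid(
       time=[0.0, 0.0, 1.0],
       antenna1=[0, 0, 0],
       antenna2=[1, 2, 1],
       value=[1.0, 2.0, 3.0],
   )
   # grid is [[1.0, 2.0], [3.0, nan]]: baseline (0, 2) has no row at time 1.0

Three rows covering two timestamps and two baselines yield a 2 x 2 grid with
one padded cell, which is the kind of regularity the MSv4 layout assumes.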

xradio
------

`casangi/xradio <xradio_>`_ provides a reference implementation that converts
CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore_
package.

Why xarray-ms?
--------------

* By developing against an MSv4 xarray view over MSv2 data,
  developers can build applications on well-understood data
  and then seamlessly transition to newer formats.
  Data can also be exported to newer formats (principally zarr_) via xarray's
  native I/O routines.
  In either case, the xarray view of the data looks the same to the software developer.

* xarray-ms builds on xarray's
  `backend API <xarray_backend_>`_.
  Implementing a formal CASA MSv2 backend has a number of automatic benefits:

  * Use of xarray's native I/O routines such as ``open_dataset`` or ``to_zarr``.
  * Use of xarray's `lazy loading mechanism <xarray_lazy_>`_.
  * Automatic access to any `chunked array types <xarray_chunked_arrays_>`_
    supported by xarray including, but not limited to, dask_.
  * Arbitrary chunking along any xarray dimension.

* xarray-ms uses arcae_, a high-performance backend to CASA Tables implementing
  a subset of python-casacore_'s interface.
* Limited support for irregular MSv2 data via padding.
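
As a rough illustration of the backend API mentioned above, the sketch below
defines a toy backend. Everything in it is hypothetical (the class name, the
``.toy`` suffix and the synthetic data); it is not xarray-ms's
``MSv2PartitionEntryPoint``, which reads real Measurement Sets via arcae_.
xarray accepts a ``BackendEntrypoint`` subclass directly as ``engine=``, so
this demo needs no plugin registration; a packaged backend such as xarray-ms
instead registers itself under the ``xarray.backends`` entry-point group.

.. code-block:: python

   import numpy as np
   import xarray as xr
   from xarray.backends import BackendEntrypoint


   class ToyBackendEntrypoint(BackendEntrypoint):
       """Hypothetical toy backend, NOT xarray-ms's MSv2PartitionEntryPoint."""

       description = "Toy backend returning a synthetic (time, baseline) Dataset"
       # Declared explicitly so xarray does not need to inspect the signature
       open_dataset_parameters = ("filename_or_obj", "drop_variables")

       def open_dataset(self, filename_or_obj, *, drop_variables=None):
           # A real backend would read filename_or_obj here; we fabricate data
           ntime, nbaseline = 4, 3
           vis = np.zeros((ntime, nbaseline), dtype=np.complex64)
           ds = xr.Dataset(
               {"VISIBILITY": (("time", "baseline"), vis)},
               coords={"time": np.arange(ntime, dtype=np.float64)},
           )
           if drop_variables:
               ds = ds.drop_vars(drop_variables)
           return ds

       def guess_can_open(self, filename_or_obj):
           # Used for engine auto-detection once registered: "*.toy" only
           return str(filename_or_obj).endswith(".toy")


   # Passing the class as engine= bypasses entry-point registration
   ds = xr.open_dataset("example.toy", engine=ToyBackendEntrypoint)

Once a backend returns a Dataset from ``open_dataset``, the rest of xarray's
machinery applies to it; for instance, with dask installed, passing
``chunks=`` to ``open_dataset`` would return the same variables as dask arrays.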

Work in Progress
----------------

.. warning::

   xarray-ms is currently under active development and does not yet
   have feature parity with xradio_.

.. warning::

   The Measurement Set v4 specification is currently under active development.

Most measures information and many secondary sub-tables are currently missing.
However, the most important parts of the ``MAIN`` table,
as well as the ``ANTENNA``, ``POLARIZATION`` and ``SPECTRAL_WINDOW``
sub-tables, are implemented and should be sufficient to start
developing software that uses xarray-ms.

.. _SKAO: https://www.skao.int/
.. _NRAO: https://public.nrao.edu/
.. _msv4-spec: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/
.. _xradio: https://github.com/casangi/xradio
.. _dask-ms: https://github.com/ratt-ru/dask-ms
.. _arcae: https://github.com/ratt-ru/arcae
.. _dask: https://www.dask.org/
.. _python-casacore: https://github.com/casacore/python-casacore/
.. _xarray: https://github.com/pydata/xarray
.. _xarray_backend: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
.. _xarray_lazy: https://docs.xarray.dev/en/latest/internals/internal-design.html#lazy-indexing-classes
.. _xarray_chunked_arrays: https://docs.xarray.dev/en/latest/internals/chunked-arrays.html
.. _zarr: https://zarr.dev/
20 changes: 20 additions & 0 deletions doc/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions doc/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
8 changes: 8 additions & 0 deletions doc/source/api.rst
@@ -0,0 +1,8 @@
API
===

Opening Measurement Sets
------------------------

.. autoclass:: xarray_ms.backend.msv2.entrypoint.MSv2PartitionEntryPoint
   :members: open_dataset, open_datatree
65 changes: 65 additions & 0 deletions doc/source/conf.py
@@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

# type: ignore

project = "xarray-ms"
copyright = "2024, Simon Perkins"
author = "Simon Perkins"
release = "0.2.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
  "sphinx.ext.autodoc",
  "sphinx.ext.autosummary",
  "sphinx.ext.extlinks",
  "sphinx_copybutton",
  "sphinx.ext.doctest",
  "sphinx.ext.napoleon",
  "sphinx.ext.intersphinx",
]

templates_path = ["_templates"]
exclude_patterns = []

# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = False
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_preprocess_types = False
napoleon_type_aliases = None
napoleon_attr_annotations = True

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "pydata_sphinx_theme"
html_static_path = ["_static"]

extlinks = {
  "issue": ("https://github.com/ratt-ru/xarray-ms/issues/%s", "GH#%s"),
  "pr": ("https://github.com/ratt-ru/xarray-ms/pull/%s", "GH#%s"),
}

# Intersphinx: cross-reference the dask, NumPy, Python and xarray documentation.
intersphinx_mapping = {
  "dask": ("https://dask.pydata.org/en/stable", None),
  "numpy": ("https://numpy.org/doc/stable/", None),
  "python": ("https://docs.python.org/3/", None),
  "xarray": ("https://docs.xarray.dev/en/stable", None),
}
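
The ``extlinks`` setting defines ``:issue:`` and ``:pr:`` roles. The sketch
below is a rough model of how such a role is expanded (a simplification, not
Sphinx's actual code; ``expand_extlink`` is a hypothetical name): the role
target replaces the ``%s`` in the URL template and, when the caption is not
``None``, the ``%s`` in the caption as well. Sphinx 6.0 and later expect each
caption to contain exactly one ``%s``.

.. code-block:: python

   # Mirrors the extlinks mapping in conf.py; each caption contains one "%s"
   EXTLINKS = {
       "issue": ("https://github.com/ratt-ru/xarray-ms/issues/%s", "GH#%s"),
       "pr": ("https://github.com/ratt-ru/xarray-ms/pull/%s", "GH#%s"),
   }


   def expand_extlink(role, target, links=EXTLINKS):
       """Approximate how an extlinks role renders a target as (url, text)."""
       base_url, caption = links[role]
       url = base_url % target
       # With a None caption, the full URL is used as the link text
       text = url if caption is None else caption % target
       return url, text


   print(expand_extlink("pr", "4"))
   # ('https://github.com/ratt-ru/xarray-ms/pull/4', 'GH#4')
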
20 changes: 20 additions & 0 deletions doc/source/index.rst
@@ -0,0 +1,20 @@
.. xarray-ms documentation master file, created by
   sphinx-quickstart on Tue Sep 10 10:36:27 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

xarray-ms documentation
=======================

Add your content using ``reStructuredText`` syntax. See the
`reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html>`_
documentation for details.


.. toctree::
   :maxdepth: 2
   :caption: Contents:

   readme
   install
   api
59 changes: 59 additions & 0 deletions doc/source/install.rst
@@ -0,0 +1,59 @@
Installation
============

.. code-block:: bash

   $ pip install xarray-ms

Development
===========

Firstly, install Python `Poetry <poetry_>`_.

.. _poetry: https://python-poetry.org/

Then, the following commands will install the required dependencies,
as well as the optional testing, documentation and development dependencies,
in a suitable virtual environment:

.. code-block:: bash

   $ cd /code/xarray-ms
   $ poetry env use 3.11
   $ poetry install -E testing --with doc --with dev
   $ poetry run pre-commit install
   $ poetry shell

The pre-commit hooks can be manually executed as follows:

.. code-block:: bash

   $ poetry run pre-commit run -a


Test Suite
----------

Run the following commands within the xarray-ms source code directory to
execute the test suite:

.. code-block:: bash

   $ cd /code/xarray-ms
   $ poetry install -E testing --with dev
   $ poetry run py.test -s -vvv tests/


Documentation
-------------

Run the following commands to install the documentation dependencies
and build the Sphinx documentation from within the ``doc`` sub-directory:

.. code-block:: bash

   $ cd /code/xarray-ms
   $ poetry install --with doc
   $ poetry shell
   $ cd doc
   $ make html
1 change: 1 addition & 0 deletions doc/source/readme.rst
@@ -0,0 +1 @@
.. include:: ../../README.rst
15 changes: 15 additions & 0 deletions pyproject.toml
@@ -21,6 +21,21 @@ testing = ["dask", "distributed", "pytest"]
[tool.poetry.plugins."xarray.backends"]
"xarray-ms:msv2" = "xarray_ms.backend.msv2.entrypoint:MSv2PartitionEntryPoint"

[tool.poetry.group.dev]
optional = true

[tool.poetry.group.dev.dependencies]
pre-commit = "^3.8.0"

[tool.poetry.group.doc]
optional = true

[tool.poetry.group.doc.dependencies]
sphinx = "^8.0.2"
pygments = "^2.18.0"
sphinx-copybutton = "^0.5.2"
pydata-sphinx-theme = "^0.15.4"

[tool.ruff]
line-length = 88
indent-width = 2
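
The ``[tool.poetry.plugins."xarray.backends"]`` table registers the
``xarray-ms:msv2`` engine as a Python entry point, which is how xarray
discovers third-party backends. The standard-library sketch below lists the
backends registered in the current environment; the helper name is
hypothetical, and the ``group=`` keyword of ``entry_points`` assumes
Python 3.10 or later.

.. code-block:: python

   from importlib.metadata import entry_points


   def xarray_backend_entry_points():
       """Return {engine name: "module:object"} for installed xarray backends."""
       # entry_points(group=...) is available from Python 3.10 onwards
       return {ep.name: ep.value for ep in entry_points(group="xarray.backends")}


   if __name__ == "__main__":
       for name, target in sorted(xarray_backend_entry_points().items()):
           print(f"{name} -> {target}")

If xarray-ms is installed from this ``pyproject.toml``, the output should
include ``xarray-ms:msv2 -> xarray_ms.backend.msv2.entrypoint:MSv2PartitionEntryPoint``.
xarray itself also provides ``xarray.backends.list_engines()``, which returns
the engines it has loaded.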