Add a short data-model section pointing to the file format
hyanwong committed Jan 14, 2025
1 parent 0427239 commit bd5e6a1
Showing 2 changed files with 24 additions and 5 deletions.
25 changes: 21 additions & 4 deletions docs/data-model.md
@@ -387,10 +387,10 @@ The tree sequence itself also has metadata stored as a byte array.
### Valid tree sequence requirements

Arbitrary data can be stored in tables using the classes in the
-{ref}`sec_tables_api`. However, only a {class}`TableCollection`
-that fulfils a set of requirements represents
-a valid {class}`TreeSequence` object which can be obtained
-using the {meth}`TableCollection.tree_sequence` method. In this
+{ref}`sec_tables_api`. The {meth}`TableCollection.tree_sequence` method
+can be used to turn such a {class}`TableCollection` into an immutable
+{class}`TreeSequence` object, but this requires the tables to
+fulfil a specific set of requirements. In this
section we list these requirements, and explain their rationale.
Violations of most of these requirements are detected when the
user attempts to load a tree sequence via {func}`tskit.load` or
@@ -598,6 +598,23 @@ can be used to create an index on a table collection if necessary.
Add more details on what the indexes actually are.
:::


+(sec_data_model_saving)=
+
+### Saving to file
+
+When serializing (e.g. storing a {class}`TreeSequence` to disk using
+{meth}`dump<TreeSequence.dump>`), the underlying tables are stored along with the
+indexes, top-level metadata, attributes such as the sequence length and time units, and
+the {ref}`sec_data_model_reference_sequence` if it exists. {func}`Loading <load>` such a
+file returns an immutable tree sequence object, with pre-calculated indexes immediately
+available. See the {ref}`sec_tree_sequence_file_format` section for more details.

+Although data in a raw {class}`TableCollection` need not conform to the
+{ref}`sec_valid_tree_sequence_requirements`, it too can be
+{meth}`dumped <TableCollection.dump>` to a file (with indexes stored if they exist).


(sec_data_model_data_encoding)=

## Data encoding
4 changes: 3 additions & 1 deletion docs/file-formats.md
@@ -35,7 +35,9 @@ files. We also refer to them as "tree sequence files".

:::{todo}
Link to the documentation for kastore, and describe the arrays that are
-stored as well as the top-level metadata.
+stored as well as the top-level metadata. Note that a structured listing of
+all the data stored in a tree sequence file can be shown using
+e.g. ``python -m kastore ls file.trees``.
:::

