Phased Operators


Scan Operations

class padocc.phases.scan.ScanOperation(proj_code: str, workdir: str, groupID: str = None, label: str = 'scan', **kwargs)

Bases: ProjectOperation

help(fn=<built-in function print>)

Public user functions for the project operator.


Compute Operations

class padocc.phases.compute.CfaDS(proj_code: str, workdir: str, groupID: str = None, stage: str = 'in_progress', thorough: bool = None, concat_msg: str = 'See individual files for more details', limiter: int = None, skip_concat: bool = False, label: str = 'compute', is_trial: bool = False, **kwargs)

Bases: ComputeOperation

class padocc.phases.compute.ComputeOperation(proj_code: str, workdir: str, groupID: str = None, stage: str = 'in_progress', thorough: bool = None, concat_msg: str = 'See individual files for more details', limiter: int = None, skip_concat: bool = False, label: str = 'compute', is_trial: bool = False, **kwargs)

Bases: ProjectOperation


PADOCC Dataset Processor Class, capable of processing a single dataset’s worth of input files into a single aggregated file/store.

property filelist

Quick function for obtaining a subset of the whole fileset. Originally used to open all the files using Xarray for concatenation later.

help(fn=<built-in function print>)

Public user functions for the project operator.

class padocc.phases.compute.KerchunkConverter(logger=None, bypass_driver=False, verbose=1, label=None, fh=None, logid=None)

Bases: LoggedOperation


Class for converting a single file to a Kerchunk reference object. Handles known or unknown file types (NetCDF3/4 versions).

run(nfile: str, filehandler=None, extension=None, **kwargs) → dict

Safe creation allows for known issues and tries multiple drivers

Returns:

A dictionary of Kerchunk references if successful; raises an error otherwise.
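For context, the dictionary returned by a successful run() follows Kerchunk's version-1 reference format. The sketch below uses placeholder paths, offsets and variable names rather than real output:

```python
# A minimal sketch of a version-1 Kerchunk reference set. The file path,
# byte offsets and variable name here are illustrative placeholders.
refs = {
    "version": 1,
    "refs": {
        # Inlined metadata is stored as JSON strings...
        ".zgroup": '{"zarr_format": 2}',
        # ...while binary chunks are [url, offset, length] triples
        # pointing back into the original NetCDF file.
        "temp/0.0.0": ["/path/to/input.nc", 20480, 259200],
    },
}

# Each chunk key maps either to inline JSON or to a byte-range triple.
chunk = refs["refs"]["temp/0.0.0"]
```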

class padocc.phases.compute.KerchunkDS(proj_code, workdir, stage='in_progress', **kwargs)

Bases: ComputeOperation

create_refs(check_dimensions: bool = False) → None

Organise creation and loading of refs:
- Load existing cached refs
- Create new refs
- Combine metadata and global attributes into a single set
- Coordinate combining and saving of data
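The cached-plus-new merge step above can be illustrated with a pure-Python sketch. The combine_refs helper is hypothetical (KerchunkDS performs this internally); it shows only how two version-1 reference sets might be folded together, with freshly created refs winning on key clashes:

```python
def combine_refs(cached: dict, new: dict) -> dict:
    """Merge cached and freshly created reference sets (new wins on clashes)."""
    combined = dict(cached.get("refs", {}))
    combined.update(new.get("refs", {}))
    return {"version": 1, "refs": combined}

cached = {"version": 1, "refs": {"temp/0.0": ["file0.nc", 0, 1024]}}
fresh  = {"version": 1, "refs": {"temp/1.0": ["file1.nc", 0, 1024]}}
merged = combine_refs(cached, fresh)
# merged["refs"] now contains both chunk keys
```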

class padocc.phases.compute.ZarrDS(proj_code, workdir, stage='in_progress', mem_allowed: str = '100MB', preferences=None, **kwargs)

Bases: ComputeOperation

padocc.phases.compute.cfa_handler(instance, file_limit: int | None = None)

Handle the creation of a CFA-netCDF file using the CFAPyX package

Parameters:

  • instance – (obj) The reference instance of ProjectOperation from which to pull project-specific info.

  • file_limit – (int) The file limit to apply to a set of files.

Validate Operations

class padocc.phases.validate.Report(fh=None)

Bases: object


Special report class, capable of utilising recursive dictionary value-setting.
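A minimal sketch of what recursive dictionary value-setting looks like. The set_nested helper and the report keys are illustrative, not the Report API:

```python
def set_nested(report: dict, keys: list, value) -> None:
    """Recursively descend into the report, creating levels as needed."""
    head, *rest = keys
    if not rest:
        report[head] = value
    else:
        set_nested(report.setdefault(head, {}), rest, value)

report = {}
set_nested(report, ["data", "temp", "max_diff"], 0.0)
# report == {"data": {"temp": {"max_diff": 0.0}}}
```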

class padocc.phases.validate.ValidateDatasets(datasets: list, id: str, filehandlers: list[padocc.core.filehandlers.JSONFileHandler] | list[dict] | None = None, dataset_labels: list = None, preslice_fns: list = None, logger=None, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)

Bases: LoggedOperation


ValidateDatasets object for performing validations between two pseudo-identical Xarray Dataset objects.


4th Dec Note:
Validate metadata using single NetCDF (Xarray) vs Kerchunk.
Validate data using combined NetCDF or CFA vs Kerchunk (for best performance).

control_dataset_var(var)

Get a variable DataArray from the control dataset, performing preslice functions.

replace_dataset(new_ds: Dataset, label: str = None, index: int = None, dstype: str = None) → None

Replace dataset by type, label or index.

replace_preslice(new_preslice: Dataset, label: str = None, index: int = None, dstype: str = None) → None

Replace preslice function by type, label or index.

test_dataset_var(var)

Get a variable DataArray from the test dataset, performing preslice functions.

validate_data()

Perform data validations using the growbox method for all variable DataArrays.

validate_global_attrs(allowances: dict = None)

Validate the set of global attributes across all datasets
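As a rough sketch of how an allowances-based attribute check might work. The comparison logic is an assumption, and allowances is modelled here as a simple collection of exempt keys for brevity (the real parameter is a dict):

```python
def compare_global_attrs(control: dict, test: dict, allowances=frozenset()):
    """Return attribute mismatches, skipping any key listed in allowances."""
    mismatches = {}
    for key in set(control) | set(test):
        if key in allowances:
            continue  # this attribute is allowed to differ
        if control.get(key) != test.get(key):
            mismatches[key] = (control.get(key), test.get(key))
    return mismatches

control = {"title": "ERA5", "history": "created 2020"}
test    = {"title": "ERA5", "history": "created 2024"}
# 'history' legitimately differs between formats, so allow it:
result = compare_global_attrs(control, test, allowances={"history"})
# result == {}
```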

validate_metadata(allowances: dict = None) → dict

Run all validation steps on this set of datasets.

class padocc.phases.validate.ValidateOperation(*args, **kwargs)

Bases: ProjectOperation


Encapsulates all validation testing in a single class. Instantiated for a specific project, the object contains all project info (from detail-cfg), opened only once, as well as a copy of the total datasets (from native and cloud sources). Subselections can be passed between class methods along with a variable index (class variables: variable list, dimension list, etc.).


A class logger attribute means the logger does not need to be passed between functions. The bypass switch is contained here along with all other switches.

padocc.phases.validate.check_for_nan(box, bypass, logger, label=None)

Special function for assessing if a box selection has non-NaN values within it. Needs further testing using different data types.
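A simplified pure-Python sketch of the NaN check, operating on nested lists rather than real array selections and omitting the bypass/logger arguments:

```python
import math

def box_has_real_values(box) -> bool:
    """Return True if any value in the box selection is not NaN."""
    for v in box:
        if isinstance(v, list):          # descend into nested dimensions
            if box_has_real_values(v):
                return True
        elif not (isinstance(v, float) and math.isnan(v)):
            return True
    return False

box_has_real_values([[float("nan"), float("nan")], [float("nan"), 2.5]])  # True
```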

padocc.phases.validate.mem_to_value(mem) → float

Convert a memory string (e.g. '2G') into a numeric value.

Returns:

Numeric value of e.g. ‘2G’ in bytes.

padocc.phases.validate.slice_all_dims(data_arr: DataArray, intval: int)

Slice all dimensions for the DataArray according to the integer value.
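One plausible implementation strategy, assuming an xarray-style .isel() selection; the build_slicers helper is illustrative, not the actual implementation:

```python
def build_slicers(dims, intval: int) -> dict:
    """Per-dimension slicers trimming every dimension to `intval` steps,
    suitable for passing to an xarray-style .isel(**slicers) call."""
    return {dim: slice(0, intval) for dim in dims}

slicers = build_slicers(("time", "lat", "lon"), 2)
# e.g. data_arr.isel(**slicers) would then select a 2x2x2 corner of the array
```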

padocc.phases.validate.value_to_mem(value) → str

Convert a number of bytes (e.g. 1000000000) into a memory string.

Returns:

String value of the above (e.g. 1000000000 -> ‘1G’).
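The two helpers above are inverses of one another. A minimal sketch, assuming decimal (1000-based) units; the actual padocc implementation may differ:

```python
def mem_to_value(mem: str) -> float:
    """Convert a memory string like '2G' into a number of bytes (decimal units)."""
    units = {"B": 1, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}
    return float(mem[:-1]) * units[mem[-1].upper()]

def value_to_mem(value: float) -> str:
    """Convert a byte count back into the largest whole-unit string."""
    for suffix, size in (("T", 1e12), ("G", 1e9), ("M", 1e6), ("K", 1e3)):
        if value >= size:
            return f"{value / size:.0f}{suffix}"
    return f"{value:.0f}B"

mem_to_value("2G")           # 2000000000.0
value_to_mem(1_000_000_000)  # '1G'
```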


Ingest Operations


Note


Not featured in this development, the ingest operations are still being planned and scoped.


Add the download link to each of the Kerchunk references

padocc.phases.ingest.ingest_config(args, logger)

Configure for ingestion of a set of project codes, currently defined by a repeat_id, though this could later be changed to apply to all project codes fitting some parameters.
