Retain ROWID coordinates during MS conversion #286

Merged: sjperkins merged 2 commits into master from retain-rowid-columns-during-conversion on Sep 14, 2023

Member

@sjperkins sjperkins commented Sep 13, 2023

Removing the ROWID coordinate prevents newer formats from mapping back to CASA Measurement Sets.
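
As a rough illustration of why the ROWID coordinate matters, the sketch below uses plain NumPy rather than dask-ms itself. The table contents, the `group_rows` and `restore` helpers, and the dictionary layout are hypothetical stand-ins for a grouped read, not the library's API. It shows that once rows are split into groups, only the recorded row ids let the data be scattered back into its original row order.

```python
import numpy as np

# Hypothetical stand-in for a Measurement Set main table: 8 rows, each with a
# SCAN_NUMBER used for grouping and a single DATA value.
scan_number = np.array([1, 2, 1, 3, 2, 1, 3, 2])
data = np.arange(8, dtype=np.float64) * 10.0


def group_rows(group_column):
    """Map each unique group value to the original row indices it covers."""
    return {
        int(value): np.flatnonzero(group_column == value)
        for value in np.unique(group_column)
    }


# Each group keeps a ROWID array recording where its rows came from.
groups = {
    value: {"ROWID": rowid, "DATA": data[rowid]}
    for value, rowid in group_rows(scan_number).items()
}


def restore(groups, nrow):
    """Scatter grouped data back into original row order using ROWID."""
    out = np.full(nrow, np.nan)
    for group in groups.values():
        out[group["ROWID"]] = group["DATA"]
    return out


# With ROWID, the original row order is recovered exactly.
restored = restore(groups, len(data))

# Without ROWID, the best available option is concatenating the groups,
# which yields rows in group order rather than table order.
concatenated = np.concatenate([g["DATA"] for g in groups.values()])
```

Here `restored` matches `data` row for row, while `concatenated` holds the same values in group order, so nothing in it says which table row each value belongs to.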

  • Backup and restore apps may fail when moving between formats QuartiCal#287

  • Tests added / passed

    $ py.test -v -s daskms/tests

    If the pep8 tests fail, the quickest way to correct
    this is to run autopep8 and then flake8 and
    pycodestyle to fix the remaining issues.

    $ pip install -U autopep8 flake8 pycodestyle
    $ autopep8 -r -i daskms
    $ flake8 daskms
    $ pycodestyle daskms
    
  • Fully documented, including HISTORY.rst for all changes
    and one of the docs/*-api.rst files for new API

    To build the docs locally:

    pip install -r requirements.readthedocs.txt
    cd docs
    READTHEDOCS=True make html
    

Collaborator

@landmanbester landmanbester left a comment

LGTM

@sjperkins sjperkins merged commit 9351e14 into master Sep 14, 2023
@sjperkins sjperkins deleted the retain-rowid-columns-during-conversion branch September 14, 2023 08:51
@JSKenyon
Collaborator

Would you like to try this out with the backup and restore functionality @landmanbester?

@landmanbester
Collaborator

Yup, will do. Just need to convert some data again

@sjperkins
Member Author

Let me know if a release containing this functionality would be desirable.

@landmanbester
Collaborator

Hmmm, I got the following error with the latest master

(dms) ╭─bester@oates ~/projects/ESO137/msdir
╰─➤  dask-ms convert ms1_primary.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -o ms1_primary.zarr --chunks="{row:50000,chan:256}" --format zarr --force
2023-09-14 15:03:58,242 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,407 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 98332 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,510 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 196664 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,622 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 294996 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,716 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 393328 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,873 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 491660 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:59,015 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 589992 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:59,154 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 688324 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
2023-09-14 15:03:59,495 - dask-ms - INFO - Input: 'MeasurementSet' file:///home/bester/projects/ESO137/msdir/ms1_primary.ms
2023-09-14 15:03:59,495 - dask-ms - INFO - Output: 'zarr' file:///home/bester/projects/ESO137/msdir/ms1_primary.zarr
2023-09-14 15:04:07,907 - dask-ms - WARNING - Ignoring SOURCE
2023-09-14 15:04:07,911 - dask-ms - WARNING - Ignoring 'TARGET': Unable to infer shape of column 'TARGET' due to:
'TableProxy::getCell: no such row'
2023-09-14 15:04:07,912 - dask-ms - WARNING - Ignoring 'DIRECTION': Unable to infer shape of column 'DIRECTION' due to:
'TableProxy::getCell: no such row'
Traceback (most recent call last):
  File "/home/bester/.venv/dms/bin/dask-ms", line 8, in <module>
    sys.exit(main())
  File "/home/bester/software/dask-ms/daskms/apps/entrypoint.py", line 9, in main
    return EntryPoint(sys.argv[1:]).execute()
  File "/home/bester/software/dask-ms/daskms/apps/entrypoint.py", line 33, in execute
    cmd.execute()
  File "/home/bester/software/dask-ms/daskms/apps/convert.py", line 193, in execute
    dask.compute(writes)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/software/dask-ms/daskms/reads.py", line 186, in getter_wrapper
    return future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/bester/software/dask-ms/daskms/reads.py", line 65, in ndarray_getcolslice
    getcolslicenp(
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/casacore/tables/table.py", line 1099, in getcolslicenp
    return self._getcolslicevh(columnname, blc, trc, inc,
RuntimeError: Table DataManager error: TiledStMan: calcCacheSize: invalid arguments

Any idea what's going wrong?

@sjperkins
Member Author

> Any idea what's going wrong?

Not immediately. Is this in a fresh venv? If not and you can reproduce in a fresh VM, can you create a new issue?

@landmanbester
Collaborator

Rerunning in a fresh python3.10 venv now. Will open an issue if the problem persists

3 participants