Replace `TransportChunk` with `re_sorbet::ChunkBatch` #8945

emilk · 2025-02-06T08:33:49Z

What

An Arrow record batch needs to follow a specific schema to be compatible with Rerun,
and that schema is defined in `ChunkSchema`. If a record batch matches the schema, it can be converted to a `ChunkBatch`.

The schema has:

One `RowId` column
`N` index (time) columns
`M` data (component) columns, all `ListArray`s
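As a rough illustration of the rules above, here is a minimal Rust sketch of a schema check. It does not use the real `re_sorbet` or `arrow` APIs: `ColumnRole`, `SchemaError`, and `validate_chunk_schema` are hypothetical names, and each column's role is passed in directly rather than read from Arrow field metadata as `ChunkSchema` presumably does.

```rust
/// Simplified, hypothetical stand-in for the role an Arrow column plays in a chunk.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnRole {
    /// The single `RowId` column.
    RowId,
    /// An index (time) column for the named timeline.
    Index { timeline: String },
    /// A data (component) column; `is_list` says whether it is a `ListArray`.
    Data { component: String, is_list: bool },
}

#[derive(Debug, PartialEq)]
pub enum SchemaError {
    MissingRowId,
    MultipleRowIds(usize),
    NonListDataColumn(String),
}

/// Checks the listed rules: exactly one `RowId` column,
/// any number of index columns, and data columns that are all lists.
pub fn validate_chunk_schema(columns: &[ColumnRole]) -> Result<(), SchemaError> {
    let num_row_ids = columns
        .iter()
        .filter(|c| matches!(c, ColumnRole::RowId))
        .count();
    match num_row_ids {
        0 => return Err(SchemaError::MissingRowId),
        1 => {}
        n => return Err(SchemaError::MultipleRowIds(n)),
    }
    for column in columns {
        // Any data column that is not list-typed violates the schema.
        if let ColumnRole::Data { component, is_list: false } = column {
            return Err(SchemaError::NonListDataColumn(component.clone()));
        }
    }
    Ok(())
}
```

This checks the final schema, i.e. after any wrapping of data columns into lists has happened.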

`ChunkBatch::try_from(RecordBatch)` will automatically wrap data columns in a `ListArray` if needed.

impl AsRef<ArrowRecordBatch> for ChunkBatch { … }
impl From<ChunkBatch> for ArrowRecordBatch { … }
impl TryFrom<ArrowRecordBatch> for ChunkBatch { … }
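The three impls above follow a common Rust newtype pattern: a validated wrapper that can only be created through a fallible `TryFrom`, and that can be borrowed as (`AsRef`) or unwrapped into (`From`) the underlying batch. The sketch below shows that pattern with simplified, hypothetical stand-ins (`RecordBatch`, `DataColumn`) instead of the real Arrow types, and mimics the automatic wrapping of flat data columns into one-element lists per row. It is an illustration under those assumptions, not the PR's implementation.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a data column: flat values, or one list per row.
#[derive(Debug, Clone, PartialEq)]
pub enum DataColumn {
    Flat(Vec<f32>),
    List(Vec<Vec<f32>>),
}

impl DataColumn {
    fn num_rows(&self) -> usize {
        match self {
            DataColumn::Flat(values) => values.len(),
            DataColumn::List(lists) => lists.len(),
        }
    }
}

/// Hypothetical stand-in for `ArrowRecordBatch`.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub row_ids: Vec<u64>,
    pub indexes: Vec<(String, Vec<i64>)>,
    pub data: Vec<(String, DataColumn)>,
}

/// A record batch that is known to satisfy the chunk schema.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkBatch(RecordBatch);

impl TryFrom<RecordBatch> for ChunkBatch {
    type Error = String;

    fn try_from(mut batch: RecordBatch) -> Result<Self, Self::Error> {
        let num_rows = batch.row_ids.len();
        // Every index and data column must have exactly one entry per row.
        for (name, values) in &batch.indexes {
            if values.len() != num_rows {
                return Err(format!("index column {name:?}: expected {num_rows} rows"));
            }
        }
        for (name, column) in &batch.data {
            if column.num_rows() != num_rows {
                return Err(format!("data column {name:?}: expected {num_rows} rows"));
            }
        }
        // Wrap flat data columns so that each row holds a one-element list.
        for (_, column) in &mut batch.data {
            if let DataColumn::Flat(values) = column {
                let lists = std::mem::take(values).into_iter().map(|v| vec![v]).collect();
                *column = DataColumn::List(lists);
            }
        }
        Ok(ChunkBatch(batch))
    }
}

impl AsRef<RecordBatch> for ChunkBatch {
    fn as_ref(&self) -> &RecordBatch {
        &self.0
    }
}

impl From<ChunkBatch> for RecordBatch {
    fn from(chunk: ChunkBatch) -> Self {
        chunk.0
    }
}
```

Since the tuple field is private, code outside the defining module can only obtain a `ChunkBatch` through `try_from`, so any `ChunkBatch` it holds is known to satisfy the schema; `AsRef` and `From` keep it cheap to hand back to code that expects a plain record batch.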

TODO

Fix all TODOs
Fix Rust tests
Fix Python roundtrip tests
Run full test suite

For future PRs

Move the magic strings into named constants
Introduce `DataframeBatch`
Rename `ComponentColumnDescriptor` to `DataColumnSchema`, etc.
`ComponentColumnDescriptor` is used for both dataframes and chunks. Resolve this somehow

github-actions · 2025-02-06T08:34:18Z

Web viewer built successfully. If applicable, you should also test it:

I have tested the web viewer

| Result | Commit | Link | Manifest |
|---|---|---|---|
| ✅ | `8282fc5` | https://rerun.io/viewer/pr/8945 | `+nightly` `+main` |

Note: This comment is updated whenever you push a commit.

emilk · 2025-02-11T08:48:51Z

@rerun-bot full-check

github-actions · 2025-02-11T08:49:25Z

Started a full build: https://github.com/rerun-io/rerun/actions/runs/13259039896

emilk · 2025-02-11T08:50:55Z

@rerun-bot full-check

github-actions · 2025-02-11T08:51:26Z

Started a full build: https://github.com/rerun-io/rerun/actions/runs/13259070067 ✅

emilk · 2025-02-11T09:05:02Z

rerun_py/rerun_sdk/rerun/dataframe.py

@@ -61,9 +61,9 @@ def __init__(self, metadata: dict[bytes, bytes], col: pa.Array):
    def component_descriptor(self) -> ComponentDescriptor:
        kwargs = {}
        if SORBET_ARCHETYPE_NAME in self.metadata:
-            kwargs["archetype_name"] = "rerun.archetypes" + self.metadata[SORBET_ARCHETYPE_NAME].decode("utf-8")


Note: this used to be buggy (missing period after `rerun.archetypes`)
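The bug noted above came from building a qualified name by plain string concatenation, which dropped the separating period. A minimal Rust sketch of a safer approach (the `ARCHETYPE_PREFIX` constant and both helper functions are hypothetical, not the PR's code or the Python SDK's API): add the prefix, period included, only when it is missing, so qualifying a name is idempotent and cannot produce the doubly-prefixed names that later commits in this PR track down.

```rust
/// Hypothetical prefix for fully-qualified archetype names; note the trailing period.
const ARCHETYPE_PREFIX: &str = "rerun.archetypes.";

/// Returns the fully-qualified name, adding the prefix only if it is not already there.
pub fn full_archetype_name(name: &str) -> String {
    if name.starts_with(ARCHETYPE_PREFIX) {
        name.to_owned()
    } else {
        format!("{ARCHETYPE_PREFIX}{name}")
    }
}

/// Returns the short name, stripping the prefix if present.
pub fn short_archetype_name(full: &str) -> &str {
    full.strip_prefix(ARCHETYPE_PREFIX).unwrap_or(full)
}
```

Idempotence matters when a name may pass through the full/short conversion more than once, e.g. on a roundtrip through column metadata.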

zehiko · 2025-02-11T09:52:30Z

crates/store/re_grpc_client/src/redap/mod.rs

@@ -333,84 +327,32 @@ async fn stream_catalog_async(

    re_log::info!("Starting to read...");
    while let Some(result) = resp.next().await {
-        let input = TransportChunk::from(result.map_err(TonicStatusError)?);
-
-        // Catalog received from the ReDap server isn't suitable for direct conversion to a Rerun Chunk:


nice to see this gone!

zehiko

nice work! this was quite easy to follow and makes sense to me.


teh-cmc · 2025-02-11T11:48:29Z

crates/store/re_sorbet/src/column_schema.rs

@@ -35,6 +35,16 @@ pub enum ColumnDescriptor {
 }

 impl ColumnDescriptor {
+    /// Debug-only sanity check.


There's nothing in this method that guarantees debug-only behavior though? (same elsewhere)

No guarantees other than the docstring contract
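Regarding the question above: a docstring alone cannot make a check debug-only, but `debug_assert!` can, since it only runs when debug assertions are enabled. A minimal sketch, assuming a simplified, hypothetical `ComponentColumnDescriptor` with only a `component_name` field, and checking for the doubly-prefixed `rerun.components.` names mentioned in the commit log. `#[track_caller]` (which the PR also adds to its sanity checks) makes a failure report the caller's location rather than the check's.

```rust
/// Hypothetical, simplified descriptor with a single field.
pub struct ComponentColumnDescriptor {
    pub component_name: String,
}

impl ComponentColumnDescriptor {
    /// Debug-only sanity check: `debug_assert!` does nothing when debug assertions
    /// are disabled (the default for release builds), so "debug-only" is enforced
    /// by the build configuration rather than by the docstring alone.
    #[track_caller]
    pub fn sanity_check(&self) {
        debug_assert!(
            !self.component_name.starts_with("rerun.components.rerun.components."),
            "doubly-prefixed component name: {:?}",
            self.component_name
        );
    }
}
```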


emilk · 2025-02-11T12:11:40Z

@teh-cmc as described in #8744 there are two types of record batches we care about:

chunks (single-entity)
dataframes (multi-entity)

This PR implements the first, with the second one coming as a second PR

teh-cmc · 2025-02-11T12:21:28Z

> @teh-cmc as described in #8744 there are two types of record batches we care about:
> * chunks (single-entity)
> * dataframes (multi-entity)
>
> This PR implements the first, with the second one coming as a second PR

I guess most of my confusion comes from the mention of sorbet; all of this very much looks like plain-old Rerun (although I guess plain-old Rerun can do multi-entity records now, which is the new thing).

(That ticket even says: "We will stop using sorbet name until we have cycles to make this a more universal spec.")

emilk · 2025-02-11T13:12:25Z

I see "sorbet" as the spec for how we encode Rerun data on the wire. As such, it covers both chunks and dataframes.

We don't use sorbet as a name in any of the meta tags (yet).


emilk added the `🏹 arrow` and `exclude from changelog` labels Feb 6, 2025

emilk force-pushed the emilk/chunk-schema branch 2 times, most recently from 7e171a3 to 3e5ad9c February 9, 2025 09:44

emilk added 20 commits February 10, 2025 15:56

Add RowIdColumnDescriptor

a61d997

Add struct ChunkSchema

65a90df

Add converted from arrow schema

a5b597f

Add re_sorbet::ChunkBatch

b095d59

Convert Chunk to ChunkBatch

e90fdbe

Roundtrip it

42d3a20

Less use of TransportChunk

40fd804

Remove unwrap; fix compilation of tests

8a4b5d6

Remove one use of TransportChunk

538b9b3

Use re_sobet in rerun cli filtering

851c7fe

No TransportChunk in re_grpc_client

c42d842

Remove Chunk::from_transport

b4ed1e0

More robust parsing of ComponentColumnDescriptor

7fc7293

Fix roundtripping

d0b92c9

Use "true" insteado of "yes", to make it more JSON-like

ab2ad6f

Use similar_asserts

f79bdc2

Less TransportChunk

52be0a4

Remove some use of TransportChunk

3519a74

Remove TransportChunk a concrete type

3a51b62

Remove the last of TransportChunk

f01efda

emilk force-pushed the emilk/chunk-schema branch from 3e5ad9c to f01efda February 10, 2025 14:56

emilk changed the title from `Add re_sorbet::ChunkSchema` to `Replace TransportChunk with re_sorbet::ChunkBatch` Feb 10, 2025

emilk added 3 commits February 10, 2025 16:14

Cleanup

8326fae

Use full component and archetype names in metadata for roundtripping

5d0edb2

Fix TODOs

2058129

emilk added 3 commits February 11, 2025 09:47

Add sanity checks to find source of doubly-prefixed rerun.components.

3d9147c

Fix full/short name errors

0842f84

Add #[track_caller] to sanity checks

861e496

py-fmt

b094db7

emilk commented Feb 11, 2025


emilk added 2 commits February 11, 2025 10:06

Fix bad sanity checks

f9df88e

Explain some TODOs

39ecc47

zehiko reviewed Feb 11, 2025


zehiko approved these changes Feb 11, 2025
