Releases: estuary/flow
v0.5.8
What's Changed
- flowctl: use new view_logs RPC with logged_at bound #1739
- flowctl raw bearer-logs: add --since parameter with 1 hour default #1752
- flowctl: add raw spec support for materializations #1798
- protocols/flow: add array inference to protocol #1787
Full Changelog: v0.5.7...v0.5.8
v0.5.7
v0.5.6
v0.5.4
What's Changed
- flowctl supports federated data-planes, and fetches dynamic authorizations when inspecting logs, stats, or collections.
- flowctl now supports single-use refresh tokens, which rotate their secret on each use.
Full Changelog: v0.5.3...v0.5.4
v0.5.3
What's Changed
v0.5.2
What's Changed
Couple of fixes for flowctl
- fix: flowctl catalog commands do need pagination after all by @jshearer in #1626
- .devcontainer/release.Dockerfile: add additional podman packages & newline fix by @jgraettinger in #1625
Full Changelog: v0.5.1...v0.5.2
v0.5.1
v0.5.0
v0.4.0
This release introduces a number of big changes in different areas, including:
- Schema evolution
- Inferred schema handling
- Flowctl
- Re-using old spec names
- General control-plane operation
Schema evolution
Schema evolution in streaming systems is hard. When we first released Flow, the approach to schema evolution was one of its "killer features" because we were able to validate type-compatibility of heterogeneous pipelines (e.g. Postgres->BigQuery) end-to-end. But detecting incompatible schema changes is one thing, and deciding what to do about them is another. Real-life data pipelines have many complex requirements, which clearly can't be handled by the one evolveIncompatibleCollections boolean we had on capture specs. People wanted more control over how our automation responds to incompatible schema changes.
So we're introducing a new onIncompatibleSchemaChange field on materialization specs, which allows you to configure how the system responds when incompatible schema changes are detected. You can specify onIncompatibleSchemaChange at the top level of a materialization spec, and/or as part of each binding. The top-level property serves as a default for any binding that does not set its own onIncompatibleSchemaChange. It has four possible values:
- backfill (default if unspecified): increment the backfill counter of affected bindings, which re-creates the destination resources to fit the new schema and backfills them.
- disableBinding: disable the affected bindings. A human will need to re-enable them and decide how to resolve the incompatible fields.
- disableTask: disable the entire materialization. A human will need to re-enable it and decide how to resolve the incompatible fields.
- abort: don't take any automated action. A human will need to decide what to do.
These behaviors apply only when an automated action observes an incompatible schema change. If you're making changes manually via the UI, onIncompatibleSchemaChange is ignored.
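The precedence described above (a binding-level setting wins, then the top-level setting, then the backfill default) can be sketched in a few lines of Python. This is an illustration only, not Flow's implementation: the `effective_action` helper and the simplified spec shape (a `bindings` list whose entries carry a `source` name) are assumptions made for the example.

```python
# Minimal sketch of onIncompatibleSchemaChange precedence.
# effective_action() and the simplified spec layout are hypothetical.

# The four values named in these release notes.
VALID_ACTIONS = ("backfill", "disableBinding", "disableTask", "abort")
DEFAULT_ACTION = "backfill"


def effective_action(materialization: dict, binding: dict) -> str:
    """Return the onIncompatibleSchemaChange value that applies to one binding."""
    action = (
        binding.get("onIncompatibleSchemaChange")            # binding-level override
        or materialization.get("onIncompatibleSchemaChange")  # top-level default
        or DEFAULT_ACTION                                     # unspecified everywhere
    )
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown onIncompatibleSchemaChange value: {action!r}")
    return action


spec = {
    "onIncompatibleSchemaChange": "disableBinding",
    "bindings": [
        {"source": "acmeCo/orders"},
        {"source": "acmeCo/customers", "onIncompatibleSchemaChange": "abort"},
    ],
}

actions = [effective_action(spec, b) for b in spec["bindings"]]
print(actions)  # ['disableBinding', 'abort']
```

The first binding inherits the top-level value, while the second keeps its own setting.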
Note: You won't see onIncompatibleSchemaChange in the main UI yet, but it can now be set using flowctl or the "Advanced specification editor".
Note: With the introduction of onIncompatibleSchemaChange, the behavior of the existing evolveIncompatibleCollections field of captures no longer makes much sense. For the very short term, that behavior will remain unchanged. But soon we will seek to greatly simplify it. Today, that one boolean, on the capture spec, controls how the system responds to incompatible schema changes in any of the captured collections. In the future, evolveIncompatibleCollections will only pertain to collections that need to be re-created entirely. In other words, its meaning will be "re-create collections as necessary in order to publish them". In practice, this would only ever be required if you change either the key of the collection or the logical partitioning configuration.
Inferred schema handling
As a user, it's hard to get direct visibility into what the inferred schema of a collection is at any given moment. That's all changing, because now we're moving to an approach where the inferred schema gets added directly to your collection specs. The inferred schema gets added under $defs with a key of flow://inferred-schema, so it's still possible to customize other parts of the read schema, just as you would have before. The difference is that you can now see the inferred schema that's being used for each collection.
But that's not the only difference, because you can now use inferred schemas with derivations, too! To do so, just include "$ref": "flow://inferred-schema" as part of the collection's readSchema, just like any other collection. Our automation will periodically update the collection spec to inline the actual inferred schema as it notices it changing.
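To make the inlining step concrete, here is a rough Python sketch of a read schema that references flow://inferred-schema and of a helper that stores an inferred schema under $defs with that key. The `inline_inferred_schema` helper, the `allOf` wrapper, and the sample properties are assumptions for illustration; the exact layout that Flow's automation writes may differ.

```python
import copy

# Key named in these release notes.
INFERRED_SCHEMA_KEY = "flow://inferred-schema"


def inline_inferred_schema(read_schema: dict, inferred: dict) -> dict:
    """Return a copy of read_schema with the inferred schema stored under $defs.

    Hypothetical helper: it only illustrates the idea of inlining.
    """
    updated = copy.deepcopy(read_schema)
    updated.setdefault("$defs", {})[INFERRED_SCHEMA_KEY] = copy.deepcopy(inferred)
    return updated


# A read schema that references the inferred schema and adds a customization.
read_schema = {
    "allOf": [
        {"$ref": "flow://inferred-schema"},
        {"properties": {"id": {"type": "string"}}},  # hypothetical customization
    ]
}

# A made-up inferred schema, standing in for what inference might produce.
inferred = {"type": "object", "properties": {"total": {"type": "number"}}}

updated = inline_inferred_schema(read_schema, inferred)
print(sorted(updated["$defs"]))  # ['flow://inferred-schema']
```

The reference and any customizations are left untouched, and only the $defs entry is replaced on each update.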
Lastly, we're introducing a more aggressive heuristic for inferred schema updates. Collections that have more frequent inferred schema updates will be checked much more frequently, and inferred schemas that have gone a while without any updates will be checked somewhat less frequently, up to a maximum interval of every 2 hours.
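One way to picture such a heuristic is an interval that shrinks sharply when a schema change is observed and grows slowly otherwise, capped at 2 hours. The sketch below is not Flow's actual algorithm: only the 2-hour maximum comes from these notes, while the minimum interval, the shrink and growth factors, and the `next_check_interval` name are assumptions.

```python
from datetime import timedelta

MAX_INTERVAL = timedelta(hours=2)    # maximum stated in the release notes
MIN_INTERVAL = timedelta(minutes=1)  # assumption, not from the release notes


def next_check_interval(current: timedelta, schema_changed: bool) -> timedelta:
    """Pick the delay before the next inferred-schema check (illustrative only)."""
    if schema_changed:
        # Recent change: check much more frequently (factor is an assumption).
        candidate = current / 4
    else:
        # No change: back off somewhat (factor is an assumption).
        candidate = current * 1.5
    # Clamp to the allowed range.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, candidate))


print(next_check_interval(timedelta(hours=1), schema_changed=True))     # 0:15:00
print(next_check_interval(timedelta(hours=1.5), schema_changed=False))  # 2:00:00
```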
Flowctl changes
All flowctl users will need to upgrade to the latest release in order to maintain compatibility.
In addition, there's some new behavior in flowctl to help prevent accidentally overwriting changes to specs. Flowctl will now set the expectPubId property whenever you run catalog pull-specs. This property contains the id of the publication that most recently modified the spec. When publishing, we return an error if a spec has been published since the expectPubId. If this happens, you'll need to run catalog pull-specs again in order to get the freshest copy of the spec and try your changes again. This is especially important now that we in-line inferred schemas as part of collection specs, as it prevents users from accidentally publishing an outdated inferred schema.
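The expectPubId check is a form of optimistic concurrency control, which can be sketched as follows. This is a simplified model rather than the control plane's code: the `check_expect_pub_id` function, the `PublicationConflict` exception, and the `pub-0001` style ids are all hypothetical.

```python
from typing import Dict, Optional


class PublicationConflict(Exception):
    """Raised when a spec was published after the local copy was pulled."""


def check_expect_pub_id(
    name: str, expect_pub_id: Optional[str], live_pub_ids: Dict[str, str]
) -> None:
    """Reject a publish if the live spec was modified since expect_pub_id."""
    if expect_pub_id is None:
        return  # no expectation recorded, so there is nothing to compare
    live = live_pub_ids.get(name)
    if live != expect_pub_id:
        raise PublicationConflict(
            f"{name}: expected publication {expect_pub_id}, found {live}; "
            "run catalog pull-specs again and retry"
        )


# Hypothetical state: the live spec was last modified by publication pub-0002.
live_pub_ids = {"acmeCo/my-collection": "pub-0002"}

# A fresh local copy passes the check.
check_expect_pub_id("acmeCo/my-collection", "pub-0002", live_pub_ids)

# A stale local copy is rejected.
try:
    check_expect_pub_id("acmeCo/my-collection", "pub-0001", live_pub_ids)
    conflict = None
except PublicationConflict as err:
    conflict = str(err)
print(conflict)
```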
Re-using old spec names
Previously, our control plane would prevent you from re-using a name that you'd used before, even after deleting the original specs. This was because we used the spec names as the storage prefix in cloud storage buckets, so we couldn't be sure that a new collection would be starting out with an empty storage prefix if it had the same name as a previously deleted one. Now, we add a unique alphanumeric path segment to the cloud storage path for each journal, like acmeCo/my-collection/112233445566abcd/. If you delete acmeCo/my-collection, you can now create another collection with the same name, and it will have a different alphanumeric suffix. The previous naming restriction was a common source of annoyance, so we're glad to finally get this working in a way that's much more in line with user expectations.
Note that cloud storage paths for existing collections and task recovery logs will remain unchanged. The suffix will only be added for new specifications.
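As a rough illustration of why a per-spec suffix makes names safe to re-use, the Python sketch below builds a storage prefix from a collection name plus a random 16-character hex segment. How Flow actually generates the segment is not described here; the use of random hex and the `journal_storage_prefix` name are assumptions.

```python
import secrets


def journal_storage_prefix(collection_name: str) -> str:
    """Build a storage prefix with a unique segment (illustrative only)."""
    suffix = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{collection_name}/{suffix}/"


# Creating, deleting, and re-creating a collection yields distinct prefixes,
# so the new collection starts from an empty storage location.
first = journal_storage_prefix("acmeCo/my-collection")
second = journal_storage_prefix("acmeCo/my-collection")
print(first)
print(second)
```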
General control-plane operation
These changes are grouped together because they were all enabled by the same fundamental changes to the code that handles publications and background automations.
We've made publications faster and more reliable by minimizing the tasks that get re-validated as part of a given publication. For example, if you publish a materialization, we no longer re-validate other materializations that happen to source from the same collections. And we now update the data-plane shard/journal specs (that represent the actual work/data of your pipelines) asynchronously, after the publication has committed. This keeps the UI faster, and also allows our data-plane updates to be more reliable.
Finally, we introduced a new internal framework for writing background automations. This is what has enabled the changes to inferred schema handling, schema evolution, and our asynchronous shard/journal spec updates. We're looking forward to many more features that are enabled by this framework.
v0.3.13
What's Changed
- sum annotation now supports arbitrary precision using string-encoded numerics.
- Add experimental flowctl raw stats sub-command.
- Various minor JSON Schema handling improvements.
- Switch to simd-json for fast JSON parsing and transcoding.
- Switch to simd-json for fast JSON parsing and transcoding.
Filtered PRs impacting flowctl:
- crates/json: don't validate strings with underscores as integers or numbers by @williamhbaker in #1364
- Update runtime::container::start() to take a new allow_local flag by @jshearer in #1361
- json: fix ordering of integers greater than i64::MAX by @psFried in #1367
- validation: fix bucket name validation for GCS and Azure by @psFried in #1370
- thread through --allow-local argument when running locally by @psFried in #1374
- validation: allow unsatisfiable constraints on excluded fields by @psFried in #1375
- update a number of dependencies, including RocksDB (to 8.10) by @jgraettinger in #1389
- connector-init: set connector_type on protocol check Spec by @jgraettinger in #1400
- models/journals: region configuration for S3 storage mappings by @williamhbaker in #1410
- improve schema validation errors by including metadata about the collection that failed by @jgraettinger in #1408
- flowctl: resurrect stats subcommand under raw by @psFried in #1432
- make: codesign binaries on mac by @mdibaiee in #1436
- simd-doc, gazette, avro, and dekaf crates by @jgraettinger in #1448
- flowctl(preview): multiple bindings may read from one collection by @mdibaiee in #1466
- crates/doc: support arbitrary precision with sum annotation by @jgraettinger in #1477
- crates/doc: relax sum inspection to allow numeric strings by @jgraettinger in #1481
Full Changelog: v0.3.12...v0.3.13