Skip to content

Commit

Permalink
docs: add references to numaproj-contrib (#1700)
Browse files Browse the repository at this point in the history
Signed-off-by: Keran Yang <[email protected]>
  • Loading branch information
KeranYang authored and whynowy committed Apr 24, 2024
1 parent 88a60e6 commit 4251fbb
Show file tree
Hide file tree
Showing 33 changed files with 89 additions and 63 deletions.
2 changes: 1 addition & 1 deletion api/json-schema/schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -17858,7 +17858,7 @@
"type": "object"
},
"io.numaproj.numaflow.v1alpha1.Container": {
"description": "Container is used to define the container properties for user defined functions, sinks, etc.",
"description": "Container is used to define the container properties for user-defined functions, sinks, etc.",
"properties": {
"args": {
"items": {
Expand Down
2 changes: 1 addition & 1 deletion api/openapi-spec/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -17862,7 +17862,7 @@
}
},
"io.numaproj.numaflow.v1alpha1.Container": {
"description": "Container is used to define the container properties for user defined functions, sinks, etc.",
"description": "Container is used to define the container properties for user-defined functions, sinks, etc.",
"type": "object",
"properties": {
"args": {
Expand Down
2 changes: 1 addition & 1 deletion docs/APIs.md
Original file line number Diff line number Diff line change
Expand Up @@ -791,7 +791,7 @@ Container
</p>
<p>
<p>
Container is used to define the container properties for user defined
Container is used to define the container properties for user-defined
functions, sinks, etc.
</p>
</p>
Expand Down
2 changes: 1 addition & 1 deletion docs/core-concepts/vertex.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ There are 3 types of `Vertex` in Numaflow today:

1. `Source` - To ingest data from sources.
1. `Sink` - To forward processed data to sinks.
1. `UDF` - User Defined Function, which is used to define data processing logic.
1. `UDF` - User-defined Function, which is used to define data processing logic.

We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for `Vertex`. A `Pipeline` containing multiple vertices will automatically generate multiple `Vertex` objects by the controller. As a user, you should NOT create a `Vertex` object directly.

Expand Down
2 changes: 1 addition & 1 deletion docs/core-concepts/watermarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ spec:

## Watermark API

When processing data in [User Defined Functions](../user-guide/user-defined-functions/user-defined-functions.md), you can get the current watermark through
When processing data in [user-defined functions](../user-guide/user-defined-functions/user-defined-functions.md), you can get the current watermark through
an API. Watermark API is supported in all our client SDKs.

### Example Golang
Expand Down
6 changes: 3 additions & 3 deletions docs/operations/metrics/metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,9 @@ These metrics can be used to determine throughput of your pipeline.
These metrics can be used to determine the latency of your pipeline.

| Metric name | Metric type | Labels | Description |
| ---------------------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| `source_forwarder_transformer_processing_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Provides a histogram distribution of the processing times of User Defined Source Transformer |
| `forwarder_udf_processing_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` | Provides a histogram distribution of the processing times of User Defined Functions. (UDF's) |
| ---------------------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |------------------------------------------------------------------------------------------------|
| `source_forwarder_transformer_processing_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Provides a histogram distribution of the processing times of User-defined Source Transformer |
| `forwarder_udf_processing_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` | Provides a histogram distribution of the processing times of User-defined Functions. (UDF's) |
| `forwarder_forward_chunk_processing_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` | Provides a histogram distribution of the processing times of the forwarder function as a whole |
| `reduce_pnf_process_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `replica=<replica-index>` | Provides a histogram distribution of the processing times of the reducer |
| `reduce_pnf_forward_time` | Histogram | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `replica=<replica-index>` | Provides a histogram distribution of the forwarding times of the reducer |
Expand Down
4 changes: 2 additions & 2 deletions docs/quick-start.md
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,7 @@ View the UI for the advanced pipeline at https://localhost:8443/.

![Numaflow UI](assets/numaflow-ui-advanced-pipeline.png)

The source code of the `even-odd` [User Defined Function](user-guide/user-defined-functions/user-defined-functions.md) can be found [here](https://github.com/numaproj/numaflow-go/tree/main/pkg/mapper/examples/even_odd). You also can replace the [Log](./user-guide/sinks/log.md) Sink with some other sinks like [Kafka](./user-guide/sinks/kafka.md) to forward the data to Kafka topics.
The source code of the `even-odd` [user-defined function](user-guide/user-defined-functions/user-defined-functions.md) can be found [here](https://github.com/numaproj/numaflow-go/tree/main/pkg/mapper/examples/even_odd). You also can replace the [Log](./user-guide/sinks/log.md) Sink with some other sinks like [Kafka](./user-guide/sinks/kafka.md) to forward the data to Kafka topics.

The pipeline can be deleted by

Expand All @@ -200,6 +200,6 @@ Try more examples in the [`examples`](https://github.com/numaproj/numaflow/tree/

After exploring how Numaflow pipelines run, you can check what data [Sources](./user-guide/sources/generator.md)
and [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write
[User Defined Functions](user-guide/user-defined-functions/user-defined-functions.md).
[User-defined Functions](user-guide/user-defined-functions/user-defined-functions.md).

Numaflow can also be paired with Numalogic, a collection of ML models and algorithms for real-time data analytics and AIOps including anomaly detection. Visit the [Numalogic homepage](https://numalogic.numaproj.io/) for more information.
4 changes: 2 additions & 2 deletions docs/specifications/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -92,9 +92,9 @@ Logic:
1. Write to the sink;
1. Ack the data in the upstream buffer.

**UDF (User Defined Function)**
**UDF (User-defined Function)**

- User Defined Functions run in data processors.
- Use-defined Functions run in data processors.
- UDFs implements a unified interface to process data.
- UDFs are typically implemented by end-users, but there will be some
built-in functions that can be used without writing any code.
Expand Down
15 changes: 9 additions & 6 deletions docs/specifications/side-inputs.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Side Inputs

Side Inputs allow the user defined functions (including UDF, UDSink, Transformer, etc.) to access slow updated data or configuration (such as database, file system, etc.) without needing to load it during each message processing. Side Inputs are read-only and can be used in both batch and streaming jobs.
Side Inputs allow the user-defined functions (including UDF, UDSink, Transformer, etc.) to access slow updated data or configuration (such as database, file system, etc.) without needing to load it during each message processing. Side Inputs are read-only and can be used in both batch and streaming jobs.

## Requirements

Expand Down Expand Up @@ -47,8 +47,8 @@ A pipeline may have multiple Side Inputs sources, each of them will have a Side
Each of the Side Inputs Manager pods contains:

- An init container, which checks if the data store is ready.
- A user defined container, which runs a predefined Numaflow SDK to start a service, calling a user implemented function to get Side Input data.
- A numa container, which runs a cron like job to call the service in the user defined container, and store the returned data in the data store.
- A user-defined container, which runs a predefined Numaflow SDK to start a service, calling a user implemented function to get Side Input data.
- A numa container, which runs a cron like job to call the service in the user-defined container, and store the returned data in the data store.

![Diagram](../assets/side-inputs-manager.png)

Expand All @@ -58,7 +58,7 @@ The communication protocol between the 2 containers could be based on UDS or FIF

Side Inputs Manager needs to run with Active-Passive HA, which requires a leader election mechanism support. Kubernetes has a native leader election API backed by etcd, but it requires extra RBAC privileges to use it.

Considering a similar leader election mechanism is needed in some other scenarios such as Active-Passive User Defined Source, a proposal is to implement our own leader election mechanism by leveraging ISB Service.
Considering a similar leader election mechanism is needed in some other scenarios such as Active-Passive User-defined Source, a proposal is to implement our own leader election mechanism by leveraging ISB Service.

#### Why NOT CronJob?

Expand All @@ -70,11 +70,14 @@ Using K8s CronJob/Job will be a [challenge](https://github.com/istio/istio/issue

![Diagram](../assets/side-inputs-vertex-pod.png)

When Side Inputs is enabled for a pipeline, each of its vertex pods will have a second init container added, the init container will have a shared volume (emptyDir) mounted, and the same volume will be mounted to the User Defined Function/Sink/Transformer container. The init container reads from the data store, and saves to the shared volume.
When Side Inputs is enabled for a pipeline, each of its vertex pods will have a second init container added,
the init container will have a shared volume (emptyDir) mounted,
and the same volume will be mounted to the User-defined Function/Sink/Transformer container.
The init container reads from the data store, and saves to the shared volume.

A sidecar container will also be injected by the controller, and it mounts the same volume as above. The sidecar runs a service provided by numaflow, watching the Side Inputs data from the data store, if there’s any update, reads the data and updates the shared volume.

In the User Defined Function/Sink/Sink container, a helper function will be provided by Numaflow SDK, to return the Side Input data. The helper function caches the Side Inputs data in the memory, but performs thread safe updates if it watches the changes in the shared volume.
In the User-defined Function/Sink/Sink container, a helper function will be provided by Numaflow SDK, to return the Side Input data. The helper function caches the Side Inputs data in the memory, but performs thread safe updates if it watches the changes in the shared volume.

### Numaflow SDK

Expand Down
8 changes: 4 additions & 4 deletions docs/user-guide/reference/side-inputs.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ spec:
- redis
```
### Implementing User Defined Side Inputs
### Implementing User-defined Side Inputs
To use the `Side Inputs` feature, a User-defined function implementing an interface defined in the Numaflow SDK
([Go](https://github.com/numaproj/numaflow-go/blob/main/pkg/sideinput/),
Expand All @@ -48,11 +48,11 @@ To use the `Side Inputs` feature, a User-defined function implementing an interf
is needed to retrieve the data.

You can choose the SDK of your choice to create a
User Defined Side Input image which implements the
User-defined Side Input image which implements the
Side Inputs Update.

#### Example in Golang
Here is an example of how to write a User Defined Side Input in Golang,
Here is an example of how to write a User-defined Side Input in Golang,

```go
// handle is the side input handler function.
Expand Down Expand Up @@ -91,7 +91,7 @@ sideinputsdk.NoBroadcastMessage()

### UDF
Users need to add a watcher on the filesystem to fetch the
updated side inputs in their User Defined Source/Function/Sink
updated side inputs in their User-defined Source/Function/Sink
in order to apply the new changes into the data process.

For each side input there will be a file with the given path and after any update to the side input value the file will be updated.
Expand Down
4 changes: 2 additions & 2 deletions docs/user-guide/sinks/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,12 @@ Numaflow currently supports the following Sinks
* [Kafka](./kafka.md)
* [Log](./log.md)
* [Black Hole](./blackhole.md)
* [User Defined Sink](./user-defined-sinks.md)
* [User-defined Sink](./user-defined-sinks.md)

A user-defined sink is a custom Sink that a user can write using Numaflow SDK when
the user needs to output the processed data to a system or using a certain transformation that is not
supported by the platform's built-in sinks. As an example, once we have processed the input messages,
we can use Elasticsearch as a User defined sink to store the processed data and enable search and
we can use Elasticsearch as a user-defined sink to store the processed data and enable search and
analysis on the data.

## Fallback Sink (DLQ)
Expand Down
26 changes: 21 additions & 5 deletions docs/user-guide/sinks/user-defined-sinks.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,20 @@
# User Defined Sinks
# User-defined Sinks

A `Pipeline` may have multiple Sinks, those sinks could either be a pre-defined sink such as `kafka`, `log`, etc., or a `User Defined Sink`.
A `Pipeline` may have multiple Sinks, those sinks could either be a pre-defined sink such as `kafka`, `log`, etc., or a `user-defined sink`.

A pre-defined sink vertex runs single-container pods, a user defined sink runs two-container pods.
A pre-defined sink vertex runs single-container pods, a user-defined sink runs two-container pods.

A user defined sink vertex looks like below.
## Build Your Own User-defined Sinks

You can build your own user-defined sinks in multiple languages.

Check the links below to see the examples for different languages.

- [Golang](https://github.com/numaproj/numaflow-go/tree/main/pkg/sinker/examples/)
- [Java](https://github.com/numaproj/numaflow-java/tree/main/examples/src/main/java/io/numaproj/numaflow/examples/sink/simple/)
- [Python](https://github.com/numaproj/numaflow-python/tree/main/examples/sink/)

A user-defined sink vertex looks like below.

```yaml
spec:
Expand All @@ -18,10 +28,16 @@ spec:
## Available Environment Variables
Some environment variables are available in the user defined sink container:
Some environment variables are available in the user-defined sink container:
- `NUMAFLOW_NAMESPACE` - Namespace.
- `NUMAFLOW_POD` - Pod name.
- `NUMAFLOW_REPLICA` - Replica index.
- `NUMAFLOW_PIPELINE_NAME` - Name of the pipeline.
- `NUMAFLOW_VERTEX_NAME` - Name of the vertex.

## User-defined Sinks contributed from the open source community

If you're looking for examples and usages contributed by the open source community, head over to [the numaproj-contrib repositories](https://github.com/orgs/numaproj-contrib/repositories).

These user-defined sinks like AWS SQS, GCP Pub/Sub, provide valuable insights and guidance on how to use and write a user-defined sink.
6 changes: 3 additions & 3 deletions docs/user-guide/sources/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,8 @@ In Numaflow, we currently support the following sources
* [HTTP](./http.md)
* [Ticker](./generator.md)
* [Nats](./nats.md)
* [User Defined Source](./user-defined-sources.md)
* [User-defined Source](./user-defined-sources.md)

A user defined source is a custom source that a user can write using Numaflow SDK when
the user needs to read data from a system that is not supported by the platform's built-in sources. User defined source also supports custom acknowledge management for
A user-defined source is a custom source that a user can write using Numaflow SDK when
the user needs to read data from a system that is not supported by the platform's built-in sources. User-defined source also supports custom acknowledge management for
exactly-once reading.
3 changes: 2 additions & 1 deletion docs/user-guide/sources/transformer/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,8 @@ There are some [Built-in Transformers](builtin-transformers/README.md) that can

## Build Your Own Transformer

You can build your own transformer in multiple languages. A User Defined Transformer could be as simple as the example below in Golang.
You can build your own transformer in multiple languages.
A user-defined transformer could be as simple as the example below in Golang.
In the example, the transformer extracts event times from `timestamp` of the JSON payload and assigns them to messages as new event times. It also filters out unwanted messages based on `filterOut` of the payload.

```golang
Expand Down
16 changes: 11 additions & 5 deletions docs/user-guide/sources/user-defined-sources.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
# User Defined Sources
# User-defined Sources

A `Pipeline` may have multiple Sources, those sources could either be a pre-defined source such as `kafka`, `http`, etc., or a `User Defined Source`.
A `Pipeline` may have multiple Sources, those sources could either be a pre-defined source such as `kafka`, `http`, etc., or a `user-defined source`.

With no source data transformer, A pre-defined source vertex runs single-container pods; a user-defined source runs two-container pods.

## Build Your Own User Defined Sources
## Build Your Own User-defined Sources

You can build your own user defined sources in multiple languages.
You can build your own user-defined sources in multiple languages.

Check the links below to see the examples for different languages.

Expand All @@ -28,10 +28,16 @@ spec:
## Available Environment Variables
Some environment variables are available in the user defined source container:
Some environment variables are available in the user-defined source container:
- `NUMAFLOW_NAMESPACE` - Namespace.
- `NUMAFLOW_POD` - Pod name.
- `NUMAFLOW_REPLICA` - Replica index.
- `NUMAFLOW_PIPELINE_NAME` - Name of the pipeline.
- `NUMAFLOW_VERTEX_NAME` - Name of the vertex.

## User-defined sources contributed from the open source community

If you're looking for examples and usages contributed by the open source community, head over to [the numaproj-contrib repositories](https://github.com/orgs/numaproj-contrib/repositories).

These user-defined sources like AWS SQS, GCP Pub/Sub, provide valuable insights and guidance on how to use and write a user-defined source.
2 changes: 1 addition & 1 deletion docs/user-guide/user-defined-functions/map/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ View the UI for a pipeline at https://localhost:8443/.

![Numaflow UI](../../../assets/numaflow-ui-advanced-pipeline.png)

The source code of the `even-odd` [User Defined Function](../user-defined-functions.md) can be found [here](https://github.com/numaproj/numaflow-go/tree/main/pkg/mapper/examples/even_odd). You also can replace the [Log](../../sinks/log.md) Sink with some other sinks like [Kafka](../../sinks/kafka.md) to forward the data to Kafka topics.
The source code of the `even-odd` [user-defined function](../user-defined-functions.md) can be found [here](https://github.com/numaproj/numaflow-go/tree/main/pkg/mapper/examples/even_odd). You also can replace the [Log](../../sinks/log.md) Sink with some other sinks like [Kafka](../../sinks/kafka.md) to forward the data to Kafka topics.

The pipeline can be deleted by

Expand Down
2 changes: 1 addition & 1 deletion docs/user-guide/user-defined-functions/map/map.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ Check the links below to see the UDF examples in streaming mode for different la

### Available Environment Variables

Some environment variables are available in the user defined function container, they might be useful in your own UDF implementation.
Some environment variables are available in the user-defined function container, they might be useful in your own UDF implementation.

- `NUMAFLOW_NAMESPACE` - Namespace.
- `NUMAFLOW_POD` - Pod name.
Expand Down
Loading

0 comments on commit 4251fbb

Please sign in to comment.