out_s3: add description of parquet compression #1380

Open
wants to merge 12 commits into
base: master
26 changes: 25 additions & 1 deletion pipeline/outputs/s3.md
@@ -40,7 +40,7 @@ See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b
| sts\_endpoint | Custom endpoint for the STS API. | None |
| profile | Option to specify an AWS Profile for credentials. | default |
| canned\_acl | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | None |
| compression | Compression type for S3 objects. 'gzip' is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can also use 'arrow'. For gzip compression, the Content-Encoding HTTP Header will be set to 'gzip'. Gzip compression can be enabled when `use_put_object` is 'on' or 'off' (PutObject and Multipart). Arrow compression can only be enabled with `use_put_object On`. | None |
| compression | Compression type for S3 objects. 'gzip' and 'parquet' are currently the only values supported by default; 'parquet' additionally requires the columnify command to be installed at runtime. If Apache Arrow support was enabled at compile time, you can also use 'arrow'. For gzip compression, the Content-Encoding HTTP header will be set to 'gzip'. Gzip and parquet compression can be enabled when `use_put_object` is 'on' or 'off' (PutObject and Multipart). Arrow compression can only be enabled with `use_put_object On`. | None |
| content\_type | A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header. | None |
| send\_content\_md5 | Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled. | false |
| auto\_retry\_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. | true |
@@ -49,6 +49,13 @@ See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b
| storage\_class | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API\_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. If this option is not specified, objects will be stored with the default 'STANDARD' storage class. | None |
| retry\_limit | Integer value to set the maximum number of retries allowed. Note: this option is available since versions 1.9.10 and 2.0.1. In earlier versions, the number of retries is 5 and is not configurable. | 1 |
| external\_id | Specify an external ID for the STS API, can be used with the role\_arn parameter if your role requires an external ID. | None |
| parquet.compression | Compression type for parquet. Supported values are 'uncompressed', 'snappy', 'gzip', and 'zstd'. 'lzo', 'brotli', and 'lz4' are not currently supported. | SNAPPY |
| parquet.pagesize | Page size of parquet format. Defaults to 8192 bytes (8KiB). | 8192 |
| parquet.row\_group\_size | Row group size of parquet format. Defaults to 134217728 bytes (128MiB). | 134217728 |
| parquet.record\_type | Format type of the records for the parquet format. Defaults to json. | json |
| parquet.schema\_type | Format type of the schema for the parquet format. Defaults to avro. | avro |
| parquet.schema\_file | Specify path to schema file for parquet compression. | None |


## TLS / SSL

@@ -282,6 +289,23 @@ Example:

Then, the records will be stored into the MinIO server.

## Usage for Parquet Compression

Parquet compression requires [columnify](https://github.com/reproio/columnify) to be installed on the running system or in the container at runtime.

After installing that command, out_s3 can handle parquet compression:

```
[OUTPUT]
    Name                 s3
    Match                *
    bucket               your-bucket
    Use_Put_object       true
    compression          parquet
    parquet.schema_file  /path/to/your-schema.avsc
    parquet.compression  snappy
```
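
The remaining `parquet.*` options from the parameter table can also be set explicitly. The following is a minimal sketch rather than a tested configuration: the bucket name and schema path are placeholders, `parquet.compression` is set to `zstd` purely for illustration, and the other values simply repeat the defaults listed in the table.

```
[OUTPUT]
    Name                    s3
    Match                   *
    bucket                  your-bucket
    use_put_object          On
    compression             parquet
    parquet.schema_file     /path/to/your-schema.avsc
    parquet.schema_type     avro
    parquet.record_type     json
    parquet.compression     zstd
    parquet.pagesize        8192
    parquet.row_group_size  134217728
```

Leaving out any of the `parquet.*` keys other than `parquet.schema_file` should fall back to the defaults listed in the table.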

## Getting Started

In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.