docs: Add guides for using clp-json with object storage; Update compression scripts docs missed in previous PRs. (#683)

Co-authored-by: Haiqi Xu <[email protected]>
kirkrodrigues and haiqi96 authored Jan 24, 2025
1 parent 66067d6 commit 230d518
Showing 8 changed files with 415 additions and 2 deletions.
14 changes: 14 additions & 0 deletions docs/src/user-guide/guides-overview.md
@@ -0,0 +1,14 @@
# Overview

The guides below describe how to use CLP for different use cases.

::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item-card}
:link: guides-using-object-storage/index
Using object storage
^^^
Using CLP to ingest logs from object storage and store archives on object storage.
:::
::::
78 changes: 78 additions & 0 deletions docs/src/user-guide/guides-using-object-storage/clp-config.md
@@ -0,0 +1,78 @@
# Configuring CLP

To use object storage with CLP, follow the steps below to configure each use case you require.

:::{note}
If CLP is already running, shut it down, update its configuration, and then start it again.
:::

## Configuration for archive storage

To configure CLP to store archives on S3, update the `archive_output.storage` key in
`<package>/etc/clp-config.yml` with the values in the code block below, replacing the fields in
angle brackets (`<>`) with the appropriate values:

```yaml
archive_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-archives"  # Or a path of your choosing
    s3_config:
      region_code: "<region-code>"
      bucket: "<bucket-name>"
      key_prefix: "<key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # archive_output's other config keys
```

* `staging_directory` is the local filesystem directory where archives will be temporarily stored
  before being uploaded to S3.
* `s3_config` configures both the S3 bucket where archives should be stored and the credentials
  for accessing it.
  * `<region-code>` is the AWS region [code][aws-region-codes] for the bucket.
  * `<bucket-name>` is the bucket's name.
  * `<key-prefix>` is the "directory" where all archives will be stored within the bucket and
    must end with a trailing forward slash (e.g., `archives/`).
  * `credentials` contains the CLP IAM user's credentials.
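
The trailing-slash requirement on `<key-prefix>` is easy to miss when the config is generated by a
script. As a minimal sketch (the function name and example values are hypothetical and not part of
CLP), the following Python helper builds a dictionary mirroring the `storage` block above and
rejects a key prefix that lacks the trailing slash:

```python
def build_s3_storage_config(region_code, bucket, key_prefix,
                            access_key_id, secret_access_key,
                            staging_directory="var/data/staged-archives"):
    """Return a dict mirroring the `archive_output.storage` block shown above.

    Hypothetical helper for illustration; CLP itself reads clp-config.yml.
    """
    # The guide requires the key prefix to end with a forward slash.
    if not key_prefix.endswith("/"):
        raise ValueError(f"key_prefix must end with '/': {key_prefix!r}")
    return {
        "type": "s3",
        "staging_directory": staging_directory,
        "s3_config": {
            "region_code": region_code,
            "bucket": bucket,
            "key_prefix": key_prefix,
            "credentials": {
                "access_key_id": access_key_id,
                "secret_access_key": secret_access_key,
            },
        },
    }


# Placeholder values for illustration only.
storage = build_s3_storage_config(
    "us-east-2", "my-bucket", "archives/", "EXAMPLE_KEY_ID", "EXAMPLE_SECRET"
)
print(storage["s3_config"]["key_prefix"])
```

The resulting dictionary could then be serialized with a YAML library of your choice and merged
into `clp-config.yml`.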

## Configuration for stream storage

To configure CLP to cache stream files on S3, update the `stream_output.storage` key in
`<package>/etc/clp-config.yml` with the values in the code block below, replacing the fields in
angle brackets (`<>`) with the appropriate values:

```yaml
stream_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-streams"  # Or a path of your choosing
    s3_config:
      region_code: "<region-code>"
      bucket: "<bucket-name>"
      key_prefix: "<key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # stream_output's other config keys
```

* `staging_directory` is the local filesystem directory where streams will be temporarily stored
  before being uploaded to S3.
* `s3_config` configures both the S3 bucket where streams should be stored and the credentials
  for accessing it.
  * `<region-code>` is the AWS region [code][aws-region-codes] for the bucket.
  * `<bucket-name>` is the bucket's name.
  * `<key-prefix>` is the "directory" where all streams will be stored within the bucket and
    must end with a trailing forward slash (e.g., `streams/`).
  * `credentials` contains the CLP IAM user's credentials.

:::{note}
CLP currently doesn't explicitly delete the cached streams. This limitation will be addressed in a
future release.
:::

[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability
52 changes: 52 additions & 0 deletions docs/src/user-guide/guides-using-object-storage/clp-usage.md
@@ -0,0 +1,52 @@
# Using CLP with object storage

To compress logs from S3, follow the steps in the section below. For all other operations, you
should be able to use CLP as described in the [quick start](../quick-start-overview.md) guide.

## Compressing logs from S3

To compress logs from S3, use the `s3` subcommand as follows, replacing the fields in angle brackets
(`<>`) with the appropriate values:

```bash
sbin/compress.sh \
  s3 \
  --aws-credentials-file <credentials-file> \
  --timestamp-key <timestamp-key> \
  https://<bucket-name>.s3.<region-code>.amazonaws.com/<prefix>
```

* `<credentials-file>` is the path to an AWS credentials file like the following:

  ```ini
  [default]
  aws_access_key_id = <aws-access-key-id>
  aws_secret_access_key = <aws-secret-access-key>
  ```

  * CLP expects the credentials to be in the `default` section.
  * `<aws-access-key-id>` and `<aws-secret-access-key>` are the access key ID and secret access
    key of the CLP IAM user.
  * If you don't want to use a credentials file, you can specify the credentials on the command
    line using the `--aws-access-key-id` and `--aws-secret-access-key` flags (note that this may
    expose your credentials to other users running on the system).

* `<timestamp-key>` is the field path of the kv-pair that contains the timestamp in each log event.
* `<bucket-name>` is the name of the S3 bucket containing your logs.
* `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your logs.
* `<prefix>` is the prefix of all logs you wish to compress and must begin with the
  `<all-logs-prefix>` value from the [compression IAM policy][compression-iam-policy].
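
If you prepare these inputs from a script, the two pieces most prone to typos are the credentials
file and the URL. The sketch below is one possible way to generate both with Python's standard
library; the function names and example values are hypothetical, and it assumes a POSIX system for
the file-permission handling:

```python
import configparser
import os
import tempfile


def write_credentials_file(path, access_key_id, secret_access_key):
    """Write an AWS-style credentials file containing only a `default` section."""
    # interpolation=None keeps any '%' in the secret from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    # Create the file readable and writable by its owner only (on POSIX systems).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        parser.write(f)


def build_s3_url(bucket, region_code, prefix):
    """Build a URL in the same form as the one passed to `sbin/compress.sh` above."""
    return f"https://{bucket}.s3.{region_code}.amazonaws.com/{prefix}"


# Placeholder values for illustration only.
with tempfile.TemporaryDirectory() as tmp_dir:
    credentials_path = os.path.join(tmp_dir, "credentials")
    write_credentials_file(credentials_path, "EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
    with open(credentials_path) as f:
        print(f.read())

print(build_s3_url("my-bucket", "us-east-2", "logs/"))
```

Writing the credentials to a file with restrictive permissions avoids the exposure risk of passing
them on the command line.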

:::{note}
The `s3` subcommand only supports a single URL but will compress any logs that have the given
prefix.

If you wish to compress a single log file, specify the entire path to the log file. However, if that
log file's path is a prefix of another log file's path, then both log files will be compressed
(e.g., with two files "logs/syslog" and "logs/syslog.1", a prefix like "logs/syslog" will cause
both logs to be compressed). This limitation will be addressed in a future release.
:::
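
The behaviour described in the note can be modelled as plain string-prefix matching on object keys.
The sketch below is only an illustration of that idea, not CLP's actual implementation; the
function name and the keys are made up:

```python
def keys_matching_prefix(keys, prefix):
    """Return the keys that a prefix-based selection would pick up."""
    return [key for key in keys if key.startswith(prefix)]


keys = ["logs/syslog", "logs/syslog.1", "logs/auth.log"]

# "logs/syslog" is itself a prefix of "logs/syslog.1", so both are selected.
print(keys_matching_prefix(keys, "logs/syslog"))
# → ['logs/syslog', 'logs/syslog.1']
```

Running a check like this against a listing of your bucket before compressing can reveal whether a
"single file" prefix would also select sibling files.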

[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console
[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability
[compression-iam-policy]: ./object-storage-config.md#configuration-for-compression
95 changes: 95 additions & 0 deletions docs/src/user-guide/guides-using-object-storage/index.md
@@ -0,0 +1,95 @@
# Using object storage

CLP can:

* compress logs from object storage (e.g., S3);
* store archives on object storage; and
* cache stream files (used for viewing compressed logs) on object storage.

This guide explains how to configure and use CLP for all three use cases. Note that you can choose
to use object storage for any combination of the three use cases (e.g., compress logs from S3 and
cache the stream files on S3, but store archives on the local filesystem).

:::{note}
Currently, only the [clp-json][release-choices] release supports object storage. Support for
`clp-text` will be added in a future release.
:::

:::{note}
Currently, CLP only supports using S3 as object storage. Support for other object storage services
will be added in a future release.
:::

## Prerequisites

1. This guide assumes you're able to configure, start, stop, and use a CLP cluster as described in
   the [quick-start guide](../quick-start-overview.md).
2. An S3 bucket and [key prefix][aws-key-prefixes] containing the logs you wish to compress.
3. An S3 bucket and key prefix where you wish to store compressed archives.
4. An S3 bucket and key prefix where you wish to cache stream files.
5. An AWS IAM user with the necessary permissions to access the S3 bucket(s) and prefixes mentioned
   above.
   * To create a user, follow [this guide][aws-create-iam-user].
   * You don't need to assign any groups or policies to the user at this stage since we will
     attach policies in later steps, depending on which object storage use cases you require.
   * You may use a single IAM user for all use cases, or a separate one for each.
   * For brevity, we'll refer to this user as the "CLP IAM user" in the rest of this guide.
6. IAM user (long-term) credentials for the IAM user(s) created in step (5) above.
   * To create these credentials, follow [this guide][aws-create-access-keys].
   * Choose the "Other" use case to generate long-term credentials.

:::{note}
CLP currently requires IAM user (long-term) credentials to access the relevant S3 buckets.
Support for other authentication methods (e.g., temporary credentials) will be added in a future
release.
:::

## Configuration

The subsections below explain how to configure your object storage bucket and CLP for each use case:

::::{grid} 1 1 1 1
:gutter: 2

:::{grid-item-card}
:link: object-storage-config
Configuring object storage
^^^
Configuring your object storage bucket for each use case.
:::

:::{grid-item-card}
:link: clp-config
Configuring CLP
^^^
Configuring CLP to use object storage for each use case.
:::
::::

## Using CLP with object storage

The subsection below explains how to use CLP with object storage for each use case:

::::{grid} 1 1 1 1
:gutter: 2

:::{grid-item-card}
:link: clp-usage
Using CLP with object storage
^^^
Using CLP to compress, search, and view log files from object storage.
:::
::::

:::{toctree}
:hidden:

object-storage-config
clp-config
clp-usage
:::

[aws-create-access-keys]: https://docs.aws.amazon.com/keyspaces/latest/devguide/create.keypair.html
[aws-create-iam-user]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html
[aws-key-prefixes]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html
[release-choices]: ../quick-start-cluster-setup/index.md#choosing-a-release
@@ -0,0 +1,152 @@
# Configuring object storage

To use object storage with CLP, follow the steps below to configure the CLP IAM user and your object
storage bucket(s) for each use case you require.

## Configuration for compression

[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor),
replacing the fields in angle brackets (`<>`) with the appropriate values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": [
        "arn:aws:s3:::<bucket-name>/<all-logs-prefix>*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::<bucket-name>"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": "<all-logs-prefix>*"
        }
      }
    }
  ]
}
```

* `<bucket-name>` should be the name of the S3 bucket containing your logs.
* `<all-logs-prefix>` should be the prefix of all logs you wish to compress.

:::{note}
If you want to enforce that only logs under a directory-like prefix, e.g., `logs/`, can be
compressed, you can append a trailing slash (`/`) after the `<all-logs-prefix>` value. This will
prevent CLP from compressing logs with prefixes like `logs-private`. However, note that to
compress all logs under the `logs/` prefix, you will need to include the trailing slash when
invoking `sbin/compress.sh` below.
:::
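
To see why the trailing slash matters, the wildcard match in the policy's `s3:prefix` condition can
be approximated with Python's `fnmatch` module. This is only a rough model (IAM's `StringLike`
supports `*` and `?`, whereas `fnmatch` also interprets `[...]`), and the helper name is
hypothetical:

```python
from fnmatch import fnmatchcase


def prefix_condition_allows(policy_pattern, requested_prefix):
    """Approximate the policy's StringLike check on the requested listing prefix."""
    return fnmatchcase(requested_prefix, policy_pattern)


# <all-logs-prefix> = "logs" (no trailing slash): sibling prefixes are allowed too.
print(prefix_condition_allows("logs*", "logs/"))          # → True
print(prefix_condition_allows("logs*", "logs-private/"))  # → True

# <all-logs-prefix> = "logs/" (trailing slash): only the directory-like prefix is allowed.
print(prefix_condition_allows("logs/*", "logs/"))          # → True
print(prefix_condition_allows("logs/*", "logs-private/"))  # → False
# Omitting the trailing slash in the request no longer matches the policy.
print(prefix_condition_allows("logs/*", "logs"))           # → False
```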

## Configuration for archive storage

[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor),
replacing the fields in angle brackets (`<>`) with the appropriate values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>/<key-prefix>/*"
      ]
    }
  ]
}
```

* `<bucket-name>` should be the name of the S3 bucket where compressed archives should be stored.
* `<key-prefix>` should be the prefix (used like a directory path) where compressed archives should
be stored.
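
If you generate this policy from a script, note that the template already appends `/*` after
`<key-prefix>`, whereas the `key_prefix` value in `clp-config.yml` must end with a slash. The
sketch below assumes you start from the slash-terminated form and strips the trailing slash before
building the resource ARN, so that the result does not contain `//`. The function name and example
values are hypothetical:

```python
import json


def build_read_write_policy(bucket, key_prefix):
    """Build the GetObject/PutObject policy shown above for one bucket and prefix."""
    # Assumption: key_prefix may be given with a trailing slash (as in clp-config.yml);
    # strip it because the resource template adds "/*" itself.
    normalized = key_prefix.rstrip("/")
    if not normalized:
        raise ValueError("key_prefix must contain at least one non-slash character")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{normalized}/*"],
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(build_read_write_policy("my-bucket", "archives/"), indent=2))
```

The printed JSON can be pasted into the IAM console's JSON editor when attaching the inline policy.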

## Configuration for stream storage

The [log viewer][yscope-log-viewer] currently supports viewing [IR][uber-clp-blog-1] and JSONL
stream files but not CLP archives; thus, to view the compressed logs from a CLP archive, CLP first
converts the compressed logs into stream files. These streams can be cached on the filesystem or on
object storage.

:::{note}
A future version of the log viewer will support viewing CLP archives directly.
:::

Storing streams on S3 requires both configuring the CLP IAM user and setting up a cross-origin
resource sharing (CORS) policy for the S3 bucket.

### IAM user configuration

[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor),
replacing the fields in angle brackets (`<>`) with the appropriate values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>/<key-prefix>/*"
      ]
    }
  ]
}
```

* `<bucket-name>` should be the name of the S3 bucket where cached streams should be stored.
* `<key-prefix>` should be the prefix (used like a directory path) where cached streams should be
stored.

### Cross-origin resource sharing (CORS) configuration

For CLP's log viewer to be able to access the cached stream files from S3 over the internet, the S3
bucket must have a CORS policy configured.

Add the CORS configuration below to your bucket by following [this guide][aws-cors-guide]:

```json
[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [
      "Access-Control-Allow-Origin"
    ]
  }
]
```

:::{tip}
The CORS policy above allows requests from any host (origin). If you already know what hosts will
access CLP's web interface, you can enhance security by changing `AllowedOrigins` from `["*"]` to
the specific list of hosts that will access the web interface.
:::
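
As a sketch of the tip above, the following Python snippet generates the same CORS configuration
with `AllowedOrigins` restricted to a known list of hosts. The function name and the example
origin are hypothetical; substitute the origins (scheme and host) from which your users reach CLP's
web interface:

```python
import json


def build_cors_config(allowed_origins):
    """Return the CORS configuration above, limited to the given origins."""
    origins = list(allowed_origins)
    if not origins:
        raise ValueError("at least one allowed origin is required")
    return [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": origins,
            "ExposeHeaders": ["Access-Control-Allow-Origin"],
        }
    ]


# Hypothetical origin for illustration only.
print(json.dumps(build_cors_config(["https://clp.example.com"]), indent=2))
```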

[aws-cors-guide]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html
[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console
[uber-clp-blog-1]: https://www.uber.com/en-US/blog/reducing-logging-cost-by-two-orders-of-magnitude-using-clp
[yscope-log-viewer]: https://github.com/y-scope/yscope-log-viewer