
docs: Add guides for using clp-json with object storage; Update compression scripts docs missed in previous PRs. #683

Merged (32 commits) on Jan 24, 2025.
Showing changes from 5 commits.
- `b13ce7d` WIP (kirkrodrigues, Jan 20, 2025)
- `72398d8` WIP (kirkrodrigues, Jan 20, 2025)
- `9f620b8` WIP (kirkrodrigues, Jan 20, 2025)
- `2bd546f` Update guides-using-object-storage.md (kirkrodrigues, Jan 20, 2025)
- `a298676` WIP (kirkrodrigues, Jan 21, 2025)
- `f1a40e8` Merge branch 'main' into s3-docs (kirkrodrigues, Jan 21, 2025)
- `4ea2018` Add details about configuring an IAM user; Fix config keys; General r… (kirkrodrigues, Jan 21, 2025)
- `19e01c3` draft (haiqi96, Jan 21, 2025)
- `542cba2` Add prerequisites section; Move AWS user setup into prerequisites; Fi… (kirkrodrigues, Jan 22, 2025)
- `22feafb` Split docs into their own pages to reduce cognitive load. (kirkrodrigues, Jan 22, 2025)
- `baeffb1` Move object storage compression tip. (kirkrodrigues, Jan 22, 2025)
- `b95b45f` Touch-ups. (kirkrodrigues, Jan 22, 2025)
- `c99bce3` Explain staging_directory. (kirkrodrigues, Jan 22, 2025)
- `bdcf423` Fix links. (kirkrodrigues, Jan 22, 2025)
- `82caf11` Apply the Rabbit's suggestions. (kirkrodrigues, Jan 22, 2025)
- `d7d2800` Remove misleading S3 path diagram; Add step for creating IAM user cre… (kirkrodrigues, Jan 22, 2025)
- `2f9ae7f` Change from 'viewing compressed logs' to 'caching stream files'. (kirkrodrigues, Jan 22, 2025)
- `90dbe51` Diction. (kirkrodrigues, Jan 22, 2025)
- `f6171e7` Fix S3 permissions for ingestion; Use virtual-host style URL; Clarify… (kirkrodrigues, Jan 23, 2025)
- `11e6991` Add timestamp-key to compression example. (kirkrodrigues, Jan 23, 2025)
- `9c4ee8a` Clarify key-prefix for ingestion. (kirkrodrigues, Jan 23, 2025)
- `59b72e9` Clarify the prefix of the prefix must be the prefix. (kirkrodrigues, Jan 23, 2025)
- `4ae7d43` Fix placement of timestamp key line. (kirkrodrigues, Jan 23, 2025)
- `847c042` Linebreak before s3 for clarity. (kirkrodrigues, Jan 23, 2025)
- `c752945` Clarify some instructions based on AWS' UI. (kirkrodrigues, Jan 23, 2025)
- `454cb87` Restructure to make docs easier to follow and use. (kirkrodrigues, Jan 23, 2025)
- `57690ea` Apply the rabbit's suggestions. (kirkrodrigues, Jan 23, 2025)
- `e1e4444` Mention that users need to be familiar with the quick start guide. (kirkrodrigues, Jan 23, 2025)
- `a57daaf` Apply suggestions from code review (kirkrodrigues, Jan 24, 2025)
- `73de4de` Fix indentation. (kirkrodrigues, Jan 24, 2025)
- `5808e0d` Marco review. (kirkrodrigues, Jan 24, 2025)
- `8e06c1a` Minor edit for consistency. (kirkrodrigues, Jan 24, 2025)
14 changes: 14 additions & 0 deletions docs/src/user-guide/guides-overview.md
Contributor: I would consider renaming this section from "guides" to "object storage", since everything in user-guide is technically a guide. We could then rename "Using object storage" to "Using AWS S3".

Member Author:
Our plan is actually to rename "User guide" to "User docs" and "Developer guide" to "Developer docs". Although technically everything is a guide, we do want to differentiate "guides" (tutorials) from reference docs. Technically we could also move the quick start section into the guides section, but I'd need to restructure it a little.

What do you think about moving in that direction instead?

Contributor: Personally, I feel the restructuring can wait until after we release the software to its intended users.

@@ -0,0 +1,14 @@
# Overview

The guides below describe how to use CLP in a variety of use cases.

::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item-card}
:link: guides-using-object-storage
Using object storage
^^^
Using CLP to ingest logs from object storage and store archives on object storage.
:::
::::
97 changes: 97 additions & 0 deletions docs/src/user-guide/guides-using-object-storage.md
@@ -0,0 +1,97 @@
# Using object storage

CLP can both compress logs from object storage (e.g., S3) and store archives on object storage. This
guide explains how to configure CLP for both use cases.

:::{note}
Currently, only the [clp-json][release-choices] release supports object storage. Support for
clp-text will be added in a future release.
:::

:::{note}
Currently, CLP only supports using S3 as object storage. Support for other object storage services
will be added in a future release.
:::

## Compressing logs from object storage

To compress logs from S3, use the `s3` subcommand of the `compress.sh` script:

```bash
sbin/compress.sh s3 s3://<bucket-name>/<path-prefix>
```

* `<bucket-name>` is the name of the S3 bucket containing your logs.
* `<path-prefix>` is the path prefix of all logs you wish to compress.

:::{note}
The `s3` subcommand only supports a single URL but will compress any logs that have the given path
prefix.

If you wish to compress a single log file, specify the entire path to the log file. However, if that
log file's path is a prefix of another log file's path, then both log files will be compressed. This
limitation will be addressed in a future release.
:::

Contributor: Do we need to also mention credentials?

Contributor: Also, do we want to give users a list of the permissions required for ingestion
credentials versus compression/stream-extraction credentials? They have slightly different
permission requirements.

Member Author: True. What permissions do they require?

Contributor (@haiqi96, Jan 21, 2025): For ingestion:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/<access_allowed_path_prefix>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": "<access_allowed_path_prefix>/*"
                }
            }
        }
    ]
}
```

Not tested, but I believe that if you don't care about limiting permissions to a specific path, you
can just use:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>"
            ]
        }
    ]
}
```

Contributor: For compression/stream extraction:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>/<path_prefix>/*"
            ]
        }
    ]
}
```
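The prefix-matching rule described in the note above can be sketched in Python. This is a simplified illustration of the selection behavior, not CLP's actual implementation:

```python
def keys_matching_prefix(keys, path_prefix):
    """Return the object keys the `s3` subcommand would select: every key
    that starts with the given path prefix (a simplified model)."""
    return [k for k in keys if k.startswith(path_prefix)]


# Specifying the full path of app.log still matches app.log.1, because the
# former is a string prefix of the latter.
print(keys_matching_prefix(
    ["logs/app.log", "logs/app.log.1", "logs/other.log"],
    "logs/app.log",
))  # → ['logs/app.log', 'logs/app.log.1']
```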

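The ingestion policy discussed in the review above can also be generated programmatically. The following is a hedged sketch; `ingestion_policy` is a hypothetical helper for illustration, not part of the CLP package:

```python
import json


def ingestion_policy(bucket, path_prefix):
    """Build an S3 ingestion policy like the one suggested in the review:
    GetObject on objects under the prefix, plus ListBucket scoped to it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{bucket}/{path_prefix}/*"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": f"{path_prefix}/*"}},
            },
        ],
    }


print(json.dumps(ingestion_policy("my-bucket", "prod"), indent=4))
```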
## Storing archives on object storage

To configure CLP to store archives on S3, update the `archive_output.storage` key in
`<package>/etc/clp-config.yml`:

```yaml
archive_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-archives"  # Or a path of your choosing
    s3_config:
      region: "<aws-region-code>"
      bucket: "<s3-bucket-name>"
      key-prefix: "<s3-key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # archive_output's other config keys
```

Contributor (on `staging_directory`): Should we explain what this means?

Contributor (on `credentials`): Should we note that we only support long-term credentials, as
documented in [AWS's configuration docs](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)?

* `s3_config` configures both the S3 bucket where archives should be stored as well as credentials
  for accessing it.
  * `<aws-region-code>` is the AWS region [code][aws-region-codes] for the bucket.
  * `<s3-bucket-name>` is the bucket's name.
  * `<s3-key-prefix>` is the "directory" where all archives will be stored within the bucket and
    must end with `/`.
* `credentials` contains the S3 credentials necessary for accessing the bucket.
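As a quick sanity check on these constraints (notably that `key-prefix` must end with `/`), an `s3_config` mapping could be validated like this. `validate_s3_config` is a hypothetical helper sketched here for illustration, not part of the CLP package:

```python
def validate_s3_config(s3_config):
    """Check an s3_config mapping against the documented constraints."""
    errors = []
    for required in ("region", "bucket", "key-prefix"):
        if not s3_config.get(required):
            errors.append(f"{required} is required")
    # The key prefix acts as a "directory" inside the bucket and must end
    # with a slash.
    if not s3_config.get("key-prefix", "").endswith("/"):
        errors.append("key-prefix must end with '/'")
    return errors
```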

To configure CLP to be able to view compressed log files from S3, you'll need to configure a bucket
where CLP can store intermediate files that the log viewer can open. To do so, update the
`stream_output.storage` key in `<package>/etc/clp-config.yml`:

```yaml
stream_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-streams"  # Or a path of your choosing
    s3_config:
      region: "<aws-region-code>"
      bucket: "<s3-bucket-name>"
      key-prefix: "<s3-key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # stream_output's other config keys
```

The configuration keys above function identically to those in `archive_output.storage`, except they
should be configured to use a different S3 path (i.e., a different key-prefix in the same bucket or
a different bucket entirely).

Contributor (@haiqi96, Jan 21, 2025): We need to mention that log viewing requires the bucket to be
configured with cross-origin (CORS) access permissions. A typical configuration to add is:

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "http://localhost:<LOG_VIEWER_WEBUI_PORT_NUMBER>"
        ],
        "ExposeHeaders": []
    }
]
```

For `AllowedOrigins`, `"*"` is known to work but may be too permissive, and `"http://localhost:*"`
might also work, though it hasn't been tried.
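That "different S3 path" requirement can be expressed as a small check. This is a sketch under the assumption that each config is a mapping with `bucket` and `key-prefix` keys; it is not part of the CLP package:

```python
def share_s3_path(a, b):
    """True if two s3_config mappings point at the same bucket and
    key-prefix; archive_output and stream_output must NOT share one."""
    return (a["bucket"], a["key-prefix"]) == (b["bucket"], b["key-prefix"])


archive = {"bucket": "my-bucket", "key-prefix": "archives/"}
stream = {"bucket": "my-bucket", "key-prefix": "streams/"}
print(share_s3_path(archive, stream))  # → False (a valid configuration)
```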

:::{note}
To view compressed log files, clp-text currently converts them into IR streams that the log viewer
can open, while clp-json converts them into JSONL streams. These streams only need to be stored for
as long as they're being viewed in the log viewer; however, CLP currently doesn't explicitly delete
them. This limitation will be addressed in a future release.
:::

[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability
[release-choices]: http://localhost:8080/user-guide/quick-start-cluster-setup/index.html#choosing-a-release
Contributor: ⚠️ Potential issue: fix the localhost URL in the reference link. The reference link
contains a localhost URL, which won't work in production.

```diff
-[release-choices]: http://localhost:8080/user-guide/quick-start-cluster-setup/index.html#choosing-a-release
+[release-choices]: ../quick-start-cluster-setup/index.md#choosing-a-release
```

16 changes: 16 additions & 0 deletions docs/src/user-guide/index.md
@@ -15,6 +15,13 @@ Quick start
A quick start guide for setting up a CLP cluster, compressing your logs, and searching them.
:::

:::{grid-item-card}
:link: guides-overview
Guides
^^^
Guides for using CLP in a variety of use cases.
:::

:::{grid-item-card}
:link: core-overview
Core
@@ -47,6 +54,15 @@ quick-start-compression/index
quick-start-search/index
:::

:::{toctree}
:hidden:
:caption: Guides
:glob:

guides-overview
guides-using-object-storage
:::

:::{toctree}
:hidden:
:caption: Core
8 changes: 7 additions & 1 deletion docs/src/user-guide/quick-start-compression/json.md
@@ -3,9 +3,15 @@
To compress JSON logs, from inside the package directory, run:

```diff
-sbin/compress.sh --timestamp-key '<timestamp-key>' <path1> [<path2> ...]
+sbin/compress.sh fs --timestamp-key '<timestamp-key>' <path1> [<path2> ...]
```

* `fs` is a subcommand for compressing logs from the filesystem.
:::{tip}
To learn how to compress logs from object storage, see
[Using object storage](../guides-using-object-storage.md).
:::

* `<timestamp-key>` is the field path of the kv-pair that contains the timestamp in each log event.
* E.g., if your log events look like
`{"timestamp": {"iso8601": "2024-01-01 00:01:02.345", ...}}`, you should enter
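To illustrate what a "field path" means here, a dotted timestamp key could be resolved against a parsed log event as below. `get_by_field_path` is a hypothetical helper for illustration, not CLP's implementation:

```python
def get_by_field_path(event, field_path):
    """Resolve a dotted field path (e.g. "timestamp.iso8601") in a parsed
    JSON log event by walking nested dicts."""
    value = event
    for part in field_path.split("."):
        value = value[part]
    return value


event = {"timestamp": {"iso8601": "2024-01-01 00:01:02.345"}}
print(get_by_field_path(event, "timestamp.iso8601"))
# → 2024-01-01 00:01:02.345
```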
2 changes: 1 addition & 1 deletion docs/src/user-guide/quick-start-compression/text.md
@@ -3,7 +3,7 @@
To compress unstructured text logs, from inside the package directory, run:

```diff
-sbin/compress.sh <path1> [<path2> ...]
+sbin/compress.sh fs <path1> [<path2> ...]
```

`<path...>` are paths to unstructured text log files or directories containing such files.