Apply suggestions from code review
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
kolchfa-aws and natebower authored Dec 9, 2024
1 parent 7b81f6a commit ed9b807
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _analyzers/tokenizers/standard.md
@@ -7,7 +7,7 @@ nav_order: 130

# Standard tokenizer

- The `standard` tokenizer is the default tokenizer in OpenSearch. It tokenizes text based on word boundaries using a grammar-based approach that recognizes letters, digits, and other characters like punctuation. It is highly versatile and suitable for many languages, because it follows Unicode text segmentation rules ([UAX#29](https://unicode.org/reports/tr29/)) to break text into tokens.
+ The `standard` tokenizer is the default tokenizer in OpenSearch. It tokenizes text based on word boundaries using a grammar-based approach that recognizes letters, digits, and other characters like punctuation. It is highly versatile and suitable for many languages because it uses Unicode text segmentation rules ([UAX#29](https://unicode.org/reports/tr29/)) to break text into tokens.

## Example usage

@@ -107,5 +107,5 @@ The `standard` tokenizer can be configured with the following parameter.

Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
- `max_token_length` | Optional | Integer | Sets the maximum length of the produced token. If this length is exceeded, the token is split into multiple tokens at length configured in the `max_token_length`. Default is `255`.
+ `max_token_length` | Optional | Integer | Sets the maximum length of the produced token. If this length is exceeded, the token is split into multiple tokens at the length configured in `max_token_length`. Default is `255`.
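
The diff collapses the file's `## Example usage` section, so the following is a minimal sketch rather than the page's own example (the sample text is invented): the `standard` tokenizer can be exercised directly through OpenSearch's `_analyze` API.

```json
POST _analyze
{
  "tokenizer": "standard",
  "text": "OpenSearch splits text on word boundaries, e.g. 2 tokens aren't 3!"
}
```

Each returned token includes its text, start and end offsets, and position. Punctuation such as the exclamation point is dropped, while `aren't` survives as a single token because UAX#29 treats a mid-word apostrophe as part of the word.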

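Likewise, a sketch of setting the `max_token_length` parameter described in the changed table row, using a hypothetical index, tokenizer, and analyzer name:

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "short_standard": {
          "type": "standard",
          "max_token_length": 5
        }
      },
      "analyzer": {
        "short_analyzer": {
          "type": "custom",
          "tokenizer": "short_standard"
        }
      }
    }
  }
}
```

With `max_token_length` set to `5`, a 10-character token such as `OpenSearch` would be split at the configured length into `OpenS` followed by `earch`, matching the behavior the parameter description specifies.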