From ed9b8073efb979902cc636230d84e0756103b86d Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Mon, 9 Dec 2024 12:24:31 -0500
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _analyzers/tokenizers/standard.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_analyzers/tokenizers/standard.md b/_analyzers/tokenizers/standard.md
index 32946dd6af..c10f25802b 100644
--- a/_analyzers/tokenizers/standard.md
+++ b/_analyzers/tokenizers/standard.md
@@ -7,7 +7,7 @@ nav_order: 130
 
 # Standard tokenizer
 
-The `standard` tokenizer is the default tokenizer in OpenSearch. It tokenizes text based on word boundaries using a grammar-based approach that recognizes letters, digits, and other characters like punctuation. It is highly versatile and suitable for many languages, because it follows Unicode text segmentation rules ([UAX#29](https://unicode.org/reports/tr29/)) to break text into tokens.
+The `standard` tokenizer is the default tokenizer in OpenSearch. It tokenizes text based on word boundaries using a grammar-based approach that recognizes letters, digits, and other characters like punctuation. It is highly versatile and suitable for many languages because it uses Unicode text segmentation rules ([UAX#29](https://unicode.org/reports/tr29/)) to break text into tokens.
 
 ## Example usage
 
@@ -107,5 +107,5 @@ The `standard` tokenizer can be configured with the following parameter.
 
 Parameter | Required/Optional | Data type | Description
 :--- | :--- | :--- | :---
-`max_token_length` | Optional | Integer | Sets the maximum length of the produced token. If this length is exceeded, the token is split into multiple tokens at length configured in the `max_token_length`. Default is `255`.
+`max_token_length` | Optional | Integer | Sets the maximum length of the produced token. If this length is exceeded, the token is split into multiple tokens at the length configured in `max_token_length`. Default is `255`.
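
For context on the `max_token_length` behavior described in the edited table row, here is a minimal sketch of trying it against a cluster. The index name `my-index` and the `my_tokenizer`/`my_analyzer` names are illustrative, not taken from the patch:

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_tokenizer": {
          "type": "standard",
          "max_token_length": 5
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      }
    }
  }
}

POST /my-index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "OpenSearch"
}
```

With `max_token_length` set to `5`, the ten-character token `OpenSearch` is split at the configured length, producing `OpenS` and `earch`, which matches the splitting behavior the revised description documents.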