
Add guidance for using transformation during migration #9063

Closed


peternied
Member

Description

Add guidance for migrations that need to use the transformation systems.

Version

The most recent version of the Migration Assistant.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@peternied peternied marked this pull request as draft January 14, 2025 20:43
@peternied
Member Author

@AndreKurait I know you are focusing on the transformation scenarios; this documents what I had locally. Let me know if you have other documentation in mind that would better replace this PR, and I'll close this one out.


During a migration, there is an opportunity to change the names and paths of indexes and data structures that cannot be modified after they are declared. This page provides a cookbook of scenarios and templates that can be used to make these adjustments.

### Rename an index


Do we suspect that this will be a common requirement/ask from customers? I suspect that most customers will either NOT want to rename any of their indices, or if they do, will have very specific reasons and requirements around doing that.

Is this guidance general enough that more than one customer (ideally, more than 10%) would benefit from it? If not, I'd rather keep this in a set of internal runbooks for more targeted communications with users.

Member Author


I think renaming is a common ask, and it's a simple and useful example case. The only more popular request is how to change the shard count, which requires more careful explanation of the calculations, so I would prefer not to start with it.

Is there a different scenario you think we should document instead of this one?


See https://opensearch.atlassian.net/browse/MIGRATIONS-2359, which tracks writing a blog post covering this.

```json
[
  {
    "JsonConditionalTransformerProvider": [
```

@AndreKurait - do you think we'll keep conditional transformers in place once we support writing transformations in scripting languages (e.g. JavaScript)?

Member Author

This code is functional. When JavaScript becomes available, we can always edit this example to be cleaner.

We got approval for the UPL license yesterday, so I'd expect JavaScript to be available imminently (and Jolt transforms will probably be removed in the 2.2 release). Please replace these instructions with JavaScript ones so that this description isn't dead on arrival (DOA) and confusing to customers.

2. Add or update the key `reindexFromSnapshotExtraArgs` to include `--doc-transformer-config-file /shared-logs-output/rfs-transform.json`.
3. Redeploy the Migration Assistant.
4. Navigate to the Migration Assistant console.
5. Create a file with `vim /shared-logs-output/rfs-transform.json`.

We shouldn't recommend that users WRITE a transform into the shared logs directory. It's arguable whether that volume should be writable by the migration console at all. The same argument applies to the other file below.

Member Author

@peternied peternied Jan 15, 2025

Thanks. What is an alternative way to accomplish this that a customer can use today?

If you have a file, here's how you can convert it into JSON and then base64-encode it:

`jq -n --rawfile script1 ~/Downloads/es-load-test.py '{"fileContent": $script1}' | base64`

That could then be pushed into the extra args for the applications (`--transformer-config-base64 ...`). Note that the JSON will need to match what the JsonJSTransformerProvider expects in the PR (see here).

See https://opensearch.atlassian.net/browse/MIGRATIONS-2273 for a better way for users to package this.
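The jq/base64 conversion suggested in the review comment above can be wrapped in a pair of small helpers: one that encodes a file into a single-line base64 value, and one that decodes it back for a sanity check. This is a minimal sketch, assuming bash, jq 1.6 or later (needed for `--rawfile`), and the `{"fileContent": ...}` shape from that comment; the function names are hypothetical, and the exact JSON that JsonJSTransformerProvider expects should be confirmed against the linked PR.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for passing a transformation file as a base64 argument.
# Assumptions: jq >= 1.6 is installed and the consumer expects
# {"fileContent": "<file text>"}. Function names are hypothetical.
set -euo pipefail

# Wrap the raw file text in a JSON object and base64-encode it on one line.
encode_transform_config() {
  local input_file="$1"
  # -n: no input; --rawfile binds the whole file as one JSON string ($script1);
  # -c: compact output, so the JSON itself is a single line.
  jq -c -n --rawfile script1 "$input_file" '{"fileContent": $script1}' \
    | base64 \
    | tr -d '\n'   # GNU base64 wraps long output; strip newlines to keep one token
}

# Reverse the encoding to inspect what an application would receive.
decode_transform_config() {
  printf '%s' "$1" | base64 --decode | jq -r '.fileContent'
}
```

With the helpers loaded, the value could be supplied as `--transformer-config-base64 "$(encode_transform_config my-transform.js)"`, where `my-transform.js` is a placeholder path. Stripping newlines matters because GNU `base64` wraps its output at 76 characters by default, which would otherwise split the value across lines.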
10. Replace both the `{{INDEX_ORIGINAL_NAME}}` and `{{INDEX_NEW_NAME}}` placeholders with the actual index names.
11. Run the metadata migration with the additional parameter: `console metadata migrate --doc-transformer-config-file /shared-logs-output/rfs-transform.json`.
12. Run the backfill as normal.


This doesn't include directions for the Replayer. Is this documentation repo the best place for this?
Maybe this should be an RFC-style GitHub issue, where we can show what we do now and what we'd like to do.
Here's a 'renameIndex' transform that's configured like so (one config that automatically works in all three transform-capable applications).

Member Author


I don't know how to write one. Would you like to add one so we can have both at the same time?


@gregschohn gregschohn Jan 16, 2025


It would be complicated and would possibly need to evolve over time. Index names are used in a lot of places! That's even more reason to have a complete story here - so that we don't confuse users.

If this is one of the most popular transform requests, we should have a direct transformation that provides this functionality. We should start with the interface for the TypeMappingsSanitization transform and refactor some of the common logic for renaming indices. Would you be comfortable if we extended this Jira to track this work?

These instructions are great for showing a customer who is using the system today (or yesterday) how they could do this for part of the system, but documentation can persist permanently once crawlers pick it up. Showing such an incomplete and in-flux setup isn't likely to put our customers in a better spot or to put us in a better spot to support them. If we know that there's a better way to do something, we shouldn't put more light on deprecated solutions.


@peternied
Member Author

Thanks for the reviews. After talking this over, we don't like the user experience, and we are going to come back with documentation after we've cleaned up the experience in the Migration Assistant.

@peternied peternied closed this Jan 17, 2025
Labels
None yet
Projects
None yet

4 participants