
[BUG] Ingest pipeline bulk update issue #16663

Closed
NehaV0307 opened this issue Nov 16, 2024 · 6 comments
Labels
bug, ingest-pipeline, Other, v2.19.0, v3.0.0

Comments

@NehaV0307

Describe the bug

The ingest pipeline runs as expected for single create, index, and update calls.
Bulk create and bulk index also run the pipeline; only bulk update does not.

Related component

Other

To Reproduce

  1. Create the ingest pipeline:

PUT _ingest/pipeline/update_timestamp
{
  "description": "Automatically updates the 'updated' field on insert or update",
  "processors": [
    {
      "set": {
        "field": "updated",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

Output

{
  "acknowledged": true
}

  2. Create the index:

PUT /on_boarding_employees-1
{
  "settings": {
    "index": {
      "default_pipeline": "update_timestamp"
    }
  }
}

Output

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "on_boarding_employees-1"
}

Adding a document:

POST /on_boarding_employees-1/_doc
{
  "type": "ONBOARDING_EMPLOYEE",
  "name": "Rahul"
}

Output

{
  "_index": "on_boarding_employees-1",
  "_id": "9f2pM5MB70XT8uT4kP1K",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Match query Output:

{
  "took": 620,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "on_boarding_employees-1",
        "_id": "9f2pM5MB70XT8uT4kP1K",
        "_score": 1,
        "_source": {
          "name": "Rahul",
          "type": "ONBOARDING_EMPLOYEE",
          "updated": "2024-11-16T06:29:30.826236733Z"
        }
      }
    ]
  }
}

Normal Update:

POST /on_boarding_employees-1/_update/9f2pM5MB70XT8uT4kP1K
{
  "doc": {
    "type": "ONBOARDING_EMPLOYEE_UPDATED"
  }
}

Output

{
  "_index": "on_boarding_employees-1",
  "_id": "9f2pM5MB70XT8uT4kP1K",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

Match query Output:

{
  "took": 268,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "on_boarding_employees-1",
        "_id": "9f2pM5MB70XT8uT4kP1K",
        "_score": 1,
        "_source": {
          "name": "Rahul",
          "type": "ONBOARDING_EMPLOYEE_UPDATED",
          "updated": "2024-11-16T06:33:05.478645288Z"
        }
      }
    ]
  }
}

Bulk Update:

POST /on_boarding_employees-1/_bulk?pipeline=update_timestamp
{"update":{"_id":"9f2pM5MB70XT8uT4kP1K"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE14","name":"Aman2"}}
{"update":{"_id":"9v2xM5MB70XT8uT4uv0x"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE13","name":"Neha"}}

Match query Output:

{
  "took": 777,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "on_boarding_employees-1",
        "_id": "9v2xM5MB70XT8uT4uv0x",
        "_score": 1,
        "_source": {
          "name": "Neha",
          "type": "ONBOARDING_EMPLOYEE13",
          "updated": "2024-11-16T06:38:25.841280080Z"
        }
      },
      {
        "_index": "on_boarding_employees-1",
        "_id": "9f2pM5MB70XT8uT4kP1K",
        "_score": 1,
        "_source": {
          "name": "Aman2",
          "type": "ONBOARDING_EMPLOYEE14",
          "updated": "2024-11-16T06:33:05.478645288Z"
        }
      }
    ]
  }
}

Expected behavior

The bulk update should refresh the updated timestamp, but for document 9f2pM5MB70XT8uT4kP1K it keeps the value written by the earlier single update:
"updated": "2024-11-16T06:33:05.478645288Z"

Additional Details

No response

@NehaV0307 added the bug and untriaged labels Nov 16, 2024
github-actions bot added the Other label Nov 16, 2024
@gaobinlong
Collaborator

gaobinlong commented Nov 19, 2024

Similar issue: #10864. The root cause is that the Update API converts the UpdateRequest into an IndexRequest when the document exists, so the default ingest pipeline is executed, whereas the Bulk API keeps the original UpdateRequest.
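The difference between the two code paths can be modeled in a few lines. This is a toy Python sketch, not OpenSearch code: the in-memory store and the function names (default_pipeline, single_update, bulk_update) are hypothetical, and the model only encodes the behavior described above (an existing document updated through the Update API is re-indexed through the default pipeline; an update item in the Bulk API is not).

```python
from datetime import datetime, timezone

# Toy model only (hypothetical names): documents live in a dict keyed by _id,
# and default_pipeline mirrors the `set` processor of `update_timestamp`.


def default_pipeline(source: dict, now: datetime) -> dict:
    """Stand-in for the set processor: stamp `updated` with the ingest time."""
    return {**source, "updated": now.isoformat()}


def single_update(store: dict, doc_id: str, partial: dict, now: datetime) -> None:
    """Update API path: the merged document is re-sent as an index request,
    so the index's default pipeline runs on it."""
    merged = {**store[doc_id], **partial}
    store[doc_id] = default_pipeline(merged, now)


def bulk_update(store: dict, doc_id: str, partial: dict, now: datetime) -> None:
    """Bulk API path: the item stays an update request, so no pipeline runs.
    `now` is accepted only to keep the signatures parallel; it is unused."""
    store[doc_id] = {**store[doc_id], **partial}


doc_id = "9f2pM5MB70XT8uT4kP1K"
t1 = datetime(2024, 11, 16, 6, 33, 5, tzinfo=timezone.utc)
t2 = datetime(2024, 11, 16, 6, 38, 25, tzinfo=timezone.utc)
store = {doc_id: {"name": "Rahul", "type": "ONBOARDING_EMPLOYEE"}}

single_update(store, doc_id, {"type": "ONBOARDING_EMPLOYEE_UPDATED"}, t1)
bulk_update(store, doc_id, {"type": "ONBOARDING_EMPLOYEE14", "name": "Aman2"}, t2)
print(store[doc_id]["updated"])  # 2024-11-16T06:33:05+00:00, unchanged by the bulk update
```

The model reproduces the symptom in the report: the fields change, but the timestamp stays at the value from the last single update.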

Reading the code, I think ingest pipelines were designed only for index operations, not update operations: the Index API supports the pipeline parameter but the Update API does not, so maybe we should prevent the default ingest pipeline from running in the Update API.

For this use case, I've looked for a workaround. One option is to use a Painless script to set the updated field, like this:

POST /on_boarding_employees-1/_update/1
{
  "script": {
    "source": "ctx._source.updated =ctx._now;ctx._source.type=params.type",
    "params": {
      "type": "ONBOARDING_EMPLOYEE_UPDATED"
    }
  }
}

or 

POST /on_boarding_employees-1/_bulk
{"update":{"_id":"1"}}
{"script":{"source":"ctx._source.updated =ctx._now;ctx._source.type=params.type","params":{"type":"ONBOARDING_EMPLOYEE_UPDATED"}}}
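To apply the scripted workaround to many documents at once, the bulk body can be generated instead of written by hand. A minimal Python sketch using only the standard library; build_scripted_bulk_body is a hypothetical helper, and the script source is the one from the comment above. As far as I know, ctx._now in an update script holds the current time in epoch milliseconds, not an ISO-8601 string like {{_ingest.timestamp}}.

```python
import json

# Painless source from the workaround above: stamp `updated` with the update
# time (ctx._now) and set `type` from params.
SCRIPT_SOURCE = "ctx._source.updated = ctx._now; ctx._source.type = params.type"


def build_scripted_bulk_body(updates):
    """Build an NDJSON _bulk body with one scripted update per (doc_id, new_type).

    Hypothetical helper: each action line is followed by its script line, and
    the body ends with the trailing newline that the _bulk API expects.
    """
    lines = []
    for doc_id, new_type in updates:
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({
            "script": {"source": SCRIPT_SOURCE, "params": {"type": new_type}}
        }))
    return "\n".join(lines) + "\n"


body = build_scripted_bulk_body([
    ("9f2pM5MB70XT8uT4kP1K", "ONBOARDING_EMPLOYEE14"),
    ("9v2xM5MB70XT8uT4uv0x", "ONBOARDING_EMPLOYEE13"),
])
print(body)
```

The resulting string would be sent as the body of POST /on_boarding_employees-1/_bulk; because the script sets the field itself, it does not depend on whether the Bulk API runs the default pipeline for update items.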

@andrross @macohen @reta what do you think about this?

@reta
Collaborator

reta commented Nov 19, 2024

Thanks @gaobinlong for looking into it

> Reading the code, I think ingest pipelines were designed only for index operations, not update operations: the Index API supports the pipeline parameter but the Update API does not, so maybe we should prevent the default ingest pipeline from running in the Update API.

I found a long thread on the matter [1]. TL;DR: the Update API does not support ingest pipelines; we should probably document that (and prevent it if possible).

[1] elastic/elasticsearch#17895

@gaobinlong
Collaborator

Thanks @reta, I've created a documentation issue for this and will open a PR later.

For the code, does it make sense to return a deprecation warning from the Update API in 2.x and then remove the support in 3.0.0? Removing it outright may be a breaking change for some users.

@reta
Collaborator

reta commented Nov 21, 2024

Thanks @gaobinlong

> Thanks @reta, I've created a documentation issue for this and will open a PR later.

👍

> For the code, does it make sense to return a deprecation warning from the Update API in 2.x and then remove the support in 3.0.0?

But this functionality does not work, does it?

@gaobinlong
Collaborator

> But this functionality does not work, does it?

For the Update API we cannot specify a pipeline explicitly, but if the index has a default or final pipeline and the target document exists, that pipeline is executed. This behavior is unexpected and inconsistent with bulk update, where the default or final pipeline never gets a chance to run.

Preventing the default or final pipeline from running in the Update API is possible, but it's a breaking change IMO, so we could first return a warning header to users.

@reta
Collaborator

reta commented Nov 22, 2024

> Preventing the default or final pipeline from running in the Update API is possible, but it's a breaking change IMO, so we could first return a warning header to users.

Got it now, thanks, it makes sense to me.

Projects
None yet
Development

No branches or pull requests

3 participants