Free heap allocated Datums after the flush #2469

Open · wants to merge 2 commits into base: BABEL_4_X_DEV
Conversation

staticlibs (Contributor)

Description

When a BCP import is performed, Datums are created for all non-NULL incoming fields. Some of these Datums are allocated on the heap, and they need to be kept in memory until the call to `table_multi_insert`.

It does not appear possible to distinguish heap-allocated Datums from pass-by-value Datums using only the pointer to the Datum. The proposed patch keeps track of all heap-allocated Datums (similarly to how isNull is tracked) so they can be freed immediately after the flush.

Pointers to the Datums and their isAllocated flags are kept in growing lists in BulkCopyStateData. Pointers from each incoming batch are appended there before processing, and the Datums are freed and the lists trimmed after each flush.
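
A minimal sketch of this lifecycle using PostgreSQL's list API (the `bufferedValues`/`bufferedValueAllocFlags` field names appear in the diff below; the loop bounds and the flush site are illustrative, not the exact patch):

```c
/* Buffer each Datum together with a flag recording whether it
 * was heap-allocated. */
for (int i = 0; i < rowCount * colCount; i++)
{
	cstate->bufferedValues =
		lappend(cstate->bufferedValues, (void *) Values[i]);
	cstate->bufferedValueAllocFlags =
		lappend_int(cstate->bufferedValueAllocFlags, ValueAllocFlags[i]);
}

/* After table_multi_insert() has flushed the buffered tuples,
 * free only the Datums that were flagged as heap-allocated. */
ListCell   *lcv;
ListCell   *lcf;

forboth(lcv, cstate->bufferedValues, lcf, cstate->bufferedValueAllocFlags)
{
	if (lfirst_int(lcf))
		pfree(lfirst(lcv));
}
list_free(cstate->bufferedValues);
list_free(cstate->bufferedValueAllocFlags);
cstate->bufferedValues = NIL;
cstate->bufferedValueAllocFlags = NIL;
```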

There are also two unrelated memory-cleanup changes included: freeing the contents of TDSRequestBulkLoadData->rowData on the TDS side, and freeing the attnums list on the TSQL side. attnums is created for every batch (presumably for error checking) but is only used for the first incoming batch.
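
Roughly, the two cleanups amount to the following sketch (names not visible in the diff, such as `colCount`, are assumptions; the actual patch may differ):

```c
/* TDS side: free the heap-allocated column values of a consumed row */
for (int i = 0; i < colCount; i++)
{
	if (rowData->isAllocated[i])
		pfree(DatumGetPointer(rowData->columnValues[i]));
}

/* TSQL side: release the per-batch attnums list instead of leaking it */
list_free(attnums);
attnums = NIL;
```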

Memory usage when importing 1 million varchar records (see details in the linked issue), without the patch:

Figure_1

With the patch applied:

Figure_2

Issues Resolved

#2468

Test Scenarios Covered

There are no functional changes, so no new tests.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Alex Kasko [email protected]

@KushaalShroff self-requested a review April 8, 2024 06:32
staticlibs added a commit to wiltondb/babelfish_extensions that referenced this pull request Apr 14, 2024
for (int i = 0; i < rowCount * colCount; i++)
{
	/* buffer the Datum pointer and its allocation flag until the next flush */
	cstate->bufferedValues = lappend(cstate->bufferedValues, (void *) Values[i]);
	cstate->bufferedValueAllocFlags = lappend_int(cstate->bufferedValueAllocFlags, ValueAllocFlags[i] ? 1 : 0);
}

Contributor:

Why can't we directly store booleans in cstate->bufferedValueAllocFlags instead of integers?

Contributor Author:

Just wanted to be explicit with the T_IntList usage. Updated this line to remove the int conversion.

Comment on lines 724 to 734
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_NCHAR:
	case TDS_TYPE_NVARCHAR:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_BINARY:
	case TDS_TYPE_VARBINARY:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		rowData->isAllocated[i] = true;

Contributor:

Common condition in all of the cases, can we move it out?

Contributor Author:

Moved it out of the condition and added a comment.
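
For illustration, the hoisted shape could look roughly like this (a sketch only; `colType` and the subset of cases shown are assumptions, and the actual patch may differ):

```c
switch (colType)
{
	case TDS_TYPE_NCHAR:
	case TDS_TYPE_NVARCHAR:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		break;
	case TDS_TYPE_BINARY:
	case TDS_TYPE_VARBINARY:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		break;
	/* ... remaining variable-length types ... */
}
/* Common to all of the cases above: the conversion palloc'd the value */
rowData->isAllocated[i] = true;
```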

Comment on lines 804 to 812
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_NTEXT:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_IMAGE:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		rowData->isAllocated[i] = true;

Contributor:

Common condition between all cases, can we move it out?

Contributor Author:

Moved it out of the condition and added a comment.

shardgupta pushed a commit that referenced this pull request Jul 19, 2024
In `ExecuteBulkCopy`, indices on the target table are opened for every incoming batch but never closed.

It is proposed instead to open the indices for every executor batch, just before the call to `table_multi_insert`, and to close them once they have been updated with the tuples from that batch.

Backend memory usage (with both this patch and the #2469 patch included) while importing 16 million varchar records (4x the table from the linked issue):

![Figure_3](https://github.com/babelfish-for-postgresql/babelfish_extensions/assets/9497509/4b36faab-79d7-4a82-aef1-8404a3ee2c49)

Signed-off-by: Alex Kasko <[email protected]>
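
For orientation, a minimal sketch of the open/close pattern this commit describes, using PostgreSQL's executor index APIs (variable names such as `resultRelInfo` and the slot bookkeeping are illustrative, not the exact patch):

```c
/* Open indices only around the flush of this executor batch ... */
ExecOpenIndices(resultRelInfo, false);

/* Flush the buffered slots in a single multi-insert. */
table_multi_insert(rel, slots, nslots, mycid, ti_options, bistate);

/* ... ExecInsertIndexTuples() for each buffered slot ... */

/* ... and close them once this batch's tuples are indexed. */
ExecCloseIndices(resultRelInfo);
```
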
tanscorpio7 pushed a commit to tanscorpio7/babelfish_extensions that referenced this pull request Jul 23, 2024
shardgupta pushed a commit that referenced this pull request Jul 23, 2024
sharathbp pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Aug 20, 2024