Free heap allocated Datums after the flush #2469
base: BABEL_4_X_DEV
Conversation
Signed-off-by: Alex Kasko <[email protected]>
```c
for (int i = 0; i < rowCount * colCount; i++)
{
	cstate->bufferedValues = lappend(cstate->bufferedValues, (void *) Values[i]);
	cstate->bufferedValueAllocFlags = lappend_int(cstate->bufferedValueAllocFlags, ValueAllocFlags[i] ? 1 : 0);
}
```
Why can't we directly store booleans in `cstate->bufferedValueAllocFlags` instead of integers?
Just wanted to be explicit about `T_IntList` usage. Updated this line to remove the int conversion.
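The bookkeeping under discussion — one growing list of buffered value pointers plus a parallel list of "heap allocated?" flags — can be sketched in self-contained C. Plain dynamic arrays stand in for the PostgreSQL lists, and the names are illustrative, not the actual Babelfish code; the `allocated ? 1 : 0` step mirrors the int coding the review comment asks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Plain-C stand-in for the two parallel lists kept in BulkCopyStateData:
 * one of buffered value pointers, one of allocation flags.  With an
 * integer list (as lappend_int builds), the flag is coded 0/1 on the way
 * in; a boolean list would take it directly. */
typedef struct
{
	void **values;
	int   *allocFlags;   /* int cells, like a T_IntList */
	int    len;
	int    cap;
} BufferedState;

static void
buffer_value(BufferedState *st, void *value, bool allocated)
{
	if (st->len == st->cap)
	{
		st->cap = st->cap ? st->cap * 2 : 8;
		st->values = realloc(st->values, st->cap * sizeof(void *));
		st->allocFlags = realloc(st->allocFlags, st->cap * sizeof(int));
	}
	st->values[st->len] = value;
	st->allocFlags[st->len] = allocated ? 1 : 0;  /* explicit int coding */
	st->len++;
}
```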
```c
			rowData->isAllocated[i] = true;
			break;
		case TDS_TYPE_NCHAR:
		case TDS_TYPE_NVARCHAR:
			rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
			rowData->isAllocated[i] = true;
			break;
		case TDS_TYPE_BINARY:
		case TDS_TYPE_VARBINARY:
			rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
			rowData->isAllocated[i] = true;
```
Common condition in all of the cases; can we move it out?
Moved it out of the condition and added a comment.
```c
			rowData->isAllocated[i] = true;
			break;
		case TDS_TYPE_NTEXT:
			rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
			rowData->isAllocated[i] = true;
			break;
		case TDS_TYPE_IMAGE:
			rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
			rowData->isAllocated[i] = true;
```
Common condition across all of the cases; can we move it out?
Moved it out of the condition and added a comment.
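The refactor the reviewer asks for — hoisting the repeated `isAllocated[i] = true` out of the individual switch cases — can be sketched in self-contained C. The type tags and converter below are stand-ins, not the actual TDS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the TDS type tags and a converter. */
enum { TDS_TYPE_NCHAR, TDS_TYPE_NVARCHAR, TDS_TYPE_BINARY,
	   TDS_TYPE_VARBINARY, TDS_TYPE_NULL };

static intptr_t to_datum(int type, intptr_t raw) { return raw + type; }

/* Instead of repeating the allocation flag inside every case, set it
 * once for the group of heap-backed cases and return it. */
static bool
convert_column(int type, intptr_t raw, intptr_t *valueOut)
{
	bool allocated = false;
	switch (type)
	{
		case TDS_TYPE_NCHAR:
		case TDS_TYPE_NVARCHAR:
		case TDS_TYPE_BINARY:
		case TDS_TYPE_VARBINARY:
			*valueOut = to_datum(type, raw);
			allocated = true;   /* common to every heap-backed case */
			break;
		default:
			*valueOut = 0;
			break;
	}
	return allocated;
}
```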
Signed-off-by: Alex Kasko <[email protected]>
In `ExecuteBulkCopy`, indices on the target table are opened for every incoming batch, but never closed. It is proposed to open indices instead for every executor batch, just before the call to `table_multi_insert`, and close them after they are updated with tuples from this batch.

Backend memory usage (with both this patch and the #2469 patch included) while importing 16 million varchar records (4x the table from the linked issue):

![Figure_3](https://github.com/babelfish-for-postgresql/babelfish_extensions/assets/9497509/4b36faab-79d7-4a82-aef1-8404a3ee2c49)

Signed-off-by: Alex Kasko <[email protected]>
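The shape of that fix — pairing each open with a close around every executor batch, instead of opening once per incoming batch and never closing — can be sketched with plain C stand-ins. The real patch uses PostgreSQL's executor index machinery around `table_multi_insert`; the counters and names below are illustrative only.

```c
#include <assert.h>

static int opened, closed;

static void open_indices(void)  { opened++; }   /* stand-in */
static void close_indices(void) { closed++; }   /* stand-in */
static void multi_insert(void)  { /* stand-in for table_multi_insert */ }

/* Before the fix, indices were opened per batch and never closed, so
 * open descriptors accumulated for the whole copy.  After the fix,
 * open/close bracket each executor batch. */
static void
run_copy(int batches)
{
	for (int i = 0; i < batches; i++)
	{
		open_indices();
		multi_insert();
		close_indices();   /* the matching close the patch adds */
	}
}
```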
Description
When a BCP import is performed, `Datum`s are created for all non-NULL incoming fields. Some of these `Datum`s are allocated on the heap, and they need to be kept in memory until the call to `table_multi_insert`. It does not seem possible to distinguish a heap-allocated `Datum` from a passed-by-value `Datum` using only the pointer to that `Datum`. The proposed patch keeps track of all heap-allocated `Datum`s (in a manner similar to how `isNull` is tracked) so they can be freed immediately after the flush.

Pointers to `Datum`s and `isAllocated` flags are kept in `BulkCopyStateData` in growing lists. Pointers from the incoming batch are appended there before processing, and these `Datum`s are freed and the lists trimmed after each flush.

There are also two unrelated memory cleanup changes included: freeing of `TDSRequestBulkLoadData->rowData` contents on the TDS side, and freeing of the `attnums` list on the TSQL side. `attnums` is created for every batch (I assume this is needed for error checking), but only used for the first incoming batch.

Memory usage when importing 1 million `varchar`s (see details in the linked issue), without the patch:

With the patch applied:
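The free-after-flush step described above can be sketched in self-contained C. Plain arrays stand in for the growing lists in `BulkCopyStateData`, and the names are illustrative, not the actual patch: after a flush, both lists are walked in step, only the entries flagged as heap-allocated are freed, and the lists are trimmed back to empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the buffered-value lists kept in BulkCopyStateData. */
static void *bufferedValues[64];
static bool  bufferedAllocFlags[64];
static int   bufferedCount;

/* After table_multi_insert flushes a batch: free only the entries
 * flagged as heap-allocated (pass-by-value entries are left alone),
 * then trim both lists. */
static int
free_buffered_after_flush(void)
{
	int freed = 0;
	for (int i = 0; i < bufferedCount; i++)
	{
		if (bufferedAllocFlags[i])
		{
			free(bufferedValues[i]);
			bufferedValues[i] = NULL;
			freed++;
		}
	}
	bufferedCount = 0;   /* "trim" both lists back to empty */
	return freed;
}
```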
Issues Resolved
#2468
Test Scenarios Covered
There are no functional changes, so no new tests.
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Signed-off-by: Alex Kasko [email protected]