Free heap allocated Datums after the flush #2469

Open · wants to merge 2 commits into base: BABEL_4_X_DEV
Conversation

staticlibs (Contributor)

Description

When a BCP import is performed, Datums are created for all non-NULL incoming fields. Some of these Datums are allocated on the heap, and they need to be kept in memory until the call to `table_multi_insert`.

It does not appear possible to distinguish heap-allocated Datums from pass-by-value Datums using only the pointer to the Datum. The proposed patch keeps track of all heap-allocated Datums (similarly to how isNull is tracked) so they can be freed immediately after the flush.

Pointers to the Datums and their isAllocated flags are kept in growing lists in BulkCopyStateData. Pointers from each incoming batch are appended there before processing, and the Datums are freed and the lists trimmed after each flush.
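
A minimal sketch of this lifecycle using PostgreSQL's list API (the `bufferedValues`/`bufferedValueAllocFlags` field names appear in the diff below; the loop bounds and the flush site are illustrative, not the exact patch):

```c
/* Buffer each Datum together with a flag recording whether it
 * was heap-allocated. */
for (int i = 0; i < rowCount * colCount; i++)
{
	cstate->bufferedValues =
		lappend(cstate->bufferedValues, (void *) Values[i]);
	cstate->bufferedValueAllocFlags =
		lappend_int(cstate->bufferedValueAllocFlags, ValueAllocFlags[i]);
}

/* After table_multi_insert() has flushed the buffered tuples,
 * free only the Datums that were flagged as heap-allocated. */
ListCell   *lcv;
ListCell   *lcf;

forboth(lcv, cstate->bufferedValues, lcf, cstate->bufferedValueAllocFlags)
{
	if (lfirst_int(lcf))
		pfree(lfirst(lcv));
}
list_free(cstate->bufferedValues);
list_free(cstate->bufferedValueAllocFlags);
cstate->bufferedValues = NIL;
cstate->bufferedValueAllocFlags = NIL;
```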

There are also two unrelated memory-cleanup changes included: freeing the contents of TDSRequestBulkLoadData->rowData on the TDS side, and freeing the attnums list on the TSQL side. attnums is created for every batch (presumably for error checking) but is only used for the first incoming batch.
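
Roughly, the two cleanups amount to the following sketch (names not visible in the diff, such as `colCount`, are assumptions; the actual patch may differ):

```c
/* TDS side: free the heap-allocated column values of a consumed row */
for (int i = 0; i < colCount; i++)
{
	if (rowData->isAllocated[i])
		pfree(DatumGetPointer(rowData->columnValues[i]));
}

/* TSQL side: release the per-batch attnums list instead of leaking it */
list_free(attnums);
attnums = NIL;
```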

Memory usage when importing 1 million varchar records (see details in the linked issue), without the patch:

Figure_1

With the patch applied:

Figure_2

Issues Resolved

#2468

Test Scenarios Covered

There are no functional changes, so no new tests.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Alex Kasko [email protected]

@KushaalShroff self-requested a review April 8, 2024 06:32
staticlibs added a commit to wiltondb/babelfish_extensions that referenced this pull request Apr 14, 2024
for (int i = 0; i < rowCount * colCount; i++)
{
	/* buffer the Datum pointer and its allocation flag until the next flush */
	cstate->bufferedValues = lappend(cstate->bufferedValues, (void *) Values[i]);
	cstate->bufferedValueAllocFlags = lappend_int(cstate->bufferedValueAllocFlags, ValueAllocFlags[i] ? 1 : 0);
}

Contributor:

Why can't we directly store booleans in cstate->bufferedValueAllocFlags instead of integers?

Contributor Author:

Just wanted to be explicit with the T_IntList usage. Updated this line to remove the int conversion.

Comment on lines 724 to 734
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_NCHAR:
	case TDS_TYPE_NVARCHAR:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_BINARY:
	case TDS_TYPE_VARBINARY:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		rowData->isAllocated[i] = true;

Contributor:

Common condition in all of the cases, can we move it out?

Contributor Author:

Moved it out of the condition and added a comment.
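
For illustration, the hoisted shape could look roughly like this (a sketch only; `colType` and the subset of cases shown are assumptions, and the actual patch may differ):

```c
switch (colType)
{
	case TDS_TYPE_NCHAR:
	case TDS_TYPE_NVARCHAR:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		break;
	case TDS_TYPE_BINARY:
	case TDS_TYPE_VARBINARY:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		break;
	/* ... remaining variable-length types ... */
}
/* Common to all of the cases above: the conversion palloc'd the value */
rowData->isAllocated[i] = true;
```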

Comment on lines 804 to 812
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_NTEXT:
		rowData->columnValues[i] = TdsTypeNCharToDatum(temp);
		rowData->isAllocated[i] = true;
		break;
	case TDS_TYPE_IMAGE:
		rowData->columnValues[i] = TdsTypeVarbinaryToDatum(temp);
		rowData->isAllocated[i] = true;

Contributor:

Common condition between all cases, can we move it out?

Contributor Author:

Moved it out of the condition and added a comment.

shardgupta pushed a commit that referenced this pull request Jul 19, 2024
In `ExecuteBulkCopy`, indices on the target table are opened for every incoming batch but never closed.

It is proposed instead to open the indices for every executor batch, just before the call to `table_multi_insert`, and to close them once they have been updated with the tuples from that batch.

Backend memory usage (with both this patch and the #2469 patch included) while importing 16 million varchar records (4x the table from the linked issue):

![Figure_3](https://github.com/babelfish-for-postgresql/babelfish_extensions/assets/9497509/4b36faab-79d7-4a82-aef1-8404a3ee2c49)

Signed-off-by: Alex Kasko <[email protected]>
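
For orientation, a minimal sketch of the open/close pattern this commit describes, using PostgreSQL's executor index APIs (variable names such as `resultRelInfo` and the slot bookkeeping are illustrative, not the exact patch):

```c
/* Open indices only around the flush of this executor batch ... */
ExecOpenIndices(resultRelInfo, false);

/* Flush the buffered slots in a single multi-insert. */
table_multi_insert(rel, slots, nslots, mycid, ti_options, bistate);

/* ... ExecInsertIndexTuples() for each buffered slot ... */

/* ... and close them once this batch's tuples are indexed. */
ExecCloseIndices(resultRelInfo);
```
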
tanscorpio7 pushed a commit to tanscorpio7/babelfish_extensions that referenced this pull request Jul 23, 2024
shardgupta pushed a commit that referenced this pull request Jul 23, 2024
sharathbp pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Aug 20, 2024