BCP import - open and close indices for every batch (babelfish-for-postgresql#2487)

In `ExecuteBulkCopy`, indices on the target table are opened for every incoming batch but are never closed.

Instead, it is proposed to open the indices for every executor batch, just before the call to `table_multi_insert`, and to close them after they have been updated with the tuples from that batch.
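
For illustration, here is a minimal sketch of that per-batch flow. It is not the literal patch: the helper name `flush_one_batch` is hypothetical, the `ExecInsertIndexTuples` call is elided because its argument list differs across PostgreSQL versions, and error handling is omitted.

```c
#include "postgres.h"

#include "access/heapam.h"
#include "access/tableam.h"
#include "executor/executor.h"
#include "executor/tuptable.h"

/* Hypothetical helper showing the per-batch open/insert/close sequence. */
static void
flush_one_batch(ResultRelInfo *resultRelInfo, EState *estate,
				TupleTableSlot **slots, int nused,
				CommandId mycid, int ti_options, BulkInsertState bistate)
{
	MemoryContext oldcontext;
	int			i;

	/* Open the target table's indices just for this batch. */
	ExecOpenIndices(resultRelInfo, false);

	/* table_multi_insert may leak memory, so run it in the per-tuple context. */
	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
	table_multi_insert(resultRelInfo->ri_RelationDesc, slots, nused,
					   mycid, ti_options, bistate);

	for (i = 0; i < nused; i++)
	{
		/* When indices exist, ExecInsertIndexTuples(...) updates them here. */
		ExecClearTuple(slots[i]);
	}

	/* ExecInsertIndexTuples may leak too, so switch back only after the loop. */
	MemoryContextSwitchTo(oldcontext);

	/* Close the indices and free the arrays ExecCloseIndices leaves behind. */
	ExecCloseIndices(resultRelInfo);
	for (i = 0; i < resultRelInfo->ri_NumIndices; i++)
		pfree(resultRelInfo->ri_IndexRelationInfo[i]);
	if (resultRelInfo->ri_NumIndices > 0)
	{
		pfree(resultRelInfo->ri_IndexRelationInfo);
		pfree(resultRelInfo->ri_IndexRelationDescs);
		/* Make the later ExecCloseIndices in EndBulkCopy a no-op. */
		resultRelInfo->ri_NumIndices = 0;
	}
}
```

Freeing the arrays and resetting `ri_NumIndices` here is what keeps the additional `ExecCloseIndices` call in `EndBulkCopy->ExecCloseResultRelations` from touching the already-closed indices.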

Backend memory usage (with both this patch and the babelfish-for-postgresql#2469 patch applied) while importing 16 million varchar records (4x the table from the linked issue):

![Figure_3](https://github.com/babelfish-for-postgresql/babelfish_extensions/assets/9497509/4b36faab-79d7-4a82-aef1-8404a3ee2c49)

Signed-off-by: Alex Kasko <[email protected]>
staticlibs authored and tanscorpio7 committed Jul 23, 2024
1 parent 30bff19 commit 2d7e5c1
Showing 1 changed file with 32 additions and 3 deletions.
contrib/babelfishpg_tsql/src/pltsql_bulkcopy.c (32 additions, 3 deletions)
@@ -350,6 +350,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
*/
save_cur_lineno = cstate->cur_rowno;

/* Open indices to update them after the multi insert */
ExecOpenIndices(resultRelInfo, false);
/*
* table_multi_insert may leak memory, so switch to short-lived memory
* context before calling it.
@@ -361,7 +363,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
mycid,
ti_options,
buffer->bistate);
MemoryContextSwitchTo(oldcontext);

for (i = 0; i < nused; i++)
{
@@ -383,6 +384,36 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
ExecClearTuple(slots[i]);
}

/*
 * ExecInsertIndexTuples also leaks memory, so switch back to the old
 * context only after it.
 */
MemoryContextSwitchTo(oldcontext);

/* Close the indices we opened before the multi insert */
ExecCloseIndices(resultRelInfo);

/*
 * ExecCloseIndices frees neither the resulting arrays allocated in
 * ExecOpenIndices nor their contents. Instead of moving the open/close
 * calls into the short-lived context, clean them up explicitly here, so
 * that index open/close can be untied from batch handling in the future
 * if needed.
 *
 * There is an additional call to ExecCloseIndices in
 * EndBulkCopy->ExecCloseResultRelations; we reset ri_NumIndices to make
 * it a no-op.
 */
if (resultRelInfo->ri_NumIndices > 0)
{
for (i = 0; i < resultRelInfo->ri_NumIndices; i++)
{
pfree(resultRelInfo->ri_IndexRelationInfo[i]);
}
pfree(resultRelInfo->ri_IndexRelationInfo);
pfree(resultRelInfo->ri_IndexRelationDescs);
resultRelInfo->ri_NumIndices = 0;
}

/* Mark that all slots are free. */
buffer->nused = 0;

@@ -587,8 +618,6 @@ ExecuteBulkCopy(BulkCopyState cstate, int rowCount, int colCount,
RelationGetRelationName(cstate->rel))));
}

ExecOpenIndices(cstate->resultRelInfo, false);

econtext = GetPerTupleExprContext(cstate->estate);

/* Set up callback to identify error line number. */
