Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support RETURNING #578

Merged
merged 6 commits into from
Jul 1, 2016
Merged

Support RETURNING #578

merged 6 commits into from
Jul 1, 2016

Conversation

anarazel
Copy link
Contributor

@anarazel anarazel commented Jun 3, 2016

Most of the code changes aren't really about returning itself, but about making it easy to support RETURNING. I've tried to break it down to individually review-able commits. There's one bigger commit "Combine router executor paths for select and modify commands." which I previously had split up further; I can do that again.

Fixes #242

@anarazel
Copy link
Contributor Author

anarazel commented Jun 3, 2016

I think handling of node failures is the part requiring most testing, since I've whacked that around; and it's pretty much not covered in the regression tests.

@onderkalaci
Copy link
Contributor

Thanks for splitting the commits, it was very useful.

I've one high-level question on the change. We execute the RETURNING on the workers, and use rows that return from one of the workers.
Once we have Evaluating functions on the master, are we going to get nextval() from the workers or from the master when users issue RETURNING on a serial? If we are to use the worker sequences, would that become a problem when we DROP/CREATE the placements via shard rebalancer and/or master_copy_shard_placement()?

Also, there seems to be a bug, which I couldn't find where to write on the changes, so I add to here. When we define cursors on a router query, I observe an unexpected behaviour once the data finishes on the cursor:

SELECT *
        FROM multiple_hash
        WHERE category = '0'
        ORDER BY category;

 category | data 
----------+------
 0        | 1-1
 0        | 2-1
 0        | 3-1
 0        | 4-1
 0        | 5-1
 0        | 6-1
(6 rows)

-- now go with the cursors
BEGIN;
DECLARE test_cursor CURSOR FOR 
    SELECT *
        FROM multiple_hash
        WHERE category = '0'
        ORDER BY category;
FETCH FORWARD 5 test_cursor;
 category | data 
----------+------
 0        | 1-1
 0        | 2-1
 0        | 3-1
 0        | 4-1
 0        | 5-1
(5 rows)

 FETCH FORWARD 5 test_cursor;
 category | data 
----------+------
 0        | 6-1
(1 row)

FETCH FORWARD 5 test_cursor;
ERROR:  scan directions other than forward scans are unsupported

The scan direction is appeared as NoMovementScanDirection. On regular tables (or real-time executor), we don't hit the above error. Is this issue related to this PR?

ExecuteSingleShardSelect(QueryDesc *queryDesc, uint64 tupleCount, Task *task,
EState *executorState, TupleDesc tupleDescriptor,
DestReceiver *destination)
static int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It looks like we need to return uint. Also, this function does not use executorState parameter. You may want to remove that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/
-static void
-ExecuteSingleShardSelect(QueryDesc *queryDesc, uint64 tupleCount, Task *task,

  •                    EState *executorState, TupleDesc tupleDescriptor,
    
  •                    DestReceiver *destination)
    
    +static int64

Minor: It looks like we need to return uint.

There's a related discussion (#554), and it looks like it's going to be
int64.

Also, this function does not use executorState parameter. You may
want to remove that.

Sound sensible.

@anarazel
Copy link
Contributor Author

anarazel commented Jun 8, 2016

I've one high-level question on the change. We execute the RETURNING
on the workers, and use rows that return from one of the workers.
Once we have Evaluating functions on the master, are we going to get
nextval() from the workers or from the master when users issue
RETURNING on a serial? If we are to use the worker sequences, would
that become a problem when we DROP/CREATE the placements via shard
rebalancer and/or master_copy_shard_placement()?

It's going to come from the master.

Also, there seems to be a bug, which I couldn't find where to write on
the changes, so I add to here. When we define cursors on a router
query, I observe an unexpected behaviour once the data finishes on the
cursor:

Hm, that's rather likely not related to this change. Afaics that
behaviour already existed. We need a specific check for
NoMovementDirection, like in standard_ExecutorRun:
/*
* run plan
*/
if (!ScanDirectionIsNoMovement(direction))
ExecutePlan(estate,
queryDesc->planstate,
queryDesc->plannedstmt->parallelModeNeeded,
operation,
sendTuples,
count,
direction,
dest);

@onderkalaci
Copy link
Contributor

Just a note, the issue with Cursors I mentioned here doesn't show up on the master branch.

* distributed table is successful. ExecuteDistributedModify returns the number
* of modified rows in that case and errors in all others. This function will
* also generate warnings for individual placement failures.
* ExecuteTaskAndStoreResults executes the task on the remote node, retrieves
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExecuteTaskAndStoreResults: It looks like the name and comment are obsolete. This function may not store the results anymore in case of ConsumeQueryResults. So, maybe change the name to ExecuteRouterTask or so?

@onderkalaci
Copy link
Contributor

@anarazel sorry for being late on the review. I added couple of notes for naming etc.. Other than that, the only issue I came across is the cursor issue mentioned above.

(As you mentioned node failure testing, I did lots of failure testing. All looks good.)

After you consider my comments, I'd like to do a final review and testing.

@lfittl lfittl mentioned this pull request Jun 11, 2016
11 tasks
@sumedhpathak sumedhpathak added this to the 5.2 Release milestone Jun 20, 2016
@anarazel anarazel force-pushed the feature/returning branch from bfbe4c8 to 3ff1eda Compare June 22, 2016 18:52
@anarazel
Copy link
Contributor Author

Hi Onder. I can actually reproduce the CURSOR issue on master without any problem. Which isn't surprising, given that the relevant code didn't change. Could you verify that you can also reproduce it there, and open an issue if so?

@anarazel anarazel force-pushed the feature/returning branch from 3ff1eda to 4922432 Compare June 22, 2016 21:37
@anarazel
Copy link
Contributor Author

@onderkalaci I think this should be ready to go, what do you think?

@onderkalaci
Copy link
Contributor

Hi @anarazel,
I partially reproduce the issue with cursors on the master branch. See the steps below:

CREATE TABLE multiple_hash
(
    category text NOT NULL,
    data text NOT NULL 
);

INSERT INTO multiple_hash VALUES ('1', '1');
INSERT INTO multiple_hash VALUES ('1', '2');
INSERT INTO multiple_hash VALUES ('1', '3');
INSERT INTO multiple_hash VALUES ('1', '4');
INSERT INTO multiple_hash VALUES ('1', '5');
INSERT INTO multiple_hash VALUES ('1', '6');
INSERT INTO multiple_hash VALUES ('1', '7');

BEGIN;
DECLARE test_cursor CURSOR FOR 
    SELECT *
        FROM multiple_hash
        WHERE category = '0'
        ORDER BY category;

-- the same error exists on both `master` and `feature/returning` branches:
FETCH FORWARD 4 test_cursor;
FETCH FORWARD 4 test_cursor;
ERROR:  scan directions other than forward scans are unsupported


BEGIN;
DECLARE test_cursor CURSOR FOR 
    SELECT *
        FROM multiple_hash
        WHERE category = '1'
        ORDER BY category;

FETCH FORWARD 4 test_cursor;
FETCH FORWARD 4 test_cursor;

-- for the following commands error happens on `feature/returning` branch 
-- and not in `master` branch
FETCH FORWARD 4 test_cursor; 
FETCH FORWARD 4 test_cursor;

Though I couldn't locate exactly which change has led to the cursor issue, postgres changes the scan direction at this point. And portal->atEnd is set to true just below it.

Reading this comment makes me think that we should only error out when scan direction is BackwardScanDirection and allow both NoMovementScanDirection and ForwardScanDirection on router executor.

Since this patch introduces a small error that doesn't exist in master, I'm leaving the decision to you. I could either open another issue or we can fix it here at all.

* placements.
*/
category = ERRCODE_TO_CATEGORY(ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION);
raiseError = SqlStateMatchesCategory(sqlStateString, category);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was doing some failure testing, I realized that sqlStateString becomes empty when two nodes are down but the connections have already been cached (i.e., status == PGRES_FATAL_ERROR). This makes SqlStateMatchesCategory() to crash.

Steps to reproduce:

CREATE TABLE multiple_hash 
( 
category text NOT NULL PRIMARY KEY, 
data text NOT NULL
);

SELECT master_create_distributed_table('multiple_hash', 'category', 'hash');
SELECT master_create_worker_shards('multiple_hash', 4, 2);

INSERT INTO multiple_hash VALUES ('1', '1');
-- now terminate both worker nodes nodes
/usr/local/pgsql/bin/pg_ctl -D /Users/onderkalaci/Documents/data_dir/worker_9700 -o "-p 9700" -l   /tmp/logfile_9700 stop -m f

/usr/local/pgsql/bin/pg_ctl -D /Users/onderkalaci/Documents/data_dir/worker_9701 -o "-p 9701" -l   /tmp/logfile_9701 stop -m f
INSERT INTO multiple_hash VALUES ('1', '1');
WARNING:  terminating connection due to administrator command
CONTEXT:  while executing command on localhost:9701
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, this behavior does not exist on master branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slight change in your recipe shows the same result in master. If you kill -9/use -m immediate, instead of a normal restart, the same crash happens in master. I opened #634 for that issue. I think we just need a NULL check in SqlStateMatchesCategory().

@anarazel
Copy link
Contributor Author

anarazel commented Jul 1, 2016

On 2016-06-28 04:46:24 -0700, Önder Kalacı wrote:

Since this patch introduces a small error that doesn't exist in master, I'm leaving the decision to you. I could either open another issue or we can fix it here at all.

I don't think this has anything to do with the patch, the conditions
how the bug is triggered differe slightly, but it's clearly a
pre-existing bug. I opened a separate issue for it (#635).

@anarazel anarazel force-pushed the feature/returning branch from 4922432 to 9f8db7d Compare July 1, 2016 08:20
@anarazel
Copy link
Contributor Author

anarazel commented Jul 1, 2016

I pushed two fixups (which will be squashed before merge), first adding a regression test showing the bug, and then a fix.

#if (PG_VERSION_NUM < 90600)

/* before 9.6, PostgreSQL used a uint32 for this field, so check */
Assert(currentAffectedTupleCount <= PG_UINT32_MAX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this line does not compile on 9.4.6. How could travis pass the tests :)

executor/multi_router_executor.c:695:40: error: use of undeclared identifier 'PG_UINT32_MAX'
                        Assert(currentAffectedTupleCount <= PG_UINT32_MAX);

@onderkalaci
Copy link
Contributor

Hi @anarazel ,

I only want to ask you about my comment on the ExecuteTaskAndStoreResults() function (it's name and function comment). I still think that my comment is still valid. What do you think?

Other than that, as I noted, a single line does not compile on 9.4.6 when asserts are enabled. After the code compiles, I'll run check-full on 9.4.6, and then it's a :shipit:

@anarazel
Copy link
Contributor Author

anarazel commented Jul 1, 2016

I don't really agree. Your suggested function name leaves the separation of concerns between RouterExecutorStart and ExecuteRouterTasks less clear imo. And not having any rows to return is still computing the return value. I'll update the comment though.

anarazel added 3 commits July 1, 2016 12:50
This unfortunately requires adding a new table, triggering renumbering
of a number of shard ids.
…d scans.

The targetlist contains TargetEntrys containing expressions, not
expressions directly. That didn't matter so far, but with the upcoming
RETURNING support, the targetlist is inspected to build a TupleDesc.
ExecCleanTypeFromTL hits an assert when looking at something that's not
a TargetEntry.

Mark the entry as resjunk, so it's not actually used.
The old targetlist wasn't used so far, but the upcoming RETURNING
support relies on it.

This also allows to get rid of some crufty code in
multi_executor.c:multi_ExecutorStart(), which used the worker query's
targetlist instead of the main statement's (which didn't have one up to
now).
@anarazel anarazel force-pushed the feature/returning branch from 9f8db7d to 038b485 Compare July 1, 2016 20:06
anarazel added 3 commits July 1, 2016 13:07
The upcoming RETURNING support would otherwise require too much
duplication.  This contains most of the pieces required for RETURNING
support, except removing the planner checks and adjusting regression
test output.
@anarazel anarazel force-pushed the feature/returning branch from 038b485 to 4549e06 Compare July 1, 2016 20:07
@anarazel
Copy link
Contributor Author

anarazel commented Jul 1, 2016

@onderkalaci I've reworded the comment. If you're not happy with that, we can continue refining that. But I'm merging, because that unblocks other PRs (and you gave your :shipit: )

The 9.4 issue worked on travis, because that compiles without assertions :(. See #619

@anarazel anarazel merged commit 18236b1 into master Jul 1, 2016
@anarazel anarazel deleted the feature/returning branch July 11, 2016 18:39
DimCitus pushed a commit that referenced this pull request Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants