MDEV-34877 Port "Bug #11745929 Change lock priority so that the transaction holding S-lock gets X-lock first" fix from MySQL to MariaDB

This commit implements the functionality of
mysql/mysql-server@7037a0b
i.e. if some transaction A holds a not-gap S-lock on some record, and
some other transactions B={b1, b2, ..., bn} have not-gap X-locks waiting
for the S-lock of transaction A, and transaction A requests a not-gap,
not insert-intention X-lock which conflicts with the X-locks of
transactions B and does not conflict with any other locks in the queue,
then grant the X-lock to transaction A.

MySQL's commit contains the following explanation of why insert-intention
locks must not overtake waiting ordinary or gap locks:

"It is important that this decision rule doesn't allow
INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`,
`X`, `X|GAP`), as inserting a record into a gap would split such WAITING
lock, violating the invariant that each transaction can have at most
single WAITING lock at any time."

I would add the following to the explanation. Suppose trx 1 holds an
ordinary X-lock on some record, and trx 2 executes "DELETE FROM t" or
"SELECT * FOR UPDATE" in RR (see lock_delete_updated.test and
MDEV-27992), i.e. it creates a waiting ordinary X-lock on the same
record. Then trx 1 wants to insert a record just before the locked
record. It requests an insert-intention lock, and if that lock overtakes
the trx 2 lock, there will be phantom records for trx 2 in RR.
lock_delete_updated.test shows how "DELETE" lets records be inserted
into an already scanned gap and then misses some records it should
delete.

The current implementation differs from the MySQL implementation. There
are two key differences:

1. Lock queue ordering. In MySQL all waiting locks precede all granted
   locks. A new waiting lock is added to the head of the queue, a new
   granted lock is added to the end of the queue, and when a waiting
   lock is granted, it is moved to the end of the queue. In MariaDB any
   new lock is added to the end of the queue, and a waiting lock does
   not change its position in the queue when it is granted. The rule is
   that a blocking lock must be located before the blocked lock in the
   lock queue. We maintain the rule by inserting the bypassing lock just
   before the bypassed one.

2. The MySQL implementation uses an object (locksys::Trx_locks_cache)
   which can be passed to consecutive calls to rec_lock_has_to_wait()
   for the same trx and heap_no to cache the result of checking whether
   trx has a granted lock which is blocking the waiting lock (see
   locksys::Trx_locks_cache::has_granted_blocker()). The current
   implementation does not use such an object, because it looks for such
   a granted lock at the level of lock_rec_other_has_conflicting() and
   lock_rec_has_to_wait_in_queue(). I.e. there is no need for the
   additional lock queue iteration done in
   locksys::Trx_locks_cache::has_granted_blocker(), as we already
   iterate the queue in lock_rec_other_has_conflicting() and
   lock_rec_has_to_wait_in_queue().

The following case was found during testing. Suppose we have a
delete-marked record and are going to do an inplace insert into that
delete-marked record. Usually we don't create an explicit lock if there
are no locks conflicting with a not-gap X-lock (see
lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The
implicit lock will be converted to an explicit one on demand.

That can happen during INSERT: the not-gap S-lock can be acquired while
searching for duplicates (see row_ins_duplicate_error_in_clust()), and,
if a delete-marked record is found, the inplace insert (see
btr_cur_upd_rec_in_place()) modifies the record, which is treated as an
implicit lock.

But there can be a case when some transaction trx1 holds a not-gap
S-lock, another transaction trx2 creates a waiting X-lock, and then trx1
tries to do an inplace insert. Before the fix the waiting X-lock of trx2
would be a conflicting lock, and trx1 would try to create an explicit
X-lock, which would cause a deadlock, and one of the transactions would
be rolled back. But after the fix, the trx2 waiting X-lock is no longer
treated as conflicting with the trx1 X-lock, as trx1 already holds an
S-lock. If we don't create an explicit lock, then some other transaction
trx3 can create it during implicit to explicit lock conversion and place
it at the end of the queue. So there can be the following order of locks
in the queue:

S1(granted) X2(waiting) X1(granted)

The above queue is not valid, because all granted trx1 locks must be
placed before the waiting trx2 lock. Besides, lock_rec_release_try() can
remove the S(granted, trx1) lock and grant the X-lock to trx2, and there
can be two granted X-locks on the same record:

X2(granted) X1(granted)

Since lock_rec_release_try() can release the cell and lock_sys latches
while some locks are still unreleased, the queue validation function can
fail at any unexpected place.
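
The effect of the wrong order can be reproduced with a toy model of the
queue. The sketch below is an assumption-laden simplification, not
InnoDB code: Lock, release_at() and granted_conflicts() are hypothetical,
only not-gap S/X locks are modelled, and a waiting lock is granted as
soon as no conflicting lock precedes it in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mode { S, X };

// Hypothetical, simplified not-gap record lock; not InnoDB's lock_t.
struct Lock {
  int trx;
  Mode mode;
  bool waiting;
};

// Locks of different transactions conflict unless both are shared.
bool conflicts(const Lock &a, const Lock &b) {
  return a.trx != b.trx && (a.mode == Mode::X || b.mode == Mode::X);
}

// Release the lock at position `pos`, then grant every waiting lock that has
// no conflicting lock ahead of it. Only predecessors are examined, so the
// result is correct only if each blocker precedes the locks it blocks.
void release_at(std::vector<Lock> &queue, std::size_t pos) {
  queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(pos));
  for (std::size_t i = 0; i < queue.size(); i++) {
    if (!queue[i].waiting) continue;
    bool blocked = false;
    for (std::size_t j = 0; j < i && !blocked; j++)
      blocked = conflicts(queue[j], queue[i]);
    if (!blocked) queue[i].waiting = false;
  }
}

// Number of pairs of granted locks that conflict with each other.
std::size_t granted_conflicts(const std::vector<Lock> &queue) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < queue.size(); i++)
    for (std::size_t j = i + 1; j < queue.size(); j++)
      if (!queue[i].waiting && !queue[j].waiting &&
          conflicts(queue[i], queue[j]))
        n++;
  return n;
}
```

Releasing S1 from "S1(granted) X2(waiting) X1(granted)" in this model
grants X2 and leaves two conflicting granted X-locks, while the same
release from "S1(granted) X1(granted) X2(waiting)" keeps X2 waiting.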

It can be fixed in two ways:

1) Place the explicit X(granted, trx1) lock before the X(waiting, trx2)
   lock during implicit to explicit lock conversion. This option is
   implemented in MySQL, as a granted lock is always placed at the top
   of the lock queue, and waiting locks are placed at the bottom of the
   queue. MariaDB does not do this, and implementing this variant would
   require a search for conflicting locks before converting an implicit
   lock to an explicit one, which, in turn, would require acquiring the
   cell and/or lock_sys latch.

2) Create and place the X(granted, trx1) lock before X(waiting, trx2)
   during inplace INSERT, i.e. when lock_rec_lock() is invoked from
   lock_clust_rec_modify_check_and_lock() or
   lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is
   bypassed. This way we don't need an additional search for conflicting
   locks, as they are searched for anyway in lock_rec_low().

This fix implements the second variant (see the changes around
c_lock_info.insert_after in lock_rec_lock). I.e. if some record was
delete-marked and we do an inplace insert into such a record, and some
lock to bypass was found, create an explicit lock to avoid a conflicting
lock search on each implicit to explicit lock conversion. We can remove
it if MDEV-35624 is implemented.

lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue():
search for locks to bypass in the same loop that searches for
conflicting locks. The result is returned in a conflicting_lock_info
object. There can be several locks to bypass; only the first one is
returned, so that lock_rec_find_similar_on_page() can be limited by the
first bypassed lock to preserve the "blocking before blocked" invariant.
conflicting_lock_info also contains a pointer to the lock after which we
can insert the bypassing lock. This lock precedes the bypassed one.
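
A single-pass search of this kind might look roughly as follows. This is
a minimal sketch, not the actual functions: Lock, ConflictInfo,
scan_for_x() and enqueue_x() are hypothetical, only not-gap S/X locks
are modelled, and apart from insert_after the field names are invented
for illustration.

```cpp
#include <cassert>

enum class Mode { S, X };

// Hypothetical, simplified not-gap record lock chained into a queue.
struct Lock {
  int trx;
  Mode mode;
  bool waiting;
  Lock *next;
};

// Loosely modelled on conflicting_lock_info; apart from insert_after the
// field names are illustrative.
struct ConflictInfo {
  Lock *conflicting;   // lock the request has to wait for, or nullptr
  Lock *bypassed;      // first waiting lock the request overtakes, or nullptr
  Lock *insert_after;  // lock directly preceding `bypassed`, or nullptr
};

// One pass over the queue for a not-gap X request of transaction `trx`.
// Assumes a granted S-lock of `trx` precedes the locks waiting for it.
ConflictInfo scan_for_x(Lock *head, int trx) {
  ConflictInfo info{nullptr, nullptr, nullptr};
  bool holds_s = false;
  Lock *prev = nullptr;
  for (Lock *l = head; l; prev = l, l = l->next) {
    if (l->trx == trx) {
      if (!l->waiting && l->mode == Mode::S) holds_s = true;
      continue;
    }
    // Every lock of another transaction conflicts with an X request.
    if (holds_s && l->waiting && l->mode == Mode::X) {
      if (!info.bypassed) {  // remember only the first bypassed lock
        info.bypassed = l;
        info.insert_after = prev;
      }
      continue;
    }
    info.conflicting = l;
    break;
  }
  return info;
}

// Enqueue X-lock `x`: granted and placed just before the first bypassed
// lock, or appended to the tail (waiting if a real conflict was found).
void enqueue_x(Lock *&head, Lock &x) {
  const ConflictInfo info = scan_for_x(head, x.trx);
  x.waiting = info.conflicting != nullptr;
  if (!x.waiting && info.bypassed) {
    x.next = info.insert_after->next;
    info.insert_after->next = &x;
    return;
  }
  x.next = nullptr;
  Lock **tail = &head;
  while (*tail) tail = &(*tail)->next;
  *tail = &x;
}
```

For the queue "S1(granted) X2(waiting) X3(waiting)" an X request of trx 1
ends up granted between S1 and X2. For "S1(granted) S2(granted)
X3(waiting)" it is appended to the tail as a waiting lock, because the
granted S2 is a real conflict.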

A bypassing lock can be a next-key lock, and the following cases are
possible:

1. S1(not-gap, granted) II2(granted) X3(waiting for S1),

   When a new X1(ordinary) lock is acquired, there will be the following
   lock queue:

   S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for
   S1)

   If we had inserted the new X1 lock just after S1, and S1 had been
   released on transaction commit or rollback, we would have the
   following sequence in the lock queue:

   X1(ordinary, granted) II2(granted) X3(waiting for X1)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   This is not a real issue, as an II lock, once granted, can be
   ignored, but it could possibly hit some assert (taking into account
   that lock_release_try() can release the lock_sys latch, and other
   threads can acquire the latch and validate the lock queue), as it
   breaks our design constraint that any granted lock in the queue
   should not conflict with locks ahead of it in the queue. But
   lock_rec_queue_validate() does not check the above constraint. We
   place the new bypassing lock just before the bypassed one, but there
   can still be the case when a lock bitmap is used instead of creating
   a new lock object (see lock_rec_add_to_queue() and
   lock_rec_find_similar_on_page()), and the lock which owns the bitmap
   can precede II2(granted). We can either disable the
   lock_rec_find_similar_on_page() space optimization for bypassing
   locks or treat the "X1(ordinary, granted) II2(granted)" sequence as
   valid. As we don't currently have a function which would fail on the
   above sequence, let us treat it as valid for the case when
   lock_release() execution is in progress.

2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1)

   When a new X1(ordinary) lock is acquired, there will be the following
   lock queue:

   S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted)
   X3(waiting for S1).

   After S1 is released there will be:

   II2(granted) X1(ordinary, granted) X3(waiting for S1)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The above queue is valid because an ordinary lock does not conflict
   with an II-lock (see lock_rec_has_to_wait()).

lock_rec_create_low(): insert the new lock at the position which
lock_rec_other_has_conflicting() or lock_rec_has_to_wait_in_queue()
returns, if the lock is bypassing.

lock_rec_find_similar_on_page(): add the ability to limit the similar
lock search by a certain lock to preserve the "blocking before blocked"
invariant for all bypassed locks.

lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones, to
allow lock bitmap reuse for bypassing locks.

lock_rec_lock(): fix the inplace insert case explained above.

lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue(): move the
bypassing lock to the correct place to preserve the "blocking before
blocked" invariant.
vlad-lesin committed Dec 25, 2024
1 parent 0616935 commit 90b051e
Showing 10 changed files with 761 additions and 202 deletions.
129 changes: 129 additions & 0 deletions mysql-test/suite/innodb/r/avoid_deadlock_with_blocked.result
@@ -0,0 +1,129 @@
connect stop_purge,localhost,root;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connect con1,localhost,root,,;
connect con2,localhost,root,,;
connect con3,localhost,root,,;
connection default;
CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=0;
INSERT INTO t1 (id) VALUES (1);
connection con1;
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;
id
1
connection con2;
BEGIN;
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
connection con1;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
id
1
COMMIT;
connection con2;
id
1
COMMIT;
connection con1;
BEGIN;
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
id
1
connection con2;
BEGIN;
SET DEBUG_SYNC = 'lock_wait_start SIGNAL con2_will_wait';
SELECT * FROM t1 LOCK IN SHARE MODE;
connection con1;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
INSERT INTO t1 VALUES (0);
ROLLBACK;
connection con2;
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
COMMIT;
connection con1;
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;
id
1
connection con2;
BEGIN;
SELECT * FROM t1 WHERE id=1 LOCK IN SHARE MODE;
id
1
connection default;
connection con3;
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
connection con1;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SET DEBUG_SYNC = 'lock_wait_start SIGNAL con1_will_wait';
INSERT INTO t1 VALUES (0);
connection con2;
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
COMMIT;
connection con1;
ROLLBACK;
connection con3;
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
connection con1;
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;
id
1
connection con2;
BEGIN;
SELECT * FROM t1 WHERE id=1 LOCK IN SHARE MODE;
id
1
connection default;
connection con3;
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
connection con1;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con1_will_wait';
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
connection con2;
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
COMMIT;
connection con1;
id
1
COMMIT;
connection con3;
id
1
COMMIT;
connection con1;
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;
id
1
connection con2;
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
connection con3;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
connection con1;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
id
1
COMMIT;
connection con2;
id
1
COMMIT;
connection con3;
id
1
COMMIT;
connection default;
disconnect con1;
disconnect con2;
disconnect con3;
disconnect stop_purge;
DROP TABLE t1;
194 changes: 194 additions & 0 deletions mysql-test/suite/innodb/t/avoid_deadlock_with_blocked.test
@@ -0,0 +1,194 @@
--source include/have_innodb.inc
--source include/have_debug_sync.inc
--source include/count_sessions.inc

--disable_query_log
call mtr.add_suppression("InnoDB: Transaction was aborted due to ");
--enable_query_log

connect stop_purge,localhost,root;
START TRANSACTION WITH CONSISTENT SNAPSHOT;

--connect (con1,localhost,root,,)
--connect (con2,localhost,root,,)
--connect (con3,localhost,root,,)

--connection default
CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=0;
INSERT INTO t1 (id) VALUES (1);
# Simplest scenario:
# <con1, S, granted>,
# <con1, S, granted>, <con2, X, waiting for con1>,
# Before MDEV-34877:
# <con1, S, granted>, <con2, X, waiting for con1>, <con1, X, waiting for con1>
# After MDEV-34877:
# <con1, S, granted>, <con1, X, granted>, <con2, X, waiting for con1>
# Expected: instead of deadlocking, the con1's request should ignore con2's

--connection con1
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;

--connection con2
BEGIN;
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con2_will_wait';
--send SELECT * FROM t1 FOR UPDATE

--connection con1
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
COMMIT;

--connection con2
--reap
COMMIT;

# A variant of the above scenario:
# <con1, X REC_NOT_GAP, granted>,
# <con1, X REC_NOT_GAP, granted>, <con2, S, waiting for con1>,
# <con1, X REC_NOT_GAP, granted>, <con2, S, waiting for con1>, <con1, INSERT INTENTION, waiting for con1>
# Expected: a deadlock, as INSERT INTENTION should not overtake locks on gap, to not slice them
--connection con1
BEGIN;
SELECT * FROM t1 WHERE id=1 FOR UPDATE;

--connection con2
BEGIN;
SET DEBUG_SYNC = 'lock_wait_start SIGNAL con2_will_wait';
--send SELECT * FROM t1 LOCK IN SHARE MODE

--connection con1
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
INSERT INTO t1 VALUES (0);
ROLLBACK;

--connection con2
--error ER_LOCK_DEADLOCK
--reap
COMMIT;

# More complicated scenario:
# <con1, S, granted>,
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>,
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>, <con3, X, waiting for con2>
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>, <con3, X, waiting for con1>, <con1, INSERT_INTENTION, waiting for con3>
# <con1, S, granted>, <con3, X, waiting for con1>, <con1, INSERT_INTENTION, waiting for con3>
# Expected: a deadlock, as INSERT INTENTION should not overtake locks on gap, to not slice them

--connection con1
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;

--connection con2
BEGIN;
SELECT * FROM t1 WHERE id=1 LOCK IN SHARE MODE;

--connection default

--connection con3
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
--send SELECT * FROM t1 FOR UPDATE

--connection con1
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SET DEBUG_SYNC = 'lock_wait_start SIGNAL con1_will_wait';
--send INSERT INTO t1 VALUES (0)

--connection con2
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
COMMIT;

--connection con1
--reap
ROLLBACK;


--connection con3
--error ER_LOCK_DEADLOCK
--reap

# More complicated scenario.
# <con1, S, granted>,
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>,
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>, <con3, X, waiting for con1>
# <con1, S, granted>, <con2, S REC_NOT_GAP, granted>, <con3, X, waiting for con1>, <con1, X REC_NOT_GAP, waiting for con2>
# Before MDEV-34877:
# <con1, S, granted>, <con3, X, waiting for con1>, <con1, X REC_NOT_GAP, waiting for con3>
# After MDEV-34877:
# <con1, S, granted>, <con1, X REC_NOT_GAP, granted>, <con3, X, waiting for con1>


--connection con1
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;

--connection con2
BEGIN;
SELECT * FROM t1 WHERE id=1 LOCK IN SHARE MODE;

--connection default

--connection con3
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
--send SELECT * FROM t1 FOR UPDATE

--connection con1
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con1_will_wait';
--send SELECT * FROM t1 WHERE id=1 FOR UPDATE

--connection con2
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
COMMIT;

--connection con1
--reap
COMMIT;

--connection con3
--reap
COMMIT;

# A scenario, where con1 has to bypass two transactions:
# <con1, S, granted>
# <con1, S, granted> <con2, X, waiting>
# <con1, S, granted> <con2, X, waiting> <con3, X, waiting>
# Before MDEV-34877:
# <con1, S, granted> <con2, X, waiting> <con3, X, waiting> <con1, X REC_NOT_GAP, waiting for con2>
# After MDEV-34877:
# <con1, S, granted> <con1, X REC_NOT_GAP, granted> <con2, X, waiting> <con3, X, waiting>
--connection con1
BEGIN;
SELECT * FROM t1 LOCK IN SHARE MODE;

--connection con2
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con2_will_wait';
--send SELECT * FROM t1 FOR UPDATE

--connection con3
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
--send SELECT * FROM t1 FOR UPDATE

--connection con1
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
COMMIT;

--connection con2
--reap
COMMIT;

--connection con3
--reap
COMMIT;

--connection default
--disconnect con1
--disconnect con2
--disconnect con3
--disconnect stop_purge

DROP TABLE t1;

--source include/wait_until_count_sessions.inc
32 changes: 32 additions & 0 deletions storage/innobase/include/hash0hash.h
@@ -111,6 +111,38 @@ struct hash_cell_t
  {
    remove(search(next, [&element](const T *p){return p==&element;}), next);
  }

  /** Delete an element.
  @tparam T type of the element
  @param remove the being-removed element
  @param next the next-element pointer in T */
  template<typename T>
  void remove(const T &remove, T *T::*next)
  {
    T *prev;
    for (prev= static_cast<T *>(node); prev && prev->*next != &remove;
         prev= prev->*next);
    ut_a(prev);
    prev->*next= remove.*next;
  }

  /** Insert an element after another.
  @tparam T type of the element
  @param after the element after which to insert
  @param insert the being-inserted element
  @param next the next-element pointer in T */
  template <typename T> void insert_after(T &after, T &insert, T *T::*next)
  {
#ifdef UNIV_DEBUG
    for (const T *c= static_cast<const T *>(node); c; c= c->*next)
      if (c == &after)
        goto found;
    ut_error;
  found:
#endif
    insert.*next= after.*next;
    after.*next= &insert;
  }
};

/** Hash table with singly-linked overflow lists */
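
The two helpers above take a pointer to the "next" member, so one
implementation serves any chained element type. A standalone sketch of
how they can be combined to move an element within a chain is given
here. It is an illustration only: Node and Cell are hypothetical
stand-ins for the chained element and hash_cell_t, the debug check of
insert_after() is omitted, and ut_a() is replaced by assert().

```cpp
#include <cassert>

// Hypothetical stand-in for a chained element; only the link matters here.
struct Node {
  int id;
  Node *hash;  // next-element pointer
};

// Simplified stand-in for hash_cell_t with the two helpers from the diff.
struct Cell {
  void *node;  // first element of the chain

  // Link `insert` directly after `after`.
  template <typename T> void insert_after(T &after, T &insert, T *T::*next) {
    insert.*next = after.*next;
    after.*next = &insert;
  }

  // Unlink `element`. As in the diff, the predecessor is searched for, so
  // the element must be in the chain and must not be its first element.
  template <typename T> void remove(const T &element, T *T::*next) {
    T *prev;
    for (prev = static_cast<T *>(node); prev && prev->*next != &element;
         prev = prev->*next) {
    }
    assert(prev);
    prev->*next = element.*next;
  }
};
```

Moving an element is then a remove() followed by an insert_after(): for
a chain "a b c", removing c and inserting it after a yields "a c b".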