10.6 mdev 34877 #3723

vlad-lesin · 2024-12-25T09:31:41Z

The pool request contains fixes for MDEV-35708 and
MDEV-34877. See commit messages for detailed description of the changes.

It's supposed that the function gets the previous lock set on a record. But if there are several locks set on a record, it will return only the first one. Continue locks list iteration till the certain lock even if the certain bit in lock bitmap is set.

CLAassistant · 2024-12-25T09:31:47Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

…action holding S-lock gets X-lock first" fix from MySQL to MariaDB This commit implements mysql/mysql-server@7037a0b functionality, i.e. if some transaction A holds not-gap S-lock on some record, and some other transactions B={b1, b2, ..., bn} have not-gap X-locks waiting for the S-lock of transaction A, and transaction A requests not-gap and not insert intention X-lock which conflicts with the X-locks of transactions B and does not conflict with another locks in the queue, then grant the X-lock to transaction A. MySQL's commit contains the following explanation of why insert-intention locks must not overtake a waiting ordinary or gap locks: "It is important that this decission rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`, `X`, `X|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time." I would add to the explanation the following. Suppose we have trx 1 which holds ordinary X-lock on some record. And trx 2 executes "DELETE FROM t" or "SELECT * FOR UPDATE" in RR(see lock_delete_updated.test and MDEV-27992), i.e. it creates waiting ordinary X-lock on the same record. And then trx 1 wants to insert some record just before the locked record. It requests insert-intention lock, and if the lock overtakes trx 2 lock, there will be phantom records for trx 2 in RR. lock_delete_updated.test shows how "DELETE" allows to insert some records in already scanned gap and misses some records to delete. The current implementation differs from MySQL implementation. There are two key differences: 1. Lock queue ordering. In MySQL all waiting locks precede all granted locks. A new waiting lock is added to the head of the queue, a new granted lock is added to the end of the queue, if some waiting lock is granted, it's moved to the end of the queue. In MariaDB any new lock is added to the end of the queue and waiting lock does not change its position in the queue where the lock is granted. The rule is that blocking lock must be located before blocked lock in lock queue. We maintain the rule with inserting bypassing lock just before bypassed one. 2. MySQL implementation uses some object(locksys::Trx_locks_cache) which can be passed to consecutive calls to rec_lock_has_to_wait() for the same trx and heap_no to cache the result of checking if trx has a granted lock which is blocking the waiting lock(see locksys::Trx_locks_cache::has_granted_blocker()). The current implementation does not use such object, because it looks for such granted lock on the level of lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). I.e. there is no need in additional lock queue iteration in locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate it in lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). During the testing the following case was found. Suppose we have delete-marked record and going to do inplace insert into that delete-marked record. Usually we don't create explicit lock if there are no conlicting with not gap X-lock locks(see lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The implicit lock will be converted to explicit one by demand. That can happen during INSERT, the not-gap S-lock can be acquired on searching for duplicates(see row_ins_duplicate_error_in_clust()), and, if delete-marked record is found, inplace insert(see btr_cur_upd_rec_in_place()) modifies the record, what is treated as implicit lock. But there can be a case when some transaction trx1 holds not-gap S-lock, another transaction trx2 creates waiting X-lock, and then trx2 tries to do inplace insert. Before the fix the waiting X-lock of trx2 would be conflicting lock, and trx1 would try to create explicit X-lock, what would cause deadlock, and one of the transactions whould be rolled back. But after the fix, trx2 waiting X-lock is not treated as conflicting with trx1 X-lock anymore, as trx1 already holds S-lock. If we don't create explicit lock, then some other transaction trx3 can create it during implicit to explicit lock conversion and place it at the end of the queue. So there can be the following locks order in the queue: S1(granted) X2(waiting) X1(granted) The above queue is not valid, because all granted trx1 locks must be placed before waiting trx2 lock. Besides, lock_rec_release_try() can remove S(granted, trx1) lock and grant X lock to trx 2, and there can be two granted X-locks on the same record: X2(granted) X1(granted) Taking into account that lock_rec_release_try() can release cell and lock_sys latches leaving some locks unreleased, the queue validation function can fail in any unexpected place. It can be fixed with two ways: 1) Place explicit X(granted, trx1) lock before X(waiting, trx2) lock during implicit to explicit lock conversion. This option is implemented in MySQL, as granted lock is always placed at the top of locks queue, and waiting locks are placed at the bottom of the queue. MariaDB does not do this, and implementing this variant would require conflicting locks search before converting implicit to explicit lock, what, in turns, would require cell and/or lock_sys latch acquiring. 2) Create and place X(granted, trx1) lock before X(waiting, trx2) during inplace INSERT, i.e. when lock_rec_lock() is invoked from lock_clust_rec_modify_check_and_lock() or lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is bypassed. Such a way we don't need in additional conflicting locks search, as they are searched anyway in lock_rec_low(). This fix implements the second variant(see the changes around c_lock_info.insert_after in lock_rec_lock). I.e. if some record was delete-marked and we do inplace insert in such a record, and some lock for bypass was found, create explicit lock to avoid conflicting lock search on each implicit to explicit lock conversion. We can remove it if MDEV-35624 is implemented. lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue(): search locks to bypass along with conflicting locks searching in the same loop. The result is returned in conflicting_lock_info object. There can be several locks to bypass, only the first one is returned to limit lock_rec_find_similar_on_page() with the first bypassed lock to preserve "blocking before blocked" invariant. conflicting_lock_info also contains a pointer to the lock, after which we can insert bypassing lock. This lock precedes bypassed one. Bypassing lock can be next-key lock, and the following cases are possible: 1. S1(not-gap, granted) II2(granted) X3(waiting for S1), When new X1(ordinary) lock is acquired, there will be the following locks queue: S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for S1) If we had inserted new X1 lock just after S1, and S1 had been released on transaction commit or rollback, we would have the following sequence in the locks queue: X1(ordinary, granted) II2(granted) X3(waiting for X1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is not a real issue as II lock once granted can be ignored but it could possibly hit some assert(taking into account that lock_release_try() can release lock_sys latch, and other threads can acquire the latch and validate lock queue) as it breaks our design constraint that any granted lock in the queue should not conflict with locks ahead in the queue. But lock_rec_queue_validate() does not check the above contraint. We place new bypassing lock just before bypassed one, but there still can be the case when lock bitmap is used instead of creating new lock object(see lock_rec_add_to_queue() and lock_rec_find_similar_on_page()), and the lock, which owns the bitmap, can precede II2(granted). We can either disable lock_rec_find_similar_on_page() space optimization for bypassing locks or treat "X1(ordinary, granted) II2(granted)" sequence as valid. As we don't currently have the function which would fail on the above sequence, let treat it as valid for the case, when lock_release() execution is in process. 2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1) When new X1(ordinary) lock is acquired, there will be the following locks queue: S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted) X3(waiting for S1). After S1 releasing there will be: II2(granted) X1(ordinary, granted) X3(waiting for S1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The above queue is valid because ordinary lock does not conflict with II-lock(see lock_rec_has_to_wait()). lock_rec_create_low(): insert new lock to the position which lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue() return if the lock is bypassing. lock_rec_find_similar_on_page(): add ability to limit similiar lock search with the certain lock to preserve "blocking before blocked" invariant for all bypassed locks. lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones to let lock bitmap reusing for bypassing locks. lock_rec_lock(): fix inplace insert case, explained above. lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue: move bypassing lock to the correct place to preserve "blocking before blocked" invariant.

mariadb-DebarunBanerjee

Hi @vlad-lesin,

This is a very nice improvement patch. I agree with the design. I have one key point about the type of locks we are bypassing and few other minor comments. Please have look.

-Deb

storage/innobase/include/hash0hash.h

mariadb-DebarunBanerjee · 2025-01-06T13:51:26Z

storage/innobase/include/lock0types.h

+        static inline bool is_rec_granted_X_not_ii_gap(unsigned type_mode)
+        {
+          return (type_mode & (LOCK_INSERT_INTENTION | LOCK_GAP |
+                               LOCK_MODE_MASK)) == LOCK_X;
+        }
+
+        bool is_rec_granted_X_not_ii_gap() const {
+          return is_rec_granted_X_not_ii_gap(type_mode);
+        }


The lock could be in granted or waiting state as per the implementation. Not sure why the name has _granted_.

II lock and Gap lock are not X lock logically even if the type is LOCK_X. I think a logical function name would be better. We are essentially interested to find X record lock. So I suggest a simple name like is_rec_X_lock().

For the implementation, I would suggest a direct check for the specific lock type we are looking for. e.g.

static inline bool is_rec_X_lock(unsigned type_mode, bool is_supremum) { /* Allow X waiting lock and X lock without covering GAP */ type_mode &= ~(LOCK_REC_NOT_GAP | LOCK_WAIT); return !is_supremum && type_mode == LOCK_X ; } bool is_rec_X_lock(bool is_supremum) { ut_ad(!is_table()); return is_rec_X_lock(type_mode); }

I haven't tested this pseudo code. This is just to convey the idea and your consideration.

Yes, you are right about "granted" word. The function is called for different cases, when the lock can or can't be waiting.

Yes, logically it does not matter if II and Gap locks are X or S, and it's reflected in lock_rec_has_to_wait() function. But as well as we have X and S bits along with II and Gap bits in the same bits set, there would be some kind of ambiguity if we missed "not_ii_not_gap" suffix in the function name.

Good point about "ut_ad(!is_table())". I will add it. II-lock is also a gap lock, the correspondent code can be found in lock_rec_insert_check_and_lock(). It's enough just to check gap bit. Thus we can use existing is_rec_granted_exclusive_not_gap() function removing "granted" from the function's name. I would also add debug check for simultaneous II and GAP bits set. I wouldn't add is_supremum parameter to reuse existing function.

storage/innobase/include/lock0lock.h

mariadb-DebarunBanerjee · 2025-01-06T14:40:38Z

storage/innobase/lock/lock0lock.cc

  for (lock_t *lock= lock_sys_t::get_first(*cell, id); lock != in_lock;
       lock= lock_rec_get_next_on_page(lock))
    if (lock_rec_get_nth_bit(lock, heap_no))
-      return lock;
+      prev_lock= lock;



It is great to see that we have found and corrected this issue.

Is it possible to add a testcase that could catch the issue ? I understand this is the fix for MDEV-35708 which should affect the output of Information_schema.innodb_trx* and .innodb_lock*.

This is existing code but the construct looks dangerous without check for null/end of the list. This is the invalid/assert case if in_lock is not found in the list. Can we add the null check for the exit condition of the loop and add a debug assert ?

mariadb-DebarunBanerjee · 2025-01-06T15:25:34Z

storage/innobase/lock/lock0lock.cc

+      ut_d(if (!bypassed)
+        bypassed= lock;)


If @dr-m agrees, I would prefer the code in single line.
ut_d(if (!bypassed) bypassed= lock;)

mariadb-DebarunBanerjee · 2025-01-06T16:20:06Z

storage/innobase/lock/lock0lock.cc

+    if (bypass_mode && lock->trx == trx && !lock->is_gap() &&
+        !lock->is_waiting() && !lock->is_insert_intention() &&
+        lock_mode_stronger_or_eq(lock->mode(), LOCK_S))
+    {
+      has_s_lock_or_stronger= true;
+    }


The condition for setting has_s_lock_or_stronger repeats in multiple places. Can we have a inline function to capture the set in a single place ?

mariadb-DebarunBanerjee · 2025-01-07T04:24:00Z

mysql-test/suite/innodb/t/avoid_deadlock_with_blocked.test

+# A variant of the above scenario:
+# <con1, X REC_NOT_GAP, granted>,
+# <con1, X REC_NOT_GAP, granted>, <con2, S, waiting for con1>,
+# <con1, X REC_NOT_GAP, granted>, <con2, S, waiting for con1>, <con1, INSERT INTENTION, waiting for con1>
+# Expected: a deadlock, as INSERT INTENTION should not overtake locks on gap, to not slice them


Nice explanation. Can you please also echo some comment line(s) to result files ? It helps mapping the result file lines to the test case in case of a failure. Something like adding --echo before a comment line.

--echo # Test-2: A variant of the first scenario:

mariadb-DebarunBanerjee · 2025-01-07T04:56:00Z

mysql-test/suite/innodb/t/avoid_deadlock_with_blocked.test

+# A secenario, where con1 has to bypass two transactions:
+# <con1, S, granted>
+# <con1, S, granted> <con2, X, waiting>
+# <con1, S, granted> <con2, X, waiting> <con3, X, waiting>
+# Before MDEV-34877:
+# <con1, S, granted> <con2, X, waiting> <con3, X, waiting> <con1, X REC_NOT_GAP, waiting for con2>
+# After MDEV-34877:
+# <con1, S, granted> <con1, X REC_NOT_GAP, granted> <con2, X, waiting> <con3, X, waiting>


After reviewing the code, I checked another similar scenario.
<con1, S, granted> <con2, X, waiting> <con3, S, waiting><con1, X waiting>

con1

BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE;

con2

BEGIN; SELECT * FROM t1 FOR UPDATE; // Waits

con3

BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE; // Waits

con1

SELECT * FROM t1 FOR UPDATE;

con3

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Now, don't we want to avoid deadlock in such cases and allow con1 to directly acquire the X lock ? If we need to allow this we have to relax lock_t::can_be_bypassed() to bypass an S lock and also to not care if the bypassed lock waiting transaction is current transaction or not.

Even if we don't want to attempt to address this deadlock, I think it makes sense to add this testcase to capture the behaviour for such cases.

mariadb-DebarunBanerjee · 2025-01-07T05:09:53Z

mysql-test/suite/innodb/t/avoid_deadlock_with_blocked.test

+--connection con3
+  SET DEBUG_SYNC = 'lock_wait_before_suspend SIGNAL con3_will_wait';
+  --send SELECT * FROM t1 FOR UPDATE
+
+--connection con1
+  SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
+  SET DEBUG_SYNC = 'lock_wait_start SIGNAL con1_will_wait';
+  --send INSERT INTO t1 VALUES (0)
+
+--connection con2
+  SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
+  COMMIT;
+
+--connection con1
+  --reap
+  ROLLBACK;
+
+
+--connection con3
+  --error ER_LOCK_DEADLOCK
+  --reap


This is a very nice test ensuring II (INSERT INTENTION) lock doesn't try to bypass other locks and deadlock is expected here.

One thing about deadlock detection is we never want to guarantee the deadlock victim that would be chosen in case of a deadlock. What do you think ? Is it fine to assume con3 as deadlock victim here or should we try to handle the case if con1 is chosen as victim ?

vlad-lesin requested review from dr-m and mariadb-DebarunBanerjee December 25, 2024 09:31

vlad-lesin force-pushed the 10.6-MDEV-34877 branch from 90b051e to c08ed78 Compare December 26, 2024 09:51

cvicentiu added the MariaDB Corporation label Dec 31, 2024

mariadb-DebarunBanerjee reviewed Jan 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

10.6 mdev 34877 #3723

10.6 mdev 34877 #3723

vlad-lesin commented Dec 25, 2024

CLAassistant commented Dec 25, 2024

mariadb-DebarunBanerjee left a comment

mariadb-DebarunBanerjee Jan 6, 2025

vlad-lesin Jan 10, 2025

mariadb-DebarunBanerjee Jan 6, 2025

mariadb-DebarunBanerjee Jan 6, 2025

mariadb-DebarunBanerjee Jan 6, 2025

mariadb-DebarunBanerjee Jan 7, 2025

mariadb-DebarunBanerjee Jan 7, 2025

mariadb-DebarunBanerjee Jan 7, 2025

10.6 mdev 34877 #3723

Are you sure you want to change the base?

10.6 mdev 34877 #3723

Conversation

vlad-lesin commented Dec 25, 2024

CLAassistant commented Dec 25, 2024

mariadb-DebarunBanerjee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment