
MDEV-34307 On startup, [FATAL] InnoDB: Page ... still fixed or dirty #3310

Merged: dr-m merged 1 commit into 10.6 from 10.6-MDEV-34307 on Jun 6, 2024


@dr-m commented on Jun 5, 2024

  • The Jira issue number for this PR is: MDEV-34307

Description

buf_pool_invalidate(): Properly wait for os_aio_wait_until_no_pending_writes() to ensure that there are no pending buf_page_t::write_complete() or buf_page_write_complete() operations. This avoids a failure of buf_pool.assert_all_freed().

This bug should affect debug builds only. At this point, buf_pool.flush_list should already be empty and all changes should have been written out. The loop around buf_LRU_scan_and_free_block() would eventually have completed and freed all pages as soon as buf_page_t::write_complete() had a chance to release the page latches.
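The ordering that this fix establishes can be modeled outside InnoDB. The following is a minimal, hypothetical C++ sketch, not MariaDB code: PoolModel, PageModel and their member functions are invented for illustration, and one mutex plus one condition variable stand in for InnoDB's asynchronous I/O bookkeeping. Invalidation first waits until no writes are pending (the role played by os_aio_wait_until_no_pending_writes()), then frees every unlatched page, and only then checks that everything was freed (the role of buf_pool.assert_all_freed()).

```cpp
// Hypothetical model of "wait for write completions, then free, then assert".
// None of these names exist in InnoDB; they only illustrate the ordering.
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct PageModel {
  bool latched = false;  // held while the page write is "in flight"
  bool in_pool = true;   // still occupying a buffer pool block
};

class PoolModel {
 public:
  explicit PoolModel(std::size_t n) : pages_(n) {}

  // Submitting an asynchronous write latches the page and counts it as pending.
  void submit_write(std::size_t i) {
    std::lock_guard<std::mutex> g(m_);
    pages_[i].latched = true;
    ++pending_writes_;
  }

  // Completion callback (loosely analogous to buf_page_t::write_complete()):
  // releases the page latch and signals that one fewer write is pending.
  void write_complete(std::size_t i) {
    std::lock_guard<std::mutex> g(m_);
    pages_[i].latched = false;
    if (--pending_writes_ == 0) cv_.notify_all();
  }

  // Loosely analogous to os_aio_wait_until_no_pending_writes().
  void wait_until_no_pending_writes() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return pending_writes_ == 0; });
  }

  // Frees every page whose latch is not held; returns how many remain.
  std::size_t scan_and_free() {
    std::lock_guard<std::mutex> g(m_);
    std::size_t remaining = 0;
    for (auto& p : pages_) {
      if (p.in_pool && !p.latched) p.in_pool = false;
      if (p.in_pool) ++remaining;
    }
    return remaining;
  }

  bool all_freed() {
    std::lock_guard<std::mutex> g(m_);
    for (const auto& p : pages_)
      if (p.in_pool) return false;
    return true;
  }

  // Fixed ordering: wait for completions first, then free, then check.
  bool invalidate() {
    wait_until_no_pending_writes();
    scan_and_free();
    return all_freed();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::size_t pending_writes_ = 0;
  std::vector<PageModel> pages_;
};

// Usage: write completions arrive on another thread while invalidate() runs.
bool demo_invalidate() {
  PoolModel pool(4);
  pool.submit_write(1);
  pool.submit_write(3);
  std::thread io([&pool] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    pool.write_complete(1);
    pool.write_complete(3);
  });
  bool ok = pool.invalidate();
  io.join();
  return ok;
}
```

Without the initial wait, scan_and_free() could run while a completion callback still holds a page latch, leaving that page in the pool so that the final check fails. This mirrors, in simplified form, the debug assertion failure described above.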

This regression was introduced in a55b951 (MDEV-26827).

Release Notes

This only fixes a debug check and should have no impact on actual operations.

How can this PR be tested?

This was found while testing #3282, running mariadb-backup --prepare on an HDD. This code path should be covered by killing the server and restarting it with a smaller innodb_buffer_pool_size, but the race condition fixed here is hard to hit. See MDEV-34307 for an analysis of an rr replay trace of a failure.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m self-assigned this on Jun 5, 2024


It is worth noting that buf_flush_wait() is working as intended.
As soon as buf_flush_page_cleaner() invokes
buf_pool.get_oldest_modification(), it will observe that
buf_page_t::write_complete() has set oldest_modification_ to 1,
and it will remove such blocks from buf_pool.flush_list. Upon reaching
buf_pool.flush_list.count=0, buf_flush_page_cleaner() will mark
itself idle and wake up buf_flush_wait() by broadcasting
buf_pool.done_flush_list.
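The page cleaner handshake described in this commit message can be sketched as a condition-variable protocol. This is a simplified, hypothetical C++ model, not the InnoDB implementation: FlushListModel and its members are invented names, the list front stands for the oldest block, and a single mutex protects everything. Write completion only sets a block's oldest modification to 1; the cleaner pass unlinks such blocks and, once the list is empty, marks itself idle and broadcasts to waiters, which is what buf_flush_wait() relies on.

```cpp
// Hypothetical model of the flush-list handshake; not MariaDB code.
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>
#include <thread>

class FlushListModel {
 public:
  // A dirty block enters the list with a real LSN (> 1). Front = oldest.
  void add_dirty(std::uint64_t lsn) {
    std::lock_guard<std::mutex> g(m_);
    flush_list_.push_back(lsn);
    idle_ = false;
  }

  // Write completion does not unlink blocks; it marks them clean by setting
  // their oldest modification to 1, as the commit message describes.
  void write_complete_all() {
    std::lock_guard<std::mutex> g(m_);
    for (auto& lsn : flush_list_) lsn = 1;
  }

  // One cleaner pass: while looking for the oldest modification, drop leading
  // blocks marked 1. On an empty list, go idle and wake all waiters.
  void cleaner_pass() {
    std::lock_guard<std::mutex> g(m_);
    while (!flush_list_.empty() && flush_list_.front() == 1)
      flush_list_.pop_front();
    if (flush_list_.empty()) {
      idle_ = true;
      done_flush_list_.notify_all();  // stands in for the broadcast
    }
  }

  // Stands in for buf_flush_wait(): sleep until the list has drained.
  void wait_flushed() {
    std::unique_lock<std::mutex> lk(m_);
    done_flush_list_.wait(lk, [this] { return flush_list_.empty(); });
  }

  bool idle() {
    std::lock_guard<std::mutex> g(m_);
    return idle_;
  }

  std::size_t count() {
    std::lock_guard<std::mutex> g(m_);
    return flush_list_.size();
  }

 private:
  std::mutex m_;
  std::condition_variable done_flush_list_;
  std::list<std::uint64_t> flush_list_;
  bool idle_ = true;
};

// Usage: a waiter blocks until a concurrent cleaner drains the list.
bool demo_flush_wait() {
  FlushListModel fl;
  fl.add_dirty(100);
  fl.add_dirty(105);
  std::thread cleaner([&fl] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    fl.write_complete_all();
    fl.cleaner_pass();
  });
  fl.wait_flushed();
  cleaner.join();
  return fl.idle() && fl.count() == 0;
}
```

Because the waiter re-checks the predicate under the mutex, neither a broadcast that happens before it starts waiting nor a spurious wakeup can make it miss the empty list.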

This regression was introduced in
commit a55b951 (MDEV-26827).

Reviewed by: Debarun Banerjee
@dr-m force-pushed the 10.6-MDEV-34307 branch from bf046df to bc36609 on June 6, 2024 07:22
@dr-m merged commit bc36609 into 10.6 on Jun 6, 2024
14 of 19 checks passed
@dr-m deleted the 10.6-MDEV-34307 branch on June 6, 2024 07:57