Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add support for running migration in batches #7755

Open
wants to merge 39 commits into
base: epic/campaigns
Choose a base branch
from

Conversation

alaca
Copy link
Member

@alaca alaca commented Feb 21, 2025

Description

This PR adds support to run migration in batches using Action Scheduler.

Affects

Visuals

Testing Instructions

Pre-review Checklist

  • Acceptance criteria satisfied and marked in related issue
  • Relevant @unreleased tags included in DocBlocks
  • Includes unit tests
  • Reviewed by the designer (if follows a design)
  • Self Review of code and UX completed

Copy link
Contributor

@JasonTheAdams JasonTheAdams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the start, @alaca! A couple thoughts:

  1. I really want to use interfaces, but we're going to run into issues because we really should be using inheritance to for sync and async migrations, but interfaces can't enforce static methods until PHP 7.4.
  2. So for now, let's:
    • make a BaseMigration abstract class with the static id() and timestamp() methods
    • Change BatchMigration to an abstract class that extends this
    • Extend the Migration class with this
    • Use BaseMigration in the migrations runner and registrar
  3. We still want batch migrations to hold up the queue, as all migrations need to run in order, regardless of what kind of migration they are

Love seeing this! So cool!

@alaca
Copy link
Member Author

alaca commented Feb 24, 2025

@JasonTheAdams I refactored this PR completely. I also added support for resuming the update from the Migrations list table if some of the batch actions failed or were canceled for whatever reason.

@alaca
Copy link
Member Author

alaca commented Feb 24, 2025

There is one big potential issue with running migrations in batches. Batch migration will stop the Migration Runner until it processes all batches, which means migrations that are next in the queue won't be processed. The problem will occur if the next migrations are supposed to change the table structure or add new tables. We will get a bunch of errors if we want to use new properties from the "updated" table or query the table that is not created yet.

We have to make sure to run all batch migrations last. Luckily, there is an easy solution for this. So instead of using strtotime(date) for batch migration timestamp, we just have to use time(). This will ensure that all batch migrations run last.

@alaca
Copy link
Member Author

alaca commented Feb 24, 2025

@JasonTheAdams check this c620062
Our migration system was running without respecting migration order 😱

@JasonTheAdams JasonTheAdams self-requested a review February 24, 2025 15:50
Copy link
Contributor

@JasonTheAdams JasonTheAdams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor, @alaca! So much good work here! I left a bunch of comments. Hahah! You're getting my head going with this one, which is fun!

As a note, you mentioned concern over batch migrations causing issues if things like table mutations and the like aren't done in the right order. That's the very reason the migrations are intended to run in sort order, so this can be controlled. And if there's a batch migration that relies on a previous migration, then that should either have an earlier timestamp or else be included in the batch migration on the first step. In any case, I think this system is flexible enough to allow that concern to live in the implementing code, and not something we should try and tackle in the migration framework. And honestly, no, I would not recommend ever using a function like time() to create a dynamic return value, as that breaks the whole system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of notes/questions on this:

Namespace
The new RESTful routes in GiveWP are going to use the givewp/v3 namespace, and I believe exist in a V3 sub-domain — double-check with @jonwaldstein on this

Starting / Restarting Migrations
The concept of rescheduling failed actions is interesting. If I recall correctly, the system works such that only a single migration can actually be failed at a time, which then prevents subsequent migrations from running. So resuming migrations really means restarting the failed migration, and once it's successful then the subsequent migrations will follow. So they're not "failed", they're simply blocked.

This makes me wonder if we really need to introduce the concept of "failed" to the API, or if it's simply sufficient to say that a given migration can be "started". If it's an automatic migration then this is usually unnecessary, but can be triggered in the admin if necessary. For a batch migration, this may be a more necessary step — and it may not really matter whether the migration has yet to run or failed to run and needs to be attempted again.

RESTful API
This makes me think we could actually support a PUT/PATCH for the migration resource:

PATCH migrations/123
{
    "status": "run"
}

This makes it RESTful, and would even allow for JS Entities for interacting with the migration. Status would be a simple property of the resource, and the admin can change it to only this one status — wherein we also check that the existing status is in a state that's able to be switched to run (as opposed to try to run a running migration).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, the system works such that only a single migration can actually be failed at a time, which then prevents subsequent migrations from running

Are you talking about our migration system or AS?

This is true for our migration system, but not for AS.

The thing with Action Scheduler is that it doesn't care about failed/canceled actions. AS runs actions concurrently so there can be many failed/canceled actions at the same time, but that doesn't stop AS from running the rest. Once an action is failed or canceled, AS will not try to run it again. Actually, it will delete all of them after some time. Also, devs behind AS are recommending if the action fails, but you need that action to run, you should reschedule it.

I believe we can't really leave any orphaned data like in this example - Assign donations to a campaign. That's why I introduced this. Also, if batch migration has failed/canceled actions, it's not marked as failed, it is marked as incomplete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the migration system. I guess we do need to think about what happens if batch migration enters into a partial state — wherein some of the steps have succeeded and others have failed. 🤔

To be clear, at no point should two migrations be running — even if they're batch migrations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, at no point should two migrations be running — even if they're batch migrations.

Yeah, that is not happening. Not even in our current broken system 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to hop on a call, at a time that isn't interrupting work, and think through this a bit more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, I wish we were using entities here... 😁

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, me too 😅

*
* @unreleased
*/
abstract class BaseMigration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be useful to add another method or a class constant:

public static function startAutomatically(): bool {
    return true;
}

In the not too distant future we'll have migrations that we want to prompt the user to begin. Those migrations should set this to false. All it does is that if the MigrationsRunner encounters this as false, then it simply stops and doesn't proceed.

No need to build a UI or anything for this now, as it's outside of the scope of what you're doing. But it came to mind as something we'll want in this class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I say all that, maybe it's best to not introduce this now and instead introduce it when we build the UI, later... 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a simple change, but yeah, we can add that later.

@alaca
Copy link
Member Author

alaca commented Feb 24, 2025

As a note, you mentioned concern over batch migrations causing issues if things like table mutations and the like aren't done in the right order. That's the very reason the migrations are intended to run in sort order, so this can be controlled. And if there's a batch migration that relies on a previous migration, then that should either have an earlier timestamp or else be included in the batch migration on the first step.

Make sense. We should be extra careful with timestamps then. I was setting a dynamic timestamp as a safe bet, so I don't have to worry about it. And it kinda makes sense. I mean, the purpose of batch migration is to move/update data and the logical order for that action is to be the last step.

@JasonTheAdams
Copy link
Contributor

Make sense. We should be extra careful with timestamps then. I was setting a dynamic timestamp as a safe bet, so I don't have to worry about it. And it kinda makes sense. I mean, the purpose of batch migration is to move/update data and the logical order for that action is to be the last step.

Exactly. The best way to think of the timestamp is as a unique and sortable identifier across all systems (core plus add-ons). Something like setting a simple step integer (1, 2, 3) would be gnarly to keep track of. Instead, this relies on the simple fact that when the dev builds the migration they're making the assumption of the shape of the database at that moment in time. So by setting the time to that moment, it locks that migration into that step of the process. Make sense?

@alaca
Copy link
Member Author

alaca commented Feb 24, 2025

So by setting the time to that moment, it locks that migration into that step of the process. Make sense?

Yes, it makes perfect sense.

Let's imagine this scenario:
The user is running Give 3.21.1, but the most recent Give version is 4.0.
There are many versions between these two where we introduced many new features, new tables, models, etc.
The user decides to update to 4.0
What will happen?

You mentioned that using new tables, models, etc. should be an implementation detail. I agree, but we will have to constantly do checks if something is implemented. That's why batch migrations are a little bit different than regular migrations. Batch migrations shouldn't stop migration runner if some of the migration actions are failed, and they should run at the end when everything is already set.

@alaca
Copy link
Member Author

alaca commented Feb 25, 2025

@JasonTheAdams I believe we are close to the finish line. I really like we are passing a range of IDs to each batch which makes me feel better knowing no data will be left behind 😅
The are a couple of minor things I want to discuss, mainly used wording and the last migration step, but overall, I'm pleased 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants