Airflow 2 to 3 auto migration rules #41641

Open
Lee-W opened this issue Aug 21, 2024 · 57 comments
Labels
airflow3.0:candidate Potential candidates for Airflow 3.0 area:upgrade Facilitating migration to a newer version of Airflow kind:feature Feature Requests

@Lee-W
Member

Lee-W commented Aug 21, 2024

Description

Why

As we're introducing breaking changes to the main branch, it would be better to begin recording the things we could use migration tools to help our users migrate from Airflow 2 to 3.

The breaking changes can be found at https://github.com/apache/airflow/pulls?q=is%3Apr+label%3Aairflow3.0%3Abreaking and through newsfragments/.*.significant.rst

What

Sub-issues

List of significant news fragments and rules between `44080` and `45017`

List of significant news fragments and rules before `44080`

The following rules have been reorganized and merged into #44556 and #44555

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Lee-W Lee-W added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet airflow3.0:candidate Potential candidates for Airflow 3.0 area:dev-tools area:upgrade Facilitating migration to a newer version of Airflow and removed needs-triage label for new issues that we didn't triage yet area:dev-tools labels Aug 21, 2024
@Lee-W Lee-W added this to the Airflow 3.0.0 milestone Aug 21, 2024
@Lee-W
Member Author

Lee-W commented Aug 21, 2024

The Rules section is currently just an example of how these changes can be recorded. I will check the existing breaking changes and update the rules. It would be great if folks could help update this list if they know of other breaking changes.

@potiuk
Member

potiuk commented Aug 21, 2024

I pinned the issue - this way it will show up at the top of "Issues" list in the repo

@potiuk
Member

potiuk commented Aug 21, 2024

image

@Lee-W
Member Author

Lee-W commented Aug 21, 2024

Thanks!

@eladkal
Contributor

eladkal commented Aug 24, 2024

We can just go over all the significant newsfragments and create a rule for them or keep some reasoning why it doesn't require one

@kaxil
Member

kaxil commented Oct 24, 2024

We should add something for the public API change too. API v1 won't work anymore. Those endpoints are being changed as part of AIP-84 to a new FastAPI-based app. GitHub project for it: https://github.com/orgs/apache/projects/414

@pierrejeambrun
Member

Issue here to regroup Rest API breaking changes #43378

@tirkarthi
Contributor

I have started prototyping a small package based on LibCST to build a Python 2to3-like tool for Airflow 2 to 3 that does simple, straightforward replacements. My main motivation was that a lot of the users of our Airflow instance use schedule_interval, which was deprecated in Airflow 2 and renamed to schedule in Airflow 3. Updating thousands of dags manually is a lot of work, and some automation could help. This could also help with import statement changes, e.g. for the Task SDK, `from airflow import DAG` needs to be updated to `from airflow.sdk import DAG`. Something like this could eventually become part of the Airflow CLI, so that users can run `airflow migrate /airflow/dags` for migration, or it could serve as a starting point for migration. It can update the file in place or show a diff. Currently it makes the following changes:

Dags

  • schedule_interval -> schedule
  • timetable -> schedule
  • concurrency -> max_active_tasks
  • Removal of unused full_filepath parameter
  • default_view (tree -> grid)

Operators

  • task_concurrency -> max_active_tis_per_dag
  • trigger_rule (none_failed_or_skipped -> none_failed_min_one_success)

Sample file

import datetime

from airflow import DAG
from airflow.decorators import dag, task
from airflow.operators.empty import EmptyOperator
from airflow.timetables.events import EventsTimetable


with DAG(
    dag_id="my_dag_name",
    default_view="tree",
    start_date=datetime.datetime(2021, 1, 1),
    schedule_interval="@daily",
    concurrency=2,
):
    op = EmptyOperator(
        task_id="task", task_concurrency=1, trigger_rule="none_failed_or_skipped"
    )


@dag(
    default_view="graph",
    start_date=datetime.datetime(2021, 1, 1),
    schedule_interval=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
    max_active_tasks=2,
    full_filepath="/tmp/test_dag.py"
)
def my_decorated_dag():
    op = EmptyOperator(task_id="task")


my_decorated_dag()

Sample usage

python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 tests/test_dag.py
Calculating full-repo metadata...
Executing codemod...
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.
--- /home/karthikeyan/stuff/python/libcst-tut/tests/test_dag.py
+++ /home/karthikeyan/stuff/python/libcst-tut/tests/test_dag.py
@@ -10,6 +10,6 @@
     dag_id="my_dag_name",
-    default_view="tree",
+    default_view="grid",
     start_date=datetime.datetime(2021, 1, 1),
-    schedule_interval="@daily",
-    concurrency=2,
+    schedule="@daily",
+    max_active_tasks=2,
 ):
@@ -23,5 +23,4 @@
     start_date=datetime.datetime(2021, 1, 1),
-    schedule_interval=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
+    schedule=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
     max_active_tasks=2,
-    full_filepath="/tmp/test_dag.py"
 )
Finished codemodding 1 files!
 - Transformed 1 files successfully.
 - Skipped 0 files.
 - Failed to codemod 0 files.
 - 0 warnings were generated.

Repo : https://github.com/tirkarthi/Airflow-2to3
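To make the rename list above concrete, here is a minimal detection-only sketch. It is not the code from the linked repo: it swaps LibCST for the standard-library `ast` module, it only reports findings instead of rewriting files, and the function name `find_deprecations` is made up for illustration. It also does not verify that a call really is a `DAG` or an operator, so it can over-report.

```python
import ast

# Keyword renames taken from the list above (old name -> new name).
RENAMED_KWARGS = {
    "schedule_interval": "schedule",
    "timetable": "schedule",
    "concurrency": "max_active_tasks",
    "task_concurrency": "max_active_tis_per_dag",
}
# Keyword arguments that are dropped entirely.
REMOVED_KWARGS = {"full_filepath"}
# (keyword, old string value) -> new string value.
RENAMED_VALUES = {
    ("default_view", "tree"): "grid",
    ("trigger_rule", "none_failed_or_skipped"): "none_failed_min_one_success",
}


def find_deprecations(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for deprecated keywords in any call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            if kw.arg in RENAMED_KWARGS:
                findings.append((value.lineno, f"{kw.arg} -> {RENAMED_KWARGS[kw.arg]}"))
            elif kw.arg in REMOVED_KWARGS:
                findings.append((value.lineno, f"{kw.arg} is removed"))
            elif isinstance(value, ast.Constant) and isinstance(value.value, str):
                new_value = RENAMED_VALUES.get((kw.arg, value.value))
                if new_value is not None:
                    findings.append((value.lineno, f"{kw.arg}: {value.value} -> {new_value}"))
    return sorted(findings)
```

Running `find_deprecations` on the text of the sample file should flag the same keywords that the diff above changes, including those inside the `@dag(...)` decorator, since a decorator call is also an `ast.Call` node.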

@potiuk
Member

potiuk commented Oct 27, 2024

NICE! @tirkarthi -> you should start a thread about it on the devlist and propose adding it to the repo. The sooner we start working on it and let people test it, the better it will be. And we can already start adding not only the newsfragments but also rules to the migration tools (cc: @vikramkoka @kaxil ) - we can even think about keeping a database of old-way dags, running such a migration tool on them, and letting the Airflow 3 scheduler process them (and maybe even execute them) as part of our CI. Making the tool part of our CI pipeline would tremendously help with maintaining and updating it.

@potiuk
Member

potiuk commented Oct 27, 2024

BTW. I like a lot how simple it is with libCST - we previously used a considerably more complex tool from Facebook that allowed refactoring at scale in parallel (https://github.com/facebookincubator/Bowler), but it was rather brittle to develop rules for, and it had some weird problems and missing features. One thing that was very useful is that it had a nice "parallelism" feature, which allowed refactoring 1000s of files in seconds (but also made it difficult to debug).

I think if we get it working with libCST - it will be way more generic and maintainable, also we can easily add parallelism later on when/if we see it is slow.

@potiuk
Member

potiuk commented Oct 27, 2024

One small watch-out though - such a tool should have a way to isolate rules, so that they are not in a single big method - some abstraction that will allow us to easily develop and selectively apply (or skip) different rules - see https://github.com/apache/airflow/tree/v1-10-test/airflow/upgrade where we have documentation and information about the upgrade check we did for the Airflow 1 -> 2 migration.

Also, we have to discuss whether it should be a separate repo or live in Airflow's monorepo. Both have pros and cons - for 1.10 we chose to keep it in the 1.10 branch of Airflow, because it imported some of the Airflow code and that was easier, but we could likely create a new repo for it, add CI there and keep it there.

We even have this archived repo https://github.com/apache/airflow-upgrade-check which we never used and archived, we could re-open it. We also have https://pypi.org/project/apache-airflow-upgrade-check/ - package in PyPI - and we could release new upgrade check versions (2.* ?) with "apache-airflow>=2.11.0" as dependency.

All that should likely be discussed at devlist :)
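One possible shape for the "isolated rules" abstraction mentioned above is sketched below. This is an assumption for illustration, not the design of the 1.10 upgrade check or of any existing tool: the rule codes `DAG001`/`DAG002`, the `register` decorator and `run_rules` are all hypothetical, and the checks use regular expressions only to keep the sketch short (a real rule would inspect the syntax tree).

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    code: str                          # hypothetical identifier, e.g. "DAG001"
    description: str
    check: Callable[[str], list[str]]  # takes source text, returns messages


REGISTRY: dict[str, Rule] = {}


def register(code: str, description: str):
    """Decorator that adds a check function to the registry under its own code."""
    def decorator(func: Callable[[str], list[str]]):
        REGISTRY[code] = Rule(code, description, func)
        return func
    return decorator


@register("DAG001", "schedule_interval renamed to schedule")
def check_schedule_interval(source: str) -> list[str]:
    hit = re.search(r"\bschedule_interval\s*=", source)
    return ["use schedule instead of schedule_interval"] if hit else []


@register("DAG002", "concurrency renamed to max_active_tasks")
def check_concurrency(source: str) -> list[str]:
    # \b stops this from matching inside task_concurrency.
    hit = re.search(r"\bconcurrency\s*=", source)
    return ["use max_active_tasks instead of concurrency"] if hit else []


def run_rules(
    source: str,
    select: Optional[Iterable[str]] = None,
    ignore: Iterable[str] = (),
) -> dict[str, list[str]]:
    """Run only the selected rules (all by default), skipping ignored ones."""
    selected = None if select is None else set(select)
    ignored = set(ignore)
    results = {}
    for code, rule in sorted(REGISTRY.items()):
        if (selected is not None and code not in selected) or code in ignored:
            continue
        messages = rule.check(source)
        if messages:
            results[code] = messages
    return results
```

Because each rule is its own function with its own code, a rule can be developed, tested, selected or skipped without touching the others, e.g. `run_rules(source, ignore={"DAG002"})`.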

@tirkarthi
Contributor

tirkarthi commented Oct 27, 2024

Thanks @potiuk for the details. I will start a discussion on this at the devlist and continue there. Bowler looks interesting. Using libcst.tool from the CLI parallelizes the process. Right now this needs python -m libcst.tool to execute it as a codemod. Initially I had designed them as a standalone Transformer for each category (dag, operator), where the updated AST from one transformer can be passed to another. The codemod looked like the recommended abstraction for running it, so I changed it that way, only to find later that the CLI accepts just one codemod at a time. I need to check how composable they are.

python -m libcst.tool codemod --help | grep -i -A 1 'jobs JOBS'
  -j JOBS, --jobs JOBS  Number of jobs to use when processing files. Defaults to number of cores

time python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 ~/airflow/dags > /dev/null 2>&1 
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 ~/airflow/dags >  
6.95s user 0.61s system 410% cpu 1.843 total

# Single core
time python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 1 ~/airflow/dags > /dev/null 2>&1
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 1  > 
/dev/nul  4.66s user 0.38s system 99% cpu 5.035 total

# 4 core
time python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 4 ~/airflow/dags > /dev/null 2>&1
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 4  > 
/dev/nul  5.45s user 0.54s system 253% cpu 2.358 total
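The "one transformer per category, each passing its tree to the next" idea described above can be sketched with the standard library. This is a stand-in, not the LibCST code from the prototype: `ast.NodeTransformer` is used instead of LibCST transformers, the class and function names are hypothetical, and `ast.unparse` discards comments and original formatting, which is one reason a concrete syntax tree library is the better fit for a real migration tool.

```python
import ast


class RenameKeyword(ast.NodeTransformer):
    """Rename keyword arguments in every call according to a mapping."""

    def __init__(self, renames: dict[str, str]):
        self.renames = renames

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        for kw in node.keywords:
            if kw.arg in self.renames:
                kw.arg = self.renames[kw.arg]
        return node


class DropKeyword(ast.NodeTransformer):
    """Remove the given keyword arguments from every call."""

    def __init__(self, names):
        self.names = set(names)

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        node.keywords = [kw for kw in node.keywords if kw.arg not in self.names]
        return node


def apply_all(source: str, transformers) -> str:
    """Feed the tree produced by each transformer into the next one."""
    tree = ast.parse(source)
    for transformer in transformers:
        tree = transformer.visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


# One transformer per category, composed in a fixed order.
DAG_RULES = RenameKeyword({"schedule_interval": "schedule", "concurrency": "max_active_tasks"})
OPERATOR_RULES = RenameKeyword({"task_concurrency": "max_active_tis_per_dag"})
REMOVALS = DropKeyword({"full_filepath"})
PIPELINE = [DAG_RULES, OPERATOR_RULES, REMOVALS]
```

Calling `apply_all(source, PIPELINE)` returns the rewritten source as a string; the order of the list is the order in which the categories are applied.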

@potiuk
Member

potiuk commented Oct 27, 2024

> Bowler looks interesting.

Don't be deceived by it :).

It was helpful for the providers' migration at some point in time, but it had many rough edges - debugging a problem was a nightmare until we learned how to do it properly, and it had some annoying limitations - you had to learn completely new, non-standard abstractions (an SQLAlchemy-like DSL to perform modifications), which did not cover all the refactorings we wanted to do. We had to really dig deep into the code and find workarounds for things we wanted to do when the authors of Bowler had not thought about them. And sometimes those were nasty workarounds.

query = (
    Query(<paths to modify>)
    .select_function("old_name")
    .rename("new_name")
    .diff(interactive=True)
)

One example I remember, with a query like the one above, is that we could not easily rename some of the object types because it was not "foreseen" (can't remember exactly) - we had a few surprises there.

Also, Bowler seems to have been unmaintained for > 3 years, which means it's unlikely to handle some constructs even in 3.9+ Airflow.

What I like about libcst is that it is a really "low-level" interface that you program in Python rather than in an abstract DSL - similar to "ast". You write actual Python code to perform what you want rather than rely on incomplete abstractions, even if you have to copy & paste rename code between different "rules" (which you can then abstract away as a "common" util if you need to, so no big deal).

@potiuk
Member

potiuk commented Oct 27, 2024

BTW. Codemod .... is also 5 years not maintained. Not that it is disqualification - but they list python2 as their dependency ... so .....

@Lee-W
Member Author

Lee-W commented Oct 28, 2024

I tried to use libcst in Airflow as a tiny POC for this issue here. It mostly works great except for its speed. I was also thinking about whether to add these migration rules to ruff's Airflow linter, but I have not yet explored much on the Rust/ruff side.

@potiuk
Member

potiuk commented Oct 28, 2024

👀 👀 rust project :) ...

Me ❤️ it (but I doubt we want to invest in it as it might be difficult to maintain, unless we find quite a few committers who are proficient enough in ruff to at least be able to review the code). But it's tempting, I must admit.

But to be honest - while I'd love to finally get a serious Rust project, I think it's not worth it. We are talking about a one-time migration: even for 10,000 dags it will take single minutes at most, and with Rust we could maybe bring it under one minute - so not a big gain for a lot of pain :). Or at least this is what my intuition tells me.

I think parallelism will do the job nicely. My intuition tells me (but this is just intuition and some understanding of the limits and speed of certain operations) that we will get from multiple 10s of minutes (when running such a migration sequentially) to single minutes when we run the migration in parallel using multiple processors and processes - even with Python and libcst. This task is really suitable for such parallelisation because each file is a complete, independent task that can be run in complete isolation from all other tasks - so spawning multiple parallel interpreters, ideally forking them right after all the imports and common code are loaded so that they use shared memory for those, should do the job nicely (at least intuitively).

Using Rust for that might be classic premature optimisation - we likely won't need it :). But it would be worth making some calculations and getting some "numbers" for a big installation - i.e. how many dags of what size are out there, and how long it takes to parse them all with libcst and write them back (even unmodified or with a simple modification). I presume that parsing and writing back will be the bulk of the job, and modifications will add very little overhead as they will mostly operate on in-memory data structures.
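To get the kind of "numbers" mentioned above, a per-file parse-and-regenerate pass can be fanned out over a process pool, since every file is an independent task. The sketch below is only an approximation under stated assumptions: it uses the standard-library `ast` module as a stand-in for libcst, it regenerates source in memory without writing anything back, and the names `roundtrip` and `process_tree` are made up for illustration.

```python
import ast
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def roundtrip(path: str) -> tuple[str, int]:
    """Parse one file and regenerate its source in memory.

    Returns the path and the length of the regenerated source; nothing is
    written back to disk.
    """
    original = Path(path).read_text(encoding="utf-8")
    regenerated = ast.unparse(ast.parse(original))
    return path, len(regenerated)


def process_tree(root: str, jobs=None, executor_cls=ProcessPoolExecutor):
    """Run roundtrip() over every .py file under root using a worker pool.

    jobs=None lets the executor pick its default worker count. executor_cls
    is injectable so a thread pool can be substituted, e.g. for debugging.
    """
    files = sorted(str(p) for p in Path(root).rglob("*.py"))
    with executor_cls(max_workers=jobs) as pool:
        # map() preserves input order, so results line up with `files`.
        return list(pool.map(roundtrip, files))
```

Wrapping `process_tree(dags_folder, jobs=1)` and `process_tree(dags_folder)` in `time.perf_counter()` calls would give a rough sequential versus parallel comparison for a given dags folder.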

@Lee-W
Member Author

Lee-W commented Oct 30, 2024

> Me ❤️ it (but I doubt we want to invest in it as it might be difficult to maintain, unless we find quite a few committers who are proficient enough in ruff to at least be able to review the code). But it's tempting, I must admit.
>
> But to be honest - while I'd love to finally get a serious Rust project, I think it's not worth it. We are talking about a one-time migration: even for 10,000 dags it will take single minutes at most, and with Rust we could maybe bring it under one minute - so not a big gain for a lot of pain :). Or at least this is what my intuition tells me.

Yep, totally agree. I just want to raise this idea which might be interesting. 👀

> I presume that parsing and writing back will be the bulk of the job, and modifications will add very little overhead as they will mostly operate on in-memory data structures.

Yep, I think you're right. My previous default deferrable script took around 10 sec to process ~400 operators. Using ast for checking took around 1 sec

@vincbeck
Contributor

I am not sure about #42042. I moved one property out of BaseUser, but I don't think users are using this class directly.

@potiuk
Member

potiuk commented Nov 22, 2024

There is no straightforward way to do it. Indeed, it can be even a standalone application communicating with the Airflow API without other interactions. It could be even in different programming languages such as go or scheduled/management bash scripts in CI.

I think we can only do it reasonably well if we assume the user uses our Python client. Then we should be able to tell users that they can run their custom code through Ruff with the new Python client installed to detect wrong parameters. Not sure if we need custom ruff rules for those changes, or maybe it's a "mypy" kind of check for types? I know Astral is working on a mypy replacement as well, so there is a chance that we will get mypy-style checks from Astral before we publish the tool (or we could use mypy for now if needed). A quick check of the new/old client with some test code might be useful.

For the rest of the clients, I think what we could also do - potentially - is to have a custom error code in FastAPI, where we handle errors generated when "known old-style requests" are issued to the new API and return a pretty descriptive error: "You are using the old-style API, blah, blah, blah ... you need to change your parameters to ...". Maybe we can generalize it in our API so that "typical" mistakes from multiple APIs are handled by the same error handler?
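The "descriptive error for known old-style requests" idea can be sketched independently of any web framework. Everything below is hypothetical: the function name, the shape of the error body, the 422 status and the single field pair in `LEGACY_FIELDS` are illustrative assumptions, and wiring this into a FastAPI error handler is deliberately left out.

```python
from typing import Optional

# Hypothetical table of old-style request fields and their replacements.
LEGACY_FIELDS = {"execution_date": "logical_date"}


def explain_legacy_request(payload: dict) -> Optional[dict]:
    """Return a descriptive error body if the payload uses known old fields.

    Returns None when the payload contains no known old-style field, so the
    caller can fall through to its normal validation.
    """
    hits = {old: new for old, new in LEGACY_FIELDS.items() if old in payload}
    if not hits:
        return None
    hints = [
        f"'{old}' is no longer accepted; use '{new}' instead"
        for old, new in sorted(hits.items())
    ]
    return {
        "status": 422,  # assumed status code for illustration only
        "title": "Old-style API request",
        "detail": "; ".join(hints),
    }
```

Keeping the old-to-new table in one place is what would let a single error handler cover "typical" mistakes across several endpoints.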

@bugraoz93
Collaborator

> I think we can only do it reasonably well if we assume the user uses our Python client. Then we should be able to tell users that they can run their custom code through Ruff with the new Python client installed to detect wrong parameters. Not sure if we need custom ruff rules for those changes, or maybe it's a "mypy" kind of check for types? I know Astral is working on a mypy replacement as well, so there is a chance that we will get mypy-style checks from Astral before we publish the tool (or we could use mypy for now if needed). A quick check of the new/old client with some test code might be useful.

I agree, we should make this assumption and limit the check to a reasonable scope. I like the idea of restricting it so that only our Python Client will be affected. Otherwise, it could turn into a project of its own. :)

> For the rest of the clients, I think what we could also do - potentially - is to have a custom error code in FastAPI, where we handle errors generated when "known old-style requests" are issued to the new API and return a pretty descriptive error: "You are using the old-style API, blah, blah, blah ... you need to change your parameters to ...". Maybe we can generalize it in our API so that "typical" mistakes from multiple APIs are handled by the same error handler?

I was considering a similar approach, returning an error response if the old request is provided but I wasn’t entirely sure about the scope. If the goal is to catch these issues before upgrading the version, I am unsure how we can easily provide that. Simply reading the API changes documentation seems easier than creating a transition API and asking users to route calls through it to catch errors. Otherwise, such responses would indicate that their processes have already failed with an error.
If the goal is also to warn users after upgrading the version, then this is the way to go for me too.
I am just trying to better understand the scope of when and how we want to warn users.

@potiuk
Member

potiuk commented Nov 24, 2024

> I was considering a similar approach, returning an error response if the old request is provided but I wasn’t entirely sure about the scope.

Yeah. Maybe we can do something in Airflow 2.11? Since our goal is that 2.11 should be the "bridge" release, maybe we could do a variation of my original proposal: check whether the incoming message is in the "old" format, raise a deprecation warning, and also manually implement the "new" format there (that would likely require some manual modification of the OpenAPI specification and some conditional code in 2.11, if possible).

That could follow our pattern of "Make sure 2.11 raises no warnings and then migration to 3 should be smooth".
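A bridge release that accepts both formats and warns on the old one could behave roughly like the sketch below. This is an assumption about the behaviour being proposed, not existing Airflow code: `normalize_payload` and the rename table passed to it are hypothetical, and the rule that the new name wins when both are present is a design choice made here for illustration.

```python
import warnings


def normalize_payload(payload: dict, renames: dict) -> dict:
    """Accept old and new field names and return a new-style payload.

    Every old-style field triggers a DeprecationWarning. If both the old and
    the new name are present, the value given under the new name wins.
    """
    normalized = {}
    for key, value in payload.items():
        new_key = renames.get(key)
        if new_key is None:
            # Already new-style (or unrelated): always takes precedence.
            normalized[key] = value
            continue
        warnings.warn(
            f"'{key}' is deprecated; use '{new_key}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Only fill in the new name if it has not been set explicitly.
        normalized.setdefault(new_key, value)
    return normalized
```

With this shape, "no warnings on 2.11" is equivalent to "every request already uses the new field names".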

@pierrejeambrun
Member

> For the rest of the clients, I think what we could also do - potentially - is to have a custom error code in FastAPI, where we handle errors generated when "known old-style requests" are issued to the new API and return a pretty descriptive error: "You are using the old-style API, blah, blah, blah ... you need to change your parameters to ...". Maybe we can generalize it in our API so that "typical" mistakes from multiple APIs are handled by the same error handler?

Indeed, upgrading the client will automatically highlight type errors in users' code.

For people bypassing the Python client and making direct requests to the API (or any other non-Python system), we can indeed catch errors from such breaking changes and return a clear message that this is not accepted anymore, and maybe even better, tell them the new way to achieve this. It's just more work, but possible.

Otherwise reading the significant newsfragment for the RestAPI would be a good start when migrating their API code.

@potiuk
Member

potiuk commented Nov 26, 2024

> For people bypassing the Python client and making direct requests to the API (or any other non-Python system), we can indeed catch errors from such breaking changes and return a clear message that this is not accepted anymore, and maybe even better, tell them the new way to achieve this. It's just more work, but possible.

Just to repeat the above - yes. I think it's more work, and I think it might require some nasty workarounds in FastAPI that we will have to keep forever, but maybe we can do a 2.11-only change that raises warnings if the old way is used (and allows using the new way)? Not sure if it's possible and how many such breaking changes we will have, but it would really be nice to tell the users "if you have no warnings on 2.11, you are good to go".

@Lee-W
Member Author

Lee-W commented Nov 26, 2024

> I am not sure about #42042. I moved one property out of BaseUser, but I don't think users are using this class directly.

After reading it again, I don't think users are using it either. 🤔 Then I'll just remove it. Thanks for checking it!

@Lee-W
Member Author

Lee-W commented Nov 29, 2024

> For people bypassing the Python client and making direct requests to the API (or any other non-Python system), we can indeed catch errors from such breaking changes and return a clear message that this is not accepted anymore, and maybe even better, tell them the new way to achieve this. It's just more work, but possible.

> Just to repeat the above - yes. I think it's more work, and I think it might require some nasty workarounds in FastAPI that we will have to keep forever, but maybe we can do a 2.11-only change that raises warnings if the old way is used (and allows using the new way)? Not sure if it's possible and how many such breaking changes we will have, but it would really be nice to tell the users "if you have no warnings on 2.11, you are good to go".

I just tried to follow the API discussion 🙌 so we're now doing

  1. warning if they're using airflow python client
  2. return an error and guide in FastAPI

but wouldn't it be easier for us to do only the second one and solve it all at once?

Should we trace those API changes only in #43378?

@Lee-W
Member Author

Lee-W commented Nov 29, 2024

Hi @kaxil ,

I would like to confirm the following rules with you. Does it make sense for us to block imports from airflow.executors.* and airflow.hooks.*? (related PRs #43289, #43291)

@kaxil
Member

kaxil commented Nov 29, 2024

No, no. We need to block a user from passing executor, operator, sensors and hooks when they inherit AirflowPlugin.

class AirflowTestPlugin(AirflowPlugin):
    name = "test_plugin"
    # --- Invalid now
    operators = [PluginOperator]
    sensors = [PluginSensorOperator]
    hooks = [PluginHook]
    executors = [PluginExecutor]
    # --- Invalid now ^^^
    macros = [plugin_macro]
    flask_blueprints = [bp]
    appbuilder_views = [v_appbuilder_package]
    appbuilder_menu_items = [appbuilder_mitem, appbuilder_mitem_toplevel]
    global_operator_extra_links = [
        AirflowLink(),
        GithubLink(),
    ]
    operator_extra_links = [GoogleLink(), AirflowLink2(), CustomOpLink(), CustomBaseIndexOpLink(1)]
    timetables = [CustomCronDataIntervalTimetable]
    listeners = [empty_listener, ClassBasedListener()]
    ti_deps = [CustomTestTriggerRule()]
    priority_weight_strategies = [CustomPriorityWeightStrategy]

Ref:
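A check for the plugin attributes marked invalid above could look roughly like the following. This is a minimal sketch using the standard-library `ast` module, not the actual ruff rule: the function name is hypothetical, and it only recognises classes that list `AirflowPlugin` directly as a base, so indirect subclasses are missed.

```python
import ast

# Attributes that are no longer valid on an AirflowPlugin subclass,
# per the snippet above.
INVALID_PLUGIN_ATTRS = {"operators", "sensors", "hooks", "executors"}


def find_invalid_plugin_attrs(source: str) -> list[tuple[str, str, int]]:
    """Return (class name, attribute, line) for each invalid plugin attribute."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Accept both `AirflowPlugin` and dotted forms ending in it.
        base_names = {ast.unparse(base).rsplit(".", 1)[-1] for base in node.bases}
        if "AirflowPlugin" not in base_names:
            continue
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                targets = stmt.targets
            elif isinstance(stmt, ast.AnnAssign):
                targets = [stmt.target]
            else:
                continue
            for target in targets:
                if isinstance(target, ast.Name) and target.id in INVALID_PLUGIN_ATTRS:
                    findings.append((node.name, target.id, stmt.lineno))
    return findings
```

Applied to the `AirflowTestPlugin` class above, it should report the four attributes between the "Invalid now" markers and leave `macros`, `timetables`, `listeners` and the rest alone.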

@Lee-W
Member Author

Lee-W commented Nov 29, 2024

@uranusjr and I previously discussed how to handle configuration migration. We believe it might be better to manage this process within Airflow itself, rather than using ruff. However, for other code-related changes, we will continue to leverage ruff.

Today, @sunank200 and I spent a significant amount of time listing all the rules we could think of for the important files before #44040. We also noticed that the standard provider was not included in the news fragment. As a result, I created this #44482.

TL;DR, we're splitting the list into

Note that except for AIR301 (and probably AIR302?), the other numbers are not confirmed. They are just something we're considering and will need to discuss with the ruff team.


In this full list, #41348 is treated as a special category, as it's easier for me to trace.

the full list

airflow config

Removal

Rename

Ruff

AIR302: removal

package

module

class

function

constant / variable

attribute

parameter

context key

AIR303: rename

#41348

  • module airflow.datasets -> airflow.sdk.definitions.asset
    • class
      • DatasetAlias -> AssetAlias
      • DatasetAll -> AssetAll
      • DatasetAny -> AssetAny
    • function
      • expand_alias_to_datasets -> expand_alias_to_assets
    • class DatasetAliasEvent -> AssetAliasEvent
      • attribute dest_dataset_uri -> BaseAsset
    • class
      • BaseDataset -> BaseAsset
      • Dataset -> Asset
      • method
        • iter_datasets -> iter_assets
        • iter_dataset_aliases -> iter_asset_aliases
  • module airflow.datasets.manager -> airflow.assets.manager
    • variable dataset_manager -> asset_manager
    • function resolve_dataset_manager -> resolve_asset_manager
    • class DatasetManager -> AssetManager
      • method
        • register_dataset_change -> register_asset_change
        • create_datasets -> create_assets
        • register_dataset_change -> notify_asset_created
        • notify_dataset_changed -> notify_asset_changed
        • notify_dataset_alias_created -> notify_asset_alias_created
  • module airflow.listeners.spec.dataset -> airflow.listeners.spec.asset
    • function
      • on_dataset_created -> on_asset_created
      • on_dataset_changed -> on_asset_changed
  • module airflow.timetables.datasets -> airflow.timetables.assets
    • class DatasetOrTimeSchedule -> AssetOrTimeSchedule
  • class airflow.lineage.hook.DatasetLineageInfo -> airflow.lineage.hook.AssetLineageInfo
    • attribute dataset -> asset
  • package airflow.providers.amazon.aws.datasets -> airflow.providers.amazon.aws.assets
    • in module s3
      • method create_dataset -> create_asset
      • method convert_dataset_to_openlineage -> convert_asset_to_openlineage
  • package airflow.providers.common.io.datasets -> airflow.providers.common.io.assets
    • in module file
      • method create_dataset -> create_asset
      • method convert_dataset_to_openlineage -> convert_asset_to_openlineage
  • package
    • airflow.providers.postgres.datasets -> airflow.providers.postgres.assets
    • airflow.providers.mysql.datasets -> airflow.providers.mysql.assets
    • airflow.providers.trino.datasets -> airflow.providers.trino.assets
  • module
    • airflow.datasets.metadata -> airflow.sdk.definitions.asset.metadata
  • class
    • airflow.timetables.datasets.DatasetOrTimeSchedule -> airflow.timetables.assets.AssetOrTimeSchedule
    • airflow.auth.managers.models.resource_details.DatasetDetails -> airflow.auth.managers.models.resource_details.AssetDetails
    • airflow.timetables.simple.DatasetTriggeredTimetable -> airflow.timetables.simple.AssetTriggeredTimetable
    • airflow.providers.openlineage.utils.utils.DatasetInfo -> airflow.providers.openlineage.utils.utils.AssetInfo
  • method
    • airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_dataset -> airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_asset
    • airflow.lineage.hook.HookLineageCollector.create_dataset -> airflow.lineage.hook.HookLineageCollector.create_asset
    • airflow.lineage.hook.HookLineageCollector.add_input_dataset -> airflow.lineage.hook.HookLineageCollector.add_input_asset
    • airflow.lineage.hook.HookLineageCollector.add_output_dataset -> airflow.lineage.hook.HookLineageCollector.add_output_asset
    • airflow.lineage.hook.HookLineageCollector.collected_datasets -> airflow.lineage.hook.HookLineageCollector.collected_assets
    • airflow.providers_manager.ProvidersManager.initialize_providers_dataset_uri_resources -> airflow.providers_manager.ProvidersManager.initialize_providers_asset_uri_resources
  • function
    • airflow.api_connexion.security.requires_access_dataset -> airflow.api_connexion.security.requires_access_dataset.requires_access_asset
    • airflow.auth.managers.base_auth_manager.is_authorized_dataset -> airflow.auth.managers.base_auth_manager.is_authorized_asset
    • airflow.www.auth.has_access_dataset -> airflow.www.auth.has_access_dataset.has_access_asset
    • airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_dataset -> airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_asset
    • airflow.providers.openlineage.utils.utils.translate_airflow_dataset -> airflow.providers.openlineage.utils.utils.translate_airflow_asset
  • property
    • airflow.providers_manager.ProvidersManager.dataset_factories -> airflow.providers_manager.ProvidersManager.asset_factories
    • airflow.providers_manager.ProvidersManager.dataset_uri_handlers -> airflow.providers_manager.ProvidersManager.asset_uri_handlers
    • airflow.providers_manager.ProvidersManager.dataset_to_openlineage_converters -> airflow.providers_manager.ProvidersManager.asset_to_openlineage_converters
  • constant / variable
    • airflow.security.permissions.RESOURCE_DATASET -> airflow.security.permissions.RESOURCE_ASSET
    • airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.DATASET -> airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.ASSET
  • context key
    • triggering_dataset_events -> triggering_asset_events
  • resource key
    • dataset-uris -> asset-uris (for providers amazon, common.io, mysql, fab, postgres, trino)
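The module-level renames in the list above lend themselves to a longest-prefix lookup, so that `airflow.datasets.manager` is matched before the shorter `airflow.datasets`. The sketch below is illustrative only: `rename_module` is a hypothetical helper, the table holds just a few entries copied from the list, and it handles module paths only, so class and function renames (such as `Dataset` to `Asset`) would need their own table.

```python
# A few module renames copied from the list above (old path -> new path).
MODULE_RENAMES = {
    "airflow.datasets": "airflow.sdk.definitions.asset",
    "airflow.datasets.manager": "airflow.assets.manager",
    "airflow.datasets.metadata": "airflow.sdk.definitions.asset.metadata",
    "airflow.listeners.spec.dataset": "airflow.listeners.spec.asset",
    "airflow.timetables.datasets": "airflow.timetables.assets",
}


def rename_module(path: str) -> str:
    """Map a dotted module path to its new location using the longest match.

    Paths with no renamed prefix are returned unchanged.
    """
    parts = path.split(".")
    # Try the full path first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in MODULE_RENAMES:
            return ".".join([MODULE_RENAMES[prefix], *parts[end:]])
    return path
```

Trying the longest prefix first matters here because `airflow.datasets.manager` does not move to a location under the new home of `airflow.datasets`.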

class

function

parameter

AIR304: moved to provider

module

class

function

constant / variable

AIR310: models related changes (AIP-72) not going to do it


@sunank200
Collaborator

Created the PR for airflow config lint : #44908
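For readers wondering what a configuration lint along the "Removal" and "Rename" categories might involve, here is a minimal sketch. It is not the implementation in #44908: the function name is made up, and the section and option names in `REMOVED` and `RENAMED` are placeholders, not real Airflow settings.

```python
import configparser

# Placeholder entries for illustration only; these are not real settings.
REMOVED = {("core", "example_removed_option")}
RENAMED = {("core", "example_old_name"): ("core", "example_new_name")}


def lint_config(text: str) -> list[str]:
    """Report removed and renamed options found in an INI-style config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section in parser.sections():
        for option in parser.options(section):
            key = (section, option)
            if key in REMOVED:
                problems.append(f"[{section}] {option} was removed")
            elif key in RENAMED:
                new_section, new_option = RENAMED[key]
                problems.append(
                    f"[{section}] {option} was renamed to [{new_section}] {new_option}"
                )
    return problems
```

Doing this inside Airflow rather than in a Python linter fits the point made earlier in the thread: the input is a config file, not Python source.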

@Lee-W
Member Author

Lee-W commented Jan 2, 2025

Most of the rules have now been added to ruff or airflow config lint. We currently have two open PRs waiting for the ruff team to review: astral-sh/ruff#15144 and astral-sh/ruff#15216. One rule not yet included in the previous PRs is blocked by astral-sh/ruff#15144, but it can be done not long after astral-sh/ruff#15144 is merged.

@jscheffl
Contributor

jscheffl commented Jan 2, 2025

@sunank200 / @Lee-W COOL!

@potiuk
Member

potiuk commented Jan 2, 2025

Nice. Glad to see Astral team cooperates on it :).

BTW. Do they have any plans to implement (if possible) some kind of plugins that we could maybe release on our own? I remember that in the past that was a bit problematic because of the way the Rust ABI worked (or so I remember) - but I also think this has been solved already.

While right now it can take quite some time for things to iterate and get released in a new ruff version, once Airflow 3 gets released we will sometimes likely have a quick fix or a new rule that needs to be released fairly quickly. I think it's not a good idea to overburden the Astral team with reviews, merges and releases, and it would be cool if we could have our own "plugin" of sorts so that we could implement and release changes on our own.

Has this been discussed or considered at all @Lee-W ? Should we start such a discussion ?

@Lee-W
Member Author

Lee-W commented Jan 2, 2025

> Nice. Glad to see Astral team cooperates on it :).
>
> BTW. Do they have any plans to implement (if possible) some kind of plugins that we could maybe release on our own? I remember that in the past that was a bit problematic because of the way the Rust ABI worked (or so I remember) - but I also think this has been solved already.
>
> While right now it can take quite some time for things to iterate and get released in a new ruff version, once Airflow 3 gets released we will sometimes likely have a quick fix or a new rule that needs to be released fairly quickly. I think it's not a good idea to overburden the Astral team with reviews, merges and releases, and it would be cool if we could have our own "plugin" of sorts so that we could implement and release changes on our own.

I think it's still an open issue. astral-sh/ruff#283 🤔

> Has this been discussed or considered at all @Lee-W ? Should we start such a discussion ?

It's not been discussed yet. I'm not sure whether there will be rules that really need to be released that quickly, since it's not actually breaking Airflow and rules can be ignored 🤔

@potiuk
Member

potiuk commented Jan 2, 2025

> It's not been discussed yet. I'm not sure whether there will be rules that really need to be released that quickly, since it's not actually breaking Airflow and rules can be ignored 🤔

True, that's why I am not too worried, as this is just "supplemental" code. And yeah - the plugin system is still in discussion, I see, and I do not think we have a strong enough case to "badly need" it - it's more that I generally do not like it when "someone else" than the Airflow PMC controls some Airflow-specific code. And this is not something I have against Astral, not at all; it's just that we as the PMC do not have the final say there, and someone else can add new rules or change ours - so that's a bit of a danger I see.

Various scenarios here are possible - and it's just a little bit of an "itch" that I wonder if we should "scratch".
