Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

Open
1 of 2 tasks
bugraoz93 opened this issue Dec 30, 2024 · 8 comments
Open
1 of 2 tasks

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

bugraoz93 opened this issue Dec 30, 2024 · 8 comments
Labels
AIP-81 Enhanced Security in CLI via Integration of API area:API Airflow's REST/HTTP API area:logging kind:feature Feature Requests

Comments

@bugraoz93
Copy link
Collaborator

Description

We should be able to insert Log (from airflow.models.log import Log) similar to here,

def default_action_log(

This will enable us to use the logging feature without relying on the session.

Use case/motivation

To provide the same functionality for the CLI with API integration.
AIP-81

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@bugraoz93 bugraoz93 added kind:feature Feature Requests area:API Airflow's REST/HTTP API AIP-81 Enhanced Security in CLI via Integration of API labels Dec 30, 2024
@bugraoz93 bugraoz93 changed the title AIP-81 Implement Log Insert Endpoint to REST API AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) Dec 30, 2024
@dosubot dosubot bot added the area:logging label Dec 30, 2024
@jason810496
Copy link
Contributor

jason810496 commented Jan 3, 2025

Hi @bugraoz93,

If I may suggest, this could also be achieved by using a command-line-specific audience in the JWT. Once the new action_logging decorator is applied to FastAPI, the audit log logic could be handled in a single API call.
This approach could also reduce the chances of missing audit logs. For instance, if the CLI is forcibly terminated right after calling the core API and before the audit log API is invoked, the audit log might not be recorded.

What do you think?
cc @pierrejeambrun @vincbeck

@perry2of5
Copy link
Contributor

What would this log from the CLI code that shouldn't already be logged by the REST API itself?

@bugraoz93
Copy link
Collaborator Author

Hi @bugraoz93,

If I may suggest, this could also be achieved by using a command-line-specific audience in the JWT. Once the new action_logging decorator is applied to FastAPI, the audit log logic could be handled in a single API call. This approach could also reduce the chances of missing audit logs. For instance, if the CLI is forcibly terminated right after calling the core API and before the audit log API is invoked, the audit log might not be recorded.

What do you think? cc @pierrejeambrun @vincbeck

@jason810496 It makes sense. There is one small thing here, we should identify when the event originates from cli_ and log the full_command. One way to achieve this is by using headers in the API, though this would result in a custom implementation specific to the CLI on the API side.
If we intend to include this functionality in the REST API for the CLI use case, I would be happy to close this in favour of API auditing the CLI actions. Otherwise, I would suggest moving forward with this approach.

@bugraoz93
Copy link
Collaborator Author

What would this log from the CLI code that shouldn't already be logged by the REST API itself?

We are passing the event as a cli_ with extra as full_command (json.dumps({"host_name": host_name, "full_command": full_command})). This is a specific event sent by the CLI that wraps all the CLI action commands. Below, cli_action_logger will stay for local commands, but it should be replaced for remote commands because this uses the session. There are two things to consider here. The first is API call Audit Logs, which will again come from signed in users, which is the regular API auditing. The second one is we should also log this event coming from CLI means event: cli_<> and extra: full_command, while understanding that the requests are coming from the CLI via headers or similar. If we only use the API auditing, there will be nothing to differentiate if the action is taken from the CLI or not. This again requires bespoke implementation on the API end to make it possible. This brings about maintaining code pieces in the API for the CLI, which can make the two things more tightly coupled. That's why I thought it would be easier just to implement an endpoint for it. I think understanding if a call is coming from the CLI would make things a lot easier to debug in case of security emergencies or for collecting evidence for breaches.

cli_action_loggers.on_post_execution(**metrics)

Behave similar to ``action_logging``; default action logger callback.
The difference is this function uses the global ORM session, and pushes a
``Log`` row into the database instead of actually logging.
"""
from sqlalchemy.exc import OperationalError, ProgrammingError
from airflow.models.log import Log
from airflow.utils import timezone
try:
# Use bulk_insert_mappings here to avoid importing all models (which using the classes does) early
# on in the CLI
session.bulk_insert_mappings(
Log,
[
{
"event": f"cli_{sub_command}",
"task_instance": None,
"owner": user,
"extra": json.dumps({"host_name": host_name, "full_command": full_command}),
"task_id": task_id,
"dag_id": dag_id,
"logical_date": logical_date,
"dttm": timezone.utcnow(),
}
],
)
session.commit()
except (OperationalError, ProgrammingError) as e:
expected = [
'"log" does not exist', # postgres
"no such table", # sqlite
"log' doesn't exist", # mysql
]
error_is_ok = e.args and any(x in e.args[0] for x in expected)
if not error_is_ok:
logger.warning("Failed to log action %s", e)
session.rollback()
except Exception as e:
logger.warning("Failed to log action %s", e)
session.rollback()

@perry2of5
Copy link
Contributor

Thanks.

Perhaps this is a solved problem, but I can't think of a way to guarantee we know if a client is CLI or something else if a malicious actor is involved. I would think they could do the OAuth dance in the CLI way or the WebUI way and get a different audience unless a particular account was locked to only one type or the other. Still, it would be a known, though likely compromised, account that accessed the API..... In this case knowing the IP address the request reached the outermost load balancer is probably as much as we can rely on. Possibly my knowledge is out of date here.

For a non-malicious actor, knowing the exact CLI command might help to research why something was changed a particular way. I suppose the question is: is there is enough value to knowing the exact CLI command in addition to the REST API request parameters and body to be worth adding the special case? Beyond that, knowing the audience (and knowing not to depend on it) might be as much as is worth implementing. I'm wishy-washy on the value of this. I hope my ramblings are useful :)

@bugraoz93
Copy link
Collaborator Author

I was giving an example of why we should maintain the behaviour and explaining why this feature is needed in case the current behaviours persist. Keeping an additional audit log is always valuable when it comes to security.

I think every idea is useful here, as it brings another perspective to the table. Thanks! :)

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Jan 7, 2025

All CLI actions will eventually go through the API and the API is responsible for login any user action already with the action_logging or something similar on the new FastAPI API.

I assume that API authentication when calling from the CLI has specific permissions etc... and that it is authenticated through a dedicated role / has a specific token / something identifiable from the API to know that this is a call that is coming from the CLI. Because for instance some endpoints are only accessible from the CLI (specific permissions etc...). Thus I think the API should be able to log properly the actions and the owner should be cli. Command can be pushed into extra ?

I think this comes down to:

  • How is the API authentication working when calling from the CLI ? How is the token retrieved, what user and role, etc.

@bugraoz93
Copy link
Collaborator Author

All CLI actions will eventually go through the API and the API is responsible for login any user action already with the action_logging or something similar on the new FastAPI API.

I assume that API authentication when calling from the CLI has specific permissions etc... and that it is authenticated through a dedicated role / has a specific token / something identifiable from the API to know that this is a call that is coming from the CLI. Because for instance some endpoints are only accessible from the CLI (specific permissions etc...). Thus I think the API should be able to log properly the actions and the owner should be cli. Command can be pushed into extra ?

I think this comes down to:

  • How is the API authentication working when calling from the CLI ? How is the token retrieved, what user and role, etc.

Many thanks for your perspective on this! I have very high-level answers at the moment which may not be enough to make decisions. The answers to these questions will enlighten multiple parts of this project. I will answer them in a detailed way very soon to be brought up into discussion. So that we won't do any duplicate or overlapping work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-81 Enhanced Security in CLI via Integration of API area:API Airflow's REST/HTTP API area:logging kind:feature Feature Requests
Projects
Development

No branches or pull requests

4 participants