
feat(server): Use system packages for execution #1252

Open · wants to merge 1 commit into main from run-server-sys-pkg

Conversation

@leseb (Contributor) commented Feb 25, 2025

What does this PR do?

Users prefer to rely on the main CLI rather than invoking the server through a Python module; they should interact with a high-level CLI without needing to know internal module structures.

Now, when running `llama stack run <path-to-config>`, the server will attempt to use the system packages, or a virtual environment if one is active.

This also eliminates the current process dependency chain when running from a virtual environment:

```
-> llama stack run
  -> start_env.sh
    -> python -m server...
```
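
In practice this means the CLI imports the server's entry point and calls it in-process. A minimal sketch, assuming the `server_main` import that appears in the review hunks further down (the PR's exact wiring may differ):

```python
import argparse

from llama_stack.distribution.server.server import main as server_main

def run_server(args: argparse.Namespace) -> None:
    # Invoke the server entry point directly: no start_env.sh and no
    # second Python interpreter in the process chain.
    server_main(args)
```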

Signed-off-by: Sébastien Han [email protected]

Test Plan

Run:

```sh
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```

Notice that the server starts and shuts down normally.

@facebook-github-bot added the CLA Signed label on Feb 25, 2025
@ashwinb (Contributor) commented Feb 25, 2025

I could not understand the precise motivation despite your description. Could you perhaps describe the problem people faced before so I understand the solution?

@leseb (Contributor, Author) commented Feb 26, 2025

> I could not understand the precise motivation despite your description. Could you perhaps describe the problem people faced before so I understand the solution?

Absolutely. The current approach to running the server is somewhat complex, as it involves a Python interpreter calling a Bash script, which in turn calls another Python interpreter. The process looks like this:

```
-> llama stack run
  -> start_env.sh
    -> python -m server...
```

The current stack does not handle signals correctly, causing the main “run” command to exit abruptly upon receiving SIGINT or SIGTERM. Additionally, if all required packages are available - such as in a container - we can run “llama stack run …” instead of invoking the “server” module directly, which provides a more user- and admin-friendly experience. I don't think we want to encourage users to run the server with python -m server..., so this makes the "run" command more robust.

Hope this clarifies.
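
For context, a minimal sketch of wiring up graceful shutdown, assuming the `handle_signal(app, signum, _)` function from server.py that is visible in the diff hunks below (the registration code itself is not shown in this thread):

```python
import functools
import signal

from llama_stack.distribution.server.server import handle_signal

def register_signal_handlers(app) -> None:
    # Route SIGINT/SIGTERM to the server's handler so `llama stack run`
    # exits gracefully instead of dying abruptly.
    handler = functools.partial(handle_signal, app)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```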

@booxter (Contributor) commented Feb 26, 2025

In addition to the signal handling benefits that Sébastien described, this will be needed in production environments where tools like virtualenv / conda / uv / pip are not desired or are forbidden.

@ashwinb (Contributor) commented Feb 26, 2025

Thank you both for the details. Reviewing now...

@booxter (Contributor) left a comment

I haven't run the code (I don't have an environment with all packages installed as system packages at hand at the moment), so this review is just from my reading of the code. Please ignore if I'm missing something.

@leseb force-pushed the run-server-sys-pkg branch 3 times, most recently from 895abbf to ff58776 on February 27, 2025 at 10:22
@leseb requested review from ashwinb and booxter on February 27, 2025 at 10:23
@leseb (Contributor, Author) commented Feb 27, 2025

> I haven't run the code (I don't have an environment with all packages installed as system packages at hand at the moment), so this review is just from my reading of the code. Please ignore if I'm missing something.

If you have a venv activated, that would do :)

@booxter (Contributor) commented Feb 27, 2025

I'm testing this locally, and I see this behavior.


When killing the server, I get:

```
INFO 2025-02-27 11:45:16,452 llama_stack.distribution.server.server:145: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-27 11:45:16,453 llama_stack.distribution.server.server:152: Shutting down ModelsRoutingTable
ERROR 2025-02-27 11:45:16,454 asyncio:1758: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-11' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/ihrachys/src/llama-stack/sys-packages/llama_stack/distribution/server/server.py:147> exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
Traceback (most recent call last):
  File "/Users/ihrachys/src/llama-stack/sys-packages/llama_stack/distribution/server/server.py", line 179, in shutdown
    loop.stop()
UnboundLocalError: local variable 'loop' referenced before assignment
```

With this change:

```diff
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -161,7 +161,6 @@ def handle_signal(app, signum, _) -> None:
                     logger.exception("Failed to shutdown %s: %s", impl_name, {e})

             # Gather all running tasks
-            loop = asyncio.get_running_loop()
             tasks = [task for task in asyncio.all_tasks(loop) if task is not asyncio.current_task()]

             # Cancel all tasks
```

I get the following:

```
INFO 2025-02-27 11:46:27,772 llama_stack.distribution.server.server:145: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-27 11:46:27,772 llama_stack.distribution.server.server:152: Shutting down ModelsRoutingTable
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE]
                       [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config
llama stack run: error: Failed to run server: Event loop stopped before Future completed.
```

When I use --image-type venv, I don't see this behavior.
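
The UnboundLocalError is the classic Python scoping pitfall: because the nested shutdown() coroutine assigns to `loop`, Python treats `loop` as local throughout that coroutine, so any path that reaches `loop.stop()` before the assignment has executed fails. A hypothetical reduction, inferred from the traceback rather than copied from the actual server.py:

```python
import asyncio

def demo() -> None:
    loop = asyncio.new_event_loop()  # 'loop' bound in the enclosing scope

    async def shutdown():
        try:
            # Simulate a failure that occurs before the assignment below runs.
            raise RuntimeError("boom")
            loop = asyncio.get_running_loop()  # this assignment makes 'loop' local
        finally:
            # Python resolved 'loop' as a local of shutdown() at compile time,
            # so this raises UnboundLocalError instead of using the outer loop.
            loop.stop()

    try:
        loop.run_until_complete(shutdown())
    except UnboundLocalError as e:
        print(f"reproduced: {e}")
    finally:
        loop.close()

demo()
```

Removing the inner assignment, as in the diff above, makes `loop` resolve to the enclosing scope again, which is why the failure mode changes.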

@leseb (Contributor, Author) commented Feb 28, 2025

> I'm testing this locally, and I see this behavior. […] When I use --image-type venv, I don't see this behavior.

I only see this behavior on Python < 3.12.

"""Start the LlamaStack server."""
parser = argparse.ArgumentParser(description="Start the LlamaStack server.")
parser.add_argument(
"--yaml-config",
"--config",
dest="config",
Contributor commented:

ah, much nicer :)

wonder if we could instead add a --config argument separately, and add a deprecation warning for --yaml-config?
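
A sketch of that suggestion, assuming plain argparse (the names here are illustrative, not the PR's final code):

```python
import argparse
import warnings

parser = argparse.ArgumentParser(description="Start the LlamaStack server.")
parser.add_argument("--config", dest="config", help="Path to the run config")
# Keep the old flag working but hide it from --help.
parser.add_argument("--yaml-config", dest="yaml_config", help=argparse.SUPPRESS)

args = parser.parse_args()
if args.yaml_config is not None:
    warnings.warn("--yaml-config is deprecated, use --config instead", DeprecationWarning)
    args.config = args.config or args.yaml_config
```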

@leseb (Author) replied:

Done!

```python
try:
    from llama_stack.distribution.server.server import main as server_main
except ImportError as e:
    self.parser.error(f"Failed to import server module: {e}")
```
Contributor commented:

don't think this is a parser error. why trap the error and not let the Exception bubble up? for better UX?

```python
server_args = argparse.Namespace()
for arg in vars(args):
    # if this is a function, avoid passing it
    if callable(getattr(args, arg)):
```
Contributor commented:

what is this bit for? seems fishy

@leseb (Author) replied:

This is what args looks like:

```python
args = Namespace(func=<bound method StackRun._run_stack_run_cmd of <llama_stack.cli.stack.run.StackRun object at 0x10484b010>>, config='./llama_stack/templates/ollama/run.yaml', port=8321, image_name=None, disable_ipv6=True, env=None, tls_keyfile=None, tls_certfile=None, image_type=None)
```

So I think we want to avoid passing `func=<bound method StackRun._run_stack_run_cmd of <llama_stack.cli.stack.run.StackRun object at 0x10484b010>>`.
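
The hunk above is cut off; a sketch of how the loop presumably continues (the continue/setattr lines are an assumption, not shown in the review):

```python
import argparse

server_args = argparse.Namespace()
for arg in vars(args):
    # if this is a function, avoid passing it
    if callable(getattr(args, arg)):
        continue
    # Copy plain values (config path, port, env, TLS settings, ...)
    setattr(server_args, arg, getattr(args, arg))
```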

```python
    # Run the server
    server_main(server_args)
except Exception as e:
    self.parser.error(f"Failed to run server: {e}")
```
Contributor commented:

again, not a parser error at this point. we are way beyond the point of things being the responsibility of the argparser


```python
    # Run the server
    server_main(server_args)
except Exception as e:
```
Contributor commented:

i'd rather not catch these exceptions and instead let them bubble. or if you do catch, ensure you always print some backtrace
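
A sketch of the two options the reviewer describes, assuming a module-level logger (illustrative, not the PR's final code):

```python
import logging

logger = logging.getLogger(__name__)

try:
    server_main(server_args)
except Exception:
    # If you do catch, record the full backtrace...
    logger.exception("Failed to run server")
    raise  # ...and re-raise so the failure still bubbles up.
```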

@leseb force-pushed the run-server-sys-pkg branch from ff58776 to 8db9aef on March 3, 2025 at 10:24
@leseb force-pushed the run-server-sys-pkg branch from 8db9aef to 67d0c0a on March 3, 2025 at 10:46
@leseb requested a review from ashwinb on March 3, 2025 at 10:47