
feat(server): Use system packages for execution #1252

Open · wants to merge 1 commit into main from run-server-sys-pkg

Conversation

@leseb (Contributor) commented Feb 25, 2025

What does this PR do?

Users prefer to rely on the main CLI rather than invoking the server through a Python module; they should interact with a high-level CLI without needing to know internal module structures.

Now, when running `llama stack run <path-to-config>`, the server will attempt to use the system packages, or a virtual environment if one is active.

This also eliminates the current process dependency chain when running from a virtual environment:

```
-> llama stack run
  -> start_env.sh
    -> python -m server...
```
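
In practice this means the CLI imports the server's entry point and calls it in-process. A minimal sketch, assuming the `server_main` import that appears in the review hunks further down (the PR's exact wiring may differ):

```python
import argparse

from llama_stack.distribution.server.server import main as server_main

def run_server(args: argparse.Namespace) -> None:
    # Invoke the server entry point directly: no start_env.sh and no
    # second Python interpreter in the process chain.
    server_main(args)
```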

Signed-off-by: Sébastien Han [email protected]

Test Plan

Run:

```sh
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```

Notice that the server starts and shuts down normally.

@facebook-github-bot added the CLA Signed label on Feb 25, 2025
@ashwinb (Contributor) commented Feb 25, 2025

I could not understand the precise motivation despite your description. Could you perhaps describe the problem people faced before so I understand the solution?

@leseb (Contributor, Author) commented Feb 26, 2025

> I could not understand the precise motivation despite your description. Could you perhaps describe the problem people faced before so I understand the solution?

Absolutely. The current approach to running the server is somewhat complex, as it involves a Python interpreter calling a Bash script, which in turn calls another Python interpreter. The process looks like this:

```
-> llama stack run
  -> start_env.sh
    -> python -m server...
```

The current stack does not handle signals correctly, causing the main “run” command to exit abruptly upon receiving SIGINT or SIGTERM. Additionally, if all required packages are available - such as in a container - we can run “llama stack run …” instead of invoking the “server” module directly, which provides a more user- and admin-friendly experience. I don't think we want to encourage users to run the server with python -m server..., so this makes the "run" command more robust.

Hope this clarifies.
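
For context, a minimal sketch of wiring up graceful shutdown, assuming the `handle_signal(app, signum, _)` function from server.py that is visible in the diff hunks below (the registration code itself is not shown in this thread):

```python
import functools
import signal

from llama_stack.distribution.server.server import handle_signal

def register_signal_handlers(app) -> None:
    # Route SIGINT/SIGTERM to the server's handler so `llama stack run`
    # exits gracefully instead of dying abruptly.
    handler = functools.partial(handle_signal, app)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```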

@booxter (Contributor) commented Feb 26, 2025

In addition to the signal handling benefits that Sébastien described, this will be needed in production environments where tools like virtualenv / conda / uv / pip are not desired or are forbidden.

@ashwinb (Contributor) commented Feb 26, 2025

Thank you both for the details. Reviewing now...

@booxter (Contributor) left a comment

I haven't run the code (I don't have an environment with all packages installed as system packages at hand at the moment), so this review is just from my reading of the code. Please ignore if I'm missing something.

@leseb force-pushed the run-server-sys-pkg branch 3 times, most recently from 895abbf to ff58776 on February 27, 2025 at 10:22
@leseb requested review from ashwinb and booxter on February 27, 2025 at 10:23
@leseb (Contributor, Author) commented Feb 27, 2025

> I haven't run the code (I don't have an environment with all packages installed as system packages at hand at the moment), so this review is just from my reading of the code. Please ignore if I'm missing something.

If you have a venv activated, that would do :)

@booxter (Contributor) commented Feb 27, 2025

I'm testing this locally, and I see this behavior.


When killing the server, I get:

```
INFO 2025-02-27 11:45:16,452 llama_stack.distribution.server.server:145: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-27 11:45:16,453 llama_stack.distribution.server.server:152: Shutting down ModelsRoutingTable
ERROR 2025-02-27 11:45:16,454 asyncio:1758: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-11' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/ihrachys/src/llama-stack/sys-packages/llama_stack/distribution/server/server.py:147> exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
Traceback (most recent call last):
  File "/Users/ihrachys/src/llama-stack/sys-packages/llama_stack/distribution/server/server.py", line 179, in shutdown
    loop.stop()
UnboundLocalError: local variable 'loop' referenced before assignment
```

With this change:

```diff
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -161,7 +161,6 @@ def handle_signal(app, signum, _) -> None:
                     logger.exception("Failed to shutdown %s: %s", impl_name, {e})

             # Gather all running tasks
-            loop = asyncio.get_running_loop()
             tasks = [task for task in asyncio.all_tasks(loop) if task is not asyncio.current_task()]

             # Cancel all tasks
```

I get the following:

```
INFO 2025-02-27 11:46:27,772 llama_stack.distribution.server.server:145: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-27 11:46:27,772 llama_stack.distribution.server.server:152: Shutting down ModelsRoutingTable
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE]
                       [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config
llama stack run: error: Failed to run server: Event loop stopped before Future completed.
```

When I use --image-type venv, I don't see this behavior.
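
The UnboundLocalError is the classic Python scoping pitfall: because the nested shutdown() coroutine assigns to `loop`, Python treats `loop` as local throughout that coroutine, so any path that reaches `loop.stop()` before the assignment has executed fails. A hypothetical reduction, inferred from the traceback rather than copied from the actual server.py:

```python
import asyncio

def demo() -> None:
    loop = asyncio.new_event_loop()  # 'loop' bound in the enclosing scope

    async def shutdown():
        try:
            # Simulate a failure that occurs before the assignment below runs.
            raise RuntimeError("boom")
            loop = asyncio.get_running_loop()  # this assignment makes 'loop' local
        finally:
            # Python resolved 'loop' as a local of shutdown() at compile time,
            # so this raises UnboundLocalError instead of using the outer loop.
            loop.stop()

    try:
        loop.run_until_complete(shutdown())
    except UnboundLocalError as e:
        print(f"reproduced: {e}")
    finally:
        loop.close()

demo()
```

Removing the inner assignment, as in the diff above, makes `loop` resolve to the enclosing scope again, which is why the failure mode changes.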

@leseb (Contributor, Author) commented Feb 28, 2025

> I'm testing this locally, and I see this behavior. […] When I use --image-type venv, I don't see this behavior.

I only see this behavior on Python < 3.12.

"""Start the LlamaStack server."""
parser = argparse.ArgumentParser(description="Start the LlamaStack server.")
parser.add_argument(
"--yaml-config",
"--config",
dest="config",
Contributor commented:

ah, much nicer :)

wonder if we could instead add a --config argument separately, and add a deprecation warning for --yaml-config?
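
A sketch of that suggestion, assuming plain argparse (the names here are illustrative, not the PR's final code):

```python
import argparse
import warnings

parser = argparse.ArgumentParser(description="Start the LlamaStack server.")
parser.add_argument("--config", dest="config", help="Path to the run config")
# Keep the old flag working but hide it from --help.
parser.add_argument("--yaml-config", dest="yaml_config", help=argparse.SUPPRESS)

args = parser.parse_args()
if args.yaml_config is not None:
    warnings.warn("--yaml-config is deprecated, use --config instead", DeprecationWarning)
    args.config = args.config or args.yaml_config
```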

@leseb (Author) replied:

Done!

```python
try:
    from llama_stack.distribution.server.server import main as server_main
except ImportError as e:
    self.parser.error(f"Failed to import server module: {e}")
```
Contributor commented:

don't think this is a parser error. why trap the error and not let the Exception bubble up? for better UX?

```python
server_args = argparse.Namespace()
for arg in vars(args):
    # if this is a function, avoid passing it
    if callable(getattr(args, arg)):
```
Contributor commented:

what is this bit for? seems fishy

@leseb (Author) replied:

This is what args looks like:

```python
args = Namespace(func=<bound method StackRun._run_stack_run_cmd of <llama_stack.cli.stack.run.StackRun object at 0x10484b010>>, config='./llama_stack/templates/ollama/run.yaml', port=8321, image_name=None, disable_ipv6=True, env=None, tls_keyfile=None, tls_certfile=None, image_type=None)
```

So I think we want to avoid passing `func=<bound method StackRun._run_stack_run_cmd of <llama_stack.cli.stack.run.StackRun object at 0x10484b010>>`.
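
The hunk above is cut off; a sketch of how the loop presumably continues (the continue/setattr lines are an assumption, not shown in the review):

```python
import argparse

server_args = argparse.Namespace()
for arg in vars(args):
    # if this is a function, avoid passing it
    if callable(getattr(args, arg)):
        continue
    # Copy plain values (config path, port, env, TLS settings, ...)
    setattr(server_args, arg, getattr(args, arg))
```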

```python
    # Run the server
    server_main(server_args)
except Exception as e:
    self.parser.error(f"Failed to run server: {e}")
```
Contributor commented:

again, not a parser error at this point. we are way beyond the point of things being the responsibility of the argparser


```python
    # Run the server
    server_main(server_args)
except Exception as e:
```
Contributor commented:

i'd rather not catch these exceptions and instead let them bubble. or if you do catch, ensure you always print some backtrace
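
A sketch of the two options the reviewer describes, assuming a module-level logger (illustrative, not the PR's final code):

```python
import logging

logger = logging.getLogger(__name__)

try:
    server_main(server_args)
except Exception:
    # If you do catch, record the full backtrace...
    logger.exception("Failed to run server")
    raise  # ...and re-raise so the failure still bubbles up.
```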

@leseb force-pushed the run-server-sys-pkg branch from ff58776 to 8db9aef on March 3, 2025 at 10:24
@leseb force-pushed the run-server-sys-pkg branch from 8db9aef to 67d0c0a on March 3, 2025 at 10:46
@leseb requested a review from ashwinb on March 3, 2025 at 10:47