diff --git a/README.md b/README.md
index f8c2557849..3a4b17bad9 100644
--- a/README.md
+++ b/README.md
@@ -125,20 +125,6 @@ class ExampleService:
         self.megaservice.flow_to(embedding, llm)
 ```
 
-## Gateway
-
-The `Gateway` serves as the interface for users to access the `Megaservice`, providing customized access based on user requirements. It acts as the entry point for incoming requests, routing them to the appropriate `Microservices` within the `Megaservice` architecture.
-
-`Gateways` support API definition, API versioning, rate limiting, and request transformation, allowing for fine-grained control over how users interact with the underlying `Microservices`. By abstracting the complexity of the underlying infrastructure, `Gateways` provide a seamless and user-friendly experience for interacting with the `Megaservice`.
-
-For example, the `Gateway` for `ChatQnA` can be built like this:
-
-```python
-from comps import ChatQnAGateway
-
-self.gateway = ChatQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
-```
-
 ## Contributing to OPEA
 
 Welcome to the OPEA open-source community! We are thrilled to have you here and excited about the potential contributions you can bring to the OPEA platform. Whether you are fixing bugs, adding new GenAI components, improving documentation, or sharing your unique use cases, your contributions are invaluable.
diff --git a/comps/__init__.py b/comps/__init__.py
index 8fe3ac5fdf..240302c75c 100644
--- a/comps/__init__.py
+++ b/comps/__init__.py
@@ -47,23 +47,6 @@
 from comps.cores.mega.orchestrator import ServiceOrchestrator
 from comps.cores.mega.orchestrator_with_yaml import ServiceOrchestratorWithYaml
 from comps.cores.mega.micro_service import MicroService, register_microservice, opea_microservices
-from comps.cores.mega.gateway import (
-    Gateway,
-    ChatQnAGateway,
-    CodeGenGateway,
-    CodeTransGateway,
-    DocSumGateway,
-    TranslationGateway,
-    SearchQnAGateway,
-    AudioQnAGateway,
-    RetrievalToolGateway,
-    FaqGenGateway,
-    VideoQnAGateway,
-    VisualQnAGateway,
-    MultimodalQnAGateway,
-    GraphragGateway,
-    AvatarChatbotGateway,
-)
 
 # Telemetry
 from comps.cores.telemetry.opea_telemetry import opea_telemetry
diff --git a/comps/agent/langchain/README.md b/comps/agent/langchain/README.md
index 9e858c2520..585ff5d964 100644
--- a/comps/agent/langchain/README.md
+++ b/comps/agent/langchain/README.md
@@ -11,11 +11,10 @@ We currently support the following types of agents:
 1. ReAct: use `react_langchain` or `react_langgraph` or `react_llama` as strategy. First introduced in this seminal [paper](https://arxiv.org/abs/2210.03629). The ReAct agent engages in "reason-act-observe" cycles to solve problems. Please refer to this [doc](https://python.langchain.com/v0.2/docs/how_to/migrate_agent/) to understand the differences between the langchain and langgraph versions of react agents. See table below to understand the validated LLMs for each react strategy.
 2. RAG agent: use `rag_agent` or `rag_agent_llama` strategy. This agent is specifically designed for improving RAG performance. It has the capability to rephrase query, check relevancy of retrieved context, and iterate if context is not relevant. See table below to understand the validated LLMs for each rag agent strategy.
 3. Plan and execute: `plan_execute` strategy. This type of agent first makes a step-by-step plan given a user request, and then execute the plan sequentially (or in parallel, to be implemented in future). If the execution results can solve the problem, then the agent will output an answer; otherwise, it will replan and execute again.
-4. SQL agent: use `sql_agent_llama` or `sql_agent` strategy. This agent is specifically designed and optimized for answering questions aabout data in SQL databases. For more technical details read descriptions [here](src/strategy/sqlagent/README.md).
 
 **Note**:
 
-1. Due to the limitations in support for tool calling by TGI and vllm, we have developed subcategories of agent strategies (`rag_agent_llama`, `react_llama` and `sql_agent_llama`) specifically designed for open-source LLMs served with TGI and vllm.
+1. Due to the limitations in support for tool calling by TGI and vllm, we have developed subcategories of agent strategies (`rag_agent_llama` and `react_llama`) specifically designed for open-source LLMs served with TGI and vllm.
 2. For advanced developers who want to implement their own agent strategies, please refer to [Section 5](#5-customize-agent-strategy) below.
 
 ### 1.2 LLM engine
@@ -26,16 +25,14 @@ Agents use LLM for reasoning and planning. We support 3 options of LLM engine:
 2. Open-source LLMs served with vllm. Follow the instructions in [Section 2.2.2](#222-start-agent-microservices-with-vllm).
 3. OpenAI LLMs via API calls. To use OpenAI llms, specify `llm_engine=openai` and `export OPENAI_API_KEY=`
 
-| Agent type | `strategy` arg | Validated LLMs (serving SW) | Notes |
-| ---------------- | ----------------- | --------------------------- | ----- |
-| ReAct | `react_langchain` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) | Only allows tools with one input variable |
-| ReAct | `react_langgraph` | GPT-4o-mini, [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) (vllm-gaudi), | if using vllm, need to specify `--enable-auto-tool-choice --tool-call-parser ${model_parser}`, refer to vllm docs for more info |
-| ReAct | `react_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) (vllm-gaudi) | Recommended for open-source LLMs |
-| RAG agent | `rag_agent` | GPT-4o-mini | |
-| RAG agent | `rag_agent_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) (vllm-gaudi) | Recommended for open-source LLMs, only allows 1 tool with input variable to be "query" |
-| Plan and execute | `plan_execute` | GPT-4o-mini, [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) (vllm-gaudi), [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (vllm-gaudi) | Currently, due to some issues with guaided decoding of vllm-gaudi, this strategy does not work properly with vllm-gaudi. We are actively debugging. Stay tuned. In the meanwhile, you can use OpenAI's models with this strategy. |
-| SQL agent | `sql_agent_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (vllm-gaudi) | database query tool is natively integrated using Langchain's [QuerySQLDataBaseTool](https://python.langchain.com/api_reference/community/tools/langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool.html#langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool). User can also register their own tools with this agent. |
-| SQL agent | `sql_agent` | GPT-4o-mini | database query tool is natively integrated using Langchain's [QuerySQLDataBaseTool](https://python.langchain.com/api_reference/community/tools/langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool.html#langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool). User can also register their own tools with this agent. |
+| Agent type | `strategy` arg | Validated LLMs (serving SW) | Notes |
+| ---------------- | ----------------- | --------------------------- | ----- |
+| ReAct | `react_langchain` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) | Only allows tools with one input variable |
+| ReAct | `react_langgraph` | GPT-4o-mini, [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) (vllm-gaudi), | if using vllm, need to specify `--enable-auto-tool-choice --tool-call-parser ${model_parser}`, refer to vllm docs for more info |
+| ReAct | `react_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) | Recommended for open-source LLMs |
+| RAG agent | `rag_agent` | GPT-4o-mini | |
+| RAG agent | `rag_agent_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (tgi-gaudi) | Recommended for open-source LLMs, only allows 1 tool with input variable to be "query" |
+| Plan and execute | `plan_execute` | GPT-4o-mini, [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) (vllm-gaudi), [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (vllm-gaudi) | |
 
 ### 1.3 Tools
 
@@ -123,12 +120,12 @@ Once microservice starts, user can use below script to invoke.
 
 ```bash
 curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
-    "query": "What is OPEA project?"
+    "query": "What is the weather today in Austin?"
     }'
 
 # expected output
 
-data: 'The OPEA project is .....' # just showing partial example here.
+data: 'The temperature in Austin today is 78°F.'
 
 data: [DONE]
 
@@ -213,4 +210,4 @@ data: [DONE]
 ## 5. Customize agent strategy
 
 For advanced developers who want to implement their own agent strategies, you can add a separate folder in `src\strategy`, implement your agent by inherit the `BaseAgent` class, and add your strategy into the `src\agent.py`. The architecture of this agent microservice is shown in the diagram below as a reference.
-![Architecture Overview](assets/agent_arch.jpg) +![Architecture Overview](agent_arch.jpg) diff --git a/comps/agent/langchain/assets/agent_arch.jpg b/comps/agent/langchain/agent_arch.jpg similarity index 100% rename from comps/agent/langchain/assets/agent_arch.jpg rename to comps/agent/langchain/agent_arch.jpg diff --git a/comps/agent/langchain/assets/sql_agent.png b/comps/agent/langchain/assets/sql_agent.png deleted file mode 100644 index a4e2e3b33c..0000000000 Binary files a/comps/agent/langchain/assets/sql_agent.png and /dev/null differ diff --git a/comps/agent/langchain/assets/sql_agent_llama.png b/comps/agent/langchain/assets/sql_agent_llama.png deleted file mode 100644 index 6d832d4d4f..0000000000 Binary files a/comps/agent/langchain/assets/sql_agent_llama.png and /dev/null differ diff --git a/comps/agent/langchain/log b/comps/agent/langchain/log deleted file mode 100644 index 65e62c8c0d..0000000000 --- a/comps/agent/langchain/log +++ /dev/null @@ -1,46 +0,0 @@ -[HumanMessage(content="what's the most recent album from the founder of ysl records?", id='cfde4aba-0464-4ad9-bd1c-d3fc40bbb46e'), AIMessage(content='', addi -tional_kwargs={'tool_calls': [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'query': 'founder of YSL Records'}, name -e='duckduckgo_search', description=None), id='142bd66e-7ec6-4381-bcb6-2d0bd4fcecd3', type='function')]}, id='6430421d-2238-452b-9fbe-4d6cc8bc0cd8', tool_call -s=[{'name': 'duckduckgo_search', 'args': {'query': 'founder of YSL Records'}, 'id': '142bd66e-7ec6-4381-bcb6-2d0bd4fcecd3', 'type': 'tool_call'}]), ToolMessa -ge(content='Prosecutors allege the chart-topping artist\'s label Young Stoner Life Records also stands for Young Slime ... a YSL co-founder who reached a ple -a deal in 2022 and claimed Young Thug was a part ... In 2016, Young Thug founded YSL Records, with its full name, "Young Stoner Life," perfectly representing - the enigmatic persona of the label. YSL Records quickly became known for its ... The longest criminal trial in the State of Georgia\'s history has been plag -ued by endless problems, including the recusal of not one but two judges in charge of the case. Young Thug and his lawyer ... Prosecutors say YSL - the acron -ym for the artist\'s label, Young Stoner Life Records - also stands for Young Slime Life, an Atlanta-based street gang affiliated with the national Bloods ga -ng. YSL Records calls its roster of artists the "Slime Family." ... YSL co-founder Walter Murphy entered a guilty plea on a single count of conspiracy to vio -late the state\'s Racketeer Influenced ...', name='duckduckgo_search', id='1d9ebf6b-bf62-469a-8d45-365c0fa15bba', tool_call_id='142bd66e-7ec6-4381-bcb6-2d0bd -4fcecd3'), HumanMessage(content='Retrieved document is not sufficient or relevant to answer the query. Reformulate the query to search knowledge base again.' 
-, id='6f2fb9db-e5b4-4483-b69a-06122c996eb9'), AIMessage(content='', additional_kwargs={'tool_calls': [ChatCompletionOutputToolCall(function=ChatCompletionOut -putFunctionDefinition(arguments={'query': 'YSL Records founder'}, name='duckduckgo_search', description=None), id='6d073578-5bf1-449e-8d32-4f3b5f999a06', typ -e='function')]}, id='50a644af-3117-4513-be56-5fb4b16d0481', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'YSL Records founder'}, 'id': '6d0735 -78-5bf1-449e-8d32-4f3b5f999a06', 'type': 'tool_call'}]), ToolMessage(content='That same year, he founded the YSL record label, which the rapper has used to p -ropel close friends and family members to industry success. The Birth Of YSL Records And Its Impact On The Music Industry In 2016, Young Thug founded YSL Rec -ords, with its full name, "Young Stoner Life," perfectly representing the enigmatic persona of ... Young Thug, who runs the Young Stoner Life label, has been - accused of co-founding the Young Slime Life Atlanta gang and violating the RICO act, among other charges. Here\'s what to know about the ... The rapper foun -dead the record label Young Stoner Life in 2016 as an imprint of 300 Entertainment. YSL Records calls its roster of artists the "Slime Family." One of the YSL - Records founder\'s charges includes conspiracy to violate the Racketeer Influenced and Corrupt Organizations Act (RICO).', name='duckduckgo_search', id='afc -cee2f-f36f-4d86-8abd-8d05d733649d', tool_call_id='6d073578-5bf1-449e-8d32-4f3b5f999a06'), HumanMessage(content='Retrieved document is not sufficient or relev -ant to answer the query. Reformulate the query to search knowledge base again.', id='22af0af8-92d0-4876-ad4e-16457c37f1f3'), AIMessage(content='', additional -_kwargs={'tool_calls': [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'query': 'latest album by Young Thug'}, name= -'duckduckgo_search', description=None), id='2315fb35-11ec-4a6b-be86-405a1efef411', type='function')]}, id='189ba925-9f82-4e9e-bdab-6dcfc49a196e', tool_calls= -[{'name': 'duckduckgo_search', 'args': {'query': 'YSL Records founder name'}, 'id': 'cddeca43-88e2-4279-90aa-4c0b58b8b3c4', 'type': 'tool_call'}, {'name': 'd -uckduckgo_search', 'args': {'query': 'latest album by Young Thug'}, 'id': '2315fb35-11ec-4a6b-be86-405a1efef411', 'type': 'tool_call'}]), ToolMessage(content -='In 2016, Young Thug founded YSL Records, with its full name, "Young Stoner Life," perfectly representing the enigmatic persona of the label. YSL Records qu -ickly became known for its ... Thug is also the founder of Young Stoner Life Records. However, the rapper was arrested in May 2022 and charged with conspirin -g to violate the state\'s Racketeer Influenced and Corrupt Organizations (RICO) Act, according to. Prosecutors alleged that YSL was a street gang connected t -o a sleuth of crimes. Prosecutors say YSL - the acronym for the artist\'s label, Young Stoner Life Records - also stands for Young Slime Life, an Atlanta-bas -ed street gang affiliated with the national Bloods gang. Trontavious Stephens, a co-founder of YSL, took the witness stand for hours Wednesday and Thursday. -Stephens testified that he had a criminal record, but he said he did not commit crimes with ... Young Thug\'s arrest sent shockwaves through the rap communit -y. 
On May 9, 2022, the YSL rapper was apprehended along with 27 other alleged gang members as part of a sprawling 56-count indictment ...', name='duckduckgo_ -search', id='153464f0-aee1-4097-aae6-191ffa372aa6', tool_call_id='cddeca43-88e2-4279-90aa-4c0b58b8b3c4'), ToolMessage(content='The discography of American ra -pper Young Thug consists of three studio albums, two compilation albums, twelve self-released mixtapes, seven commercial mixtapes, three extended plays, and -sixty-nine singles (including 71 as a featured artist).. In 2015, Thug released his debut mixtape, Barter 6, which reached number 22 on the Billboard 200. Hi -s 2016 mixtape, I\'m Up matched the same position. Atlanta singer (and Thugger\'s rumored girlfriend) also released her own "From a Woman" Friday. Even behin -d bars, Young Thug remains prolific, with the rapper dropping his new single "From a Man ... Mariah The Scientist\'s new song "From A Woman" isn\'t musically - connected to "From A Man," though Mariah\'s cover art works as a flipside to Thug\'s. She sings about finding someone ... Future, Young Thug & More Join Mus -tard On New Album \'Faith Of A Mustard Seed\': Stream. News | Jul 28, 2024, 11:00 AM PDT. HipHopDX brings you all the newest Young Thug albums, songs, and .. -. Find top songs and albums by Young Thug including Lifestyle (feat. Young Thug & Rich Homie Quan), It\'s Up and more. ... JEFFREY to the country-ish sides o -f Beautiful Thugger Girls—has cracked the mainstream, laying the groundwork for a new crop of fellow eccentrics like Lil Uzi Vert and Playboi Carti. In other -words, Thug hasn\'t adjusted ...', name='duckduckgo_search', id='79667433-d46e-4536-a04b-842223008bdc', tool_call_id='2315fb35-11ec-4a6b-be86-405a1efef411') -, HumanMessage(content='Retrieved document is not sufficient or relevant to answer the query. 
Reformulate the query to search knowledge base again.', id='2f1 -b3773-8837-4f14-82b5-0a8bbccf3948'), HumanMessage(content='I don’t know.', id='a08edb17-a41a-458f-903e-0f7e15908e3f')] diff --git a/comps/agent/langchain/requirements.txt b/comps/agent/langchain/requirements.txt index 431a5060a4..ab3ff0c6bc 100644 --- a/comps/agent/langchain/requirements.txt +++ b/comps/agent/langchain/requirements.txt @@ -1,11 +1,11 @@ # used by microservice docarray[full] + +#used by tools +duckduckgo-search fastapi huggingface_hub langchain - -#used by tools -langchain-google-community langchain-huggingface langchain-openai langchain_community diff --git a/comps/agent/langchain/src/agent.py b/comps/agent/langchain/src/agent.py index a7713a29bb..0533826c59 100644 --- a/comps/agent/langchain/src/agent.py +++ b/comps/agent/langchain/src/agent.py @@ -33,15 +33,5 @@ def instantiate_agent(args, strategy="react_langchain", with_memory=False): from .strategy.ragagent import RAGAgent return RAGAgent(args, with_memory, custom_prompt=custom_prompt) - elif strategy == "sql_agent_llama": - print("Initializing SQL Agent Llama") - from .strategy.sqlagent import SQLAgentLlama - - return SQLAgentLlama(args, with_memory, custom_prompt=custom_prompt) - elif strategy == "sql_agent": - print("Initializing SQL Agent") - from .strategy.sqlagent import SQLAgent - - return SQLAgent(args, with_memory, custom_prompt=custom_prompt) else: raise ValueError(f"Agent strategy: {strategy} not supported!") diff --git a/comps/agent/langchain/src/config.py b/comps/agent/langchain/src/config.py index 6bb1b12dd2..4178e2d9f9 100644 --- a/comps/agent/langchain/src/config.py +++ b/comps/agent/langchain/src/config.py @@ -72,16 +72,3 @@ if os.environ.get("timeout") is not None: env_config += ["--timeout", os.environ["timeout"]] - -# for sql agent -if os.environ.get("db_path") is not None: - env_config += ["--db_path", os.environ["db_path"]] - -if os.environ.get("db_name") is not None: - env_config += ["--db_name", os.environ["db_name"]] - -if os.environ.get("use_hints") is not None: - env_config += ["--use_hints", os.environ["use_hints"]] - -if os.environ.get("hints_file") is not None: - env_config += ["--hints_file", os.environ["hints_file"]] diff --git a/comps/agent/langchain/src/strategy/base_agent.py b/comps/agent/langchain/src/strategy/base_agent.py index 8c0048b879..beb4fa9f8f 100644 --- a/comps/agent/langchain/src/strategy/base_agent.py +++ b/comps/agent/langchain/src/strategy/base_agent.py @@ -36,37 +36,5 @@ def compile(self): def execute(self, state: dict): pass - def prepare_initial_state(self, query): + def non_streaming_run(self, query, config): raise NotImplementedError - - async def stream_generator(self, query, config): - initial_state = self.prepare_initial_state(query) - try: - async for event in self.app.astream(initial_state, config=config): - for node_name, node_state in event.items(): - yield f"--- CALL {node_name} ---\n" - for k, v in node_state.items(): - if v is not None: - yield f"{k}: {v}\n" - - yield f"data: {repr(event)}\n\n" - yield "data: [DONE]\n\n" - except Exception as e: - yield str(e) - - async def non_streaming_run(self, query, config): - initial_state = self.prepare_initial_state(query) - print("@@@ Initial State: ", initial_state) - try: - async for s in self.app.astream(initial_state, config=config, stream_mode="values"): - message = s["messages"][-1] - if isinstance(message, tuple): - print(message) - else: - message.pretty_print() - - last_message = s["messages"][-1] - print("******Response: ", 
last_message.content) - return last_message.content - except Exception as e: - return str(e) diff --git a/comps/agent/langchain/src/strategy/sqlagent/README.md b/comps/agent/langchain/src/strategy/sqlagent/README.md deleted file mode 100644 index 69d573b1c7..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/README.md +++ /dev/null @@ -1,44 +0,0 @@ -# SQL Agents - -We currently have two types of SQL agents: - -1. `sql_agent_llama`: for using with open-source LLMs, especially `meta-llama/Llama-3.1-70B-Instruct` model. -2. `sql_agent`: for using with OpenAI models, we developed and validated with GPT-4o-mini. - -## Overview of sql_agent_llama - -The architecture of `sql_agent_llama` is shown in the figure below. -The agent node takes user question, hints (optional) and history (when available), and thinks step by step to solve the problem. - -![SQL Agent Llama Architecture](../../../assets/sql_agent_llama.png) - -### Database schema: - -We use langchain's [SQLDatabase](https://python.langchain.com/api_reference/community/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase) API to get table names and schemas from the SQL database. User just need to specify `db_path` and `db_name`. The table schemas are incorporated into the prompts for the agent. - -### Hints module: - -If you want to use the hints module, you need to prepare a csv file that has 3 columns: `table_name`, `column_name`, `description`, and make this file available to the agent microservice. The `description` should include useful information (for example, domain knowledge) about a certain column in a table in the database. The hints module will pick up to five relevant columns together with their descriptions based on the user question using similarity search. The hints module will then pass these column descriptions to the agent node. - -### Output parser: - -Due to the current limitations of open source LLMs and serving frameworks (tgi and vllm) in generating tool call objects, we developed and optimized a custom output parser, together with our specially designed prompt templates. The output parser has 3 functions: - -1. Decide if a valid final answer presents in the raw agent output. This is needed because: a) we found sometimes agent would make guess or hallucinate data, so it is critical to double check, b) sometimes LLM does not strictly follow instructions on output format so simple string parsing can fail. We use one additional LLM call to perform this function. -2. Pick out tool calls from raw agent output. And check if the agent has made same tool calls before. If yes, remove the repeated tool calls. -3. Parse and review SQL query, and fix SQL query if there are errors. This proved to improve SQL agent performance since the initial query may contain errors and having a "second pair of eyes" can often spot the errors while the agent node itself may not be able to identify the errors in subsequent execution steps. - -## Overview of sql_agent - -The architecture of `sql_agent` is shown in the figure below. -The agent node takes user question, hints (optional) and history (when available), and thinks step by step to solve the problem. The basic idea is the same as `sql_agent_llama`. However, since OpenAI APIs produce well-structured tool call objects, we don't need a special output parser. Instead, we only keep the query fixer. - -![SQL Agent Architecture](../../../assets/sql_agent.png) - -## Limitations - -1. 
Agent connects to local SQLite databases with uri. -2. Agent is only allowed to issue "SELECT" commands to databases, i.e., agent can only query databases but cannot update databases. -3. We currently does not support "streaming" agent outputs on the fly for `sql_agent_llama`. - -Please submit issues if you want new features to be added. We also welcome community contributions! diff --git a/comps/agent/langchain/src/strategy/sqlagent/__init__.py b/comps/agent/langchain/src/strategy/sqlagent/__init__.py deleted file mode 100644 index f8bf69ff4f..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/__init__.py +++ /dev/null @@ -1,5 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -from .planner import SQLAgentLlama -from .planner import SQLAgent diff --git a/comps/agent/langchain/src/strategy/sqlagent/hint.py b/comps/agent/langchain/src/strategy/sqlagent/hint.py deleted file mode 100644 index 06fb5b1553..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/hint.py +++ /dev/null @@ -1,56 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import glob -import os - -import pandas as pd - - -def read_hints(hints_file): - """ - hints_file: csv with columns: table_name, column_name, description - """ - hints_df = pd.read_csv(hints_file) - cols_descriptions = [] - values_descriptions = [] - for _, row in hints_df.iterrows(): - table_name = row["table_name"] - col_name = row["column_name"] - description = row["description"] - if not pd.isnull(description): - cols_descriptions.append(f"{table_name}.{col_name}: {description}") - values_descriptions.append(f"{col_name}: {description}") - return cols_descriptions, values_descriptions - - -def sort_list(list1, list2): - import numpy as np - - # Use numpy's argsort function to get the indices that would sort the second list - idx = np.argsort(list2) # ascending order - return np.array(list1)[idx].tolist()[::-1], np.array(list2)[idx].tolist()[::-1] # descending order - - -def get_topk_cols(topk, cols_descriptions, similarities): - sorted_cols, similarities = sort_list(cols_descriptions, similarities) - top_k_cols = sorted_cols[:topk] - output = [] - for col, sim in zip(top_k_cols, similarities[:topk]): - # print(f"{col}: {sim}") - if sim > 0.5: - output.append(col) - return output - - -def pick_hints(query, model, column_embeddings, complete_descriptions, topk=5): - # use similarity to get the topk columns - query_embedding = model.encode(query, convert_to_tensor=True) - similarities = model.similarity(query_embedding, column_embeddings).flatten() - - topk_cols_descriptions = get_topk_cols(topk, complete_descriptions, similarities) - - hint = "" - for col in topk_cols_descriptions: - hint += col + "\n" - return hint diff --git a/comps/agent/langchain/src/strategy/sqlagent/planner.py b/comps/agent/langchain/src/strategy/sqlagent/planner.py deleted file mode 100644 index bc54d7d4d4..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/planner.py +++ /dev/null @@ -1,322 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import json -import os -from typing import Annotated, Sequence, TypedDict - -from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage, ToolMessage -from langchain_core.prompts import PromptTemplate -from langchain_core.runnables import RunnableLambda -from langgraph.graph import END, StateGraph -from langgraph.graph.message import add_messages -from langgraph.managed import 
IsLastStep -from langgraph.prebuilt import ToolNode - -from ...utils import setup_chat_model, tool_renderer -from ..base_agent import BaseAgent -from .hint import pick_hints, read_hints -from .prompt import AGENT_NODE_TEMPLATE, AGENT_SYSM, QUERYFIXER_PROMPT -from .sql_tools import get_sql_query_tool, get_table_schema -from .utils import ( - LlamaOutputParserAndQueryFixer, - assemble_history, - convert_json_to_tool_call, - remove_repeated_tool_calls, -) - - -class AgentState(TypedDict): - """The state of the agent.""" - - messages: Annotated[Sequence[BaseMessage], add_messages] - is_last_step: IsLastStep - hint: str - - -class AgentNodeLlama: - def __init__(self, args, tools): - self.llm = setup_chat_model(args) - self.args = args - # two types of tools: - # sql_db_query - always available, no need to specify - # other tools - user defined - # here, self.tools is a list of user defined tools - self.tools = tool_renderer(tools) - print("@@@@ Tools: ", self.tools) - - self.chain = self.llm - - self.output_parser = LlamaOutputParserAndQueryFixer(chat_model=self.llm) - - if args.use_hints: - from sentence_transformers import SentenceTransformer - - self.cols_descriptions, self.values_descriptions = read_hints(args.hints_file) - self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5") - self.column_embeddings = self.embed_model.encode(self.values_descriptions) - - def __call__(self, state): - print("----------Call Agent Node----------") - question = state["messages"][0].content - table_schema, num_tables = get_table_schema(self.args.db_path) - if self.args.use_hints: - if not state["hint"]: - hints = pick_hints(question, self.embed_model, self.column_embeddings, self.cols_descriptions) - else: - hints = state["hint"] - print("@@@ Hints: ", hints) - else: - hints = "" - - history = assemble_history(state["messages"]) - print("@@@ History: ", history) - - prompt = AGENT_NODE_TEMPLATE.format( - domain=self.args.db_name, - tools=self.tools, - num_tables=num_tables, - tables_schema=table_schema, - question=question, - hints=hints, - history=history, - ) - - output = self.chain.invoke(prompt) - output = self.output_parser.parse( - output.content, history, table_schema, hints, question, state["messages"] - ) # text: str, history: str, db_schema: str, hint: str - print("@@@@@ Agent output:\n", output) - - # convert output to tool calls - tool_calls = [] - for res in output: - if "tool" in res: - tool_call = convert_json_to_tool_call(res) - tool_calls.append(tool_call) - - # check if same tool calls have been made before - # if yes, then remove the repeated tool calls - if tool_calls: - new_tool_calls = remove_repeated_tool_calls(tool_calls, state["messages"]) - print("@@@@ New Tool Calls:\n", new_tool_calls) - else: - new_tool_calls = [] - - if new_tool_calls: - ai_message = AIMessage(content="", tool_calls=new_tool_calls) - elif tool_calls: - ai_message = AIMessage(content="Repeated previous steps.", tool_calls=tool_calls) - elif "answer" in output[0]: - ai_message = AIMessage(content=str(output[0]["answer"])) - else: - ai_message = AIMessage(content=str(output)) - - return {"messages": [ai_message], "hint": hints} - - -class SQLAgentLlama(BaseAgent): - # need new args: - # # db_name and db_path - # # use_hints, hints_file - def __init__(self, args, with_memory=False, **kwargs): - super().__init__(args, local_vars=globals(), **kwargs) - # note: here tools only include user defined tools - # we need to add the sql query tool as well - print("@@@@ user defined tools: ", self.tools_descriptions) - 
agent = AgentNodeLlama(args, self.tools_descriptions) - sql_tool = get_sql_query_tool(args.db_path) - print("@@@@ SQL Tool: ", sql_tool) - tools = self.tools_descriptions + [sql_tool] - print("@@@@ ALL Tools: ", tools) - tool_node = ToolNode(tools) - - workflow = StateGraph(AgentState) - - # Define the nodes we will cycle between - workflow.add_node("agent", agent) - workflow.add_node("tools", tool_node) - - workflow.set_entry_point("agent") - - workflow.add_conditional_edges( - "agent", - self.decide_next_step, - { - # If `tools`, then we call the tool node. - "tools": "tools", - "agent": "agent", - "end": END, - }, - ) - - # We now add a normal edge from `tools` to `agent`. - # This means that after `tools` is called, `agent` node is called next. - workflow.add_edge("tools", "agent") - - self.app = workflow.compile() - - def decide_next_step(self, state: AgentState): - messages = state["messages"] - last_message = messages[-1] - if last_message.tool_calls and last_message.content == "Repeated previous steps.": - print("@@@@ Repeated tool calls from previous steps, go back to agent") - return "agent" - elif last_message.tool_calls and last_message.content != "Repeated previous steps.": - print("@@@@ New Tool calls, go to tools") - return "tools" - else: - return "end" - - def prepare_initial_state(self, query): - return {"messages": [HumanMessage(content=query)], "is_last_step": IsLastStep(False), "hint": ""} - - -################################################ -# Below is SQL agent using OpenAI models -################################################ -class AgentNode: - def __init__(self, args, llm, tools): - self.llm = llm.bind_tools(tools) - self.args = args - if args.use_hints: - from sentence_transformers import SentenceTransformer - - self.cols_descriptions, self.values_descriptions = read_hints(args.hints_file) - self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5") - self.column_embeddings = self.embed_model.encode(self.values_descriptions) - - def __call__(self, state): - print("----------Call Agent Node----------") - question = state["messages"][0].content - table_schema, num_tables = get_table_schema(self.args.db_path) - if self.args.use_hints: - if not state["hint"]: - hints = pick_hints(question, self.embed_model, self.column_embeddings, self.cols_descriptions) - else: - hints = state["hint"] - else: - hints = "" - - sysm = AGENT_SYSM.format(num_tables=num_tables, tables_schema=table_schema, question=question, hints=hints) - _system_message = SystemMessage(content=sysm) - state_modifier_runnable = RunnableLambda( - lambda state: [_system_message] + state["messages"], - name="StateModifier", - ) - - chain = state_modifier_runnable | self.llm - response = chain.invoke(state) - - return {"messages": [response], "hint": hints} - - -class QueryFixerNode: - def __init__(self, args, llm): - prompt = PromptTemplate( - template=QUERYFIXER_PROMPT, - input_variables=["DATABASE_SCHEMA", "QUESTION", "HINT", "QUERY", "RESULT"], - ) - self.chain = prompt | llm - self.args = args - - def get_sql_query_and_result(self, state): - messages = state["messages"] - assert isinstance(messages[-1], ToolMessage), "The last message should be a tool message" - result = messages[-1].content - id = messages[-1].tool_call_id - query = "" - for msg in reversed(messages): - if isinstance(msg, AIMessage) and msg.tool_calls: - if msg.tool_calls[0]["id"] == id: - query = msg.tool_calls[0]["args"]["query"] - break - print("@@@@ Executed SQL Query: ", query) - print("@@@@ Execution Result: ", result) 
- return query, result - - def __call__(self, state): - print("----------Call Query Fixer Node----------") - table_schema, _ = get_table_schema(self.args.db_path) - question = state["messages"][0].content - hint = state["hint"] - query, result = self.get_sql_query_and_result(state) - response = self.chain.invoke( - { - "DATABASE_SCHEMA": table_schema, - "QUESTION": question, - "HINT": hint, - "QUERY": query, - "RESULT": result, - } - ) - # print("@@@@@ Query fixer output:\n", response.content) - return {"messages": [response]} - - -class SQLAgent(BaseAgent): - def __init__(self, args, with_memory=False, **kwargs): - super().__init__(args, local_vars=globals(), **kwargs) - - sql_tool = get_sql_query_tool(args.db_path) - tools = self.tools_descriptions + [sql_tool] - print("@@@@ ALL Tools: ", tools) - - tool_node = ToolNode(tools) - agent = AgentNode(args, self.llm, tools) - query_fixer = QueryFixerNode(args, self.llm) - - workflow = StateGraph(AgentState) - - # Define the nodes we will cycle between - workflow.add_node("agent", agent) - workflow.add_node("query_fixer", query_fixer) - workflow.add_node("tools", tool_node) - - workflow.set_entry_point("agent") - - # We now add a conditional edge - workflow.add_conditional_edges( - "agent", - self.should_continue, - { - # If `tools`, then we call the tool node. - "continue": "tools", - "end": END, - }, - ) - - workflow.add_conditional_edges( - "tools", - self.should_go_to_query_fixer, - {"true": "query_fixer", "false": "agent"}, - ) - workflow.add_edge("query_fixer", "agent") - - self.app = workflow.compile() - - # Define the function that determines whether to continue or not - def should_continue(self, state: AgentState): - messages = state["messages"] - last_message = messages[-1] - # If there is no function call, then we finish - if not last_message.tool_calls: - return "end" - # Otherwise if there is, we continue - else: - return "continue" - - def should_go_to_query_fixer(self, state: AgentState): - messages = state["messages"] - last_message = messages[-1] - assert isinstance(last_message, ToolMessage), "The last message should be a tool message" - print("@@@@ Called Tool: ", last_message.name) - if last_message.name == "sql_db_query": - print("@@@@ Going to Query Fixer") - return "true" - else: - print("@@@@ Going back to Agent") - return "false" - - def prepare_initial_state(self, query): - return {"messages": [HumanMessage(content=query)], "is_last_step": IsLastStep(False), "hint": ""} diff --git a/comps/agent/langchain/src/strategy/sqlagent/prompt.py b/comps/agent/langchain/src/strategy/sqlagent/prompt.py deleted file mode 100644 index dae63766fc..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/prompt.py +++ /dev/null @@ -1,225 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -AGENT_NODE_TEMPLATE = """\ -You are an SQL expert tasked with answering questions about {domain}. -In addition to the database, you have the following tools to gather information: -{tools} - -You can access a database that has {num_tables} tables. The schema of the tables is as follows. Read the schema carefully. -**Table Schema:** -{tables_schema} - -**Hints:** -{hints} - -When querying the database, remember the following: -1. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to no more than 20 results. -2. Only query columns that are relevant to the question. Remember to also fetch the ranking or filtering columns to check if they contain nulls. -3. 
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database. - -**Output format:** -1. Write down your thinking process. -2. When querying the database, write your SQL query in the following format: -```sql -SELECT column1, column2, ... -``` -3. When making tool calls, you must use the following format. Make ONLY one tool call at a time. -TOOL CALL: {{"tool": "tool1", "args": {{"arg1": "value1", "arg2": "value2", ...}}}} - -4. After you have arrived at the answer with data and reasoning, write your final answer after "FINAL ANSWER". - -You have done the following steps so far: -**Your previous steps:** -{history} - -**IMPORTANT:** -* Review your previous steps carefully and utilize them to answer the question. Do not repeat your previous steps. -* The database may not have all the information needed to answer the question. Use the additional tools provided if necessary. -* If you did not get the answer at first, do not give up. Reflect on the steps that you have taken and try a different way. Think out of the box. - -Now take a deep breath and think step by step to answeer the following question. -Question: -{question} -""" - - -ANSWER_PARSER_PROMPT = """\ -Review the output from an SQL agent and determine if a correct answer has been provided and grounded on real data. - -Say "yes" when all the following conditions are met: -1. The answer is complete and does not require additional steps to be taken. -2. The answer does not have placeholders that need to be filled in. -3. The agent has acquired data from database and its execution history is Not empty. -4. If agent made mistakes in its execution history, the agent has corrected them. -5. If agent has tried to get data several times but cannot get all the data needed, the agent has come up with an answer based on available data and reasonable assumptions. - -If the conditions above are not met, say "no". - -Here is the output from the SQL agent: -{output} -====================== -Here is the agent execution history: -{history} -====================== - -Has a final answer been provided based on real data? Analyze the agent output and make your judgement "yes" or "no". -""" - - -SQL_QUERY_FIXER_PROMPT = """\ -You are an SQL database expert tasked with reviewing a SQL query written by an agent. -**Procedure:** -1. Review Database Schema: -- Examine the table creation statements to understand the database structure. -2. Review the Hint provided. -- Use the provided hints to understand the domain knowledge relevant to the query. -3. Check against the following common errors: -- Failure to exclude null values, ranking or filtering columns have nulls, syntax errors, incorrect table references, incorrect column references, logical mistakes. -4. Check if aggregation should be used: -- Read the user question, and determine if user is asking for specific instances or aggregated info. If aggregation is needed, check if the original SQL query has used appropriate functions like COUNT and SUM. -5. Correct the Query only when Necessary: -- If issues were identified, modify the SQL query to address the identified issues, ensuring it correctly fetches the requested data according to the database schema and query requirements. 
- -======= Your task ======= -************************** -Table creation statements -{DATABASE_SCHEMA} -************************** -Hint: -{HINT} -************************** -The SQL query to review: -{QUERY} -************************** -User question: -{QUESTION} -************************** - -Now analyze the SQL query step by step. Present your reasonings. - -If you identified issues in the original query, write down the corrected SQL query in the format below: -```sql -SELECT column1, column2, ... -``` - -If the original SQL query is correct, just say the query is correct. - -Note: Some user questions can only be answered partially with the database. This is OK. The agent may use other tools in subsequent steps to get additional info. In some cases, the agent may have got additional info with other tools and have incorporated those in its query. Your goal is to review the SQL query and fix it when necessary. -Only use the tables provided in the database schema in your corrected query. Do not join tables that are not present in the schema. Do not create any new tables. -If you cannot do better than the original query, just say the query is correct. -""" - -SQL_QUERY_FIXER_PROMPT_with_result = """\ -You are an SQL database expert tasked with reviewing a SQL query. -**Procedure:** -1. Review Database Schema: -- Examine the table creation statements to understand the database structure. -2. Review the Hint provided. -- Use the provided hints to understand the domain knowledge relevant to the query. -3. Analyze Query Requirements: -- User Question: Consider what information the query is supposed to retrieve. Decide if aggregation like COUNT or SUM is needed. -- Executed SQL Query: Review the SQL query that was previously executed. -- Execution Result: Analyze the outcome of the executed query. Think carefully if the result makes sense. -4. Check against the following common errors: -- Failure to exclude null values, ranking or filtering columns have nulls, syntax errors, incorrect table references, incorrect column references, logical mistakes. -5. Correct the Query only when Necessary: -- If issues were identified, modify the SQL query to address the identified issues, ensuring it correctly fetches the requested data according to the database schema and query requirements. - -======= Your task ======= -************************** -Table creation statements -{DATABASE_SCHEMA} -************************** -Hint: -{HINT} -************************** -User Question: -{QUESTION} -************************** -The SQL query executed was: -{QUERY} -************************** -The execution result: -{RESULT} -************************** - -Now analyze the SQL query step by step. Present your reasonings. - -If you identified issues in the original query, write down the corrected SQL query in the format below: -```sql -SELECT column1, column2, ... -``` - -If the original SQL query is correct, just say the query is correct. - -Note: Some user questions can only be answered partially with the database. This is OK. The agent may use other tools in subsequent steps to get additional info. In some cases, the agent may have got additional info with other tools and have incorporated those in its query. Your goal is to review the SQL query and fix it when necessary. -Only use the tables provided in the database schema in your corrected query. Do not join tables that are not present in the schema. Do not create any new tables. -If you cannot do better than the original query, just say the query is correct. 
-""" - - -########################################## -## Prompt templates for SQL agent using OpenAI models -########################################## -AGENT_SYSM = """\ -You are an SQL expert tasked with answering questions about schools in California. -You can access a database that has {num_tables} tables. The schema of the tables is as follows. Read the schema carefully. -{tables_schema} -**************** -Question: {question} - -Hints: -{hints} -**************** - -When querying the database, remember the following: -1. You MUST double check your SQL query before executing it. Reflect on the steps you have taken and fix errors if there are any. If you get an error while executing a query, rewrite the query and try again. -2. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to no more than 20 results. -3. Only query columns that are relevant to the question. -4. DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database. - -IMPORTANT: -* Divide the question into sub-questions and conquer sub-questions one by one. -* You may need to combine information from multiple tables to answer the question. -* If database does not have all the information needed to answer the question, use the web search tool or your own knowledge. -* If you did not get the answer at first, do not give up. Reflect on the steps that you have taken and try a different way. Think out of the box. You hard work will be rewarded. - -Now take a deep breath and think step by step to solve the problem. -""" - -QUERYFIXER_PROMPT = """\ -You are an SQL database expert tasked with reviewing a SQL query. -**Procedure:** -1. Review Database Schema: -- Examine the table creation statements to understand the database structure. -2. Review the Hint provided. -- Use the provided hints to understand the domain knowledge relevant to the query. -3. Analyze Query Requirements: -- Original Question: Consider what information the query is supposed to retrieve. -- Executed SQL Query: Review the SQL query that was previously executed. -- Execution Result: Analyze the outcome of the executed query. Think carefully if the result makes sense. If the result does not make sense, identify the issues with the executed SQL query (e.g., null values, syntax -errors, incorrect table references, incorrect column references, logical mistakes). -4. Correct the Query if Necessary: -- If issues were identified, modify the SQL query to address the identified issues, ensuring it correctly fetches the requested data -according to the database schema and query requirements. -5. If the query is correct, provide the same query as the final answer. - -======= Your task ======= -************************** -Table creation statements -{DATABASE_SCHEMA} -************************** -Hint: -{HINT} -************************** -The original question is: -Question: -{QUESTION} -The SQL query executed was: -{QUERY} -The execution result: -{RESULT} -************************** -Based on the question, table schema, hint and the previous query, analyze the result. Fix the query if needed and provide your reasoning. If the query is correct, provide the same query as the final answer. 
-""" diff --git a/comps/agent/langchain/src/strategy/sqlagent/sql_tools.py b/comps/agent/langchain/src/strategy/sqlagent/sql_tools.py deleted file mode 100644 index bb76716c6e..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/sql_tools.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool -from langchain_community.utilities import SQLDatabase - - -def connect_to_db(db_path): - uri = "sqlite:///{path}".format(path=db_path) - db = SQLDatabase.from_uri(uri) - return db - - -def get_table_schema(db_path): - db = connect_to_db(db_path) - table_names = ", ".join(db.get_usable_table_names()) - num_tables = len(table_names.split(",")) - schema = db.get_table_info_no_throw([t.strip() for t in table_names.split(",")]) - return schema, num_tables - - -def get_sql_query_tool(db_path): - db = connect_to_db(db_path) - query_sql_database_tool_description = ( - "Input to this tool is a detailed and correct SQL query, output is a " - "result from the database. If the query is not correct, an error message " - "will be returned. If an error is returned, rewrite the query, check the " - "query, and try again. " - ) - db_query_tool = QuerySQLDataBaseTool(db=db, name="sql_db_query", description=query_sql_database_tool_description) - print("SQL Query Tool Created: ", db_query_tool) - return db_query_tool diff --git a/comps/agent/langchain/src/strategy/sqlagent/utils.py b/comps/agent/langchain/src/strategy/sqlagent/utils.py deleted file mode 100644 index 32bf611a9a..0000000000 --- a/comps/agent/langchain/src/strategy/sqlagent/utils.py +++ /dev/null @@ -1,219 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import json -import uuid - -from langchain_core.messages import AIMessage, ToolMessage -from langchain_core.messages.tool import ToolCall - -from .prompt import ANSWER_PARSER_PROMPT, SQL_QUERY_FIXER_PROMPT, SQL_QUERY_FIXER_PROMPT_with_result - - -def parse_answer_with_llm(text, history, chat_model): - if "FINAL ANSWER:" in text.upper(): - if history == "": - history = "The agent execution history is empty." 
- - prompt = ANSWER_PARSER_PROMPT.format(output=text, history=history) - response = chat_model.invoke(prompt).content - print("@@@ Answer parser response: ", response) - - temp = response[:5] - if "yes" in temp.lower(): - return text.split("FINAL ANSWER:")[-1] - else: - temp = response.split("\n")[0] - if "yes" in temp.lower(): - return text.split("FINAL ANSWER:")[-1] - return None - else: - return None - - -def get_tool_calls_other_than_sql(text): - """Get the tool calls other than sql_db_query.""" - tool_calls = [] - text = text.replace("assistant", "") - json_lines = text.split("\n") - # only get the unique lines - json_lines = list(set(json_lines)) - for line in json_lines: - if "TOOL CALL:" in line: - if "sql_db_query" not in line: - line = line.replace("TOOL CALL:", "") - if "assistant" in line: - line = line.replace("assistant", "") - if "\\" in line: - line = line.replace("\\", "") - try: - parsed_line = json.loads(line) - if isinstance(parsed_line, dict): - if "tool" in parsed_line: - tool_calls.append(parsed_line) - - except: - pass - return tool_calls - - -def get_all_sql_queries(text): - queries = [] - if "```sql" in text: - temp = text.split("```sql") - for t in temp: - if "```" in t: - query = t.split("```")[0] - if "SELECT" in query.upper() and "TOOL CALL" not in query.upper(): - queries.append(query) - - return queries - - -def get_the_last_sql_query(text): - queries = get_all_sql_queries(text) - if queries: - return queries[-1] - else: - return None - - -def check_query_if_executed_and_result(query, messages): - # get previous sql_db_query tool calls - previous_tool_calls = [] - for m in messages: - if isinstance(m, AIMessage) and m.tool_calls: - for tc in m.tool_calls: - if tc["name"] == "sql_db_query": - previous_tool_calls.append(tc) - for tc in previous_tool_calls: - if query == tc["args"]["query"]: - return get_tool_output(messages, tc["id"]) - - return None - - -def parse_and_fix_sql_query_v2(text, chat_model, db_schema, hint, question, messages): - chosen_query = get_the_last_sql_query(text) - if chosen_query: - # check if the query has been executed before - # if yes, pass execution result to the fixer - # if not, pass only the query to the fixer - result = check_query_if_executed_and_result(chosen_query, messages) - if result: - prompt = SQL_QUERY_FIXER_PROMPT_with_result.format( - DATABASE_SCHEMA=db_schema, HINT=hint, QUERY=chosen_query, QUESTION=question, RESULT=result - ) - else: - prompt = SQL_QUERY_FIXER_PROMPT.format( - DATABASE_SCHEMA=db_schema, HINT=hint, QUERY=chosen_query, QUESTION=question - ) - - response = chat_model.invoke(prompt).content - print("@@@ SQL query fixer response: ", response) - if "query is correct" in response.lower(): - return chosen_query - else: - # parse the fixed query - fixed_query = get_the_last_sql_query(response) - return fixed_query - else: - return None - - -class LlamaOutputParserAndQueryFixer: - def __init__(self, chat_model): - self.chat_model = chat_model - - def parse(self, text: str, history: str, db_schema: str, hint: str, question: str, messages: list): - print("@@@ Raw output from llm:\n", text) - answer = parse_answer_with_llm(text, history, self.chat_model) - if answer: - print("Final answer exists.") - return answer - else: - tool_calls = get_tool_calls_other_than_sql(text) - sql_query = parse_and_fix_sql_query_v2(text, self.chat_model, db_schema, hint, question, messages) - if sql_query: - sql_tool_call = [{"tool": "sql_db_query", "args": {"query": sql_query}}] - tool_calls.extend(sql_tool_call) - if tool_calls: 
- return tool_calls - else: - return text - - -def convert_json_to_tool_call(json_str): - tool_name = json_str["tool"] - tool_args = json_str["args"] - tcid = str(uuid.uuid4()) - tool_call = ToolCall(name=tool_name, args=tool_args, id=tcid) - return tool_call - - -def get_tool_output(messages, id): - tool_output = "" - for msg in reversed(messages): - if isinstance(msg, ToolMessage): - if msg.tool_call_id == id: - tool_output = msg.content - tool_output = tool_output[:1000] # limit to 1000 characters - break - return tool_output - - -def assemble_history(messages): - """ - messages: AI, TOOL, AI, TOOL, etc. - """ - query_history = "" - breaker = "-" * 10 - n = 1 - for m in messages[1:]: # exclude the first message - if isinstance(m, AIMessage): - # if there is tool call - if hasattr(m, "tool_calls") and len(m.tool_calls) > 0 and m.content != "Repeated previous steps.": - for tool_call in m.tool_calls: - tool = tool_call["name"] - tc_args = tool_call["args"] - id = tool_call["id"] - tool_output = get_tool_output(messages, id) - if tool == "sql_db_query": - sql_query = tc_args["query"] - query_history += ( - f"Step {n}. Executed SQL query: {sql_query}\nQuery Result: {tool_output}\n{breaker}\n" - ) - else: - query_history += ( - f"Step {n}. Called tool: {tool} - {tc_args}\nTool Output: {tool_output}\n{breaker}\n" - ) - n += 1 - elif m.content == "Repeated previous steps.": # repeated steps - query_history += f"Step {n}. Repeated tool calls from previous steps.\n{breaker}\n" - n += 1 - else: - # did not make tool calls - query_history += f"Assistant Output: {m.content}\n" - - return query_history - - -def remove_repeated_tool_calls(tool_calls, messages): - """Remove repeated tool calls in the messages. - - tool_calls: list of tool calls: ToolCall(name=tool_name, args=tool_args, id=tcid) - messages: list of messages: AIMessage, ToolMessage, HumanMessage - """ - # first get all the previous tool calls in messages - previous_tool_calls = [] - for m in messages: - if isinstance(m, AIMessage) and m.tool_calls and m.content != "Repeated previous steps.": - for tc in m.tool_calls: - previous_tool_calls.append({"tool": tc["name"], "args": tc["args"]}) - - unique_tool_calls = [] - for tc in tool_calls: - if {"tool": tc["name"], "args": tc["args"]} not in previous_tool_calls: - unique_tool_calls.append(tc) - - return unique_tool_calls diff --git a/comps/agent/langchain/src/utils.py b/comps/agent/langchain/src/utils.py index bdd25f5188..e8a317a5df 100644 --- a/comps/agent/langchain/src/utils.py +++ b/comps/agent/langchain/src/utils.py @@ -139,14 +139,8 @@ def get_args(): parser.add_argument("--with_store", type=bool, default=False) parser.add_argument("--timeout", type=int, default=60) - # for sql agent - parser.add_argument("--db_path", type=str, help="database path") - parser.add_argument("--db_name", type=str, help="database name") - parser.add_argument("--use_hints", type=str, default="false", help="If this agent uses hints") - parser.add_argument("--hints_file", type=str, help="path to the hints file") - sys_args, unknown_args = parser.parse_known_args() - print("env_config: ", env_config) + # print("env_config: ", env_config) if env_config != []: env_args, env_unknown_args = parser.parse_known_args(env_config) unknown_args += env_unknown_args @@ -157,12 +151,5 @@ def get_args(): sys_args.streaming = True else: sys_args.streaming = False - - if sys_args.use_hints == "true": - print("SQL agent will use hints") - sys_args.use_hints = True - else: - sys_args.use_hints = False - 
print("==========sys_args==========:\n", sys_args) return sys_args, unknown_args diff --git a/comps/cores/mega/gateway.py b/comps/cores/mega/gateway.py deleted file mode 100644 index 29642eea55..0000000000 --- a/comps/cores/mega/gateway.py +++ /dev/null @@ -1,1117 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import base64 -import os -from io import BytesIO -from typing import List, Union - -import requests -from fastapi import File, Request, UploadFile -from fastapi.responses import StreamingResponse -from PIL import Image - -from ..proto.api_protocol import ( - AudioChatCompletionRequest, - ChatCompletionRequest, - ChatCompletionResponse, - ChatCompletionResponseChoice, - ChatMessage, - DocSumChatCompletionRequest, - EmbeddingRequest, - UsageInfo, -) -from ..proto.docarray import DocSumDoc, LLMParams, LLMParamsDoc, RerankedDoc, RerankerParms, RetrieverParms, TextDoc -from .constants import MegaServiceEndpoint, ServiceRoleType, ServiceType -from .micro_service import MicroService - - -def read_pdf(file): - from langchain.document_loaders import PyPDFLoader - - loader = PyPDFLoader(file) - docs = loader.load_and_split() - return docs - - -def read_text_from_file(file, save_file_name): - import docx2txt - from langchain.text_splitter import CharacterTextSplitter - - # read text file - if file.headers["content-type"] == "text/plain": - file.file.seek(0) - content = file.file.read().decode("utf-8") - # Split text - text_splitter = CharacterTextSplitter() - texts = text_splitter.split_text(content) - # Create multiple documents - file_content = texts - # read pdf file - elif file.headers["content-type"] == "application/pdf": - documents = read_pdf(save_file_name) - file_content = [doc.page_content for doc in documents] - # read docx file - elif ( - file.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document" - or file.headers["content-type"] == "application/octet-stream" - ): - file_content = docx2txt.process(save_file_name) - - return file_content - - -class Gateway: - def __init__( - self, - megaservice, - host="0.0.0.0", - port=8888, - endpoint=str(MegaServiceEndpoint.CHAT_QNA), - input_datatype=ChatCompletionRequest, - output_datatype=ChatCompletionResponse, - ): - self.megaservice = megaservice - self.host = host - self.port = port - self.endpoint = endpoint - self.input_datatype = input_datatype - self.output_datatype = output_datatype - self.service = MicroService( - self.__class__.__name__, - service_role=ServiceRoleType.MEGASERVICE, - service_type=ServiceType.GATEWAY, - host=self.host, - port=self.port, - endpoint=self.endpoint, - input_datatype=self.input_datatype, - output_datatype=self.output_datatype, - ) - self.define_routes() - self.service.start() - - def define_routes(self): - self.service.app.router.add_api_route(self.endpoint, self.handle_request, methods=["POST"]) - self.service.app.router.add_api_route(str(MegaServiceEndpoint.LIST_SERVICE), self.list_service, methods=["GET"]) - self.service.app.router.add_api_route( - str(MegaServiceEndpoint.LIST_PARAMETERS), self.list_parameter, methods=["GET"] - ) - - def add_route(self, endpoint, handler, methods=["POST"]): - self.service.app.router.add_api_route(endpoint, handler, methods=methods) - - def stop(self): - self.service.stop() - - async def handle_request(self, request: Request): - raise NotImplementedError("Subclasses must implement this method") - - def list_service(self): - response = {} - for node, service in 
self.megaservice.services.items(): - # Check if the service has a 'description' attribute and it is not None - if hasattr(service, "description") and service.description: - response[node] = {"description": service.description} - # Check if the service has an 'endpoint' attribute and it is not None - if hasattr(service, "endpoint") and service.endpoint: - if node in response: - response[node]["endpoint"] = service.endpoint - else: - response[node] = {"endpoint": service.endpoint} - # If neither 'description' nor 'endpoint' is available, add an error message for the node - if node not in response: - response[node] = {"error": f"Service node {node} does not have 'description' or 'endpoint' attribute."} - return response - - def list_parameter(self): - pass - - def _handle_message(self, messages): - images = [] - if isinstance(messages, str): - prompt = messages - else: - messages_dict = {} - system_prompt = "" - prompt = "" - for message in messages: - msg_role = message["role"] - if msg_role == "system": - system_prompt = message["content"] - elif msg_role == "user": - if type(message["content"]) == list: - text = "" - text_list = [item["text"] for item in message["content"] if item["type"] == "text"] - text += "\n".join(text_list) - image_list = [ - item["image_url"]["url"] for item in message["content"] if item["type"] == "image_url" - ] - if image_list: - messages_dict[msg_role] = (text, image_list) - else: - messages_dict[msg_role] = text - else: - messages_dict[msg_role] = message["content"] - elif msg_role == "assistant": - messages_dict[msg_role] = message["content"] - else: - raise ValueError(f"Unknown role: {msg_role}") - - if system_prompt: - prompt = system_prompt + "\n" - for role, message in messages_dict.items(): - if isinstance(message, tuple): - text, image_list = message - if text: - prompt += role + ": " + text + "\n" - else: - prompt += role + ":" - for img in image_list: - # URL - if img.startswith("http://") or img.startswith("https://"): - response = requests.get(img) - image = Image.open(BytesIO(response.content)).convert("RGBA") - image_bytes = BytesIO() - image.save(image_bytes, format="PNG") - img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() - # Local Path - elif os.path.exists(img): - image = Image.open(img).convert("RGBA") - image_bytes = BytesIO() - image.save(image_bytes, format="PNG") - img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() - # Bytes - else: - img_b64_str = img - - images.append(img_b64_str) - else: - if message: - prompt += role + ": " + message + "\n" - else: - prompt += role + ":" - if images: - return prompt, images - else: - return prompt - - -class ChatQnAGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.CHAT_QNA), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - print("data in handle request", data) - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.parse_obj(data) - print("chat request in handle request", chat_request) - prompt = self._handle_message(chat_request.messages) - print("prompt in gateway", prompt) - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, 
- frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - chat_template=chat_request.chat_template if chat_request.chat_template else None, - model=( - chat_request.model - if chat_request.model - else os.getenv("MODEL_ID") if os.getenv("MODEL_ID") else "Intel/neural-chat-7b-v3-3" - ), - ) - retriever_parameters = RetrieverParms( - search_type=chat_request.search_type if chat_request.search_type else "similarity", - k=chat_request.k if chat_request.k else 4, - distance_threshold=chat_request.distance_threshold if chat_request.distance_threshold else None, - fetch_k=chat_request.fetch_k if chat_request.fetch_k else 20, - lambda_mult=chat_request.lambda_mult if chat_request.lambda_mult else 0.5, - score_threshold=chat_request.score_threshold if chat_request.score_threshold else 0.2, - ) - reranker_parameters = RerankerParms( - top_n=chat_request.top_n if chat_request.top_n else 1, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"text": prompt}, - llm_parameters=parameters, - retriever_parameters=retriever_parameters, - reranker_parameters=reranker_parameters, - ) - for node, response in result_dict.items(): - if isinstance(response, StreamingResponse): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="chatqna", choices=choices, usage=usage) - - -class CodeGenGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.CODE_GEN), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.parse_obj(data) - prompt = self._handle_message(chat_request.messages) - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.2, - streaming=stream_opt, - model=chat_request.model if chat_request.model else None, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"query": prompt}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="codegen", choices=choices, usage=usage) - - -class CodeTransGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.CODE_TRANS), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - language_from = data["language_from"] - language_to = data["language_to"] - source_code = data["source_code"] - prompt_template = """ - ### System: Please translate the following {language_from} codes into {language_to} codes. Don't output any other content except translated codes. - - ### Original {language_from} codes: - ''' - - {source_code} - - ''' - - ### Translated {language_to} codes: - - """ - prompt = prompt_template.format(language_from=language_from, language_to=language_to, source_code=source_code) - - parameters = LLMParams( - max_tokens=data.get("max_tokens", 1024), - top_k=data.get("top_k", 10), - top_p=data.get("top_p", 0.95), - temperature=data.get("temperature", 0.01), - repetition_penalty=data.get("repetition_penalty", 1.03), - streaming=data.get("stream", True), - ) - - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"query": prompt}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. - if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="codetrans", choices=choices, usage=usage) - - -class TranslationGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.TRANSLATION), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - language_from = data["language_from"] - language_to = data["language_to"] - source_language = data["source_language"] - prompt_template = """ - Translate this from {language_from} to {language_to}: - - {language_from}: - {source_language} - - {language_to}: - """ - prompt = prompt_template.format( - language_from=language_from, language_to=language_to, source_language=source_language - ) - result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs={"query": prompt}) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="translation", choices=choices, usage=usage) - - -class DocSumGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, - host, - port, - str(MegaServiceEndpoint.DOC_SUMMARY), - input_datatype=DocSumChatCompletionRequest, - output_datatype=ChatCompletionResponse, - ) - - async def handle_request(self, request: Request, files: List[UploadFile] = File(default=None)): - - if "application/json" in request.headers.get("content-type"): - data = await request.json() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.model_validate(data) - prompt = self._handle_message(chat_request.messages) - - initial_inputs_data = {data["type"]: prompt} - - elif "multipart/form-data" in request.headers.get("content-type"): - data = await request.form() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.model_validate(data) - - data_type = data.get("type") - - file_summaries = [] - if files: - for file in files: - file_path = f"/tmp/{file.filename}" - - if data_type is not None and data_type in ["audio", "video"]: - raise ValueError( - "Audio and Video file uploads are not supported in docsum with curl request, please use the UI." - ) - - else: - import aiofiles - - async with aiofiles.open(file_path, "wb") as f: - await f.write(await file.read()) - - docs = read_text_from_file(file, file_path) - os.remove(file_path) - - if isinstance(docs, list): - file_summaries.extend(docs) - else: - file_summaries.append(docs) - - if file_summaries: - prompt = self._handle_message(chat_request.messages) + "\n".join(file_summaries) - else: - prompt = self._handle_message(chat_request.messages) - - data_type = data.get("type") - if data_type is not None: - initial_inputs_data = {} - initial_inputs_data[data_type] = prompt - else: - initial_inputs_data = {"query": prompt} - - else: - raise ValueError(f"Unknown request type: {request.headers.get('content-type')}") - - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - model=chat_request.model if chat_request.model else None, - language=chat_request.language if chat_request.language else "auto", - ) - - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs=initial_inputs_data, llm_parameters=parameters - ) - - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="docsum", choices=choices, usage=usage) - - -class AudioQnAGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, - host, - port, - str(MegaServiceEndpoint.AUDIO_QNA), - AudioChatCompletionRequest, - ChatCompletionResponse, - ) - - async def handle_request(self, request: Request): - data = await request.json() - - chat_request = AudioChatCompletionRequest.parse_obj(data) - parameters = LLMParams( - # relatively lower max_tokens for audio conversation - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=False, # TODO add streaming LLM output as input to TTS - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"byte_str": chat_request.audio}, llm_parameters=parameters - ) - - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["byte_str"] - - return response - - -class SearchQnAGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.SEARCH_QNA), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.parse_obj(data) - prompt = self._handle_message(chat_request.messages) - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"text": prompt}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="searchqna", choices=choices, usage=usage) - - -class FaqGenGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.FAQ_GEN), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request, files: List[UploadFile] = File(default=None)): - data = await request.form() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.parse_obj(data) - file_summaries = [] - if files: - for file in files: - file_path = f"/tmp/{file.filename}" - - import aiofiles - - async with aiofiles.open(file_path, "wb") as f: - await f.write(await file.read()) - docs = read_text_from_file(file, file_path) - os.remove(file_path) - if isinstance(docs, list): - file_summaries.extend(docs) - else: - file_summaries.append(docs) - - if file_summaries: - prompt = self._handle_message(chat_request.messages) + "\n".join(file_summaries) - else: - prompt = self._handle_message(chat_request.messages) - - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - model=chat_request.model if chat_request.model else None, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"query": prompt}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LLM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LLM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="faqgen", choices=choices, usage=usage) - - -class VisualQnAGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.VISUAL_QNA), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = data.get("stream", False) - chat_request = ChatCompletionRequest.parse_obj(data) - prompt, images = self._handle_message(chat_request.messages) - parameters = LLMParams( - max_new_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"prompt": prompt, "image": images[0]}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LVM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LVM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="visualqna", choices=choices, usage=usage) - - -class VideoQnAGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, - host, - port, - str(MegaServiceEndpoint.VIDEO_RAG_QNA), - ChatCompletionRequest, - ChatCompletionResponse, - ) - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = data.get("stream", False) - chat_request = ChatCompletionRequest.parse_obj(data) - prompt = self._handle_message(chat_request.messages) - parameters = LLMParams( - max_new_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - ) - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"text": prompt}, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # Here it suppose the last microservice in the megaservice is LVM. 
- if ( - isinstance(response, StreamingResponse) - and node == list(self.megaservice.services.keys())[-1] - and self.megaservice.services[node].service_type == ServiceType.LVM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["text"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="videoqna", choices=choices, usage=usage) - - -class RetrievalToolGateway(Gateway): - """embed+retrieve+rerank.""" - - def __init__(self, megaservice, host="0.0.0.0", port=8889): - super().__init__( - megaservice, - host, - port, - str(MegaServiceEndpoint.RETRIEVALTOOL), - Union[TextDoc, ChatCompletionRequest], - Union[RerankedDoc, LLMParamsDoc], - ) - - async def handle_request(self, request: Request): - def parser_input(data, TypeClass, key): - chat_request = None - try: - chat_request = TypeClass.parse_obj(data) - query = getattr(chat_request, key) - except: - query = None - return query, chat_request - - data = await request.json() - query = None - for key, TypeClass in zip(["text", "messages"], [TextDoc, ChatCompletionRequest]): - query, chat_request = parser_input(data, TypeClass, key) - if query is not None: - break - if query is None: - raise ValueError(f"Unknown request type: {data}") - if chat_request is None: - raise ValueError(f"Unknown request type: {data}") - - if isinstance(chat_request, ChatCompletionRequest): - retriever_parameters = RetrieverParms( - search_type=chat_request.search_type if chat_request.search_type else "similarity", - k=chat_request.k if chat_request.k else 4, - distance_threshold=chat_request.distance_threshold if chat_request.distance_threshold else None, - fetch_k=chat_request.fetch_k if chat_request.fetch_k else 20, - lambda_mult=chat_request.lambda_mult if chat_request.lambda_mult else 0.5, - score_threshold=chat_request.score_threshold if chat_request.score_threshold else 0.2, - ) - reranker_parameters = RerankerParms( - top_n=chat_request.top_n if chat_request.top_n else 1, - ) - - initial_inputs = { - "messages": query, - "input": query, # has to be input due to embedding expects either input or text - "search_type": chat_request.search_type if chat_request.search_type else "similarity", - "k": chat_request.k if chat_request.k else 4, - "distance_threshold": chat_request.distance_threshold if chat_request.distance_threshold else None, - "fetch_k": chat_request.fetch_k if chat_request.fetch_k else 20, - "lambda_mult": chat_request.lambda_mult if chat_request.lambda_mult else 0.5, - "score_threshold": chat_request.score_threshold if chat_request.score_threshold else 0.2, - "top_n": chat_request.top_n if chat_request.top_n else 1, - } - - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs=initial_inputs, - retriever_parameters=retriever_parameters, - reranker_parameters=reranker_parameters, - ) - else: - result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs={"text": query}) - - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node] - return response - - -class MultimodalQnAGateway(Gateway): - def __init__(self, multimodal_rag_megaservice, lvm_megaservice, host="0.0.0.0", port=9999): - self.lvm_megaservice = lvm_megaservice - super().__init__( - multimodal_rag_megaservice, - host, - port, - str(MegaServiceEndpoint.MULTIMODAL_QNA), - ChatCompletionRequest, - 
ChatCompletionResponse, - ) - - # this overrides _handle_message method of Gateway - def _handle_message(self, messages): - images = [] - messages_dicts = [] - if isinstance(messages, str): - prompt = messages - else: - messages_dict = {} - system_prompt = "" - prompt = "" - for message in messages: - msg_role = message["role"] - messages_dict = {} - if msg_role == "system": - system_prompt = message["content"] - elif msg_role == "user": - if type(message["content"]) == list: - text = "" - text_list = [item["text"] for item in message["content"] if item["type"] == "text"] - text += "\n".join(text_list) - image_list = [ - item["image_url"]["url"] for item in message["content"] if item["type"] == "image_url" - ] - if image_list: - messages_dict[msg_role] = (text, image_list) - else: - messages_dict[msg_role] = text - else: - messages_dict[msg_role] = message["content"] - messages_dicts.append(messages_dict) - elif msg_role == "assistant": - messages_dict[msg_role] = message["content"] - messages_dicts.append(messages_dict) - else: - raise ValueError(f"Unknown role: {msg_role}") - - if system_prompt: - prompt = system_prompt + "\n" - for messages_dict in messages_dicts: - for i, (role, message) in enumerate(messages_dict.items()): - if isinstance(message, tuple): - text, image_list = message - if i == 0: - # do not add role for the very first message. - # this will be added by llava_server - if text: - prompt += text + "\n" - else: - if text: - prompt += role.upper() + ": " + text + "\n" - else: - prompt += role.upper() + ":" - for img in image_list: - # URL - if img.startswith("http://") or img.startswith("https://"): - response = requests.get(img) - image = Image.open(BytesIO(response.content)).convert("RGBA") - image_bytes = BytesIO() - image.save(image_bytes, format="PNG") - img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() - # Local Path - elif os.path.exists(img): - image = Image.open(img).convert("RGBA") - image_bytes = BytesIO() - image.save(image_bytes, format="PNG") - img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() - # Bytes - else: - img_b64_str = img - - images.append(img_b64_str) - else: - if i == 0: - # do not add role for the very first message. - # this will be added by llava_server - if message: - prompt += role.upper() + ": " + message + "\n" - else: - if message: - prompt += role.upper() + ": " + message + "\n" - else: - prompt += role.upper() + ":" - if images: - return prompt, images - else: - return prompt - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = bool(data.get("stream", False)) - if stream_opt: - print("[ MultimodalQnAGateway ] stream=True not used, this has not support streaming yet!") - stream_opt = False - chat_request = ChatCompletionRequest.model_validate(data) - # Multimodal RAG QnA With Videos has not yet accepts image as input during QnA. - prompt_and_image = self._handle_message(chat_request.messages) - if isinstance(prompt_and_image, tuple): - # print(f"This request include image, thus it is a follow-up query. Using lvm megaservice") - prompt, images = prompt_and_image - cur_megaservice = self.lvm_megaservice - initial_inputs = {"prompt": prompt, "image": images[0]} - else: - # print(f"This is the first query, requiring multimodal retrieval. 
Using multimodal rag megaservice") - prompt = prompt_and_image - cur_megaservice = self.megaservice - initial_inputs = {"text": prompt} - - parameters = LLMParams( - max_new_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - chat_template=chat_request.chat_template if chat_request.chat_template else None, - ) - result_dict, runtime_graph = await cur_megaservice.schedule( - initial_inputs=initial_inputs, llm_parameters=parameters - ) - for node, response in result_dict.items(): - # the last microservice in this megaservice is LVM. - # checking if LVM returns StreamingResponse - # Currently, LVM with LLAVA has not yet supported streaming. - # @TODO: Will need to test this once LVM with LLAVA supports streaming - if ( - isinstance(response, StreamingResponse) - and node == runtime_graph.all_leaves()[-1] - and self.megaservice.services[node].service_type == ServiceType.LVM - ): - return response - last_node = runtime_graph.all_leaves()[-1] - - if "text" in result_dict[last_node].keys(): - response = result_dict[last_node]["text"] - else: - # text in not response message - # something wrong, for example due to empty retrieval results - if "detail" in result_dict[last_node].keys(): - response = result_dict[last_node]["detail"] - else: - response = "The server fail to generate answer to your query!" 
- if "metadata" in result_dict[last_node].keys(): - # from retrieval results - metadata = result_dict[last_node]["metadata"] - else: - # follow-up question, no retrieval - metadata = None - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response), - finish_reason="stop", - metadata=metadata, - ) - ) - return ChatCompletionResponse(model="multimodalqna", choices=choices, usage=usage) - - -class AvatarChatbotGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, - host, - port, - str(MegaServiceEndpoint.AVATAR_CHATBOT), - AudioChatCompletionRequest, - ChatCompletionResponse, - ) - - async def handle_request(self, request: Request): - data = await request.json() - - chat_request = AudioChatCompletionRequest.model_validate(data) - parameters = LLMParams( - # relatively lower max_tokens for audio conversation - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - repetition_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 1.03, - streaming=False, # TODO add streaming LLM output as input to TTS - ) - # print(parameters) - - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs={"byte_str": chat_request.audio}, llm_parameters=parameters - ) - - last_node = runtime_graph.all_leaves()[-1] - response = result_dict[last_node]["video_path"] - return response - - -class GraphragGateway(Gateway): - def __init__(self, megaservice, host="0.0.0.0", port=8888): - super().__init__( - megaservice, host, port, str(MegaServiceEndpoint.GRAPH_RAG), ChatCompletionRequest, ChatCompletionResponse - ) - - async def handle_request(self, request: Request): - data = await request.json() - stream_opt = data.get("stream", True) - chat_request = ChatCompletionRequest.parse_obj(data) - - def parser_input(data, TypeClass, key): - chat_request = None - try: - chat_request = TypeClass.parse_obj(data) - query = getattr(chat_request, key) - except: - query = None - return query, chat_request - - query = None - for key, TypeClass in zip(["text", "input", "messages"], [TextDoc, EmbeddingRequest, ChatCompletionRequest]): - query, chat_request = parser_input(data, TypeClass, key) - if query is not None: - break - if query is None: - raise ValueError(f"Unknown request type: {data}") - if chat_request is None: - raise ValueError(f"Unknown request type: {data}") - prompt = self._handle_message(chat_request.messages) - parameters = LLMParams( - max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024, - top_k=chat_request.top_k if chat_request.top_k else 10, - top_p=chat_request.top_p if chat_request.top_p else 0.95, - temperature=chat_request.temperature if chat_request.temperature else 0.01, - frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0, - presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0, - repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03, - streaming=stream_opt, - chat_template=chat_request.chat_template if chat_request.chat_template else None, - ) - retriever_parameters = RetrieverParms( - search_type=chat_request.search_type if chat_request.search_type else 
"similarity", - k=chat_request.k if chat_request.k else 4, - distance_threshold=chat_request.distance_threshold if chat_request.distance_threshold else None, - fetch_k=chat_request.fetch_k if chat_request.fetch_k else 20, - lambda_mult=chat_request.lambda_mult if chat_request.lambda_mult else 0.5, - score_threshold=chat_request.score_threshold if chat_request.score_threshold else 0.2, - ) - initial_inputs = chat_request - result_dict, runtime_graph = await self.megaservice.schedule( - initial_inputs=initial_inputs, - llm_parameters=parameters, - retriever_parameters=retriever_parameters, - ) - for node, response in result_dict.items(): - if isinstance(response, StreamingResponse): - return response - last_node = runtime_graph.all_leaves()[-1] - response_content = result_dict[last_node]["choices"][0]["message"]["content"] - choices = [] - usage = UsageInfo() - choices.append( - ChatCompletionResponseChoice( - index=0, - message=ChatMessage(role="assistant", content=response_content), - finish_reason="stop", - ) - ) - return ChatCompletionResponse(model="chatqna", choices=choices, usage=usage) diff --git a/comps/cores/mega/http_service.py b/comps/cores/mega/http_service.py index 283540f493..799cc5c80c 100644 --- a/comps/cores/mega/http_service.py +++ b/comps/cores/mega/http_service.py @@ -1,7 +1,9 @@ # Copyright (C) 2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 +import asyncio import logging +import multiprocessing import re from typing import Optional @@ -83,6 +85,11 @@ async def _get_statistics(): return app + def add_startup_event(self, func): + @self.app.on_event("startup") + async def startup_event(): + asyncio.create_task(func) + async def initialize_server(self): """Initialize and return HTTP server.""" self.logger.info("Setting up HTTP server") @@ -110,11 +117,9 @@ async def start_server(self, **kwargs): """ await self.main_loop() - app = self.app - self.server = UviServer( config=Config( - app=app, + app=self.app, host=self.host_address, port=self.primary_port, log_level="info", @@ -137,6 +142,24 @@ async def terminate_server(self): await self.server.shutdown() self.logger.info("Server termination completed") + def _async_setup(self): + self.event_loop = asyncio.new_event_loop() + asyncio.set_event_loop(self.event_loop) + self.event_loop.run_until_complete(self.initialize_server()) + + def start(self): + """Running method to block the main thread. + + This method runs the event loop until a Future is done. It is designed to be called in the main thread to keep it busy. + """ + self.event_loop.run_until_complete(self.execute_server()) + + def stop(self): + self.event_loop.run_until_complete(self.terminate_server()) + self.event_loop.stop() + self.event_loop.close() + self.logger.close() + @staticmethod def check_server_readiness(ctrl_address: str, timeout: float = 1.0, logger=None, **kwargs) -> bool: """Check if server status is ready. @@ -170,3 +193,6 @@ async def async_check_server_readiness(ctrl_address: str, timeout: float = 1.0, :return: True if status is ready else False. 
""" return HTTPService.check_server_readiness(ctrl_address, timeout, logger=logger) + + def add_route(self, endpoint, handler, methods=["POST"]): + self.app.router.add_api_route(endpoint, handler, methods=methods) diff --git a/comps/cores/mega/micro_service.py b/comps/cores/mega/micro_service.py index 458b097102..2d79d6414f 100644 --- a/comps/cores/mega/micro_service.py +++ b/comps/cores/mega/micro_service.py @@ -2,7 +2,6 @@ # SPDX-License-Identifier: Apache-2.0 import asyncio -import multiprocessing import os from collections import defaultdict, deque from enum import Enum @@ -10,6 +9,7 @@ from ..proto.docarray import TextDoc from .constants import ServiceRoleType, ServiceType +from .http_service import HTTPService from .logger import CustomLogger from .utils import check_ports_availability @@ -19,12 +19,12 @@ logflag = os.getenv("LOGFLAG", False) -class MicroService: +class MicroService(HTTPService): """MicroService class to create a microservice.""" def __init__( self, - name: str, + name: str = "", service_role: ServiceRoleType = ServiceRoleType.MICROSERVICE, service_type: ServiceType = ServiceType.LLM, protocol: str = "http", @@ -44,7 +44,6 @@ def __init__( dynamic_batching_max_batch_size: int = 32, ): """Init the microservice.""" - self.name = f"{name}/{self.__class__.__name__}" if name else self.__class__.__name__ self.service_role = service_role self.service_type = service_type self.protocol = protocol @@ -67,24 +66,35 @@ def __init__( self.uvicorn_kwargs["ssl_certfile"] = ssl_certfile if not use_remote_service: + + if self.protocol.lower() == "http": + if not (check_ports_availability(self.host, self.port)): + raise RuntimeError(f"port:{self.port}") + self.provider = provider self.provider_endpoint = provider_endpoint self.endpoints = [] - self.server = self._get_server() - self.app = self.server.app + runtime_args = { + "protocol": self.protocol, + "host": self.host, + "port": self.port, + "title": name, + "description": "OPEA Microservice Infrastructure", + } + + super().__init__(uvicorn_kwargs=self.uvicorn_kwargs, runtime_args=runtime_args) + # create a batch request processor loop if using dynamic batching if self.dynamic_batching: self.buffer_lock = asyncio.Lock() self.request_buffer = defaultdict(deque) + self.add_startup_event(self._dynamic_batch_processor()) - @self.app.on_event("startup") - async def startup_event(): - asyncio.create_task(self._dynamic_batch_processor()) + self._async_setup() - self.event_loop = asyncio.new_event_loop() - asyncio.set_event_loop(self.event_loop) - self.event_loop.run_until_complete(self._async_setup()) + # overwrite name + self.name = f"{name}/{self.__class__.__name__}" if name else self.__class__.__name__ async def _dynamic_batch_processor(self): if logflag: @@ -125,75 +135,6 @@ def _validate_env(self): "set use_remote_service to False if you want to use a local micro service!" ) - def _get_server(self): - """Get the server instance based on the protocol. - - This method currently only supports HTTP services. It creates an instance of the HTTPService class with the - necessary arguments. - In the future, it will also support gRPC services. - """ - self._validate_env() - from .http_service import HTTPService - - runtime_args = { - "protocol": self.protocol, - "host": self.host, - "port": self.port, - "title": self.name, - "description": "OPEA Microservice Infrastructure", - } - - return HTTPService(uvicorn_kwargs=self.uvicorn_kwargs, runtime_args=runtime_args) - - async def _async_setup(self): - """The async method setup the runtime. 
- - This method is responsible for setting up the server. It first checks if the port is available, then it gets - the server instance and initializes it. - """ - self._validate_env() - if self.protocol.lower() == "http": - if not (check_ports_availability(self.host, self.port)): - raise RuntimeError(f"port:{self.port}") - - await self.server.initialize_server() - - async def _async_run_forever(self): - """Running method of the server.""" - self._validate_env() - await self.server.execute_server() - - def run(self): - """Running method to block the main thread. - - This method runs the event loop until a Future is done. It is designed to be called in the main thread to keep it busy. - """ - self._validate_env() - self.event_loop.run_until_complete(self._async_run_forever()) - - def start(self, in_single_process=False): - self._validate_env() - if in_single_process: - # Resolve HPU segmentation fault and potential tokenizer issues by limiting to same process - self.run() - else: - self.process = multiprocessing.Process(target=self.run, daemon=False, name=self.name) - self.process.start() - - async def _async_teardown(self): - """Shutdown the server.""" - self._validate_env() - await self.server.terminate_server() - - def stop(self): - self._validate_env() - self.event_loop.run_until_complete(self._async_teardown()) - self.event_loop.stop() - self.event_loop.close() - self.server.logger.close() - if self.process.is_alive(): - self.process.terminate() - @property def endpoint_path(self): return f"{self.protocol}://{self.host}:{self.port}{self.endpoint}" diff --git a/comps/cores/mega/utils.py b/comps/cores/mega/utils.py index e5b2df4f5f..6749e66dea 100644 --- a/comps/cores/mega/utils.py +++ b/comps/cores/mega/utils.py @@ -1,15 +1,18 @@ # Copyright (C) 2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 +import base64 import ipaddress import json import multiprocessing import os import random +from io import BytesIO from socket import AF_INET, SOCK_STREAM, socket from typing import List, Optional, Union import requests +from PIL import Image from .logger import CustomLogger @@ -258,3 +261,73 @@ def __enter__(self): def __exit__(self, exc_type, exc_val, exc_tb): if exc_type: self.context_to_manage.__exit__(exc_type, exc_val, exc_tb) + + +def handle_message(messages): + images = [] + if isinstance(messages, str): + prompt = messages + else: + messages_dict = {} + system_prompt = "" + prompt = "" + for message in messages: + msg_role = message["role"] + if msg_role == "system": + system_prompt = message["content"] + elif msg_role == "user": + if type(message["content"]) == list: + text = "" + text_list = [item["text"] for item in message["content"] if item["type"] == "text"] + text += "\n".join(text_list) + image_list = [ + item["image_url"]["url"] for item in message["content"] if item["type"] == "image_url" + ] + if image_list: + messages_dict[msg_role] = (text, image_list) + else: + messages_dict[msg_role] = text + else: + messages_dict[msg_role] = message["content"] + elif msg_role == "assistant": + messages_dict[msg_role] = message["content"] + else: + raise ValueError(f"Unknown role: {msg_role}") + + if system_prompt: + prompt = system_prompt + "\n" + for role, message in messages_dict.items(): + if isinstance(message, tuple): + text, image_list = message + if text: + prompt += role + ": " + text + "\n" + else: + prompt += role + ":" + for img in image_list: + # URL + if img.startswith("http://") or img.startswith("https://"): + response = requests.get(img) + image = 
Image.open(BytesIO(response.content)).convert("RGBA") + image_bytes = BytesIO() + image.save(image_bytes, format="PNG") + img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() + # Local Path + elif os.path.exists(img): + image = Image.open(img).convert("RGBA") + image_bytes = BytesIO() + image.save(image_bytes, format="PNG") + img_b64_str = base64.b64encode(image_bytes.getvalue()).decode() + # Bytes + else: + img_b64_str = img + + images.append(img_b64_str) + else: + if message: + prompt += role + ": " + message + "\n" + else: + prompt += role + ":" + if images: + return prompt, images + else: + return prompt diff --git a/comps/embeddings/tei/langchain/README.md b/comps/embeddings/tei/langchain/README.md index 96163c9156..2bbf30cc6c 100644 --- a/comps/embeddings/tei/langchain/README.md +++ b/comps/embeddings/tei/langchain/README.md @@ -33,26 +33,20 @@ docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$htt Then you need to test your TEI service using the following commands: ```bash -curl localhost:$your_port/embed \ +curl localhost:$your_port/v1/embeddings \ -X POST \ - -d '{"inputs":"What is Deep Learning?"}' \ + -d '{"input":"What is Deep Learning?"}' \ -H 'Content-Type: application/json' ``` Start the embedding service with the TEI_EMBEDDING_ENDPOINT. ```bash -export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport" +export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport/v1/embeddings" export TEI_EMBEDDING_MODEL_NAME="BAAI/bge-large-en-v1.5" python embedding_tei.py ``` -#### Start Embedding Service with Local Model - -```bash -python local_embedding.py -``` - ## 🚀2. Start Microservice with Docker (Optional 2) ### 2.1 Start Embedding Service with TEI @@ -68,16 +62,16 @@ docker run -p $your_port:80 -v ./data:/data --name tei_server -e http_proxy=$htt Then you need to test your TEI service using the following commands: ```bash -curl localhost:$your_port/embed \ +curl localhost:$your_port/embed/v1/embeddings \ -X POST \ - -d '{"inputs":"What is Deep Learning?"}' \ + -d '{"input":"What is Deep Learning?"}' \ -H 'Content-Type: application/json' ``` Export the `TEI_EMBEDDING_ENDPOINT` for later usage: ```bash -export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport" +export TEI_EMBEDDING_ENDPOINT="http://localhost:$yourport/v1/embeddings" export TEI_EMBEDDING_MODEL_NAME="BAAI/bge-large-en-v1.5" ``` @@ -113,23 +107,7 @@ curl http://localhost:6000/v1/health_check\ ### 3.2 Consume Embedding Service -Use our basic API. - -```bash -## query with single text -curl http://localhost:6000/v1/embeddings\ - -X POST \ - -d '{"text":"Hello, world!"}' \ - -H 'Content-Type: application/json' - -## query with multiple texts -curl http://localhost:6000/v1/embeddings\ - -X POST \ - -d '{"text":["Hello, world!","How are you?"]}' \ - -H 'Content-Type: application/json' -``` - -We are also compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/embeddings). +The input/output follows [OpenAI API Embeddings](https://platform.openai.com/docs/api-reference/embeddings) format. 
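In addition to the curl commands below, here is a Python equivalent of the same request. This is a hedged sketch: it assumes the microservice is reachable on the README's default port 6000 and that the response follows the standard OpenAI embeddings layout (`data[*].embedding`), as the updated `embedding_tei.py` in this diff expects.

```python
# Hedged sketch: same OpenAI-format request as the curl examples below.
# Host and port follow the README defaults and are not guaranteed here.
import requests

url = "http://localhost:6000/v1/embeddings"
payload = {"input": ["Hello, world!", "How are you?"]}

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()

body = resp.json()
# OpenAI-style responses carry one entry per input string under "data".
vectors = [item["embedding"] for item in body["data"]]
print(len(vectors), len(vectors[0]))
```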
```bash ## Input single text @@ -141,6 +119,6 @@ curl http://localhost:6000/v1/embeddings\ ## Input multiple texts with parameters curl http://localhost:6000/v1/embeddings\ -X POST \ - -d '{"input":["Hello, world!","How are you?"], "dimensions":100}' \ + -d '{"input":["Hello, world!","How are you?"], "encoding_format":"base64"}' \ -H 'Content-Type: application/json' ``` diff --git a/comps/embeddings/tei/langchain/embedding_tei.py b/comps/embeddings/tei/langchain/embedding_tei.py index 20e61196d1..e3b58e376e 100644 --- a/comps/embeddings/tei/langchain/embedding_tei.py +++ b/comps/embeddings/tei/langchain/embedding_tei.py @@ -4,7 +4,7 @@ import json import os import time -from typing import List, Union +from typing import Dict, List, Union from huggingface_hub import AsyncInferenceClient @@ -19,12 +19,7 @@ statistics_dict, ) from comps.cores.mega.utils import get_access_token -from comps.cores.proto.api_protocol import ( - ChatCompletionRequest, - EmbeddingRequest, - EmbeddingResponse, - EmbeddingResponseData, -) +from comps.cores.proto.api_protocol import EmbeddingRequest, EmbeddingResponse, EmbeddingResponseData logger = CustomLogger("embedding_tei_langchain") logflag = os.getenv("LOGFLAG", False) @@ -45,9 +40,7 @@ port=6000, ) @register_statistics(names=["opea_service@embedding_tei_langchain"]) -async def embedding( - input: Union[TextDoc, EmbeddingRequest, ChatCompletionRequest] -) -> Union[EmbedDoc, EmbeddingResponse, ChatCompletionRequest]: +async def embedding(input: Union[TextDoc, EmbeddingRequest]) -> Union[EmbedDoc, EmbeddingResponse]: start = time.time() access_token = ( get_access_token(TOKEN_URL, CLIENTID, CLIENT_SECRET) if TOKEN_URL and CLIENTID and CLIENT_SECRET else None @@ -55,24 +48,18 @@ async def embedding( async_client = get_async_inference_client(access_token) if logflag: logger.info(input) + if isinstance(input, TextDoc): - embed_vector = await aembed_query(input.text, async_client) - embedding_res = embed_vector[0] if isinstance(input.text, str) else embed_vector - res = EmbedDoc(text=input.text, embedding=embedding_res) + embedding_res = await aembed_query({"input": input.text}, async_client) + embedding_vec = [data["embedding"] for data in embedding_res["data"]] + embedding_vec = embedding_vec[0] if isinstance(input.text, str) else embedding_vec + res = EmbedDoc(text=input.text, embedding=embedding_vec) else: - embed_vector = await aembed_query(input.input, async_client) - if input.dimensions is not None: - embed_vector = [embed_vector[i][: input.dimensions] for i in range(len(embed_vector))] - - # for standard openai embedding format - res = EmbeddingResponse( - data=[EmbeddingResponseData(index=i, embedding=embed_vector[i]) for i in range(len(embed_vector))] + embedding_res = await aembed_query( + {"input": input.input, "encoding_format": input.encoding_format, "model": input.model, "user": input.user}, + async_client, ) - - if isinstance(input, ChatCompletionRequest): - input.embedding = res - # keep - res = input + res = EmbeddingResponse(**embedding_res) statistics_dict["opea_service@embedding_tei_langchain"].append_latency(time.time() - start, None) if logflag: @@ -80,21 +67,9 @@ async def embedding( return res -async def aembed_query( - text: Union[str, List[str]], async_client: AsyncInferenceClient, model_kwargs=None, task=None -) -> List[List[float]]: - texts = [text] if isinstance(text, str) else text - response = await aembed_documents(texts, async_client, model_kwargs=model_kwargs, task=task) - return response - - -async def aembed_documents( - texts: 
List[str], async_client: AsyncInferenceClient, model_kwargs=None, task=None -) -> List[List[float]]: - texts = [text.replace("\n", " ") for text in texts] - _model_kwargs = model_kwargs or {} - responses = await async_client.post(json={"inputs": texts, **_model_kwargs}, task=task) - return json.loads(responses.decode()) +async def aembed_query(request: Dict, async_client: AsyncInferenceClient) -> Union[Dict, List[List[float]]]: + response = await async_client.post(json=request) + return json.loads(response.decode()) def get_async_inference_client(access_token: str) -> AsyncInferenceClient: diff --git a/comps/embeddings/tei/langchain/local_embedding_768.py b/comps/embeddings/tei/langchain/local_embedding_768.py deleted file mode 100644 index dae52299bb..0000000000 --- a/comps/embeddings/tei/langchain/local_embedding_768.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -from langchain_community.embeddings import HuggingFaceBgeEmbeddings - -from comps import EmbedDoc768, ServiceType, TextDoc, opea_microservices, opea_telemetry, register_microservice - - -@register_microservice( - name="opea_service@local_embedding", - service_type=ServiceType.EMBEDDING, - endpoint="/v1/embeddings", - host="0.0.0.0", - port=6000, - input_datatype=TextDoc, - output_datatype=EmbedDoc768, -) -@opea_telemetry -async def embedding(input: TextDoc) -> EmbedDoc768: - embed_vector = await embeddings.aembed_query(input.text) - res = EmbedDoc768(text=input.text, embedding=embed_vector) - return res - - -if __name__ == "__main__": - embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en-v1.5") - opea_microservices["opea_service@local_embedding"].start() diff --git a/comps/llms/text-generation/vllm/langchain/README.md b/comps/llms/text-generation/vllm/langchain/README.md index 1405273b0d..bb83f0dc59 100644 --- a/comps/llms/text-generation/vllm/langchain/README.md +++ b/comps/llms/text-generation/vllm/langchain/README.md @@ -223,29 +223,21 @@ User can set the following model parameters according to needs: - streaming(true/false): return text response in streaming mode or non-streaming mode ```bash -# 1. Non-streaming mode +# stream mode curl http://${your_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \ - -H 'Content-Type: application/json' + -X POST \ + -d '{"model": "${model_name}", "messages": "What is Deep Learning?", "max_tokens":17}' \ + -H 'Content-Type: application/json' -# 2. Streaming mode curl http://${your_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' + -X POST \ + -d '{"model": "${model_name}", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ + -H 'Content-Type: application/json' -# 3. 
Custom chat template with streaming mode +# Non-stream mode curl http://${your_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true, "chat_template":"### You are a helpful, respectful and honest assistant to help the user with questions.\n### Context: {context}\n### Question: {question}\n### Answer:"}' \ - -H 'Content-Type: application/json' + -X POST \ + -d '{"model": "${model_name}", "messages": "What is Deep Learning?", "max_tokens":17, "stream":false}' \ + -H 'Content-Type: application/json' -4. # Chat with SearchedDoc (Retrieval context) -curl http://${your_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"initial_query":"What is Deep Learning?","retrieved_docs":[{"text":"Deep Learning is a ..."},{"text":"Deep Learning is b ..."}]}' \ - -H 'Content-Type: application/json' ``` - -For parameters, can refer to [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html) diff --git a/comps/llms/text-generation/vllm/langchain/llm.py b/comps/llms/text-generation/vllm/langchain/llm.py index ccedec4513..143c9b9d0a 100644 --- a/comps/llms/text-generation/vllm/langchain/llm.py +++ b/comps/llms/text-generation/vllm/langchain/llm.py @@ -7,6 +7,7 @@ from fastapi.responses import StreamingResponse from langchain_community.llms import VLLMOpenAI from langchain_core.prompts import PromptTemplate +from openai import OpenAI from template import ChatTemplate from comps import ( @@ -194,6 +195,98 @@ async def stream_generator(): logger.info(response) return GeneratedDoc(text=response, prompt=input.query) + else: + if logflag: + logger.info("[ ChatCompletionRequest ] input in opea format") + client = OpenAI( + api_key="EMPTY", + base_url=llm_endpoint + "/v1", + ) + + if isinstance(input.messages, str): + prompt = input.messages + if prompt_template: + if sorted(input_variables) == ["context", "question"]: + prompt = prompt_template.format(question=input.messages, context="\n".join(input.documents)) + elif input_variables == ["question"]: + prompt = prompt_template.format(question=input.messages) + else: + logger.info( + f"[ ChatCompletionRequest ] {prompt_template} not used, we only support 2 input variables ['question', 'context']" + ) + else: + if input.documents: + # use rag default template + prompt = ChatTemplate.generate_rag_prompt(input.messages, input.documents, input.model) + + chat_completion = client.completions.create( + model=model_name, + prompt=prompt, + echo=input.echo, + frequency_penalty=input.frequency_penalty, + max_tokens=input.max_tokens, + n=input.n, + presence_penalty=input.presence_penalty, + seed=input.seed, + stop=input.stop, + stream=input.stream, + suffix=input.suffix, + temperature=input.temperature, + top_p=input.top_p, + user=input.user, + ) + else: + if input.messages[0]["role"] == "system": + if "{context}" in input.messages[0]["content"]: + if input.documents is None or input.documents == []: + input.messages[0]["content"] = input.messages[0]["content"].format(context="") + else: + input.messages[0]["content"] = input.messages[0]["content"].format(context="\n".join(input.documents)) + else: + if prompt_template: + system_prompt = prompt_template + if input_variables == ["context"]: + system_prompt = prompt_template.format(context="\n".join(input.documents)) + else: + logger.info( + f"[ ChatCompletionRequest ] {prompt_template} not used, only support 1 input variable ['context']" + ) + + input.messages.insert(0, {"role":
"system", "content": system_prompt}) + + chat_completion = client.chat.completions.create( + model=model_name, + messages=input.messages, + frequency_penalty=input.frequency_penalty, + max_tokens=input.max_tokens, + n=input.n, + presence_penalty=input.presence_penalty, + response_format=input.response_format, + seed=input.seed, + stop=input.stop, + stream=input.stream, + stream_options=input.stream_options, + temperature=input.temperature, + top_p=input.top_p, + user=input.user, + ) + + if input.stream: + + def stream_generator(): + for c in chat_completion: + if logflag: + logger.info(c) + chunk = c.model_dump_json() + if chunk not in ["<|im_end|>", "<|endoftext|>"]: + yield f"data: {chunk}\n\n" + yield "data: [DONE]\n\n" + + return StreamingResponse(stream_generator(), media_type="text/event-stream") + else: + if logflag: + logger.info(chat_completion) + return chat_completion if __name__ == "__main__": diff --git a/comps/llms/text-generation/vllm/langchain/query.sh b/comps/llms/text-generation/vllm/langchain/query.sh deleted file mode 100644 index 31fa187507..0000000000 --- a/comps/llms/text-generation/vllm/langchain/query.sh +++ /dev/null @@ -1,20 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -your_ip="0.0.0.0" -model=$(curl http://localhost:8008/v1/models -s|jq -r '.data[].id') - -curl http://${your_ip}:8008/v1/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "'$model'", - "prompt": "What is Deep Learning?", - "max_tokens": 32, - "temperature": 0 - }' - -##query microservice -curl http://${your_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \ - -H 'Content-Type: application/json' diff --git a/data/hub/version.txt b/data/hub/version.txt deleted file mode 100644 index 56a6051ca2..0000000000 --- a/data/hub/version.txt +++ /dev/null @@ -1 +0,0 @@ -1 \ No newline at end of file diff --git a/tests/agent/sql_agent_llama.yaml b/tests/agent/sql_agent_llama.yaml deleted file mode 100644 index a2842d9e99..0000000000 --- a/tests/agent/sql_agent_llama.yaml +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -services: - agent: - image: ${agent_image} - container_name: test-comps-agent-endpoint - volumes: - - ${TOOLSET_PATH}:/home/user/tools/ # tools - # - ${WORKDIR}/GenAIComps/comps:/home/user/comps # code - - ${WORKDIR}/TAG-Bench/:/home/user/TAG-Bench # SQL database and hints_file - ports: - - "9095:9095" - ipc: host - environment: - ip_address: ${ip_address} - strategy: sql_agent_llama - db_name: ${db_name} - db_path: ${db_path} - use_hints: false #true - hints_file: /home/user/TAG-Bench/${db_name}_hints.csv - recursion_limit: ${recursion_limit} - llm_engine: vllm - HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - llm_endpoint_url: ${LLM_ENDPOINT_URL} - model: ${LLM_MODEL_ID} - temperature: ${temperature} - max_new_tokens: ${max_new_tokens} - streaming: false - tools: /home/user/tools/custom_tools.yaml #/home/user/tools/sql_agent_tools.yaml # change back to custom_tools.yaml - require_human_feedback: false - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - port: 9095 - # GOOGLE_CSE_ID: ${GOOGLE_CSE_ID} #delete - # GOOGLE_API_KEY: ${GOOGLE_API_KEY} # delete diff --git a/tests/agent/sql_agent_openai.yaml b/tests/agent/sql_agent_openai.yaml deleted file mode 100644 index 124eccae99..0000000000 
--- a/tests/agent/sql_agent_openai.yaml +++ /dev/null @@ -1,36 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -services: - agent: - image: ${agent_image} - container_name: test-comps-agent-endpoint - volumes: - - ${TOOLSET_PATH}:/home/user/tools/ # tools - - ${WORKDIR}/GenAIComps/comps:/home/user/comps # code - - ${WORKDIR}/TAG-Bench/:/home/user/TAG-Bench # SQL database and hints_file - ports: - - "9095:9095" - ipc: host - environment: - ip_address: ${ip_address} - strategy: sql_agent - db_name: ${db_name} - db_path: ${db_path} - use_hints: false #true - hints_file: /home/user/TAG-Bench/${db_name}_hints.csv - recursion_limit: ${recursion_limit} - llm_engine: openai - OPENAI_API_KEY: ${OPENAI_API_KEY} - model: "gpt-4o-mini-2024-07-18" - temperature: 0 - max_new_tokens: ${max_new_tokens} - streaming: false - tools: /home/user/tools/sql_agent_tools.yaml # /home/user/tools/custom_tools.yaml # - require_human_feedback: false - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - port: 9095 - GOOGLE_CSE_ID: ${GOOGLE_CSE_ID} #delete - GOOGLE_API_KEY: ${GOOGLE_API_KEY} # delete diff --git a/tests/agent/sql_agent_test/generate_hints_file.py b/tests/agent/sql_agent_test/generate_hints_file.py deleted file mode 100644 index 3551b7306f..0000000000 --- a/tests/agent/sql_agent_test/generate_hints_file.py +++ /dev/null @@ -1,45 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import glob -import os - -import pandas as pd - - -def generate_column_descriptions(db_name): - descriptions = [] - working_dir = os.getenv("WORKDIR") - assert working_dir is not None, "WORKDIR environment variable is not set." - DESCRIPTION_FOLDER = os.path.join( - working_dir, f"TAG-Bench/dev_folder/dev_databases/{db_name}/database_description/" - ) - table_files = glob.glob(os.path.join(DESCRIPTION_FOLDER, "*.csv")) - table_name_col = [] - col_name_col = [] - for table_file in table_files: - table_name = os.path.basename(table_file).split(".")[0] - print("Table name: ", table_name) - df = pd.read_csv(table_file) - for _, row in df.iterrows(): - col_name = row["original_column_name"] - if not pd.isnull(row["value_description"]): - description = str(row["value_description"]) - if description.lower() in col_name.lower(): - print("Description {} is same as column name {}".format(description, col_name)) - pass - else: - description = description.replace("\n", " ") - description = " ".join(description.split()) - descriptions.append(description) - table_name_col.append(table_name) - col_name_col.append(col_name) - hints_df = pd.DataFrame({"table_name": table_name_col, "column_name": col_name_col, "description": descriptions}) - tag_bench_dir = os.path.join(working_dir, "TAG-Bench") - output_file = os.path.join(tag_bench_dir, f"{db_name}_hints.csv") - hints_df.to_csv(output_file, index=False) - print(f"Generated hints file: {output_file}") - - -if __name__ == "__main__": - generate_column_descriptions("california_schools") diff --git a/tests/agent/sql_agent_test/run_data_split.sh b/tests/agent/sql_agent_test/run_data_split.sh deleted file mode 100644 index 2fc2dfcb0e..0000000000 --- a/tests/agent/sql_agent_test/run_data_split.sh +++ /dev/null @@ -1,6 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -DATAPATH=$WORKDIR/TAG-Bench/tag_queries.csv -OUTFOLDER=$WORKDIR/TAG-Bench/query_by_db -python3 split_data.py --path $DATAPATH --output $OUTFOLDER diff --git 
a/tests/agent/sql_agent_test/split_data.py b/tests/agent/sql_agent_test/split_data.py deleted file mode 100644 index 1b3f5cfc79..0000000000 --- a/tests/agent/sql_agent_test/split_data.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import argparse -import os - -import pandas as pd - -if __name__ == "__main__": - parser = argparse.ArgumentParser() - parser.add_argument("--path", type=str, required=True) - parser.add_argument("--output", type=str, required=True) - args = parser.parse_args() - - # if output folder does not exist, create it - if not os.path.exists(args.output): - os.makedirs(args.output) - - # Load the data - data = pd.read_csv(args.path) - - # Split the data by domain - domains = data["DB used"].unique() - for domain in domains: - domain_data = data[data["DB used"] == domain] - out = os.path.join(args.output, f"query_{domain}.csv") - domain_data.to_csv(out, index=False) diff --git a/tests/agent/sql_agent_test/sql_agent_tools.py b/tests/agent/sql_agent_test/sql_agent_tools.py deleted file mode 100644 index fc14efe8ee..0000000000 --- a/tests/agent/sql_agent_test/sql_agent_tools.py +++ /dev/null @@ -1,19 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - - -def search_web(query: str) -> str: - """Search the web for information not contained in databases.""" - from langchain_core.tools import Tool - from langchain_google_community import GoogleSearchAPIWrapper - - search = GoogleSearchAPIWrapper() - - tool = Tool( - name="google_search", - description="Search Google for recent results.", - func=search.run, - ) - - response = tool.run(query) - return response diff --git a/tests/agent/sql_agent_test/sql_agent_tools.yaml b/tests/agent/sql_agent_test/sql_agent_tools.yaml deleted file mode 100644 index ccd5c8e718..0000000000 --- a/tests/agent/sql_agent_test/sql_agent_tools.yaml +++ /dev/null @@ -1,11 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -search_web: - description: Search the web for a given query. 
- callable_api: sql_agent_tools.py:search_web - args_schema: - query: - type: str - description: query - return_output: retrieved_data diff --git a/tests/agent/sql_agent_test/test_sql_agent.sh b/tests/agent/sql_agent_test/test_sql_agent.sh deleted file mode 100644 index 4a49b1a4fc..0000000000 --- a/tests/agent/sql_agent_test/test_sql_agent.sh +++ /dev/null @@ -1,193 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -#set -xe - -# this script should be run from tests directory -# bash agent/sql_agent_test/test_sql_agent.sh - -WORKPATH=$(dirname "$PWD") -echo $WORKPATH -LOG_PATH="$WORKPATH/tests" - -# WORKDIR is one level up from GenAIComps -export WORKDIR=$(dirname "$WORKPATH") -echo $WORKDIR - -export agent_image="opea/agent-langchain:comps" -export agent_container_name="test-comps-agent-endpoint" - -export ip_address=$(hostname -I | awk '{print $1}') - -vllm_port=8086 -vllm_volume=${HF_CACHE_DIR} - -export model=meta-llama/Meta-Llama-3.1-70B-Instruct -export HUGGINGFACEHUB_API_TOKEN=${HF_TOKEN} -export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-70B-Instruct" -export LLM_ENDPOINT_URL="http://${ip_address}:${vllm_port}" -export temperature=0.01 -export max_new_tokens=4096 -export TOOLSET_PATH=$WORKPATH/comps/agent/langchain/tools/ # $WORKPATH/tests/agent/sql_agent_test/ -echo "TOOLSET_PATH=${TOOLSET_PATH}" -export recursion_limit=15 -export db_name=california_schools -export db_path=/home/user/TAG-Bench/dev_folder/dev_databases/${db_name}/${db_name}.sqlite - -# for using Google search API -export GOOGLE_CSE_ID=${GOOGLE_CSE_ID} -export GOOGLE_API_KEY=${GOOGLE_API_KEY} - - -# download the test data -function prepare_data() { - cd $WORKDIR - - echo "Downloading data..." - git clone https://github.com/TAG-Research/TAG-Bench.git - cd TAG-Bench/setup - chmod +x get_dbs.sh - ./get_dbs.sh - - echo "Split data..." - cd $WORKPATH/tests/agent/sql_agent_test - bash run_data_split.sh - - echo "Data preparation done!" -} - -function remove_data() { - echo "Removing data..." - cd $WORKDIR - rm -rf TAG-Bench - echo "Data removed!" -} - - -function generate_hints_for_benchmark() { - echo "Generating hints for benchmark..." - cd $WORKPATH/tests/agent/sql_agent_test - python3 generate_hints_file.py -} - -function build_docker_images() { - echo "Building the docker images" - cd $WORKPATH - echo $WORKPATH - docker build --no-cache -t $agent_image --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -f comps/agent/langchain/Dockerfile . - if [ $? -ne 0 ]; then - echo "opea/agent-langchain built fail" - exit 1 - else - echo "opea/agent-langchain built successful" - fi -} - -function build_vllm_docker_images() { - echo "Building the vllm docker images" - cd $WORKPATH - echo $WORKPATH - if [ ! -d "./vllm" ]; then - git clone https://github.com/HabanaAI/vllm-fork.git - fi - cd ./vllm-fork - docker build --no-cache -f Dockerfile.hpu -t opea/vllm-gaudi:comps --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy - if [ $? 
-ne 0 ]; then - echo "opea/vllm-gaudi:comps failed" - exit 1 - else - echo "opea/vllm-gaudi:comps successful" - fi -} - -function start_vllm_service() { - # redis endpoint - echo "token is ${HF_TOKEN}" - - #single card - echo "start vllm gaudi service" - echo "**************model is $model**************" - docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=0,1,2,3 -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-seq-len-to-capture 16384 --tensor-parallel-size 4 - sleep 5s - echo "Waiting vllm gaudi ready" - n=0 - until [[ "$n" -ge 100 ]] || [[ $ready == true ]]; do - docker logs test-comps-vllm-gaudi-service &> ${LOG_PATH}/vllm-gaudi-service.log - n=$((n+1)) - if grep -q "Uvicorn running on" ${LOG_PATH}/vllm-gaudi-service.log; then - break - fi - if grep -q "No such container" ${LOG_PATH}/vllm-gaudi-service.log; then - echo "container test-comps-vllm-gaudi-service not found" - exit 1 - fi - sleep 5s - done - sleep 5s - echo "Service started successfully" -} -# launch the agent -function start_sql_agent_llama_service() { - echo "Starting sql_agent_llama agent microservice" - docker compose -f $WORKPATH/tests/agent/sql_agent_llama.yaml up -d - sleep 3m - docker logs test-comps-agent-endpoint - echo "Service started successfully" -} - - -function start_sql_agent_openai_service() { - export OPENAI_API_KEY=${OPENAI_API_KEY} - echo "Starting sql_agent_openai agent microservice" - docker compose -f $WORKPATH/tests/agent/sql_agent_openai.yaml up -d - sleep 3m - docker logs test-comps-agent-endpoint - echo "Service started successfully" -} - -# run the test -function run_test() { - echo "Running test..." - cd $WORKPATH/tests/agent/ - python3 test.py --test-sql-agent -} - -function run_benchmark() { - echo "Running benchmark..." - cd $WORKPATH/tests/agent/sql_agent_test - query_file=${WORKDIR}/TAG-Bench/query_by_db/query_california_schools.csv - outdir=$WORKDIR/sql_agent_output - outfile=california_school_agent_test_result.csv - python3 test_tag_bench.py --query_file $query_file --output_dir $outdir --output_file $outfile -} - -# echo "Building docker image...." -# build_docker_images - -echo "Preparing data...." -prepare_data - -# echo "Building vllm docker image...." -# build_vllm_docker_images - -# echo "Launching vllm service...." -# start_vllm_service - -# echo "Generating hints_file..." -# generate_hints_for_benchmark - -echo "launching sql_agent_llama service...." -start_sql_agent_llama_service - -# echo "launching sql_agent_openai service...." -# start_sql_agent_openai_service - -echo "Running test...." -run_test - -# echo "Running benchmark...." -# run_benchmark - -echo "Removing data...." 
-remove_data diff --git a/tests/agent/sql_agent_test/test_tag_bench.py b/tests/agent/sql_agent_test/test_tag_bench.py deleted file mode 100644 index 6664759f31..0000000000 --- a/tests/agent/sql_agent_test/test_tag_bench.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import argparse -import os - -import pandas as pd -import requests - - -def generate_answer_agent_api(url, prompt): - proxies = {"http": ""} - payload = { - "query": prompt, - } - response = requests.post(url, json=payload, proxies=proxies) - answer = response.json()["text"] - return answer - - -def save_json_lines(json_lines, args): - outfile = "sql_agent_results.json" - output = os.path.join(args.output_dir, outfile) - with open(output, "w") as f: - for line in json_lines: - f.write(str(line) + "\n") - - -if __name__ == "__main__": - parser = argparse.ArgumentParser() - parser.add_argument("--query_file", type=str) - parser.add_argument("--output_dir", type=str) - parser.add_argument("--output_file", type=str) - args = parser.parse_args() - - df = pd.read_csv(args.query_file) - - if not os.path.exists(args.output_dir): - os.makedirs(args.output_dir) - - ip_address = os.getenv("ip_address", "localhost") - url = f"http://{ip_address}:9095/v1/chat/completions" - - json_lines = [] - for _, row in df.iterrows(): - query = row["Query"] - ref_answer = row["Answer"] - print("******Query:\n", query) - res = generate_answer_agent_api(url, query) - print("******Answer:\n", res) - # json_lines.append({"query": query,"answer":ref_answer, "agent_answer": res}) - # save_json_lines(json_lines, args) - print("=" * 20) - - df.to_csv(os.path.join(args.output_dir, args.output_file), index=False) diff --git a/tests/agent/test.py b/tests/agent/test.py index e345e89420..fdbfe1c5b3 100644 --- a/tests/agent/test.py +++ b/tests/agent/test.py @@ -45,16 +45,11 @@ def process_request(url, query, is_stream=False): if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--stream", action="store_true", help="Stream the response") - parser.add_argument("--test-sql-agent", action="store_true", help="Test the SQL agent") args = parser.parse_args() ip_address = os.getenv("ip_address", "localhost") url = f"http://{ip_address}:9095/v1/chat/completions" - if args.test_sql_agent: - prompt = "How many schools have the average score in Math over 560 in the SAT test?" - else: - prompt = "What is OPEA?" - + prompt = "What is OPEA?" 
if args.stream: process_request(url, prompt, is_stream=True) else: diff --git a/tests/agent/test_agent_langchain_on_intel_hpu.sh b/tests/agent/test_agent_langchain_on_intel_hpu.sh index 77aa1fb19d..9ba25228ad 100644 --- a/tests/agent/test_agent_langchain_on_intel_hpu.sh +++ b/tests/agent/test_agent_langchain_on_intel_hpu.sh @@ -6,17 +6,12 @@ WORKPATH=$(dirname "$PWD") echo $WORKPATH -ls $WORKPATH -echo "=========================" LOG_PATH="$WORKPATH/tests" ip_address=$(hostname -I | awk '{print $1}') tgi_port=8085 tgi_volume=$WORKPATH/data - vllm_port=8086 -export vllm_volume=$WORKPATH/data -echo "vllm_volume:" -ls $vllm_volume +vllm_volume=$WORKPATH/data export WORKPATH=$WORKPATH @@ -28,7 +23,7 @@ export HUGGINGFACEHUB_API_TOKEN=${HF_TOKEN} export ip_address=$(hostname -I | awk '{print $1}') export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-70B-Instruct" -export LLM_ENDPOINT_URL="http://${ip_address}:${vllm_port}" +export LLM_ENDPOINT_URL="http://${ip_address}:${tgi_port}" export temperature=0.01 export max_new_tokens=4096 export TOOLSET_PATH=$WORKPATH/comps/agent/langchain/tools/ @@ -93,7 +88,7 @@ function start_vllm_service() { #single card echo "start vllm gaudi service" echo "**************model is $model**************" - docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=all -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 4096 --max-seq-len-to-capture 8192 + docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=all -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 4096 --max-seq-len-to-capture 8192 sleep 5s echo "Waiting vllm gaudi ready" n=0 @@ -120,34 +115,7 @@ function start_vllm_auto_tool_choice_service() { #single card echo "start vllm gaudi service" echo "**************auto_tool model is $model**************" - docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=all -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 4096 --max-seq-len-to-capture 8192 --enable-auto-tool-choice --tool-call-parser ${model_parser} - sleep 5s - echo "Waiting vllm gaudi ready" - n=0 - until [[ "$n" -ge 100 ]] || [[ $ready == true ]]; do - docker logs test-comps-vllm-gaudi-service &> ${LOG_PATH}/vllm-gaudi-service.log - n=$((n+1)) - if grep -q "Uvicorn running on" ${LOG_PATH}/vllm-gaudi-service.log; then - break - fi - if grep -q "No such container" ${LOG_PATH}/vllm-gaudi-service.log; then - echo "container
test-comps-vllm-gaudi-service not found" - exit 1 - fi - sleep 5s - done - sleep 5s - echo "Service started successfully" -} - -function start_vllm_service_70B() { - # redis endpoint - echo "token is ${HF_TOKEN}" - - #single card - echo "start vllm gaudi service" - echo "**************model is $model**************" - docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=0,1,2,3 -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-seq-len-to-capture 16384 --tensor-parallel-size 4 + docker run -d --runtime=habana --rm --name "test-comps-vllm-gaudi-service" -e HABANA_VISIBLE_DEVICES=all -p $vllm_port:80 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:comps --model ${model} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 4096 --max-seq-len-to-capture 8192 --enable-auto-tool-choice --tool-call-parser ${model_parser} sleep 5s echo "Waiting vllm gaudi ready" n=0 @@ -170,7 +138,7 @@ function start_vllm_service_70B() { function start_react_langchain_agent_service() { echo "Starting react_langchain agent microservice" docker compose -f $WORKPATH/tests/agent/react_langchain.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -179,7 +147,7 @@ function start_react_langchain_agent_service() { function start_react_langgraph_agent_service_openai() { echo "Starting react_langchain agent microservice" docker compose -f $WORKPATH/tests/agent/react_langgraph_openai.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -188,7 +156,7 @@ function start_react_langgraph_agent_service_openai() { function start_react_llama_agent_service() { echo "Starting react_langgraph agent microservice" docker compose -f $WORKPATH/tests/agent/reactllama.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -196,7 +164,7 @@ function start_react_llama_agent_service() { function start_react_langgraph_agent_service_vllm() { echo "Starting react_langgraph agent microservice" docker compose -f $WORKPATH/tests/agent/react_vllm.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -204,7 +172,7 @@ function start_react_langgraph_agent_service_vllm() { function start_planexec_agent_service_vllm() { echo "Starting planexec agent microservice" docker compose -f $WORKPATH/tests/agent/planexec_vllm.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -212,7 +180,7 @@ function start_planexec_agent_service_vllm() { function start_ragagent_agent_service() { echo "Starting rag agent microservice" docker compose -f $WORKPATH/tests/agent/ragagent.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -220,7 +188,7 @@ function start_ragagent_agent_service() { function
start_ragagent_agent_service_openai() { echo "Starting rag agent microservice" docker compose -f $WORKPATH/tests/agent/ragagent_openai.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -228,7 +196,7 @@ function start_ragagent_agent_service_openai() { function start_planexec_agent_service_openai() { echo "Starting plan execute agent microservice" docker compose -f $WORKPATH/tests/agent/planexec_openai.yaml up -d - sleep 60s + sleep 5s docker logs test-comps-agent-endpoint echo "Service started successfully" } @@ -307,12 +275,7 @@ function stop_tgi_docker() { cid=$(docker ps -aq --filter "name=test-comps-tgi-gaudi-service") echo "Stopping the docker containers "${cid} if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi - echo "TGI Docker containers stopped successfully" - - cid=$(docker ps -aq --filter "name=tgi-server") - echo "Stopping the docker containers "${cid} - if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi - echo "TGI Docker containers stopped successfully" + echo "Docker containers stopped successfully" } function stop_vllm_docker() { @@ -335,35 +298,13 @@ function stop_docker() { stop_agent_docker } - -function validate_sql_agent(){ - cd $WORKPATH/tests/ - local CONTENT=$(bash agent/sql_agent_test/test_sql_agent.sh) - local EXIT_CODE=$(validate "$CONTENT" "173" "test-sql-agent") - echo "$EXIT_CODE" - local EXIT_CODE="${EXIT_CODE:0-1}" - echo "return value is $EXIT_CODE" - if [ "$EXIT_CODE" == "1" ]; then - echo "==================SQL Agent logs ======================" - docker logs test-comps-agent-endpoint - # echo "================== vllm gaudi service logs ======================" - # docker logs test-comps-vllm-gaudi-service - exit 1 - fi -} - - function main() { stop_agent_docker stop_docker build_docker_images - build_vllm_docker_images - - # ==================== Tests with 70B model ==================== - # RAG agent, react_llama, react_langchain, assistant apis - # start_tgi_service - start_vllm_service_70B + # ==================== TGI tests ==================== + start_tgi_service # test rag agent start_ragagent_agent_service @@ -372,7 +313,7 @@ function main() { stop_agent_docker echo "=============================================" - # # test react_llama + # test react_llama start_react_llama_agent_service echo "===========Testing ReAct Llama =============" validate_microservice @@ -380,7 +321,7 @@ function main() { echo "=============================================" - # # test react_langchain + # test react_langchain start_react_langchain_agent_service echo "=============Testing ReAct Langchain=============" validate_microservice_streaming @@ -388,52 +329,56 @@ function main() { stop_agent_docker echo "=============================================" - # stop_tgi_docker - - # test sql agent - validate_sql_agent - - stop_docker + stop_tgi_docker - # # # ==================== Test react_langgraph with vllm auto-tool-choice ==================== + # ==================== VLLM tests ==================== + build_vllm_docker_images export model=mistralai/Mistral-7B-Instruct-v0.3 export LLM_MODEL_ID=${model} export model_parser=mistral export LLM_ENDPOINT_URL="http://${ip_address}:${vllm_port}" - test react with vllm - Mistral + # test react with vllm - Mistral start_vllm_auto_tool_choice_service start_react_langgraph_agent_service_vllm echo "===========Testing ReAct Langgraph VLLM Mistral =============" validate_microservice - stop_agent_docker - stop_vllm_docker + # stop_agent_docker + # 
stop_vllm_docker echo "=============================================" - # # # ==================== Test plan-execute agent with vllm guided decoding ==================== # test plan execute with vllm - Mistral - # start_vllm_service - # start_planexec_agent_service_vllm - # echo "===========Testing Plan Execute VLLM Mistral =============" - # validate_microservice - # stop_agent_docker - # stop_vllm_docker - # echo "=============================================" + start_vllm_service + start_planexec_agent_service_vllm + echo "===========Testing Plan Execute VLLM Mistral =============" + validate_microservice + stop_agent_docker + stop_vllm_docker + echo "=============================================" - # export model=meta-llama/Llama-3.1-8B-Instruct - # export LLM_MODEL_ID=${model} - # export model_parser=llama3_json + export model=meta-llama/Llama-3.1-8B-Instruct + export LLM_MODEL_ID=${model} + export model_parser=llama3_json - # # test plan execute with vllm - llama3.1 - # start_vllm_service - # start_planexec_agent_service_vllm - # echo "===========Testing Plan Execute VLLM Llama3.1 =============" + # test react with vllm - llama3 support has not been synced to vllm-gaudi yet + # start_vllm_auto_tool_choice_service + # start_react_langgraph_agent_service_vllm + # echo "===========Testing ReAct VLLM =============" # validate_microservice # stop_agent_docker # stop_vllm_docker # echo "=============================================" + # test plan execute with vllm - llama3.1 + start_vllm_service + start_planexec_agent_service_vllm + echo "===========Testing Plan Execute VLLM Llama3.1 =============" + validate_microservice + stop_agent_docker + stop_vllm_docker + echo "=============================================" + # # ==================== OpenAI tests ==================== # start_ragagent_agent_service_openai @@ -454,7 +399,6 @@ function main() { # stop_agent_docker stop_docker - echo y | docker system prune 2>&1 > /dev/null } diff --git a/tests/cores/mega/test_aio.py b/tests/cores/mega/test_aio.py index fc735e70aa..4187cb0349 100644 --- a/tests/cores/mega/test_aio.py +++ b/tests/cores/mega/test_aio.py @@ -14,6 +14,7 @@ import asyncio import json +import multiprocessing import time import unittest @@ -55,9 +56,14 @@ def setUp(self): self.s1 = opea_microservices["s1"] self.s2 = opea_microservices["s2"] self.s3 = opea_microservices["s3"] - self.s1.start() - self.s2.start() - self.s3.start() + + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process2 = multiprocessing.Process(target=self.s2.start, daemon=False, name="s2") + self.process3 = multiprocessing.Process(target=self.s3.start, daemon=False, name="s2") + + self.process1.start() + self.process2.start() + self.process3.start() self.service_builder = ServiceOrchestrator() @@ -70,6 +76,10 @@ def tearDown(self): self.s2.stop() self.s3.stop() + self.process1.terminate() + self.process2.terminate() + self.process3.terminate() + async def test_schedule(self): t = time.time() task1 = asyncio.create_task(self.service_builder.schedule(initial_inputs={"text": "hello, "})) diff --git a/tests/cores/mega/test_base_statistics.py b/tests/cores/mega/test_base_statistics.py index ef4e7da3e0..878b3016c5 100644 --- a/tests/cores/mega/test_base_statistics.py +++ b/tests/cores/mega/test_base_statistics.py @@ -3,6 +3,7 @@ import asyncio import json +import multiprocessing import time import unittest @@ -34,13 +35,15 @@ async def s1_add(request: TextDoc) -> TextDoc: class 
TestBaseStatistics(unittest.IsolatedAsyncioTestCase): def setUp(self): self.s1 = opea_microservices["s1"] - self.s1.start() + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process1.start() self.service_builder = ServiceOrchestrator() self.service_builder.add(opea_microservices["s1"]) def tearDown(self): self.s1.stop() + self.process1.terminate() async def test_base_statistics(self): for _ in range(2): diff --git a/tests/cores/mega/test_dynamic_batching.py b/tests/cores/mega/test_dynamic_batching.py index 945054fb0f..bcb185b8fa 100644 --- a/tests/cores/mega/test_dynamic_batching.py +++ b/tests/cores/mega/test_dynamic_batching.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import asyncio +import multiprocessing import unittest from enum import Enum @@ -67,10 +68,12 @@ async def fetch(session, url, data): class TestMicroService(unittest.IsolatedAsyncioTestCase): def setUp(self): - opea_microservices["s1"].start() + self.process1 = multiprocessing.Process(target=opea_microservices["s1"].start, daemon=False, name="s1") + self.process1.start() def tearDown(self): opea_microservices["s1"].stop() + self.process1.terminate() async def test_dynamic_batching(self): url1 = "http://localhost:8080/v1/add1" diff --git a/tests/cores/mega/test_handle_message.py b/tests/cores/mega/test_handle_message.py new file mode 100644 index 0000000000..078bcdcd06 --- /dev/null +++ b/tests/cores/mega/test_handle_message.py @@ -0,0 +1,133 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import json +import unittest +from typing import Union + +from comps.cores.mega.utils import handle_message + + +class TestHandleMessage(unittest.IsolatedAsyncioTestCase): + + def test_handle_message(self): + messages = [ + {"role": "user", "content": "opea project! "}, + ] + prompt = handle_message(messages) + self.assertEqual(prompt, "user: opea project! \n") + + def test_handle_message_with_system_prompt(self): + messages = [ + {"role": "system", "content": "System Prompt"}, + {"role": "user", "content": "opea project! "}, + ] + prompt = handle_message(messages) + self.assertEqual(prompt, "System Prompt\nuser: opea project! 
\n") + + def test_handle_message_with_image(self): + messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "hello, "}, + { + "type": "image_url", + "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, + }, + ], + }, + ] + prompt, image = handle_message(messages) + self.assertEqual(prompt, "user: hello, \n") + + messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": ""}, + { + "type": "image_url", + "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, + }, + ], + }, + ] + prompt, image = handle_message(messages) + self.assertEqual(prompt, "user:") + + def test_handle_message_with_image_str(self): + self.img_b64_str = ( + "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC" + ) + + messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "hello, "}, + { + "type": "image_url", + "image_url": {"url": self.img_b64_str}, + }, + ], + }, + ] + prompt, image = handle_message(messages) + self.assertEqual(image[0], self.img_b64_str) + + def test_handle_message_with_image_local(self): + img_b64_str = ( + "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC" + ) + import base64 + import io + + from PIL import Image + + img = Image.open(io.BytesIO(base64.decodebytes(bytes(img_b64_str, "utf-8")))) + img.save("./test.png") + + messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "hello, "}, + { + "type": "image_url", + "image_url": {"url": "./test.png"}, + }, + ], + }, + ] + prompt, image = handle_message(messages) + self.assertEqual(prompt, "user: hello, \n") + + def test_handle_message_with_content_list(self): + messages = [ + { + "role": "user", + "content": [ + {"type": "text", "text": "hello, "}, + ], + }, + {"role": "assistant", "content": "opea project! "}, + {"role": "user", "content": ""}, + ] + prompt = handle_message(messages) + self.assertEqual(prompt, "user:assistant: opea project! \n") + + def test_handle_string_message(self): + messages = "hello, " + prompt = handle_message(messages) + self.assertEqual(prompt, "hello, ") + + def test_handle_message_with_invalid_role(self): + messages = [ + {"role": "user_test", "content": "opea project! 
"}, + ] + self.assertRaises(ValueError, handle_message, messages) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/cores/mega/test_hybrid_service_orchestrator.py b/tests/cores/mega/test_hybrid_service_orchestrator.py index 0838d25ec8..89522eac3e 100644 --- a/tests/cores/mega/test_hybrid_service_orchestrator.py +++ b/tests/cores/mega/test_hybrid_service_orchestrator.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest from comps import MicroService, ServiceOrchestrator, TextDoc, opea_microservices, register_microservice @@ -19,23 +20,21 @@ async def s1_add(request: TextDoc) -> TextDoc: class TestServiceOrchestrator(unittest.TestCase): def setUp(self): self.s1 = opea_microservices["s1"] - self.s1.start() + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process1.start() self.service_builder = ServiceOrchestrator() def tearDown(self): self.s1.stop() + self.process1.terminate() def test_add_remote_service(self): s2 = MicroService(name="s2", host="fakehost", port=8008, endpoint="/v1/add", use_remote_service=True) self.service_builder.add(opea_microservices["s1"]).add(s2) self.service_builder.flow_to(self.s1, s2) self.assertEqual(s2.endpoint_path, "http://fakehost:8008/v1/add") - # Check whether the right exception is raise when init/stop remote service - try: - s2.start() - except Exception as e: - self.assertTrue("Method not allowed" in str(e)) + self.assertRaises(Exception, s2._validate_env, "N/A") if __name__ == "__main__": diff --git a/tests/cores/mega/test_hybrid_service_orchestrator_with_yaml.py b/tests/cores/mega/test_hybrid_service_orchestrator_with_yaml.py index bd23201841..8d70ab43f0 100644 --- a/tests/cores/mega/test_hybrid_service_orchestrator_with_yaml.py +++ b/tests/cores/mega/test_hybrid_service_orchestrator_with_yaml.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest from comps import ServiceOrchestratorWithYaml, TextDoc, opea_microservices, register_microservice @@ -19,10 +20,12 @@ async def s1_add(request: TextDoc) -> TextDoc: class TestYAMLOrchestrator(unittest.TestCase): def setUp(self) -> None: self.s1 = opea_microservices["s1"] - self.s1.start() + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process1.start() def tearDown(self): self.s1.stop() + self.process1.terminate() def test_add_remote_service(self): service_builder = ServiceOrchestratorWithYaml(yaml_file_path="megaservice_hybrid.yaml") diff --git a/tests/cores/mega/test_microservice.py b/tests/cores/mega/test_microservice.py index dbaff9a760..b621dda5ae 100644 --- a/tests/cores/mega/test_microservice.py +++ b/tests/cores/mega/test_microservice.py @@ -2,11 +2,12 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest from fastapi.testclient import TestClient -from comps import TextDoc, opea_microservices, register_microservice +from comps import MicroService, TextDoc, opea_microservices, register_microservice @register_microservice(name="s1", host="0.0.0.0", port=8080, endpoint="/v1/add") @@ -18,14 +19,24 @@ async def add(request: TextDoc) -> TextDoc: return {"text": text} +def sum_test(): + return 1 + 1 + + class TestMicroService(unittest.TestCase): def setUp(self): self.client = TestClient(opea_microservices["s1"].app) - opea_microservices["s1"].start() + opea_microservices["s1"].add_route("/v1/sum", sum_test, methods=["GET"]) + self.process1 = 
multiprocessing.Process(target=opea_microservices["s1"].start, daemon=False, name="s1") + + self.process1.start() + + self.assertRaises(RuntimeError, MicroService, name="s2", host="0.0.0.0", port=8080, endpoint="/v1/add") def tearDown(self): opea_microservices["s1"].stop() + self.process1.terminate() def test_add_route(self): response = self.client.post("/v1/add", json={"text": "Hello, "}) @@ -34,6 +45,14 @@ def test_add_route(self): response = self.client.get("/metrics") self.assertEqual(response.status_code, 200) + response = self.client.get("/v1/health_check") + self.assertEqual( + response.json(), {"Service Title": "s1", "Service Description": "OPEA Microservice Infrastructure"} + ) + + response = self.client.get("/v1/sum") + self.assertEqual(response.json(), 2) + if __name__ == "__main__": unittest.main() diff --git a/tests/cores/mega/test_multimodalqna_gateway.py b/tests/cores/mega/test_multimodalqna_gateway.py deleted file mode 100644 index c05bf57bdd..0000000000 --- a/tests/cores/mega/test_multimodalqna_gateway.py +++ /dev/null @@ -1,213 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import json -import unittest -from typing import Union - -import requests -from fastapi import Request - -from comps import ( - EmbedDoc, - EmbedMultimodalDoc, - LVMDoc, - LVMSearchedMultimodalDoc, - MultimodalDoc, - MultimodalQnAGateway, - SearchedMultimodalDoc, - ServiceOrchestrator, - TextDoc, - opea_microservices, - register_microservice, -) - - -@register_microservice(name="mm_embedding", host="0.0.0.0", port=8083, endpoint="/v1/mm_embedding") -async def mm_embedding_add(request: MultimodalDoc) -> EmbedDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - res = {} - res["text"] = text - res["embedding"] = [0.12, 0.45] - return res - - -@register_microservice(name="mm_retriever", host="0.0.0.0", port=8084, endpoint="/v1/mm_retriever") -async def mm_retriever_add(request: EmbedMultimodalDoc) -> SearchedMultimodalDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - res = {} - res["retrieved_docs"] = [] - res["initial_query"] = text - res["top_n"] = 1 - res["metadata"] = [ - { - "b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", - "transcript_for_inference": "yellow image", - } - ] - res["chat_template"] = "The caption of the image is: '{context}'. {question}" - return res - - -@register_microservice(name="lvm", host="0.0.0.0", port=8085, endpoint="/v1/lvm") -async def lvm_add(request: Union[LVMDoc, LVMSearchedMultimodalDoc]) -> TextDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - if isinstance(request, LVMSearchedMultimodalDoc): - print("request is the output of multimodal retriever") - text = req_dict["initial_query"] - text += "opea project!" 
- - else: - print("request is from user.") - text = req_dict["prompt"] - text = f"\nUSER: {text}\nASSISTANT:" - - res = {} - res["text"] = text - return res - - -class TestServiceOrchestrator(unittest.IsolatedAsyncioTestCase): - @classmethod - def setUpClass(cls): - cls.mm_embedding = opea_microservices["mm_embedding"] - cls.mm_retriever = opea_microservices["mm_retriever"] - cls.lvm = opea_microservices["lvm"] - cls.mm_embedding.start() - cls.mm_retriever.start() - cls.lvm.start() - - cls.service_builder = ServiceOrchestrator() - - cls.service_builder.add(opea_microservices["mm_embedding"]).add(opea_microservices["mm_retriever"]).add( - opea_microservices["lvm"] - ) - cls.service_builder.flow_to(cls.mm_embedding, cls.mm_retriever) - cls.service_builder.flow_to(cls.mm_retriever, cls.lvm) - - cls.follow_up_query_service_builder = ServiceOrchestrator() - cls.follow_up_query_service_builder.add(cls.lvm) - - cls.gateway = MultimodalQnAGateway(cls.service_builder, cls.follow_up_query_service_builder, port=9898) - - @classmethod - def tearDownClass(cls): - cls.mm_embedding.stop() - cls.mm_retriever.stop() - cls.lvm.stop() - cls.gateway.stop() - - async def test_service_builder_schedule(self): - result_dict, _ = await self.service_builder.schedule(initial_inputs={"text": "hello, "}) - self.assertEqual(result_dict[self.lvm.name]["text"], "hello, opea project!") - - async def test_follow_up_query_service_builder_schedule(self): - result_dict, _ = await self.follow_up_query_service_builder.schedule( - initial_inputs={"prompt": "chao, ", "image": "some image"} - ) - # print(result_dict) - self.assertEqual(result_dict[self.lvm.name]["text"], "\nUSER: chao, \nASSISTANT:") - - def test_MultimodalQnAGateway_gateway(self): - json_data = {"messages": "hello, "} - response = requests.post("http://0.0.0.0:9898/v1/multimodalqna", json=json_data) - response = response.json() - self.assertEqual(response["choices"][-1]["message"]["content"], "hello, opea project!") - - def test_follow_up_MultimodalQnAGateway_gateway(self): - json_data = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "hello, "}, - { - "type": "image_url", - "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, - }, - ], - }, - {"role": "assistant", "content": "opea project! "}, - {"role": "user", "content": "chao, "}, - ], - "max_tokens": 300, - } - response = requests.post("http://0.0.0.0:9898/v1/multimodalqna", json=json_data) - response = response.json() - self.assertEqual( - response["choices"][-1]["message"]["content"], - "\nUSER: hello, \nASSISTANT: opea project! \nUSER: chao, \n\nASSISTANT:", - ) - - def test_handle_message(self): - messages = [ - { - "role": "user", - "content": [ - {"type": "text", "text": "hello, "}, - { - "type": "image_url", - "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, - }, - ], - }, - {"role": "assistant", "content": "opea project! "}, - {"role": "user", "content": "chao, "}, - ] - prompt, images = self.gateway._handle_message(messages) - self.assertEqual(prompt, "hello, \nASSISTANT: opea project! \nUSER: chao, \n") - - def test_handle_message_with_system_prompt(self): - messages = [ - {"role": "system", "content": "System Prompt"}, - { - "role": "user", - "content": [ - {"type": "text", "text": "hello, "}, - { - "type": "image_url", - "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, - }, - ], - }, - {"role": "assistant", "content": "opea project! 
"}, - {"role": "user", "content": "chao, "}, - ] - prompt, images = self.gateway._handle_message(messages) - self.assertEqual(prompt, "System Prompt\nhello, \nASSISTANT: opea project! \nUSER: chao, \n") - - async def test_handle_request(self): - json_data = { - "messages": [ - { - "role": "user", - "content": [ - {"type": "text", "text": "hello, "}, - { - "type": "image_url", - "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}, - }, - ], - }, - {"role": "assistant", "content": "opea project! "}, - {"role": "user", "content": "chao, "}, - ], - "max_tokens": 300, - } - mock_request = Request(scope={"type": "http"}) - mock_request._json = json_data - res = await self.gateway.handle_request(mock_request) - res = json.loads(res.json()) - self.assertEqual( - res["choices"][-1]["message"]["content"], - "\nUSER: hello, \nASSISTANT: opea project! \nUSER: chao, \n\nASSISTANT:", - ) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/cores/mega/test_runtime_graph.py b/tests/cores/mega/test_runtime_graph.py index 9a140e0b12..e1449d7fc9 100644 --- a/tests/cores/mega/test_runtime_graph.py +++ b/tests/cores/mega/test_runtime_graph.py @@ -1,6 +1,7 @@ # Copyright (C) 2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 +import multiprocessing import unittest from fastapi.testclient import TestClient @@ -54,10 +55,15 @@ def setUp(self): self.s3 = opea_microservices["s3"] self.s4 = opea_microservices["s4"] - self.s1.start() - self.s2.start() - self.s3.start() - self.s4.start() + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process2 = multiprocessing.Process(target=self.s2.start, daemon=False, name="s2") + self.process3 = multiprocessing.Process(target=self.s3.start, daemon=False, name="s3") + self.process4 = multiprocessing.Process(target=self.s4.start, daemon=False, name="s4") + + self.process1.start() + self.process2.start() + self.process3.start() + self.process4.start() self.service_builder = ServiceOrchestrator() self.service_builder.add(self.s1).add(self.s2).add(self.s3).add(self.s4) @@ -70,6 +76,10 @@ def tearDown(self): self.s2.stop() self.s3.stop() self.s4.stop() + self.process1.terminate() + self.process2.terminate() + self.process3.terminate() + self.process4.terminate() async def test_add_route(self): result_dict, runtime_graph = await self.service_builder.schedule(initial_inputs={"text": "Hi!"}) diff --git a/tests/cores/mega/test_service_orchestrator.py b/tests/cores/mega/test_service_orchestrator.py index bd19d77945..bb3e15df57 100644 --- a/tests/cores/mega/test_service_orchestrator.py +++ b/tests/cores/mega/test_service_orchestrator.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest from comps import ServiceOrchestrator, TextDoc, opea_microservices, register_microservice @@ -30,8 +31,10 @@ class TestServiceOrchestrator(unittest.IsolatedAsyncioTestCase): def setUpClass(cls): cls.s1 = opea_microservices["s1"] cls.s2 = opea_microservices["s2"] - cls.s1.start() - cls.s2.start() + cls.process1 = multiprocessing.Process(target=cls.s1.start, daemon=False, name="s1") + cls.process2 = multiprocessing.Process(target=cls.s2.start, daemon=False, name="s2") + cls.process1.start() + cls.process2.start() cls.service_builder = ServiceOrchestrator() @@ -42,6 +45,8 @@ def setUpClass(cls): def tearDownClass(cls): cls.s1.stop() cls.s2.stop() + cls.process1.terminate() + cls.process2.terminate() async def test_schedule(self): result_dict, _ = await 
self.service_builder.schedule(initial_inputs={"text": "hello, "}) diff --git a/tests/cores/mega/test_service_orchestrator_protocol.py b/tests/cores/mega/test_service_orchestrator_protocol.py index 9ee2034892..db6cfead8c 100644 --- a/tests/cores/mega/test_service_orchestrator_protocol.py +++ b/tests/cores/mega/test_service_orchestrator_protocol.py @@ -1,6 +1,7 @@ # Copyright (C) 2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 +import multiprocessing import unittest from comps import ServiceOrchestrator, opea_microservices, register_microservice @@ -16,7 +17,8 @@ async def s1_add(request: ChatCompletionRequest) -> ChatCompletionRequest: class TestServiceOrchestratorProtocol(unittest.IsolatedAsyncioTestCase): def setUp(self): self.s1 = opea_microservices["s1"] - self.s1.start() + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process1.start() self.service_builder = ServiceOrchestrator() @@ -24,6 +26,7 @@ def setUp(self): def tearDown(self): self.s1.stop() + self.process1.terminate() async def test_schedule(self): input_data = ChatCompletionRequest(messages=[{"role": "user", "content": "What's up man?"}], seed=None) diff --git a/tests/cores/mega/test_service_orchestrator_streaming.py b/tests/cores/mega/test_service_orchestrator_streaming.py index d2331dab62..e2d11b1af5 100644 --- a/tests/cores/mega/test_service_orchestrator_streaming.py +++ b/tests/cores/mega/test_service_orchestrator_streaming.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import time import unittest @@ -38,8 +39,10 @@ class TestServiceOrchestratorStreaming(unittest.IsolatedAsyncioTestCase): def setUpClass(cls): cls.s0 = opea_microservices["s0"] cls.s1 = opea_microservices["s1"] - cls.s0.start() - cls.s1.start() + cls.process1 = multiprocessing.Process(target=cls.s0.start, daemon=False, name="s0") + cls.process2 = multiprocessing.Process(target=cls.s1.start, daemon=False, name="s1") + cls.process1.start() + cls.process2.start() cls.service_builder = ServiceOrchestrator() @@ -50,6 +53,8 @@ def setUpClass(cls): def tearDownClass(cls): cls.s0.stop() cls.s1.stop() + cls.process1.terminate() + cls.process2.terminate() async def test_schedule(self): result_dict, _ = await self.service_builder.schedule(initial_inputs={"text": "hello, "}) diff --git a/tests/cores/mega/test_service_orchestrator_with_gateway.py b/tests/cores/mega/test_service_orchestrator_with_gateway.py deleted file mode 100644 index 42bad2a2f6..0000000000 --- a/tests/cores/mega/test_service_orchestrator_with_gateway.py +++ /dev/null @@ -1,52 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import json -import unittest - -from comps import Gateway, ServiceOrchestrator, TextDoc, opea_microservices, register_microservice - - -@register_microservice(name="s1", host="0.0.0.0", port=8083, endpoint="/v1/add") -async def s1_add(request: TextDoc) -> TextDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - text += "opea " - return {"text": text} - - -@register_microservice(name="s2", host="0.0.0.0", port=8084, endpoint="/v1/add") -async def s2_add(request: TextDoc) -> TextDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - text += "project!" 
- return {"text": text} - - -class TestServiceOrchestrator(unittest.IsolatedAsyncioTestCase): - def setUp(self): - self.s1 = opea_microservices["s1"] - self.s2 = opea_microservices["s2"] - self.s1.start() - self.s2.start() - - self.service_builder = ServiceOrchestrator() - - self.service_builder.add(opea_microservices["s1"]).add(opea_microservices["s2"]) - self.service_builder.flow_to(self.s1, self.s2) - self.gateway = Gateway(self.service_builder, port=9898) - - def tearDown(self): - self.s1.stop() - self.s2.stop() - self.gateway.stop() - - async def test_schedule(self): - result_dict, _ = await self.service_builder.schedule(initial_inputs={"text": "hello, "}) - self.assertEqual(result_dict[self.s2.name]["text"], "hello, opea project!") - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/cores/mega/test_service_orchestrator_with_retriever_rerank_fake.py b/tests/cores/mega/test_service_orchestrator_with_retriever_rerank_fake.py index eb74c5fb19..bc0fe48231 100644 --- a/tests/cores/mega/test_service_orchestrator_with_retriever_rerank_fake.py +++ b/tests/cores/mega/test_service_orchestrator_with_retriever_rerank_fake.py @@ -2,9 +2,10 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest -from comps import EmbedDoc, Gateway, ServiceOrchestrator, TextDoc, opea_microservices, register_microservice +from comps import EmbedDoc, ServiceOrchestrator, TextDoc, opea_microservices, register_microservice from comps.cores.mega.constants import ServiceType from comps.cores.proto.docarray import RerankerParms, RetrieverParms @@ -45,8 +46,12 @@ class TestServiceOrchestratorParams(unittest.IsolatedAsyncioTestCase): def setUp(self): self.s1 = opea_microservices["s1"] self.s2 = opea_microservices["s2"] - self.s1.start() - self.s2.start() + + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process2 = multiprocessing.Process(target=self.s2.start, daemon=False, name="s2") + + self.process1.start() + self.process2.start() ServiceOrchestrator.align_inputs = align_inputs ServiceOrchestrator.align_outputs = align_outputs @@ -54,12 +59,12 @@ def setUp(self): self.service_builder.add(opea_microservices["s1"]).add(opea_microservices["s2"]) self.service_builder.flow_to(self.s1, self.s2) - self.gateway = Gateway(self.service_builder, port=9898) def tearDown(self): self.s1.stop() self.s2.stop() - self.gateway.stop() + self.process1.terminate() + self.process2.terminate() async def test_retriever_schedule(self): result_dict, _ = await self.service_builder.schedule( diff --git a/tests/cores/mega/test_service_orchestrator_with_videoqnagateway.py b/tests/cores/mega/test_service_orchestrator_with_videoqnagateway.py deleted file mode 100644 index 4905120fbb..0000000000 --- a/tests/cores/mega/test_service_orchestrator_with_videoqnagateway.py +++ /dev/null @@ -1,73 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import json -import unittest - -from fastapi.responses import StreamingResponse - -from comps import ServiceOrchestrator, ServiceType, TextDoc, VideoQnAGateway, opea_microservices, register_microservice -from comps.cores.proto.docarray import LLMParams - - -@register_microservice(name="s1", host="0.0.0.0", port=8083, endpoint="/v1/add") -async def s1_add(request: TextDoc) -> TextDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - text += "opea " - return {"text": text} - - -@register_microservice(name="s2", host="0.0.0.0", port=8084, 
endpoint="/v1/add", service_type=ServiceType.LVM) -async def s2_add(request: TextDoc) -> TextDoc: - req = request.model_dump_json() - req_dict = json.loads(req) - text = req_dict["text"] - - def streamer(text): - yield f"{text}".encode("utf-8") - for i in range(3): - yield "project!".encode("utf-8") - - return StreamingResponse(streamer(text), media_type="text/event-stream") - - -class TestServiceOrchestrator(unittest.IsolatedAsyncioTestCase): - def setUp(self): - self.s1 = opea_microservices["s1"] - self.s2 = opea_microservices["s2"] - self.s1.start() - self.s2.start() - - self.service_builder = ServiceOrchestrator() - - self.service_builder.add(opea_microservices["s1"]).add(opea_microservices["s2"]) - self.service_builder.flow_to(self.s1, self.s2) - self.gateway = VideoQnAGateway(self.service_builder, port=9898) - - def tearDown(self): - self.s1.stop() - self.s2.stop() - self.gateway.stop() - - async def test_schedule(self): - result_dict, _ = await self.service_builder.schedule( - initial_inputs={"text": "hello, "}, llm_parameters=LLMParams(streaming=True) - ) - streaming_response = result_dict[self.s2.name] - - if isinstance(streaming_response, StreamingResponse): - content = b"" - async for chunk in streaming_response.body_iterator: - content += chunk - final_text = content.decode("utf-8") - - print("Streamed content from s2: ", final_text) - - expected_result = "hello, opea project!project!project!" - self.assertEqual(final_text, expected_result) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/cores/mega/test_service_orchestrator_with_yaml.py b/tests/cores/mega/test_service_orchestrator_with_yaml.py index 3a3c6509d3..9da5a77919 100644 --- a/tests/cores/mega/test_service_orchestrator_with_yaml.py +++ b/tests/cores/mega/test_service_orchestrator_with_yaml.py @@ -2,6 +2,7 @@ # SPDX-License-Identifier: Apache-2.0 import json +import multiprocessing import unittest from comps import ServiceOrchestratorWithYaml, TextDoc, opea_microservices, register_microservice @@ -29,12 +30,17 @@ class TestYAMLOrchestrator(unittest.IsolatedAsyncioTestCase): def setUp(self) -> None: self.s1 = opea_microservices["s1"] self.s2 = opea_microservices["s2"] - self.s1.start() - self.s2.start() + + self.process1 = multiprocessing.Process(target=self.s1.start, daemon=False, name="s1") + self.process2 = multiprocessing.Process(target=self.s2.start, daemon=False, name="s2") + self.process1.start() + self.process2.start() def tearDown(self): self.s1.stop() self.s2.stop() + self.process1.terminate() + self.process2.terminate() async def test_schedule(self): service_builder = ServiceOrchestratorWithYaml(yaml_file_path="megaservice.yaml") diff --git a/tests/embeddings/test_embeddings_tei_langchain.sh b/tests/embeddings/test_embeddings_tei_langchain.sh index df2642cf12..7c58deadd3 100644 --- a/tests/embeddings/test_embeddings_tei_langchain.sh +++ b/tests/embeddings/test_embeddings_tei_langchain.sh @@ -24,7 +24,7 @@ function start_service() { model="BAAI/bge-base-en-v1.5" unset http_proxy docker run -d --name="test-comps-embedding-tei-endpoint" -p $tei_endpoint:80 -v ./data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model - export TEI_EMBEDDING_ENDPOINT="http://${ip_address}:${tei_endpoint}" + export TEI_EMBEDDING_ENDPOINT="http://${ip_address}:${tei_endpoint}/v1/embeddings" tei_service_port=5002 docker run -d --name="test-comps-embedding-tei-server" -e LOGFLAG=True -e http_proxy=$http_proxy -e https_proxy=$https_proxy -p ${tei_service_port}:6000 --ipc=host 
-e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT opea/embedding-tei:comps sleep 3m diff --git a/tests/llms/test_llms_text-generation_vllm_langchain_on_intel_hpu.sh b/tests/llms/test_llms_text-generation_vllm_langchain_on_intel_hpu.sh index 6b8e468f85..c83799128c 100644 --- a/tests/llms/test_llms_text-generation_vllm_langchain_on_intel_hpu.sh +++ b/tests/llms/test_llms_text-generation_vllm_langchain_on_intel_hpu.sh @@ -44,6 +44,7 @@ function start_service() { -p $port_number:80 \ -e HABANA_VISIBLE_DEVICES=all \ -e OMPI_MCA_btl_vader_single_copy_mechanism=none \ + -e VLLM_SKIP_WARMUP=true \ --cap-add=sys_nice \ --ipc=host \ -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \ @@ -62,7 +63,7 @@ function start_service() { # check whether vllm ray is fully ready n=0 - until [[ "$n" -ge 160 ]] || [[ $ready == true ]]; do + until [[ "$n" -ge 70 ]] || [[ $ready == true ]]; do docker logs test-comps-vllm-service > ${WORKPATH}/tests/test-comps-vllm-service.log n=$((n+1)) if grep -q throughput ${WORKPATH}/tests/test-comps-vllm-service.log; then @@ -90,9 +91,23 @@ function validate_microservice() { docker logs test-comps-vllm-microservice exit 1 fi + + result=$(http_proxy="" curl http://${ip_address}:5030/v1/chat/completions \ + -X POST \ + -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17, "stream":false}' \ + -H 'Content-Type: application/json') + if [[ $result == *"content"* ]]; then + echo "Result correct." + else + echo "Result wrong. Received was $result" + docker logs test-comps-vllm-service + docker logs test-comps-vllm-microservice + exit 1 + fi + result=$(http_proxy="" curl http://${ip_address}:5030/v1/chat/completions \ -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \ + -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": "What is Deep Learning?", "max_tokens":17, "stream":false}' \ -H 'Content-Type: application/json') if [[ $result == *"text"* ]]; then echo "Result correct." 
diff --git a/tests/test-agent-langchain-assistantsapi.log b/tests/test-agent-langchain-assistantsapi.log deleted file mode 100644 index c39a97e132..0000000000 --- a/tests/test-agent-langchain-assistantsapi.log +++ /dev/null @@ -1,27 +0,0 @@ -env_config: ['--model', 'meta-llama/Meta-Llama-3.1-70B-Instruct', '--recursion_limit', '15', '--max_new_tokens', '4096', '--temperature', '0.01'] -==========sys_args==========: - Namespace(agent_name='OPEA_Default_Agent', custom_prompt=None, db_name=None, db_path=None, debug=False, hints_file=None, llm_endpoint_url='http://localhost:8080', llm_engine='tgi', max_new_tokens=4096, model='meta-llama/Meta-Llama-3.1-70B-Instruct', port=9090, recursion_limit=15, repetition_penalty=1.03, require_human_feedback=False, return_full_text=False, role_description='LLM enhanced agent', strategy='react_langchain', streaming=True, temperature=0.01, timeout=60, tools='tools/custom_tools.yaml', top_k=10, top_p=0.95, use_hints=False, with_memory=False, with_store=False) -test args: Namespace(agent_name='OPEA_Default_Agent', assistants_api_test=True, custom_prompt=None, db_name=None, db_path=None, debug=False, endpoint_test=False, ext_port='9095', filedir='./', filename='query.csv', hints_file=None, ip_addr='10.7.4.57', llm_endpoint_url='http://localhost:8080', llm_engine='tgi', local_test=False, max_new_tokens=4096, model='meta-llama/Meta-Llama-3.1-70B-Instruct', output='output.csv', port=9090, q=0, query='What is Intel OPEA project?', recursion_limit=15, repetition_penalty=1.03, require_human_feedback=False, return_full_text=False, role_description='LLM enhanced agent', strategy='react_langchain', streaming=True, temperature=0.01, timeout=60, tools='tools/custom_tools.yaml', top_k=10, top_p=0.95, use_hints=False, ut=False, with_memory=False, with_store=False) -send request to http://10.7.4.57:9095/v1/assistants, data is {} -{'id': 'assistant_ReActAgentwithLangchain_86106da1-87e8-4424-a4e4-6b2aa131d29a', 'object': 'assistant', 'created_at': 1733791024, 'name': None, 'description': None, 'model': 'Intel/neural-chat-7b-v3-3', 'instructions': None, 'tools': None} -Created Assistant Id: assistant_ReActAgentwithLangchain_86106da1-87e8-4424-a4e4-6b2aa131d29a -send request to http://10.7.4.57:9095/v1/threads, data is {} -{'id': 'thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b', 'object': 'thread', 'created_at': 1733791024} -Created Thread Id: thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b -send request to http://10.7.4.57:9095/v1/threads/thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b/messages, data is {"role": "user", "content": "What is Intel OPEA project?"} -{'id': 'msg_c7ab1fa5-30bb-45c8-950b-abdd4b0f9e1d', 'object': 'thread.message', 'created_at': 1733791024, 'thread_id': 'thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b', 'role': 'user', 'status': None, 'content': [{'type': 'text', 'text': 'What is Intel OPEA project?'}], 'assistant_id': None, 'run_id': None, 'attachments': None} -You may cancel the running process with cmdline -curl http://10.7.4.57:9095/v1/threads/thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b/runs/cancel -X POST -H 'Content-Type: application/json' -send request to http://10.7.4.57:9095/v1/threads/thread_2fa8fae7-6cfe-4b35-b2e0-eb214a80a81b/runs, data is {"assistant_id": "assistant_ReActAgentwithLangchain_86106da1-87e8-4424-a4e4-6b2aa131d29a"} -Calling Tool: `search_knowledge_base` with input `Intel OPEA project -` - -Tool Result: ` - The Linux Foundation AI & Data announced the Open Platform for Enterprise AI (OPEA) as its latest Sandbox Project. 
- OPEA aims to accelerate secure, cost-effective generative AI (GenAI) deployments for businesses by driving interoperability across a diverse and heterogeneous ecosystem, starting with retrieval-augmented generation (RAG). - ` - -data: 'The Intel OPEA project, also known as the Open Platform for Enterprise AI, is a Linux Foundation AI & Data Sandbox Project that aims to accelerate secure and cost-effective generative AI deployments for businesses by driving interoperability across a diverse ecosystem, starting with retrieval-augmented generation (RAG).' - -data: [DONE] - diff --git a/tests/tgi-gaudi-service.log b/tests/tgi-gaudi-service.log deleted file mode 100644 index f6582b98fe..0000000000 --- a/tests/tgi-gaudi-service.log +++ /dev/null @@ -1,343 +0,0 @@ -2024-09-25T18:38:21.777789Z  INFO text_generation_launcher: Args { - model_id: "meta-llama/Meta-Llama-3.1-70B-Instruct", - revision: None, - validation_workers: 2, - sharded: Some( - true, - ), - num_shard: Some( - 4, - ), - quantize: None, - speculate: None, - dtype: Some( - BFloat16, - ), - trust_remote_code: false, - max_concurrent_requests: 128, - max_best_of: 2, - max_stop_sequences: 4, - max_top_n_tokens: 5, - max_input_tokens: Some( - 4096, - ), - max_input_length: None, - max_total_tokens: Some( - 8192, - ), - waiting_served_ratio: 0.3, - max_batch_prefill_tokens: None, - max_batch_total_tokens: None, - max_waiting_tokens: 20, - max_batch_size: None, - cuda_graphs: None, - hostname: "c84c01d5ea43", - port: 80, - shard_uds_path: "/tmp/text-generation-server", - master_addr: "localhost", - master_port: 29500, - huggingface_hub_cache: Some( - "/data", - ), - weights_cache_override: None, - disable_custom_kernels: false, - cuda_memory_fraction: 1.0, - rope_scaling: None, - rope_factor: None, - json_output: false, - otlp_endpoint: None, - cors_allow_origin: [], - watermark_gamma: None, - watermark_delta: None, - ngrok: false, - ngrok_authtoken: None, - ngrok_edge: None, - tokenizer_config_path: None, - disable_grammar_support: false, - env: false, - max_client_batch_size: 4, -} -2024-09-25T18:38:21.777937Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token" -2024-09-25T18:38:49.912211Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4146 -2024-09-25T18:38:49.912238Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32] -2024-09-25T18:38:49.912246Z  INFO text_generation_launcher: Sharding model on 4 processes -2024-09-25T18:38:49.912444Z  INFO download: text_generation_launcher: Starting download process. -2024-09-25T18:38:53.522093Z  INFO text_generation_launcher: Files are already present on the host. Skipping download. -2024-09-25T18:38:53.918677Z  INFO download: text_generation_launcher: Successfully downloaded weights. 
-2024-09-25T18:38:53.919058Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0 -2024-09-25T18:38:59.690613Z  INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16 -2024-09-25T18:38:59.690660Z  INFO text_generation_launcher: CLI SHARDED = 4 -2024-09-25T18:38:59.690737Z  INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 4 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server -2024-09-25T18:39:03.930109Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:39:13.938953Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:39:23.962697Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:39:33.977358Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:39:43.999526Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:39:54.020195Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0 -2024-09-25T18:40:03.832775Z  INFO shard-manager: text_generation_launcher: Shard ready in 69.911394193s rank=0 -2024-09-25T18:40:05.899860Z  INFO text_generation_launcher: Starting Webserver -2024-09-25T18:40:05.931919Z  INFO text_generation_router: router/src/main.rs:217: Using the Hugging Face API -2024-09-25T18:40:05.931972Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token" -2024-09-25T18:40:06.394450Z  INFO text_generation_router: router/src/main.rs:516: Serving revision 945c8663693130f8be2ee66210e062158b2a9693 of model meta-llama/Llama-3.1-70B-Instruct -2024-09-25T18:40:06.609777Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|begin_of_text|>' was expected to have ID '128000' but was given ID 'None' -2024-09-25T18:40:06.609792Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end_of_text|>' was expected to have ID '128001' but was given ID 'None' -2024-09-25T18:40:06.609794Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_0|>' was expected to have ID '128002' but was given ID 'None' -2024-09-25T18:40:06.609796Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_1|>' was expected to have ID '128003' but was given ID 'None' -2024-09-25T18:40:06.609798Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|finetune_right_pad_id|>' was expected to have ID '128004' but was given ID 'None' -2024-09-25T18:40:06.609799Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_2|>' was expected to have ID '128005' but was given ID 'None'
[... repeated tokenizers::tokenizer::serialization warnings omitted from the deleted tgi-gaudi-service.log ('<|start_header_id|>' through '<|reserved_special_token_152|>'), each reporting that the token was expected to have a fixed ID but was given ID 'None' ...]
-2024-09-25T18:40:06.610041Z  WARN tokenizers::tokenizer::serialization:
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_153|>' was expected to have ID '128161' but was given ID 'None' -2024-09-25T18:40:06.610042Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_154|>' was expected to have ID '128162' but was given ID 'None' -2024-09-25T18:40:06.610043Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_155|>' was expected to have ID '128163' but was given ID 'None' -2024-09-25T18:40:06.610045Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_156|>' was expected to have ID '128164' but was given ID 'None' -2024-09-25T18:40:06.610046Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_157|>' was expected to have ID '128165' but was given ID 'None' -2024-09-25T18:40:06.610048Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_158|>' was expected to have ID '128166' but was given ID 'None' -2024-09-25T18:40:06.610050Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_159|>' was expected to have ID '128167' but was given ID 'None' -2024-09-25T18:40:06.610051Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_160|>' was expected to have ID '128168' but was given ID 'None' -2024-09-25T18:40:06.610052Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_161|>' was expected to have ID '128169' but was given ID 'None' -2024-09-25T18:40:06.610054Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_162|>' was expected to have ID '128170' but was given ID 'None' -2024-09-25T18:40:06.610057Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_163|>' was expected to have ID '128171' but was given ID 'None' -2024-09-25T18:40:06.610059Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_164|>' was expected to have ID '128172' but was given ID 'None' -2024-09-25T18:40:06.610060Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_165|>' was expected to have ID '128173' but was given ID 'None' -2024-09-25T18:40:06.610061Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_166|>' was expected to have ID '128174' but was given ID 'None' -2024-09-25T18:40:06.610063Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_167|>' was expected to have ID '128175' but was given ID 'None' -2024-09-25T18:40:06.610064Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_168|>' was expected to have ID '128176' but was given ID 'None' -2024-09-25T18:40:06.610066Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_169|>' was expected to have ID '128177' but was given ID 'None' -2024-09-25T18:40:06.610067Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_170|>' was expected to have ID '128178' but was given ID 'None' -2024-09-25T18:40:06.610068Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_171|>' was expected to have ID '128179' but was given ID 'None' -2024-09-25T18:40:06.610070Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_172|>' was expected to have ID '128180' but was given ID 'None' -2024-09-25T18:40:06.610071Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_173|>' was expected to have ID '128181' but was given ID 'None' -2024-09-25T18:40:06.610072Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_174|>' was expected to have ID '128182' but was given ID 'None' -2024-09-25T18:40:06.610074Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_175|>' was expected to have ID '128183' but was given ID 'None' -2024-09-25T18:40:06.610075Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_176|>' was expected to have ID '128184' but was given ID 'None' -2024-09-25T18:40:06.610076Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_177|>' was expected to have ID '128185' but was given ID 'None' -2024-09-25T18:40:06.610078Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_178|>' was expected to have ID '128186' but was given ID 'None' -2024-09-25T18:40:06.610079Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_179|>' was expected to have ID '128187' but was given ID 'None' -2024-09-25T18:40:06.610081Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_180|>' was expected to have ID '128188' but was given ID 'None' -2024-09-25T18:40:06.610082Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_181|>' was expected to have ID '128189' but was given ID 'None' -2024-09-25T18:40:06.610084Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_182|>' was expected to have ID '128190' but was given ID 'None' -2024-09-25T18:40:06.610085Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_183|>' was expected to have ID '128191' but was given ID 'None' -2024-09-25T18:40:06.610086Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_184|>' was expected to have ID '128192' but was given ID 'None' -2024-09-25T18:40:06.610088Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_185|>' was expected to have ID '128193' but was given ID 'None' -2024-09-25T18:40:06.610089Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_186|>' was expected to have ID '128194' but was given ID 'None' -2024-09-25T18:40:06.610090Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_187|>' was expected to have ID '128195' but was given ID 'None' -2024-09-25T18:40:06.610092Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_188|>' was expected to have ID '128196' but was given ID 'None' -2024-09-25T18:40:06.610093Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_189|>' was expected to have ID '128197' but was given ID 'None' -2024-09-25T18:40:06.610094Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_190|>' was expected to have ID '128198' but was given ID 'None' -2024-09-25T18:40:06.610096Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_191|>' was expected to have ID '128199' but was given ID 'None' -2024-09-25T18:40:06.610098Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_192|>' was expected to have ID '128200' but was given ID 'None' -2024-09-25T18:40:06.610099Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_193|>' was expected to have ID '128201' but was given ID 'None' -2024-09-25T18:40:06.610100Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_194|>' was expected to have ID '128202' but was given ID 'None' -2024-09-25T18:40:06.610102Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_195|>' was expected to have ID '128203' but was given ID 'None' -2024-09-25T18:40:06.610103Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_196|>' was expected to have ID '128204' but was given ID 'None' -2024-09-25T18:40:06.610104Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_197|>' was expected to have ID '128205' but was given ID 'None' -2024-09-25T18:40:06.610106Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_198|>' was expected to have ID '128206' but was given ID 'None' -2024-09-25T18:40:06.610107Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_199|>' was expected to have ID '128207' but was given ID 'None' -2024-09-25T18:40:06.610108Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_200|>' was expected to have ID '128208' but was given ID 'None' -2024-09-25T18:40:06.610110Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_201|>' was expected to have ID '128209' but was given ID 'None' -2024-09-25T18:40:06.610114Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_202|>' was expected to have ID '128210' but was given ID 'None' -2024-09-25T18:40:06.610115Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_203|>' was expected to have ID '128211' but was given ID 'None' -2024-09-25T18:40:06.610117Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_204|>' was expected to have ID '128212' but was given ID 'None' -2024-09-25T18:40:06.610118Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_205|>' was expected to have ID '128213' but was given ID 'None' -2024-09-25T18:40:06.610119Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_206|>' was expected to have ID '128214' but was given ID 'None' -2024-09-25T18:40:06.610121Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_207|>' was expected to have ID '128215' but was given ID 'None' -2024-09-25T18:40:06.610122Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_208|>' was expected to have ID '128216' but was given ID 'None' -2024-09-25T18:40:06.610123Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_209|>' was expected to have ID '128217' but was given ID 'None' -2024-09-25T18:40:06.610125Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_210|>' was expected to have ID '128218' but was given ID 'None' -2024-09-25T18:40:06.610126Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_211|>' was expected to have ID '128219' but was given ID 'None' -2024-09-25T18:40:06.610127Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_212|>' was expected to have ID '128220' but was given ID 'None' -2024-09-25T18:40:06.610129Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_213|>' was expected to have ID '128221' but was given ID 'None' -2024-09-25T18:40:06.610131Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_214|>' was expected to have ID '128222' but was given ID 'None' -2024-09-25T18:40:06.610132Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_215|>' was expected to have ID '128223' but was given ID 'None' -2024-09-25T18:40:06.610133Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_216|>' was expected to have ID '128224' but was given ID 'None' -2024-09-25T18:40:06.610135Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_217|>' was expected to have ID '128225' but was given ID 'None' -2024-09-25T18:40:06.610136Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_218|>' was expected to have ID '128226' but was given ID 'None' -2024-09-25T18:40:06.610138Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_219|>' was expected to have ID '128227' but was given ID 'None' -2024-09-25T18:40:06.610139Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_220|>' was expected to have ID '128228' but was given ID 'None' -2024-09-25T18:40:06.610140Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_221|>' was expected to have ID '128229' but was given ID 'None' -2024-09-25T18:40:06.610142Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_222|>' was expected to have ID '128230' but was given ID 'None' -2024-09-25T18:40:06.610143Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_223|>' was expected to have ID '128231' but was given ID 'None' -2024-09-25T18:40:06.610145Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_224|>' was expected to have ID '128232' but was given ID 'None' -2024-09-25T18:40:06.610146Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_225|>' was expected to have ID '128233' but was given ID 'None' -2024-09-25T18:40:06.610148Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_226|>' was expected to have ID '128234' but was given ID 'None' -2024-09-25T18:40:06.610149Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_227|>' was expected to have ID '128235' but was given ID 'None' -2024-09-25T18:40:06.610150Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_228|>' was expected to have ID '128236' but was given ID 'None' -2024-09-25T18:40:06.610152Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_229|>' was expected to have ID '128237' but was given ID 'None' -2024-09-25T18:40:06.610153Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_230|>' was expected to have ID '128238' but was given ID 'None' -2024-09-25T18:40:06.610155Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_231|>' was expected to have ID '128239' but was given ID 'None' -2024-09-25T18:40:06.610156Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_232|>' was expected to have ID '128240' but was given ID 'None' -2024-09-25T18:40:06.610157Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_233|>' was expected to have ID '128241' but was given ID 'None' -2024-09-25T18:40:06.610159Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_234|>' was expected to have ID '128242' but was given ID 'None' -2024-09-25T18:40:06.610211Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_235|>' was expected to have ID '128243' but was given ID 'None' -2024-09-25T18:40:06.610214Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_236|>' was expected to have ID '128244' but was given ID 'None' -2024-09-25T18:40:06.610215Z  WARN tokenizers::tokenizer::serialization: 
/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_237|>' was expected to have ID '128245' but was given ID 'None' -2024-09-25T18:40:06.610217Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_238|>' was expected to have ID '128246' but was given ID 'None' -2024-09-25T18:40:06.610218Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_239|>' was expected to have ID '128247' but was given ID 'None' -2024-09-25T18:40:06.610220Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_240|>' was expected to have ID '128248' but was given ID 'None' -2024-09-25T18:40:06.610221Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_241|>' was expected to have ID '128249' but was given ID 'None' -2024-09-25T18:40:06.610222Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_242|>' was expected to have ID '128250' but was given ID 'None' -2024-09-25T18:40:06.610224Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_243|>' was expected to have ID '128251' but was given ID 'None' -2024-09-25T18:40:06.610225Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_244|>' was expected to have ID '128252' but was given ID 'None' -2024-09-25T18:40:06.610226Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_245|>' was expected to have ID '128253' but was given ID 'None' -2024-09-25T18:40:06.610229Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_246|>' was expected to have ID '128254' but was given ID 'None' -2024-09-25T18:40:06.610230Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|reserved_special_token_247|>' was expected to have ID '128255' but was given ID 'None' -2024-09-25T18:40:06.610938Z  INFO text_generation_router: router/src/main.rs:317: Using config Some(Llama) -2024-09-25T18:40:06.627921Z  INFO text_generation_router: router/src/main.rs:345: Warming up model -2024-09-25T18:40:06.627943Z  WARN text_generation_router: router/src/main.rs:361: Model does not support automatic max batch total tokens -2024-09-25T18:40:06.627946Z  INFO 
-2024-09-25T18:40:06.627946Z  INFO text_generation_router: router/src/main.rs:383: Setting max batch total tokens to 16000
-2024-09-25T18:40:06.627948Z  INFO text_generation_router: router/src/main.rs:384: Connected
-2024-09-25T18:40:06.627951Z  WARN text_generation_router: router/src/main.rs:398: Invalid hostname, defaulting to 0.0.0.0
diff --git a/tests/vllm-gaudi-service.log b/tests/vllm-gaudi-service.log
deleted file mode 100644
index 4230ffbe97..0000000000
--- a/tests/vllm-gaudi-service.log
+++ /dev/null
@@ -1,272 +0,0 @@
-INFO 12-09 23:52:00 api_server.py:625] vLLM API server version 0.6.3.dev1405+gdb686905
-INFO 12-09 23:52:00 api_server.py:626] args: Namespace(host='0.0.0.0', port=80, uvicorn_log_level='info', ..., model='meta-llama/Meta-Llama-3.1-70B-Instruct', ..., tensor_parallel_size=4, ..., block_size=128, ..., seed=0, ..., gpu_memory_utilization=0.9, ..., max_num_seqs=256, [... remainder of the engine argument dump omitted ...], disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
-INFO 12-09 23:52:00 __init__.py:60] No plugins found.
-INFO 12-09 23:52:00 api_server.py:178] Multiprocessing frontend to use ipc:///tmp/b03e0fed-aee3-48aa-ab57-8675627b48e2 for IPC Path.
-INFO 12-09 23:52:00 api_server.py:197] Started engine process with PID 77
-INFO 12-09 23:52:04 __init__.py:60] No plugins found.
-INFO 12-09 23:52:07 config.py:403] This model supports multiple tasks: {'embedding', 'generate'}. Defaulting to 'generate'.
-INFO 12-09 23:52:07 config.py:1042] Defaulting to use mp for distributed inference
-WARNING 12-09 23:52:07 arg_utils.py:1104] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
-WARNING 12-09 23:52:07 arg_utils.py:1160] [DEPRECATED] Block manager v1 has been removed, and setting --use-v2-block-manager to True or False has no effect on vLLM behavior. Please remove --use-v2-block-manager in your engine argument. If your use case is not supported by SelfAttnBlockSpaceManager (i.e. block manager v2), please file an issue with detailed information.
[... the same task/config messages and context-length warnings are repeated at 12-09 23:52:12 ...]
-/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
-  return isinstance(object, types.FunctionType)
-INFO 12-09 23:52:12 llm_engine.py:250] Initializing an LLM engine (v0.6.3.dev1405+gdb686905) with config: model='meta-llama/Meta-Llama-3.1-70B-Instruct', tokenizer='meta-llama/Meta-Llama-3.1-70B-Instruct', dtype=torch.bfloat16, max_seq_len=131072, tensor_parallel_size=4, pipeline_parallel_size=1, weights_load_device=hpu, device_config=hpu, [... remainder of the engine config dump omitted ...]
-WARNING 12-09 23:52:13 multiproc_gpu_executor.py:56] Reducing Torch parallelism from 80 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
-INFO 12-09 23:52:13 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
-Detected capabilities: [-cpu -gaudi +gaudi2 -gaudi3 -index_reduce]
-WARNING 12-09 23:52:16 utils.py:772] Pin memory is not supported on HPU.
-INFO 12-09 23:52:16 selector.py:159] Using HPUAttention backend.
[... the same capability, pin-memory and HPUAttention backend messages from VllmWorkerProcess pid=357 and pid=358 omitted ...]
[... per-worker startup output omitted: VllmWorkerProcess pid=357, pid=358 and pid=359 each print the same VLLM_PROMPT_*/VLLM_DECODE_* bucket environment dump (Prompt bucket config (min, step, max_warmup) bs:[1, 32, 256], seq:[128, 128, 1024]; Decode bucket config (min, step, max_warmup) bs:[1, 32, 256], block:[128, 128, 4096]) followed by "Worker ready; awaiting tasks"; pid=359 repeats the capability, pin-memory and HPUAttention messages; the HABANA PT BRIDGE CONFIGURATION / System Configuration block (PT_HPU_LAZY_MODE = 1, Num CPU Cores : 160, CPU RAM : 1056374408 KB) is printed four times; the main process prints the same bucket dump ...]
-INFO 12-09 23:52:21 shm_broadcast.py:236] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1, 2, 3], buffer=, local_subscribe_port=42205, remote_subscribe_port=None)
-INFO 12-09 23:52:23 loader.py:368] Loading weights on hpu...
-INFO 12-09 23:52:23 weight_utils.py:243] Using model weights format ['*.safetensors']
[... VllmWorkerProcess pid=357, pid=358 and pid=359 repeat the same weight-loading and weights-format messages ...]
-
-Loading safetensors checkpoint shards: 0% Completed | 0/30 [00:00