# Juanmaturino/edu 23 need to add example of summarizer component for #1022

Status: Open

jjmaturino wants to merge 7 commits into opea-project:main from jjmaturino:juanmaturino/edu-23-need-to-add-example-of-summarizer-component-for.
## Commits (7)
- 71814a8 (jjmaturino): feature: initial commit to pr for adding summarization component for …
- 044566f (jjmaturino): test: added test for ci/cd GenAI process. Verifies that PredictionGua…
- 2d02891 (jjmaturino): doc: updated README to reflect changes made for docsum component. Det…
- e7055e3 (jjmaturino): feature: added Prediction Guard DocSum GenAI component example
- d8628aa (jjmaturino): fix: updated test time
- 17e1777 (pre-commit-ci[bot]): [pre-commit.ci] auto fixes from pre-commit.com hooks
- 047ac62 (jjmaturino): fix: updated curl json parameters
## Files changed
### comps/llms/summarization/predictionguard/Dockerfile (new file)

```dockerfile
# Copyright (C) 2024 Prediction Guard, Inc.
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

COPY comps /home/comps

RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/comps/llms/summarization/predictionguard/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home

WORKDIR /home/comps/llms/summarization/predictionguard

ENTRYPOINT ["bash", "entrypoint.sh"]
```
### comps/llms/summarization/predictionguard/README.md (new file)

````markdown
# Prediction Guard Introduction

[Prediction Guard](https://docs.predictionguard.com) allows you to utilize hosted open access LLMs, LVMs, and embedding functionality with seamlessly integrated safeguards. In addition to providing scalable access to open models, Prediction Guard allows you to configure factual consistency checks, toxicity filters, PII filters, and prompt injection blocking. Join the [Prediction Guard Discord channel](https://discord.gg/TFHgnhAFKd) and request an API key to get started.

# Getting Started

## 🚀 1. Start the Microservice with Docker 🐳

### 1.1 Set up the Prediction Guard API Key

You can get your API key from the [Prediction Guard Discord channel](https://discord.gg/TFHgnhAFKd).

```bash
export PREDICTIONGUARD_API_KEY=<your_api_key>
```

### 1.2 Build the Docker Image

```bash
docker build -t opea/llm-docsum-predictionguard:latest -f comps/llms/summarization/predictionguard/Dockerfile .
```

### 1.3 Run the Prediction Guard Microservice

```bash
docker run -d -p 9000:9000 -e PREDICTIONGUARD_API_KEY=$PREDICTIONGUARD_API_KEY --name llm-docsum-predictionguard opea/llm-docsum-predictionguard:latest
```

## 🚀 2. Consume the Prediction Guard Microservice

See the [Prediction Guard docs](https://docs.predictionguard.com/) for available model options.

### Without streaming
```bash
curl -X POST http://localhost:9000/v1/chat/docsum \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Hermes-2-Pro-Llama-3-8B",
    "query": "Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to analyze various levels of abstract data representations. It enables computers to identify patterns and make decisions with minimal human intervention by learning from large amounts of data.",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "streaming": false
  }'
```
### With streaming

```bash
curl -N -X POST http://localhost:9000/v1/chat/docsum \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Hermes-2-Pro-Llama-3-8B",
    "query": "Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to analyze various levels of abstract data representations. It enables computers to identify patterns and make decisions with minimal human intervention by learning from large amounts of data.",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "streaming": true
  }'
```
````
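The same non-streaming request can be issued programmatically. The following is a minimal Python sketch, not part of the PR itself: it assumes the container from step 1.3 is listening on localhost:9000, that the `requests` package is installed, and that the response JSON exposes the summary in a `text` field (the CI test below also keys on `"text"`).

```python
# Minimal sketch (not part of this PR): call the DocSum microservice from Python.
# Assumes the container from step 1.3 is running on localhost:9000 and that
# `requests` is installed; the payload mirrors the curl example above.
import requests

payload = {
    "model": "Hermes-2-Pro-Llama-3-8B",
    "query": (
        "Deep learning is a subset of machine learning that utilizes neural networks "
        "with multiple layers to analyze various levels of abstract data representations."
    ),
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "streaming": False,
}

resp = requests.post("http://localhost:9000/v1/chat/docsum", json=payload, timeout=120)
resp.raise_for_status()

# The service returns a GeneratedDoc serialized as JSON; the generated summary
# is expected in its "text" field.
print(resp.json().get("text"))
```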
### comps/llms/summarization/predictionguard/__init__.py (new file, license header only)

```python
# Copyright (C) 2024 Prediction Guard, Inc.
# SPDX-License-Identifier: Apache-2.0
```
### comps/llms/summarization/predictionguard/docker_compose_llm.yaml (new file, 20 additions)

```yaml
# Copyright (C) 2024 Prediction Guard, Inc
# SPDX-License-Identifier: Apache-2.0

services:
  llm:
    image: opea/llm-docsum-predictionguard:latest
    container_name: llm-docsum-predictionguard
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      PREDICTIONGUARD_API_KEY: ${PREDICTIONGUARD_API_KEY}
    restart: unless-stopped

networks:
  default:
    driver: bridge
```
### comps/llms/summarization/predictionguard/entrypoint.sh (new file)

```bash
#!/usr/bin/env bash

# Copyright (C) 2024 Prediction Guard, Inc.
# SPDX-License-Identifier: Apache-2.0

#pip --no-cache-dir install -r requirements-runtime.txt

python llm_predictionguard.py
```
### comps/llms/summarization/predictionguard/llm_predictionguard.py (new file, 87 additions)

```python
# Copyright (C) 2024 Prediction Guard, Inc.
# SPDX-License-Identifier: Apache-2.0

import json
import time

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from predictionguard import PredictionGuard

from comps import (
    GeneratedDoc,
    LLMParamsDoc,
    ServiceType,
    opea_microservices,
    register_microservice,
    register_statistics,
    statistics_dict,
)

client = PredictionGuard()
app = FastAPI()


@register_microservice(
    name="opea_service@llm_predictionguard_docsum",
    service_type=ServiceType.LLM,
    endpoint="/v1/chat/docsum",
    host="0.0.0.0",
    port=9000,
)
@register_statistics(names=["opea_service@llm_predictionguard_docsum"])
def llm_generate(input: LLMParamsDoc):
    start = time.time()

    messages = [
        {
            "role": "system",
            "content": "You are a summarization assistant. Your goal is to provide a very concise, summarized response to the user query.",
        },
        {"role": "user", "content": input.query},
    ]

    if input.streaming:

        async def stream_generator():
            chat_response = ""
            for res in client.chat.completions.create(
                model=input.model,
                messages=messages,
                max_tokens=input.max_tokens,
                temperature=input.temperature,
                top_p=input.top_p,
                top_k=input.top_k,
                stream=True,
            ):
                if "choices" in res["data"] and "delta" in res["data"]["choices"][0]:
                    delta_content = res["data"]["choices"][0]["delta"]["content"]
                    chat_response += delta_content
                    yield f"data: {delta_content}\n\n"
                else:
                    yield "data: [DONE]\n\n"

        statistics_dict["opea_service@llm_predictionguard_docsum"].append_latency(time.time() - start, None)
        return StreamingResponse(stream_generator(), media_type="text/event-stream")
    else:
        try:
            response = client.chat.completions.create(
                model=input.model,
                messages=messages,
                max_tokens=input.max_tokens,
                temperature=input.temperature,
                top_p=input.top_p,
                top_k=input.top_k,
            )

            print(json.dumps(response, sort_keys=True, indent=4, separators=(",", ": ")))

            response_text = response["choices"][0]["message"]["content"]
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

        statistics_dict["opea_service@llm_predictionguard_docsum"].append_latency(time.time() - start, None)
        return GeneratedDoc(text=response_text, prompt=input.query)


if __name__ == "__main__":
    opea_microservices["opea_service@llm_predictionguard_docsum"].start()
```
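The streaming branch above emits server-sent events, one `data:` line per token delta, terminated by a `data: [DONE]` event. A minimal consumer sketch (not part of the PR; the host, port, and model name are taken from the README examples) could look like this:

```python
# Sketch of a client that consumes the SSE stream produced by stream_generator above.
# Assumes the service is reachable at localhost:9000 and `requests` is installed.
import requests

payload = {
    "model": "Hermes-2-Pro-Llama-3-8B",
    "query": "Summarize: deep learning uses multi-layer neural networks to learn data representations.",
    "max_tokens": 100,
    "streaming": True,
}

with requests.post("http://localhost:9000/v1/chat/docsum", json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separator lines between SSE events
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break  # the service signals completion with a [DONE] event
        print(chunk, end="", flush=True)
```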
### comps/llms/summarization/predictionguard/requirements.txt (new file)

```text
aiohttp
docarray
fastapi
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
Pillow
predictionguard
prometheus-fastapi-instrumentator
shortuuid
transformers
uvicorn
```
### CI test script for the DocSum component (new file, 68 additions)

```bash
#!/bin/bash
# Copyright (C) 2024 Prediction Guard, Inc.
# SPDX-License-Identifier: Apache-2.0

set -x  # Print commands and their arguments as they are executed

WORKPATH=$(dirname "$PWD")
ip_address=$(hostname -I | awk '{print $1}')  # Adjust to a more reliable command
if [ -z "$ip_address" ]; then
    ip_address="localhost"  # Default to localhost if IP address is empty
fi

function build_docker_images() {
    cd $WORKPATH
    echo $(pwd)
    docker build --no-cache -t opea/llm-pg:comps -f comps/llms/summarization/predictionguard/Dockerfile .
    if [ $? -ne 0 ]; then
        echo "opea/llm-pg build failed"
        exit 1
    else
        echo "opea/llm-pg built successfully"
    fi
}

function start_service() {
    llm_service_port=9000
    unset http_proxy
    docker run -d --name=test-comps-llm-pg-server \
        -e http_proxy= -e https_proxy= \
        -e PREDICTIONGUARD_API_KEY=${PREDICTIONGUARD_API_KEY} \
        -p 9000:9000 --ipc=host opea/llm-pg:comps
    sleep 5  # Sleep for 5 seconds to allow the service to start
}

function validate_microservice() {
    llm_service_port=9000
    result=$(http_proxy="" curl http://${ip_address}:${llm_service_port}/v1/chat/docsum \
        -X POST \
        -d '{"model": "Hermes-3-Llama-3.1-8B", "query": "Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to analyze various levels of abstract data representations. It enables computers to identify patterns and make decisions with minimal human intervention by learning from large amounts of data.", "streaming": false, "max_tokens": 100, "temperature": 0.7, "top_p": 1.0, "top_k": 50}' \
        -H 'Content-Type: application/json')

    if [[ $result == *"text"* ]]; then
        echo "Service response is correct."
    else
        echo "Result wrong. Received was $result"
        docker logs test-comps-llm-pg-server
        exit 1
    fi
}

function stop_docker() {
    cid=$(docker ps -aq --filter "name=test-comps-llm-pg-*")
    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
}

function main() {
    stop_docker

    build_docker_images
    start_service

    validate_microservice

    stop_docker
    echo y | docker system prune
}

main
```
## Conversation

ftian1: @lvliang-intel please review this PR. From my view, why not merge this with the guardrail component we already have? It looks like only the prompt and the model variant are different.
jjmaturino: @ftian1 Hey Tian, hope you're having a good day.

Adding this summarization component is desirable because it relies on the Prediction Guard platform. I want to add a Prediction Guard DocSum permutation to the GenAI Examples repo at some point, and the document summarization component is a necessary building block for that MegaService.

You are correct that this component only differs from the predictionguard/text-gen component in its prompt and model. Having two separate components lets us separate concerns: if we later want to change how DocSum is handled by Prediction Guard, we can do so without modifying text-gen.
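For context on that last point, the functional difference between the two components really is just the system prompt (and default model). A hypothetical sketch, not proposed in this PR and with an invented helper name, of what a shared implementation would look like:

```python
# Hypothetical sketch (not part of this PR): a single helper parameterized by the
# system prompt could serve both the text-gen and DocSum flavors of the component.
from predictionguard import PredictionGuard

client = PredictionGuard()  # reads PREDICTIONGUARD_API_KEY from the environment

DOCSUM_SYSTEM_PROMPT = (
    "You are a summarization assistant. Your goal is to provide a very concise, "
    "summarized response to the user query."
)


def pg_chat(query, model, system_prompt=None, **params):
    """Call Prediction Guard chat completions, optionally prepending a system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": query})
    response = client.chat.completions.create(model=model, messages=messages, **params)
    return response["choices"][0]["message"]["content"]


# text-gen style call (no system prompt):
#   pg_chat("Tell me a joke.", model="Hermes-2-Pro-Llama-3-8B", max_tokens=100)
# DocSum style call (summarization system prompt):
#   pg_chat(long_document, model="Hermes-2-Pro-Llama-3-8B",
#           system_prompt=DOCSUM_SYSTEM_PROMPT, max_tokens=100)
```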