
[MODULE] Add module on inference #150

Merged 3 commits into main on Jan 17, 2025
Conversation

burtenshaw (Collaborator)

This is a module on inference techniques like pipeline and TGI.

@burtenshaw marked this pull request as draft on December 30, 2024 06:28
7_inference/inference_pipeline.md
Comment on lines +45 to +50
```python
from transformers import pipeline

# A text-generation pipeline; the model id here is illustrative.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

response = generator(
    "Write a short poem about coding:",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7
)
```

isn't 100 tokens a bit small?

7_inference/inference_pipeline.md
Here's how to integrate a pipeline into a Flask application:

```python
from flask import Flask, request, jsonify
```
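
A minimal sketch of what such an integration might look like, assuming a text-generation pipeline; the route, port, and model id below are illustrative, not part of the PR:

```python
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
# Example model; any text-generation checkpoint works here.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

@app.route("/generate", methods=["POST"])
def generate():
    # Read the prompt from the JSON body and run the pipeline.
    prompt = request.json["prompt"]
    response = generator(prompt, max_new_tokens=100)
    return jsonify({"generated_text": response[0]["generated_text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```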


why are you using Flask and not something more modern like FastAPI?

burtenshaw (Collaborator, Author)

It's just what the pipeline documentation uses.
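
For comparison, a minimal sketch of the FastAPI alternative suggested above, assuming the same kind of text-generation pipeline (model id and route are illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Example model; any text-generation checkpoint works here.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

class GenerationRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(request: GenerationRequest):
    # Run the pipeline and return the generated text as JSON.
    response = generator(request.prompt, max_new_tokens=100)
    return {"generated_text": response[0]["generated_text"]}
```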

@@ -0,0 +1,137 @@
# Text Generation Inference (TGI)

Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving Large Language Models (LLMs). It's designed to enable high-performance text generation for popular open-source LLMs, and it is used in production by Hugging Chat, an open-source interface for open-access models.
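
A typical way to launch a TGI server locally is via its Docker image, assuming a CUDA-capable GPU; the model id here is an example:

```bash
# Launch a TGI container serving an example model on port 8080.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id HuggingFaceTB/SmolLM2-1.7B-Instruct
```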


I think it would be fair to mention other providers like vLLM and Ollama too, as being the same but different?

7_inference/inference_pipeline.md

LLM inference can be categorized into two main approaches: simple pipeline-based inference for development and testing, and optimized serving solutions for production deployments. We'll cover both approaches, starting with the simpler pipeline approach and moving to production-ready solutions.

## Contents


should we consider showing and mentioning inference options again like adapters and structured generation?

@souzatharsis (Contributor), Jan 5, 2025


I agree. It would be very helpful to include controlled inference generation, which is particularly crucial when integrating LLMs with downstream systems (e.g. grammars with llama.cpp, FSMs with outlines, or logit-processing techniques in general).

Here's an example of how to control inference using a Smol model and the LogitsProcessor class from the Transformers library:

https://www.tamingllms.com/notebooks/structured_output.html#logit-post-processing
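
For reference, a minimal sketch of logit post-processing with the Transformers LogitsProcessor class; the model id and the banned word are illustrative:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class BanTokensProcessor(LogitsProcessor):
    """Set the scores of banned token ids to -inf so they are never sampled."""

    def __init__(self, banned_token_ids):
        self.banned_token_ids = banned_token_ids

    def __call__(self, input_ids, scores):
        scores[:, self.banned_token_ids] = float("-inf")
        return scores

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ban the token ids of an example word so it never appears in the output.
banned_ids = tokenizer("bug", add_special_tokens=False).input_ids

inputs = tokenizer("Write a short poem about coding:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    logits_processor=LogitsProcessorList([BanTokensProcessor(banned_ids)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```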

burtenshaw (Collaborator, Author)

This is a nice suggestion, but I would add it in a subsequent PR and on a separate page.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="http://localhost:8080/v1/",
)
```
Contributor

Perhaps it would be helpful to use working API URLs so that the sample code runs. It would also be worth explaining that on Hugging Face one can run inference via the Serverless Inference API, dedicated endpoints, etc.
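
A sketch of what the serverless variant could look like, where a model id replaces the local base_url; the model id is an example:

```python
from huggingface_hub import InferenceClient

# Serverless Inference API: pass a model id instead of a local base_url.
client = InferenceClient(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

output = client.text_generation(
    "Write a short poem about coding:",
    max_new_tokens=100,
)
print(output)
```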

@burtenshaw marked this pull request as ready for review on January 17, 2025 20:04
@burtenshaw merged commit 91e5cf8 into main on Jan 17, 2025
0 of 2 checks passed