JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE

Key Principles:
- Choose the simplest solution: use a single API when possible
- Answer "can't do" for tasks outside these APIs' scope
- Prefer built-in features over custom implementations
- Leverage multilingual/multimodal capabilities when needed

Core APIs and Use Cases:

1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
   Purpose: Convert text/images to fixed-length vectors
   Best for: Semantic search, similarity matching, clustering
   Example Request:
   {
     "model": "jina-embeddings-v3",
     "task": "text-matching",
     "input": ["search query", "document text"]
   }
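A minimal Python sketch of the request above, using only the standard library. The helper names (`build_embeddings_request`, `extract_vectors`, `embed`) are hypothetical, and the response shape (an OpenAI-style `data` list whose items carry `index` and `embedding`) is an assumption to check against the official API reference.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"


def build_embeddings_request(texts, api_key, task="text-matching"):
    """Build (but do not send) a POST request for the Embeddings API."""
    payload = {"model": "jina-embeddings-v3", "task": task, "input": list(texts)}
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_vectors(response_json):
    """Return one vector per input, in input order.

    Assumes an OpenAI-style body: {"data": [{"index": 0, "embedding": [...]}, ...]}.
    """
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, api_key, timeout=30):
    """Send the request and return the vectors (performs a network call)."""
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_vectors(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to validate and log before any network call.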

2. RERANKER API (https://api.jina.ai/v1/rerank)
   Purpose: Improve the relevance ranking of search results
   Best for: Refining search results, improving RAG accuracy
   Example Request:
   {
     "model": "jina-reranker-v2-base-multilingual",
     "query": "search query",
     "documents": ["candidate1", "candidate2"]
   }
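A sketch of the rerank call in Python (standard library only). The optional `top_n` field, the response shape (`results` items carrying `index` and `relevance_score`), and the helper names are assumptions; verify them against the Reranker API reference.

```python
import json
import urllib.request

RERANK_URL = "https://api.jina.ai/v1/rerank"


def build_rerank_request(query, documents, api_key, top_n=None):
    """Build (but do not send) a POST request for the Reranker API."""
    payload = {
        "model": "jina-reranker-v2-base-multilingual",
        "query": query,
        "documents": list(documents),
    }
    if top_n is not None:
        payload["top_n"] = top_n  # assumed optional cap on returned results
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def reorder_documents(documents, response_json):
    """Return the submitted documents sorted by relevance_score, best first.

    Assumes each item in "results" has an "index" into the submitted list
    and a "relevance_score".
    """
    results = sorted(
        response_json["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [documents[r["index"]] for r in results]
```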

3. READER API (https://r.jina.ai)
   Purpose: Convert the content at a URL into LLM-friendly text
   Best for: Web content extraction, RAG input preparation
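The Reader API is called by appending the page URL to the endpoint rather than sending a JSON body. A minimal Python sketch under that usage; the helper names are hypothetical and the response is simply decoded as UTF-8 text.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url, api_key):
    """Build (but do not send) a GET request; the target URL is appended to the endpoint."""
    return urllib.request.Request(
        READER_PREFIX + target_url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def read_url(target_url, api_key, timeout=30):
    """Return the page as LLM-friendly text (performs a network call)."""
    request = build_reader_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```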

4. SEARCH API (https://s.jina.ai)
   Purpose: Web search with LLM-friendly results
   Best for: Knowledge retrieval, RAG source gathering
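A sketch of a Search API call in Python, assuming the query is percent-encoded and appended to the endpoint path in the same way the Reader API takes a URL. Helper names are hypothetical; the result is returned as raw text.

```python
import urllib.parse
import urllib.request

SEARCH_PREFIX = "https://s.jina.ai/"


def build_search_request(query, api_key):
    """Build (but do not send) a GET request; the query is percent-encoded into the path."""
    url = SEARCH_PREFIX + urllib.parse.quote(query, safe="")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def search(query, api_key, timeout=30):
    """Return the search results as LLM-friendly text (performs a network call)."""
    request = build_search_request(query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Encoding with `safe=""` also escapes `/` and `?`, so a query such as "what is RAG?" cannot be misread as extra path segments or a query string.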

5. GROUNDING API (https://g.jina.ai)
   Purpose: Check whether a statement is supported by web knowledge
   Best for: Fact verification, claim validation
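A sketch of a grounding request in Python. This guide gives no request format for the endpoint, so the JSON body `{"statement": ...}` used here is an assumption to confirm in the Grounding API documentation; helper names are hypothetical.

```python
import json
import urllib.request

GROUNDING_URL = "https://g.jina.ai"


def build_grounding_request(statement, api_key):
    """Build (but do not send) a POST request asking the API to check one statement.

    Assumption: the endpoint accepts a JSON body of the form {"statement": "..."}.
    """
    if not statement.strip():  # local validation: grounding needs a concrete claim
        raise ValueError("statement must be a non-empty claim to verify")
    return urllib.request.Request(
        GROUNDING_URL,
        data=json.dumps({"statement": statement}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ground(statement, api_key, timeout=60):
    """Send the request and return the parsed JSON verdict (performs a network call)."""
    request = build_grounding_request(statement, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```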

6. CLASSIFIER API (https://api.jina.ai/v1/classify)
   Purpose: Zero-shot/few-shot classification
   Best for: Content categorization without training
   Example Request:
   {
     "model": "jina-embeddings-v3",
     "input": [{"text": "content"}],
     "labels": ["category1", "category2"]
   }
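A Python sketch that builds the zero-shot request above and reads back one label per input. The `prediction` field in the response and the helper names are assumptions; check the Classifier API reference for the exact response shape.

```python
import json
import urllib.request

CLASSIFY_URL = "https://api.jina.ai/v1/classify"


def build_classify_request(texts, labels, api_key):
    """Build (but do not send) a zero-shot classification request."""
    labels = list(labels)
    if not labels:  # local input validation before spending an API call
        raise ValueError("at least one label is required")
    payload = {
        "model": "jina-embeddings-v3",
        "input": [{"text": text} for text in texts],
        "labels": labels,
    }
    return urllib.request.Request(
        CLASSIFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def predicted_labels(response_json):
    """Return the top label for each input.

    Assumes each item in "data" carries a "prediction" field.
    """
    return [item["prediction"] for item in response_json["data"]]
```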

7. SEGMENTER API (https://segment.jina.ai)
   Purpose: Tokenize and segment long text
   Best for: Breaking documents into manageable chunks
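A request sketch for the Segmenter API in Python. The body field names (`content`, `return_chunks`, `max_chunk_length`) and the default chunk length of 1000 are assumptions not taken from this guide; confirm them in the Segmenter documentation.

```python
import json
import urllib.request

SEGMENTER_URL = "https://segment.jina.ai/"


def build_segment_request(content, api_key, max_chunk_length=1000):
    """Build (but do not send) a request that splits long text into chunks.

    Assumed body fields: "content", "return_chunks", "max_chunk_length".
    """
    payload = {
        "content": content,
        "return_chunks": True,
        "max_chunk_length": max_chunk_length,
    }
    return urllib.request.Request(
        SEGMENTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```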

RECOMMENDED PATTERNS:

1. Basic Search Implementation:
   - For simple queries: Use the Search API directly
   - For better relevancy: Use the Search API first, then pass its results through the Reranker API
   - Use embedding comparison only when semantic matching is crucial
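When embedding comparison is warranted, candidates can be ranked by cosine similarity between a query vector and each document vector (for example, vectors returned by the Embeddings API). A dependency-free Python sketch; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With a query vector `[1.0, 0.0]` and candidates `[[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]`, the ranking is index 2, then 1, then 0.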

2. RAG (Retrieval-Augmented Generation) Pipeline:
   - Basic flow: Reader API -> Segmenter API -> Embeddings API
   - Enhanced flow: Add the Reranker API as the final step
   - When to use each step:
     * Reader: When the source is a URL
     * Segmenter: When the content is long
     * Embeddings: For semantic matching
     * Reranker: When result ordering is critical
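The "when to use each step" rules can be expressed as a small planning function. This is an illustrative sketch: the function name and the 2,000-character threshold for "long" are hypothetical, and URL sources are always segmented because their length is unknown until the Reader API returns, which is a design choice of this sketch rather than a rule from the guide.

```python
from urllib.parse import urlparse

LONG_TEXT_CHARS = 2000  # hypothetical cut-off for "long" content; tune per use case


def plan_rag_steps(source, ordering_critical=False, long_text_chars=LONG_TEXT_CHARS):
    """Return the ordered API steps for one source (a URL or raw text)."""
    steps = []
    is_url = urlparse(source).scheme in ("http", "https")
    if is_url:
        steps.append("reader")  # source is a URL
    if is_url or len(source) > long_text_chars:
        steps.append("segmenter")  # page length unknown up front, or text is long
    steps.append("embeddings")  # semantic matching is always needed
    if ordering_critical:
        steps.append("reranker")  # final step when result order matters
    return steps
```

Short raw text skips straight to embeddings, which also avoids the "segment already short text" anti-pattern.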

3. Fact Checking Implementation:
   - Simple verification: Use the Grounding API directly
   - Enhanced verification: Use the Search API first, then the Grounding API
   - Use the X-Site header to restrict verification to trusted sources
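A sketch of the two verification modes as an ordered list of requests. Assumptions: the grounding body is `{"statement": ...}`, the Search API takes the percent-encoded query in its path, and `X-Site` carries a trusted domain on both requests as the guide suggests; confirm which endpoints honor `X-Site`. How search results inform the final judgment is left to the application, since the guide does not specify it.

```python
import json
import urllib.parse
import urllib.request

SEARCH_PREFIX = "https://s.jina.ai/"
GROUNDING_URL = "https://g.jina.ai"


def build_fact_check_requests(statement, api_key, thorough=False, trusted_site=None):
    """Return the requests for one claim, in the order they should be sent.

    thorough=False: Grounding API only (simple verification).
    thorough=True:  Search API first, then Grounding API (enhanced verification).
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    if trusted_site:
        headers["X-Site"] = trusted_site  # assumption: restricts sources to this site
    planned = []
    if thorough:
        search_url = SEARCH_PREFIX + urllib.parse.quote(statement, safe="")
        planned.append(urllib.request.Request(search_url, headers=dict(headers), method="GET"))
    planned.append(
        urllib.request.Request(
            GROUNDING_URL,
            data=json.dumps({"statement": statement}).encode("utf-8"),
            headers={**headers, "Content-Type": "application/json"},
            method="POST",
        )
    )
    return planned
```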

4. Classification Tasks:
   - Single-language: Use the Classifier API directly
   - Multilingual: Use the multilingual jina-embeddings-v3 model
   - Multiple categories: Provide semantically descriptive labels
   - Multiple related tasks: Consider comparing embeddings for similarity

5. Content Processing:
   - URL content: Reader API only
   - Long text: Segmenter API only
   - Long content at a URL: Reader API -> Segmenter API

INTEGRATION GUIDELINES:
- Always handle API errors and rate limits
- Implement retries for network failures
- Cache results when appropriate
- Validate inputs before API calls
- Handle multilingual content correctly (e.g. send and decode text as UTF-8)
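Error handling, rate limits, and retries can share one wrapper. A minimal sketch with exponential backoff and jitter: it retries HTTP 429, common 5xx codes, and network errors, honors a numeric `Retry-After` header when present, and re-raises everything else. The retryable status set, the delays, and the function name are assumptions to tune.

```python
import random
import time
import urllib.error

RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed retryable; adjust as needed


def call_with_retries(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry transient failures with exponential backoff.

    send: zero-argument callable, e.g. lambda: urllib.request.urlopen(req, timeout=30)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:  # subclass of URLError, so handle it first
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-suggested wait in seconds
            else:
                delay = base_delay * 2 ** (attempt - 1)
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, 0.1 * delay))  # jitter spreads out retries
```

Passing `sleep` as a parameter keeps the wrapper testable; an HTTP-date `Retry-After` value is not parsed here and falls back to exponential backoff.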

ANTI-PATTERNS TO AVOID:
1. Don't chain APIs unnecessarily
2. Don't segment text that is already short
3. Don't call the reranker without both a query and candidate documents
4. Don't use grounding for open-ended questions; it verifies specific statements

WHAT THESE APIs CAN'T DO:
1. Generate new text or images
2. Modify or edit content
3. Execute code or perform calculations
4. Process data in real time
5. Store or cache results permanently

All APIs require:
- Authorization: Bearer token
- Error handling
- Rate limit consideration
- Response validation