Update index.txt
hanxiao authored Oct 28, 2024
1 parent 6a82263 commit 48f8723
Showing 1 changed file with 130 additions and 1 deletion.
131 changes: 130 additions & 1 deletion index.txt
@@ -1 +1,130 @@
hello
JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE

Key Principles:
- Choose the simplest solution: use a single API when possible
- Answer "can't do" for tasks outside these APIs' scope
- Prefer built-in features over custom implementations
- Leverage multilingual/multimodal capabilities when needed

Core APIs and Use Cases:

1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
Purpose: Convert text/images to fixed-length vectors
Best for: Semantic search, similarity matching, clustering
Example Request:
{
"model": "jina-embeddings-v3",
"task": "text-matching",
"input": ["search query", "document text"]
}
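
A minimal sketch of how the request above might be sent from Python with only the standard library. The endpoint, model, and task come from the example; the helper names and the JINA_API_KEY environment variable are assumptions made for illustration, and the response format should be checked against the API documentation before relying on it.

```python
import json
import os
import urllib.request

EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"


def build_embeddings_request(texts, api_key, task="text-matching"):
    """Build (but do not send) a POST request for the Embeddings API."""
    payload = {
        "model": "jina-embeddings-v3",
        "task": task,
        "input": list(texts),
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts, api_key=None, timeout=30):
    """Send the request and return the parsed JSON body (network call)."""
    # JINA_API_KEY is an assumed variable name, not taken from the guide.
    api_key = api_key or os.environ["JINA_API_KEY"]
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without touching the network.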

2. RERANKER API (https://api.jina.ai/v1/rerank)
Purpose: Improve search result relevancy
Best for: Refining search results, RAG accuracy
Example Request:
{
"model": "jina-reranker-v2-base-multilingual",
"query": "search query",
"documents": ["candidate1", "candidate2"]
}
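
The sketch below builds the payload shown above and reorders the candidates from a reranker response. The response shape used here (a "results" list whose items carry "index" and "relevance_score") is an assumption for illustration and should be verified against the actual API response; the function names are hypothetical.

```python
RERANK_MODEL = "jina-reranker-v2-base-multilingual"


def build_rerank_payload(query, documents):
    """Return the JSON body for a rerank request."""
    if not query or not documents:
        raise ValueError("reranking needs a query and at least one document")
    return {"model": RERANK_MODEL, "query": query, "documents": list(documents)}


def order_by_relevance(documents, response):
    """Pair each document with its score, highest score first.

    Assumes response["results"] items have "index" (position in the
    submitted documents list) and "relevance_score".
    """
    results = sorted(
        response["results"],
        key=lambda item: item["relevance_score"],
        reverse=True,
    )
    return [(documents[item["index"]], item["relevance_score"]) for item in results]
```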

3. READER API (https://r.jina.ai)
Purpose: Convert URLs to LLM-friendly text
Best for: Web content extraction, RAG input preparation
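
A small sketch of calling the Reader API from Python. It assumes the target URL is appended directly to the endpoint, which this guide does not spell out, so treat the URL layout and the helper names as assumptions to confirm against the API documentation.

```python
import urllib.request

READER_BASE = "https://r.jina.ai/"


def build_reader_request(target_url, api_key):
    """Build a GET request asking the Reader API to convert target_url."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an absolute http(s) URL")
    # Assumption: the page to read is appended to the Reader endpoint.
    return urllib.request.Request(
        READER_BASE + target_url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def read_url(target_url, api_key, timeout=30):
    """Fetch the LLM-friendly text for target_url (network call)."""
    request = build_reader_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```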

4. SEARCH API (https://s.jina.ai)
Purpose: Web search with LLM-friendly results
Best for: Knowledge retrieval, RAG source gathering
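
A sketch of building a Search API URL, assuming the query is passed as a percent-encoded path segment after the endpoint. That layout is an assumption not stated in this guide, and search_url is a hypothetical helper name.

```python
from urllib.parse import quote

SEARCH_BASE = "https://s.jina.ai/"


def search_url(query):
    """Return the Search API URL for a free-text query."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    # safe="" also encodes "/" and "?" so the query stays one path segment.
    return SEARCH_BASE + quote(query, safe="")
```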

5. GROUNDING API (https://g.jina.ai)
Purpose: Ground statements with web knowledge
Best for: Fact verification, claim validation

6. CLASSIFIER API (https://api.jina.ai/v1/classify)
Purpose: Zero-shot/few-shot classification
Best for: Content categorization without training
Example Request:
{
"model": "jina-embeddings-v3",
"input": [{"text": "content"}],
"labels": ["category1", "category2"]
}
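
Because classification depends on well-formed inputs and labels, it can help to validate them before building the request. The sketch below produces the payload shape shown above; the function name and the specific validation rules (non-empty texts, at least two distinct labels) are illustrative assumptions rather than documented API requirements.

```python
def build_classify_payload(texts, labels, model="jina-embeddings-v3"):
    """Validate inputs and return the JSON body for a classify request."""
    clean_texts = [t.strip() for t in texts if t and t.strip()]
    # dict.fromkeys removes duplicate labels while keeping their order.
    clean_labels = list(dict.fromkeys(l.strip() for l in labels if l and l.strip()))
    if not clean_texts:
        raise ValueError("provide at least one non-empty text")
    if len(clean_labels) < 2:
        raise ValueError("provide at least two distinct labels")
    return {
        "model": model,
        "input": [{"text": t} for t in clean_texts],
        "labels": clean_labels,
    }
```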

7. SEGMENTER API (https://segment.jina.ai)
Purpose: Tokenize and segment long text
Best for: Breaking down documents into manageable chunks

RECOMMENDED PATTERNS:

1. Basic Search:
- For simple search: use the Search API alone
- If better ranking is needed: Search API -> Reranker API

2. RAG Implementation:
- Basic: Reader API -> Segmenter API -> Embeddings API
- Advanced: Add Reranker API for better result ranking

3. Fact Checking:
- Simple: Grounding API alone
- Thorough: Search API -> Grounding API

4. Content Classification:
- Single task: Classifier API (zero-shot)
- Multiple related tasks: Consider embeddings for similarity
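
The "Basic Search" pattern above can be sketched as a function that only chains the Reranker when it is asked for and when there is something to reorder. The search and rerank arguments are hypothetical callables wrapping the respective APIs; their signatures are assumptions made for this example.

```python
def search_pipeline(query, search, rerank=None):
    """Run a search and optionally rerank the results.

    search: callable(query) -> list of result texts
    rerank: optional callable(query, results) -> reordered list
    """
    results = search(query)
    # Skip the extra API call when reranking cannot change anything.
    if rerank is None or len(results) < 2:
        return results
    return rerank(query, results)
```

Passing the API wrappers in as arguments keeps the decision logic testable with simple stand-in functions.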

RECOMMENDED PATTERNS (DETAILED):

1. Basic Search Implementation:
- For simple queries: Use Search API directly
- For better relevancy: First use Search API, then pass results through Reranker API
- Consider using embedding comparison only when semantic matching is crucial

2. RAG (Retrieval-Augmented Generation) Pipeline:
- Basic flow: Reader API -> Segmenter -> Embeddings
- Enhanced flow: Add Reranker as final step
- When to use each step:
* Reader: When source is a URL
* Segmenter: When content is long
* Embeddings: For semantic matching
* Reranker: When result ordering is critical

3. Fact Checking Implementation:
- Simple verification: Use Grounding API directly
- Enhanced verification: Search API first, then Grounding API
- Use the X-Site header to restrict verification to trusted sources

4. Classification Tasks:
- Single-language: Use Classifier API directly
- Multilingual: Use the jina-embeddings-v3 model
- Multiple categories: Provide semantic labels

5. Content Processing:
- URL content: Reader API only
- Long text: Segmenter API only
- Mixed content: Reader -> Segmenter
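
The RAG flow in pattern 2 above can be sketched as follows, applying each step only when its condition holds. The read, segment, embed, and rerank arguments are hypothetical callables standing in for the Reader, Segmenter, Embeddings, and Reranker APIs, and the max_chars threshold for "long" content is an arbitrary placeholder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, source, *, read, segment, embed, rerank=None,
             top_k=3, max_chars=1000):
    """Return the top_k chunks of source most similar to query."""
    # Reader: only when the source is a URL.
    is_url = source.startswith(("http://", "https://"))
    text = read(source) if is_url else source
    # Segmenter: only when the content is long.
    chunks = segment(text) if len(text) > max_chars else [text]
    # Embeddings: embed the query together with the chunks.
    vectors = embed([query] + chunks)
    query_vec, chunk_vecs = vectors[0], vectors[1:]
    ranked = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    top = [chunk for chunk, _ in ranked[:top_k]]
    # Reranker: optional final step when ordering is critical.
    return rerank(query, top) if rerank else top
```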

INTEGRATION GUIDELINES:
- Always handle API errors and rate limits
- Implement retries for network failures
- Cache results when appropriate
- Validate inputs before API calls
- Handle multilingual content properly
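
One way to follow the error, rate-limit, and retry guidelines above is a small wrapper with exponential backoff. This is a sketch assuming the call is made with urllib; treating HTTP 429 and 5xx as retryable, and the attempt count and delays, are illustrative choices rather than values from the API documentation.

```python
import time
import urllib.error


def call_with_retries(func, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        last_try = attempt == attempts - 1
        try:
            return func()
        except urllib.error.HTTPError as err:
            # Retry only rate limiting (429) and server errors (5xx).
            if last_try or not (err.code == 429 or err.code >= 500):
                raise
        except (urllib.error.URLError, TimeoutError):
            # Network-level failure: retry unless this was the last attempt.
            if last_try:
                raise
        # Wait 0.5s, 1s, 2s, ... (with the defaults) before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so the backoff can be replaced with a stub in tests.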

ANTI-PATTERNS TO AVOID:
1. Don't chain APIs unnecessarily
2. Don't segment already short text
3. Don't rerank without query-document pairs
4. Don't use grounding for open questions

WHAT THESE APIs CAN'T DO:
1. Generate new text or images
2. Modify or edit content
3. Execute code or perform calculations
4. Process data in real time
5. Store or cache results permanently

All APIs require:
- Authorization: Bearer token
- Error handling
- Rate limit consideration
- Response validation
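
For the rate limit consideration above, a simple client-side throttle can space out requests. This is a sketch only: the actual limits are not stated in this guide, so min_interval is a placeholder to tune, and the class name is hypothetical.

```python
import time


class MinIntervalThrottle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the next request is allowed, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each API request keeps a single-threaded client under the chosen request rate.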
