Update index.txt

jina-ai · Oct 28, 2024 · 48f8723
1 parent 6a82263
commit 48f8723
Showing 1 changed file with 130 additions and 1 deletion.
diff --git a/index.txt b/index.txt
@@ -1 +1,130 @@
-hello
+JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE
+
+Key Principles:
+- Choose the simplest solution: use a single API when possible
+- Answer "can't do" for tasks outside these APIs' scope
+- Prefer built-in features over custom implementations
+- Leverage multilingual/multimodal capabilities when needed
+
+Core APIs and Use Cases:
+
+1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
+Purpose: Convert text/images to fixed-length vectors
+Best for: Semantic search, similarity matching, clustering
+Example Request:
+{
+    "model": "jina-embeddings-v3",
+    "task": "text-matching",
+    "input": ["search query", "document text"]
+}
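A minimal Python sketch of how the example request above might be built, assuming only the endpoint and JSON body shown in the guide plus the Bearer-token header it requires. The helper names (`build_embeddings_request`, `cosine_similarity`) and the `JINA_API_KEY` environment variable are illustrative assumptions, not part of the API. The request is built but not sent, and the response is not parsed because this guide does not describe its layout.

```python
import json
import math
import os
import urllib.request

EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"  # endpoint from the guide


def build_embeddings_request(api_key, texts, task="text-matching"):
    """Build (but do not send) a POST request for the Embeddings API."""
    payload = {"model": "jina-embeddings-v3", "task": task, "input": list(texts)}
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# JINA_API_KEY is a hypothetical environment variable name.
request = build_embeddings_request(
    os.environ.get("JINA_API_KEY", "YOUR_API_KEY"),
    ["search query", "document text"],
)
# Sending would be: urllib.request.urlopen(request, timeout=30)
```

Once two vectors have been obtained from a response, `cosine_similarity` is one common way to do the similarity matching the guide mentions; other distance measures would work as well.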
+
+2. RERANKER API (https://api.jina.ai/v1/rerank)
+Purpose: Improve search result relevancy
+Best for: Refining search results, RAG accuracy
+Example Request:
+{
+    "model": "jina-reranker-v2-base-multilingual",
+    "query": "search query",
+    "documents": ["candidate1", "candidate2"]
+}
+
+3. READER API (https://r.jina.ai)
+Purpose: Convert URLs to LLM-friendly text
+Best for: Web content extraction, RAG input preparation
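A small sketch of forming a Reader request URL, assuming the Reader API is called by prefixing the target URL with the base URL above. That prefix convention is an assumption here, since this guide does not spell it out, and `reader_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

READER_BASE = "https://r.jina.ai/"  # base URL from the guide


def reader_url(target_url):
    """Return the Reader URL for an absolute http(s) URL (prefix convention assumed)."""
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return READER_BASE + target_url


print(reader_url("https://example.com/article"))
# https://r.jina.ai/https://example.com/article
```

Validating the URL first avoids spending a request on input that is clearly not a web address.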
+
+4. SEARCH API (https://s.jina.ai)
+Purpose: Web search with LLM-friendly results
+Best for: Knowledge retrieval, RAG source gathering
+
+5. GROUNDING API (https://g.jina.ai)
+Purpose: Ground statements with web knowledge
+Best for: Fact verification, claim validation
+
+6. CLASSIFIER API (https://api.jina.ai/v1/classify)
+Purpose: Zero-shot/few-shot classification
+Best for: Content categorization without training
+Example Request:
+{
+    "model": "jina-embeddings-v3",
+    "input": [{"text": "content"}],
+    "labels": ["category1", "category2"]
+}
+
+7. SEGMENTER API (https://segment.jina.ai)
+Purpose: Tokenize and segment long text
+Best for: Breaking down documents into manageable chunks
+
+RECOMMENDED PATTERNS:
+
+1. Basic Search:
+- For simple search: Use Search API alone
+- For better ranking: Search API -> Reranker API
+
+2. RAG Implementation:
+- Basic: Reader API -> Segmenter API -> Embeddings API
+- Advanced: Add Reranker API for better result ranking
+
+3. Fact Checking:
+- Simple: Grounding API alone
+- Thorough: Search API -> Grounding API
+
+4. Content Classification:
+- Single task: Classifier API (zero-shot)
+- Multiple related tasks: Consider embeddings for similarity
+
+RECOMMENDED PATTERNS (DETAILED):
+
+1. Basic Search Implementation:
+- For simple queries: Use Search API directly
+- For better relevancy: First use Search API, then pass results through Reranker API
+- Consider using embedding comparison only when semantic matching is crucial
+
+2. RAG (Retrieval-Augmented Generation) Pipeline:
+- Basic flow: Reader API -> Segmenter -> Embeddings
+- Enhanced flow: Add Reranker as final step
+- When to use each step:
+  * Reader: When source is a URL
+  * Segmenter: When content is long
+  * Embeddings: For semantic matching
+  * Reranker: When result ordering is critical
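The "when to use each step" rules above can be sketched as a small planning function. This is an illustration only: `plan_rag_steps`, its parameters, and the 2000-character threshold for "long" content are hypothetical choices, and the step names are plain labels rather than API calls.

```python
def plan_rag_steps(source, is_url, ordering_critical=False, long_threshold=2000):
    """Return the ordered list of pipeline steps for one source.

    long_threshold is a hypothetical cut-off for "long" content, in characters.
    """
    steps = []
    if is_url:
        steps.append("reader")
    # For URLs the fetched length is unknown up front, so segmenting is assumed,
    # matching the guide's basic flow (Reader -> Segmenter -> Embeddings).
    if is_url or len(source) > long_threshold:
        steps.append("segmenter")
    steps.append("embeddings")
    if ordering_critical:
        steps.append("reranker")
    return steps


print(plan_rag_steps("https://example.com/doc", is_url=True, ordering_critical=True))
# ['reader', 'segmenter', 'embeddings', 'reranker']
print(plan_rag_steps("a short note", is_url=False))
# ['embeddings']
```

Keeping the decision in one place makes it easier to honor the guide's advice against chaining APIs unnecessarily.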
+
+3. Fact Checking Implementation:
+- Simple verification: Use Grounding API directly
+- Enhanced verification: Search API first, then Grounding API
+- Use X-Site header to specify trusted sources
+
+4. Classification Tasks:
+- Single-language: Use Classifier API directly
+- Multilingual: Use the jina-embeddings-v3 model
+- Multiple categories: Provide semantic labels
+
+5. Content Processing:
+- URL content: Reader API only
+- Long text: Segmenter API only
+- Mixed content: Reader -> Segmenter
+
+INTEGRATION GUIDELINES:
+- Always handle API errors and rate limits
+- Implement retries for network failures
+- Cache results when appropriate
+- Validate inputs before API calls
+- Handle multilingual content properly
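One way the retry and caching guidelines above might look in Python, as a sketch under stated assumptions: the caller supplies a zero-argument `call` that performs the request (for example via `urllib.request.urlopen`), HTTP 429 is treated as the rate-limit signal, and the backoff schedule and attempt count are illustrative defaults. `call_with_retries` and `cached_call` are hypothetical helper names.

```python
import time
import urllib.error


def call_with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); retry network errors and HTTP 429/5xx with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Client errors other than 429 will not succeed on retry.
            if err.code != 429 and err.code < 500:
                raise
            last_error = err
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise last_error


def cached_call(cache, key, call):
    """Return cache[key] if present; otherwise run call() with retries and store it."""
    if key not in cache:
        cache[key] = call_with_retries(call)
    return cache[key]
```

The `sleep` parameter is injectable so the backoff can be exercised without real waiting. The cache here is a plain dict held by the caller, which suits short-lived results; anything longer-lived would need its own expiry policy.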
+
+ANTI-PATTERNS TO AVOID:
+1. Don't chain APIs unnecessarily
+2. Don't segment text that is already short
+3. Don't rerank without query-document pairs
+4. Don't use grounding for open questions
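Two of the anti-patterns above can be caught with simple guards before any request is made. The sketch below is illustrative: `build_rerank_payload` and `needs_segmenting` are hypothetical helpers, and the 1000-character cut-off for "short" text is an assumed value, not one given by the guide. The payload fields mirror the Reranker example request earlier in this file.

```python
def build_rerank_payload(query, documents, model="jina-reranker-v2-base-multilingual"):
    """Build a Reranker payload, refusing calls that lack a query or documents."""
    docs = [d for d in documents if d and d.strip()]
    if not query or not query.strip():
        raise ValueError("reranking needs a non-empty query")
    if not docs:
        raise ValueError("reranking needs at least one non-empty document")
    return {"model": model, "query": query, "documents": docs}


def needs_segmenting(text, max_chars=1000):
    """True only when text is longer than max_chars (a hypothetical threshold)."""
    return len(text) > max_chars


payload = build_rerank_payload("search query", ["candidate1", "candidate2"])
```

Failing fast on an empty query or document list keeps a malformed call from counting against rate limits.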
+
+WHAT THESE APIs CAN'T DO:
+1. Generate new text or images
+2. Modify or edit content
+3. Execute code or perform calculations
+4. Process data in real time
+5. Store or cache results permanently
+
+All APIs require:
+- Authorization: Bearer token
+- Error handling
+- Rate limit consideration
+- Response validation