diff --git a/index.txt b/index.txt
index ce01362..917bdcf 100644
--- a/index.txt
+++ b/index.txt
@@ -1 +1,130 @@
-hello
+JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE
+
+Key Principles:
+- Choose simplest solution: Use single API when possible
+- Answer "can't do" for tasks outside these APIs' scope
+- Prefer built-in features over custom implementations
+- Leverage multilingual/multimodal capabilities when needed
+
+Core APIs and Use Cases:
+
+1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
+Purpose: Convert text/images to fixed-length vectors
+Best for: Semantic search, similarity matching, clustering
+Example Request:
+{
+  "model": "jina-embeddings-v3",
+  "task": "text-matching",
+  "input": ["search query", "document text"]
+}
+
+2. RERANKER API (https://api.jina.ai/v1/rerank)
+Purpose: Improve search result relevancy
+Best for: Refining search results, RAG accuracy
+Example Request:
+{
+  "model": "jina-reranker-v2-base-multilingual",
+  "query": "search query",
+  "documents": ["candidate1", "candidate2"]
+}
+
+3. READER API (https://r.jina.ai)
+Purpose: Convert URLs to LLM-friendly text
+Best for: Web content extraction, RAG input preparation
+
+4. SEARCH API (https://s.jina.ai)
+Purpose: Web search with LLM-friendly results
+Best for: Knowledge retrieval, RAG source gathering
+
+5. GROUNDING API (https://g.jina.ai)
+Purpose: Ground statements with web knowledge
+Best for: Fact verification, claim validation
+
+6. CLASSIFIER API (https://api.jina.ai/v1/classify)
+Purpose: Zero-shot/few-shot classification
+Best for: Content categorization without training
+Example Request:
+{
+  "model": "jina-embeddings-v3",
+  "input": [{"text": "content"}],
+  "labels": ["category1", "category2"]
+}
+
+7. SEGMENTER API (https://segment.jina.ai)
+Purpose: Tokenize and segment long text
+Best for: Breaking down documents into manageable chunks
+
+RECOMMENDED PATTERNS:
+
+1. Basic Search:
+- If simple search: Use Search API alone
+- If need better ranking: Search API -> Reranker API
+
+2. RAG Implementation:
+- Basic: Reader API -> Segmenter API -> Embeddings API
+- Advanced: Add Reranker API for better result ranking
+
+3. Fact Checking:
+- Simple: Grounding API alone
+- Thorough: Search API -> Grounding API
+
+4. Content Classification:
+- Single task: Classifier API (zero-shot)
+- Multiple related tasks: Consider embeddings for similarity
+
+RECOMMENDED PATTERNS:
+
+1. Basic Search Implementation:
+- For simple queries: Use Search API directly
+- For better relevancy: First use Search API, then pass results through Reranker API
+- Consider using embedding comparison only when semantic matching is crucial
+
+2. RAG (Retrieval-Augmented Generation) Pipeline:
+- Basic flow: Reader API -> Segmenter -> Embeddings
+- Enhanced flow: Add Reranker as final step
+- When to use each step:
+  * Reader: When source is a URL
+  * Segmenter: When content is long
+  * Embeddings: For semantic matching
+  * Reranker: When result ordering is critical
+
+3. Fact Checking Implementation:
+- Simple verification: Use Grounding API directly
+- Enhanced verification: Search API first, then Grounding API
+- Use X-Site header to specify trusted sources
+
+4. Classification Tasks:
+- Single-language: Use Classifier API directly
+- Multilingual: Use embeddings-v3 model
+- Multiple categories: Provide semantic labels
+
+5. Content Processing:
+- URL content: Reader API only
+- Long text: Segmenter API only
+- Mixed content: Reader -> Segmenter
+
+INTEGRATION GUIDELINES:
+- Always handle API errors and rate limits
+- Implement retries for network failures
+- Cache results when appropriate
+- Validate inputs before API calls
+- Handle multilingual content properly
+
+ANTI-PATTERNS TO AVOID:
+1. Don't chain APIs unnecessarily
+2. Don't segment already short text
+3. Don't rerank without query-document pairs
+4. Don't use grounding for open questions
+
+WHAT THESE APIs CAN'T DO:
+1. Generate new text or images
+2. Modify or edit content
+3. Execute code or perform calculations
+4. Real-time data processing
+5. Store or cache results permanently
+
+All APIs require:
+- Authorization: Bearer token
+- Error handling
+- Rate limit consideration
+- Response validation
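
Reviewer sketch (not part of the patch): the guide's "All APIs require" checklist and the "Implement retries for network failures" guideline could look roughly like this in Python. The endpoint URLs and request fields are copied from the guide's Embeddings and Reranker examples. The helper names (build_request, with_retries, embeddings_payload, rerank_payload), the retry counts, and the backoff delays are hypothetical choices made for illustration. The guide does not describe response bodies, so the sketch does not parse them.

```python
import json
import time
import urllib.request

# Endpoints as listed in the guide.
EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"
RERANK_URL = "https://api.jina.ai/v1/rerank"


def embeddings_payload(texts):
    """Request body mirroring the guide's Embeddings example."""
    return {
        "model": "jina-embeddings-v3",
        "task": "text-matching",
        "input": list(texts),
    }


def rerank_payload(query, documents):
    """Request body mirroring the guide's Reranker example."""
    return {
        "model": "jina-reranker-v2-base-multilingual",
        "query": query,
        "documents": list(documents),
    }


def build_request(url, payload, token):
    """Build (but do not send) a JSON POST carrying a Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def with_retries(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run call(); on OSError, retry with exponential backoff.

    urllib.error.URLError and HTTPError are OSError subclasses, so network
    failures and HTTP error statuses are both retried here. A real client
    would likely retry only on rate-limit or transient statuses (assumption).
    """
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Assuming a valid token, a call could then be made with `with_retries(lambda: urllib.request.urlopen(build_request(RERANK_URL, rerank_payload("search query", ["candidate1", "candidate2"]), token), timeout=30).read())`, which returns the raw response bytes for the caller to validate.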