diff --git a/docs/docs/concepts.mdx b/docs/docs/concepts.mdx index 619075ebc8a41..db8d4736b4e86 100644 --- a/docs/docs/concepts.mdx +++ b/docs/docs/concepts.mdx @@ -11,7 +11,7 @@ LangChain as a framework consists of a number of packages. ### `langchain-core` This package contains base abstractions of different components and ways to compose them together. -The interfaces for core components like LLMs, vectorstores, retrievers and more are defined here. +The interfaces for core components like LLMs, vector stores, retrievers and more are defined here. No third party integrations are defined here. The dependencies are kept purposefully very lightweight. @@ -30,7 +30,7 @@ All chains, agents, and retrieval strategies here are NOT specific to any one in This package contains third party integrations that are maintained by the LangChain community. Key partner packages are separated out (see below). -This contains all integrations for various components (LLMs, vectorstores, retrievers). +This contains all integrations for various components (LLMs, vector stores, retrievers). All dependencies in this package are optional to keep the package as lightweight as possible. ### [`langgraph`](https://langchain-ai.github.io/langgraph) @@ -463,7 +463,7 @@ For specifics on how to use vector stores, see the [relevant how-to guides here] A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. -Retrievers can be created from vectorstores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/). +Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/). Retrievers accept a string query as input and return a list of Document's as output. @@ -816,7 +816,7 @@ For a full list of model providers that support JSON mode, see [this table](/doc We use the term tool calling interchangeably with function calling. Although function calling is sometimes meant to refer to invocations of a single function, we treat all models as though they can return multiple tool or function calls in -each message. +each message ::: Tool calling allows a model to respond to a given prompt by generating output that @@ -860,30 +860,162 @@ For a full list of model providers that support tool calling, [see this table](/ ### Retrieval -LangChain provides several advanced retrieval types. A full list is below, along with the following information: +LLMs are trained on a large but fixed dataset, limiting their ability to reason over private or recent information. Fine-tuning an LLM with specific facts is one way to mitigate this, but is often [poorly suited for factual recall](https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts) and [can be costly](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise). +Retrieval is the process of providing relevant information to an LLM to improve its response for a given input. Retrieval augmented generation (RAG) is the process of grounding the LLM generation (output) using the retrieved information. -**Name**: Name of the retrieval algorithm. +:::tip -**Index Type**: Which index type (if any) this relies on. 
+* See our RAG from Scratch [code](https://github.com/langchain-ai/rag-from-scratch) and [video series](https://youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x&feature=shared).
+* For a high-level guide on retrieval, see this [tutorial on RAG](/docs/tutorials/rag/).
-**Uses an LLM**: Whether this retrieval method uses an LLM.
+
+:::
+
+RAG is only as good as the retrieved documents’ relevance and quality. Fortunately, an emerging set of techniques can be employed to design and improve RAG systems. We've focused on taxonomizing and summarizing many of these techniques (see the figure below) and will share some high-level strategic guidance in the following sections.
+You can and should experiment with using different pieces together. You might also find [this LangSmith guide](https://docs.smith.langchain.com/how_to_guides/evaluation/evaluate_llm_application) useful for evaluating different iterations of your app.
+
+![](/img/rag_landscape.png)
+
+#### Query Translation
-**When to Use**: Our commentary on when you should considering using this retrieval method.
+
+First, consider the user input(s) to your RAG system. Ideally, a RAG system can handle a wide range of inputs, from poorly worded questions to complex multi-part queries.
+**Using an LLM to review and optionally modify the input is the central idea behind query translation.** This serves as a general buffer, optimizing raw user inputs for your retrieval system.
+For example, this can be as simple as extracting keywords or as complex as generating multiple sub-questions for a complex query (see the sketch after the table below).
-**Description**: Description of what this retrieval algorithm is doing.
+
+| Name | When to use | Description |
+|---------------|-------------|-------------|
+| [Multi-query](/docs/how_to/MultiQueryRetriever/) | When you need to cover multiple perspectives of a question. | Rewrite the user question from multiple perspectives, retrieve documents for each rewritten question, and return the unique documents across all queries. |
+| [Decomposition](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a question can be broken down into smaller subproblems. | Decompose a question into a set of subproblems / questions, which can either be solved sequentially (use the answer from the first, plus retrieval, to answer the second) or in parallel (consolidate each answer into a final answer). |
+| [Step-back](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a higher-level conceptual understanding is required. | First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. |
+| [HyDE](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | If you have challenges retrieving relevant documents using the raw user inputs. | Use an LLM to convert questions into hypothetical documents that answer the question. Use the embedded hypothetical documents to retrieve real documents, with the premise that doc-doc similarity search can produce more relevant matches. |
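+
+The how-to guide linked in the table covers the built-in `MultiQueryRetriever`; as a rough illustration of the underlying idea, here is a minimal from-scratch sketch (it assumes an existing, already-populated `vectorstore` and an OpenAI API key, both of which are placeholders):
+
+```python
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_openai import ChatOpenAI
+
+# Assumes `vectorstore` is an existing vector store with your documents loaded.
+retriever = vectorstore.as_retriever()
+
+# Ask the LLM to rewrite the question from several perspectives.
+rewrite_prompt = ChatPromptTemplate.from_template(
+    "Rewrite the following question from three different perspectives, one per line.\n\nQuestion: {question}"
+)
+generate_queries = rewrite_prompt | ChatOpenAI(temperature=0) | StrOutputParser()
+
+question = "How can I add memory to a chatbot?"
+rewrites = [q for q in generate_queries.invoke({"question": question}).splitlines() if q.strip()]
+
+# Retrieve for the original question and each rewrite, keeping only unique documents.
+unique_docs = {}
+for q in [question] + rewrites:
+    for doc in retriever.invoke(q):
+        unique_docs[doc.page_content] = doc
+docs = list(unique_docs.values())
+```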
+
+:::tip
+
+See our RAG from Scratch videos for a few different specific approaches:
+- [Multi-query](https://youtu.be/JChPi0CRnDY?feature=shared)
+- [Decomposition](https://youtu.be/h0OPWlEOank?feature=shared)
+- [Step-back](https://youtu.be/xn1jEjRyJ2U?feature=shared)
+- [HyDE](https://youtu.be/SaDzIVkYqyY?feature=shared)
+
+:::
+
+#### Routing
+
+Second, consider the data sources available to your RAG system. You may want to query across more than one database, or across both structured and unstructured data sources. **Using an LLM to review the input and route it to the appropriate data source is a simple and effective approach for querying across sources.**
+
+| Name | When to use | Description |
+|------------------|--------------------------------------------|-------------|
+| [Logical routing](/docs/how_to/routing/#using-a-runnablebranch) | When you can prompt an LLM with rules to decide where to route the input. | Logical routing can use an LLM to reason about the query and choose which datastore is most appropriate. |
+| [Semantic routing](/docs/how_to/routing/#using-a-runnablebranch) | When semantic similarity is an effective way to determine where to route the input. | Semantic routing embeds both the query and, typically, a set of prompts, then chooses the appropriate prompt based on similarity. |
+
+:::tip
+
+See our RAG from Scratch video on [routing](https://youtu.be/pfpIndq7Fi8?feature=shared).
+
+:::
+
+#### Query Construction
+
+Third, consider whether any of your data sources require specific query formats. Many structured databases use SQL. Vector stores often have specific syntax for applying keyword filters to document metadata. **Using an LLM to convert a natural language query into a query syntax is a popular and powerful approach.**
+In particular, [text-to-SQL](/docs/tutorials/sql_qa/), [text-to-Cypher](/docs/tutorials/graph/), and [query analysis for metadata filters](/docs/tutorials/query_analysis/#query-analysis) are useful ways to interact with structured, graph, and vector databases, respectively (see the sketch after the table below).
+
+| Name | When to Use | Description |
+|---------------------------------------------|-------------|-------------|
+| [Text-to-SQL](/docs/tutorials/sql_qa/) | If users are asking questions that require information housed in a relational database, accessible via SQL. | This uses an LLM to transform user input into a SQL query. |
+| [Text-to-Cypher](/docs/tutorials/graph/) | If users are asking questions that require information housed in a graph database, accessible via Cypher. | This uses an LLM to transform user input into a Cypher query. |
+| [Self Query](/docs/how_to/self_query/) | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
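+
+As a rough sketch of the query analysis / metadata-filter idea, an LLM with structured output can fill in both a semantic query and the filters it can infer. The schema, field names, and example query below are illustrative, not a fixed API:
+
+```python
+from typing import Optional
+
+from langchain_core.pydantic_v1 import BaseModel, Field
+from langchain_openai import ChatOpenAI
+
+
+class TutorialSearch(BaseModel):
+    """Search over a database of tutorial videos."""
+
+    query: str = Field(..., description="Similarity search query applied to video transcripts.")
+    min_view_count: Optional[int] = Field(None, description="Minimum view count filter, inclusive.")
+    earliest_publish_date: Optional[str] = Field(None, description="Earliest publish date filter, YYYY-MM-DD.")
+
+
+structured_llm = ChatOpenAI(temperature=0).with_structured_output(TutorialSearch)
+
+# The model produces the semantic query plus any metadata filters implied by the question.
+search = structured_llm.invoke("chat langchain videos published after 2024-01-01 with over 10k views")
+# e.g. TutorialSearch(query='chat langchain', min_view_count=10000, earliest_publish_date='2024-01-01')
+```
+
+The resulting object can then be translated into whatever filter syntax your vector store expects.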
+
+:::tip
+
+See our [blog post overview](https://blog.langchain.dev/query-construction/) and RAG from Scratch video on [query construction](https://youtu.be/kl6NwWYxvbM?feature=shared), the process of text-to-DSL, where DSL is a domain-specific language required to interact with a given database. This converts user questions into structured queries.
+
+:::
+
+#### Indexing
+
+Fourth, consider the design of your document index. A simple and powerful idea is to **decouple the documents that you index for retrieval from the documents that you pass to the LLM for generation.** Indexing frequently uses embedding models with vector stores, which [compress the semantic information in documents to fixed-size vectors](/docs/concepts/#embedding-models).
+
+Many RAG approaches focus on splitting documents into chunks and retrieving some number based on similarity to an input question for the LLM. But chunk size and chunk number can be difficult to set, and they affect results if they do not provide full context for the LLM to answer a question. Furthermore, LLMs are increasingly capable of processing millions of tokens.
+
+Two approaches can address this tension: (1) the [Multi Vector](/docs/how_to/multi_vector/) retriever uses an LLM to translate documents into any form (e.g., a summary) that is well-suited for indexing, but returns full documents to the LLM for generation. (2) The [ParentDocument](/docs/how_to/parent_document_retriever/) retriever embeds document chunks, but also returns full documents. The idea is to get the best of both worlds: use concise representations (summaries or chunks) for retrieval, but use the full documents for answer generation (see the sketch after the table below).
+
+| Name | Index Type | Uses an LLM | When to Use | Description |
+|---------------------------|------------------------------|---------------------------|-------------|-------------|
+| [Vector store](/docs/how_to/vectorstore_retriever/) | Vector store | No | If you are just getting started and looking for something quick and easy. | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text. |
+| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vector store + Document Store | No | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). |
+| [Multi Vector](/docs/how_to/multi_vector/) | Vector store + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. |
+| [Time-Weighted Vector store](/docs/how_to/time_weighted_vectorstore/) | Vector store | No | If you have timestamps associated with your documents, and you want to retrieve the most recent ones. | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents). |
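+
+As a minimal sketch of the "index small, return big" idea, the `ParentDocumentRetriever` from the how-to guide above can be wired up roughly as follows (the loader path, embedding model, and vector store choice are placeholders):
+
+```python
+from langchain.retrievers import ParentDocumentRetriever
+from langchain.storage import InMemoryStore
+from langchain_community.document_loaders import TextLoader
+from langchain_community.vectorstores import Chroma
+from langchain_openai import OpenAIEmbeddings
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+docs = TextLoader("my_document.txt").load()  # illustrative path
+
+# Small child chunks are embedded for precise similarity search...
+child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
+vectorstore = Chroma(collection_name="full_documents", embedding_function=OpenAIEmbeddings())
+# ...while the full parent documents are stored separately and returned at query time.
+docstore = InMemoryStore()
+
+retriever = ParentDocumentRetriever(
+    vectorstore=vectorstore,
+    docstore=docstore,
+    child_splitter=child_splitter,
+)
+retriever.add_documents(docs)
+
+retrieved_parents = retriever.invoke("What does the document say about indexing?")
+```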
+
+:::tip
+
+- See our RAG from Scratch video on [indexing fundamentals](https://youtu.be/bjb_EMsTDKI?feature=shared)
+- See our RAG from Scratch video on [multi vector retriever](https://youtu.be/gTCU9I6QqCE?feature=shared)
+
+:::
+
+Fifth, consider ways to improve the quality of your similarity search itself. Embedding models compress text into fixed-length (vector) representations that capture the semantic content of the document. This compression is useful for search / retrieval, but puts a heavy burden on that single vector representation to capture the semantic nuance / detail of the document. In some cases, irrelevant or redundant content can dilute the semantic usefulness of the embedding.
+
+[ColBERT](https://docs.google.com/presentation/d/1IRhAdGjIevrrotdplHNcc4aXgIYyKamUKTWtB3m3aMU/edit?usp=sharing) is an interesting approach to address this with higher-granularity embeddings: (1) produce a contextually influenced embedding for each token in the document and query, (2) score the similarity between each query token and all document tokens, (3) take the max, (4) do this for all query tokens, and (5) take the sum of the max scores (from step 3) across all query tokens to get a query-document similarity score; this token-wise scoring can yield strong results.
+
+![](/img/colbert.png)
+
+There are some additional tricks to improve the quality of your retrieval. Embeddings excel at capturing semantic information, but may struggle with keyword-based queries. Many [vector stores](https://python.langchain.com/v0.2/docs/integrations/retrievers/pinecone_hybrid_search/) offer built-in [hybrid-search](https://docs.pinecone.io/guides/data/understanding-hybrid-search) to combine keyword and semantic similarity, which marries the benefits of both approaches. Furthermore, many vector stores have [maximal marginal relevance](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/), which attempts to diversify the results of a search to avoid returning similar and redundant documents (see the sketch below).
+
+| Name | When to use | Description |
+|-------------------|----------------------------------------------------------|-------------|
+| [ColBERT](/docs/integrations/providers/ragatouille/#using-colbert-as-a-reranker) | When higher granularity embeddings are needed. | ColBERT uses contextually influenced embeddings for each token in the document and query to get a granular query-document similarity score. |
+| [Hybrid search](/docs/integrations/retrievers/pinecone_hybrid_search/) | When combining keyword-based and semantic similarity. | Hybrid search combines keyword and semantic similarity, marrying the benefits of both approaches. |
+| [Maximal Marginal Relevance (MMR)](/docs/integrations/vectorstores/pinecone/#maximal-marginal-relevance-searches) | When needing to diversify search results. | MMR attempts to diversify the results of a search to avoid returning similar and redundant documents. |
+
+:::tip
+
+See our RAG from Scratch video on [ColBERT](https://youtu.be/cN6S0Ehm7_8?feature=shared).
+
+:::
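+
+As a minimal sketch, MMR is typically exposed directly on a vector store's retriever interface. This assumes an already-populated `vectorstore`; the `k` / `fetch_k` values are illustrative:
+
+```python
+# Fetch a larger candidate pool, then select a smaller, more diverse subset of it.
+mmr_retriever = vectorstore.as_retriever(
+    search_type="mmr",
+    search_kwargs={"k": 5, "fetch_k": 20},  # return 5 documents chosen from 20 candidates
+)
+
+docs = mmr_retriever.invoke("How does LangChain support agents?")
+```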
+
+#### Post-processing
+
+Sixth, consider ways to filter or rank retrieved documents. This is very useful if you are [combining documents returned from multiple sources](/docs/integrations/retrievers/cohere-reranker/#doing-reranking-with-coherererank), since it can down-rank less relevant documents and / or [compress similar documents](/docs/how_to/contextual_compression/#more-built-in-compressors-filters) (see the sketch after the table below).
+
 | Name | Index Type | Uses an LLM | When to Use | Description |
 |---------------------------|------------------------------|---------------------------|-------------|-------------|
-| [Vectorstore](/docs/how_to/vectorstore_retriever/) | Vectorstore | No | If you are just getting started and looking for something quick and easy. | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text. |
-| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vectorstore + Document Store | No | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). |
-| [Multi Vector](/docs/how_to/multi_vector/) | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. |
-| [Self Query](/docs/how_to/self_query/) | Vectorstore | Yes | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filer to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
 | [Contextual Compression](/docs/how_to/contextual_compression/) | Any | Sometimes | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM. | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM. |
-| [Time-Weighted Vectorstore](/docs/how_to/time_weighted_vectorstore/) | Vectorstore | No | If you have timestamps associated with your documents, and you want to retrieve the most recent ones | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents) |
-| [Multi-Query Retriever](/docs/how_to/MultiQueryRetriever/) | Any | Yes | If users are asking questions that are complex and require multiple pieces of distinct information to respond | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them. |
 | [Ensemble](/docs/how_to/ensemble_retriever/) | Any | No | If you have multiple retrieval methods and want to try combining them. | This fetches documents from multiple retrievers and then combines them. |
+| [Re-ranking](/docs/integrations/retrievers/cohere-reranker/) | Any | Yes | If you want to rank retrieved documents based upon relevance, especially if you want to combine results from multiple retrieval methods. | Given a query and a list of documents, Rerank indexes the documents from most to least semantically relevant to the query. |
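+
+As a minimal sketch of post-processing with contextual compression, an embeddings-based filter can drop retrieved documents that are not sufficiently related to the query. This assumes an existing `base_retriever` and an OpenAI key; the similarity threshold is illustrative:
+
+```python
+from langchain.retrievers import ContextualCompressionRetriever
+from langchain.retrievers.document_compressors import EmbeddingsFilter
+from langchain_openai import OpenAIEmbeddings
+
+# Assumes `base_retriever` already exists, e.g. vectorstore.as_retriever().
+embeddings_filter = EmbeddingsFilter(embeddings=OpenAIEmbeddings(), similarity_threshold=0.76)
+
+compression_retriever = ContextualCompressionRetriever(
+    base_compressor=embeddings_filter,
+    base_retriever=base_retriever,
+)
+
+# Only documents sufficiently similar to the query are passed on to the LLM.
+docs = compression_retriever.invoke("What did the report say about retrieval quality?")
+```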
+
+:::tip
+
+See our RAG from Scratch video on [RAG-Fusion](https://youtu.be/77qELPbNgxA?feature=shared), an approach to post-processing across multiple queries: rewrite the user question from multiple perspectives, retrieve documents for each rewritten question, and combine the ranks of multiple search result lists to produce a single, unified ranking with [Reciprocal Rank Fusion (RRF)](https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1).
+
+:::
+
+#### Generation
+
+**Finally, consider ways to build self-correction into your RAG system.** RAG systems can suffer from low-quality retrieval (e.g., if a user question is out of the domain for the index) and / or hallucinations in generation. A naive retrieve-generate pipeline has no ability to detect or self-correct from these kinds of errors. The concept of ["flow engineering"](https://x.com/karpathy/status/1748043513156272416) has been introduced [in the context of code generation](https://arxiv.org/abs/2401.08500): iteratively build an answer to a code question with unit tests to check and self-correct errors. Several works have applied this to RAG, such as Self-RAG and Corrective-RAG. In both cases, checks for document relevance, hallucinations, and / or answer quality are performed in the RAG answer generation flow.
-For a high-level guide on retrieval, see this [tutorial on RAG](/docs/tutorials/rag/).
+
+We've found that graphs are a great way to reliably express logical flows and have implemented ideas from several of these papers [using LangGraph](https://github.com/langchain-ai/langgraph/tree/main/examples/rag), as shown in the figure below (red - routing, blue - fallback, green - self-correction):
+- **Routing:** Adaptive RAG ([paper](https://arxiv.org/abs/2403.14403)). Route questions to different retrieval approaches, as discussed above.
+- **Fallback:** Corrective RAG ([paper](https://arxiv.org/pdf/2401.15884.pdf)). Fall back to web search if the retrieved documents are not relevant to the query.
+- **Self-correction:** Self-RAG ([paper](https://arxiv.org/abs/2310.11511)). Fix answers that contain hallucinations or that don't address the question.
+
+![](/img/langgraph_rag.png)
+
+| Name | When to use | Description |
+|-------------------|-----------------------------------------------------------|-------------|
+| Self-RAG | When needing to fix answers with hallucinations or irrelevant content. | Self-RAG performs checks for document relevance, hallucinations, and answer quality during the RAG answer generation flow, iteratively building an answer and self-correcting errors. |
+| Corrective-RAG | When needing a fallback mechanism for low relevance docs. | Corrective-RAG includes a fallback (e.g., to web search) if the retrieved documents are not relevant to the query, ensuring higher quality and more relevant retrieval.
| + +:::tip + +See several videos and cookbooks showcasing RAG with LangGraph: +- [LangGraph Corrective RAG](https://www.youtube.com/watch?v=E2shqsYwxck) +- [LangGraph combining Adaptive, Self-RAG, and Corrective RAG](https://www.youtube.com/watch?v=-ROS6gfYIts) +- [Cookbooks for RAG using LangGraph ](https://github.com/langchain-ai/langgraph/tree/main/examples/rag) + +See our LangGraph RAG recipes with partners: +- [Meta](https://github.com/meta-llama/llama-recipes/tree/main/recipes/use_cases/agents/langchain) +- [Mistral](https://github.com/mistralai/cookbook/tree/main/third_party/langchain) + +::: ### Text splitting diff --git a/docs/docs/how_to/chat_models_universal_init.ipynb b/docs/docs/how_to/chat_models_universal_init.ipynb index ffa0714790961..c77083cdfb119 100644 --- a/docs/docs/how_to/chat_models_universal_init.ipynb +++ b/docs/docs/how_to/chat_models_universal_init.ipynb @@ -5,7 +5,7 @@ "id": "cfdf4f09-8125-4ed1-8063-6feed57da8a3", "metadata": {}, "source": [ - "# How to let your end users choose their model\n", + "# How to init any model in one line\n", "\n", "Many LLM applications let end users specify what model provider and model they want the application to be powered by. This requires writing some logic to initialize different ChatModels based on some user configuration. The `init_chat_model()` helper method makes it easy to initialize a number of different model integrations without having to worry about import paths and class names.\n", "\n", diff --git a/docs/docs/how_to/index.mdx b/docs/docs/how_to/index.mdx index db4989a09c91e..63d4ab9707bbf 100644 --- a/docs/docs/how_to/index.mdx +++ b/docs/docs/how_to/index.mdx @@ -79,7 +79,7 @@ These are the core building blocks you can use when building applications. - [How to: stream a response back](/docs/how_to/chat_streaming) - [How to: track token usage](/docs/how_to/chat_token_usage_tracking) - [How to: track response metadata across providers](/docs/how_to/response_metadata) -- [How to: let your end users choose their model](/docs/how_to/chat_models_universal_init/) +- [How to: init any model in one line](/docs/how_to/chat_models_universal_init/) ### LLMs diff --git a/docs/docs/integrations/chat/deepinfra.ipynb b/docs/docs/integrations/chat/deepinfra.ipynb index f3f3704ccf4b8..e8d6d3465a912 100644 --- a/docs/docs/integrations/chat/deepinfra.ipynb +++ b/docs/docs/integrations/chat/deepinfra.ipynb @@ -98,6 +98,78 @@ ")\n", "chat.invoke(messages)" ] + }, + { + "cell_type": "markdown", + "id": "466c3cb41ace1410", + "metadata": {}, + "source": [ + "# Tool Calling\n", + "\n", + "DeepInfra currently supports only invoke and async invoke tool calling.\n", + "\n", + "For a complete list of models that support tool calling, please refer to our [tool calling documentation](https://deepinfra.com/docs/advanced/function_calling)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ddc4f4299763651c", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "\n", + "from dotenv import find_dotenv, load_dotenv\n", + "from langchain_community.chat_models import ChatDeepInfra\n", + "from langchain_core.messages import HumanMessage\n", + "from langchain_core.pydantic_v1 import BaseModel\n", + "from langchain_core.tools import tool\n", + "\n", + "model_name = \"meta-llama/Meta-Llama-3-70B-Instruct\"\n", + "\n", + "_ = load_dotenv(find_dotenv())\n", + "\n", + "\n", + "# Langchain tool\n", + "@tool\n", + "def foo(something):\n", + " \"\"\"\n", + " Called when foo\n", + " \"\"\"\n", + " pass\n", + "\n", + "\n", + "# Pydantic class\n", + "class Bar(BaseModel):\n", + " \"\"\"\n", + " Called when Bar\n", + " \"\"\"\n", + "\n", + " pass\n", + "\n", + "\n", + "llm = ChatDeepInfra(model=model_name)\n", + "tools = [foo, Bar]\n", + "llm_with_tools = llm.bind_tools(tools)\n", + "messages = [\n", + " HumanMessage(\"Foo and bar, please.\"),\n", + "]\n", + "\n", + "response = llm_with_tools.invoke(messages)\n", + "print(response.tool_calls)\n", + "# [{'name': 'foo', 'args': {'something': None}, 'id': 'call_Mi4N4wAtW89OlbizFE1aDxDj'}, {'name': 'Bar', 'args': {}, 'id': 'call_daiE0mW454j2O1KVbmET4s2r'}]\n", + "\n", + "\n", + "async def call_ainvoke():\n", + " result = await llm_with_tools.ainvoke(messages)\n", + " print(result.tool_calls)\n", + "\n", + "\n", + "# Async call\n", + "asyncio.run(call_ainvoke())\n", + "# [{'name': 'foo', 'args': {'something': None}, 'id': 'call_ZH7FetmgSot4LHcMU6CEb8tI'}, {'name': 'Bar', 'args': {}, 'id': 'call_2MQhDifAJVoijZEvH8PeFSVB'}]" + ] } ], "metadata": { diff --git a/docs/static/img/colbert.png b/docs/static/img/colbert.png new file mode 100644 index 0000000000000..17f902138eb6b Binary files /dev/null and b/docs/static/img/colbert.png differ diff --git a/docs/static/img/langgraph_rag.png b/docs/static/img/langgraph_rag.png new file mode 100644 index 0000000000000..0dfcbb743a71f Binary files /dev/null and b/docs/static/img/langgraph_rag.png differ diff --git a/docs/static/img/rag_landscape.png b/docs/static/img/rag_landscape.png new file mode 100644 index 0000000000000..417d2e6c7eb09 Binary files /dev/null and b/docs/static/img/rag_landscape.png differ diff --git a/libs/community/langchain_community/chat_models/deepinfra.py b/libs/community/langchain_community/chat_models/deepinfra.py index 51df3b634b0b9..32be0867a0a47 100644 --- a/libs/community/langchain_community/chat_models/deepinfra.py +++ b/libs/community/langchain_community/chat_models/deepinfra.py @@ -13,6 +13,7 @@ List, Mapping, Optional, + Sequence, Tuple, Type, Union, @@ -24,6 +25,7 @@ AsyncCallbackManagerForLLMRun, CallbackManagerForLLMRun, ) +from langchain_core.language_models import LanguageModelInput from langchain_core.language_models.chat_models import ( BaseChatModel, agenerate_from_stream, @@ -44,15 +46,18 @@ SystemMessage, SystemMessageChunk, ) +from langchain_core.messages.tool import ToolCall from langchain_core.outputs import ( ChatGeneration, ChatGenerationChunk, ChatResult, ) -from langchain_core.pydantic_v1 import Field, root_validator +from langchain_core.pydantic_v1 import BaseModel, Field, root_validator +from langchain_core.runnables import Runnable +from langchain_core.tools import BaseTool from langchain_core.utils import get_from_dict_or_env +from langchain_core.utils.function_calling import convert_to_openai_tool -# from langchain.llms.base import create_base_retry_decorator from 
langchain_community.utilities.requests import Requests logger = logging.getLogger(__name__) @@ -78,19 +83,51 @@ def _create_retry_decorator( ) +def _parse_tool_calling(tool_call: dict) -> ToolCall: + """ + Convert a tool calling response from server to a ToolCall object. + Args: + tool_call: + + Returns: + + """ + name = tool_call.get("name", "") + args = json.loads(tool_call["function"]["arguments"]) + id = tool_call.get("id") + return ToolCall(name=name, args=args, id=id) + + +def _convert_to_tool_calling(tool_call: ToolCall) -> Dict[str, Any]: + """ + Convert a ToolCall object to a tool calling request for server. + Args: + tool_call: + + Returns: + + """ + return { + "type": "function", + "function": { + "arguments": json.dumps(tool_call["args"]), + "name": tool_call["name"], + }, + "id": tool_call.get("id"), + } + + def _convert_dict_to_message(_dict: Mapping[str, Any]) -> BaseMessage: role = _dict["role"] if role == "user": return HumanMessage(content=_dict["content"]) elif role == "assistant": - # Fix for azure - # Also OpenAI returns None for tool invocations content = _dict.get("content", "") or "" - if _dict.get("function_call"): - additional_kwargs = {"function_call": dict(_dict["function_call"])} - else: - additional_kwargs = {} - return AIMessage(content=content, additional_kwargs=additional_kwargs) + tool_calls_content = _dict.get("tool_calls", []) or [] + tool_calls = [ + _parse_tool_calling(tool_call) for tool_call in tool_calls_content + ] + return AIMessage(content=content, tool_calls=tool_calls) elif role == "system": return SystemMessage(content=_dict["content"]) elif role == "function": @@ -104,15 +141,14 @@ def _convert_delta_to_message_chunk( ) -> BaseMessageChunk: role = _dict.get("role") content = _dict.get("content") or "" - if _dict.get("function_call"): - additional_kwargs = {"function_call": dict(_dict["function_call"])} - else: - additional_kwargs = {} if role == "user" or default_class == HumanMessageChunk: return HumanMessageChunk(content=content) elif role == "assistant" or default_class == AIMessageChunk: - return AIMessageChunk(content=content, additional_kwargs=additional_kwargs) + tool_calls = [ + _parse_tool_calling(tool_call) for tool_call in _dict.get("tool_calls", []) + ] + return AIMessageChunk(content=content, tool_calls=tool_calls) elif role == "system" or default_class == SystemMessageChunk: return SystemMessageChunk(content=content) elif role == "function" or default_class == FunctionMessageChunk: @@ -129,9 +165,14 @@ def _convert_message_to_dict(message: BaseMessage) -> dict: elif isinstance(message, HumanMessage): message_dict = {"role": "user", "content": message.content} elif isinstance(message, AIMessage): - message_dict = {"role": "assistant", "content": message.content} - if "function_call" in message.additional_kwargs: - message_dict["function_call"] = message.additional_kwargs["function_call"] + tool_calls = [ + _convert_to_tool_calling(tool_call) for tool_call in message.tool_calls + ] + message_dict = { + "role": "assistant", + "content": message.content, + "tool_calls": tool_calls, # type: ignore[dict-item] + } elif isinstance(message, SystemMessage): message_dict = {"role": "system", "content": message.content} elif isinstance(message, FunctionMessage): @@ -417,6 +458,27 @@ def _headers(self) -> Dict: def _body(self, kwargs: Any) -> Dict: return kwargs + def bind_tools( + self, + tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]], + **kwargs: Any, + ) -> Runnable[LanguageModelInput, BaseMessage]: + """Bind 
tool-like objects to this chat model. + + Assumes model is compatible with OpenAI tool-calling API. + + Args: + tools: A list of tool definitions to bind to this chat model. + Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic + models, callables, and BaseTools will be automatically converted to + their schema dictionary representation. + **kwargs: Any additional parameters to pass to the + :class:`~langchain.runnable.Runnable` constructor. + """ + + formatted_tools = [convert_to_openai_tool(tool) for tool in tools] + return super().bind(tools=formatted_tools, **kwargs) + def _parse_stream(rbody: Iterator[bytes]) -> Iterator[str]: for line in rbody: diff --git a/libs/community/langchain_community/embeddings/baichuan.py b/libs/community/langchain_community/embeddings/baichuan.py index d0f54fff0d36b..21175fb901521 100644 --- a/libs/community/langchain_community/embeddings/baichuan.py +++ b/libs/community/langchain_community/embeddings/baichuan.py @@ -2,7 +2,7 @@ import requests from langchain_core.embeddings import Embeddings -from langchain_core.pydantic_v1 import BaseModel, SecretStr, root_validator +from langchain_core.pydantic_v1 import BaseModel, Field, SecretStr, root_validator from langchain_core.utils import convert_to_secret_str, get_from_dict_or_env from requests import RequestException @@ -37,9 +37,16 @@ class BaichuanTextEmbeddings(BaseModel, Embeddings): """ session: Any #: :meta private: - model_name: str = "Baichuan-Text-Embedding" - baichuan_api_key: Optional[SecretStr] = None + model_name: str = Field(default="Baichuan-Text-Embedding", alias="model") + baichuan_api_key: Optional[SecretStr] = Field(default=None, alias="api_key") """Automatically inferred from env var `BAICHUAN_API_KEY` if not provided.""" + chunk_size: int = 16 + """Chunk size when multiple texts are input""" + + class Config: + """Configuration for this pydantic object.""" + + allow_population_by_field_name = True @root_validator(allow_reuse=True) def validate_environment(cls, values: Dict) -> Dict: @@ -78,26 +85,35 @@ def _embed(self, texts: List[str]) -> Optional[List[List[float]]]: A list of list of floats representing the embeddings, or None if an error occurs. 
""" - response = self.session.post( - BAICHUAN_API_URL, json={"input": texts, "model": self.model_name} - ) - # Raise exception if response status code from 400 to 600 - response.raise_for_status() - # Check if the response status code indicates success - if response.status_code == 200: - resp = response.json() - embeddings = resp.get("data", []) - # Sort resulting embeddings by index - sorted_embeddings = sorted(embeddings, key=lambda e: e.get("index", 0)) - # Return just the embeddings - return [result.get("embedding", []) for result in sorted_embeddings] - else: - # Log error or handle unsuccessful response appropriately - # Handle 100 <= status_code < 400, not include 200 - raise RequestException( - f"Error: Received status code {response.status_code} from " - "`BaichuanEmbedding` API" + chunk_texts = [ + texts[i : i + self.chunk_size] + for i in range(0, len(texts), self.chunk_size) + ] + embed_results = [] + for chunk in chunk_texts: + response = self.session.post( + BAICHUAN_API_URL, json={"input": chunk, "model": self.model_name} ) + # Raise exception if response status code from 400 to 600 + response.raise_for_status() + # Check if the response status code indicates success + if response.status_code == 200: + resp = response.json() + embeddings = resp.get("data", []) + # Sort resulting embeddings by index + sorted_embeddings = sorted(embeddings, key=lambda e: e.get("index", 0)) + # Return just the embeddings + embed_results.extend( + [result.get("embedding", []) for result in sorted_embeddings] + ) + else: + # Log error or handle unsuccessful response appropriately + # Handle 100 <= status_code < 400, not include 200 + raise RequestException( + f"Error: Received status code {response.status_code} from " + "`BaichuanEmbedding` API" + ) + return embed_results def embed_documents(self, texts: List[str]) -> Optional[List[List[float]]]: # type: ignore[override] """Public method to get embeddings for a list of documents. 
diff --git a/libs/community/tests/integration_tests/chat_models/test_deepinfra.py b/libs/community/tests/integration_tests/chat_models/test_deepinfra.py index 0fa4593ace8ab..572cec0522418 100644 --- a/libs/community/tests/integration_tests/chat_models/test_deepinfra.py +++ b/libs/community/tests/integration_tests/chat_models/test_deepinfra.py @@ -1,11 +1,23 @@ """Test ChatDeepInfra wrapper.""" +from typing import List + from langchain_core.messages import BaseMessage, HumanMessage +from langchain_core.messages.ai import AIMessage +from langchain_core.messages.tool import ToolMessage from langchain_core.outputs import ChatGeneration, LLMResult +from langchain_core.pydantic_v1 import BaseModel +from langchain_core.runnables.base import RunnableBinding from langchain_community.chat_models.deepinfra import ChatDeepInfra from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler +class GenerateMovieName(BaseModel): + "Get a movie name from a description" + + description: str + + def test_chat_deepinfra() -> None: """Test valid call to DeepInfra.""" chat = ChatDeepInfra( @@ -63,3 +75,51 @@ async def test_async_chat_deepinfra_streaming() -> None: assert isinstance(generation, ChatGeneration) assert isinstance(generation.text, str) assert generation.text == generation.message.content + + +def test_chat_deepinfra_bind_tools() -> None: + class Foo(BaseModel): + pass + + chat = ChatDeepInfra( + max_tokens=10, + ) + tools = [Foo] + chat_with_tools = chat.bind_tools(tools) + assert isinstance(chat_with_tools, RunnableBinding) + chat_tools = chat_with_tools.tools + assert chat_tools + assert chat_tools == { + "tools": [ + { + "function": { + "description": "", + "name": "Foo", + "parameters": {"properties": {}, "type": "object"}, + }, + "type": "function", + } + ] + } + + +def test_tool_use() -> None: + llm = ChatDeepInfra(model="meta-llama/Meta-Llama-3-70B-Instruct", temperature=0) + llm_with_tool = llm.bind_tools(tools=[GenerateMovieName], tool_choice=True) + msgs: List = [ + HumanMessage(content="It should be a movie explaining humanity in 2133.") + ] + ai_msg = llm_with_tool.invoke(msgs) + + assert isinstance(ai_msg, AIMessage) + assert isinstance(ai_msg.tool_calls, list) + assert len(ai_msg.tool_calls) == 1 + tool_call = ai_msg.tool_calls[0] + assert "args" in tool_call + + tool_msg = ToolMessage( + content="Year 2133", + tool_call_id=ai_msg.additional_kwargs["tool_calls"][0]["id"], + ) + msgs.extend([ai_msg, tool_msg]) + llm_with_tool.invoke(msgs) diff --git a/libs/community/tests/integration_tests/embeddings/test_baichuan.py b/libs/community/tests/integration_tests/embeddings/test_baichuan.py index b8f8e68bff304..fd5921642f3bb 100644 --- a/libs/community/tests/integration_tests/embeddings/test_baichuan.py +++ b/libs/community/tests/integration_tests/embeddings/test_baichuan.py @@ -17,3 +17,13 @@ def test_baichuan_embedding_query() -> None: embedding = BaichuanTextEmbeddings() # type: ignore[call-arg] output = embedding.embed_query(document) assert len(output) == 1024 # type: ignore[arg-type] + + +def test_baichuan_embeddings_multi_documents() -> None: + """Test Baichuan Text Embedding for documents with multi texts.""" + document = "午餐吃了螺蛳粉" + doc_amount = 35 + embeddings = BaichuanTextEmbeddings() # type: ignore[call-arg] + output = embeddings.embed_documents([document] * doc_amount) + assert len(output) == doc_amount # type: ignore[arg-type] + assert len(output[0]) == 1024 # type: ignore[index] diff --git a/libs/community/tests/unit_tests/embeddings/test_baichuan.py 
b/libs/community/tests/unit_tests/embeddings/test_baichuan.py
new file mode 100644
index 0000000000000..10513948f9427
--- /dev/null
+++ b/libs/community/tests/unit_tests/embeddings/test_baichuan.py
@@ -0,0 +1,18 @@
+from typing import cast
+
+from langchain_core.pydantic_v1 import SecretStr
+
+from langchain_community.embeddings import BaichuanTextEmbeddings
+
+
+def test_baichuan_initialization_by_alias() -> None:
+    # Initialization via the `model` and `api_key` field aliases
+    embeddings = BaichuanTextEmbeddings(  # type: ignore[call-arg]
+        model="embedding_model",  # type: ignore[arg-type]
+        api_key="your-api-key",  # type: ignore[arg-type]
+    )
+    assert embeddings.model_name == "embedding_model"
+    assert (
+        cast(SecretStr, embeddings.baichuan_api_key).get_secret_value()
+        == "your-api-key"
+    )