diff --git a/README.md b/README.md
index 749db8a690..47dbd5ac3f 100644
--- a/README.md
+++ b/README.md
@@ -26,15 +26,16 @@ potential of cutting-edge AI models.
 
 ## 🔥 Hot Topics
 
 ### Framework Enhancements
+- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
 - Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
-- Custom model support: [#325](https://github.com/xorbitsai/inference/pull/325)
 - LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
 - Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
 - Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
 ### New Models
 - Built-in support for [CodeLLama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
-### Tools
-- LlamaIndex plugin: [#7151](https://github.com/jerryjliu/llama_index/pull/7151)
+### Integrations
+- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
+- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 
 ## Key Features
@@ -57,8 +58,7 @@ for seamless management and monitoring.
 allowing the seamless distribution of model inference across multiple devices or machines.
 
 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
-with popular third-party libraries like [LangChain](https://python.langchain.com/docs/integrations/providers/xinference)
-and [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window).
+with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).
 
 ## Getting Started
 Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 274e88028a..20f09c8007 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -23,14 +23,17 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 
 ## 🔥 近期热点
 
 ### 框架增强
-- 自定义模型: [#325](https://github.com/xorbitsai/inference/pull/325)
+- 引入 vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
+- Embedding 模型支持: [#418](https://github.com/xorbitsai/inference/pull/418)
 - LoRA 支持: [#271](https://github.com/xorbitsai/inference/issues/271)
 - PyTorch 模型多 GPU 支持: [#226](https://github.com/xorbitsai/inference/issues/226)
 - Xinference 仪表盘: [#93](https://github.com/xorbitsai/inference/issues/93)
 
 ### 新模型
 - 内置 [CodeLLama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
 
-### 工具
-- LlamaIndex 插件: [#7151](https://github.com/jerryjliu/llama_index/pull/7151)
+### 集成
+- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
+- [Chatbox](https://chatboxai.app/): 一个支持前沿大语言模型的桌面客户端,支持 Windows,Mac,以及 Linux。
+
@@ -39,16 +42,13 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 
 ⚡️ **前沿模型,应有尽有**:框架内置众多中英文的前沿大语言模型,包括 baichuan,chatglm2 等,一键即可体验!内置模型列表还在快速更新中!
-
 🖥 **异构硬件,快如闪电**:通过 [ggml](https://github.com/ggerganov/ggml),同时使用你的 GPU 与 CPU 进行推理,降低延迟,提高吞吐!
 
 ⚙️ **接口调用,灵活多样**:提供多种使用模型的接口,包括 RPC,RESTful API,命令行,web UI 等等。方便模型的管理与监控。
 
 🌐 **集群计算,分布协同**: 支持分布式部署,通过内置的资源调度器,让不同大小的模型按需调度到不同机器,充分使用集群资源。
 
-🔌 **开放生态,无缝对接**: 与流行的三方库无缝对接,包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference)
-and [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window)。
-让开发者能够快速构建基于 AI 的应用。
+🔌 **开放生态,无缝对接**: 与流行的三方库无缝对接,包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference),[LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window),[Dify](https://docs.dify.ai/advanced/model-configuration/xinference),以及 [Chatbox](https://chatboxai.app/)。
 
 ## 快速入门
 
 Xinference 可以通过 pip 从 PyPI 安装。我们非常推荐在安装前创建一个新的虚拟环境以避免依赖冲突。
diff --git a/doc/source/index.rst b/doc/source/index.rst
index f18cdf5eec..56177bae04 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -33,7 +33,9 @@ allowing the seamless distribution of model inference across multiple devices
 or machines.
 
 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
 with popular third-party libraries like `LangChain <https://python.langchain.com/docs/integrations/providers/xinference>`_
-and `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_.
+, `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_
+, `Dify <https://docs.dify.ai/advanced/model-configuration/xinference>`_
+, and `Chatbox <https://chatboxai.app/>`_.
 
 🔥 Hot Topics
@@ -41,20 +43,21 @@ and `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_
 
+- Incorporate vLLM: `#445 <https://github.com/xorbitsai/inference/pull/445>`_
+- Embedding model support: `#418 <https://github.com/xorbitsai/inference/pull/418>`_
 - LoRA support: `#271 <https://github.com/xorbitsai/inference/issues/271>`_
 - Multi-GPU support for PyTorch models: `#226 <https://github.com/xorbitsai/inference/issues/226>`_
 - Xinference dashboard: `#93 <https://github.com/xorbitsai/inference/issues/93>`_
 
 New Models
 ~~~~~~~~~~
-- Built-in support for `Starcoder` in GGML: `#289 <https://github.com/xorbitsai/inference/pull/289>`_
-- Built-in support for `MusicGen <https://github.com/facebookresearch/audiocraft>`_: `#313 <https://github.com/xorbitsai/inference/pull/313>`_
-- Built-in support for `SD-XL <https://github.com/Stability-AI/generative-models>`_: `318 <https://github.com/xorbitsai/inference/pull/318>`_
+- Built-in support for `CodeLLama <https://github.com/facebookresearch/codellama>`_: `#414 <https://github.com/xorbitsai/inference/pull/414>`_ `#402 <https://github.com/xorbitsai/inference/pull/402>`_
 
-
-Tools
-~~~~~
-- LlamaIndex plugin: `7151 <https://github.com/jerryjliu/llama_index/pull/7151>`_
+
+Integrations
+~~~~~~~~~~~~
+- `Dify <https://docs.dify.ai/advanced/model-configuration/xinference>`_: an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
+- `Chatbox <https://chatboxai.app/>`_: a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 
 
 
 License
 
diff --git a/doc/source/models/builtin/llama-2-chat.rst b/doc/source/models/builtin/llama-2-chat.rst
index 2eca8d85ef..ded31d4a41 100644
--- a/doc/source/models/builtin/llama-2-chat.rst
+++ b/doc/source/models/builtin/llama-2-chat.rst
@@ -66,7 +66,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 
 Model Spec 5 (pytorch, 13 Billion)
@@ -84,7 +84,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 6 (pytorch, 70 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -101,4 +101,4 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
\ No newline at end of file
+   4-bit quantization is not supported on macOS.
\ No newline at end of file
diff --git a/doc/source/models/builtin/llama-2.rst b/doc/source/models/builtin/llama-2.rst
index 19614bbba4..a42090890a 100644
--- a/doc/source/models/builtin/llama-2.rst
+++ b/doc/source/models/builtin/llama-2.rst
@@ -65,7 +65,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 5 (pytorch, 13 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -82,7 +82,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 6 (pytorch, 70 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -99,4 +99,4 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
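The "Getting Started" context lines touched by this patch recommend installing Xinference from PyPI inside a fresh virtual environment to avoid dependency conflicts. A minimal sketch of that flow (the `xinference[all]` extra is taken from the LlamaIndex docs URL referenced in the diff; the environment name and the `xinference` CLI entry point are assumptions):

```shell
# Create an isolated virtual environment so Xinference's dependencies
# don't clash with other packages (the name "xinference-env" is arbitrary).
python3 -m venv xinference-env
source xinference-env/bin/activate   # on Windows: xinference-env\Scripts\activate

# Install Xinference from PyPI; the [all] extra matches the variant
# referenced in the LlamaIndex deployment guide linked above.
pip install "xinference[all]"

# Sanity check: the package is assumed to install an `xinference` console script.
xinference --help
```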