Sophora is a local generative AI toolkit for Delphi, powered by the DeepHermes-3 model and the latest llama.cpp optimizations. A single model handles both quick answers and step-by-step reasoning, which suits AI-driven applications that need fast local inference without relying on external cloud services. Function calling, embedding generation, retrieval-augmented generation (RAG), and a deep reasoning mode give developers a versatile toolset for integrating AI into their Delphi projects. GPU acceleration is provided through Vulkan on GPUs with compute capability 5.0 or higher.
- Local AI Inference: Run DeepHermes-3 (Llama 3-based) entirely on your machine, enabling fully offline AI capabilities.
- Fast Token Streaming: Supports both non-thinking (fast response) and thinking (deep reasoning) modes.
- Function Calling & Embeddings: Execute function calls and perform vector-based search for advanced AI-driven workflows.
- Retrieval-Augmented Generation (RAG): Enhances AI-generated responses using structured database lookups.
- SQL and Vector Databases: Works with SQLite3 and vector stores, making structured and semantic searches more efficient.
- Optimized with llama.cpp: Leverages the latest optimizations for high performance and reduced memory usage.
- Flexible Model Deployment: Supports various model configurations, letting users balance between performance and accuracy.
Get the latest version of Sophora and set up the toolkit:
- Download the latest version (Sophora Main ZIP) or clone the repository:
git clone https://github.com/tinyBigGAMES/Sophora.git
- Extract the contents to your preferred directory.
- Open the project in Delphi, and run the provided examples to explore the toolkit. Be sure to reference the Usage Notes in UTestbed.pas for insights about setup and using the toolkit.
- Ensure your system meets the minimum requirements for running large language models efficiently. Your device will need enough RAM/VRAM to hold the model plus context. Your GPU must have compute capability 5.0+ and support Vulkan for acceleration.
Sophora requires the DeepHermes-3 model and an embedding model, both of which can be downloaded from Hugging Face:
- DeepHermes-3-Llama-3-8B-Preview-abliterated-Q4_K_M-GGUF (General, Reasoning, Tools)
- bge-m3-Q8_0-GGUF (Embeddings)
- Place the downloaded model in the desired location (default: C:/LLM/GGUF).
- Ensure the model file is correctly placed before running the inference engine.
To enable web-augmented search capabilities, obtain an API key from Tavily.
- You receive 1000 free API credits per month.
- Set an environment variable:
TAVILY_API_KEY="your_api_key_here"
- This API can be used for enhanced external queries via tool calls when needed.
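Before relying on web search, it can help to confirm that the key is actually visible to your process. The following is a minimal sketch, not part of Sophora: `TryGetTavilyKey` is a hypothetical helper, and the only library call used is `GetEnvironmentVariable` from `System.SysUtils`.

```pascal
program TavilyKeyCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not part of Sophora): reads TAVILY_API_KEY from the
// process environment and reports whether it is set.
function TryGetTavilyKey(out AKey: string): Boolean;
begin
  // GetEnvironmentVariable returns an empty string when the variable is unset.
  AKey := Trim(GetEnvironmentVariable('TAVILY_API_KEY'));
  Result := AKey <> '';
end;

var
  LKey: string;
begin
  // Print only the length so the key itself never ends up in logs.
  if TryGetTavilyKey(LKey) then
    Writeln('TAVILY_API_KEY is set (', Length(LKey), ' characters).')
  else
    Writeln('TAVILY_API_KEY is not set.');
end.
```

A process inherits its environment when it starts, so a variable set while the Delphi IDE is already open is typically not visible to programs launched from it until the IDE is restarted.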
Sophora can generate fast responses without deep reasoning.
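// Create the message list and the inference engine
LMsg := TsoMessages.Create();
LInf := TsoInference.Create();
// Load the model; stop if loading fails
if not LInf.LoadModel() then Exit;
// Add the user prompt
LMsg.Add(soUser, 'Who is Bill Gates?');
// Run inference and print the error if it fails
if not LInf.Run(LMsg) then
  soConsole.PrintLn(LInf.GetError());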
Sophora enables multi-step AI reasoning for complex problem-solving.
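LMsg.Add(soSystem, 'You are a deep-thinking AI...');
LMsg.Add(soUser, 'Solve this riddle: I walk on four legs in the morning...');
// Check the result, as in the fast-response example
if not LInf.Run(LMsg) then
  soConsole.PrintLn(LInf.GetError());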
Sophora supports vector search using LLM embeddings.
LEmb := TsoEmbeddings.Create();
LEmb.LoadModel();
LResult := LEmb.Generate('Explain data analysis in ML');
Store and retrieve articles from an SQLite database.
LDb := TsoDatabase.Create();
LDb.Open('articles.db');
LDb.ExecuteSQL('INSERT INTO articles VALUES (''AI is transforming industries.'')');
LDb.ExecuteSQL('SELECT * FROM articles');
Sophora supports semantic search over stored documents.
LEmb := TsoEmbeddings.Create();
LEmb.LoadModel();
LVectorDB := TsoVectorDatabase.Create();
LVectorDB.Open(LEmb, 'vectors.db');
LVectorDB.AddDocument('doc1', 'AI and deep learning research.');
LSearchResults := LVectorDB.Search('machine learning', 3);
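Semantic search generally works by comparing the embedding of the query with the embeddings of stored documents and returning the closest matches. The sketch below shows cosine similarity, a commonly used closeness measure. It is an illustration only: how `TsoVectorDatabase` scores documents internally is not stated here, and `CosineSimilarity` is a hypothetical standalone function, not a Sophora API.

```pascal
program CosineDemo;

{$APPTYPE CONSOLE}

// Cosine similarity of two equal-length vectors: dot(A, B) / (|A| * |B|).
// Returns 0 when the lengths differ, the input is empty, or a norm is zero.
function CosineSimilarity(const A, B: array of Double): Double;
var
  I: Integer;
  LDot, LNormA, LNormB: Double;
begin
  Result := 0.0;
  if (Length(A) = 0) or (Length(A) <> Length(B)) then
    Exit;
  LDot := 0.0;
  LNormA := 0.0;
  LNormB := 0.0;
  for I := 0 to High(A) do
  begin
    LDot := LDot + A[I] * B[I];
    LNormA := LNormA + A[I] * A[I];
    LNormB := LNormB + B[I] * B[I];
  end;
  if (LNormA > 0.0) and (LNormB > 0.0) then
    Result := LDot / (Sqrt(LNormA) * Sqrt(LNormB));
end;

begin
  // Toy 3-dimensional vectors; real embedding vectors are much longer.
  Writeln(CosineSimilarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]):0:3); // same direction: 1.000
  Writeln(CosineSimilarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):0:3); // orthogonal: 0.000
end.
```

A score near 1 means the two texts point in nearly the same direction in embedding space, which is why a query such as 'machine learning' can match a document about deep learning research even when the words differ.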
Sophora provides detailed performance tracking:
- Input Tokens: Number of tokens processed.
- Output Tokens: Tokens generated by the model.
- Speed: Processing speed in tokens per second.
Performance:
Input : 15 tokens
Output: 156 tokens
Speed : 49.68 tokens/sec
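A tokens-per-second figure like the one above is a token count divided by elapsed time. The sketch below shows that arithmetic with `TStopwatch` from `System.Diagnostics`. It rests on assumptions: `TokensPerSecond` is a hypothetical helper, the `Sleep` call merely stands in for generation, and whether Sophora divides output tokens only or all processed tokens is not stated here.

```pascal
program TokenSpeedDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Hypothetical helper: tokens divided by elapsed seconds.
// Returns 0 when no time has elapsed, to avoid dividing by zero.
function TokensPerSecond(ATokens: Integer; AElapsedSeconds: Double): Double;
begin
  if AElapsedSeconds > 0.0 then
    Result := ATokens / AElapsedSeconds
  else
    Result := 0.0;
end;

var
  LWatch: TStopwatch;
  LSeconds: Double;
begin
  LWatch := TStopwatch.StartNew;
  Sleep(250); // stand-in for the generation step being timed
  LWatch.Stop;
  LSeconds := LWatch.ElapsedMilliseconds / 1000.0;
  // 156 is the output token count from the sample report above.
  Writeln(Format('Speed : %.2f tokens/sec', [TokensPerSecond(156, LSeconds)]));
end.
```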
Note: This repository is currently in the setup phase, and full documentation is not yet available. However, the code is fully functional and generally stable. Additional examples, guides, and API documentation will be added soon. Stay tuned: this README, along with other resources, will be continuously updated!
Deep Dive Podcast
Discover in-depth discussions and insights about Sophora and its innovative features.
- Report issues via the Issue Tracker.
- Engage in discussions on the Forum and Discord.
- Learn more at Learn Delphi.
Contributions to Sophora are highly encouraged!
- Report Issues: Submit issues if you encounter bugs or need help.
- Suggest Features: Share your ideas to make Sophora even better.
- Create Pull Requests: Help expand the capabilities and robustness of the library.
Your contributions make a difference!
Sophora is distributed under the BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions.
See the LICENSE file for more details.
If you find this project useful, please consider sponsoring this project. Your support helps sustain development, improve features, and keep the project thriving.
If you're unable to support financially, there are many other ways to contribute:
- Star the repo: It helps increase visibility and shows appreciation.
- Spread the word: Share the project with others who might find it useful.
- Report bugs: Help improve the project by identifying and reporting issues.
- Submit fixes: Found a bug? Fix it and contribute!
- Make suggestions: Share ideas for improvements and new features.
Every contribution, big or small, helps make this project better. Thank you for your support!
Sophora AI Toolkit: A Powerful Local AI Framework for Delphi with Fast Token Streaming, Deep Reasoning, RAG, and Vector Search!