The AI Engine is a flexible system for structured data collection and workflow processing using LLMs. It processes JSON-based configurations to orchestrate complex data extraction and decision-making workflows.
- **Data Definitions** - Define collectable fields with specific types:
  - `string` - Text-based responses
  - `numeric` - Numerical values
  - `object` - Complex nested structures
  - `list` - Array processing
- **Workflow Definitions** - Organize execution flow:
  - Prompt-based workflows (independent)
  - Explanation-based workflows (dependent)
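As an illustrative sketch of how the four data types might be combined in one configuration: the field names (`rating`, `author`, `tags`) and the nested `fields` key for `object` types are hypothetical, not taken from the library's documentation.

```python
# Illustrative configuration sketch. The field names and the nested
# "fields" syntax for object types are assumptions for illustration.
config = {
    "data": {
        "summary": {"type": "string", "prompt": "Summarize the content"},
        "rating": {"type": "numeric", "prompt": "Rate the content from 1 to 10"},
        "author": {
            "type": "object",
            "prompt": "Extract author details",
            "fields": {  # assumed nested-field syntax
                "name": {"type": "string", "prompt": "Author name"},
            },
        },
        "tags": {"type": "list", "prompt": "List topic tags"},
    },
    "workflow": {
        "analyze": {"prompt": "Analyze the content", "data": ["summary", "rating"]},
    },
}
```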
The implementation provides several sophisticated features:
- **AIEngine Class** - Central orchestrator
  - Parallel execution support
  - Result caching
  - Workflow management
- **Executors**
  - `DataExecutor` - Handles data processing operations
  - `WorkflowExecutor` - Manages workflow orchestration
- **Key Features**
  - Parallel processing
  - Thread-safe caching
  - Multiple LLM provider support
  - Robust error handling
  - Extensible architecture
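Thread-safe result caching is a general pattern rather than anything specific to this codebase; a minimal sketch of guarding a result cache with a lock (illustrative only, the engine's internal implementation may differ):

```python
import threading

class ResultCache:
    """Minimal thread-safe result cache (illustrative pattern only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def get_or_compute(self, key, compute):
        # Fast path: return a cached result if one exists.
        with self._lock:
            if key in self._store:
                return self._store[key]
        # Compute outside the lock so slow LLM calls don't block other threads.
        value = compute()
        with self._lock:
            # setdefault keeps the first result if two threads raced.
            self._store.setdefault(key, value)
            return self._store[key]

cache = ResultCache()
cache.get_or_compute("summary", lambda: "expensive result")
```

The key design point is releasing the lock during computation: with LLM calls taking seconds, holding a lock across them would serialize all workers.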
Here's how to use the AI Engine:
```python
import asyncio

from ai_engine import AIEngine
from langchain_openai import ChatOpenAI

# Initialize model
model = ChatOpenAI(
    openai_api_key="your-key",
    model_name="gpt-4o",
)

# Configure engine
config = {
    "data": {
        "summary": {
            "type": "string",
            "prompt": "Summarize the content"
        }
    },
    "workflow": {
        "analyze": {
            "prompt": "Analyze the content",
            "data": ["summary"]
        }
    }
}

# Initialize engine
engine = AIEngine(config, model)

# Execute
async def run():
    result = await engine.execute("Your content here")
    print(result)

# Run with asyncio
asyncio.run(run())
```
- Install direnv:

  ```bash
  # macOS
  brew install direnv

  # Linux
  curl -sfL https://direnv.net/install.sh | bash
  ```

- Install devbox:

  ```bash
  curl -fsSL https://get.jetpack.io/devbox | bash
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/your-org/ai-engine.git
  cd ai-engine
  ```

- Create a `.envrc` file:

  ```bash
  export OPENAI_API_KEY="your-key-here"
  ```

- Allow direnv:

  ```bash
  direnv allow
  ```

- Initialize devbox:

  ```bash
  devbox init
  ```

- Install dependencies:

  ```bash
  devbox install
  ```

- Start the development shell:

  ```bash
  devbox shell
  ```

- Install Poetry:

  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  ```

  Alternatively, install it through devbox (devbox should already provide it):

  ```bash
  devbox add poetry
  ```

- Install project dependencies through Poetry:

  ```bash
  poetry install
  ```
The project uses GitHub Actions for automated publishing to PyPI. The workflow is triggered when you push a version tag.
- Configure the GitHub repository:
  - Go to repository Settings → Secrets and variables → Actions
  - Add a new secret named `PYPI_TOKEN` with your PyPI API token
- Update the version in `pyproject.toml`:

  ```bash
  poetry version patch  # For patch version bump
  # or
  poetry version minor  # For minor version bump
  # or
  poetry version major  # For major version bump
  ```
- Commit your changes:

  ```bash
  git add pyproject.toml
  git commit -m "Bump version to x.y.z"
  ```

- Create and push a version tag:

  ```bash
  git tag vx.y.z  # Replace with your version (e.g., v1.0.0)
  git push origin vx.y.z
  ```
The GitHub Action will automatically:
- Build the package
- Publish to PyPI
- Create a release on GitHub
Note: To publish to Test PyPI first, you can run manually:

```bash
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi
```
The project uses modern development tools (devbox and direnv) to ensure consistent development environments and secure credential management. The implementation supports parallel processing, caching, and multiple LLM providers while maintaining a clean, extensible architecture.
The AI Engine roadmap includes implementing robust RAG capabilities:
- **Document Processing**
  - PDF, markdown, and plain text ingestion
  - Document chunking and preprocessing
  - Metadata extraction and indexing
- **Vector Store Integration**
  - Support for multiple vector databases (Pinecone, Weaviate, etc.)
  - Efficient similarity search
  - Hybrid search capabilities
- **Context Enhancement**
  - Dynamic context window management
  - Relevance scoring and filtering
  - Context compression techniques
- **Advanced Features**
  - Multi-document reasoning
  - Cross-reference validation
  - Source attribution and citation
  - Incremental learning capabilities
These enhancements will enable the AI Engine to:
- Process and understand large document collections
- Provide more accurate and contextual responses
- Support domain-specific knowledge bases
- Maintain traceability to source materials
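To make the planned similarity-search step concrete, here is a minimal cosine-similarity retrieval sketch in pure Python. It illustrates the idea only; it is not the planned API, uses no vector database, and the sample chunks and embeddings are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy document chunks with hand-made 3-dimensional "embeddings".
chunks = [
    ("intro to devbox", [0.9, 0.1, 0.0]),
    ("poetry publishing", [0.1, 0.9, 0.0]),
    ("async workflows", [0.0, 0.2, 0.9]),
]
top_k([1.0, 0.0, 0.0], chunks, k=1)  # → ["intro to devbox"]
```

A production version would replace the hand-made vectors with embeddings from a model and delegate `top_k` to a vector store's index, but the ranking logic is the same.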