manta

A lightweight P2P-based cache system for model distributions on Kubernetes.


Name Story: the name Manta is inspired by the Dota 2 item Manta Style, which creates two images of your hero, just like peers in a P2P network.

We're reframing Manta to make it a general distributed cache system with a POSIX promise. The current capabilities are still available in the latest v0.0.4 release. Let's see what happens.

Architecture

(architecture diagram)

Note: llmaz is just one kind of integration; Manta can be deployed and used independently.

Features Overview

  • Model Hub Support: Models can be downloaded directly from model hubs (Hugging Face, etc.) or object storage, with no extra effort.
  • Model Preheat: Models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
  • Model Cache: Models are cached as chunks after downloading for faster model loading.
  • Model Lifecycle Management: Model lifecycles are managed automatically with different strategies, such as Retain or Delete.
  • Plugin Framework: Filter and Score plugins can be extended to pick the best candidate nodes.
  • Memory Management (WIP): Manage the memory reserved for caching, together with an LRU algorithm for GC.

You Should Know Before

  • Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disks, helping you save money.
  • It requires no additional components like databases or storage systems, simplifying setup and reducing operational effort.
  • All models are stored under the host path /mnt/models/.
  • After all, it's just a cache system.

Quick Start

Installation

Read the Installation for guidance.

Preheat Model

A sample to preheat the Qwen/Qwen2.5-0.5B-Instruct model. Once preheated, the model is loaded from the cache rather than fetched cold from the hub.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct

If you want to preload the model to specific nodes, use a nodeSelector:

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    foo: bar
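
The foo: bar selector matches labels on Node objects. A matching node would carry that label in its metadata; the node name below is a hypothetical example:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1    # hypothetical node name
  labels:
    foo: bar        # matched by the Torrent's nodeSelector
```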

Use Model

Once you have a Torrent, you can access the model from the host path /mnt/models/. All you need to do is set a Pod label like:

metadata:
  labels:
    manta.io/torrent-name: "torrent-sample"
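
For example, a complete Pod might look like the following sketch. The Pod name, container image, and in-container mount path are illustrative assumptions; the model weights are surfaced via a hostPath volume pointing at /mnt/models/, where Manta stores them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qwen-inference                     # hypothetical Pod name
  labels:
    manta.io/torrent-name: "torrent-sample"
spec:
  containers:
    - name: server
      image: your-inference-image:latest   # illustrative image
      volumeMounts:
        - name: models
          mountPath: /mnt/models           # illustrative in-container path
  volumes:
    - name: models
      hostPath:
        path: /mnt/models                  # host path where Manta stores models
        type: Directory
```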

Note: you can make the Torrent standby by setting preheat to false (true by default); preheating will then happen at runtime, which will obviously slow down model loading.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false

Delete Model

If you want the model weights removed once the Torrent is deleted, set reclaimPolicy: Delete (defaults to Retain):

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete

For more details, refer to the APIs.

Roadmap

In the long term, we hope to make Manta a unified cache system for MLOps.

  • Preloading datasets from model hubs
  • RDMA support for faster model loading
  • More integrations with MLOps systems, including training and serving

Community

Join us for more discussions.

Contributions

All kinds of contributions are welcome! Please follow CONTRIBUTING.md.
