manta

A lightweight P2P-based cache system for model distributions on Kubernetes.


Name Story: the name Manta is inspired by the Dota 2 item Manta Style, which creates two images of your hero, just like peers in a P2P network.

We're reframing Manta to make it a general distributed cache system with a POSIX promise. The current capabilities are still available in the latest v0.0.4 release. Let's see what happens.

Architecture

(architecture diagram)

Note: llmaz is just one kind of integration; Manta can be deployed and used independently.

Features Overview

  • Model Hub Support: Models can be downloaded directly from model hubs (Hugging Face, etc.) or object storage, with no extra effort.
  • Model Preheat: Models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
  • Model Cache: Models are cached as chunks after downloading for faster model loading.
  • Model Lifecycle Management: Model lifecycles are managed automatically with different strategies, such as Retain or Delete.
  • Plugin Framework: Filter and Score plugins can be extended to pick the best candidate nodes.
  • Memory Management (WIP): Manage the memory reserved for caching, together with an LRU algorithm for GC.

You Should Know Before

  • Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disks, helping you save money.
  • It requires no additional components like databases or storage systems, simplifying setup and reducing operational effort.
  • All models are stored under the host path /mnt/models/.
  • After all, it's just a cache system.

Quick Start

Installation

Read the Installation for guidance.

Preheat Model

A sample to preheat the Qwen/Qwen2.5-0.5B-Instruct model. Once preheated, the model is loaded from the cache rather than fetched cold from the hub.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct

If you want to preload the model to specific nodes, use a nodeSelector:

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    foo: bar
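
The foo: bar selector matches labels on Node objects. A matching node would carry that label in its metadata; the node name below is a hypothetical example:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1    # hypothetical node name
  labels:
    foo: bar        # matched by the Torrent's nodeSelector
```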

Use Model

Once you have a Torrent, you can access the model from the host path /mnt/models/. All you need to do is set a Pod label like:

metadata:
  labels:
    manta.io/torrent-name: "torrent-sample"
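
For example, a complete Pod might look like the following sketch. The Pod name, container image, and in-container mount path are illustrative assumptions; the model weights are surfaced via a hostPath volume pointing at /mnt/models/, where Manta stores them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qwen-inference                     # hypothetical Pod name
  labels:
    manta.io/torrent-name: "torrent-sample"
spec:
  containers:
    - name: server
      image: your-inference-image:latest   # illustrative image
      volumeMounts:
        - name: models
          mountPath: /mnt/models           # illustrative in-container path
  volumes:
    - name: models
      hostPath:
        path: /mnt/models                  # host path where Manta stores models
        type: Directory
```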

Note: you can make the Torrent standby by setting preheat to false (true by default); preheating will then happen at runtime, which will obviously slow down model loading.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false

Delete Model

If you want the model weights removed once the Torrent is deleted, set reclaimPolicy: Delete (defaults to Retain):

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete

For more details, refer to the APIs.

Roadmap

In the long term, we hope to make Manta a unified cache system for MLOps.

  • Preloading datasets from model hubs
  • RDMA support for faster model loading
  • More integrations with MLOps systems, including training and serving

Community

Join us for more discussions.

Contributions

All kinds of contributions are welcome! Please follow CONTRIBUTING.md.
