This repository contains a proof-of-concept Python server that acts as a proxy for interacting with large language model (LLM) WebSocket APIs. It currently supports Gemini and enables real-time communication over WebRTC and WebTransport.
- Python installed on your system
- Install requirements with `pip install -r requirements.txt`
- A valid `GOOGLE_API_KEY`
- Set the `GOOGLE_API_KEY` environment variable and run the server: `GOOGLE_API_KEY=XXX python proxy.py`
- Open the test page in your browser at http://localhost:8080/demo/index.html
- Click Start and begin talking with the LLM
```js
// Capture microphone audio (navigator.mediaDevices.getUserMedia replaces the
// deprecated navigator.getUserMedia)
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const pc = new RTCPeerConnection();

// Play the remote audio returned by the proxy in an <audio> element on the page
pc.ontrack = e => audioElement.srcObject = e.streams[0];
pc.addTrack(stream.getTracks()[0]);

// Create an SDP offer and POST it to the proxy
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
const resp = await fetch('http://localhost:8080/', { method: 'POST', body: offer.sdp });
// The proxy replies with an SDP answer, completing the handshake
await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
```
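The roadmap below also lists basic data channel and video support. The following is a minimal sketch of how the same peer connection could be extended; the channel label, the plain-text message format, and whether the proxy forwards data channel messages to the LLM are assumptions, so check demo/index.html for the actual wiring.

```js
// Hedged sketch: the 'chat' label and plain-text message format are assumptions,
// not confirmed by this README. Create the channel before createOffer() so it is
// included in the SDP negotiation.
const channel = pc.createDataChannel('chat');
channel.onopen = () => channel.send('Hello from the browser');
channel.onmessage = e => console.log('LLM:', e.data);

// Video works the same way as audio: request a video track and add it to the
// peer connection before the offer is created (replaces the audio-only request above).
const av = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
av.getTracks().forEach(track => pc.addTrack(track, av));
```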
- [X] WebRTC interface
- [X] Audio support
- [X] Basic datachannels support
- [X] Gemini integration
- [ ] WebTransport interface (WIP)
- [X] Video support
- [ ] Implement HTTP real-time control interface