AgentCrossTalk

This project uses Google Gemini to build a simple chatbot application that simulates two crosstalk performers, Dougen and Penggen, performing on user-provided topics given as text or image input (audio input coming soon).

The project was completed by Yue Su, Shunyuan Mao, Ting Wang, Yingying Li, and Haonan Shi.

You are welcome to visit our project page.

Project Details

This project consists of the following Python files:

main.py: The main program file; handles user interaction with multimodal input.
crosstalk.py: Contains the logic for the crosstalk performance, including calls to Gemini and dialogue generation for Dougen and Penggen.
config.py: Contains the API key and other configuration details.
crosstalk_utils.py: Utilities that help the BLIP model extract a topic from an image, plus audio helpers.
dianatalk.py: A sample VTuber talk. You can try interacting with Diana.
tts_speech.py: Standard audio output.
ui_elements.py: UI window design.
Vtuber_speech.py: VTuber audio. You can switch to your preferred VTuber on Hugging Face through the link.

How to Run

Install required libraries:

conda create -n crosstalk python==3.11
conda activate crosstalk
pip install -r requirements.txt

Obtain the Google Gemini API key:

You need to register a Google account, enable the Gemini API, and obtain an API key. For details, refer to the official Google AI documentation: Google AI Studio API docs

Copy your API key into the first line of the api_key.txt file. Do not commit api_key.txt to version control!
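
As an illustration of the "first line of api_key.txt" convention, here is a minimal sketch of how such a file could be read. The function name `load_api_key` is hypothetical and is not taken from config.py; the project's actual loading code may differ.

```python
from pathlib import Path


def load_api_key(path="api_key.txt"):
    """Return the key stored on the first line of `path`.

    Hypothetical helper for illustration; not the project's own code.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Only the first line is used; surrounding whitespace is ignored.
    first_line = lines[0].strip() if lines else ""
    if not first_line:
        raise ValueError(f"{path} is empty; put your Gemini API key on the first line")
    return first_line
```

Failing early with a clear message when the file is empty tends to be easier to debug than an authentication error raised later by the API client.
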
Run the application:
```
conda activate crosstalk
python main.py
```
This starts a GUI window where you can enter a topic and click the "Start Performance" button. The program then simulates two crosstalk performers discussing and performing on your topic, and displays the conversation in the chat area. You can also add image input or launch the VTuber voice via the GUI buttons.

Example

Enter "Artificial Intelligence" in the input box and click the "Start Performance" button. You will see two crosstalk performers discussing and performing around the topic of artificial intelligence.
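
The turn-taking described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in crosstalk.py: the function `perform`, the prompt wording, and the stand-in `fake_generate` are all hypothetical, and a real run would replace `fake_generate` with a call to Gemini.

```python
from typing import Callable, List, Tuple

PERFORMERS = ("Dougen", "Penggen")


def perform(topic: str, generate: Callable[[str], str], rounds: int = 2) -> List[Tuple[str, str]]:
    """Alternate between the two performers and collect (speaker, line) pairs.

    `generate` is any function that maps a prompt to a reply (hypothetical
    stand-in for the model call).
    """
    transcript: List[Tuple[str, str]] = []
    for turn in range(rounds * len(PERFORMERS)):
        speaker = PERFORMERS[turn % len(PERFORMERS)]
        # Feed the dialogue so far back in, so each reply can build on the last.
        history = "\n".join(f"{name}: {line}" for name, line in transcript)
        prompt = (
            f"You are {speaker}, a crosstalk performer. Topic: {topic}.\n"
            f"Dialogue so far:\n{history}\n"
            f"Reply with {speaker}'s next line."
        )
        transcript.append((speaker, generate(prompt)))
    return transcript


def fake_generate(prompt: str) -> str:
    """Placeholder for a model call; returns a canned line."""
    return f"(line generated from a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    for speaker, line in perform("Artificial Intelligence", fake_generate, rounds=1):
        print(f"{speaker}: {line}")
```

Passing the generator in as a parameter keeps the turn-taking logic testable without network access or an API key.
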

Notes

Ensure that you have Python 3.7 or later installed (the conda environment in "How to Run" uses Python 3.11).
The quality and coherence of the crosstalk performance may vary due to limitations of the Gemini API.
This project is for demonstration and educational purposes only. Please adhere to the Google Cloud Platform's terms of service and usage limitations.

File Structure

├── main.py         # Main program, handles GUI logic
├── crosstalk.py    # Logic for crosstalk performance
├── config.py       # Configuration (API Key and model initialization)
├── api_key.txt     # File containing the Gemini API key (handle securely)
└── ... (Core files are above)

Selen-Suyue/AgentCrossTalk

Folders and files

Latest commit

History

Repository files navigation

AgentCrossTalk

Project Details

How to Run

Example

Notes

File Structure

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages