[Enhancement]: Integration of Gemini 2.0 (audio output) #6067

Open
1 task done
goreply-de opened this issue Feb 26, 2025 · 0 comments
Labels
✨ enhancement New feature or request


What features would you like to see added?

Adding support to generate audio output would be great.

More details

The gemini-2.0-flash-exp model can now generate audio output. The generated audio response can also be downloaded afterwards.

To try this feature on GCP, navigate to ‘Vertex AI’ → ‘Vertex AI Studio’ → ‘Freeform’, then select ‘gemini-2.0-flash-exp’ as the model and ‘Audio’ as the response output type. The attached recording shows the process (sadly without the audio). The response text was read aloud when I clicked the play button. Running the same prompt multiple times produces different audio files (different voices, speed, pronunciation, etc.).

The UI needs to be adjusted for models that support this kind of output. Some endpoints probably need to be adjusted as well to handle the new kind of data.
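To illustrate what handling the new kind of data could involve, here is a minimal sketch that separates text from audio in a model response. It assumes the response parts are plain dictionaries in which audio arrives as a base64-encoded `inlineData` entry with a `mimeType` and a `data` field. That shape is an assumption for illustration and should be checked against what the experimental model actually returns. The names `AudioAttachment` and `split_parts` are hypothetical and not part of any existing codebase.

```python
import base64
from dataclasses import dataclass


@dataclass
class AudioAttachment:
    """Decoded audio payload plus its MIME type (hypothetical helper type)."""
    mime_type: str
    data: bytes


def split_parts(parts):
    """Split response parts into concatenated text and a list of audio attachments.

    Assumes each part is a dict holding either a "text" key or an
    "inlineData" dict with "mimeType" and base64-encoded "data".
    """
    texts = []
    audio = []
    for part in parts:
        if "text" in part:
            texts.append(part["text"])
        inline = part.get("inlineData")
        # Only keep inline payloads whose MIME type marks them as audio.
        if inline and inline.get("mimeType", "").startswith("audio/"):
            audio.append(
                AudioAttachment(inline["mimeType"], base64.b64decode(inline["data"]))
            )
    return "".join(texts), audio


if __name__ == "__main__":
    # Fabricated sample payload, for demonstration only.
    sample = [
        {"text": "Hello"},
        {"inlineData": {"mimeType": "audio/wav",
                        "data": base64.b64encode(b"RIFF").decode()}},
    ]
    text, clips = split_parts(sample)
    print(text, [clip.mime_type for clip in clips])
```

With a split like this, an endpoint could keep returning the text as before and pass each attachment to the UI, which would render a player and a download link for it.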

Note: Currently the feature is still experimental.

Which components are impacted by your request?

UI, General, Endpoints

Pictures

Image

Image

Code of Conduct

  • I agree to follow this project's Code of Conduct

goreply-de added the ✨ enhancement label Feb 26, 2025
Repository owner locked and limited conversation to collaborators Mar 3, 2025
danny-avila converted this issue into a discussion Mar 3, 2025
danny-avila reopened this Mar 3, 2025
Repository owner unlocked this conversation Mar 3, 2025