Real time streaming support instead of text chunk stream #700

Open · 5 tasks done
saifulislam79 opened this issue Jan 8, 2025 · 1 comment
Labels: enhancement (New feature or request)
Checks

  • This template is only for feature requests.
  • I have thoroughly reviewed the project documentation but couldn't find any relevant information that meets my needs.
  • I have searched for existing issues, including closed ones, and found no existing discussion.
  • I confirm that I am using English to submit this report in order to facilitate communication.

1. Is this request related to a challenge you're experiencing? Tell us your story.

I have been working with the F5-TTS project for a few days, and I am grateful to the authors because they are active and respond quickly. My question: F5-TTS currently offers chunk-based streaming, but is real-time streaming possible, like the XTTS v2 streaming inference that yields audio bytes? Is there any possibility of adding true streaming inference instead of chunk streaming?

2. What is your suggested solution?

I have found some inference code that looks like streaming, but it merges the chunks and uses cross-fading:
```python
# inference
with torch.inference_mode():
    generated, _ = model_obj.sample(
        cond=audio,
        text=final_text_list,
        duration=duration,
        steps=nfe_step,
        cfg_strength=cfg_strength,
        sway_sampling_coef=sway_sampling_coef,
    )

    generated = generated.to(torch.float32)
    # drop the reference-audio frames, keep only the newly generated mel
    generated = generated[:, ref_audio_len:, :]
    generated_mel_spec = generated.permute(0, 2, 1)
    # decode the mel spectrogram to a waveform with the selected vocoder
    if mel_spec_type == "vocos":
        generated_wave = vocoder.decode(generated_mel_spec)
    elif mel_spec_type == "bigvgan":
        generated_wave = vocoder(generated_mel_spec)
    # rescale back toward the reference audio's loudness
    if rms < target_rms:
        generated_wave = generated_wave * rms / target_rms
```

See https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/utils_infer.py around line number 455. Is it possible to yield the generated wave as a stream?
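In principle, each iteration of that loop already produces one fully decoded waveform, so instead of collecting the chunks and cross-fading them at the end, the function could yield each chunk as soon as the vocoder has decoded it. Below is a minimal, untested sketch of that idea; the function name `stream_infer`, the per-chunk `text_chunk_lists`/`durations` arguments, and the int16 PCM conversion are my own assumptions for illustration, while the inner calls mirror the snippet above.

```python
import numpy as np
import torch


def stream_infer(model_obj, vocoder, audio, ref_audio_len,
                 text_chunk_lists, durations, nfe_step, cfg_strength,
                 sway_sampling_coef, mel_spec_type="vocos",
                 rms=0.1, target_rms=0.1):
    """Yield each generated chunk as raw 16-bit PCM bytes as soon as it is decoded.

    `text_chunk_lists` and `durations` are assumed to be prepared per chunk in
    the same way `final_text_list` and `duration` are prepared in utils_infer.py.
    """
    for final_text_list, duration in zip(text_chunk_lists, durations):
        with torch.inference_mode():
            generated, _ = model_obj.sample(
                cond=audio,
                text=final_text_list,
                duration=duration,
                steps=nfe_step,
                cfg_strength=cfg_strength,
                sway_sampling_coef=sway_sampling_coef,
            )

            generated = generated.to(torch.float32)
            generated = generated[:, ref_audio_len:, :]
            generated_mel_spec = generated.permute(0, 2, 1)
            if mel_spec_type == "vocos":
                generated_wave = vocoder.decode(generated_mel_spec)
            elif mel_spec_type == "bigvgan":
                generated_wave = vocoder(generated_mel_spec)
            if rms < target_rms:
                generated_wave = generated_wave * rms / target_rms

        # Instead of appending to a list and cross-fading later, hand the
        # chunk to the caller immediately as int16 PCM bytes.
        wave_np = generated_wave.squeeze().cpu().numpy()
        wave_np = np.clip(wave_np, -1.0, 1.0)
        yield (wave_np * 32767).astype(np.int16).tobytes()
```

A caller (for example a FastAPI or WebSocket endpoint) could iterate over this generator and push each byte chunk to the client while the next chunk is still being synthesized, similar to how the XTTS v2 streaming interface is typically used. Note this is still chunk-level streaming of the existing sampler, not frame-by-frame streaming from inside `sample()`.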

3. Additional context or comments

Details are already shared in the sections above.

4. Can you help us with this feature?

  • I am interested in contributing to this feature.
@saifulislam79 saifulislam79 added the enhancement New feature or request label Jan 8, 2025
SWivid (Owner) commented Jan 10, 2025

> I have found some inference code that looks like streaming, but it merges the chunks and uses cross-fading

it's just chunk inference

welcome pr~
