-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow downloading extra formats in the demo #617
Conversation
Changed Files
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #617 +/- ##
==========================================
+ Coverage 76.75% 78.66% +1.91%
==========================================
Files 46 47 +1
Lines 3446 3900 +454
Branches 470 575 +105
==========================================
+ Hits 2645 3068 +423
- Misses 700 723 +23
- Partials 101 109 +8 ☔ View full report in Codecov by Sentry. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, though I have a number of comments and suggestions.
In general, I find it not very intuitive to have to re-synthesize if I change the output format. I guess from how things work, it's probably unavoidable, but it's not ideal UX. Maybe the File Output box could have a hint indicating that Output Format is where you change what's available for download here?
When you keep just the wav output format, it's confusing that the File Output box is present but you can't interact with it. I wanted to download my audio file, and I tried to find it in the File Output box. It took me a little while to locate the download button in the Audio box above.
everyvoice/cli.py
Outdated
@@ -616,6 +616,12 @@ def demo( | |||
"-s", | |||
help="Specify speakers to be included in the demo. Example: everyvoice demo <path_to_text_to_spec_model> <path_to_spec_to_wav_model> --speaker speaker_1 --speaker Sue", | |||
), | |||
outputs: List[str] = typer.Option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to enumerate the valid options in the help message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for --accelerator
, while I'm thinking about it... And for --language
and --speaker
we should state that they have to be language(s) and speaker(s) known to the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, please address listing valid values for --output-format
, fixing the other help messages is gravy and could go into a separate PR or issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for
--accelerator
, while I'm thinking about it... And for--language
and--speaker
we should state that they have to be language(s) and speaker(s) known to the model.
I think they do get listed, don't they? Like if you type a speaker that doesn't exist, I thought the error message listed out all the possible speakers. The output formats are dependent on the version of everyvoice installed, so we could include that in the help message, but the language and speaker are model-dependent, so we wouldn't be able to include the lists of those in the help message, just in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that the everyvoice demo -h
message should say something like "valid values are the language(s) and speaker(s) the model was trained on", or something to that effect, maybe more concisely. As the documentation stands, if you're not familiar with things yet, it's a bit mysterious how you're supposed to know what values you can use there.
And I know if you're just trained things, it's going to be obvious, but the point the of the help message is to support you when the information is not already obvious to you.
everyvoice/demo/app.py
Outdated
else: | ||
print( | ||
f"Attention: This model is not able to produce '{output}' as an output. The '{output}' option will not be available for selection. Please choose from the following possible outputs: {', '.join(possible_outputs)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a fatal error with an immediate exit, and the message is misleading: it's not that the model can't produce the requested output, it's that the software has no implementation for it. Right now, everyvoice demo -O foo fs2.ckpt voc.ckpt
prints this message about foo
and then continues anyway and crashes with an exception a few lines later.
This is really CLI error checking, it should happen much earlier in this function, in particular before we load any checkpoint, so the error is dumped right away without having to wait 20 seconds or more for models to load first. You might get all this for nearly free if you define the list of valid values for outputs
in cli.py
's demo()
function as I already suggested elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the RAS output specifiers are readalong_xml
and readalong_html
with an underscore instead of a hyphen like in everyvoice synthesize from-text
. They should be unified, using hyphens here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed!
everyvoice/demo/app.py
Outdated
wav_output = wav_writer.get_filename(basename, speaker, language) | ||
file_writer = None | ||
file_output = None | ||
if output_format == SynthesizeOutputFormats.readalong_html.name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and all the other callback constructor calls really ought to be able to be replaced by a single call to get_synthesis_output_callbacks
, no?
I realize you need the wav writer to create the RAS_html writer, but that's already done too.
Refactoring suggestion: have get_synthesis_output_callbacks
return a dict where the key are the output types, and the values are the writers. Then here you can use writers["wav"]
and writers[output_format]
to access the two writers you need, having passed ["wav", output_format]
(or the proper Enum form if need be) as the output_type
argument to get_synthesis_output_callbacks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, this is a good idea. I would definitely be in favour of this refactor, which I think would clean up some of the acrobatics we're currently doing in order to get the wav writer separately.
I agree, but I'm not sure gradio provides a callback for when the output format gets changed. It's possible, but given how expensive it is to synthesize (depending on the underlying hardware), I'm not super opposed to this UX, despite agreeing that it's clunky, as you say.
Yes, the problem is that I don't believe gradio allows you to dynamically change the interface once it's been rendered. So we might be stuck with that. Note that it doesn't get rendered if the only possible output format is "wav" (i.e. if the demo command disables all the other output formats). |
Having issues trying to generate a spec. The other formats being generated are good and valid :-) Also, The demo server is generated this error traceback at times. In this example below, The first time I try to synth / output a "spec" , no messages or "traceback" in logs... ( The web interface did show "Error").
Below is the screen capture of the message presented to the end user from the Gradio web interface when using select output format "spec" after "synth". |
Hi @marctessier I fixed the problem with the Spec in #619. Please test with that PR's code instead of this one. |
Details: - everyvoice demo -h will now list the valid output formats - code is DRYer: we only instantiate the callbacks in synthesize_helper, which returns them so we can access them to find out the output filenames where we need them - the File Output box is hidden when wav is the only output format
Plus minimal unit testing of this updated error checking code.
PR Goal?
This PR allows us to download any of the available output formats through the demo (TextGrid, ReadAlong, Spectrogram, etc). We can also optional disable certain output formats
Fixes?
This adds to fixing #607
Feedback sought?
Test it out, confirm it works as expected.
Priority?
medium
Tests added?
We currently don't test the demo - but we're moving to do that in @joanise 's PR
How to test?
spin up a demo and try synthesizing other formats
Confidence?
medium-high (I've tested it and it works well on my Mac)
Version change?
Related PRs?
EveryVoiceTTS/FastSpeech2_lightning#105