Noob question - Input text file? #1101

BrainstormerPrime · 2022-01-13T08:13:20Z

I hope this is a good place to ask this; maybe some other noobs haven't been able to figure this out either.

I just installed TTS from PyPI using "pip install TTS" and have been doing some tests from the terminal, but I cannot figure out how to feed it the text file I want it to read. I can get it to read a short piece of inline text, but it doesn't seem able to take an input text file as an argument. It just reads the name of the text file I give it. I can't find anywhere in the documentation that describes how to do this.

Or is there some UI other than the terminal interface that I should be trying? There has to be a way other than typing out the text to be spoken in its entirety when invoking the TTS command from CMD.

Thanks,

Akash7789 · 2022-01-14T10:15:00Z

If it is reading the name of the text file, then you are already giving it your file name as input. You could change a little bit of the code to make it take your text file as input. There may also be a GUI (a local server) available.

3 replies

BrainstormerPrime Jan 14, 2022
Author

I understand it's just reading my file name. I just can't find any command syntax that will use the file contents instead. I've looked up CMD syntax to see if I can redirect the file input, or write a batch file that invokes TTS and passes the file as an argument, but I haven't had much luck yet. I'm using Windows, so a lot of the Bash instructions and Ubuntu forum posts don't help.

I'd love to "change a little bit of code" so it could take a file input. If you know where in the Python code that could be done, that would be great. I've no idea where to begin looking for that. Digging through .py files, lots of error message dialogs, try-catch kind of stuff, and checks... the model and basic instructions are in there somewhere, I'm sure.

It just seems like a huge oversight. Not taking a file as input seems as basic as saving a file as output. If the script just read the text aloud and didn't save it, and you had to redirect the audio output to Audacity to record it or something, that would obviously not be a workable interface. Taking a file as input seems equally necessary for it to be really functional.

Akash7789 Jan 15, 2022

Do you just want to give it a text file as input, or do you want to set it up for custom training?

BrainstormerPrime Jan 16, 2022
Author

I did download the voice data set, though I'm just starting, so I don't think I'll be really capable of building and training my own models for a while yet.

It just seems like it should be able to take a file for input. For text-to-speech, .TXT -> .WAV would seem like the natural way to do it. Without that, using the CLI for large amounts of text, like a full audiobook or script (especially since having return characters breaks it), is really user-unfriendly.

notklaatu · 2022-01-24T07:46:46Z

I'm also interested in this.

For example, I want to synthesize the phrase "hello world".

First, I place the phrase into a text file:

$ echo "Hello world" > hello.txt

Then I want Coqui to synthesize the speech. I imagine a command like this:

$ cat hello.txt | tts --model_name tts_models/en/vctk/vits --speaker_idx p225

Or this:

$ tts --model_name tts_models/en/vctk/vits --speaker_idx p225 --input_file hello.txt

The result would be a WAV file containing the speech "hello world", NOT "hello.txt".

I understand that it's possible to synthesize "hello world" with the --text "hello world" option, but imagine that instead I wanted to synthesize 2000 words from a text file, and not just two being used as an example.

I've tried:

$ cat example.txt | tts  --model_name tts_models/en/vctk/vits --speaker_idx p225
$ tts  --model_name tts_models/en/vctk/vits --speaker_idx p225 --text $(< example.txt)
$ cat example.txt - |  tts  --model_name tts_models/en/vctk/vits --speaker_idx p225 --text -

But none of these "tricks" work.

2 replies

ebenfarnworth Jan 24, 2022

I'm interested too. This is what I was able to set up with other TTS Linux software: https://ebenfarnworth.substack.com/p/text-to-speech-tts-on-linux I'm interested in setting up something similar using Coqui.

milosimpson Sep 8, 2022

Workaround using environment variables on a Mac / Linux.

# say you have a sample.txt file you want it to process
# load all the text as an environment variable
export SAMPLE=`cat sample.txt`

# sanity check, echo the environment variable, and make sure you see the text you expect
echo "$SAMPLE"

# quote the variable so the whole text is passed as a single --text argument
tts  --model_name tts_models/en/vctk/vits --speaker_idx p225 --text "$SAMPLE"

I have found that I need to scrub the text of all double quotes, as tts doesn't like them.
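
For anyone who would rather avoid shell quoting altogether, here is a rough Python equivalent of this workaround. It reads the file, drops double quotes (as suggested above), flattens line breaks, and hands the text to the tts command as a single argument. The helper names and the sample.txt file name are made up for illustration; the tts flags are the ones already used in this thread.

```python
import subprocess
from pathlib import Path


def load_clean_text(path):
    """Read a text file, drop double quotes, and collapse all whitespace."""
    raw = Path(path).read_text(encoding="utf-8")
    no_quotes = raw.replace('"', "")
    # split() with no arguments also removes newlines and repeated spaces
    return " ".join(no_quotes.split())


def build_tts_command(text, out_path="tts_output.wav"):
    """Build the argument list; the text stays one argument, spaces and all."""
    return [
        "tts",
        "--model_name", "tts_models/en/vctk/vits",
        "--speaker_idx", "p225",
        "--text", text,
        "--out_path", out_path,
    ]


def run_tts_on_file(path, out_path="tts_output.wav"):
    """Run the tts CLI on the cleaned contents of a text file."""
    # No shell is involved, so no word splitting or quote handling happens.
    subprocess.run(build_tts_command(load_clean_text(path), out_path), check=True)
```

Calling run_tts_on_file("sample.txt") would then do roughly what the export/echo/tts sequence above does, assuming the tts command is on the PATH.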

notklaatu · 2022-01-24T16:07:16Z

Here's a work-around, at least for now, that's working for me.

Launch a TTS server:

tts-server --model_name tts_models/en/vctk/vits  --port 8080

Open a web browser and navigate to localhost:8080. I'm using Firefox, so these instructions apply to it, but I assume Chrome has similar options.
Copy and paste the text you want to synthesize. I've tested 5,000 words (24 minutes of audio) with success. 13,000 words failed.
Click the speak button.
When the process is over, an audio player appears below the text field. Right-click on it and select Save Audio As.
Save the WAV file to your computer.
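
If clicking through the browser gets tedious, the same server can probably be driven from a script. The sketch below rests on an assumption that should be checked against the installed version: that the demo server exposes a GET endpoint at /api/tts which takes the text in a text query parameter and the speaker in speaker_id, and returns WAV bytes. The function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_tts_url(text, speaker_id="p225", base="http://localhost:8080"):
    """Build the request URL; urlencode escapes spaces, newlines, and so on."""
    # Assumption: the server accepts 'text' and 'speaker_id' query parameters.
    query = urllib.parse.urlencode({"text": text, "speaker_id": speaker_id})
    return f"{base}/api/tts?{query}"


def synthesize_via_server(text, wav_path, **kwargs):
    """Fetch audio from a running tts-server and save it to wav_path."""
    with urllib.request.urlopen(build_tts_url(text, **kwargs)) as response:
        audio = response.read()
    with open(wav_path, "wb") as f:
        f.write(audio)
```

Very long texts may not fit in a URL, so this is better suited to short passages sent one at a time.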

It's not my ideal workflow, but this does the trick.

Hope that helps.

2 replies

BrainstormerPrime Feb 17, 2022
Author

Thanks for that. It is a nice feature. But it doesn't seem to be working for me plug-and-play. I pasted my text, and the audio came out with some huge jumps in the text, and I'm not sure if it's from the syntax of my text, or something else. I saw another post (and successfully replicated) that using "hello" vs "hello." produces very different results, with the unpunctuated text having a weird hissing sound trailing off, so maybe it has something to do with the text having lines that end in commas.

But I see in the terminal that I have several instances of "Decoder stopped with 'max_decoder_steps' 500", so that might be the culprit as well.

The web interface also seems to be doing some kind of sentence parsing from my text, based on the command line log.

notklaatu Feb 17, 2022

Agreed, it's a flimsy workaround. The more I try it, the more variation in output I get. Not ideal.

christosangelopoulos · 2022-03-06T18:42:05Z

I have written a small bash script just for that, you might want to have a look,
https://gitlab.com/christosangel/sapo
The script splits the text into more lines with fewer characters each, then feeds the text line by line. At the end, it uses sox to concatenate all the WAV files into one.
The output does need some editing in the end, but not much, at least no text is missing.
Feel free to give feedback.
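
The approach described here (split the text into shorter pieces, synthesize each, then join the audio) can be sketched in Python. This is not the sapo script itself, only an illustration under some assumptions: textwrap does the splitting, the standard wave module stands in for sox, and the function names and the 200-character limit are arbitrary. The per-chunk synthesis step is left out.

```python
import textwrap
import wave


def split_into_chunks(text, max_chars=200):
    """Break text into pieces of at most max_chars, splitting on whitespace."""
    return textwrap.wrap(" ".join(text.split()), width=max_chars)


def concatenate_wavs(wav_paths, out_path):
    """Join WAV files that share the same channels, sample width, and rate."""
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(wav_paths):
            with wave.open(path, "rb") as part:
                if i == 0:
                    # Copy the audio format from the first file
                    out.setparams(part.getparams())
                out.writeframes(part.readframes(part.getnframes()))
```

Each chunk would be synthesized to its own WAV file (for example with one tts call per chunk), and the resulting paths passed to concatenate_wavs in order.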

2 replies

ghost Oct 25, 2022

F******ck. If only I had seen this yesterday. I've also done a script to do the same thing. I should have read all the answers before starting.

christosangelopoulos Nov 3, 2022

I am glad you find it useful. It is still a work in progress, as I try to eliminate the mispronounced words with the sed files, and it is not a small task.

FrontierDK · 2022-04-05T19:16:55Z

In the future, will it be possible to use a text file as input?

pvonmoradi · 2022-10-11T19:27:53Z

Just use xargs:

cat my_text_file.txt | xargs -0 tts --model_name "tts_models/en/ljspeech/tacotron2-DDC"  --out_path "out_tacotron2.wav" --text

This passes the content of my_text_file.txt as the argument that follows --text.

Note that I found out tts can't detect sentences from input properly nor does it support some --process-on-blank-line-like option. So you need to tokenize the text into sentences using some NLP tools specific to your language. I tested this and it worked nicely (single binary and all).
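
As a rough illustration of the sentence-tokenizing step, here is a naive regex splitter in Python. It is only a stand-in for a proper NLP tool (it will mis-split abbreviations such as "e.g."), and the function name is made up.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())  # join lines and collapse whitespace
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [p for p in parts if p]
```

Each resulting sentence could then be written on its own line or passed to tts separately.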

3 replies

ghost Oct 23, 2022

> Just use xargs :
> cat my_text_file.txt | xargs -0 tts --model_name "tts_models/en/ljspeech/tacotron2-DDC"  --out_path "out_tacotron2.wav" --text
> This puts the content of my_text_file.txt as an argument in front of --text.
>
> Note that I found out tts can't detect sentences from input properly nor does it support some --process-on-blank-line-like option. So you need to tokenize the text into sentences using some NLP tools specific to your language. I tested this and it worked nicely (single binary and all).

Could you expand on how to use that binary combined with the TTS? Would it be like this?

ttst() {
  cat $1 | sentences | xargs -0 tts --model_name "tts_models/en/ljspeech/tacotron2-DDC"  --out_path "${2:-out.wav}"
}

Isn't there a program that combines both automatically to just do something like tts -i test.txt -o test.wav?

pvonmoradi Oct 23, 2022

> Could you expand on how to use that binary combined with the TTS? Would it be like this?
> ttst() {
>   cat $1 | sentences | xargs -0 tts --model_name "tts_models/en/ljspeech/tacotron2-DDC"  --out_path "${2:-out.wav}"
> }

Yes, but you missed the --text at the end. Also, quote the shell variable (cat "$1").

> Isn't there a program that combines both automatically to just do something like tts -i test.txt -o test.wav?

I don't know how TTS models work but I think they are supposed to tokenize input so external processing like sentences I mentioned would not be needed.
BTW, I'm now using this tool which, in my opinion, produces far better results (and takes much longer too). There is a nifty CLI script there too for the use-case you described.

miguelgh65 Oct 30, 2022

For this issue I just use this:

SAMPLE=$(cat file)
echo "${SAMPLE//$'\n'/\n}"
sudo tts --text "${SAMPLE//$'\n'/\n}" --out_path output/prueba.wav

It worked for me. It's an easier solution, similar to the one posted before.

peter415mars · 2022-12-05T15:50:32Z

Go to server/template/index.html and add a file input to the HTML:

<input type="file" name="inputfile" id="inputfile">

Then, at the top of the script, add the following:

    // When a file is chosen, read it and copy its contents into the text field
    document.getElementById('inputfile')
        .addEventListener('change', function () {
            var fr = new FileReader();
            fr.onload = function () {
                document.getElementById('text').value = fr.result;
            };
            fr.readAsText(this.files[0]);
        });

Then launch the server.

3 replies

system1system2 Dec 6, 2022

Are you sure about the syntax here, @peter415mars ? I think there is some involuntary quote.

peter415mars Dec 7, 2022

The formatting was weird, but other than that I do know this will work; I've used this exact script to test some models with txt files. Now, whether it's the best solution, I'm not sure. Also, what do you mean by 'involuntary quote', @system1system2?

system1system2 Dec 7, 2022

It works. Thank you @peter415mars.

But for it to work, the <input type="file" name="inputfile" id="inputfile"> must be placed within the container div.

I placed it above <input id="text" placeholder="Type here..." size=45 type="text" name="text">.

(By "involuntary quote" I was referring to a potential quotation mark that messed up the formatting of your previous answer. Now it looks fine.)

viendocraz · 2023-03-15T17:03:21Z

Also looking for a command line solution on Windows which takes a text file as input (as opposed to modifying the server ui to accept a file input).

2 replies

briankristensen Jul 11, 2023

On Windows, in PowerShell, you could do it like this to speak the contents of test.txt:
tts --model_name tts_models/en/multi-dataset/tortoise-v2 --text $(type test.txt)
In a Linux environment it would be:
tts --model_name tts_models/en/multi-dataset/tortoise-v2 --text $(cat test.txt)
Each command runs type or cat and uses its output as the --text parameter.
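
Another option that sidesteps the shell differences between Windows and Linux is to call the library from Python. This sketch assumes a TTS release recent enough to ship the TTS.api module with a TTS class and a tts_to_file method; older installs, such as those from the start of this thread, may not have it. The function names and the default model are illustrative only.

```python
from pathlib import Path


def read_text_file(path):
    """Return the file contents with line breaks turned into spaces."""
    return " ".join(Path(path).read_text(encoding="utf-8").split())


def speak_file(text_path, wav_path, model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    """Synthesize the contents of text_path into wav_path."""
    # Imported here so the helper above works without TTS installed.
    # Assumption: the installed TTS package provides TTS.api.TTS.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=read_text_file(text_path), file_path=str(wav_path))
```

Saved as a script, this could be run the same way from CMD, PowerShell, or a Linux terminal, for example by calling speak_file("test.txt", "test.wav").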

notklaatu Jul 11, 2023

This works, with one correction: On Linux, there need to be quotes around the sub-shell command:

$ tts --model_name  tts_models/en/jenny/jenny --text "$(cat test.txt)"

This produces a file in the current directory called 'tts_output.wav' containing the spoken contents of the test file. It can take a very long time for files exceeding 100 words or so.

(Also, the tortoise-v2 voice gave me an error for some reason, but jenny works as expected.)
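
The reason the quotes matter is word splitting: without them, the shell breaks the file contents into separate arguments, so only the first word follows --text and the rest arrive as extra arguments. Python's shlex module follows POSIX shell splitting rules closely enough to illustrate this; the command strings below are just examples.

```python
import shlex

# Unquoted: the text is split into separate arguments
unquoted = shlex.split("tts --text Hello big world")

# Quoted: the text stays together as one argument
quoted = shlex.split('tts --text "Hello big world"')

print(unquoted)  # ['tts', '--text', 'Hello', 'big', 'world']
print(quoted)    # ['tts', '--text', 'Hello big world']
```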

xrishox · 2023-11-27T14:51:52Z

I would also like to see it accept a text file as input.

rcidaleassumpo · 2024-01-27T20:02:48Z

On macOS, you can do the following:

tts --text "$(cat ~/text.txt)"

sanketnawale · 2024-04-14T16:46:39Z

I also want to use a txt file as input, but when I give the path it reads the path as the input text. Did you find a solution yet?

Nickwiz · 2024-06-13T21:58:14Z

Added an ad hoc "enhancement".

Changes:

--text can take multiple arguments, e.g. --text "fo bar" baz "more text"
--text_file <FILE1 [ FILE2 [...]]>
Or STDIN

diff --git a/TTS/bin/synthesize.py b/TTS/bin/synthesize.py
index b86252ab..8f62150c 100755
--- a/TTS/bin/synthesize.py
+++ b/TTS/bin/synthesize.py
@@ -170,7 +170,18 @@ def main():
         help="model info using query format: <model_type>/<language>/<dataset>/<model_name>",
     )
 
-    parser.add_argument("--text", type=str, default=None, help="Text to generate speech.")
+    parser.add_argument("--text",
+        type=str,
+        default=None,
+        nargs='*',
+        help="Text to generate speech."
+    )
+    parser.add_argument("--text_file",
+        type=str,
+        default=None,
+        nargs='*',
+        help="Text-file to generate speech."
+    )
 
     # Args for running pre-trained TTS models.
     parser.add_argument(
@@ -224,7 +235,7 @@ def main():
         const=True,
         default=False,
     )
-    
+
     # args for multi-speaker synthesis
     parser.add_argument("--speakers_file_path", type=str, help="JSON file for multi-speaker model.", default=None)
     parser.add_argument("--language_ids_file_path", type=str, help="JSON file for multi-lingual model.", default=None)
@@ -313,9 +324,23 @@ def main():
         default=None,
         help="Voice dir for tortoise model",
     )
-
     args = parser.parse_args()
 
+    if args.text_file is not None:
+        if args.text is None:
+            args.text = []
+        for file in args.text_file:
+            with open(file, 'r', encoding = 'utf8') as f:
+                args.text += f.read().splitlines()
+
+    if not sys.stdin.isatty():
+        if args.text is None:
+            args.text = []
+        args.text += sys.stdin.read().splitlines()
+
+    if args.text is not None:
+        args.text = '\n'.join(args.text)
+
     # print the description if either text or list_models is not set
     check_args = [
         args.text,
@@ -482,7 +507,10 @@ def main():
             )
         elif model_dir is not None:
             wav = synthesizer.tts(
-                args.text, speaker_name=args.speaker_idx, language_name=args.language_idx, speaker_wav=args.speaker_wav
+                args.text,
+                speaker_name=args.speaker_idx,
+                language_name=args.language_idx,
+                speaker_wav=args.speaker_wav
             )
 
         # save the results
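
Stripped of the surrounding synthesizer code, the idea in this patch can be tried in isolation. The sketch below is a simplified stand-in, not the patched synthesize.py: it takes an explicit argument list and an optional stdin string so it can be exercised without a terminal, and the function name is made up.

```python
import argparse


def collect_text(argv, stdin_text=None):
    """Merge --text values, --text_file contents, and piped input into one string."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--text", type=str, nargs="*", default=None)
    parser.add_argument("--text_file", type=str, nargs="*", default=None)
    args = parser.parse_args(argv)

    lines = list(args.text or [])
    for path in args.text_file or []:
        with open(path, "r", encoding="utf8") as f:
            lines += f.read().splitlines()
    if stdin_text:
        lines += stdin_text.splitlines()

    # None means no text was supplied from any source
    return "\n".join(lines) if lines else None
```

For example, collect_text(["--text", "fo bar", "baz"]) joins the two values with a newline, which mirrors how the patch builds args.text before handing it to the synthesizer.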

1 reply

cukabeka Sep 3, 2024

Yes, this would be amazing. Actually I wonder why this scenario is not thought of, it seems quite natural for me to expect to pass longer text files to the CLI tool. So thanks for adding the code! What about a PR? :)
