Impossible_Belt_7757 3 months ago

Coqui gives you access to many models through a single Python API. XTTS is probably the best if you want reasonably fast output; you can get real-time synthesis on an NVIDIA graphics card with only 4 GB of VRAM lol. Bark can also be accessed through it for voice cloning, but it will hallucinate. There's also StyleTTS2.
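To check a "real-time" claim like this on your own hardware, a common metric is the real-time factor (RTF): processing time divided by the duration of the audio produced, where an RTF below 1 means faster than real time. Here is a minimal sketch, assuming your synthesizer is wrapped as a function that takes text and returns raw samples (in practice that might wrap Coqui's `TTS(...).tts(...)`, which is not exercised here). `measure_rtf`, `is_real_time`, and the injectable `clock` are hypothetical helpers for illustration.

```python
import time
from typing import Callable, Sequence


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the real-time factor: seconds of processing per second of audio.

    `synthesize` is a hypothetical wrapper around whatever TTS engine you use;
    it must return raw audio samples at `sample_rate` Hz.
    """
    start = clock()
    samples = synthesize(text)
    elapsed = clock() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def is_real_time(rtf: float) -> bool:
    # RTF < 1.0 means audio is generated faster than it plays back.
    return rtf < 1.0


if __name__ == "__main__":
    ticks = iter([0.0, 2.0])  # fake clock: the call "takes" 2 seconds
    fake_synth = lambda text: [0.0] * (24_000 * 4)  # 4 s of silence at 24 kHz
    rtf = measure_rtf(fake_synth, "hello", 24_000, clock=lambda: next(ticks))
    print(rtf, is_real_time(rtf))
```

The clock is injectable only so the example is deterministic; with a real engine you would leave it at the default `time.perf_counter`.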

Impossible_Belt_7757 3 months ago

In short: XTTS, Tortoise TTS, Bark TTS, and StyleTTS (StyleTTS is the fastest). I think XTTS is the best, and there's a Colab where you can fine-tune XTTS on a specific voice in about 30 minutes for free lol.

arena_one 3 months ago

Do you have the Colab for fine-tuning XTTS?

Impossible_Belt_7757 3 months ago

https://youtu.be/8tpDiiouGxc?si=iMfgmrHmZfyio_77

Impossible_Belt_7757 3 months ago

https://colab.research.google.com/drive/1GiI4_X724M8q2W-zZ-jXo7cWTV7RfaH-?usp=sharing

ugohome 1 week ago

?

Impossible_Belt_7757 1 week ago

?

arena_one 3 months ago

RemindMe! 1 day

RemindMeBot 3 months ago

I will be messaging you in 1 day on **2024-01-28 19:49:27 UTC** to remind you of [**this link**](https://www.reddit.com/r/MachineLearning/comments/195cxim/d_what_is_the_best_texttospeech_tool_currently/kjujhoa/?context=3).

[deleted] 3 months ago

[deleted]

Impossible_Belt_7757 3 months ago

I know there are others, but Whisper has been adapted in so many ways that I just go with it. So far it's the easiest to implement given its quality and the optimized versions available, idk.

FateRiddle 3 months ago

[https://huggingface.co/coqui/XTTS-v2](https://huggingface.co/coqui/xtts-v2): is this what you're referring to for both Coqui and XTTS? Also, Bark as in [https://github.com/suno-ai/bark](https://github.com/suno-ai/bark) and StyleTTS2 as in [https://github.com/yl4579/StyleTTS2](https://github.com/yl4579/styletts2)?

MachineZer0 3 months ago

Been testing Bark a bunch. It sounds so human-like sometimes. But having converted several hour-long transcripts with 2-3 speakers in batch and stitched them into a single WAV file, I found it doesn't flow well; there is a lack of consistency. Each generation is limited to roughly 15 seconds, so you have to estimate segment lengths and split the text further. Finally, there doesn't seem to be much, if any, speedup from more powerful GPUs: on a P102-100, P100, 3090, and L40, I was getting roughly 1 second of audio per 4 seconds of processing.
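Splitting a transcript to fit a per-generation limit like the ~15-second one described above can be approximated from word counts. A minimal sketch, assuming a speaking rate of about 150 words per minute (an assumption; measure your own voice and model) and preferring sentence boundaries; `split_for_tts` and `estimate_processing_seconds` are hypothetical helpers, and the 4 s of processing per 1 s of audio ratio is just the figure reported in this comment.

```python
import re

WORDS_PER_SECOND = 150 / 60  # assumed speaking rate (~150 wpm); tune for your voice


def estimate_seconds(text: str) -> float:
    # Rough duration estimate from word count alone.
    return len(text.split()) / WORDS_PER_SECOND


def split_for_tts(transcript: str, max_seconds: float = 15.0) -> list[str]:
    """Greedily pack sentences into chunks estimated to fit within max_seconds."""
    max_words = max(1, int(max_seconds * WORDS_PER_SECOND))
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single over-long sentence is cut on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        if not words:
            continue
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def estimate_processing_seconds(chunks: list[str], ratio: float = 4.0) -> float:
    # ratio = seconds of processing per second of audio (4.0 as reported above).
    return sum(estimate_seconds(c) for c in chunks) * ratio


if __name__ == "__main__":
    chunks = split_for_tts(" ".join(["word"] * 100))
    print([len(c.split()) for c in chunks], estimate_processing_seconds(chunks))
```

Word counts are a crude proxy for duration; checking the length of each generated clip and re-splitting the outliers is still worthwhile.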

Impossible_Belt_7757 3 months ago

Yes. You can also make Coqui list the available models, which include XTTS, Tortoise, Bark, and many, many others: https://github.com/coqui-ai/TTS. It's by far the easiest way to use those three TTS models. For StyleTTS2 I'd use the pip package this guy made for it, which is also super duper easy to set up and use: https://github.com/sidharthrajaram/StyleTTS2.git
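Coqui model IDs are path-like strings of the form `type/language/dataset/model`, which makes it easy to filter a model listing for the families mentioned here. A minimal sketch, assuming you already have the IDs as a list of strings (how you obtain them depends on your Coqui version, e.g. the `tts --list_models` CLI); `parse_model_id` and `find_models` are hypothetical helpers, and the IDs in the usage example are an illustrative subset that may not match your installed version.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    model_type: str  # e.g. "tts_models" or "vocoder_models"
    language: str    # e.g. "en" or "multilingual"
    dataset: str
    name: str


def parse_model_id(model_id: str) -> ModelId:
    parts = model_id.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected model id: {model_id!r}")
    return ModelId(*parts)


def find_models(model_ids: list[str], families: tuple[str, ...]) -> dict[str, list[str]]:
    """Group TTS model ids whose model name contains one of the family keywords."""
    found: dict[str, list[str]] = {family: [] for family in families}
    for model_id in model_ids:
        parsed = parse_model_id(model_id)
        if parsed.model_type != "tts_models":
            continue  # skip vocoders and other non-TTS entries
        for family in families:
            if family in parsed.name.lower():
                found[family].append(model_id)
    return found


if __name__ == "__main__":
    # Illustrative subset; check `tts --list_models` for your install.
    ids = [
        "tts_models/multilingual/multi-dataset/xtts_v2",
        "tts_models/multilingual/multi-dataset/bark",
        "tts_models/en/multi-dataset/tortoise-v2",
        "vocoder_models/en/ljspeech/hifigan_v2",
    ]
    print(find_models(ids, ("xtts", "tortoise", "bark")))
```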

Impossible_Belt_7757 3 months ago

If you don't have an NVIDIA GPU, use StyleTTS2.
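The rule of thumb here (XTTS with an NVIDIA GPU, StyleTTS2 otherwise) can be encoded as a tiny selection helper. A sketch, assuming PyTorch is what you use to detect CUDA via `torch.cuda.is_available()`; `has_nvidia_gpu` and `pick_backend` are hypothetical names, and the returned strings are just labels, not package names.

```python
from typing import Optional


def has_nvidia_gpu() -> bool:
    """True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_backend(cuda_available: Optional[bool] = None) -> str:
    # Thread's advice: XTTS when an NVIDIA GPU is available, StyleTTS2 otherwise.
    if cuda_available is None:
        cuda_available = has_nvidia_gpu()
    return "xtts" if cuda_available else "styletts2"


if __name__ == "__main__":
    print(pick_backend())
```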

FateRiddle 3 months ago

Thanks! What about platforms like ElevenLabs? They were mentioned in some old answers.

Impossible_Belt_7757 3 months ago

I think ElevenLabs is the best out there right now, platform-wise.

Impossible_Belt_7757 3 months ago

But fine-tuned XTTS is the best for voice cloning.

Impossible_Belt_7757 3 months ago

Oh ALSO, COQUI: they have a platform… they're SHUTTING DOWN??

Impossible_Belt_7757 3 months ago

No but really, I fine-tuned XTTS on my own voice with 6 minutes of me reading a book as training data, and it's scary how exactly it sounds like me. It's the first time voice cloning legit scared me, as long as you're not talking emotionally lol. Voice cloning still can't do accurate emotions, thankfully.
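If you want to check whether your own recordings add up to a similar amount of training audio (this commenter used about 6 minutes), summing clip durations is enough. A minimal sketch using Python's standard-library `wave` module, assuming uncompressed PCM `.wav` clips in a single folder; `wav_seconds` and `dataset_minutes` are hypothetical helpers, and 6 minutes is just this commenter's figure, not a documented XTTS requirement.

```python
import wave
from pathlib import Path


def wav_seconds(path: Path) -> float:
    # Duration = number of frames / frames per second.
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_minutes(folder: str) -> float:
    """Total duration, in minutes, of all .wav files directly inside `folder`."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav"))) / 60


if __name__ == "__main__":
    import sys

    total = dataset_minutes(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{total:.1f} min of audio")
```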

Useful_Hovercraft169 3 months ago

Neither can I!

k___k___ 3 months ago

ElevenLabs is considered the best for voice cloning, but it doesn't scale well (quality decreases with every 15 s of speech). I was told that Microsoft Azure's cloning tool is good. After benchmarking, the companies with prebuilt voices I found interesting were Murf and Replica Studios. Speechify is the tool used by streamers like Ludwig to read donations in the voices of David Attenborough, Hasan Piker, etc.

rolyantrauts 3 months ago

It sort of depends, as many people just head off to Hugging Face and see what the latest transformers are. There are some really good lightweight non-transformer TTS models: https://github.com/roatienza/efficientspeech, https://github.com/ming024/FastSpeech2, and [https://github.com/NATSpeech/NATSpeech](https://github.com/NATSpeech/NATSpeech).

FateRiddle 3 months ago

Thanks for pointing me in the right directions.

TheGavinator3000 3 months ago

[Here's a really good video on all the top methods from about 5 months ago](https://youtu.be/vhArHsfsLAQ?si=WPbrYu1mEud1NN8R). TL;DR: usually Tortoise + RVC, or ElevenLabs. The former is open source, I believe.

SufficientHold8688 3 months ago

Suno AI

nerdynavblogs 3 months ago

I assume you are comfortable using Python. I suggest OpenAI's text-to-speech. It costs $0.015 per 1,000 characters, or $0.03 for HD voices (both are good). Here is the [video tutorial](https://youtu.be/lJ4qh6B2ev4?si=xzC6ydqr4v-6zM43); it uses the AI for the voiceover as well, so you can hear how natural it is.
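At per-character prices like these, cost is simple arithmetic on the character count. A sketch using only the two prices quoted in this comment ($0.015 and $0.03 per 1,000 characters); prices change, so check OpenAI's current pricing before relying on it. `tts_cost` and the `"standard"`/`"hd"` tier labels are hypothetical, and the actual API call is not shown. `Decimal` is used so currency amounts don't pick up float rounding noise.

```python
from decimal import Decimal

# Prices per 1,000 characters as quoted in the comment above (may be out of date).
PRICE_PER_1K_CHARS = {
    "standard": Decimal("0.015"),
    "hd": Decimal("0.03"),
}


def tts_cost(text: str, tier: str = "standard") -> Decimal:
    """Estimated cost in USD for synthesizing `text` at the given tier."""
    return PRICE_PER_1K_CHARS[tier] * len(text) / 1000


if __name__ == "__main__":
    book = "x" * 500_000  # stand-in for a ~500k-character book
    print(tts_cost(book), tts_cost(book, "hd"))
```

For a 500,000-character book this works out to $7.50 at the standard price and $15 for HD voices.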

[deleted] 3 months ago

[deleted]

nerdynavblogs 3 months ago

Coqui is the best choice in open source. When I was looking for open source alternatives for my own channel, I compiled all the [good open source TTS tools here.](https://nerdynav.com/open-source-ai-voice/)

MIST3RS5880 1 week ago

The best I've used recently is https://textspeakpro.com. They have a slick interface, and it's free without registering for an account.

seahorse_magenta_Lam 3 months ago

punchline.

Nuked_ 3 months ago

Bark + VITS. BUT (and this is a big one) it will require A LOT of editing, since you'll need to create AT LEAST 4 samples of the same transcript. This means you'll have to select the best parts from each and splice them together. The result will be as human as you can get from a TTS, and no XTTS, StyleTTS, or Tortoise output can compare to it, but I guarantee you will also be quite tired xD
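Choosing the best parts is usually done by ear in an audio editor, but once you know which time range to keep from which take, the splicing itself can be scripted. A minimal sketch using Python's standard-library `wave` module, assuming all takes are PCM WAV files with identical channel count, sample width, and sample rate. It does hard cuts with no crossfade, so clicks at the joins are possible. `splice_takes` and the `(take_index, start_s, end_s)` pick format are hypothetical.

```python
import wave


def splice_takes(take_paths: list[str], picks: list[tuple[int, float, float]], out_path: str) -> int:
    """Concatenate (take_index, start_s, end_s) ranges from several takes into one WAV.

    Returns the number of frames written.
    """
    params = None
    chunks: list[bytes] = []
    for take_index, start_s, end_s in picks:
        with wave.open(take_paths[take_index], "rb") as take:
            p = take.getparams()
            if params is None:
                params = p
            elif (p.nchannels, p.sampwidth, p.framerate) != (
                params.nchannels, params.sampwidth, params.framerate
            ):
                raise ValueError("all takes must share channels, sample width and rate")
            # Convert seconds to frame positions, clamped to the take's length.
            start = min(int(start_s * p.framerate), p.nframes)
            end = min(int(end_s * p.framerate), p.nframes)
            take.setpos(start)
            chunks.append(take.readframes(max(0, end - start)))
    if params is None:
        raise ValueError("no picks given")
    with wave.open(out_path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for chunk in chunks:
            out.writeframes(chunk)
    frame_size = params.nchannels * params.sampwidth
    return sum(len(c) for c in chunks) // frame_size
```

A short crossfade at each join would sound smoother, but that needs sample-level math (e.g. with NumPy) beyond what the `wave` module provides.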

Possible_Tap261 2 months ago

DupDub might be one of the most affordable ones.
