Discussion TTS4ED: speech synthesis for NPCs (and players) [win]

Current beta version is 1.0.1.0 (2017.04.28)

I need some more testers so I know if my little program is working as intended before I start working on the GUI. Any feedback is appreciated. Even if you don't experience any issues, I just want to hear a simple "It works!" so I know people are actually able to use it :)

Download and info here: sites.google.com/site/tts4ed


A demonstration video that shows it in action:

[video=youtube_share;syRE_nClBS0]https://youtu.be/syRE_nClBS0[/video]
(This video is from an early proof-of-concept version. Effects/filters volume can be adjusted in current versions.)


How does it work?
This program uses a high quality (and very convincing) speech synthesizer on incoming NPC messages. It can also process player messages (disabled by default). It currently utilizes the CereVoice Cloud TTS web service, meaning it has access to a very large number of voices (30+) with different accents and nationalities. I plan on adding support for more TTS services in the future, such as Amazon Polly. Note: it does not use SAPI5, and there is no plan for that either. More info here.

You need a CereVoice Cloud developer account (free) to get started. The free account includes 10k credits (characters) per month. Before you comment on the cost of purchasing additional credits, please read all relevant info on the TTS4ED webpage and consider the rather low 20 GBP cost of access to 30+ high-end TTS voices. There will also be downloadable voice packs in the near future, which will reduce the need for purchased credits. I am not affiliated with CereProc in any way.

I do realize some of the voices and phrases might sound very robot-like (all speech synthesizers do when you compare them directly with real human voices), but the software does allow for easy correction of phrases/words before they are processed by the TTS engine. This allows for on-the-fly correction of troublesome words, vocalization, and intonation. This can be done per voice, and is especially useful for the non-English voices. Vocal gestures such as sighs, laughs, etc. will be added soon. The program is not made for use with pre-recorded phrases done by human voice actors, and I do not plan on adding support for them. It is for TTS only.

It picks a random voice for each NPC at your first encounter, and stores it for the next time you meet the same NPC. Generic ships (e.g. police or military) that don't have a unique name get a random voice when they enter the instance and send their first message, and they keep that voice until you jump out or kill one of them (to avoid stale voices in CZ or RES). If many messages arrive within a very short time frame, it waits until it has time to play a message and then plays only the last one received (to avoid queuing up messages in high-chatter areas, such as compromised nav beacons). This means it will sometimes skip messages when the chat gets really busy, but that's by design. And yes, you can mute specific NPCs, such as Wedding Barges and Cruise Liners :)
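
A rough Python sketch of the two behaviours described above: remembering a voice per NPC, and keeping only the newest message when chat gets busy. This is not the program's actual code. The names `VoiceBook`, `LatestOnly`, `VOICES` and the JSON file are all hypothetical and only illustrate the idea.

```python
import json
import random
from pathlib import Path

# Placeholder names for illustration, not real voice IDs.
VOICES = ["Voice A", "Voice B", "Voice C"]


class VoiceBook:
    """Remembers which voice each NPC was given."""

    def __init__(self, path, rng=None):
        self.path = Path(path)
        self.rng = rng or random.Random()
        # Named NPCs are stored on disk; generic ships live only in memory.
        self.named = json.loads(self.path.read_text()) if self.path.exists() else {}
        self.generic = {}

    def voice_for(self, npc_name, is_generic=False):
        table = self.generic if is_generic else self.named
        if npc_name not in table:
            table[npc_name] = self.rng.choice(VOICES)
            if not is_generic:
                self.path.write_text(json.dumps(self.named))
        return table[npc_name]

    def reset_generic(self):
        """Call on jump-out or after a kill so generic voices are re-rolled."""
        self.generic.clear()


class LatestOnly:
    """Holds at most one pending message; newer messages replace older ones."""

    def __init__(self):
        self.pending = None

    def push(self, message):
        self.pending = message

    def pop(self):
        message, self.pending = self.pending, None
        return message
```

With this shape, pushing several messages before the player is free means only the last one is ever spoken, which matches the "skips messages by design" behaviour.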

You can assign specific voices to individual players as well (e.g. friends). It also uses the non-English voices for pirates and bounty hunters to give some variation to accents (experimental feature, lots of room for improvement).

It caches the downloaded speech audio (or uses pre-downloaded speech from other users, a.k.a. voice packs), which reduces the account credit cost whenever it finds the same message, with the selected voice, already stored locally on your system. This also reduces processing delays, as it doesn't need to request and download the TTS speech first (which usually takes less than 1 second).
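
The caching idea above can be sketched in Python roughly like this. It is only an illustration under assumptions: the hashed file naming, the function names, and the `fetch` callback (a stand-in for the actual TTS web request) are hypothetical and not taken from TTS4ED.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir, voice, message):
    """Map a (voice, message) pair to a stable file name."""
    key = hashlib.sha1(f"{voice}\n{message}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}.wav"


def get_speech(cache_dir, voice, message, fetch):
    """Return (audio_bytes, was_cached); call fetch() only on a cache miss."""
    path = cache_path(cache_dir, voice, message)
    if path.exists():
        # Same message and same voice seen before: no credits spent.
        return path.read_bytes(), True
    audio = fetch(voice, message)  # hypothetical stand-in for the TTS request
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio, False
```

Because the key includes the voice, the same message spoken by a different voice counts as a miss and is fetched separately. A voice pack would then simply be a folder of such files dropped into the cache directory.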

If you have vision impairment or partial hearing loss and want to use TTS4ED to read out incoming messages loud and clear, you can disable (or reduce) the filters and effects, leaving only the raw speech audio. For improved "hearability" you can also reduce the number of voices so it only uses a single voice on all NPCs (e.g. Nicole is your favorite, and you want to hear her French accent all the time).


Audio processing demonstration here:

[video=youtube_share;DHO-LnwA70Y]https://youtu.be/DHO-LnwA70Y[/video]


Currently it is a console program without a GUI, but that will soon change. I just need to get the bulk features implemented, making sure everything is running smoothly before I start working on that. This will also reduce the need for the overly technical documentation it currently has. Todo list is available here.




Lots more information on features, technical stuff, and configuration available here: sites.google.com/site/tts4ed/



Small end note: I know from experience that a lot of people will ask for (and suggest) SAPI5 support, and start discussing SAPI5 (or other local TTS engines) vs online TTS services. Please take into consideration the very high cost of purchasing a single high-end SAPI5 voice from a commercial developer. And then... multiply that cost by 30, or even 40. That also goes for requests to support the free (and very low quality) Microsoft SAPI5 voices many people already have installed on their system. To be clear: there is no plan to add SAPI5 support.
 
Whilst I applaud you on the work, which is brilliant, I think you will struggle to persuade players to buy external software in order to make it work.
 
This thing looks utterly awesome, please keep on it!

Also the news that the audio can be redistributed is fantastic, this mod has a very bright future I think. :)
 
../..you will struggle to persuade players to buy external software in order to make it work.

Well, people are already buying single SAPI5 voices (commercial, high-end voices) for use with ED 3rd party tools (and other similar games). It's not a new phenomenon, nor do I believe potential users will find it particularly off-putting. It's a small cost for something that brings a new dimension to the game's gameplay.

Note that you don't buy any "external software". It's a web service. And you don't have to buy (credits) to get it to work. At all. The free account will work just fine. Soon there will be downloadable voice packs, and users will just use the free 10k credits/month the developer account provides, without ever thinking about purchasing additional credits (unless they want to use it for incoming player chat as well).

I do realize that a lot of people think it's a big hassle to register an account, read the received email, enter the credentials, etc. But those are the people that are most likely to just delete the program at the first sign of error messages or trouble, without ever bothering to give me any feedback. So I have to be frank... it's not a big loss to me during the beta testing process. These people might come back later on, when I have a user-friendly graphical interface. With a nice big REGISTER button for them to click instead :) I just hope they might reconsider the trouble of registering an account then :D
 
This thing looks utterly awesome, please keep on it!

Also the news that the audio can be redistributed is fantastic, this mod has a very bright future I think. :)

I've accumulated a large number of voices/phrases during the last month or so, and soon I'm gonna upload a rar file for people to grab. That should give the users a real head start, reducing the cost of account credits drastically.
 
I've accumulated a large number of voices/phrases during the last month or so, and soon I'm gonna upload a rar file for people to grab. That should give the users a real head start, reducing the cost of account credits drastically.

This would be fantastic, I suspect the credit issue is a bit of a limiter for folk atm.

Been using this mod over the last few days, really love it. [up]
 
I've been using this program for about a week now. It works very well. My only issue is that the settings.ini commands are cryptic and I can't seem to get the app into config mode to adjust the settings. But this is ok, I'll patiently wait for the GUI. Great app, I'm looking forward to more updates!
 
I plan to get this over the next few days. I think it looks great.

Question for those using it: do you use VoiceAttack, and have you had any [amusing] accidental pickups by it when an NPC speaks?
 
This looks brilliant - I look forward to developments.

[up]

Edit: Just installed and completed a test flight, pirate interdicted me - TTS4ED audio worked brilliantly. Only downside was the arrival of security forces and the same male voice was used for them. A later NPC voice was different and again, worked really well. Still, early days and for a first attempt I was really impressed.
 
Can I ask.

I'd like some of the voices to be distorted a little less, so adjusting the sox overdrive is probably the thing to do, but it is difficult to test.

So..

1) If I delete the contents of the `processed` folder and leave `download` intact, will TTS4ED then just regenerate with the new sox `settings.ini` values as the NPC is re-encountered? Presumably no redownload will be required.

2) Is there a way to test a new set of sox `settings.ini` values without having to encounter an NPC in game?
 
Actually never mind. Figured it out: I can fake the journal to test. There are also hotkeys, but I'll be buggered if I can get those to work for me.

I have no number pad and tried redefining

Code:
AudioConfigStart={F1}
configPreview={F2}

but no joy, can fake it with a pretend journal though, but cannot seem to find how to lower the distortion.
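
For anyone wanting to try the fake-journal approach, here is a minimal Python sketch of writing a test chat line. The field names follow the game's `ReceiveText` journal event as far as I know it. Which fields TTS4ED actually reads, and which journal file it watches, are assumptions, and the helper names are made up for illustration.

```python
import json
from datetime import datetime, timezone


def receive_text_line(sender, message, channel="npc"):
    """Build one journal-style JSON line for an incoming chat message."""
    event = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": "ReceiveText",
        "From": sender,
        "Message": message,
        "Channel": channel,
    }
    return json.dumps(event)


def append_event(journal_path, line):
    """Append the line to a (test) journal file, one event per line."""
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(line + "\n")
```

Pointing `append_event` at a copy of a journal file rather than the live one is probably the safer way to experiment.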
 
Can I ask.

I'd like some of the voices to be distorted a little less../..

1) If I delete the contents of the `processed` folder and leave `download` intact, will TTS4ED then just regenerate with the new sox `settings.ini` values as the NPC is re-encountered? Presumably no redownload will be required.

2) Is there a way to test a new set of sox `settings.ini` values without having to encounter an NPC in game?

I have no number pad and tried redefining

Code:
AudioConfigStart={F1}
configPreview={F2}

but no joy ../..


Hi there,

Sorry for the late reply, I've been a little busy this week.

Distortion is controlled by:

Code:
[sox]
overdriveEnabled=1
overdriveGain=30
overdriveColour=0

"Gain" is the amount of distortion; range 0-100, while "Colour" is defined in the SoX manual as "the amount of even harmonic content in the over-driven output"; range 0-100. Colour is the timbre/warmth of the sound, and is mostly noticeable at very low Gain values, so it's basically Gain you need to adjust to control the distortion itself. That's also why Colour's default value is set to 0, and why only Gain is controllable through the interface.

---
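
To make the overdrive settings a bit more concrete: SoX's `overdrive` effect takes a gain and a colour argument. The Python sketch below shows how the `[sox]` values could map onto a SoX command line. How TTS4ED actually builds its SoX call is not stated here, so treat the direct mapping and the function names as assumptions.

```python
import configparser

SETTINGS = """
[sox]
overdriveEnabled=1
overdriveGain=30
overdriveColour=0
"""


def overdrive_args(ini_text):
    """Return the SoX effect arguments for the [sox] overdrive settings."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    sox = config["sox"]
    if not sox.getboolean("overdriveEnabled"):
        return []
    # Assumed mapping: sox <in> <out> overdrive <gain> <colour>
    return ["overdrive", sox["overdriveGain"], sox["overdriveColour"]]


def sox_command(infile, outfile, ini_text):
    """Build the full argument list, ready for e.g. subprocess.run()."""
    return ["sox", infile, outfile] + overdrive_args(ini_text)


print(sox_command("in.wav", "out.wav", SETTINGS))
```

Lowering `overdriveGain` (say from 30 to 10) is then the single change that should make the voices less distorted.

---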

In regards to hotkeys, if you can't get F1 and F2 to work it means some other program is already using them. I tried here on my systems, and they work fine, even when ED is running. Try adding modifiers such as CTRL, SHIFT and/or ALT, or different hotkeys altogether.

---

The processed folder contains the rendered audio files if you don't use SoX for direct playback. This is because 1) SoX adds effects/filters and saves it to a file, and then 2) Windows Multimedia API is used for playing this file.
You need the following config for this to happen:
Code:
[main]
SoxPlayback=0

If you want the program to play a previously saved processed audio file (for reduced delay when using "SoxPlayback=0"), you need to disable overwrite. It will then check whether there is an existing audio file with the same chat message and the same voice from earlier, and use that instead. If no audio file is found, it will of course run SoX to generate one. Overwrite is only applicable if "SoxPlayback=0". You need the following config:

Code:
[main]
SoxPlayback=0
Overwrite=0

If "SoxPlayback=1" it uses SoX for direct playback (reduced delay, as there is no need to post-process the effects/filters), and overwrite is irrelevant (it ignores the setting), because it doesn't save the processed audio. It's always done real-time on-the-fly.
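
The interaction between the two settings can be summed up in a small Python sketch. It only restates the rules described above; the function name and the returned strings are made up for illustration.

```python
def playback_plan(sox_playback, overwrite, processed_exists):
    """Describe what happens to a message for a given [main] configuration."""
    if sox_playback:
        # SoxPlayback=1: effects are applied in real time, nothing is saved,
        # and the Overwrite setting is ignored.
        return "play directly through SoX"
    if processed_exists and not overwrite:
        # SoxPlayback=0, Overwrite=0: reuse the file rendered earlier.
        return "play existing processed file"
    # SoxPlayback=0 and either Overwrite=1 or no file yet: render first.
    return "render with SoX, then play the file"
```

So after changing the overdrive values, either enabling overwrite or emptying the `processed` folder should cause the audio to be rendered again with the new settings.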


I hope this helps. The future GUI will make these things a lot easier to understand.
 