Open source TTS models Kokoro, Orpheus, and Piper are tested on symbols, abbreviations, and prosody with CER and MOS results.
Abstract: Speech emotional recognition (SER) focuses on developing computers' comprehension and response to human emotional tones and is a key field of research in human-machine interaction. This ...
Abstract: Multimodal speech emotion recognition (SER) has emerged as pivotal for improving human–machine interaction. Researchers are increasingly leveraging both speech and textual information ...