Overview
ABSTRACT
This article provides an overview of text-to-speech (TTS) synthesis, whose aim is to automatically compute the speech signal corresponding to a given text. It describes the stages needed to build such a system, including recent techniques such as those based on hidden Markov models. The applications of speech synthesis and the main offerings in this field are also discussed.
AUTHORS
- Christophe D'ALESSANDRO: Research Director, LIMSI-CNRS, Orsay, France
- Gaël RICHARD: Professor, Institut Mines-Télécom, Télécom ParisTech, CNRS-LTCI, Paris, France
This article is an updated version of the 2003 article of the same title by Gaël Richard and Olivier Cappé.
INTRODUCTION
The aim of text-to-speech (TTS) synthesis is to automatically compute the speech signal corresponding to a given text. The text itself can come from a variety of sources: newspapers, books, voice response systems, dialogue or automatic translation systems (interactive terminals, personal assistants), information system databases, video games, e-mails, SMS, documents browsed on the web, or simply text typed on a computer keyboard.
Voice response in its simplest form can be a set of pre-recorded messages (or "prompts"). Text-to-speech synthesis is more ambitious: it automatically computes the sound samples corresponding to any written utterance, which is not known in advance and may be long.
Speech synthesis has two sides: text analysis and interpretation on the one hand, and prediction of the acoustic-phonetic parameters of the sound followed by signal synthesis itself on the other:
- Text analysis: the first stage in transforming text into speech involves the ability to analyze and understand the written text, its nuances and connotations, the speech situation and the speech act to be performed. In addition to the text, the context can be specified (speaking style, emotion, attitude, character type, specific voice...);
- Signal synthesis: once the text has been analyzed, the aim is to calculate the acoustic signal that best interprets the linguistic content, with a voice that sounds as natural as possible, resembling a particular speaker, and with the nuances of attitude and even emotion that the text calls for. In addition to the audio signal, the synthesizer can provide instructions for synchronizing the lip movements of an avatar or video character, or the movements of a robot.
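The two-stage organization described above can be illustrated with a deliberately simplified Python sketch. It is not the method of any real synthesizer: the names `Unit`, `analyze_text`, `synthesize` and `text_to_speech`, the 16 kHz sample rate, and the duration and pitch rules are all assumptions made for illustration. Each letter stands in for a phonetic unit and each unit is rendered as a plain sine tone, so the output is a sequence of beeps rather than speech. The point is only the separation between a text analysis stage that predicts parameters and a signal stage that computes samples from them.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 16000  # Hz; an assumed value for this sketch


@dataclass
class Unit:
    symbol: str      # placeholder for a phonetic symbol (here: one letter)
    duration: float  # predicted duration in seconds
    pitch: float     # predicted pitch in Hz; 0.0 means silence


# Toy text normalization table: digits are spelled out before analysis.
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def analyze_text(text: str) -> List[Unit]:
    """Stage 1 (toy): normalize the text and predict one unit per character."""
    normalized = "".join(DIGITS.get(ch, ch) for ch in text.lower())
    units = []
    for ch in normalized:
        if ch.isalpha():
            # Hypothetical rules: vowels last longer than consonants,
            # and the pitch depends only on the letter.
            duration = 0.12 if ch in "aeiou" else 0.06
            pitch = 100.0 + 5.0 * (ord(ch) - ord("a"))
            units.append(Unit(ch, duration, pitch))
        elif ch.isspace():
            units.append(Unit(" ", 0.10, 0.0))  # short pause between words
        # Any other character (punctuation, symbols) is ignored here.
    return units


def synthesize(units: List[Unit]) -> List[float]:
    """Stage 2 (toy): turn predicted parameters into audio samples."""
    samples = []
    for unit in units:
        count = round(unit.duration * SAMPLE_RATE)
        for i in range(count):
            if unit.pitch == 0.0:
                samples.append(0.0)
            else:
                phase = 2.0 * math.pi * unit.pitch * i / SAMPLE_RATE
                samples.append(0.5 * math.sin(phase))
    return samples


def text_to_speech(text: str) -> List[float]:
    """Chain the two stages: text analysis, then signal synthesis."""
    return synthesize(analyze_text(text))
```

Calling `text_to_speech("Hi 2")` first normalizes the input to "hi two", then returns a list of floating-point samples in the range -0.5 to 0.5. In a real system, the first stage would involve linguistic analysis and prosody prediction, and the second a far more elaborate signal model.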
KEYWORDS
signal processing | linguistics
Text-based speech synthesis