Text-based speech synthesis

ABSTRACT

This article provides an overview of speech synthesis from text, or Text-To-Speech (TTS), whose aim is to automatically compute the speech signal corresponding to a given text. It describes the stages needed to build such a system, including the latest techniques such as those based on hidden Markov models. The applications of speech synthesis and the main offerings in this domain are also discussed.

AUTHORS

Christophe D'ALESSANDRO: Research Director, LIMSI-CNRS, Orsay, France
Gaël RICHARD: Professor, Institut Mines-Télécom, Télécom ParisTech, CNRS-LTCI, Paris, France

This article is an updated version of the 2003 article of the same title by Gaël Richard and Olivier Cappé.

INTRODUCTION

The aim of text-to-speech synthesis (TTS) is to automatically compute the speech signal corresponding to a given text. The text itself can come from a variety of sources: newspapers, books, voice response systems, dialogue or automatic translation systems (interactive terminals, personal assistants), information system databases, video games, e-mails, SMS, documents browsed on the web, or simply text typed on a computer keyboard.

In its simplest form, voice response can be a set of pre-recorded messages (or "prompts"). Text-to-speech synthesis is more ambitious: it automatically computes the sound samples corresponding to any written statement, which is not known in advance and may be long.

Speech synthesis has two sides: on the one hand, text analysis and interpretation, and on the other, prediction of the acoustic-phonetic parameters of the sound together with the signal synthesis itself:

Text analysis: the first stage in transforming text into speech is to analyze and understand the written text, including its nuances and connotations, the speech situation and the speech act to be performed. Beyond the text itself, the context can be specified (speaking style, emotion, attitude, character type, specific voice, etc.);
Signal synthesis: once the text has been analyzed, the aim is to compute the acoustic signal that best renders the linguistic content, with a voice that sounds as natural as possible, resembles a particular speaker, and carries the nuances of attitude and even emotion that the text calls for. In addition to the audio signal, the synthesizer can provide instructions for synchronizing the lip movements of an avatar or video character, or the movements of a robot.
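
The two-stage split described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method described in the article: the names `Utterance`, `analyze_text`, `synthesize_signal` and `text_to_speech` are hypothetical, the "analysis" is reduced to lowercasing and tokenization, and the "synthesis" emits a placeholder tone per token rather than real speech. It only shows how an analysis stage can hand a linguistic description, plus optional context, to a signal-generation stage.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical output of the text-analysis stage."""
    tokens: list[str]
    # Optional context, e.g. speaking style or emotion (assumed keys).
    context: dict[str, str] = field(default_factory=dict)


def analyze_text(text: str, **context: str) -> Utterance:
    """Toy text analysis: lowercase, drop punctuation, split on whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return Utterance(tokens=cleaned.split(), context=dict(context))


def synthesize_signal(utt: Utterance, sample_rate: int = 16000) -> list[float]:
    """Placeholder signal synthesis: one 100 ms tone per token, not speech."""
    samples_per_token = sample_rate // 10
    signal: list[float] = []
    for i, _token in enumerate(utt.tokens):
        freq = 120.0 + 20.0 * (i % 5)  # arbitrary pitch pattern, illustration only
        signal.extend(
            0.3 * math.sin(2.0 * math.pi * freq * n / sample_rate)
            for n in range(samples_per_token)
        )
    return signal


def text_to_speech(text: str, **context: str) -> list[float]:
    """Chain the two stages: text analysis, then signal synthesis."""
    return synthesize_signal(analyze_text(text, **context))
```

For example, `text_to_speech("Hello, world!", style="neutral")` returns a list of float samples at the assumed 16 kHz rate. A real system would replace both stages with far richer models, but the interface between them (a linguistic description in, sound samples out) follows the same outline.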

KEYWORDS

signal processing | linguistics
