Audio Samples from "A Pipeline for Stochastic and Controlled Generation of Realistic Language Input for Simulating Infant Language Acquisition"

This page contains audio samples accompanying the paper "A Pipeline for Stochastic and Controlled Generation of Realistic Language Input for Simulating Infant Language Acquisition".

The speech samples below were produced for the experiments presented in the paper. They were generated by a FastPitch model trained jointly on the LJSpeech and Expresso datasets (see the paper for a detailed description). Samples are provided in both speaking styles, i.e. child-directed speech (CDS) and neutral, and for both types of text content, i.e. books (from LibriSpeech) and synthetic CDS transcripts. Note that, due to the limited training data for the CDS style, some CDS-style samples contain unjustified pauses.

1. Synthetic content


1.1. Good quality samples (UTMOS > 4.0)

Text: Yes. You do. What is this up here.

CDS style. Expresso speaker (exp2) Neutral style. LJSpeech speaker

Text: Ok ok. We'll change this one later.

CDS style. Expresso speaker (exp2) Neutral style. LJSpeech speaker

Text: And they're talking about your birthday party.

CDS style. Expresso speaker (exp2) Neutral style. LJSpeech speaker

1.2. Samples suffering from specific issues

Text: So many times. You want mommy to hold that on.

In the CDS-style sample, some words of the input text are not pronounced. What is actually spoken: "So many time... You want mommy to hold that ..."

CDS style. Expresso speaker (exp2) Neutral style. LJSpeech speaker

2. LibriSpeech content


Text: Seemed to be common between impeached members and the parliament.

CDS style. Expresso speaker (exp2) Neutral style. LJSpeech speaker