useTextToSpeech
Text to speech is a task that transforms written text into spoken language. It is commonly used to implement features such as voice assistants, accessibility tools, or audiobooks.
We recommend using the models we provide, which are available in our Hugging Face repository. You can also use the constants shipped with our library.
API Reference
- For the detailed API reference for `useTextToSpeech`, see: useTextToSpeech API Reference.
- For all text to speech models available out-of-the-box in React Native ExecuTorch, see: TTS Models.
- For all supported voices in `useTextToSpeech`, please refer to: Supported Voices.
High Level Overview
You can play the generated waveform in whatever way suits you best; in the snippet below we use the react-native-audio-api library to play the synthesized speech.
import {
  useTextToSpeech,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

const model = useTextToSpeech({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});

const audioContext = new AudioContext({ sampleRate: 24000 });

const handleSpeech = async (text: string) => {
  const speed = 1.0;
  const waveform = await model.forward(text, speed);

  const audioBuffer = audioContext.createBuffer(1, waveform.length, 24000);
  audioBuffer.getChannelData(0).set(waveform);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
};
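With the hook and audio context set up as above, synthesis is triggered by calling the handler once the model has loaded. A minimal usage sketch (the text is only a placeholder; `isReady` comes from the object returned by the hook):

// Only synthesize once the model has finished loading.
if (model.isReady) {
  void handleSpeech('Hello from React Native ExecuTorch!');
}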
Arguments
`useTextToSpeech` takes `TextToSpeechProps`, which consists of the following (a usage sketch follows the list):
- `model` of type `KokoroConfig`, containing the `durationPredictorSource`, `synthesizerSource`, and `type`.
- An optional flag `preventLoad`, which prevents auto-loading of the model.
- `voice` of type `VoiceConfig` - the configuration of the specific voice used in TTS.
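As a quick illustration of how these props fit together, here is a minimal sketch using the constants shipped with the library; the `preventLoad: true` line is included only to show the optional flag and can be dropped if you want the model to load automatically:

import {
  useTextToSpeech,
  KOKORO_MEDIUM, // KokoroConfig constant shipped with the library
  KOKORO_VOICE_AF_HEART, // VoiceConfig constant shipped with the library
} from 'react-native-executorch';

const tts = useTextToSpeech({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
  preventLoad: true, // optional: prevents auto-loading of the model
});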
Need more details? Check the following resources:
- For detailed information about `useTextToSpeech` arguments, check this section: useTextToSpeech arguments.
- For all text to speech models available out-of-the-box in React Native ExecuTorch, see: Text to Speech Models.
- For all supported voices in `useTextToSpeech`, please refer to: Supported Voices.
- For more information on loading resources, take a look at the loading models page.
Returns
`useTextToSpeech` returns an object of type `TextToSpeechType`, which contains a set of functions for interacting with TTS. For more details, see: TextToSpeechType API Reference.
Running the model
The module provides two ways to generate speech:
- `forward(text, speed)`: Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`. Since it processes the entire text at once, it might take a significant amount of time to produce audio for long text inputs.
- `stream({ text, speed })`: An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
Example
Speech Synthesis
import React from 'react';
import { Button, View } from 'react-native';
import {
  useTextToSpeech,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

export default function App() {
  const tts = useTextToSpeech({
    model: KOKORO_MEDIUM,
    voice: KOKORO_VOICE_AF_HEART,
  });

  const generateAudio = async () => {
    const audioData = await tts.forward({
      text: 'Hello world! This is a sample text.',
    });

    // Playback example
    const ctx = new AudioContext({ sampleRate: 24000 });
    const buffer = ctx.createBuffer(1, audioData.length, 24000);
    buffer.getChannelData(0).set(audioData);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start();
  };

  return (
    <View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
      <Button title="Speak" onPress={generateAudio} disabled={!tts.isReady} />
    </View>
  );
}
Streaming Synthesis
import React, { useRef } from 'react';
import { Button, View } from 'react-native';
import {
  useTextToSpeech,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

export default function App() {
  const tts = useTextToSpeech({
    model: KOKORO_MEDIUM,
    voice: KOKORO_VOICE_AF_HEART,
  });

  const contextRef = useRef(new AudioContext({ sampleRate: 24000 }));

  const generateStream = async () => {
    const ctx = contextRef.current;

    await tts.stream({
      text: "This is a longer text, which is being streamed chunk by chunk. Let's see how it works!",
      onNext: async (chunk) => {
        return new Promise((resolve) => {
          const buffer = ctx.createBuffer(1, chunk.length, 24000);
          buffer.getChannelData(0).set(chunk);
          const source = ctx.createBufferSource();
          source.buffer = buffer;
          source.connect(ctx.destination);
          source.onEnded = () => resolve();
          source.start();
        });
      },
    });
  };

  return (
    <View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
      <Button title="Stream" onPress={generateStream} disabled={!tts.isReady} />
    </View>
  );
}
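Note that, unlike the one-shot example, the streaming example keeps the `AudioContext` in a `useRef` so it is not recreated on every re-render, and the `onNext` callback resolves only after the current chunk has finished playing, which keeps consecutive chunks from playing on top of each other (assuming the stream waits for the returned promise before delivering the next chunk).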
Supported models
| Model | Language |
|---|---|
| Kokoro | English |