# TextToSpeechModule

TypeScript API implementation of the useTextToSpeech hook.
## API Reference

- For a detailed API reference for `TextToSpeechModule`, see: TextToSpeechModule API Reference.
- For all text-to-speech models available out of the box in React Native ExecuTorch, see: TTS Models.
- For all supported voices in `TextToSpeechModule`, please refer to: Supported Voices.
## High Level Overview

```typescript
import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';

const model = await TextToSpeechModule.fromModelName(
  { model: KOKORO_MEDIUM, voice: KOKORO_VOICE_AF_HEART },
  (progress) => console.log(progress)
);

await model.forward(text, 1.0);
```
## Methods

All methods of `TextToSpeechModule` are explained in detail here: TextToSpeechModule API Reference.
### Loading the model

Use the static `fromModelName` factory method with the following parameters:

- `config` - Object containing:
  - `model` - The TTS model to load.
  - `voice` - The voice to use.
- `onDownloadProgress` - Optional callback to track download progress (value between 0 and 1).

This method returns a promise that resolves to a `TextToSpeechModule` instance once the assets are downloaded and loaded into memory.

For more information on resource sources, see loading models.
### Running the model

The module provides two ways to generate speech, using either raw text or pre-generated phonemes:

#### Using Text

- `forward(text, speed)`: Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
- `stream({ speed, stopAutomatically, onNext, ... })`: An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences. In contrast to `forward`, it enables inserting text chunks dynamically into the processing buffer with `streamInsert(text)` and allows stopping generation early with `streamStop(instant)`.
#### Using Phonemes

If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:

- `forwardFromPhonemes(phonemes, speed)`: Generates the complete audio waveform from a phoneme string.
- `streamFromPhonemes({ phonemes, speed, onNext, ... })`: Streams audio chunks generated from a phoneme string.
Since `forward` and `forwardFromPhonemes` process the entire input at once, they may take a significant amount of time to produce audio for long inputs.
## Example

### Speech Synthesis

```typescript
import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});

const audioContext = new AudioContext({ sampleRate: 24000 });

try {
  const waveform = await tts.forward('Hello from ExecuTorch!', 1.0);

  // Create an audio buffer from the waveform and play it
  const audioBuffer = audioContext.createBuffer(1, waveform.length, 24000);
  audioBuffer.getChannelData(0).set(waveform);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
} catch (error) {
  console.error('Text-to-speech failed:', error);
}
```
### Streaming Synthesis

```typescript
import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});

const audioContext = new AudioContext({ sampleRate: 24000 });

try {
  for await (const chunk of tts.stream({
    text: 'This is a streaming test, with a sample input.',
    speed: 1.0,
  })) {
    // Play each chunk sequentially
    await new Promise<void>((resolve) => {
      const audioBuffer = audioContext.createBuffer(1, chunk.length, 24000);
      audioBuffer.getChannelData(0).set(chunk);
      const source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);
      source.onEnded = () => resolve();
      source.start();
    });
  }
} catch (error) {
  console.error('Streaming failed:', error);
}
```
### Synthesis from Phonemes

If you already have a phoneme string (e.g., from an external library), you can use `forwardFromPhonemes` or `streamFromPhonemes` to synthesize audio directly, skipping the internal phonemizer stage.

```typescript
import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});

// Phonemes for "Hello world!"
const waveform = await tts.forwardFromPhonemes('həlˈO wˈɜɹld!', 1.0);

// Or stream from phonemes
for await (const chunk of tts.streamFromPhonemes({
  phonemes:
    'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
  speed: 1.0,
})) {
  // ... process chunk ...
}
```