
TextToSpeechModule

TypeScript API implementation of the useTextToSpeech hook.

API Reference

High Level Overview

import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';

const model = await TextToSpeechModule.fromModelName(
  { model: KOKORO_MEDIUM, voice: KOKORO_VOICE_AF_HEART },
  (progress) => console.log(progress)
);

await model.forward(text, 1.0);

Methods

All methods of TextToSpeechModule are explained in detail here: TextToSpeechModule API Reference

Loading the model

Use the static fromModelName factory method with the following parameters:

  • config - Object containing:

    • model - Model configuration (e.g. KOKORO_MEDIUM).
    • voice - Voice configuration (e.g. KOKORO_VOICE_AF_HEART).
  • onDownloadProgress - Optional callback to track download progress (value between 0 and 1).

This method returns a promise that resolves to a TextToSpeechModule instance once the assets are downloaded and loaded into memory.

For more information on resource sources, see loading models.
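
The onDownloadProgress callback receives a value between 0 and 1. As a minimal sketch of how you might surface it to users, the helper below (formatDownloadProgress is hypothetical, not part of the library) clamps the value and formats it as a percentage:

```typescript
// Hypothetical helper: clamp the 0..1 progress value and format it as a
// percentage suitable for a loading indicator.
function formatDownloadProgress(progress: number): string {
  const clamped = Math.min(1, Math.max(0, progress));
  return `${Math.round(clamped * 100)}%`;
}

// Sketch of wiring it into fromModelName:
// const model = await TextToSpeechModule.fromModelName(
//   { model: KOKORO_MEDIUM, voice: KOKORO_VOICE_AF_HEART },
//   (progress) => console.log(formatDownloadProgress(progress))
// );

console.log(formatDownloadProgress(0.42)); // → "42%"
```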

Running the model

The module provides two ways to generate speech using either raw text or pre-generated phonemes:

Using Text

  1. forward(text, speed): Generates the complete audio waveform at once. Returns a promise resolving to a Float32Array.
  2. stream({ speed, stopAutomatically, onNext, ... }): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences. In contrast to forward, it lets you insert text chunks dynamically into the processing buffer with streamInsert(text) and stop generation early with streamStop(instant).
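
The chunks yielded by stream are Float32Arrays; if you also want the complete waveform after streaming finishes (for example, to save it), you can concatenate them yourself. A minimal sketch, where concatChunks is a hypothetical helper rather than a library API:

```typescript
// Hypothetical helper: join streamed Float32Array chunks into one waveform.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset); // copy each chunk at its running offset
    offset += c.length;
  }
  return out;
}

// Sketch of collecting chunks from the async generator:
// const collected: Float32Array[] = [];
// for await (const chunk of tts.stream({ text: 'Hello!', speed: 1.0 })) {
//   collected.push(chunk);
// }
// const fullWaveform = concatChunks(collected);
```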

Using Phonemes

If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:

  1. forwardFromPhonemes(phonemes, speed): Generates the complete audio waveform from a phoneme string.
  2. streamFromPhonemes({ phonemes, speed, onNext, ... }): Streams audio chunks generated from a phoneme string.
Note: Since forward and forwardFromPhonemes process the entire input at once, they may take a significant amount of time to produce audio for long inputs.
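
Since the examples below play the output at a 24 kHz sample rate, the length of a generated waveform maps directly to playback duration, which can help you decide when input is long enough to warrant streaming. A small sketch (waveformDurationSeconds is a hypothetical helper; 24000 matches the AudioContext configuration used in the examples):

```typescript
// Hypothetical helper: estimate playback time of a generated waveform.
// The examples in this document play mono audio at a 24 kHz sample rate.
function waveformDurationSeconds(
  samples: number,
  sampleRate: number = 24000
): number {
  return samples / sampleRate;
}

// A 48,000-sample waveform at 24 kHz plays for 2 seconds:
console.log(waveformDurationSeconds(48000)); // → 2
```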

Example

Speech Synthesis

import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});
const audioContext = new AudioContext({ sampleRate: 24000 });

try {
  const waveform = await tts.forward('Hello from ExecuTorch!', 1.0);

  // Create an audio buffer and play it
  const audioBuffer = audioContext.createBuffer(1, waveform.length, 24000);
  audioBuffer.getChannelData(0).set(waveform);

  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
} catch (error) {
  console.error('Text-to-speech failed:', error);
}

Streaming Synthesis

import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});
const audioContext = new AudioContext({ sampleRate: 24000 });

try {
  for await (const chunk of tts.stream({
    text: 'This is a streaming test, with a sample input.',
    speed: 1.0,
  })) {
    // Play each chunk sequentially
    await new Promise<void>((resolve) => {
      const audioBuffer = audioContext.createBuffer(1, chunk.length, 24000);
      audioBuffer.getChannelData(0).set(chunk);

      const source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);
      source.onEnded = () => resolve();
      source.start();
    });
  }
} catch (error) {
  console.error('Streaming failed:', error);
}

Synthesis from Phonemes

If you already have a phoneme string (e.g., from an external library), you can use forwardFromPhonemes or streamFromPhonemes to synthesize audio directly, skipping the internal phonemizer stage.

import {
  TextToSpeechModule,
  KOKORO_MEDIUM,
  KOKORO_VOICE_AF_HEART,
} from 'react-native-executorch';

const tts = await TextToSpeechModule.fromModelName({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});

// Example phonemes for "Hello world!"
const waveform = await tts.forwardFromPhonemes('həlˈO wˈɜɹld!', 1.0);

// Or stream from phonemes
for await (const chunk of tts.streamFromPhonemes({
  phonemes:
    'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
  speed: 1.0,
})) {
  // ... process chunk ...
}