# SpeechToTextModule

TypeScript API implementation of the useSpeechToText hook.

## API Reference

- For a detailed API reference for SpeechToTextModule, see: SpeechToTextModule API Reference.
- For all speech-to-text models available out of the box in React Native ExecuTorch, see: STT Models.
## High Level Overview

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';

const model = new SpeechToTextModule();
await model.load(WHISPER_TINY_EN, (progress) => {
  console.log(progress);
});
await model.transcribe(waveform);
```
## Methods

All methods of SpeechToTextModule are explained in detail here: SpeechToTextModule API Reference.
- `committed` - Contains the latest part of the transcription that is finalized and will not change. To obtain the full transcription during streaming, concatenate all the `committed` values yielded over time. Useful for displaying stable results during streaming.
- `nonCommitted` - Contains the part of the transcription that is still being processed and may change. Useful for displaying live, partial results during streaming.
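To illustrate, the `committed` chunks yielded during streaming can simply be joined to build the full transcript, while `nonCommitted` text is only shown temporarily. This is a minimal sketch; `chunks` stands in for the values yielded by `model.stream()`:

```typescript
// Joins the finalized (committed) chunks yielded during streaming into the
// full transcript; nonCommitted text is for live display only and is discarded.
function buildTranscript(
  chunks: { committed: string; nonCommitted: string }[]
): string {
  return chunks.map((chunk) => chunk.committed).join('');
}

// Example: three yields from a hypothetical streaming session.
const transcript = buildTranscript([
  { committed: 'Hello ', nonCommitted: 'wor' },
  { committed: 'world, ', nonCommitted: 'this' },
  { committed: 'this is a test.', nonCommitted: '' },
]);
// transcript === 'Hello world, this is a test.'
```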
## Loading the model

Create an instance of SpeechToTextModule and use the load method. It accepts an object with the following fields:

- `model` - Object containing:
  - `isMultilingual` - Flag indicating whether the model is multilingual.
  - `encoderSource` - The location of the encoder to use.
  - `decoderSource` - The location of the decoder to use.
  - `tokenizerSource` - The location of the tokenizer to use.
- `onDownloadProgressCallback` - Callback to track the download progress.

This method returns a promise, which can resolve to an error or void.

For more information on loading resources, take a look at the loading models page.
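Putting the fields above together, a call to load with custom model sources might look like the following sketch. The URLs are placeholders, not real assets — point them at wherever your exported encoder, decoder, and tokenizer are actually hosted or bundled:

```typescript
import { SpeechToTextModule } from 'react-native-executorch';

const model = new SpeechToTextModule();

// The source URLs below are hypothetical placeholders.
await model.load({
  model: {
    isMultilingual: false,
    encoderSource: 'https://example.com/whisper/encoder.pte',
    decoderSource: 'https://example.com/whisper/decoder.pte',
    tokenizerSource: 'https://example.com/whisper/tokenizer.json',
  },
  // Reports download progress as the model files are fetched.
  onDownloadProgressCallback: (progress) => console.log(progress),
});
```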
## Running the model

To run the model, use the transcribe method. It accepts one argument: a Float32Array representing the waveform, sampled at 16 kHz. The method returns a promise, which can resolve either to an error or to a string containing the output text.
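As a sanity check on the input format: at a 16 kHz sampling rate, n seconds of audio corresponds to a Float32Array of 16000 × n samples. The helper below is illustrative and not part of the library:

```typescript
const SAMPLE_RATE = 16000; // transcribe expects audio sampled at 16 kHz

// Converts raw PCM samples (e.g. a number[] decoded elsewhere) into the
// Float32Array that transcribe expects.
function toWaveform(samples: number[]): Float32Array {
  return Float32Array.from(samples);
}

// One second of audio at 16 kHz is exactly 16000 samples.
const oneSecondOfSilence = toWaveform(new Array(SAMPLE_RATE).fill(0));
// oneSecondOfSilence.length === 16000
```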
## Multilingual transcription

If you want to obtain a transcription in a language other than English, use the multilingual version of Whisper. To obtain the output text in your desired language, pass a DecodingOptions object with the language field set to the desired language code.

```typescript
import { SpeechToTextModule, WHISPER_TINY } from 'react-native-executorch';

const model = new SpeechToTextModule();
await model.load(WHISPER_TINY, (progress) => {
  console.log(progress);
});
const transcription = await model.transcribe(spanishAudio, { language: 'es' });
```
## Example

### Transcription

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';
import { AudioContext } from 'react-native-audio-api';
import * as FileSystem from 'expo-file-system';

// Load the model
const model = new SpeechToTextModule();
await model.load(WHISPER_TINY_EN);

// Download the audio file
const { uri } = await FileSystem.downloadAsync(
  'https://some-audio-url.com/file.mp3',
  FileSystem.cacheDirectory + 'audio_file'
);

// Decode the audio data into a 16 kHz waveform
const audioContext = new AudioContext({ sampleRate: 16000 });
const decodedAudioData = await audioContext.decodeAudioDataSource(uri);
const audioBuffer = decodedAudioData.getChannelData(0);

// Transcribe the audio
try {
  const transcription = await model.transcribe(audioBuffer);
  console.log(transcription);
} catch (error) {
  console.error('Error during audio transcription', error);
}
```
### Streaming Transcription

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';
import { AudioManager, AudioRecorder } from 'react-native-audio-api';

// Load the model
const model = new SpeechToTextModule();
await model.load(WHISPER_TINY_EN, (progress) => {
  console.log(progress);
});

// Configure the audio session
AudioManager.setAudioSessionOptions({
  iosCategory: 'playAndRecord',
  iosMode: 'spokenAudio',
  iosOptions: ['allowBluetooth', 'defaultToSpeaker'],
});
AudioManager.requestRecordingPermissions();

// Initialize the audio recorder
const recorder = new AudioRecorder({
  sampleRate: 16000,
  bufferLengthInSamples: 1600,
});
recorder.onAudioReady(({ buffer }) => {
  // Feed the recorded audio into the streaming transcription
  model.streamInsert(buffer.getChannelData(0));
});
recorder.start();

// Start the streaming transcription
try {
  let transcription = '';
  for await (const { committed, nonCommitted } of model.stream()) {
    console.log('Streaming transcription:', { committed, nonCommitted });
    transcription += committed;
  }
  console.log('Final transcription:', transcription);
} catch (error) {
  console.error('Error during streaming transcription:', error);
}

// Stop the streaming transcription
model.streamStop();
recorder.stop();
```