SpeechToTextModule

Hookless implementation of the useSpeechToText hook.

Reference

import { SpeechToTextModule } from 'react-native-executorch';

const audioUrl = ...; // URL with audio to transcribe

// Loading the model
const onSequenceUpdate = (sequence) => {
  console.log(sequence);
};
await SpeechToTextModule.load('moonshine', onSequenceUpdate);

// Loading the audio and running the model
await SpeechToTextModule.loadAudio(audioUrl);
const transcribedText = await SpeechToTextModule.transcribe();

Methods

load(modelName: 'whisper' | 'moonshine', transcribeCallback?: (sequence: string) => void, modelDownloadProgressCallback?: (downloadProgress: number) => void, encoderSource?: ResourceSource, decoderSource?: ResourceSource, tokenizerSource?: ResourceSource): Promise<void>
Loads the model specified by modelName. encoderSource, decoderSource, and tokenizerSource are strings specifying the location of the binaries for the model. modelDownloadProgressCallback allows you to monitor the current progress of the model download, while transcribeCallback is invoked with each generated token.

transcribe(waveform?: number[]): Promise<string>
Starts a transcription process for a given input array, which should be a waveform sampled at 16 kHz. When no input is provided, it uses the internal state set by calling loadAudio. Resolves with the output transcription when the model is finished.

loadAudio(url: string): Promise<void>
Loads the audio file from the given url. It sets an internal state which serves as the input to transcribe().

encode(waveform: number[]): Promise<number[]>
Runs the encoding part of the model. Returns a float array representing the output of the encoder.

decode(tokens: number[], encodings: number[]): Promise<number[]>
Runs the decoder of the model. Returns the next token in the output sequence.
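The encode and decode methods make it possible to drive the generation loop manually. Below is a minimal sketch of such a greedy decoding loop. The module here is a mock standing in for SpeechToTextModule (so the sketch is self-contained), and the start/EOS token ids and mock outputs are purely illustrative, not the real tokenizer's values:

```javascript
// Illustrative mock with the same method shapes as SpeechToTextModule.
// The real module runs the exported encoder/decoder binaries instead.
const mockModule = {
  // encode() produces encoder hidden states for the waveform.
  encode: async (waveform) => waveform.map((x) => x * 0.5),
  // decode() returns the next token given the tokens generated so far.
  decode: async (tokens, encodings) => {
    const script = [12, 7, 42, 2]; // 2 plays the role of EOS here
    return [script[tokens.length - 1]];
  },
};

const START_TOKEN = 1; // hypothetical start-of-sequence id
const EOS_TOKEN = 2; // hypothetical end-of-sequence id

// Greedy loop: encode once, then decode token by token until EOS.
async function greedyTranscribe(module, waveform) {
  const encodings = await module.encode(waveform);
  const tokens = [START_TOKEN];
  while (true) {
    const [next] = await module.decode(tokens, encodings);
    if (next === EOS_TOKEN) break;
    tokens.push(next);
  }
  return tokens.slice(1); // drop the start token
}
```

In practice transcribe() performs this loop for you; the manual variant is mainly useful when you need custom stopping criteria or want to post-process tokens yourself.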

Type definitions

type ResourceSource = string | number;

Loading the model

To load the model, use the load method. The required argument is modelName, which identifies which model to use. It also accepts optional arguments such as encoderSource, decoderSource, and tokenizerSource, which are strings specifying the location of the binaries for the model. For more information, take a look at the loading models page. This method returns a promise, which can resolve to an error or void.
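A sketch of the callback shapes load expects is below. Since the real call needs the native module and model binaries, a mock stands in for SpeechToTextModule.load here; the progress fractions it reports are illustrative:

```javascript
const progressLog = [];

// Download progress is reported as a fraction between 0 and 1.
const onDownloadProgress = (progress) => {
  progressLog.push(Math.round(progress * 100));
};

// Invoked with the partial transcription as tokens are generated.
const onSequenceUpdate = (sequence) => {
  console.log(sequence);
};

// Illustrative mock of load(); the real call would be:
// await SpeechToTextModule.load('moonshine', onSequenceUpdate, onDownloadProgress);
const mockLoad = async (modelName, transcribeCallback, downloadCallback) => {
  [0.25, 0.5, 0.75, 1].forEach(downloadCallback);
};
```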

Running the model

To run the model, use the transcribe method. It accepts a single argument: an array of numbers representing a waveform sampled at 16kHz. The method returns a promise, which resolves to a string containing the output text, or rejects with an error.
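If your recording uses a higher sampling rate, it must be brought down to 16 kHz before being passed to transcribe. A minimal sketch using naive decimation, assuming the source rate is an integer multiple of 16000 (a production resampler would low-pass filter first to avoid aliasing):

```javascript
// Naive decimation from sourceRate down to 16 kHz by keeping every
// Nth sample. Assumes sourceRate is an integer multiple of 16000;
// a real resampler should low-pass filter before decimating.
function toSixteenKhz(samples, sourceRate) {
  const factor = sourceRate / 16000;
  if (!Number.isInteger(factor)) {
    throw new Error('source rate must be an integer multiple of 16000');
  }
  const out = [];
  for (let i = 0; i < samples.length; i += factor) {
    out.push(samples[i]);
  }
  return out;
}
```

For example, a 48 kHz recording keeps every third sample, so a 48000-sample buffer becomes the 16000 samples of one second of audio that transcribe expects.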

Obtaining the input

To get the input, you can use the loadAudio method, which sets the internal input state of the model. Then you can call transcribe without passing any arguments. It is also possible to pass input from other sources, as long as it is a float array containing the aforementioned 16kHz waveform.
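Since any float array at the right sampling rate is a valid input, you can also construct one programmatically. As a self-contained illustration, the sketch below generates one second of a sine tone at 16 kHz (the 440 Hz frequency is an arbitrary choice, and real speech audio would of course be needed for a meaningful transcription):

```javascript
// One second of a sine tone sampled at 16 kHz, values in [-1, 1].
// Useful as a stand-in waveform when wiring up the transcription flow.
function makeSineWave(frequency = 440, sampleRate = 16000, seconds = 1) {
  const samples = new Array(sampleRate * seconds);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = Math.sin((2 * Math.PI * frequency * i) / sampleRate);
  }
  return samples;
}

// The resulting array can be passed directly:
// const text = await SpeechToTextModule.transcribe(makeSineWave());
```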