Version: Next

Class: SpeechToTextModule

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:17

Module for Speech to Text (STT) functionalities.

Methods

decode()

decode(tokens, encoderOutput): Promise<Float32Array<ArrayBufferLike>>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:140

Runs the decoder of the model.

Parameters

tokens

Int32Array

The input tokens.

encoderOutput

Float32Array

The encoder output.

Returns

Promise<Float32Array<ArrayBufferLike>>

Decoded output.


delete()

delete(): void

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:119

Unloads the model from memory.

Returns

void


encode()

encode(waveform): Promise<Float32Array<ArrayBufferLike>>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:129

Runs the encoding part of the model on the provided waveform. Returns the encoded waveform as a Float32Array.

Parameters

waveform

Float32Array

The input audio waveform.

Returns

Promise<Float32Array<ArrayBufferLike>>

The encoded output.
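Taken together, encode() and decode() expose the model's low-level encoder-decoder loop. A minimal sketch of driving them directly (the model choice, the waveform source, and the start-token value are illustrative assumptions, not part of this API's contract):

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';

// Assumes a 16 kHz mono waveform obtained elsewhere (e.g. from a recorder).
declare const waveform: Float32Array;

const stt = await SpeechToTextModule.fromModelName(WHISPER_TINY_EN);

// Run the encoder once per utterance...
const encoderOutput = await stt.encode(waveform);

// ...then feed the token sequence generated so far to the decoder.
// The token value below is a placeholder, not an actual Whisper token id.
const tokens = new Int32Array([0]);
const output = await stt.decode(tokens, encoderOutput);

stt.delete(); // unload the model when done
```

For ordinary use, transcribe() wraps this loop; the encode/decode pair is only needed for custom decoding strategies.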


stream()

stream(options?): AsyncGenerator<{ committed: TranscriptionResult; nonCommitted: TranscriptionResult; }>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:180

Starts a streaming transcription session, yielding objects with committed and nonCommitted transcriptions.

The committed transcription contains the part of the transcription that is finalized and will not change; it is useful for displaying stable results during streaming. The nonCommitted transcription contains the part that is still being processed and may change; it is useful for displaying live, partial results.

Use with streamInsert and streamStop to control the stream.

Parameters

options?

DecodingOptions = {}

Decoding options including language.

Returns

AsyncGenerator<{ committed: TranscriptionResult; nonCommitted: TranscriptionResult; }>

An async generator yielding transcription updates.

Yields

An object containing committed and nonCommitted transcription results.
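A sketch of a full streaming session, combining stream(), streamInsert(), and streamStop(). The `onAudioChunk` callback is a placeholder for whatever your audio source provides; only the three module methods come from this API:

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';

// Placeholder for an audio source delivering 16 kHz Float32Array chunks.
declare function onAudioChunk(cb: (chunk: Float32Array) => void): void;

const stt = await SpeechToTextModule.fromModelName(WHISPER_TINY_EN);

// Feed incoming audio chunks into the session as they arrive.
onAudioChunk((chunk) => stt.streamInsert(chunk));

for await (const { committed, nonCommitted } of stt.stream()) {
  // Render the stable prefix plus the live, still-changing tail.
  console.log(committed, nonCommitted);
}

// Call from e.g. a stop button; ends the session and finishes the generator.
stt.streamStop();
```

Because the for await loop blocks until the generator finishes, streamStop() is typically invoked from a separate event handler rather than after the loop.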


streamInsert()

streamInsert(waveform): void

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:252

Inserts a new audio chunk into the streaming transcription session.

Parameters

waveform

Float32Array

The audio chunk to insert.

Returns

void


streamStop()

streamStop(): void

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:259

Stops the current streaming transcription session.

Returns

void


transcribe()

transcribe(waveform, options?): Promise<TranscriptionResult>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:156

Transcribes the given 16 kHz audio waveform. For multilingual models, specify the language in options. Resolves with a TranscriptionResult. Passing number[] is deprecated.

Parameters

waveform

Float32Array

The Float32Array audio data.

options?

DecodingOptions = {}

Decoding options including language.

Returns

Promise<TranscriptionResult>

A Promise resolving to the transcription result.
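A minimal one-shot transcription sketch (the model choice and waveform source are illustrative assumptions):

```typescript
import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';

// Assumes a 16 kHz mono waveform obtained elsewhere.
declare const waveform: Float32Array;

const stt = await SpeechToTextModule.fromModelName(WHISPER_TINY_EN);

// One-shot transcription; for a multilingual model you could pass
// decoding options, e.g. stt.transcribe(waveform, { language: 'es' }).
const result = await stt.transcribe(waveform);
console.log(result);
```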


fromCustomModel()

static fromCustomModel(modelSource, tokenizerSource, isMultilingual, onDownloadProgress?): Promise<SpeechToTextModule>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:69

Creates a Speech to Text instance with user-provided model binaries. Use this when working with a custom-exported STT model. Internally uses 'custom' as the model name for telemetry.

Parameters

modelSource

ResourceSource

A fetchable resource pointing to the model binary.

tokenizerSource

ResourceSource

A fetchable resource pointing to the tokenizer file.

isMultilingual

boolean

Whether the model supports multiple languages.

onDownloadProgress?

(progress) => void

Optional callback to monitor download progress, receiving a value between 0 and 1.

Returns

Promise<SpeechToTextModule>

A Promise resolving to a SpeechToTextModule instance.

Remarks

The native model contract for this method is not formally defined and may change between releases. Currently only the Whisper architecture is supported by the native runner. Refer to the native source code for the current expected interface.
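A sketch of loading a custom-exported model. The URLs and file extensions below are placeholders for your own artifacts; only the method signature comes from this API:

```typescript
import { SpeechToTextModule } from 'react-native-executorch';

// Placeholder URLs: point these at your own exported model binary
// and tokenizer file.
const stt = await SpeechToTextModule.fromCustomModel(
  'https://example.com/whisper_custom.pte', // model binary
  'https://example.com/tokenizer.json',     // tokenizer file
  false,                                    // English-only model
  (progress) => console.log(`Downloaded: ${Math.round(progress * 100)}%`)
);
```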


fromModelName()

static fromModelName(namedSources, onDownloadProgress?): Promise<SpeechToTextModule>

Defined in: modules/natural_language_processing/SpeechToTextModule.ts:40

Creates a Speech to Text instance for a built-in model.

Parameters

namedSources

SpeechToTextModelConfig

Configuration object containing model name, sources, and multilingual flag.

onDownloadProgress?

(progress) => void

Optional callback to monitor download progress, receiving a value between 0 and 1.

Returns

Promise<SpeechToTextModule>

A Promise resolving to a SpeechToTextModule instance.

Example

import { SpeechToTextModule, WHISPER_TINY_EN } from 'react-native-executorch';
const stt = await SpeechToTextModule.fromModelName(WHISPER_TINY_EN);