
useTextToImage

Text-to-image is the task of generating images directly from a natural-language description by conditioning a model on the provided text input. Our implementation follows the Stable Diffusion pipeline, which applies the diffusion process in a lower-dimensional latent space to reduce memory requirements. The pipeline combines a text encoder that preprocesses the prompt, a U-Net that iteratively denoises latent representations, and a VAE decoder that reconstructs the final image. React Native ExecuTorch offers a dedicated hook, useTextToImage, for this task.

warning

It is recommended to use the models we provide, which are available in our Hugging Face repository; you can also use the constants shipped with our library.

Reference

import { useTextToImage, BK_SDM_TINY_VPRED_256 } from 'react-native-executorch';

const model = useTextToImage({ model: BK_SDM_TINY_VPRED_256 });

const input = 'a castle';

try {
const image = await model.generate(input);
} catch (error) {
console.error(error);
}

Arguments

model - Object containing the model source.

  • schedulerSource - A string that specifies the location of the scheduler config.

  • tokenizerSource - A string that specifies the location of the tokenizer config.

  • encoderSource - A string that specifies the location of the text encoder binary.

  • unetSource - A string that specifies the location of the U-Net binary.

  • decoderSource - A string that specifies the location of the VAE decoder binary.

preventLoad? - Boolean that can prevent automatic model loading (and, on first use, downloading the data) after running the hook.

For more information on loading resources, take a look at the loading models page. A sketch of passing these sources explicitly follows below.
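The bundled constants such as BK_SDM_TINY_VPRED_256 already point at all five sources. If you host the exported files yourself, a minimal sketch could look like the following; the URIs and file names are placeholders, not real asset locations:

import { useTextToImage } from 'react-native-executorch';

// Sketch assuming self-hosted files; the URIs and extensions are illustrative only.
const model = useTextToImage({
  model: {
    schedulerSource: 'https://example.com/t2i/scheduler_config.json',
    tokenizerSource: 'https://example.com/t2i/tokenizer.json',
    encoderSource: 'https://example.com/t2i/text_encoder.pte',
    unetSource: 'https://example.com/t2i/unet.pte',
    decoderSource: 'https://example.com/t2i/vae_decoder.pte',
  },
  // Defer loading (and the initial download) until you trigger it yourself.
  preventLoad: true,
});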

Returns

| Field | Type | Description |
| ----- | ---- | ----------- |
| generate | (input: string, imageSize?: number, numSteps?: number, seed?: number) => Promise<string> | Runs the model to generate an image described by input, and conditioned by seed, performing numSteps inference steps. The resulting image, with dimensions imageSize×imageSize pixels, is returned as a base64-encoded string. |
| error | string \| null | Contains the error message if the model failed to load. |
| isGenerating | boolean | Indicates whether the model is currently processing an inference. |
| isReady | boolean | Indicates whether the model has successfully loaded and is ready for inference. |
| downloadProgress | number | Represents the download progress as a value between 0 and 1. |
| interrupt | () => void | Interrupts the current inference. The model is stopped in the nearest inference step. |
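A brief sketch of how the returned fields might be wired together; the handler name below is illustrative and `model` comes from useTextToImage as in the reference above:

// Report download progress while the model is not ready yet.
if (!model.isReady) {
  // downloadProgress is a value between 0 and 1.
  console.log(`Downloading model: ${Math.round(model.downloadProgress * 100)}%`);
}

// A hypothetical "Cancel" handler that stops an ongoing generation.
const onCancel = () => {
  if (model.isGenerating) {
    model.interrupt();
  }
};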

Running the model

To run the model, you can use the generate method. It accepts four arguments: a text prompt describing the requested image, the size of the image in pixels, the number of denoising steps, and an optional seed value, which enables reproducibility of the results.

The image size must be a multiple of 32 due to the architecture of the U-Net and VAE models. The seed should be a positive integer.
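For example, passing the same positive integer as the seed should reproduce the same image for a given prompt, size, and step count; the values below are illustrative:

// Both calls use seed 42, so they should yield the same 256×256 image.
const first = await model.generate('a castle', 256, 25, 42);
const second = await model.generate('a castle', 256, 25, 42);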

warning

Larger imageSize values require significantly more memory to run the model.

Example

import { useState } from 'react';
import { Image } from 'react-native';
import { useTextToImage, BK_SDM_TINY_VPRED_256 } from 'react-native-executorch';

function App() {
  const model = useTextToImage({ model: BK_SDM_TINY_VPRED_256 });
  const [image, setImage] = useState<string | null>(null);

  //...
  const generateImage = async () => {
    const input = 'a medieval castle by the sea shore';
    const imageSize = 256;
    const numSteps = 25;

    try {
      // generate returns the image as a base64-encoded string.
      setImage(await model.generate(input, imageSize, numSteps));
    } catch (error) {
      console.error(error);
    }
  };
  //...

  return image ? <Image source={{ uri: `data:image/png;base64,${image}` }} /> : null;
}
[Example output images: Castle 256×256 and Castle 512×512]

Supported models

| Model | Parameters [B] | Description |
| ----- | -------------- | ----------- |
| bk-sdm-tiny-vpred | 0.5 | BK-SDM (Block-removed Knowledge-distilled Stable Diffusion Model) is a compressed version of Stable Diffusion v1.4 with several residual and attention blocks removed. The BK-SDM-Tiny is a v-prediction variant of the model, obtained through further block removal, built around a 0.33B-parameter U-Net. |

Benchmarks

info

The number following the underscore (_) indicates that the model supports generating images with dimensions ranging from 128 pixels up to that value. This setting doesn't affect the model's file size; it only determines how memory is allocated at runtime, based on the maximum allowed image size.
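For instance, assuming the library also exports a BK_SDM_TINY_VPRED_512 constant (the variant listed in the tables below), choosing it allows generation up to 512×512 at the cost of a larger runtime memory footprint; a minimal sketch:

// Assumes BK_SDM_TINY_VPRED_512 is exported alongside the _256 variant.
import { useTextToImage, BK_SDM_TINY_VPRED_512 } from 'react-native-executorch';

// The _512 variant allocates memory for image sizes above 256 px.
const model = useTextToImage({ model: BK_SDM_TINY_VPRED_512 });
const image = await model.generate('a castle', 512, 25);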

Model size

| Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] |
| ----- | --------------------------- | ------------------- | -------------------------- |
| BK_SDM_TINY_VPRED_256 | 492 | 1290 | 198 |
| BK_SDM_TINY_VPRED_512 | 492 | 1290 | 198 |

Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ----- | ---------------------- | ------------------ |
| BK_SDM_TINY_VPRED_256 | 2900 | 2800 |
| BK_SDM_TINY_VPRED_512 | 6700 | 6560 |

Inference time

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| BK_SDM_TINY_VPRED_256 | 19100 | | | 25000 | 23100 |

info

Text-to-image benchmark times were measured while generating 256×256 images with 10 inference steps.