Version: Next

useInstanceSegmentation

Instance segmentation is a computer vision technique that detects individual objects within an image and produces a per-pixel segmentation mask for each one. Unlike object detection (which only returns bounding boxes), instance segmentation provides precise object boundaries. React Native ExecuTorch offers a dedicated hook useInstanceSegmentation for this task.

info

It is recommended to use models provided by us, which are available at our Hugging Face repository.

API Reference

For the detailed API reference of useInstanceSegmentation, see: useInstanceSegmentation API Reference.

High Level Overview

import { models, useInstanceSegmentation } from 'react-native-executorch';
const model = useInstanceSegmentation({
  model: models.instance_segmentation.yolo26n(),
});

const imageUri = 'file:///Users/.../photo.jpg';

try {
  const instances = await model.forward(imageUri);
  // instances is an array of SegmentedInstance objects
} catch (error) {
  console.error(error);
}

Arguments

useInstanceSegmentation takes InstanceSegmentationProps, which consist of:

model - An object containing:
- modelName - The name of a built-in model. See InstanceSegmentationModelName for the list of supported models.
- modelSource - The location of the model binary (a URL or a bundled resource).
preventLoad - An optional flag that prevents the model from loading automatically.

The hook is generic over the model config — TypeScript automatically infers the correct label type based on the modelName you provide. No explicit generic parameter is needed.

For more information on loading resources, take a look at the loading models page.

Returns

useInstanceSegmentation returns an InstanceSegmentationType object containing:

isReady - Whether the model is loaded and ready to process images.
isGenerating - Whether the model is currently processing an image.
error - An error object if the model failed to load or encountered a runtime error.
downloadProgress - A value between 0 and 1 representing the download progress of the model binary.
forward - A function to run inference on an image.
getAvailableInputSizes - Returns the available input sizes for the loaded model, or undefined if the model accepts only a single input size. Use this to populate UI controls for selecting the input resolution.
runOnFrame - A synchronous worklet function for real-time VisionCamera frame processing. See VisionCamera Integration for usage.
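
Because getAvailableInputSizes may return undefined, UI code that picks a resolution needs a fallback. The following is a minimal sketch, not part of the library: pickInputSize is a hypothetical helper, and it assumes the available sizes arrive as a plain array of numbers.

```typescript
// Hypothetical helper, not part of react-native-executorch.
// Assumes the available sizes are a plain array of numbers.
function pickInputSize(
  available: readonly number[] | undefined,
  desired: number
): number | undefined {
  // undefined (or an empty list) is treated as "single fixed input size":
  // the caller should then omit the inputSize option entirely.
  if (available === undefined || available.length === 0) return undefined;

  // Pick the size closest to the requested one.
  // On a tie, the earlier entry in the list wins.
  return available.reduce((best, size) =>
    Math.abs(size - desired) < Math.abs(best - desired) ? size : best
  );
}

const multiSize = pickInputSize([384, 512, 640], 500); // 512
const fixedSize = pickInputSize(undefined, 500); // undefined
console.log(multiSize, fixedSize);
```

The result can be passed as inputSize to forward when it is defined and left out otherwise.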

Running the model

To run the model, use the forward method. It accepts two arguments:

imageSource (required) - The image to process. Can be a remote URL, a local file URI, a base64-encoded image (whole URI or only raw base64), or a PixelData object (raw RGB pixel buffer).
options (optional) - An InstanceSegmentationOptions object with the following fields:
- confidenceThreshold - Minimum confidence score for including instances. Defaults to the model's configured threshold (typically 0.5).
- iouThreshold - IoU threshold for non-maximum suppression. Defaults to 0.5.
- maxInstances - Maximum number of instances to return. Defaults to 100.
- classesOfInterest - Filter results to include only specific classes (e.g. ['PERSON', 'CAR']). Use label names from the model's label enum (e.g. CocoLabelYolo for YOLO models).
- returnMaskAtOriginalResolution - Whether to resize masks to the original image resolution. Defaults to true.
- inputSize - Input size for the model (e.g. 384, 512, 640). Must be one of the model's available input sizes. If the model has only one forward method (i.e. no availableInputSizes configured), this option is not needed.
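
To make iouThreshold more concrete, here is a sketch of how intersection over union (IoU) is usually computed for two boxes in the x1, y1, x2, y2 format. This is an illustration only: the Box interface and both functions are local stand-ins, and the library's own suppression step may differ in detail (for example, in how it treats a value exactly equal to the threshold).

```typescript
// Local stand-in for a box with top-left (x1, y1) and bottom-right (x2, y2).
interface Box {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Area of a box; degenerate or inverted boxes count as zero.
function boxArea(b: Box): number {
  return Math.max(0, b.x2 - b.x1) * Math.max(0, b.y2 - b.y1);
}

// Intersection over union: overlap area divided by combined area.
function boxIou(a: Box, b: Box): number {
  const intersection = boxArea({
    x1: Math.max(a.x1, b.x1),
    y1: Math.max(a.y1, b.y1),
    x2: Math.min(a.x2, b.x2),
    y2: Math.min(a.y2, b.y2),
  });
  const union = boxArea(a) + boxArea(b) - intersection;
  return union > 0 ? intersection / union : 0;
}

const boxA: Box = { x1: 0, y1: 0, x2: 10, y2: 10 };
const boxB: Box = { x1: 5, y1: 0, x2: 15, y2: 10 };
console.log(boxIou(boxA, boxB)); // 50 / 150, about 0.33
```

In typical non-maximum suppression, a lower-scoring detection is dropped when its IoU with a higher-scoring one exceeds the threshold, so a lower iouThreshold removes more overlapping detections.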

forward returns a promise resolving to an array of SegmentedInstance objects, each containing:

bbox - A Bbox object with x1, y1 (top-left corner) and x2, y2 (bottom-right corner) coordinates in the original image's pixel space.
label - The class name of the detected instance, typed to the label map of the chosen model.
score - The confidence score of the detection, between 0 and 1.
mask - A Uint8Array binary mask (0 or 1) representing the instance's segmentation.
maskWidth - Width of the mask array.
maskHeight - Height of the mask array.
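
As a small illustration of working with these fields, the sketch below counts how many mask entries are set and what fraction of the mask they cover. MaskFields and maskCoverage are hypothetical names that mirror only the three mask-related fields described above; the code does not depend on how the mask is laid out in memory.

```typescript
// Local stand-in for the mask-related fields of a segmented instance.
interface MaskFields {
  mask: Uint8Array;
  maskWidth: number;
  maskHeight: number;
}

// Counts set pixels and reports them as a fraction of the whole mask.
function maskCoverage(instance: MaskFields): { pixels: number; fraction: number } {
  let pixels = 0;
  for (let i = 0; i < instance.mask.length; i++) {
    if (instance.mask[i] !== 0) pixels++;
  }
  const total = instance.maskWidth * instance.maskHeight;
  return { pixels, fraction: total > 0 ? pixels / total : 0 };
}

// A tiny 4 x 2 mask with three pixels set.
const sample: MaskFields = {
  mask: new Uint8Array([0, 1, 1, 0, 0, 1, 0, 0]),
  maskWidth: 4,
  maskHeight: 2,
};
console.log(maskCoverage(sample)); // { pixels: 3, fraction: 0.375 }
```

A coverage value like this can be used, for example, to sort instances by size or to discard very small masks.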

Example

import { models, useInstanceSegmentation } from 'react-native-executorch';
function App() {
  const model = useInstanceSegmentation({
    model: models.instance_segmentation.yolo26n(),
  });

  const handleSegment = async () => {
    if (!model.isReady) return;

    const imageUri = 'file:///Users/.../photo.jpg';

    try {
      const instances = await model.forward(imageUri, {
        confidenceThreshold: 0.5,
        inputSize: 640,
      });

      for (const instance of instances) {
        console.log('Label:', instance.label);
        console.log('Score:', instance.score);
        console.log('Bounding box:', instance.bbox);
        console.log('Mask size:', instance.maskWidth, 'x', instance.maskHeight);
      }
    } catch (error) {
      console.error(error);
    }
  };

  // ...
}

VisionCamera integration

See the full guide: VisionCamera Integration.

Supported models

info

YOLO models use the CocoLabelYolo enum (80 classes, 0-indexed), which differs from CocoLabel used by RF-DETR and SSDLite object detection models (91 classes, 1-indexed). When filtering with classesOfInterest, use the label names from CocoLabelYolo.

Model	Number of classes	Class list	Available input sizes
yolo26n-seg	80	COCO (YOLO)	384, 512, 640
yolo26s-seg	80	COCO (YOLO)	384, 512, 640
yolo26m-seg	80	COCO (YOLO)	384, 512, 640
yolo26l-seg	80	COCO (YOLO)	384, 512, 640
yolo26x-seg	80	COCO (YOLO)	384, 512, 640
rfdetr-nano-seg	91	COCO	312 (fixed)
fastsam-s	1	FastSAMLabel	640 (fixed)
fastsam-x	1	FastSAMLabel	640 (fixed)

tip

FastSAM models are class-agnostic, so they segment every instance without classifying it. That makes them a good fit for promptable selection workflows.

Promptable selection

Instance segmentation models return a list of segmented instances. After forward(), you can use prompt-based selectors to pick the instance you want. Use point selection for tap-to-select or cutout tools, box selection for drag-to-outline workflows, and text selection for search or describe-it-in-words flows. For example, a photo-editing app can use point selection to isolate a person, a custom sticker or background-removal flow can use box selection, and a shopping app can use text selection to find a product by name or description. The typical workflow is:

Load an instance segmentation model with useInstanceSegmentation.
Run forward(image) once to get the detected instances.
Use a selector to pick the instance or instances matching the user's prompt.
Re-run the selector when the prompt changes; you do not need to call forward again unless the image changes.

import {
  models,
  useInstanceSegmentation,
  selectByPoint,
  selectByBox,
  selectByText,
} from 'react-native-executorch';
const model = useInstanceSegmentation({
  model: models.instance_segmentation.fastsam_x(),
});

try {
  const instances = await model.forward(imageUri);

  // Point: the smallest instance whose mask covers (x, y).
  const pointMatch = selectByPoint(instances, x, y);
  console.log('point match:', pointMatch?.bbox);

  // Box: the instance with highest IoU with the prompt box.
  const boxMatch = selectByBox(instances, { x1, y1, x2, y2 });
  console.log('box match:', boxMatch?.bbox);

  // Text: highest cosine similarity between text and per-instance image
  // embeddings (you must provide the embeddings, e.g. with CLIP).
  const textMatch = selectByText(instances, instanceEmbeddings, textEmbedding);
  console.log('text match:', textMatch?.bbox);
} catch (error) {
  console.error(error);
}
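
Text selection ranks instances by cosine similarity between embeddings. The sketch below shows that measure on plain number arrays so the ranking is easier to reason about. It is not the library's implementation: cosineSimilarity and bestMatchIndex are hypothetical helpers, and the embedding format that selectByText expects is not specified here.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so treat its similarity as 0.
  return denominator > 0 ? dot / denominator : 0;
}

// Index of the embedding most similar to the query, or -1 for an empty list.
function bestMatchIndex(
  embeddings: ArrayLike<number>[],
  query: ArrayLike<number>
): number {
  let bestIndex = -1;
  let bestScore = -Infinity;
  embeddings.forEach((embedding, index) => {
    const score = cosineSimilarity(embedding, query);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = index;
    }
  });
  return bestIndex;
}

// Toy 2-D embeddings; real image and text embeddings have many more dimensions.
const toyEmbeddings = [[1, 0], [0, 1], [1, 1]];
console.log(bestMatchIndex(toyEmbeddings, [0.9, 0.1])); // 0
```

Because only the direction of each vector matters, embeddings do not need to be normalized beforehand, but the image and text embeddings must come from the same embedding space.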

For the detailed API reference of selectByPoint, selectByBox, and selectByText, see their respective documentation pages.

tip

Use FastSAM-S for faster performance on simple images with non-overlapping instances and FastSAM-X for better accuracy on complex scenes with many overlapping objects.
