useInstanceSegmentation
Instance segmentation is a computer vision technique that detects individual objects within an image and produces a per-pixel segmentation mask for each one. Unlike object detection (which only returns bounding boxes), instance segmentation provides precise object boundaries. React Native ExecuTorch offers a dedicated hook useInstanceSegmentation for this task.
It is recommended to use models provided by us, which are available at our Hugging Face repository.
API Reference
- For the detailed API reference for useInstanceSegmentation, see: useInstanceSegmentation API Reference.
High Level Overview
import { models, useInstanceSegmentation } from 'react-native-executorch';
const model = useInstanceSegmentation({
model: models.instance_segmentation.yolo26n(),
});
const imageUri = 'file:///Users/.../photo.jpg';
try {
const instances = await model.forward(imageUri);
// instances is an array of SegmentedInstance objects
} catch (error) {
console.error(error);
}
Arguments
useInstanceSegmentation takes InstanceSegmentationProps that consists of:
- model - An object containing:
  - modelName - The name of a built-in model. See InstanceSegmentationModelName for the list of supported models.
  - modelSource - The location of the model binary (a URL or a bundled resource).
- An optional flag preventLoad which prevents auto-loading of the model.
The hook is generic over the model config — TypeScript automatically infers the correct label type based on the modelName you provide. No explicit generic parameter is needed.
For more information on loading resources, see the loading models page.
Returns
useInstanceSegmentation returns an InstanceSegmentationType object containing:
- isReady - Whether the model is loaded and ready to process images.
- isGenerating - Whether the model is currently processing an image.
- error - An error object if the model failed to load or encountered a runtime error.
- downloadProgress - A value between 0 and 1 representing the download progress of the model binary.
- forward - A function to run inference on an image.
- getAvailableInputSizes - Returns the available input sizes for the loaded model, or undefined if the model accepts only a single input size. Use this to populate UI controls for selecting the input resolution.
- runOnFrame - A synchronous worklet function for real-time VisionCamera frame processing. See VisionCamera Integration for usage.
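The state fields above are usually combined into a single status line in the UI. The following is a minimal sketch of that idea, not part of the library: SegmentationStatus is a local stand-in for the relevant subset of the hook's return value, describeStatus is a hypothetical helper, and the exact field types are assumptions.

```typescript
// Local stand-in for the subset of the hook's return value used here.
// Field names follow the list above; the exact types are assumptions.
interface SegmentationStatus {
  isReady: boolean;
  isGenerating: boolean;
  error: unknown;
  downloadProgress: number; // between 0 and 1
}

// Hypothetical helper: turns the state flags into one status line.
// Errors take priority, then loading, then inference, then idle.
function describeStatus(status: SegmentationStatus): string {
  if (status.error) {
    const message =
      status.error instanceof Error
        ? status.error.message
        : String(status.error);
    return `Error: ${message}`;
  }
  if (!status.isReady) {
    const percent = Math.round(status.downloadProgress * 100);
    return `Loading model: ${percent}%`;
  }
  if (status.isGenerating) {
    return 'Segmenting...';
  }
  return 'Ready';
}
```

In a component, the object returned by the hook could be passed to such a helper directly, provided its fields match the assumed types.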
Running the model
To run the model, use the forward method. It accepts two arguments:
- imageSource (required) - The image to process. Can be a remote URL, a local file URI, a base64-encoded image (whole URI or only raw base64), or a PixelData object (raw RGB pixel buffer).
- options (optional) - An InstanceSegmentationOptions object with the following fields:
  - confidenceThreshold - Minimum confidence score for including instances. Defaults to the model's configured threshold (typically 0.5).
  - iouThreshold - IoU threshold for non-maximum suppression. Defaults to 0.5.
  - maxInstances - Maximum number of instances to return. Defaults to 100.
  - classesOfInterest - Filter results to include only specific classes (e.g. ['PERSON', 'CAR']). Use label names from the model's label enum (e.g. CocoLabelYolo for YOLO models).
  - returnMaskAtOriginalResolution - Whether to resize masks to the original image resolution. Defaults to true.
  - inputSize - Input size for the model (e.g. 384, 512, 640). Must be one of the model's available input sizes. If the model has only one forward method (i.e. no availableInputSizes configured), this option is not needed.
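Because inputSize must be one of the model's available sizes, it can help to normalize a requested size before calling forward. The sketch below is an illustration only: ForwardOptions is a local stand-in for the option fields listed above, pickInputSize and buildOptions are hypothetical helpers, and the available argument is assumed to hold what getAvailableInputSizes returns (an array of numbers, or undefined for single-size models).

```typescript
// Local stand-in for the option fields listed above (types are assumptions).
interface ForwardOptions {
  confidenceThreshold?: number;
  iouThreshold?: number;
  maxInstances?: number;
  classesOfInterest?: string[];
  returnMaskAtOriginalResolution?: boolean;
  inputSize?: number;
}

// Hypothetical helper: returns the available size closest to `preferred`,
// or undefined when the model exposes no selectable sizes.
// On a tie, the size listed first wins.
function pickInputSize(
  available: readonly number[] | undefined,
  preferred: number
): number | undefined {
  if (!available || available.length === 0) return undefined;
  return available.reduce((best, size) =>
    Math.abs(size - preferred) < Math.abs(best - preferred) ? size : best
  );
}

// Hypothetical helper: builds options and only sets inputSize when the
// model actually supports choosing one.
function buildOptions(
  available: readonly number[] | undefined,
  preferred: number
): ForwardOptions {
  const options: ForwardOptions = { confidenceThreshold: 0.5, maxInstances: 20 };
  const inputSize = pickInputSize(available, preferred);
  if (inputSize !== undefined) options.inputSize = inputSize;
  return options;
}
```

The resulting object could then be passed as the second argument to forward. Omitting inputSize for single-size models follows the note above that the option is not needed in that case.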
forward returns a promise resolving to an array of SegmentedInstance objects, each containing:
- bbox - A Bbox object with x1, y1 (top-left corner) and x2, y2 (bottom-right corner) coordinates in the original image's pixel space.
- label - The class name of the detected instance, typed to the label map of the chosen model.
- score - The confidence score of the detection, between 0 and 1.
- mask - A Uint8Array binary mask (0 or 1) representing the instance's segmentation.
- maskWidth - Width of the mask array.
- maskHeight - Height of the mask array.
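To make the result shape more concrete, here is a small sketch that summarizes instances using only the fields listed above. InstanceLike is a local stand-in for SegmentedInstance, and maskArea, maskCoverage, and countByLabel are hypothetical helpers rather than library functions. The helpers only count mask values, so they make no assumption about how mask pixels are laid out.

```typescript
// Local stand-ins for the documented shapes; field names follow the list
// above, everything else in this block is a hypothetical helper.
interface Bbox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

interface InstanceLike {
  bbox: Bbox;
  label: string;
  score: number;
  mask: Uint8Array;
  maskWidth: number;
  maskHeight: number;
}

// Number of mask pixels set to 1.
function maskArea(instance: InstanceLike): number {
  let count = 0;
  for (let i = 0; i < instance.mask.length; i++) {
    if ((instance.mask[i] ?? 0) !== 0) count++;
  }
  return count;
}

// Fraction of the mask grid covered by the instance, between 0 and 1.
function maskCoverage(instance: InstanceLike): number {
  const total = instance.maskWidth * instance.maskHeight;
  return total > 0 ? maskArea(instance) / total : 0;
}

// Number of detected instances per label.
function countByLabel(
  instances: readonly InstanceLike[]
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const instance of instances) {
    counts[instance.label] = (counts[instance.label] ?? 0) + 1;
  }
  return counts;
}
```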
Example
import { models, useInstanceSegmentation } from 'react-native-executorch';
function App() {
const model = useInstanceSegmentation({
model: models.instance_segmentation.yolo26n(),
});
const handleSegment = async () => {
if (!model.isReady) return;
const imageUri = 'file:///Users/.../photo.jpg';
try {
const instances = await model.forward(imageUri, {
confidenceThreshold: 0.5,
inputSize: 640,
});
for (const instance of instances) {
console.log('Label:', instance.label);
console.log('Score:', instance.score);
console.log('Bounding box:', instance.bbox);
console.log('Mask size:', instance.maskWidth, 'x', instance.maskHeight);
}
} catch (error) {
console.error(error);
}
};
// ...
}
VisionCamera integration
See the full guide: VisionCamera Integration.
Supported models
YOLO models use the CocoLabelYolo enum (80 classes, 0-indexed), which differs from CocoLabel used by RF-DETR and SSDLite object detection models (91 classes, 1-indexed). When filtering with classesOfInterest, use the label names from CocoLabelYolo.
| Model | Number of classes | Class list | Available input sizes |
|---|---|---|---|
| yolo26n-seg | 80 | COCO (YOLO) | 384, 512, 640 |
| yolo26s-seg | 80 | COCO (YOLO) | 384, 512, 640 |
| yolo26m-seg | 80 | COCO (YOLO) | 384, 512, 640 |
| yolo26l-seg | 80 | COCO (YOLO) | 384, 512, 640 |
| yolo26x-seg | 80 | COCO (YOLO) | 384, 512, 640 |
| rfdetr-nano-seg | 91 | COCO | 312 (fixed) |
| fastsam-s | 1 | FastSAMLabel | 640 (fixed) |
| fastsam-x | 1 | FastSAMLabel | 640 (fixed) |
FastSAM models are class-agnostic, so they segment every instance without classifying it. That makes them a good fit for promptable selection workflows.
Promptable selection
Instance segmentation models return a list of segmented instances. After forward(), you can use prompt-based selectors to pick the instance you want. Use point selection for tap-to-select or cutout tools, box selection for drag-to-outline workflows, and text selection for search or describe-it-in-words flows. For example, a photo-editing app can use point selection to isolate a person, a custom sticker or background-removal flow can use box selection, and a shopping app can use text selection to find a product by name or description. A typical workflow:
- Load an instance segmentation model with useInstanceSegmentation.
- Run forward(image) once to get the detected instances.
- Use a selector to pick the instance or instances matching the user's prompt.
- Re-run the selector when the prompt changes; you do not need to call forward again unless the image changes.
import {
models,
useInstanceSegmentation,
selectByPoint,
selectByBox,
selectByText,
} from 'react-native-executorch';
const model = useInstanceSegmentation({
model: models.instance_segmentation.fastsam_x(),
});
try {
const instances = await model.forward(imageUri);
// Point: the smallest instance whose mask covers (x, y).
const pointMatch = selectByPoint(instances, x, y);
console.log('point match:', pointMatch?.bbox);
// Box: the instance with highest IoU with the prompt box.
const boxMatch = selectByBox(instances, { x1, y1, x2, y2 });
console.log('box match:', boxMatch?.bbox);
// Text: highest cosine similarity between text and per-instance image
// embeddings (you must provide the embeddings, e.g. with CLIP).
const textMatch = selectByText(instances, instanceEmbeddings, textEmbedding);
console.log('text match:', textMatch?.bbox);
} catch (error) {
console.error(error);
}
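To illustrate the box rule mentioned in the comments above (highest IoU with the prompt box), here is a self-contained sketch. It is not the library's selectByBox implementation: boxIoU and pickByBox are hypothetical names, and the sketch assumes the comparison is made against each instance's bounding box, which the library may or may not do.

```typescript
// Axis-aligned box with a top-left (x1, y1) and bottom-right (x2, y2) corner.
interface Box {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

function boxArea(box: Box): number {
  return Math.max(0, box.x2 - box.x1) * Math.max(0, box.y2 - box.y1);
}

// Intersection over union of two boxes: 0 when disjoint, 1 when identical.
function boxIoU(a: Box, b: Box): number {
  const width = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const height = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const intersection = width * height;
  const union = boxArea(a) + boxArea(b) - intersection;
  return union > 0 ? intersection / union : 0;
}

// Hypothetical selector: the instance whose bbox overlaps the prompt box
// the most, or undefined when nothing overlaps at all.
function pickByBox<T extends { bbox: Box }>(
  instances: readonly T[],
  prompt: Box
): T | undefined {
  let best: T | undefined;
  let bestIoU = 0;
  for (const instance of instances) {
    const iou = boxIoU(instance.bbox, prompt);
    if (iou > bestIoU) {
      best = instance;
      bestIoU = iou;
    }
  }
  return best;
}
```

Returning undefined for a prompt that overlaps nothing mirrors the optional chaining (`?.bbox`) used in the example above, where a selector result may be missing.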
For the detailed API reference for selectByPoint, selectByBox, and selectByText, see their respective documentation pages.
Use FastSAM-S for faster performance on simple images with non-overlapping instances and FastSAM-X for better accuracy on complex scenes with many overlapping objects.
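Text selection, as described in this section, ranks instances by cosine similarity between a text embedding and per-instance image embeddings that you supply yourself. The sketch below shows that ranking step in isolation. It is an illustration rather than the library's selectByText: cosineSimilarity and bestMatchIndex are hypothetical names, and embeddings are assumed to be plain numeric vectors of equal length.

```typescript
// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(
  a: ArrayLike<number>,
  b: ArrayLike<number>
): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator > 0 ? dot / denominator : 0;
}

// Hypothetical ranking step: index of the instance embedding most similar
// to the text embedding, or -1 when there are no embeddings.
function bestMatchIndex(
  instanceEmbeddings: readonly ArrayLike<number>[],
  textEmbedding: ArrayLike<number>
): number {
  let bestIndex = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < instanceEmbeddings.length; i++) {
    const embedding = instanceEmbeddings[i];
    if (!embedding) continue;
    const score = cosineSimilarity(embedding, textEmbedding);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex;
}
```

The returned index would point into the same array of instances that the embeddings were computed from, so the two arrays need to stay in the same order.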