Version: Next

Inference Time

info

Times presented in the tables are measured over consecutive runs of the model. The initial run may take up to 2x longer because of model loading and initialization.

Inference times are measured directly in native C++ code, wrapping only the model's forward pass. They exclude input-dependent pre- and post-processing (e.g. image resizing, normalization) and any overhead from the React Native runtime.
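The measurement approach described above can be sketched as a small timing loop. This is only an illustration of the idea (discard warm-up runs, time nothing but the forward pass), not the actual native C++ benchmark harness. The names `benchmarkForward`, `forward`, `warmupRuns`, `timedRuns` and `now` are hypothetical and do not come from the library.

```typescript
// Hypothetical helper: `forward` stands in for a model's forward pass.
// It is not an API of any real library.
function benchmarkForward(
  forward: () => void,
  warmupRuns: number = 1,
  timedRuns: number = 10,
  // Clock is injectable; pass a higher-resolution timer if one is available.
  now: () => number = () => Date.now(),
): { meanMs: number; samplesMs: number[] } {
  if (timedRuns < 1) {
    throw new RangeError("timedRuns must be at least 1");
  }

  // Warm-up runs absorb model loading and initialization; they are not timed.
  for (let i = 0; i < warmupRuns; i++) {
    forward();
  }

  const samplesMs: number[] = [];
  for (let i = 0; i < timedRuns; i++) {
    const start = now();
    forward(); // only the forward pass sits inside the timed region
    samplesMs.push(now() - start);
  }

  const meanMs = samplesMs.reduce((a, b) => a + b, 0) / samplesMs.length;
  return { meanMs, samplesMs };
}
```

Pre- and post-processing would run outside the `forward` callback, so their cost never enters the reported mean.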

Classification

note

For this model all input images, whether larger or smaller, are resized before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total time.

| Model / Device | iPhone 17 Pro [ms] | Google Pixel 10 [ms] |
| --- | --- | --- |
| EFFICIENTNET_V2_S (XNNPACK FP32) | 70 | 100 |
| EFFICIENTNET_V2_S (XNNPACK INT8) | 22 | 38 |
| EFFICIENTNET_V2_S (Core ML FP32) | 12 | - |
| EFFICIENTNET_V2_S (Core ML FP16) | 5 | - |

Object Detection

note

For this model all input images, whether larger or smaller, are resized before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total time.

Times presented in the tables are measured for YOLO models with input size equal to 512. Other input sizes may yield slower or faster inference times. RF-DETR Nano uses a fixed resolution of 312×312.

| Model / Device | iPhone 17 Pro [ms] | Google Pixel 10 [ms] |
| --- | --- | --- |
| SSDLITE_320_MOBILENET_V3_LARGE (XNNPACK FP32) | 20 | 18 |
| SSDLITE_320_MOBILENET_V3_LARGE (Core ML FP32) | 18 | - |
| SSDLITE_320_MOBILENET_V3_LARGE (Core ML FP16) | 8 | - |
| RF_DETR_NANO (XNNPACK FP32) | 101 | 277 |
| YOLO26N (XNNPACK FP32) | 29 | 38 |
| YOLO26S (XNNPACK FP32) | 60 | 72 |
| YOLO26M (XNNPACK FP32) | 134 | 177 |
| YOLO26L (XNNPACK FP32) | 169 | 216 |
| YOLO26X (XNNPACK FP32) | 371 | 434 |

Style Transfer

note

For this model all input images, whether larger or smaller, are resized before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total time.

| Model / Device | iPhone 17 Pro [ms] | Google Pixel 10 [ms] |
| --- | --- | --- |
| STYLE_TRANSFER_CANDY (XNNPACK FP32) | 1192 | 1025 |
| STYLE_TRANSFER_CANDY (XNNPACK INT8) | 272 | 430 |
| STYLE_TRANSFER_CANDY (Core ML FP32) | 100 | - |
| STYLE_TRANSFER_CANDY (Core ML FP16) | 150 | - |
| STYLE_TRANSFER_MOSAIC (XNNPACK FP32) | 1192 | 1025 |
| STYLE_TRANSFER_MOSAIC (XNNPACK INT8) | 272 | 430 |
| STYLE_TRANSFER_MOSAIC (Core ML FP32) | 100 | - |
| STYLE_TRANSFER_MOSAIC (Core ML FP16) | 150 | - |
| STYLE_TRANSFER_UDNIE (XNNPACK FP32) | 1192 | 1025 |
| STYLE_TRANSFER_UDNIE (XNNPACK INT8) | 272 | 430 |
| STYLE_TRANSFER_UDNIE (Core ML FP32) | 100 | - |
| STYLE_TRANSFER_UDNIE (Core ML FP16) | 150 | - |
| STYLE_TRANSFER_RAIN_PRINCESS (XNNPACK FP32) | 1192 | 1025 |
| STYLE_TRANSFER_RAIN_PRINCESS (XNNPACK INT8) | 272 | 430 |
| STYLE_TRANSFER_RAIN_PRINCESS (Core ML FP32) | 100 | - |
| STYLE_TRANSFER_RAIN_PRINCESS (Core ML FP16) | 150 | - |

OCR

Note that the recognizer models were executed between 3 and 7 times during a single recognition. The values below are averages across all runs for the benchmark image.

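| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| --- | --- | --- | --- | --- | --- |
| Total Inference Time | 652 | 600 | 2855 | 1092 | 1034 |
| Detector (CRAFT) forward_800 | 220 | 221 | 1740 | 521 | 492 |
| Recognizer (CRNN) forward_512 | 45 | 38 | 110 | 40 | 38 |
| Recognizer (CRNN) forward_256 | 21 | 18 | 54 | 20 | 19 |
| Recognizer (CRNN) forward_128 | 11 | 9 | 27 | 10 | 10 |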

Vertical OCR

note

The recognizer models, as well as the detector's forward_320 method, were executed between 4 and 21 times during a single recognition.

The values below represent the averages across all runs for the benchmark image.

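| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| --- | --- | --- | --- | --- | --- |
| Total Inference Time | 1104 | 1113 | 8840 | 2845 | 2640 |
| Detector (CRAFT) forward_1280 | 501 | 507 | 4317 | 1405 | 1275 |
| Detector (CRAFT) forward_320 | 125 | 121 | 1060 | 338 | 299 |
| Recognizer (CRNN) forward_512 | 46 | 42 | 109 | 47 | 37 |
| Recognizer (CRNN) forward_64 | 5 | 6 | 14 | 7 | 6 |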

LLMs

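| Model | Google Pixel 10 (XNNPACK) [tokens/s] | iPhone 17 Pro (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] |
| --- | --- | --- | --- | --- |
| LLAMA3_2_1B | 8 | 8 | 15 | N/A |
| LLAMA3_2_1B_QLORA | 22 | 22 | 45 | 19 |
| LLAMA3_2_1B_SPINQUANT | 24 | 36 | 48 | 17 |
| LLAMA3_2_3B | 2 | 3 | 6 | N/A |
| LLAMA3_2_3B_QLORA | 8 | 7 | 17 | N/A |
| LLAMA3_2_3B_SPINQUANT | 11 | 12 | 18 | N/A |
| QWEN3_0_6B | 7 | 9 | 15 | 9 |
| QWEN3_0_6B_QUANTIZED | 20 | 27 | 37 | 35 |
| QWEN3_1_7B | 3 | 5 | 8 | N/A |
| QWEN3_1_7B_QUANTIZED | 10 | 14 | 20 | 13 |
| QWEN3_4B | 2 | N/A | 4 | N/A |
| QWEN3_4B_QUANTIZED | 5 | 7 | 10 | N/A |
| HAMMER2_1_0_5B | 13 | 13 | 25 | 16 |
| HAMMER2_1_0_5B_QUANTIZED | 34 | 97 | 72 | 56 |
| HAMMER2_1_1_5B | 5 | 5 | 10 | N/A |
| HAMMER2_1_1_5B_QUANTIZED | 14 | 16 | 36 | 22 |
| HAMMER2_1_3B | 2 | 3 | 5 | N/A |
| HAMMER2_1_3B_QUANTIZED | 9 | 10 | 20 | N/A |
| SMOLLM2_1_135M | 25 | 24 | 33 | 42 |
| SMOLLM2_1_135M_QUANTIZED | 20 | 32 | 64 | 47 |
| SMOLLM2_1_360M | 12 | 13 | 20 | 15 |
| SMOLLM2_1_360M_QUANTIZED | 12 | 15 | 29 | 18 |
| SMOLLM2_1_1_7B | 3 | 5 | 7 | N/A |
| SMOLLM2_1_1_7B_QUANTIZED | 12 | 14 | 27 | 23 |
| QWEN2_5_0_5B | 12 | 12 | 21 | 15 |
| QWEN2_5_0_5B_QUANTIZED | 33 | 31 | 55 | 48 |
| QWEN2_5_1_5B | 5 | 5 | 9 | N/A |
| QWEN2_5_1_5B_QUANTIZED | 15 | 15 | 28 | 16 |
| QWEN2_5_3B | 2 | 3 | 5 | N/A |
| QWEN2_5_3B_QUANTIZED | 9 | 10 | 18 | N/A |
| PHI_4_MINI_4B | 2 | 3 | 4 | N/A |
| PHI_4_MINI_4B_QUANTIZED | 4 | 7 | 10 | N/A |
| LFM2_5_350M | 16 | 26 | 34 | 21 |
| LFM2_5_350M_QUANTIZED | 58 | 67 | 103 | 51 |
| LFM2_5_1_2B_INSTRUCT | 6 | 10 | 13 | N/A |
| LFM2_5_1_2B_INSTRUCT_QUANTIZED | 8 | 26 | 47 | 24 |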

N/A - Insufficient RAM.
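Throughput in tokens/s can be turned into a rough latency estimate. The sketch below assumes throughput stays constant over the whole generation and ignores prompt processing, which the table does not describe. The function names `msPerToken` and `estimateGenerationSeconds` are hypothetical, and the numbers in the usage comments are illustrative rather than taken from the table.

```typescript
// Average time per generated token, in milliseconds.
function msPerToken(tokensPerSecond: number): number {
  if (!(tokensPerSecond > 0)) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

// Rough time to generate `tokenCount` tokens, in seconds.
// Assumption: constant throughput, no prompt-processing cost.
function estimateGenerationSeconds(
  tokenCount: number,
  tokensPerSecond: number,
): number {
  if (!(tokensPerSecond > 0)) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokenCount / tokensPerSecond;
}

// Illustrative values only:
// msPerToken(20) gives 50 ms per token
// estimateGenerationSeconds(200, 20) gives 10 seconds
```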

Speech to Text

Encoding

Average time to encode audio of the given length, over 10 runs. For the Whisper model we list only 30-second audio chunks, since Whisper does not accept other lengths (shorter audio must be padded to 30 seconds with silence).

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Whisper-tiny (30s) | 89 | 93 | 403 | 277 | 260 |
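The padding requirement mentioned in this section can be illustrated with a short sketch. It assumes mono PCM samples in a `Float32Array` at Whisper's 16 kHz sample rate, and that padding happens in application code. The helper name `padToWhisperChunk` is hypothetical and not a library API.

```typescript
const WHISPER_SAMPLE_RATE = 16000; // Whisper operates on 16 kHz mono audio
const WHISPER_CHUNK_SECONDS = 30;

// Pads audio shorter than 30 s with trailing silence (zeros).
function padToWhisperChunk(
  samples: Float32Array,
  sampleRate: number = WHISPER_SAMPLE_RATE,
): Float32Array {
  const targetLength = sampleRate * WHISPER_CHUNK_SECONDS;
  if (samples.length > targetLength) {
    // Longer audio would need to be split into chunks first; out of scope here.
    throw new RangeError("audio is longer than one 30-second chunk");
  }
  // A new Float32Array is zero-filled, and zeros are digital silence.
  const padded = new Float32Array(targetLength);
  padded.set(samples, 0);
  return padded;
}
```

Because every input ends up the same length, the encoder does the same amount of work for a 2-second clip as for a full 30-second one, which is why the table reports a single audio length.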

Decoding

Average time to decode one token in a sequence of approximately 100 tokens, with the encoding context obtained from audio of the noted length.

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Whisper-tiny (30s) | 6 | 6 | 40 | 28 | 25 |
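Encoding and decoding figures can be combined into a rough per-chunk transcription estimate: one encoder pass plus one decoder step per output token. The function name `estimateTranscriptionMs` is hypothetical. The sketch covers only the model passes, so it ignores audio loading, padding and any runtime overhead, and the numbers in the usage comment are made up for illustration.

```typescript
// Rough model-only time to transcribe one 30-second chunk, in milliseconds.
function estimateTranscriptionMs(
  encodeMs: number,
  decodeMsPerToken: number,
  tokenCount: number,
): number {
  if (encodeMs < 0 || decodeMsPerToken < 0 || tokenCount < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  // One encoder pass, then one decoder step per generated token.
  return encodeMs + decodeMsPerToken * tokenCount;
}

// Illustrative values only: 300 ms encode, 30 ms per token, 100 tokens
// estimateTranscriptionMs(300, 30, 100) gives 3300 ms
```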

Text to Speech

Average time to synthesize speech from an input text of approximately 60 tokens, resulting in 2 to 5 seconds of audio depending on the input and selected voice.

| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- |
| Kokoro-small | 2051 | 1548 |
| Kokoro-medium | 2124 | 1625 |

Text Embeddings

note

Benchmark times for text embeddings are highly dependent on the sentence length. The numbers below are based on a sentence of around 80 tokens. For shorter or longer sentences, inference time may vary accordingly.

| Model / Device | iPhone 17 Pro [ms] | OnePlus 12 [ms] |
| --- | --- | --- |
| ALL_MINILM_L6_V2 (XNNPACK) | 7 | 21 |
| ALL_MPNET_BASE_V2 (XNNPACK) | 24 | 90 |
| MULTI_QA_MINILM_L6_COS_V1 (XNNPACK) | 7 | 19 |
| MULTI_QA_MPNET_BASE_DOT_V1 (XNNPACK) | 24 | 88 |
| CLIP_VIT_BASE_PATCH32_TEXT (XNNPACK) | 14 | 39 |
| DISTILUSE_BASE_MULTILINGUAL_CASED_V2 (XNNPACK 8da4w) | 16 | 15 |
| DISTILUSE_BASE_MULTILINGUAL_CASED_V2 (Core ML FP32) | 15 | - |

Image Embeddings

note

For this model all input images, whether larger or smaller, are resized before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total time.

| Model / Device | iPhone 17 Pro [ms] | Google Pixel 10 [ms] |
| --- | --- | --- |
| CLIP_VIT_BASE_PATCH32_IMAGE (XNNPACK FP32) | 14 | 68 |
| CLIP_VIT_BASE_PATCH32_IMAGE (XNNPACK INT8) | 11 | 31 |

Semantic Segmentation

note

For this model all input images, whether larger or smaller, are resized before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total time.

| Model / Device | iPhone 17 Pro [ms] | Google Pixel 10 [ms] |
| --- | --- | --- |
| DEEPLAB_V3_RESNET50 (XNNPACK FP32) | 2000 | 2200 |
| DEEPLAB_V3_RESNET50 (XNNPACK INT8) | 118 | 380 |
| DEEPLAB_V3_RESNET101 (XNNPACK FP32) | 2900 | 3300 |
| DEEPLAB_V3_RESNET101 (XNNPACK INT8) | 174 | 660 |
| DEEPLAB_V3_MOBILENET_V3_LARGE (XNNPACK FP32) | 131 | 153 |
| DEEPLAB_V3_MOBILENET_V3_LARGE (XNNPACK INT8) | 17 | 40 |
| LRASPP_MOBILENET_V3_LARGE (XNNPACK FP32) | 13 | 36 |
| LRASPP_MOBILENET_V3_LARGE (XNNPACK INT8) | 12 | 20 |
| FCN_RESNET50 (XNNPACK FP32) | 1800 | 2160 |
| FCN_RESNET50 (XNNPACK INT8) | 100 | 320 |
| FCN_RESNET101 (XNNPACK FP32) | 2600 | 3160 |
| FCN_RESNET101 (XNNPACK INT8) | 160 | 620 |

Instance Segmentation

note

Times presented in the tables are measured for YOLO models with input size equal to 512. Other input sizes may yield slower or faster inference times. RF-DETR Nano Seg uses a fixed resolution of 312×312.

| Model | Samsung Galaxy S24 (XNNPACK) [ms] | iPhone 17 Pro (XNNPACK) [ms] |
| --- | --- | --- |
| YOLO26N_SEG | 92 | 90 |
| YOLO26S_SEG | 220 | 188 |
| YOLO26M_SEG | 570 | 550 |
| YOLO26L_SEG | 680 | 608 |
| YOLO26X_SEG | 1410 | 1338 |
| RF_DETR_NANO_SEG | 549 | 330 |

Text to Image

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| BK_SDM_TINY_VPRED_256 | 21184 | 21021 | N/A | 18834 | 16617 |