Version: 0.5.x

Inference Time

Warning: The times in the tables below were measured over consecutive runs of each model. The first run may take up to 2x longer because of model loading and initialization.
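
Because the first run carries loading and initialization costs, it is worth timing it separately from the steady-state runs when you reproduce these numbers. The sketch below is a hypothetical helper (`measureRuns` is not part of react-native-executorch). It accepts an injectable clock so it can be tested, and it assumes a synchronous `run` callback.

```typescript
// Hypothetical helper (not a react-native-executorch API): times a model call
// several times and reports the first (cold) run separately from the mean of
// the consecutive runs that follow it.
interface TimingResult {
  firstRunMs: number; // includes one-time loading/initialization cost
  steadyMeanMs: number; // comparable to the consecutive-run figures in the tables
}

function measureRuns(
  run: () => void,
  totalRuns: number,
  now: () => number = () => performance.now(),
): TimingResult {
  if (totalRuns < 2) {
    throw new Error("need one cold run plus at least one timed run");
  }
  const durations: number[] = [];
  for (let i = 0; i < totalRuns; i++) {
    const start = now();
    run(); // for an async inference call, make this helper async and await here
    durations.push(now() - start);
  }
  const steady = durations.slice(1); // drop the cold run
  const steadyMeanMs = steady.reduce((sum, d) => sum + d, 0) / steady.length;
  return { firstRunMs: durations[0], steadyMeanMs };
}
```

Given the warning above, a `firstRunMs` of up to roughly twice `steadyMeanMs` is expected on the first call after the model is loaded.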

Classification

| Model | iPhone 17 Pro (Core ML) [ms] | iPhone 16 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| EFFICIENTNET_V2_S | 105 | 110 | 149 | 299 | 227 |

Object Detection

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| SSDLITE_320_MOBILENET_V3_LARGE | 116 | 120 | 164 | 257 | 129 |

Style Transfer

| Model | iPhone 17 Pro (Core ML) [ms] | iPhone 16 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| STYLE_TRANSFER_CANDY | 1356 | 1550 | 2003 | 2578 | 2328 |
| STYLE_TRANSFER_MOSAIC | 1376 | 1456 | 1971 | 2657 | 2394 |
| STYLE_TRANSFER_UDNIE | 1389 | 1499 | 1858 | 2380 | 2124 |
| STYLE_TRANSFER_RAIN_PRINCESS | 1339 | 1514 | 2004 | 2608 | 2371 |

OCR

Note that the recognizer models were executed between 3 and 7 times during a single recognition. The values below are averages across all runs on the benchmark image.

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Detector (CRAFT_800_QUANTIZED) | 669 | 649 | 825 | 541 | 474 |
| Recognizer (CRNN_512) | 48 | 47 | 60 | 91 | 72 |
| Recognizer (CRNN_256) | 22 | 22 | 29 | 51 | 30 |
| Recognizer (CRNN_128) | 11 | 11 | 14 | 28 | 17 |
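
Since the recognizer figures are per-call averages and the recognizers run several times per image, the end-to-end time for one image is larger than any single row. The back-of-envelope sketch below assumes that the detector runs once and that all calls execute sequentially, so their times add up. The split of recognizer calls across CRNN_512/256/128 is invented for illustration, and `estimateOcrMs` is a hypothetical helper, not a library function.

```typescript
// Back-of-envelope OCR latency from per-call averages.
// Assumptions: calls run one after another (times add up), the detector runs
// once, and pre/post-processing time is ignored.
interface ModelCalls {
  avgMs: number; // per-call average, as reported in the table
  calls: number; // how many times the model ran for one image (assumed)
}

function estimateOcrMs(detector: ModelCalls, recognizers: ModelCalls[]): number {
  const total = (m: ModelCalls) => m.avgMs * m.calls;
  return total(detector) + recognizers.reduce((sum, r) => sum + total(r), 0);
}

// iPhone 17 Pro figures from the table; the call counts are hypothetical.
const ocrEstimate = estimateOcrMs({ avgMs: 669, calls: 1 }, [
  { avgMs: 48, calls: 3 }, // CRNN_512
  { avgMs: 22, calls: 2 }, // CRNN_256
  { avgMs: 11, calls: 2 }, // CRNN_128
]);
console.log(ocrEstimate); // 879
```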

Vertical OCR

Note that the recognizer models, as well as the CRAFT_320 detector model, were executed between 4 and 21 times during a single recognition. The values below are averages across all runs on the benchmark image.

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Detector (CRAFT_1280_QUANTIZED) | 1749 | 1804 | 2105 | 1216 | 1171 |
| Detector (CRAFT_320_QUANTIZED) | 458 | 474 | 561 | 360 | 332 |
| Recognizer (CRNN_512) | 54 | 52 | 68 | 144 | 72 |
| Recognizer (CRNN_64) | 5 | 6 | 7 | 28 | 11 |

LLMs

| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --- | --- | --- | --- | --- | --- |
| LLAMA3_2_1B | 16.1 | 11.4 | ❌ | 15.6 | 19.3 |
| LLAMA3_2_1B_SPINQUANT | 40.6 | 16.7 | 16.5 | 40.3 | 48.2 |
| LLAMA3_2_1B_QLORA | 31.8 | 11.4 | 11.2 | 37.3 | 44.4 |
| LLAMA3_2_3B | ❌ | ❌ | ❌ | ❌ | 7.1 |
| LLAMA3_2_3B_SPINQUANT | 17.2 | 8.2 | ❌ | 16.2 | 19.4 |
| LLAMA3_2_3B_QLORA | 14.5 | ❌ | ❌ | 14.8 | 18.1 |

❌ - Insufficient RAM.
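
A tokens-per-second figure is easier to reason about as per-token latency or as the time needed for a full response. The sketch below converts between them; the helper names are hypothetical, and it assumes that throughput stays constant for the whole response and that prompt processing (prefill) time is not included.

```typescript
// Converts a tokens/s figure from the table into per-token latency and an
// approximate generation time for a response of a given length.
// Assumptions: constant throughput; prefill time is not modeled.
function msPerToken(tokensPerSecond: number): number {
  return 1000 / tokensPerSecond;
}

function generationSeconds(outputTokens: number, tokensPerSecond: number): number {
  return outputTokens / tokensPerSecond;
}

// LLAMA3_2_1B_SPINQUANT on iPhone 16 Pro: 40.6 tokens/s
console.log(msPerToken(40.6).toFixed(1)); // 24.6
console.log(generationSeconds(256, 40.6).toFixed(1)); // 6.3
```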

Encoding

Average time to encode audio of the given length, measured over 10 runs. For the Whisper model, only 30-second audio chunks are listed because Whisper does not accept other input lengths; shorter audio has to be padded to 30 seconds with silence.

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Whisper-tiny (30s) | 1391 | 1372 | 1894 | 1303 | 1214 |
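
The fixed 30-second window means that encoding cost scales with the number of windows rather than with the exact audio length. The sketch below splits audio into 30-second windows and pads the last one with silence (zeros). It assumes mono audio already resampled to 16 kHz, the rate Whisper expects, and `toWhisperChunks` is a hypothetical helper, not the library's API.

```typescript
// Hypothetical preprocessing helper (not the library's API). Assumes mono
// audio already resampled to 16 kHz, so one 30 s window is 480000 samples.
const WHISPER_SAMPLE_RATE = 16000;
const WHISPER_CHUNK_SECONDS = 30;
const WHISPER_CHUNK_SAMPLES = WHISPER_SAMPLE_RATE * WHISPER_CHUNK_SECONDS;

// Splits audio into consecutive 30 s windows; the last window is padded
// with zeros, i.e. silence.
function toWhisperChunks(samples: Float32Array): Float32Array[] {
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += WHISPER_CHUNK_SAMPLES) {
    const chunk = new Float32Array(WHISPER_CHUNK_SAMPLES); // zero-filled = silence
    chunk.set(samples.subarray(start, start + WHISPER_CHUNK_SAMPLES));
    chunks.push(chunk);
  }
  return chunks;
}

const seventySeconds = new Float32Array(70 * WHISPER_SAMPLE_RATE).fill(0.5);
console.log(toWhisperChunks(seventySeconds).length); // 3
```

Under these assumptions, 70 seconds of audio needs three encoder passes, or roughly 3 × 1391 ms on an iPhone 17 Pro based on the table above.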

Decoding

Average time to decode one token in a sequence of approximately 100 tokens, with the encoder context obtained from audio of the noted length.

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| Whisper-tiny (30s) | 53 | 53 | 74 | 100 | 84 |
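
Combining the encoding and decoding tables gives a rough estimate for transcribing one 30-second window: one encoder pass plus one decoder step per generated token. The sketch below assumes that per-token decode time stays close to the ~100-token average and ignores pre/post-processing; `estimateChunkMs` is a hypothetical helper.

```typescript
// Rough per-window transcription estimate: one encoder pass plus one
// decoder step per generated token.
// Assumptions: per-token decode time stays near the ~100-token average,
// and pre/post-processing is ignored.
function estimateChunkMs(encodeMs: number, decodeMsPerToken: number, tokens: number): number {
  return encodeMs + decodeMsPerToken * tokens;
}

// Whisper-tiny, 100 tokens: iPhone 17 Pro and Samsung Galaxy S24 figures.
console.log(estimateChunkMs(1391, 53, 100)); // 6691
console.log(estimateChunkMs(1303, 100, 100)); // 11303
```

With these numbers, decoding rather than encoding dominates the total for a typical 100-token transcript.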

Text Embeddings

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| ALL_MINILM_L6_V2 | 16 | 16 | 19 | 54 | 28 |
| ALL_MPNET_BASE_V2 | 115 | 116 | 144 | 145 | 95 |
| MULTI_QA_MINILM_L6_COS_V1 | 16 | 16 | 20 | 47 | 28 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 112 | 119 | 144 | 146 | 96 |
| CLIP_VIT_BASE_PATCH32_TEXT | 47 | 45 | 57 | 65 | 48 |


Info: Benchmark times for text embeddings depend heavily on sentence length. The numbers above were measured on a sentence of around 80 tokens; shorter or longer sentences will give correspondingly different inference times.

Image Embeddings

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| CLIP_VIT_BASE_PATCH32_IMAGE | 70 | 70 | 90 | 66 | 58 |


Info: Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time.
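
To see why very large inputs add cost, consider an area-averaging (box filter) resize: when downscaling, every source pixel is read at least once, so the work grows with the input size even though the output is always 224×224. The sketch below assumes an interleaved 8-bit RGB buffer. It is an illustration of that cost, not the resizing method the library actually uses, and `resizeRgbBox` is a hypothetical name.

```typescript
// Illustrative area-averaging (box filter) resize of an interleaved 8-bit RGB
// image to a square target (224x224 by default). Not necessarily the library's
// resize method: it only shows that downscaling reads every source pixel.
function resizeRgbBox(src: Uint8Array, srcW: number, srcH: number, dst = 224): Uint8Array {
  if (src.length !== srcW * srcH * 3) throw new Error("buffer size mismatch");
  const out = new Uint8Array(dst * dst * 3);
  for (let y = 0; y < dst; y++) {
    // Source rows covered by this output row (at least one row when upscaling).
    const y0 = Math.floor((y * srcH) / dst);
    const y1 = Math.max(y0 + 1, Math.floor(((y + 1) * srcH) / dst));
    for (let x = 0; x < dst; x++) {
      const x0 = Math.floor((x * srcW) / dst);
      const x1 = Math.max(x0 + 1, Math.floor(((x + 1) * srcW) / dst));
      const sum = [0, 0, 0];
      for (let sy = y0; sy < y1; sy++) {
        for (let sx = x0; sx < x1; sx++) {
          const i = (sy * srcW + sx) * 3;
          sum[0] += src[i];
          sum[1] += src[i + 1];
          sum[2] += src[i + 2];
        }
      }
      const count = (y1 - y0) * (x1 - x0);
      const o = (y * dst + x) * 3;
      for (let c = 0; c < 3; c++) out[o + c] = Math.round(sum[c] / count);
    }
  }
  return out;
}
```

A 4000×3000 photo makes this loop read about 12 million source pixels, while a 300×300 image needs only 90 thousand reads, even though both produce the same 224×224 output.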

Text to Image

| Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --- | --- | --- | --- | --- | --- |
| BK_SDM_TINY_VPRED_256 | 21184 | 21021 | ❌ | 18834 | 16617 |