
Model Size

Classification

| Model | XNNPACK FP32 [MB] | XNNPACK INT8 [MB] | Core ML FP32 [MB] | Core ML FP16 [MB] |
| --- | --- | --- | --- | --- |
| EFFICIENTNET_V2_S | 85.7 | 22.9 | 86.5 | 43.9 |
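As a rough rule of thumb, a model file's size is close to its number of weights multiplied by the bytes stored per weight (4 for FP32, 2 for FP16, about 1 for INT8). The INT8 column should therefore land near one quarter of the FP32 column, and the FP16 column near one half. A worked check for EFFICIENTNET_V2_S, assuming 1 MB = 10^6 bytes and ignoring file metadata:

$$
\text{XNNPACK: } N \approx \frac{85.7\ \text{MB}}{4\ \text{B/weight}} \approx 21.4 \times 10^{6}\ \text{weights} \;\Rightarrow\; \text{INT8} \approx 21.4\ \text{MB}\ (\text{table: } 22.9\ \text{MB})
$$

$$
\text{Core ML: } \text{FP16} \approx \frac{86.5\ \text{MB}}{2} \approx 43.3\ \text{MB}\ (\text{table: } 43.9\ \text{MB})
$$

The small surplus in each case is presumably format overhead and any layers kept at higher precision.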

Object Detection

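| Model | XNNPACK FP32 [MB] | Core ML FP32 [MB] | Core ML FP16 [MB] |
| --- | --- | --- | --- |
| SSDLITE_320_MOBILENET_V3_LARGE | 13.9 | 15.6 | 8.46 |
| RF_DETR_NANO | 112 | - | - |
| YOLO26N | 10.3 | - | - |
| YOLO26S | 38.6 | - | - |
| YOLO26M | 82.3 | - | - |
| YOLO26L | 100 | - | - |
| YOLO26X | 224 | - | - |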

Instance Segmentation

| Model | XNNPACK [MB] |
| --- | --- |
| YOLO26N_SEG | 11.6 |
| YOLO26S_SEG | 42.3 |
| YOLO26M_SEG | 95.4 |
| YOLO26L_SEG | 113 |
| YOLO26X_SEG | 252 |
| RF_DETR_NANO_SEG | 124 |

Style Transfer

| Model | XNNPACK FP32 [MB] | XNNPACK INT8 [MB] | Core ML FP32 [MB] | Core ML FP16 [MB] |
| --- | --- | --- | --- | --- |
| STYLE_TRANSFER_CANDY | 6.82 | 1.84 | 7.12 | 3.79 |
| STYLE_TRANSFER_MOSAIC | 6.82 | 1.84 | 7.12 | 3.79 |
| STYLE_TRANSFER_UDNIE | 6.82 | 1.84 | 7.12 | 3.79 |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.82 | 1.84 | 7.12 | 3.79 |

OCR

| Model | XNNPACK [MB] |
| --- | --- |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2* |

* - The model weights vary depending on the language.
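An OCR pipeline needs both a detector and a recognizer, so its combined file size is roughly the sum of the two rows, and it inherits the recognizer's language-dependent range. A minimal sketch with the sizes copied from the table above; the `SizeRange` type and the `ocrDownloadMb` function are hypothetical helpers, not part of any library API.

```typescript
// Hypothetical helper types and values; sizes copied from the OCR table (XNNPACK, MB).
type SizeRange = { min: number; max: number };

const DETECTOR_MB = 20.9; // Detector (CRAFT_QUANTIZED)
const RECOGNIZER_MB: SizeRange = { min: 18.5, max: 25.2 }; // Recognizer (CRNN), varies by language

// Combined size of one detector plus one recognizer, as a min/max range.
function ocrDownloadMb(detector: number, recognizer: SizeRange): SizeRange {
  return {
    min: detector + recognizer.min,
    max: detector + recognizer.max,
  };
}

const ocrTotal = ocrDownloadMb(DETECTOR_MB, RECOGNIZER_MB);
console.log(`OCR models: ${ocrTotal.min.toFixed(1)}-${ocrTotal.max.toFixed(1)} MB`); // prints "OCR models: 39.4-46.1 MB"
```

The same arithmetic applies to Vertical OCR, which lists identical detector and recognizer sizes.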

Vertical OCR

| Model | XNNPACK [MB] |
| --- | --- |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2* |

* - The model weights vary depending on the language.

LLMs

| Model | XNNPACK [GB] |
| --- | --- |
| LLAMA3_2_1B | 2.47 |
| LLAMA3_2_1B_SPINQUANT | 1.14 |
| LLAMA3_2_1B_QLORA | 1.18 |
| LLAMA3_2_3B | 6.43 |
| LLAMA3_2_3B_SPINQUANT | 2.55 |
| LLAMA3_2_3B_QLORA | 2.65 |
| QWEN3_0.6B | 1.11 |
| QWEN3_0.6B_QUANTIZED | 0.47 |
| QWEN3_1.7B | 3.21 |
| QWEN3_1.7B_QUANTIZED | 1.21 |
| QWEN3_4B | 7.49 |
| QWEN3_4B_QUANTIZED | 2.50 |
| QWEN2_5_0.5B | 0.92 |
| QWEN2_5_0.5B_QUANTIZED | 0.39 |
| QWEN2_5_1.5B | 2.88 |
| QWEN2_5_1.5B_QUANTIZED | 1.06 |
| QWEN2_5_3B | 5.75 |
| QWEN2_5_3B_QUANTIZED | 1.95 |
| HAMMER2_1_0.5B | 0.92 |
| HAMMER2_1_0.5B_QUANTIZED | 0.39 |
| HAMMER2_1_1.5B | 2.88 |
| HAMMER2_1_1.5B_QUANTIZED | 1.06 |
| HAMMER2_1_3B | 5.75 |
| HAMMER2_1_3B_QUANTIZED | 1.91 |
| PHI4_MINI | 7.15 |
| PHI4_MINI_QUANTIZED | 2.62 |
| SMOLLM2_135M | 0.25 |
| SMOLLM2_135M_QUANTIZED | 0.52 |
| SMOLLM2_360M | 0.67 |
| SMOLLM2_360M_QUANTIZED | 1.27 |
| SMOLLM2_1.7B | 3.19 |
| SMOLLM2_1.7B_QUANTIZED | 0.95 |
| LFM2_5_1.2B_INSTRUCT | 2.43 |
| LFM2_5_1.2B_INSTRUCT_QUANTIZED | 0.74 |
| LFM2_5_350M_FP16 | 0.79 |
| LFM2_5_350M_QUANTIZED | 0.26 |
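One way to read the LLM table is to compare each model with its quantized variant. The sketch below computes how many times smaller the quantized file is for a few pairs copied from the table; `compressionRatio` and the `pairs` list are hypothetical, written only to illustrate the arithmetic.

```typescript
// Sizes in GB, copied from the LLM table (XNNPACK): [model, full size, quantized size].
const pairs: Array<[name: string, fullGb: number, quantizedGb: number]> = [
  ["QWEN3_4B", 7.49, 2.5],
  ["PHI4_MINI", 7.15, 2.62],
  ["LFM2_5_1.2B_INSTRUCT", 2.43, 0.74],
];

// How many times smaller the quantized file is than the unquantized one.
function compressionRatio(fullGb: number, quantizedGb: number): number {
  return fullGb / quantizedGb;
}

for (const [name, full, quantized] of pairs) {
  console.log(`${name}: ${compressionRatio(full, quantized).toFixed(2)}x smaller when quantized`);
}
```

For these three pairs the quantized file is roughly 2.7x to 3.3x smaller.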

Speech to text

| Model | XNNPACK [MB] |
| --- | --- |
| WHISPER_TINY_EN | 151 |
| WHISPER_TINY | 151 |
| WHISPER_BASE_EN | 290.6 |
| WHISPER_BASE | 290.6 |
| WHISPER_SMALL_EN | 968 |
| WHISPER_SMALL | 968 |

Text to speech

| Model | XNNPACK [MB] |
| --- | --- |
| KOKORO_SMALL | 329.6 |
| KOKORO_MEDIUM | 334.4 |

Text Embeddings

| Model | Size [MB] |
| --- | --- |
| ALL_MINILM_L6_V2 | 91 |
| ALL_MPNET_BASE_V2 | 438 |
| MULTI_QA_MINILM_L6_COS_V1 | 91 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 438 |
| CLIP_VIT_BASE_PATCH32_TEXT | 254 |
| DISTILUSE_BASE_MULTILINGUAL_CASED_V2_8DA4W | 393 |
| DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML | 541 |

Image Embeddings

| Model | XNNPACK FP32 [MB] | XNNPACK INT8 [MB] |
| --- | --- | --- |
| CLIP_VIT_BASE_PATCH32_IMAGE | 352 | 96.4 |

Semantic Segmentation

| Model | XNNPACK FP32 [MB] | XNNPACK INT8 [MB] |
| --- | --- | --- |
| DEEPLAB_V3_RESNET50 | 168 | 42.4 |
| DEEPLAB_V3_RESNET101 | 244 | 61.7 |
| DEEPLAB_V3_MOBILENET_V3_LARGE | 44.1 | 11.4 |
| LRASPP_MOBILENET_V3_LARGE | 12.9 | 3.53 |
| FCN_RESNET50 | 141 | 35.7 |
| FCN_RESNET101 | 217 | 55 |

Text to image

| Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] |
| --- | --- | --- | --- |
| BK_SDM_TINY_VPRED | 492 | 1290 | 198 |
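The text-to-image model is split into three components, so its total size is the sum of the three columns. A minimal sketch with the values copied from the table above; the `PipelineSizes` interface, the `bkSdmTinyVpredSizes` constant, and `totalMb` are hypothetical names chosen for illustration, and the GB figure assumes 1 GB = 1000 MB.

```typescript
// Hypothetical shape for a three-part text-to-image pipeline; sizes in MB (XNNPACK).
interface PipelineSizes {
  textEncoderMb: number;
  unetMb: number;
  vaeDecoderMb: number;
}

// Values copied from the BK_SDM_TINY_VPRED row of the table.
const bkSdmTinyVpredSizes: PipelineSizes = {
  textEncoderMb: 492,
  unetMb: 1290,
  vaeDecoderMb: 198,
};

// Total size of all components that make up the pipeline.
function totalMb(sizes: PipelineSizes): number {
  return sizes.textEncoderMb + sizes.unetMb + sizes.vaeDecoderMb;
}

const pipelineTotal = totalMb(bkSdmTinyVpredSizes);
console.log(`${pipelineTotal} MB (~${(pipelineTotal / 1000).toFixed(2)} GB)`); // prints "1980 MB (~1.98 GB)"
```

The UNet accounts for about two thirds of the total, so it dominates the pipeline's footprint.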

Voice Activity Detection (VAD)

| Model | XNNPACK [MB] |
| --- | --- |
| FSMN_VAD | 1.83 |