useImageEmbeddings
Image embedding is the process of converting an image into a numerical representation. This representation can be used for tasks such as classification, clustering, and (using contrastive learning, as in the CLIP model) image search.
We recommend using the models provided by us, which are available in our Hugging Face repository. You can also use the constants shipped with our library.
API Reference
- For the detailed API reference for useImageEmbeddings, see: useImageEmbeddings API Reference.
- For all image embeddings models available out-of-the-box in React Native ExecuTorch, see: Image Embeddings Models.
High Level Overview
import {
  useImageEmbeddings,
  CLIP_VIT_BASE_PATCH32_IMAGE,
} from 'react-native-executorch';

const model = useImageEmbeddings({ model: CLIP_VIT_BASE_PATCH32_IMAGE });

try {
  // forward accepts an image URI/URL and resolves to its embedding
  const imageEmbedding = await model.forward('https://url-to-image.jpg');
} catch (error) {
  console.error(error);
}
Arguments
useImageEmbeddings takes ImageEmbeddingsProps that consists of:
- model containing modelSource.
- An optional flag preventLoad which prevents auto-loading of the model.
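For example, here is a minimal sketch of passing both props (when and how you trigger loading afterwards is up to your app; this snippet only illustrates the shape of the props object):

import {
  useImageEmbeddings,
  CLIP_VIT_BASE_PATCH32_IMAGE,
} from 'react-native-executorch';

// preventLoad: true defers loading; the model will not be loaded automatically.
const model = useImageEmbeddings({
  model: CLIP_VIT_BASE_PATCH32_IMAGE,
  preventLoad: true,
});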
Need more details? Check the following resources:
- For detailed information about useImageEmbeddings arguments, check this section: useImageEmbeddings arguments.
- For all image embeddings models available out-of-the-box in React Native ExecuTorch, see: Image Embeddings Models.
- For more information on loading resources, take a look at the loading models page.
Returns
useImageEmbeddings returns an ImageEmbeddingsType object containing a set of functions for interacting with image embeddings models. For more details, see: ImageEmbeddingsType API Reference.
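As a rough illustration of using the returned object inside a component (the button, state handling, and component structure below are illustrative assumptions, not part of the library API), forward can be called from an async handler:

import { useState } from 'react';
import { Button, Text, View } from 'react-native';
import {
  useImageEmbeddings,
  CLIP_VIT_BASE_PATCH32_IMAGE,
} from 'react-native-executorch';

function EmbeddingDemo() {
  const model = useImageEmbeddings({ model: CLIP_VIT_BASE_PATCH32_IMAGE });
  const [dimensions, setDimensions] = useState<number | null>(null);

  const embedImage = async () => {
    try {
      // forward resolves to the embedding vector for the given image
      const embedding = await model.forward('https://url-to-image.jpg');
      setDimensions(embedding.length);
    } catch (error) {
      console.error(error);
    }
  };

  return (
    <View>
      <Button title="Embed image" onPress={embedImage} />
      {dimensions !== null && <Text>Embedding dimensions: {dimensions}</Text>}
    </View>
  );
}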
Running the model
To run the model, use the forward method. It accepts one argument: the URI/URL of the image you want to encode. It returns a promise that resolves to an array of numbers (a Float32Array) representing the embedding, or rejects with an error.
Example
const dotProduct = (a: Float32Array, b: Float32Array) =>
  a.reduce((sum, val, i) => sum + val * b[i], 0);

const cosineSimilarity = (a: Float32Array, b: Float32Array) => {
  const dot = dotProduct(a, b);
  const normA = Math.sqrt(dotProduct(a, a));
  const normB = Math.sqrt(dotProduct(b, b));
  return dot / (normA * normB);
};

try {
  // we assume you've provided catImage and dogImage
  const catImageEmbedding = await model.forward(catImage);
  const dogImageEmbedding = await model.forward(dogImage);
  const similarity = cosineSimilarity(catImageEmbedding, dogImageEmbedding);
  console.log(`Cosine similarity: ${similarity}`);
} catch (error) {
  console.error(error);
}
Supported models
| Model | Language | Image size | Embedding dimensions | Description |
|---|---|---|---|---|
| clip-vit-base-patch32-image | English | 224×224 | 512 | CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. CLIP embeds images and text into the same vector space, which makes it possible to find similar images and to implement image search. This is the image encoder part of the CLIP model. To embed text, check out clip-vit-base-patch32-text. |
Image size - the size of the image that the model takes as input. Resizing happens automatically.
Embedding dimensions - the size of the output embedding vector, i.e. the number of dimensions in the vector representation of the input image.
For the supported models, the returned embedding vector is normalized, meaning that its length is equal to 1. This makes comparing vectors with cosine similarity easier: just calculate the dot product of two vectors to get the cosine similarity score.
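In practice, reusing the dotProduct helper from the example above, the full cosine similarity computation can be skipped for these models:

// The supported models return normalized embeddings (length 1),
// so the dot product alone already equals the cosine similarity.
const similarity = dotProduct(catImageEmbedding, dogImageEmbedding);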