
useTokenizer

Tokenization is the process of breaking text down into smaller units called tokens. It is a crucial step in natural language processing: each token is mapped to an integer ID, which converts text into a numeric format that machine learning models can process.
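To make the idea concrete, here is a toy word-level tokenizer in TypeScript. The vocabulary and the splitting rule are hypothetical and far simpler than what real tokenizers do; the sketch only shows how text becomes a list of integer IDs and back.

```typescript
// Hypothetical toy vocabulary; real tokenizers ship tens of thousands of entries.
const vocab: Record<string, number> = {
  '[UNK]': 0,
  hello: 1,
  ',': 2,
  world: 3,
  '!': 4,
};

// Reverse lookup table: ID -> token string.
const idToTokenTable: string[] = [];
for (const [token, id] of Object.entries(vocab)) {
  idToTokenTable[id] = token;
}

// Lowercase the text and split it into words and single punctuation marks.
function splitText(text: string): string[] {
  return text.toLowerCase().match(/\w+|[^\w\s]/g) ?? [];
}

// Map each piece to its ID, falling back to the unknown-token ID.
function encode(text: string): number[] {
  return splitText(text).map((piece) => vocab[piece] ?? vocab['[UNK]']);
}

// Join tokens with spaces (a real decoder also restores the original spacing).
function decode(ids: number[]): string {
  return ids.map((id) => idToTokenTable[id] ?? '[UNK]').join(' ');
}

console.log(encode('Hello, world!')); // [ 1, 2, 3, 4 ]
console.log(decode(encode('Hello, world!'))); // hello , world !
```

Note that decoding is lossy here: the original spacing and capitalization are not recovered, which is one reason production tokenizers track extra information alongside the IDs.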

Info: We use Hugging Face Tokenizers under the hood, ensuring compatibility with the Hugging Face ecosystem.
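Real tokenizers usually split words into subword pieces. BERT-style models such as all-MiniLM-L6-v2 use WordPiece, which repeatedly takes the longest vocabulary entry matching the start of the remaining characters. The sketch below illustrates that greedy longest-match-first idea on a single lowercased word with a hypothetical five-entry vocabulary; it is not the Hugging Face implementation, which also normalizes text and pre-splits on whitespace and punctuation before this step.

```typescript
// Hypothetical miniature WordPiece vocabulary; "##" marks a piece that continues a word.
const vocab = new Set(['[UNK]', 'token', '##ization', '##izer', 'hello']);

// Greedy longest-match-first split of a single lowercased word.
function wordPiece(word: string, maxCharsPerWord = 100): string[] {
  if (word.length > maxCharsPerWord) return ['[UNK]'];
  const pieces: string[] = [];
  let start = 0;
  while (start < word.length) {
    let end = word.length;
    let match: string | null = null;
    // Shrink the candidate from the right until it is found in the vocabulary.
    while (start < end) {
      const candidate = (start > 0 ? '##' : '') + word.slice(start, end);
      if (vocab.has(candidate)) {
        match = candidate;
        break;
      }
      end -= 1;
    }
    // If no piece matches at this position, the whole word becomes the unknown token.
    if (match === null) return ['[UNK]'];
    pieces.push(match);
    start = end;
  }
  return pieces;
}

console.log(wordPiece('tokenization')); // [ 'token', '##ization' ]
console.log(wordPiece('tokenizer')); // [ 'token', '##izer' ]
console.log(wordPiece('xyz')); // [ '[UNK]' ]
```

Because pieces like `token` are shared, a fixed vocabulary can cover many words it never stored whole, which keeps the vocabulary size manageable.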

API Reference

High-Level Overview

import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

// Call the hook inside a React function component
const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

const text = 'Hello, world!';

// encode and decode return promises, so call them from an async function
const tokenizeText = async () => {
  try {
    // Tokenize the text into token IDs
    const tokens = await tokenizer.encode(text);
    console.log('Tokens:', tokens);

    // Decode the token IDs back to text
    const decodedText = await tokenizer.decode(tokens);
    console.log('Decoded text:', decodedText);
  } catch (error) {
    console.error('Error tokenizing text:', error);
  }
};

Arguments

useTokenizer takes TokenizerProps, which includes the tokenizer field: the tokenizer resource to load (for example, ALL_MINILM_L6_V2).


Returns

useTokenizer returns an object of type TokenizerType containing a set of functions for interacting with the tokenizer. For more details, see the TokenizerType API Reference.
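The examples on this page call five asynchronous methods on the returned object. The interface below is an assumed sketch of that shape, inferred from those calls rather than copied from the library; the TokenizerType API Reference remains authoritative. The hypothetical InMemoryTokenizer class implements the same shape with a whitespace-split vocabulary, which can serve as a stand-in when unit-testing code that consumes a tokenizer.

```typescript
// Assumed subset of the hook's return value, inferred from the examples on this page.
// Check the TokenizerType API Reference for the authoritative signatures.
interface TokenizerMethods {
  encode(text: string): Promise<number[]>;
  decode(tokens: number[]): Promise<string>;
  getVocabSize(): Promise<number>;
  tokenToId(token: string): Promise<number>;
  idToToken(id: number): Promise<string>;
}

// Hypothetical in-memory stand-in: one token per whitespace-separated word.
class InMemoryTokenizer implements TokenizerMethods {
  private readonly tokens: string[];
  private readonly ids = new Map<string, number>();

  // The index of each entry in `tokens` is its token ID.
  constructor(tokens: string[]) {
    this.tokens = tokens;
    tokens.forEach((token, id) => this.ids.set(token, id));
  }

  async encode(text: string): Promise<number[]> {
    return text
      .split(/\s+/)
      .filter(Boolean)
      .map((word) => this.lookupId(word));
  }

  async decode(tokens: number[]): Promise<string> {
    return tokens.map((id) => this.tokens[id] ?? '[UNK]').join(' ');
  }

  async getVocabSize(): Promise<number> {
    return this.tokens.length;
  }

  async tokenToId(token: string): Promise<number> {
    return this.lookupId(token);
  }

  async idToToken(id: number): Promise<string> {
    return this.tokens[id] ?? '[UNK]';
  }

  // Unknown words map to ID 0, which this sketch reserves for "[UNK]".
  private lookupId(token: string): number {
    return this.ids.get(token) ?? 0;
  }
}

// Usage: round-trip a sentence through the mock.
const mock = new InMemoryTokenizer(['[UNK]', 'hello', 'world']);
mock.encode('hello world').then(async (ids) => {
  console.log(ids); // [ 1, 2 ]
  console.log(await mock.decode(ids)); // hello world
});
```

Keeping every method async, even when the work is synchronous, lets the mock be swapped in wherever the real tokenizer's promise-based methods are awaited.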

Example

import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

function App() {
  const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

  // ...

  // Tokenizer methods are asynchronous, so run them from an async handler
  const runTokenizer = async () => {
    try {
      const text = 'Hello, world!';

      const vocabSize = await tokenizer.getVocabSize();
      console.log('Vocabulary size:', vocabSize);

      const tokens = await tokenizer.encode(text);
      console.log('Token IDs:', tokens);

      const decoded = await tokenizer.decode(tokens);
      console.log('Decoded text:', decoded);

      const tokenId = await tokenizer.tokenToId('hello');
      console.log('Token ID for "hello":', tokenId);

      const token = await tokenizer.idToToken(tokenId);
      console.log('Token for ID:', token);
    } catch (error) {
      console.error(error);
    }
  };

  // ...
}
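A common use of encode is checking whether a text fits a model's input length before running inference. The helper below is a hypothetical sketch: checkTokenBudget and the Encoder interface are not part of react-native-executorch, the assumption is only that the hook's result exposes an async encode like the one used above, and maxTokens must come from the model you use. Whether the returned IDs include special tokens depends on the tokenizer configuration.

```typescript
// Minimal shape this helper needs: any object with an async encode() method.
interface Encoder {
  encode(text: string): Promise<number[]>;
}

// Hypothetical helper: counts tokens and reports whether the text fits a token limit.
async function checkTokenBudget(
  tokenizer: Encoder,
  text: string,
  maxTokens: number,
): Promise<{ count: number; fits: boolean }> {
  const ids = await tokenizer.encode(text);
  return { count: ids.length, fits: ids.length <= maxTokens };
}

// Usage with a stub encoder that treats each space-separated word as one token.
const stubEncoder: Encoder = {
  encode: async (text) => text.split(' ').map((_, index) => index),
};

checkTokenBudget(stubEncoder, 'Hello world again', 2).then((result) => {
  console.log(result); // { count: 3, fits: false }
});
```

In a component, the same call would be awaited from an async handler, passing the object returned by useTokenizer as the first argument.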