# useTokenizer
Tokenization is the process of breaking down text into smaller units called tokens. It’s a crucial step in natural language processing that converts text into a format that machine learning models can understand.
We are using Hugging Face Tokenizers under the hood, ensuring compatibility with the Hugging Face ecosystem.
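To build intuition for what `encode` and `decode` do, here is a toy vocabulary-based tokenizer in plain TypeScript. It is purely illustrative: real tokenizers such as Hugging Face Tokenizers use trained subword algorithms (e.g. BPE or WordPiece), not whitespace splitting, and the vocabulary below is made up for this sketch.

```typescript
// Toy tokenizer: maps text pieces to integer IDs via a fixed vocabulary.
// Illustrates the encode/decode round trip; not the library's actual algorithm.
const vocab = ['[UNK]', 'hello', ',', 'world', '!'];
const tokenToId = new Map(vocab.map((tok, id) => [tok, id]));

function encode(text: string): number[] {
  // Split into words and punctuation, then look each piece up in the
  // vocabulary; unknown pieces fall back to the [UNK] token (id 0).
  const pieces = text.toLowerCase().match(/\w+|[^\s\w]/g) ?? [];
  return pieces.map((p) => tokenToId.get(p) ?? 0);
}

function decode(ids: number[]): string {
  return ids.map((id) => vocab[id] ?? '[UNK]').join(' ');
}

console.log(encode('Hello, world!')); // [1, 2, 3, 4]
console.log(decode([1, 2, 3, 4])); // "hello , world !"
```

The model never sees raw text, only these integer IDs, which is why every pipeline in this library starts with a tokenization step.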
## API Reference

- For a detailed API reference for `useTokenizer`, see: `useTokenizer` API Reference.
## High Level Overview

```typescript
import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

const text = 'Hello, world!';

try {
  // Tokenize the text
  const tokens = await tokenizer.encode(text);
  console.log('Tokens:', tokens);

  // Decode the tokens back to text
  const decodedText = await tokenizer.decode(tokens);
  console.log('Decoded text:', decodedText);
} catch (error) {
  console.error('Error tokenizing text:', error);
}
```
## Arguments

`useTokenizer` takes `TokenizerProps`, which consist of:

- `tokenizer` of type `KokoroConfig` containing `tokenizerSource`.
- An optional flag `preventLoad` which prevents auto-loading of the model.
Need more details? Check the following resources:

- For detailed information about `useTokenizer` arguments, check this section: `useTokenizer` arguments.
- For more information on loading resources, take a look at the loading models page.
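As a sketch of how the arguments above fit together, the hook call with deferred loading might look like this (illustrative only; when and how you subsequently trigger loading depends on your app's flow):

```typescript
// Pass preventLoad to stop the tokenizer model from auto-loading
// when the component mounts; load it later, when it is actually needed.
const tokenizer = useTokenizer({
  tokenizer: ALL_MINILM_L6_V2,
  preventLoad: true,
});
```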
## Returns

`useTokenizer` returns an object of type `TokenizerType` containing a set of functions for interacting with tokenizers. For more details, see the `TokenizerType` API Reference.
## Example

```typescript
import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

function App() {
  const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

  // ...
  const runTokenizer = async () => {
    try {
      const text = 'Hello, world!';

      const vocabSize = await tokenizer.getVocabSize();
      console.log('Vocabulary size:', vocabSize);

      const tokens = await tokenizer.encode(text);
      console.log('Token IDs:', tokens);

      const decoded = await tokenizer.decode(tokens);
      console.log('Decoded text:', decoded);

      const tokenId = await tokenizer.tokenToId('hello');
      console.log('Token ID for "hello":', tokenId);

      const token = await tokenizer.idToToken(tokenId);
      console.log('Token for ID:', token);
    } catch (error) {
      console.error(error);
    }
  };
  // ...
}
```