# useTokenizer
Tokenization is the process of breaking down text into smaller units called tokens. It’s a crucial step in natural language processing that converts text into a format that machine learning models can understand.
We are using Hugging Face Tokenizers under the hood, ensuring compatibility with the Hugging Face ecosystem.
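To build intuition for what `encode` and `decode` do, here is a toy vocabulary-based tokenizer in plain TypeScript. It is purely illustrative: real tokenizers such as Hugging Face Tokenizers use trained subword algorithms (e.g. BPE or WordPiece), not whitespace splitting, and the vocabulary below is made up for this sketch.

```typescript
// Toy tokenizer: maps text pieces to integer IDs via a fixed vocabulary.
// Illustrates the encode/decode round trip; not the library's actual algorithm.
const vocab = ['[UNK]', 'hello', ',', 'world', '!'];
const tokenToId = new Map(vocab.map((tok, id) => [tok, id]));

function encode(text: string): number[] {
  // Split into words and punctuation, then look each piece up in the
  // vocabulary; unknown pieces fall back to the [UNK] token (id 0).
  const pieces = text.toLowerCase().match(/\w+|[^\s\w]/g) ?? [];
  return pieces.map((p) => tokenToId.get(p) ?? 0);
}

function decode(ids: number[]): string {
  return ids.map((id) => vocab[id] ?? '[UNK]').join(' ');
}

console.log(encode('Hello, world!')); // [1, 2, 3, 4]
console.log(decode([1, 2, 3, 4])); // "hello , world !"
```

The model never sees raw text, only these integer IDs, which is why every pipeline in this library starts with a tokenization step.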
## API Reference

- For a detailed API reference for `useTokenizer`, see: `useTokenizer` API Reference.
## High Level Overview

```typescript
import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

const text = 'Hello, world!';

try {
  // Tokenize the text
  const tokens = await tokenizer.encode(text);
  console.log('Tokens:', tokens);

  // Decode the tokens back to text
  const decodedText = await tokenizer.decode(tokens);
  console.log('Decoded text:', decodedText);
} catch (error) {
  console.error('Error tokenizing text:', error);
}
```
## Arguments

`useTokenizer` takes `TokenizerProps`, which consist of:

- `tokenizer` of type `KokoroConfig` containing `tokenizerSource`.
- An optional flag `preventLoad` which prevents auto-loading of the model.
Need more details? Check the following resources:

- For detailed information about `useTokenizer` arguments, check this section: `useTokenizer` arguments.
- For more information on loading resources, take a look at the loading models page.
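As a sketch of how the arguments above fit together, the hook call with deferred loading might look like this (illustrative only; when and how you subsequently trigger loading depends on your app's flow):

```typescript
// Pass preventLoad to stop the tokenizer model from auto-loading
// when the component mounts; load it later, when it is actually needed.
const tokenizer = useTokenizer({
  tokenizer: ALL_MINILM_L6_V2,
  preventLoad: true,
});
```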
## Returns

`useTokenizer` returns an object of type `TokenizerType` containing a set of functions for interacting with tokenizers. For more details, see the `TokenizerType` API Reference.
## Example

```typescript
import { useTokenizer, ALL_MINILM_L6_V2 } from 'react-native-executorch';

function App() {
  const tokenizer = useTokenizer({ tokenizer: ALL_MINILM_L6_V2 });

  // ...
  const runTokenizer = async () => {
    try {
      const text = 'Hello, world!';

      const vocabSize = await tokenizer.getVocabSize();
      console.log('Vocabulary size:', vocabSize);

      const tokens = await tokenizer.encode(text);
      console.log('Token IDs:', tokens);

      const decoded = await tokenizer.decode(tokens);
      console.log('Decoded text:', decoded);

      const tokenId = await tokenizer.tokenToId('hello');
      console.log('Token ID for "hello":', tokenId);

      const token = await tokenizer.idToToken(tokenId);
      console.log('Token for ID:', token);
    } catch (error) {
      console.error(error);
    }
  };
  // ...
}
```