Exporting Llama
To make exporting as simple as possible, we provide a script that runs a Docker container and exports the model for you.
Steps to export Llama
1. Create an account
Create a Hugging Face account. This will allow you to download the needed files. Alternatively, you can download them from the official Llama website.
2. Select a model
Pick the model that suits your needs. Before you can download it, you'll need to accept its license. For the best performance, we recommend the SpinQuant or QLoRA versions of the model:
- Llama 3.2 3B
- Llama 3.2 1B
- Llama 3.2 3B SpinQuant
- Llama 3.2 1B SpinQuant
- Llama 3.2 3B QLoRA
- Llama 3.2 1B QLoRA
3. Download files
Download the `consolidated.00.pth`, `params.json`, and `tokenizer.model` files. If you can't see them, make sure to check the `original` directory.
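Before moving on, it can help to confirm all three files landed where you expect. The following is a small sketch, not part of the official tooling; the `check_files` helper name is our own, and the demo uses placeholder files in a scratch directory (in real use, point it at your download directory):

```shell
#!/bin/sh
# Sanity-check sketch: verify the three files named in this guide
# are present in a given directory before continuing.
check_files() {
  for f in consolidated.00.pth params.json tokenizer.model; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "all files present in $1"
}

# Demo with empty placeholder files (assumption: real files would
# be in the directory you downloaded them to).
demo_dir=$(mktemp -d)
touch "$demo_dir/consolidated.00.pth" "$demo_dir/params.json" "$demo_dir/tokenizer.model"
check_files "$demo_dir"
```

If any file is missing, the helper names it on stderr and returns a non-zero status, so you can catch the problem before starting a lengthy export.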
4. Rename the tokenizer file
Rename the `tokenizer.model` file to `tokenizer.bin`, as required by the library:

```shell
mv tokenizer.model tokenizer.bin
```
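A bare `mv` will silently overwrite an existing `tokenizer.bin`. Here is a hedged sketch of a safer rename; the `rename_tokenizer` helper is our own invention, and the demo below runs against a placeholder file in a scratch directory rather than a real tokenizer:

```shell
#!/bin/sh
# Sketch: rename tokenizer.model to tokenizer.bin without clobbering
# an existing file. Helper name and demo paths are illustrative.
rename_tokenizer() {
  if [ ! -f "$1/tokenizer.model" ]; then
    echo "tokenizer.model not found in $1" >&2
    return 1
  fi
  if [ -e "$1/tokenizer.bin" ]; then
    echo "tokenizer.bin already exists in $1; refusing to overwrite" >&2
    return 1
  fi
  mv "$1/tokenizer.model" "$1/tokenizer.bin"
  echo "renamed tokenizer.model -> tokenizer.bin"
}

# Demo with a placeholder file (real use: point at your download directory).
demo_dir=$(mktemp -d)
touch "$demo_dir/tokenizer.model"
rename_tokenizer "$demo_dir"
```

The two guard checks make the step idempotent to rerun: a second invocation fails loudly instead of corrupting the already-renamed file.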
5. Run the export script
Navigate to the `llama_export` directory and run the following command:

```shell
./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
```
The script pulls a Docker image from Docker Hub and then runs it to export the model. By default, the output (a `llama3_2.pte` file) is saved in the `llama-export/outputs` directory. You can override this behavior with the `--output-path [path]` flag.
This Docker image was tested on macOS with an ARM chip and might not work in other environments.
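Because the export itself can take a while, it is worth validating the input paths before launching Docker. The sketch below is not part of the repository; `build_export_cmd` is a hypothetical helper that checks the inputs and prints (rather than runs) the command line it would invoke, using the flags documented above:

```shell
#!/bin/sh
# Dry-run sketch: assemble the export command line after checking that
# the model and params files actually exist. Helper name is illustrative.
build_export_cmd() {
  model="$1"; params="$2"; out="$3"
  if [ ! -f "$model" ]; then
    echo "model file not found: $model" >&2
    return 1
  fi
  if [ ! -f "$params" ]; then
    echo "params file not found: $params" >&2
    return 1
  fi
  echo "./build_llama_binary.sh --model-path $model --params-path $params --output-path $out"
}

# Demo with placeholder files (real use: pass your downloaded paths).
demo_dir=$(mktemp -d)
touch "$demo_dir/consolidated.00.pth" "$demo_dir/params.json"
build_export_cmd "$demo_dir/consolidated.00.pth" "$demo_dir/params.json" ./outputs
```

To actually run the export, replace the final `echo` with an `exec` of the same command, or pipe the printed line to `sh` once you are satisfied the paths are correct.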