LLM Inference 02: Exporting a HuggingFace LLM to ONNX
Official guide: https://huggingface.co/docs/transformers/v4.35.1/zh/serialization
1. Download the TinyLlama model

```shell
mkdir llmkit
cd llmkit
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
2. Write an export_onnx.py script to export the ONNX model
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_checkpoint = "./TinyLlama-1.1B-Chat-v1.0"
save_directory = "tinyllama_onnx"

# export=True converts the PyTorch checkpoint to ONNX on load
ort_model = ORTModelForCausalLM.from_pretrained(model_checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Save the ONNX model and tokenizer files to the output directory
ort_model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
```
Result:
```shell
llmkit$ tree -h tinyllama_onnx
[4.0K]  tinyllama_onnx
├── [ 697]  config.json
├── [2.0M]  model.onnx
├── [4.1G]  model.onnx_data
├── [ 551]  special_tokens_map.json
├── [1.3K]  tokenizer_config.json
├── [1.8M]  tokenizer.json
└── [488K]  tokenizer.model

0 directories, 7 files
```
The exported ONNX model is about 4 GB, which seems odd. I'll confirm the reason later; it's too late tonight, time to sleep.
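For what it's worth, a back-of-the-envelope check suggests the size is likely expected rather than a bug: if the export is in fp32 (which the 4.1G file is consistent with), 1.1B parameters at 4 bytes each come to roughly 4.1 GiB. ONNX stores tensors that exceed the protobuf 2 GB limit in an external file, which would explain the separate `model.onnx_data`:

```python
# Rough size estimate for TinyLlama-1.1B exported in fp32 (assumption:
# the export kept fp32 weights; the observed 4.1G file matches this)
params = 1.1e9          # approximate parameter count
bytes_per_param = 4     # fp32 = 4 bytes per parameter
size_gib = params * bytes_per_param / 2**30
print(f"{size_gib:.1f} GiB")  # ≈ 4.1 GiB
```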