LLM Inference 02 - Exporting a HuggingFace LLM to ONNX

Official guide: https://huggingface.co/docs/transformers/v4.35.1/zh/serialization

1. Download the TinyLlama model

mkdir llmkit && cd llmkit
git lfs install   # the weights are stored in Git LFS; without this the clone only fetches pointer files
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

2. Write an export_onnx.py script to export the ONNX model

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_checkpoint = "./TinyLlama-1.1B-Chat-v1.0"
save_directory = "tinyllama_onnx"

# Load the model from transformers and export it to ONNX
ort_model = ORTModelForCausalLM.from_pretrained(model_checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Save the ONNX model along with the tokenizer
ort_model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
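
If you'd rather not write a script, optimum also ships a CLI that performs the same export. A rough equivalent (check optimum-cli export onnx --help for the exact flags in your version):

optimum-cli export onnx --model ./TinyLlama-1.1B-Chat-v1.0 tinyllama_onnx/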

Result:

llmkit$ tree -h tinyllama_onnx
[4.0K] tinyllama_onnx
├── [ 697] config.json
├── [2.0M] model.onnx
├── [4.1G] model.onnx_data
├── [ 551] special_tokens_map.json
├── [1.3K] tokenizer_config.json
├── [1.8M] tokenizer.json
└── [488K] tokenizer.model

0 directories, 7 files
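
Note that model.onnx is only 2 MB because it holds just the graph; the actual weights live in model.onnx_data, since a single ONNX protobuf file is capped at 2 GB and larger models spill over into an external data file.

To sanity-check the export, load it back and generate a few tokens through onnxruntime. A minimal sketch (the prompt string is arbitrary; assumes the same optimum/transformers environment used above):

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

save_directory = "tinyllama_onnx"

# Loading without export=True picks up the existing model.onnx in the directory
model = ORTModelForCausalLM.from_pretrained(save_directory)
tokenizer = AutoTokenizer.from_pretrained(save_directory)

# Run a short greedy generation through the onnxruntime backend
inputs = tokenizer("What is ONNX?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))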

The exported ONNX model comes out at about 4 GB, which looks a bit odd; most likely it's just the fp32 default (1.1B parameters × 4 bytes ≈ 4.4 GB), but I'll double-check the reason later. It's late, off to bed.