LLM Inference 02: Exporting a HuggingFace LLM to ONNX
Official guide: https://huggingface.co/docs/transformers/v4.35.1/zh/serialization
1. Download the TinyLlama model

```shell
mkdir llmkit
cd llmkit
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
2. Write an export_onnx.py script to export the ONNX model
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_checkpoint = "./TinyLlama-1.1B-Chat-v1.0"
save_directory = "tinyllama_onnx"

# export=True converts the PyTorch checkpoint to ONNX on load
ort_model = ORTModelForCausalLM.from_pretrained(model_checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Save the ONNX model and tokenizer files to the output directory
ort_model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
```
Result:
```shell
llmkit$ tree -h tinyllama_onnx
[4.0K]  tinyllama_onnx
├── [ 697]  config.json
├── [2.0M]  model.onnx
├── [4.1G]  model.onnx_data
├── [ 551]  special_tokens_map.json
├── [1.3K]  tokenizer_config.json
├── [1.8M]  tokenizer.json
└── [488K]  tokenizer.model

0 directories, 7 files
```
The exported ONNX model is about 4 GB, which seems odd. I'll confirm the reason later; it's too late tonight, time to sleep.
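For what it's worth, a back-of-the-envelope check suggests the size is likely expected rather than a bug: if the export is in fp32 (which the 4.1G file is consistent with), 1.1B parameters at 4 bytes each come to roughly 4.1 GiB. ONNX stores tensors that exceed the protobuf 2 GB limit in an external file, which would explain the separate `model.onnx_data`:

```python
# Rough size estimate for TinyLlama-1.1B exported in fp32 (assumption:
# the export kept fp32 weights; the observed 4.1G file matches this)
params = 1.1e9          # approximate parameter count
bytes_per_param = 4     # fp32 = 4 bytes per parameter
size_gib = params * bytes_per_param / 2**30
print(f"{size_gib:.1f} GiB")  # ≈ 4.1 GiB
```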