LLM Inference 02: Exporting a HuggingFace LLM to ONNX

Official guide: https://huggingface.co/docs/transformers/v4.35.1/zh/serialization 1. Download the TinyLlama model code: mkdir llmkit; git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 2. Write the export_onnx.py script to export the ONNX model ...

Posted in LLM

LLM Inference 01: Lookahead Decoding Performance Test

Terminology: LADE: abbreviation of lookahead decoding. Introduction: overview of the LADE method: https://lmsys.org/blog/2023-11-21-lookahead-decoding/ LADE GitHub repository: https://github.com/hao-ai-lab/LookaheadDecoding Test procedure: download and install: git clone https://github.c...

Posted in LLM