Download the source code
# Clone only the docs_typo_mlc_chat branch
git clone -b docs_typo_mlc_chat --single-branch https://github.com/mlc-ai/mlc-llm.git
# Enter the mlc-llm project
cd mlc-llm
# Fetch the submodule sources
git submodule update --init --recursive
# Enter the MLCChat directory
cd ./android/MLCChat
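If a submodule such as 3rdparty/tvm was not checked out, the failure usually appears much later as a confusing build error. A minimal sketch, assuming bash or another POSIX-style shell and that it is run from the mlc-llm root: the helper names uninitialized_submodules and check_submodules are hypothetical, and the parsing relies on git submodule status marking uninitialized submodules with a leading "-".

```shell
# Hypothetical helper: read `git submodule status` output on stdin and print
# the path of every submodule whose line starts with '-' (not initialized).
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Hypothetical wrapper: return non-zero if any submodule is missing.
# Run from the mlc-llm root directory.
check_submodules() {
  local missing
  missing="$(git submodule status --recursive | uninitialized_submodules)"
  if [ -n "$missing" ]; then
    echo "Uninitialized submodules:" >&2
    echo "$missing" >&2
    echo "Run: git submodule update --init --recursive" >&2
    return 1
  fi
  echo "all submodules are checked out"
}
```

Usage: cd mlc-llm && check_submodules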
Edit environment variables
vim ~/.bashrc  # view and edit the environment variables (reload them with: source ~/.bashrc)
export ANDROID_NDK=/home/lenovo/Android/Sdk/ndk/26.1.10909125
export ANDROID_HOME=/home/lenovo/Android/Sdk
export PATH=$PATH:/home/lenovo/Android/Sdk/cmake/3.10.2.4988404/bin
export PATH=$PATH:/home/lenovo/Android/Sdk/platform-tools
export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android23-clang
export JAVA_HOME=/home/lenovo/.jdks/corretto-18.0.2
export PATH=$PATH:$JAVA_HOME/bin
export MLC_LLM_SOURCE_DIR=/sda/xj/t/mlc-llm
export TVM_SOURCE_DIR=/sda/xj/t/mlc-llm/3rdparty/tvm
Note: the JDK version must match the one Android Studio uses.
For the mlc-llm-llama3 checkout, use these two values instead:
export MLC_LLM_SOURCE_DIR=/sda/xj/mlc-llm-llama3/mlc-llm
export TVM_SOURCE_DIR=/sda/xj/mlc-llm-llama3/mlc-llm/3rdparty/tvm
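After reloading ~/.bashrc, a quick check catches a mistyped path before a long build fails on it. A minimal sketch: check_mlc_env is a hypothetical helper that only verifies that each variable above is set and points to an existing directory (and that TVM_NDK_CC is an executable file); it does not check versions, so the JDK note above still has to be confirmed by hand.

```shell
# Hypothetical helper: sanity-check the variables exported in ~/.bashrc.
# Prints every problem to stderr and returns 1 if any were found.
check_mlc_env() {
  local status=0 var dir
  for var in ANDROID_NDK ANDROID_HOME JAVA_HOME MLC_LLM_SOURCE_DIR TVM_SOURCE_DIR; do
    # Read the value of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      status=1
    fi
  done
  # The NDK clang wrapper must be an executable regular file.
  if [ ! -f "${TVM_NDK_CC:-}" ] || [ ! -x "${TVM_NDK_CC:-}" ]; then
    echo "TVM_NDK_CC=${TVM_NDK_CC:-<unset>} is not an executable file" >&2
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "environment looks OK"
  fi
  return "$status"
}
```

Usage: source ~/.bashrc && check_mlc_env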
Convert model weights
Download the MiniCPM-2B-dpo-bf16-llama-format model
Download openbmb/MiniCPM-2B-dpo-bf16-llama-format from Hugging Face and place it in the dist/models directory.
Weight conversion with convert_weight
# Enter the Android MLCChat root directory of mlc-llm
cd D:\mlc-llm\android\MLCChat
# Convert the MiniCPM-2B-dpo-bf16-llama-format model
mlc_llm convert_weight ./dist/models/MiniCPM-2B-dpo-bf16-llama-format/ --quantization q4f16_1 \
-o dist/bundle/MiniCPM-2B-dpo-bf16-llama-format-q4f16_1
Llama 3 8B model conversion
mlc_llm convert_weight ./dist/models/Llama-3-8B-Instruct-llama-format/ --quantization q3f16_1 -o dist/bundle/Llama-3-8B-Instruct-llama-format-q3f16_1
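convert_weight here and gen_config in the next step must receive the same model directory and quantization, and the output directories used in these notes drift easily. A minimal sketch of a hypothetical wrapper, mlc_convert, that derives the bundle directory from the model name and quantization and runs both steps with only the flags used in this document. DRY_RUN=1 is an assumption of this sketch (not an mlc_llm option): it makes the wrapper print the commands instead of running them.

```shell
# Hypothetical wrapper: convert weights and generate the chat config for one model.
# Usage (from android/MLCChat): mlc_convert MODEL_NAME QUANTIZATION [CONV_TEMPLATE]
# Expects the source model in ./dist/models/MODEL_NAME/.
mlc_convert() {
  local model="$1" quant="$2" template="${3:-redpajama_chat}"
  local src="./dist/models/$model/"
  local out="dist/bundle/$model-$quant"
  # With DRY_RUN=1 every command is echoed instead of executed.
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run mlc_llm convert_weight "$src" --quantization "$quant" -o "$out" || return
  $run mlc_llm gen_config "$src" --quantization "$quant" \
    --conv-template "$template" -o "$out/" || return
}
```

Usage: DRY_RUN=1 mlc_convert MiniCPM-2B-dpo-bf16-llama-format q4f16_1 prints the two commands from this section; drop DRY_RUN=1 to run them.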
Generate the MLC chat config
mlc_llm gen_config ./dist/models/MiniCPM-2B-dpo-bf16-llama-format/ --quantization q4f16_1 \
--conv-template redpajama_chat -o dist/bundle/MiniCPM-2B-dpo-bf16-llama-format-q4f16_1/
On success, four new files appear in dist/bundle/MiniCPM-2B-dpo-bf16-llama-format-q4f16_1: mlc-chat-config.json, tokenizer.json, tokenizer.model and tokenizer_config.json.
mlc_llm gen_config ./dist/models/llama_3.1_0.5_4-30/ --quantization q4f16_1 --conv-template redpajama_chat -o dist/bundle/llama_3.1_0.5_4-30-q4f16_1/
mlc_llm gen_config ./dist/models/llama3_pruned/ --quantization q0f16 \
--conv-template redpajama_chat -o dist/bundle/llama3_pruned/
mlc_llm convert_weight ./dist/models/llama3_pruned/ --quantization q0f16 \
-o dist/bundle/llama3_pruned-format-q4f16
mlc_llm gen_config ./dist/models/llama3_pruned/ --quantization q0f16 \
--conv-template redpajama_chat -o dist/bundle/MiniCPM-2B-dpo-bf16-llama-format-q4f16_1/
Build the Android dependency libraries & jar
Copy the converted MiniCPM-2B-dpo-bf16-llama-format-q4f16_1 model into the mlc_llm\model_weights\hf\mlc-ai directory (model_weights has to be created manually). If the model is not found there, it is downloaded from https://huggingface.co/mlc-ai, which is not recommended. The models to package (and where they are fetched from) are configured in MLCChat/mlc-package-config.json.
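For reference, a minimal mlc-package-config.json for a single model can be written as below. This is a sketch under assumptions: the field names (device, model_list, model, model_id, estimated_vram_bytes) follow the MLC LLM Android packaging docs as recalled, the HF://mlc-ai/<name> value assumes the local copy under model_weights\hf\mlc-ai is picked up as described above, the VRAM estimate is a placeholder, and write_package_config is a hypothetical helper name.

```shell
# Hypothetical helper: write a one-model mlc-package-config.json.
# Usage (from android/MLCChat):
#   write_package_config mlc-package-config.json MiniCPM-2B-dpo-bf16-llama-format-q4f16_1
# The estimated_vram_bytes value is a placeholder, not a measured number.
write_package_config() {
  local out="$1" name="$2"
  cat > "$out" <<EOF
{
    "device": "android",
    "model_list": [
        {
            "model": "HF://mlc-ai/$name",
            "model_id": "$name",
            "estimated_vram_bytes": 3000000000
        }
    ]
}
EOF
}
```

Back up the existing file first, since this overwrites it.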
mlc_llm package
This generates the following files under dist/lib/mlc4j: libtvm4j_runtime_packed.so and tvm4j_core.jar.
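Before opening Android Studio, it is worth confirming that both artifacts exist. A minimal sketch; check_mlc4j is a hypothetical helper, and it searches recursively because the exact subdirectory layout under dist/lib/mlc4j is an assumption that may differ between MLC LLM versions.

```shell
# Hypothetical helper: confirm `mlc_llm package` produced both artifacts.
# Usage (from android/MLCChat): check_mlc4j [dist/lib/mlc4j]
check_mlc4j() {
  local root="${1:-dist/lib/mlc4j}" name found status=0
  for name in libtvm4j_runtime_packed.so tvm4j_core.jar; do
    # Take the first regular file with this name anywhere under $root.
    found="$(find "$root" -name "$name" -type f 2>/dev/null | head -n 1)"
    if [ -n "$found" ]; then
      echo "found: $found"
    else
      echo "missing: $name under $root" >&2
      status=1
    fi
  done
  return "$status"
}
```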
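Build the APK
Open Android Studio (AS) and click Build → Generate Signed Bundle / APK.
If the Gradle distribution is accidentally wiped while AS is starting, downloading it again is very slow. The Tencent mirror in mainland China can be used instead: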
https://mirrors.cloud.tencent.com/gradle/gradle-8.5-bin.zip
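The Gradle Wrapper downloads whatever distributionUrl names in gradle/wrapper/gradle-wrapper.properties, so pointing that line at the mirror is enough. A minimal sketch, assuming GNU sed (Linux) for sed -i; use_gradle_mirror is a hypothetical helper that keeps a .bak copy and writes the "\:" escape that .properties files use.

```shell
# Hypothetical helper: point the Gradle Wrapper at the Tencent mirror.
# Usage (from android/MLCChat): use_gradle_mirror gradle/wrapper/gradle-wrapper.properties
use_gradle_mirror() {
  local props="$1"
  [ -f "$props" ] || { echo "not found: $props" >&2; return 1; }
  # Keep a backup next to the original file.
  cp "$props" "$props.bak" || return
  # Replace the whole distributionUrl line; '\\:' in the sed replacement
  # writes the literal '\:' escape used in .properties files.
  sed -i 's#^distributionUrl=.*#distributionUrl=https\\://mirrors.cloud.tencent.com/gradle/gradle-8.5-bin.zip#' "$props"
}
```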
Copy the model to the phone
cd mlc-llm\android\MLCChat
python bundle_weight.py --apk-path app/release/app-release.apk
Here, release means the signed release build, which has to be configured and built in Android Studio in step 6 (Build the APK).
python bundle_weight.py --apk-path app/debug/app-debug.apk
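Since either the release or the debug APK may exist, a small helper can pick one before calling bundle_weight.py. A minimal sketch; pick_apk and bundle_to_phone are hypothetical names, the two APK paths are the ones used in these notes (adjust them if your APK lands elsewhere), and a phone must already be connected over adb.

```shell
# Hypothetical helper: print the first APK that exists, release before debug.
# Usage: pick_apk [PROJECT_DIR]   (defaults to the current directory)
pick_apk() {
  local root="${1:-.}" apk
  for apk in app/release/app-release.apk app/debug/app-debug.apk; do
    if [ -f "$root/$apk" ]; then
      echo "$apk"
      return 0
    fi
  done
  echo "no APK found; build one in Android Studio first" >&2
  return 1
}

# Hypothetical wrapper: run from android/MLCChat with the phone connected.
bundle_to_phone() {
  local apk
  apk="$(pick_apk .)" || return
  python bundle_weight.py --apk-path "$apk"
}
```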
Other
python -m pip install -U mlc-llm-nightly-cu121.whl mlc-ai-nightly-cu121.whl
mlc_llm convert_weight ./dist/models/llama3_pruned/ --quantization q0f16 -o dist/bundle/llama3-pruned-format-q0f16
mlc_llm gen_config ./dist/models/llama3_pruned/ --quantization q0f16 --conv-template redpajama_chat -o dist/bundle/llama3-pruned-format-q0f16/
mlc_llm convert_weight ./dist/models/llama3_pruned/ --quantization q4f16_1 -o dist/bundle/llama3-pruned-format-q4f16_1
mlc_llm gen_config ./dist/models/llama3_pruned/ --quantization q4f16_1 --conv-template redpajama_chat -o dist/bundle/llama3-pruned-format-q4f16_1/
/home/xj/sda/xj/mlc-llm-llama3/mlc-llm/android/MLCChat
Available paths
conda activate mlc-chat-cpm3
project path: /sda/xj/mlc-llm-llama3/mlc-llm/android/MLCChat
setting env:
Note: run these in the /sda/xj/mlc-llm-llama3/mlc-llm directory
export ANDROID_NDK=/home/lenovo/Android/Sdk/ndk/26.1.10909125
export ANDROID_HOME=/home/lenovo/Android/Sdk
export PATH=$PATH:/home/lenovo/Android/Sdk/cmake/3.10.2.4988404/bin
export PATH=$PATH:/home/lenovo/Android/Sdk/platform-tools
export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android23-clang
export JAVA_HOME=/home/lenovo/.jdks/corretto-18.0.2
export PATH=$PATH:$JAVA_HOME/bin
export MLC_LLM_SOURCE_DIR=/sda/xj/mlc-llm-llama3/mlc-llm
export TVM_SOURCE_DIR=/sda/xj/mlc-llm-llama3/mlc-llm/3rdparty/tvm
Note: run mlc_llm package in /sda/xj/mlc-llm-llama3/mlc-llm/android/MLCChat
Note: the chat-config generation commands are also run in /sda/xj/mlc-llm-llama3/mlc-llm/android/MLCChat
192.168.1.129
export ANDROID_NDK=/home/xj/Android/Sdk/ndk/26.1.10909125
export ANDROID_HOME=/home/xj/Android/Sdk
export PATH=$PATH:/home/xj/Android/Sdk/cmake/3.10.2.4988404/bin
export PATH=$PATH:/home/xj/Android/Sdk/platform-tools
export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android23-clang
export JAVA_HOME=/home/xj/.jdks/corretto-18.0.2
export PATH=$PATH:$JAVA_HOME/bin
export MLC_LLM_SOURCE_DIR=/home/xj/mlc-llm
export TVM_SOURCE_DIR=/home/xj/mlc-llm/3rdparty/tvm
source $HOME/.cargo/env
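The two environment blocks above differ only in the home directory and the mlc-llm checkout location. A minimal sketch of a hypothetical helper, set_mlc_android_env, that derives every variable from those two inputs, with the NDK, CMake and JDK versions copied from the blocks above. Each call appends to PATH again, so call it once per shell.

```shell
# Hypothetical helper: export the Android/MLC variables for any machine.
# Usage: set_mlc_android_env /home/xj /home/xj/mlc-llm
set_mlc_android_env() {
  local home="$1" mlc="$2"
  export ANDROID_HOME="$home/Android/Sdk"
  export ANDROID_NDK="$ANDROID_HOME/ndk/26.1.10909125"
  export TVM_NDK_CC="$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android23-clang"
  export JAVA_HOME="$home/.jdks/corretto-18.0.2"
  export MLC_LLM_SOURCE_DIR="$mlc"
  export TVM_SOURCE_DIR="$mlc/3rdparty/tvm"
  # Same PATH additions as above: CMake, platform-tools (adb), and the JDK.
  export PATH="$PATH:$ANDROID_HOME/cmake/3.10.2.4988404/bin:$ANDROID_HOME/platform-tools:$JAVA_HOME/bin"
}
```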
# Convert the model weights
mlc_llm convert_weight ./dist/models/Qwen1.5-1.8B-Chat/ --quantization q4f16_1 -o dist/models/qwen1.5-1.8b-q4f16_1
# Generate the chat config
mlc_llm gen_config ./dist/models/Qwen1.5-1.8B-Chat/ --quantization q4f16_1 --conv-template redpajama_chat -o dist/models/qwen1.5-1.8b-q4f16_1/
mlc_llm gen_config ./dist/models/Qwen1.5-1.8B-Chat \
--model-type qwen2 \
--quantization q4f16_1 \
--conv-template chatml \
--context-window-size 2048 \
--max-batch-size 1 \
-o dist/models/qwen1.5-1.8b-q4f16_1
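Both gen_config commands above write to the same output directory, so the second run presumably overwrites the mlc-chat-config.json from the first. Printing a few top-level keys shows which settings actually ended up in the file. A minimal sketch, assuming python3 is on PATH (mlc_llm already requires Python); show_chat_config is a hypothetical name, and model_type, quantization, context_window_size and max_batch_size are assumed to be top-level keys of mlc-chat-config.json (a missing key prints as None).

```shell
# Hypothetical helper: print selected top-level values from mlc-chat-config.json.
# Usage: show_chat_config dist/models/qwen1.5-1.8b-q4f16_1
show_chat_config() {
  local cfg="$1/mlc-chat-config.json"
  [ -f "$cfg" ] || { echo "not found: $cfg" >&2; return 1; }
  python3 - "$cfg" <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)
for key in ("model_type", "quantization", "context_window_size", "max_batch_size"):
    print(f"{key}={cfg.get(key)}")
PY
}
```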
Automated build and packaging
Run the build from the Android project directory:
cd /sda/xj/mlc-llm-llama3/mlc-llm/android/MLCChat
Build the project with the Gradle Wrapper:
./gradlew build
Package the Debug APK:
./gradlew assembleDebug
Package the Release APK:
./gradlew assembleRelease
Clean the project:
./gradlew clean
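The Gradle commands above can be combined into one step that cleans and then assembles a chosen variant. A minimal sketch; gradle_task_for and build_apk are hypothetical helper names, and only the standard assembleDebug/assembleRelease tasks shown above are used.

```shell
# Hypothetical helper: map a build variant to its Gradle task.
gradle_task_for() {
  case "$1" in
    debug)   echo assembleDebug ;;
    release) echo assembleRelease ;;
    *)       echo "unknown variant: $1 (use debug or release)" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: clean, then assemble the requested variant.
# Usage (from android/MLCChat): build_apk debug
build_apk() {
  local task
  task="$(gradle_task_for "${1:-debug}")" || return
  ./gradlew clean "$task"
}
```

Gradle writes these APKs under app/build/outputs/apk/<variant>/, not app/release/ or app/debug/. Without a signingConfig in the Gradle build, assembleRelease typically produces an unsigned APK, which is why the signed release build is made in Android Studio in step 6.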