Introduction to large models & LLMs
Large parameter counts: from hundreds of millions to billions of parameters.
InternLM is a lightweight training framework; the pretrained models InternLM-7B and InternLM-20B have been released.
InternLM-7B
In this section we will use an A100 (1/4) machine on InternStudio and the InternLM-Chat-7B model to deploy a chat demo.
Environment setup
Select the NVIDIA CUDA 11.7 clean image (Ubuntu-based, with Conda preinstalled).
# Create the conda virtual environment
/root/share/install_conda_env_internlm_base.sh internlm-demo
# Activate the conda environment
conda activate internlm-demo
# Upgrade pip
python -m pip install --upgrade pip
# Install dependencies (pinned versions)
pip install modelscope==1.9.5
pip install transformers==4.35.2
pip install streamlit==1.24.0
pip install sentencepiece==0.1.99
pip install accelerate==0.24.1
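The pinned installs above can also be captured in a single requirements file, which keeps the environment reproducible (same versions as listed; the filename requirements.txt is just a convention):

```
modelscope==1.9.5
transformers==4.35.2
streamlit==1.24.0
sentencepiece==0.1.99
accelerate==0.24.1
```

Then install everything at once with `pip install -r requirements.txt`.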
Download the model
The model is about 14 GB; downloading takes roughly 10-20 minutes.
import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

# Download internlm-chat-7b (revision v1.0.3) from ModelScope into /root/model
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='/root/model', revision='v1.0.3')
Model file listing (PyTorch format):
(base) root@intern-studio:~# ls /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/
README.md pytorch_model-00002-of-00008.bin pytorch_model.bin.index.json
config.json pytorch_model-00003-of-00008.bin special_tokens_map.json
configuration.json pytorch_model-00004-of-00008.bin tokenization_internlm.py
configuration_internlm.py pytorch_model-00005-of-00008.bin tokenizer.model
generation_config.json pytorch_model-00006-of-00008.bin tokenizer_config.json
modeling_internlm.py pytorch_model-00007-of-00008.bin
pytorch_model-00001-of-00008.bin pytorch_model-00008-of-00008.bin
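The pytorch_model.bin.index.json file listed above maps every tensor name to the shard that contains it, so reading it is a quick way to confirm which of the 8 shard files must be present. A minimal sketch (the two tensor names and the total_size value below are illustrative stand-ins, not the real index contents):

```python
import json

# A two-entry excerpt in the same shape as pytorch_model.bin.index.json;
# the real file maps every tensor in the model to one of the 8 shard files.
index_text = '''
{
  "metadata": {"total_size": 14000000000},
  "weight_map": {
    "model.embed_tokens.weight": "pytorch_model-00001-of-00008.bin",
    "lm_head.weight": "pytorch_model-00008-of-00008.bin"
  }
}
'''
index = json.loads(index_text)

# Collect the distinct shard files referenced by the weight map
shards = sorted(set(index["weight_map"].values()))
print(shards)
```

Running the same two lines of collection logic over the real index file yields all 8 shard names, which can be checked against the directory listing above.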
Inspect the model configuration:
# cat /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/config.json
Contents of config.json:
{
"architectures": [
"InternLMForCausalLM"
],
"auto_map": {
"AutoConfig": "configuration_internlm.InternLMConfig",
"AutoModel": "modeling_internlm.InternLMForCausalLM",
"AutoModelForCausalLM": "modeling_internlm.InternLMForCausalLM"
},
"bias": true,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 2048,
"model_type": "internlm",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"pad_token_id": 2,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.33.2",
"use_cache": true,
"vocab_size": 103168
}
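The config above is enough for a back-of-envelope parameter count, which confirms the "7B" in the model name. A rough sketch, assuming a standard LLaMA-style layer (attention plus SwiGLU MLP, consistent with "hidden_act": "silu" and the 11008 intermediate size):

```python
# Values copied from config.json above
hidden = 4096
inter = 11008
layers = 32
vocab = 103168

# Attention: Q, K, V, O projections ("bias": true adds one bias vector each)
attn = 4 * hidden * hidden + 4 * hidden
# SwiGLU MLP: gate, up, and down projections
mlp = 3 * hidden * inter
# Two RMSNorm weight vectors per layer
norms = 2 * hidden

per_layer = attn + mlp + norms
# "tie_word_embeddings": false => separate input embedding and output head
embeddings = 2 * vocab * hidden
final_norm = hidden

total = layers * per_layer + embeddings + final_norm
print(f"{total / 1e9:.2f}B parameters")  # roughly 7.3B
```

The estimate (~7.3B parameters) also matches the ~14 GB download size, since each parameter is stored in 2-byte float16.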
Command-line demo (save the following as cli_demo.py):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "/root/model/Shanghai_AI_Laboratory/internlm-chat-7b"

# trust_remote_code is required because InternLM ships custom model code
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()

system_prompt = """You are an AI assistant whose name is InternLM (書生·浦語(yǔ)).
- InternLM (書生·浦語(yǔ)) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能實(shí)驗(yàn)室). It is designed to be helpful, honest, and harmless.
- InternLM (書生·浦語(yǔ)) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""

messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:
    input_text = input("User  >>> ")
    input_text = input_text.replace(' ', '')  # strip spaces from the input
    if input_text == "exit":
        break
    response, history = model.chat(tokenizer, input_text, history=messages)
    messages.append((input_text, response))
    print(f"robot >>> {response}")
Run python cli_demo.py in a terminal to chat; type exit to quit.
web-demo
Switch to VSCode and run the web_demo.py file in the /root/code/InternLM directory. After entering the commands below, configure the local port as described in section 5.2 of this tutorial to map the port to your local machine, then open http://127.0.0.1:6006 in a local browser.
conda activate internlm-demo # VSCode starts in the base environment by default, so switch environments first
cd /root/code/InternLM
streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
Lagent agent tool-calling demo
A lightweight agent framework.
In this section we will use an A100 (1/4) machine on InternStudio, the InternLM-Chat-7B model, and the Lagent framework to deploy a tool-calling demo.
Lagent is a lightweight, open-source agent framework built on large language models. It lets users quickly turn a large language model into several types of agents, and provides a set of typical tools to empower the model. The Lagent framework helps bring out the full capability of InternLM.

Error message (CUDA out of memory while loading the model):
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 743, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 19.99 GiB total capacity; 19.42 GiB already allocated; 36.00 MiB free; 19.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
After retesting, the model loaded and ran successfully (Loading checkpoint shards: 100%|███████████████| 8/8 [00:27<00:00, 3.38s/it]). After finishing the demo, the workstation metrics were:
- CPU: 4.02%
- Memory: 2.92 / 56 GB (5.21%)
- GPU: NVIDIA A100 (1/4), utilization 0%
- GPU memory: 15166 / 20470 MiB (74.09%)
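The ~15 GiB steady-state footprint, and the earlier OOM on a nearly full 20 GiB card, are consistent with a back-of-envelope estimate: the weights alone in 16-bit precision take about 13.6 GiB, and the KV cache and activations add more on top. A rough sketch (the 7.32B parameter count is the config-based estimate; real overheads vary):

```python
GIB = 1024 ** 3

params = 7_321_948_160   # approximate parameter count of internlm-chat-7b
bytes_per_param = 2      # bfloat16 / float16

weights_gib = params * bytes_per_param / GIB

# KV cache at the 2048-token context limit:
# 2 tensors (K and V) * layers * hidden_size * 2 bytes, per token
layers, hidden = 32, 4096
kv_per_token = 2 * layers * hidden * 2
kv_cache_gib = kv_per_token * 2048 / GIB

print(f"weights ~{weights_gib:.1f} GiB, full-context KV cache ~{kv_cache_gib:.1f} GiB")
```

This leaves only a few GiB of the 1/4 A100's 20 GiB for activations and allocator fragmentation, which explains why loading can fail when other processes hold GPU memory.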
Example 1
Given 2x + 3 = 10, solve for x.
The solver succeeded.
The answer was correct (x = 7/2).
Example 2
Given 2x + 3y = 10 and x + 5y = 20, solve for x and y.
The solver succeeded. Execution result: [{x: -10/7, y: 30/7}]
The verbal answer was wrong: the model stated that elimination on 2x + 3y = 10 and x + 5y = 20 gives x = 2, y = 4, even though the tool's execution result (x = -10/7, y = 30/7) is the correct solution.
Asked to answer again, it was still wrong:
"Using elimination on 2x + 3y = 10 and x + 5y = 20, we get x = 5, y = 4."
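A quick exact-arithmetic check confirms that the tool's execution result, not the model's verbal answers, is the true solution of this system:

```python
from fractions import Fraction

# Solve 2x + 3y = 10, x + 5y = 20 by elimination:
# from the second equation, x = 20 - 5y; substituting into the first:
# 2(20 - 5y) + 3y = 10  =>  40 - 7y = 10  =>  y = 30/7
y = Fraction(30, 7)
x = 20 - 5 * y               # x = -10/7

assert 2 * x + 3 * y == 10   # first equation holds
assert x + 5 * y == 20       # second equation holds

# The model's verbal answers fail the first equation:
assert 2 * 2 + 3 * 4 != 10   # x=2, y=4 gives 16
assert 2 * 5 + 3 * 4 != 10   # x=5, y=4 gives 22
print(x, y)                  # -10/7 30/7
```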

Image-text demo
InternLM-XComposer image-text understanding and creation demo
In this section we will use an A100 (1/4) x 2 machine on InternStudio and the internlm-xcomposer-7b model to deploy an image-text understanding and creation demo.
The InternLM-XComposer-7B model
Environment configuration
Switch pip to a mirror source
Switch conda to a mirror source
Model download
Three methods:
huggingface-cli
OpenXlab
Modelscope
Practice
A100 data-center GPU