Building an In-House Large Language Model System for the Enterprise

Outline

  • Open-source large language models
  • LLM management
  • Private LLM service deployment options

Open-Source Large Language Models

Worried about security and privacy? Open-source LLMs you can deploy privately

  • Commercial LLMs: private deployment not supported
    • ChatGPT
    • Claude
    • Google Gemini
    • Baidu ERNIE Bot (文心一言)
  • Open-source LLMs: private deployment supported
    • Mistral
    • Meta Llama
    • ChatGLM
    • Alibaba Qwen (通義千問)

Common open-source LLMs


Branches of the open-source LLM family


LLM Management

LLM management tools

  • HuggingFace: a comprehensive platform for hosting and managing LLMs
  • Ollama: manage LLMs locally, with very fast model downloads
  • llama.cpp: LLM inference with minimal setup and state-of-the-art performance on a wide range of local and cloud hardware
  • GPT4All: a free-to-use, locally running, privacy-aware chatbot; no GPU or internet connection required

Ollama: the fastest LLM management tool


Ollama commands

ollama pull llama2      # download the llama2 model
ollama list             # list locally available models
ollama run llama2 "Summarize this file: $(cat README.md)"

ollama serve            # start the local REST API server (defaults to port 11434)

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
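
Both endpoints stream the reply as a sequence of JSON objects by default. A minimal sketch, using Ollama's documented stream option, to get a single JSON response instead:

curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "stream": false,
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'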

LLM Frontends

Application frontends for LLMs

  • Open-source platforms: chatbot-ollama, PrivateGPT, Gradio
  • Open-source services: Hugging Face TGI, langchain-serve
  • Open-source frameworks: LangChain, LlamaIndex

chatbot-ollama

docker run -p 3000:3000 ghcr.io/ivanfioravanti/chatbot-ollama:main
# the web UI is then available at http://localhost:3000
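
Note that inside the container localhost does not point at the host machine, so the chatbot may fail to reach an Ollama server running on the host. A hedged sketch of two common workarounds; the OLLAMA_HOST variable is the setting the chatbot-ollama project documents for pointing at a remote Ollama server, so treat the exact name as an assumption for your version:

# Option 1 (Linux): share the host network so localhost:11434 inside the container is the host's Ollama
docker run --network=host ghcr.io/ivanfioravanti/chatbot-ollama:main

# Option 2 (macOS/Windows): point the chatbot at the host through host.docker.internal
docker run -p 3000:3000 -e OLLAMA_HOST=http://host.docker.internal:11434 ghcr.io/ivanfioravanti/chatbot-ollama:main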

PrivateGPT

PrivateGPT provides an API with all the building blocks needed to create private, context-aware AI applications. The API follows and extends the OpenAI API standard, and it supports both normal and streaming responses. This means that if you can use the OpenAI API in one of your tools, you can point that tool at your own PrivateGPT API with no code changes, and for free if you run PrivateGPT in local mode.
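
To illustrate that compatibility, the sketch below sends a standard OpenAI-style chat request to a locally running PrivateGPT instance (assuming the default port 8001 that also appears in the startup log later in this article):

curl -X POST http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
  "messages": [
    { "role": "user", "content": "Give me a one-paragraph overview of PrivateGPT." }
  ],
  "stream": false
}'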


PrivateGPT architecture

  • FastAPI
  • LlamaIndex
  • Local LLMs supported, e.g. ChatGLM, Llama, Mistral
  • Remote LLMs supported, e.g. OpenAI, Claude
  • Embeddings supported, e.g. ollama and huggingface embeddings
  • Vector stores supported, e.g. Qdrant, ChromaDB and Postgres

PrivateGPT environment setup

git clone https://github.com/imartinez/privateGPT
cd privateGPT
# Python versions before 3.11 are not supported
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip poetry

# The official docs only ask for a handful of extras, but those optional dependency
# groups are not managed very tightly and things get missed, so our strategy is to install them all.
poetry install --extras "ui llms-llama-cpp llms-openai llms-openai-like llms-ollama llms-sagemaker llms-azopenai embeddings-ollama embeddings-huggingface embeddings-openai embeddings-sagemaker embeddings-azopenai vector-stores-qdrant vector-stores-chroma vector-stores-postgres storage-nodestore-postgres"

# Or use this script to install every extra declared in pyproject.toml
#poetry install --extras "$(sed -n '/tool.poetry.extras/,/^$/p'  pyproject.toml | awk -F= 'NR>1{print $1}' | xargs)"
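
A few optional sanity checks before going further (hedged; the package names below are just the obvious heavyweight dependencies to spot-check):

poetry env info --path           # confirm which virtualenv poetry is using
poetry run python --version      # must report 3.11 or newer
poetry show | grep -E "llama-index|fastapi|gradio"   # spot-check that the extras actually resolved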

Ollama deployment mode

ollama pull mistral
ollama pull nomic-embed-text
ollama serve

# The extras listed in the official docs are not sufficient on their own (torch still has to be
# installed separately), so prefer the install-everything strategy shown above.
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
PGPT_PROFILES=ollama poetry run python -m private_gpt
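
Before launching, it is worth confirming that the Ollama server is reachable and that both models are available locally; a minimal check:

curl http://localhost:11434/api/tags                 # lists the models Ollama holds locally
ollama list | grep -E "mistral|nomic-embed-text"     # both models should appear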

settings-ollama.yaml

server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama
  max_new_tokens: 512
  context_window: 3900
  temperature: 0.1 #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

embedding:
  mode: ollama

ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  tfs_z: 1.0 ## Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
  top_k: 40 ## Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 0.9 ## Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
  repeat_last_n: 64 ## Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
  repeat_penalty: 1.2 ## Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

Startup


PGPT_PROFILES=ollama poetry run python -m private_gpt

02:36:06.928 [INFO    ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'ollama']
02:36:46.567 [INFO    ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=ollama
02:36:47.405 [INFO    ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=ollama
02:36:47.414 [INFO    ] llama_index.core.indices.loading - Loading all indices.
02:36:47.571 [INFO    ]         private_gpt.ui.ui - Mounting the gradio UI, at path=/
02:36:47.620 [INFO    ]             uvicorn.error - Started server process [72677]
02:36:47.620 [INFO    ]             uvicorn.error - Waiting for application startup.
02:36:47.620 [INFO    ]             uvicorn.error - Application startup complete.
02:36:47.620 [INFO    ]             uvicorn.error - Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
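
Once Uvicorn reports it is listening, a quick smoke test against the health endpoint (path taken from PrivateGPT's API docs; treat it as an assumption if your version differs):

curl http://localhost:8001/health      # expected response: {"status":"ok"}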

PrivateGPT UI

Local deployment mode


# TODO: llama-cpp must be installed first; the installation steps differ per platform, see the official docs

poetry run python scripts/setup
PGPT_PROFILES=local poetry run python -m private_gpt
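
As a concrete example of the llama-cpp step mentioned above: on an Apple Silicon Mac the usual recipe is to rebuild llama-cpp-python with Metal enabled (a hedged sketch; other platforms need different CMAKE_ARGS, see the llama-cpp-python documentation):

# Reinstall llama-cpp-python with Metal (GPU) acceleration on macOS
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python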

settings-local.yaml

server:
  env_name: ${APP_ENV:local}

llm:
  mode: llamacpp
  ## Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: mistralai/Mistral-7B-Instruct-v0.2

llamacpp:
  prompt_style: "mistral"
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf

embedding:
  mode: huggingface

huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

Non-private, OpenAI-powered deployment

poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
PGPT_PROFILES=openai poetry run python -m private_gpt

settings-openai.yaml

server:
  env_name: ${APP_ENV:openai}

llm:
  mode: openai

embedding:
  mode: openai

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo
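
The ${OPENAI_API_KEY:} placeholder above is expanded from the environment, so export the key before launching; a minimal sketch:

export OPENAI_API_KEY=sk-...     # your real API key
PGPT_PROFILES=openai poetry run python -m private_gpt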

OpenAI-style API calls

  • The API is built using FastAPI and follows OpenAI's API scheme.
  • The RAG pipeline is based on LlamaIndex.
curl -X POST http://localhost:8001/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
  "prompt": "string",
  "stream": true
}'
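
The API also extends the OpenAI schema with RAG-specific fields; a hedged sketch using the field names the PrivateGPT project documents (use_context, include_sources), which ask the model to answer from the ingested documents and to return the source chunks:

curl -X POST http://localhost:8001/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
  "prompt": "What do the ingested documents say about deployment?",
  "use_context": true,
  "include_sources": true,
  "stream": false
}'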