Building an In-House Enterprise Large Language Model System

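Outline

Open-source large language models
Large language model management
Deployment options for a private large language model service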

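Open-source large language models

Worried about security and privacy? Open-source models that can be deployed privately

Commercial models (private deployment not supported)
- ChatGPT
- Claude
- Google Gemini
- Baidu ERNIE Bot (Wenxin Yiyan)
Open-source models (private deployment supported)
- Mistral
- Meta Llama
- ChatGLM
- Alibaba Tongyi Qianwen (Qwen)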

Commonly used open-source models

image.png

Branches of the open-source model family

image.png

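Large language model management

Large language model management tools

HuggingFace: a comprehensive platform for managing large language models
Ollama: manages large language models locally, with very fast downloads
llama.cpp: LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware, locally and in the cloud
GPT4All: a free, locally running, privacy-aware chatbot that needs neither a GPU nor an internet connection

Ollama: the fastest large language model management tool

image.png

Ollama commands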

ollama pull llama2
ollama list
ollama run llama2 "Summarize this file: $(cat README.md)"

ollama serve

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
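
The same `/api/generate` call can also be made from Python. The following is a minimal sketch that uses only the standard library and assumes `ollama serve` is listening on the default `http://localhost:11434` shown above. The helper names `build_generate_request` and `generate` are hypothetical, and `"stream": False` is set so that the server returns one JSON object instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: default address of `ollama serve`


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the reply is a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

With the server running and the model pulled, `print(generate("llama2", "Why is the sky blue?"))` would print the answer.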

image.png

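Front ends for large language models

Application front ends for large language models

Open-source platforms: ollama-chatbot, PrivateGPT, Gradio
Open-source serving: Hugging Face TGI, langchain-serve
Open-source frameworks: LangChain, LlamaIndex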

ollama chatbot

docker run -p 3000:3000 ghcr.io/ivanfioravanti/chatbot-ollama:main
# Then open http://localhost:3000 in a browser

image.png

ollama chatbot

PrivateGPT

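PrivateGPT provides an API containing all the building blocks needed to build private, context-aware AI applications. The API follows and extends the OpenAI API standard and supports both normal and streaming responses. This means that if one of your tools can use the OpenAI API, it can use your own PrivateGPT API instead with no code changes, and at no cost when PrivateGPT runs in local mode.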

image.png

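PrivateGPT architecture

FastAPI
LlamaIndex
Supports local LLMs, for example ChatGLM, Llama, and Mistral
Supports remote LLMs, for example OpenAI and Claude
Supports embedding back ends, for example ollama and embeddings-huggingface
Supports vector stores, for example Qdrant, ChromaDB, and Postgres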

PrivateGPT environment setup

git clone https://github.com/imartinez/privateGPT
cd privateGPT
# Python versions earlier than 3.11 are not supported
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip poetry

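# The official site lists only a small set of dependencies, but its dependency management is incomplete and extras are easily missed,
# so the strategy here is to install all of them.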
poetry install --extras "ui llms-llama-cpp llms-openai llms-openai-like llms-ollama llms-sagemaker llms-azopenai embeddings-ollama embeddings-huggingface embeddings-openai embeddings-sagemaker embeddings-azopenai vector-stores-qdrant vector-stores-chroma vector-stores-postgres storage-nodestore-postgres"

# Alternatively, use this one-liner to collect every extra from pyproject.toml
#poetry install --extras "$(sed -n '/tool.poetry.extras/,/^$/p'  pyproject.toml | awk -F= 'NR>1{print $1}' | xargs)"

Ollama deployment

ollama pull mistral
ollama pull nomic-embed-text
ollama serve

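# The official extras below are not enough: torch also has to be installed separately, so prefer the install-everything approach described above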
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
PGPT_PROFILES=ollama poetry run python -m private_gpt

settings-ollama.yaml

server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama
  max_new_tokens: 512
  context_window: 3900
  temperature: 0.1 #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

embedding:
  mode: ollama

ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  tfs_z: 1.0 ## Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
  top_k: 40 ## Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 0.9 ## Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
  repeat_last_n: 64 ## Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
  repeat_penalty: 1.2 ## Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant
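
To make the `top_k` and `top_p` comments in this file concrete, here is a toy illustration of how the two filters narrow the set of candidate tokens before one is sampled. It is a sketch of the general technique on a hand-written distribution, not Ollama's actual implementation; the function name and the example tokens are made up.

```python
def top_k_top_p_filter(
    probs: dict[str, float], top_k: int, top_p: float
) -> dict[str, float]:
    """Keep the top_k most likely tokens, then the shortest prefix of them
    whose cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # enough probability mass collected
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (illustrative values only).
toy = {"blue": 0.5, "red": 0.3, "green": 0.15, "mauve": 0.05}
conservative = top_k_top_p_filter(toy, top_k=40, top_p=0.7)  # keeps blue, red
diverse = top_k_top_p_filter(toy, top_k=40, top_p=0.99)      # keeps all four
```

Lower values of either parameter leave fewer candidates, which is why the comments describe them as more conservative.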

Startup


PGPT_PROFILES=ollama poetry run python -m private_gpt

poetry run python -m private_gpt
02:36:06.928 [INFO    ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'ollama']
02:36:46.567 [INFO    ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=ollama
02:36:47.405 [INFO    ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=ollama
02:36:47.414 [INFO    ] llama_index.core.indices.loading - Loading all indices.
02:36:47.571 [INFO    ]         private_gpt.ui.ui - Mounting the gradio UI, at path=/
02:36:47.620 [INFO    ]             uvicorn.error - Started server process [72677]
02:36:47.620 [INFO    ]             uvicorn.error - Waiting for application startup.
02:36:47.620 [INFO    ]             uvicorn.error - Application startup complete.
02:36:47.620 [INFO    ]             uvicorn.error - Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
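
The first log line reports `profiles=['default', 'ollama']`, which suggests that the profile file is layered on top of a default settings file. The sketch below shows one plausible way such layering works: a recursive dictionary merge in which later profiles override earlier ones. Both the merge strategy and the file names it derives are assumptions made for illustration, not PrivateGPT's confirmed implementation.

```python
def profile_files(profiles_env: str) -> list[str]:
    """Map a PGPT_PROFILES-style value to settings files, default first (assumed naming)."""
    names = [name.strip() for name in profiles_env.split(",") if name.strip()]
    return ["settings.yaml"] + [f"settings-{name}.yaml" for name in names]


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from `override` win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical already-parsed contents of the default and ollama profiles.
default_settings = {"llm": {"mode": "llamacpp", "max_new_tokens": 256}}
ollama_settings = {"llm": {"mode": "ollama"}, "embedding": {"mode": "ollama"}}
effective = deep_merge(default_settings, ollama_settings)
```

Under this model a profile only needs to list the keys it changes; everything else falls through from the default file.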

image.png

PrivateGPT UI

Local deployment mode

# TODO: llama-cpp must be installed first; the steps differ for each platform, so refer to the official documentation

poetry run python scripts/setup
PGPT_PROFILES=local poetry run python -m private_gpt

settings-local.yaml

server:
  env_name: ${APP_ENV:local}

llm:
  mode: llamacpp
  ## Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: mistralai/Mistral-7B-Instruct-v0.2

llamacpp:
  prompt_style: "mistral"
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf

embedding:
  mode: huggingface

huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

Non-private, OpenAI-powered deployment

poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
PGPT_PROFILES=openai poetry run python -m private_gpt

settings-openai.yaml

server:
  env_name: ${APP_ENV:openai}

llm:
  mode: openai

embedding:
  mode: openai

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo

OpenAI-style API calls

The API is built using FastAPI and follows OpenAI's API scheme.
The RAG pipeline is based on LlamaIndex.

curl -X POST http://localhost:8001/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
  "prompt": "string",
  "stream": true
}'
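
With `"stream": true` the reply arrives incrementally rather than as one JSON document. The sketch below parses such a stream under the assumption that it follows the OpenAI-style server-sent-events convention: lines beginning with `data:` that carry JSON chunks, ending with `data: [DONE]`. The chunk layout (`choices[].delta.content`) is likewise an assumption taken from the OpenAI format, and the function name is hypothetical.

```python
import json
from collections.abc import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed format.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
full_text = "".join(iter_stream_text(sample_lines))
```

In a real client the lines would come from the decoded HTTP response body, and each fragment could be printed as soon as it is yielded.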
