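1. What Is Hypothetical Question Indexing?

A hypothetical question is a question posed on the basis of one or more assumed situations or premises. When the documents in a knowledge base are split into chunks, each chunk can serve as that premise: an LLM generates a few candidate questions about the chunk in advance, so these candidate questions are strongly related to the chunk's content.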

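2. Advantages

Stronger semantic alignment
A user question and a hypothetical question live in the same semantic space (both are questions), so they match more easily
Higher recall
Even when the user's wording differs a lot from the source text, a semantically close query can still match a related question
Support for complex intent
The LLM can generate questions that cover different angles (for example: causes, effects, steps)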
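
The alignment argument above can be illustrated with a toy comparison. This is only a sketch: word overlap (Jaccard similarity) is a crude stand-in for embedding similarity, and the chunk, question, and query strings are made up for illustration rather than taken from the knowledge base used later.

```python
import re

def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical chunk, pre-generated question, and user query.
chunk = "To change your credentials, open Settings, choose Security, and select Reset password."
hypothetical_question = "How can I reset my password?"
user_query = "How do I reset my password?"

score_vs_chunk = overlap(user_query, chunk)
score_vs_question = overlap(user_query, hypothetical_question)
print(f"query vs chunk:    {score_vs_chunk:.3f}")
print(f"query vs question: {score_vs_question:.3f}")
```

In this toy setup the query scores far higher against the pre-generated question than against the declarative chunk. A real embedding model will narrow that gap, but the same tendency is what the index relies on.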

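3. Limitations

Depends on the quality of the LLM-generated questions
If the generated questions drift away from real user intent, retrieval quality drops
High demands on domain adaptation
In specialized domains, questions generated by a general-purpose LLM may be inaccurate (fine-tuning or manual review is needed)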
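
Before any manual review, a cheap automatic filter can catch the most obvious bad generations. The sketch below is an assumption rather than part of the original pipeline: `clean_questions`, `min_chars`, and `max_questions` are hypothetical names and thresholds, and the checks (empty, too short, duplicated, not phrased as a question) are only heuristics that cannot judge whether a question matches real user intent.

```python
def clean_questions(questions, min_chars=5, max_questions=3):
    """Drop empty, very short, duplicate, and non-question strings (heuristic)."""
    seen = set()
    cleaned = []
    for q in questions:
        q = q.strip()
        # Normalize for duplicate detection: ignore case and the trailing question mark.
        key = q.rstrip("?？").lower()
        if len(key) < min_chars or key in seen:
            continue
        # Accept both ASCII and full-width question marks.
        if not q.endswith(("?", "？")):
            continue
        seen.add(key)
        cleaned.append(q)
    return cleaned[:max_questions]

raw = [
    "What is X used for?",
    "what is x used for?",   # duplicate up to case
    "",                      # empty
    "Summary of the doc",    # not a question
    "Why?",                  # too short to be useful
]
filtered = clean_questions(raw)
print(filtered)
```

Questions that survive this filter still need domain review; the filter only reduces the volume a reviewer has to read.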

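4. Use Cases

FAQ-style knowledge bases
Question patterns are relatively fixed, which suits pre-generation
Technical documentation / product manuals
Users commonly ask how to use something, why an error occurs, and so on
Education / customer service
Questions are highly repetitive and predictable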

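5. Basic Idea

Have the LLM generate N hypothetical questions for each chunk and embed these questions as vectors
At query time, search this index of question vectors (the question vectors replace the chunk vectors in the index)
After retrieval, send the original text chunk that each matched question belongs to as context to the LLM to obtain the answer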
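
The three steps above can be sketched without any LLM or vector database. Everything here is a simplification under stated assumptions: `embed` is a toy bag-of-words vector instead of a real embedding model, `generated_questions` is hard-coded in place of the LLM call, and the chunk texts are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words (stand-in for a real embedding model)."""
    return Counter(text.lower().replace("?", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Original chunks, keyed by id (hypothetical content).
chunks = {
    "c1": "The service was hit by a large-scale DDoS attack and paused new sign-ups.",
    "c2": "The model is trained with a mixture-of-experts architecture.",
}

# Step 1a: questions an LLM might generate per chunk (hard-coded stand-in).
generated_questions = {
    "c1": ["What attacks did the service suffer?", "Why were new sign-ups paused?"],
    "c2": ["What architecture does the model use?", "How is the model trained?"],
}

# Step 1b: index the question vectors, each pointing back to its parent chunk.
question_index = [
    (embed(q), q, chunk_id)
    for chunk_id, questions in generated_questions.items()
    for q in questions
]

def retrieve(query: str, k: int = 1) -> list:
    """Step 2: search the question vectors. Step 3: return the parent chunks."""
    query_vec = embed(query)
    ranked = sorted(question_index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    parent_ids = []
    for _, _, chunk_id in ranked[:k]:
        if chunk_id not in parent_ids:  # several questions may share one parent
            parent_ids.append(chunk_id)
    return [chunks[chunk_id] for chunk_id in parent_ids]

context = retrieve("Which attacks hit the service?")
print(context)
```

The key design point is that the index stores question vectors while the value handed to the LLM is the parent chunk, so several matched questions can collapse into a single context passage.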

6. Example

"""Hypothetical question indexing"""
import json
import uuid
from typing import List
from langchain_classic.retrievers import MultiVectorRetriever
from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_core.stores import InMemoryByteStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from pydantic import BaseModel, Field
from langchain_core.documents import Document
from langchain_core.globals import set_debug

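# Common is the author's local helper module; get_models() returns a chat model and an embedding model
from Common import get_models
# set_debug(True)
llm, embeddings = get_models()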

loader = TextLoader("./deepseek百度百科.txt",encoding="utf-8")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
chunks = splitter.split_documents(docs)

vectorstore = Chroma(
    collection_name="hypo-questions",
    embedding_function=embeddings,
)
store = InMemoryByteStore()

id_key = 'doc_id'

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key
)

doc_ids = [str(uuid.uuid4()) for _ in chunks]

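class HypotheticalQuestions(BaseModel):
    """Schema that constrains the format of the generated hypothetical questions."""
    questions: List[str] = Field(..., description="List of questions")

def parse_hype_questions(raw_output):
    """Parse the model output into a HypotheticalQuestions object."""
    try:
        output_text = raw_output.content if hasattr(raw_output, 'content') else str(raw_output)
        # Remove an optional Markdown code fence (```json ... ```) around the JSON.
        # removeprefix (Python 3.9+) drops only the literal "json" tag, whereas
        # strip("json") would strip any of the characters j, s, o, n from both ends.
        output_text = output_text.strip().strip("`").strip().removeprefix("json").strip()
        json_data = json.loads(output_text)
        return HypotheticalQuestions(**json_data)
    except json.JSONDecodeError as e:
        print(f"JSON decoding error: {e}")
        return HypotheticalQuestions(questions=[])
    except Exception as e:
        print(f"Structured parsing failed: {e}")
        return HypotheticalQuestions(questions=[])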

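prompt = ChatPromptTemplate.from_template(
    """
    Generate 3 hypothetical questions based on the following document (the output must be JSON):
    {doc}

    Requirements:
    1. The output must be valid JSON containing a "questions" field
    2. The value of "questions" is an array of 3 questions
    3. Write the questions in Chinese

    Example:
    {{
        "questions":["Question 1","Question 2","Question 3"]
    }}
    """
)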
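# Question-generation chain: chunk text -> prompt -> LLM -> parsed list of questions
chain = (
    {"doc": lambda x: x.page_content}
    | prompt
    | llm
    | parse_hype_questions
    | (lambda x: x.questions)
)
# Test
# print("test chunk", chunks[0])
# print("generated questions", chain.invoke(chunks[0]))

# Generate hypothetical questions in batch, with up to 30 concurrent calls
hypothetical_questions = chain.batch(chunks, {"max_concurrency": 30})
# print("generated hypothetical questions", hypothetical_questions)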


# Wrap each generated question in a Document whose metadata carries the parent chunk's id
question_docs = []
for i,question_list in enumerate(hypothetical_questions):
    question_docs.extend(
        [Document(page_content=s,metadata={id_key:doc_ids[i]}) for s in question_list]
    )

# print(question_docs)

# Store the questions in the vector database
retriever.vectorstore.add_documents(question_docs)
# Store the original chunks in the docstore, keyed by doc_id
retriever.docstore.mset(list(zip(doc_ids,chunks)))

# Test: run a similarity search against the question index
query = "deepseek受到哪些攻擊？"  # "What attacks has DeepSeek suffered?"
sub_docs = retriever.vectorstore.similarity_search(query)
# print("--------------------- matched hypothetical questions --------------------------")
# print(sub_docs[0])

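answer_prompt = ChatPromptTemplate.from_template("Answer the question based on the documents below:\n\n{doc}\n\nQuestion: {question}")

# Build the question-answering chain
answer_chain = RunnableParallel({
    # retriever.invoke returns the parent chunks; join their text so the prompt receives plain text
    "doc": lambda x: "\n\n".join(d.page_content for d in retriever.invoke(x["question"])),
    "question": lambda x: x["question"]
}) | answer_prompt | llm | StrOutputParser()

# Generate the answer
print(answer_chain.invoke({"question": query}))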

Advanced RAG, Part 3: Hypothetical Question Indexing
