Using BERT in transformers

1. Downloading the BERT model

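Here we use the pre-trained BERT models published by Hugging Face. You can search the official model hub for the model you want and download it: https://huggingface.co/models

We use bert-base-chinese as the example. First, download it to the local machine: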

git lfs install
git clone https://huggingface.co/bert-base-chinese

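Note that the model downloaded at this point may still be incomplete: if pytorch_model.bin in the cloned directory is missing or is only a small Git LFS pointer file, we need to download it manually into the model directory.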

To do this, open "Files and versions" on the model page, download pytorch_model.bin, and overwrite the file of the same name in the model directory.

With that, the model is ready.
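
Before moving on, it can be worth checking that the weights file is the real binary rather than a Git LFS pointer stub. The sketch below is one way to do that using only the standard library. The helper name is_lfs_pointer and the example path are assumptions made for illustration; they are not part of any Hugging Face tooling. The check relies on the fact that a Git LFS pointer is a small text file whose first line starts with "version https://git-lfs.github.com/spec/v1".

```python
import os

# Every Git LFS pointer file starts with this line (per the Git LFS spec).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer stub, not real data."""
    # Pointer files are tiny text files; real weights are hundreds of MB.
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


if __name__ == "__main__":
    # Hypothetical location; replace with your own model directory.
    weights = os.path.join("bert-base-chinese", "pytorch_model.bin")
    if not os.path.exists(weights):
        print("weights file is missing:", weights)
    elif is_lfs_pointer(weights):
        print("weights file is only an LFS pointer; download it manually")
    else:
        print("weights file looks complete")
```

If the script reports a pointer or a missing file, the manual download described above is the fix.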

2. Using it in transformers

Before using the model, install the transformers package (the examples below also require PyTorch).

pip install transformers

Now we can get started. First, load the tokenizer and the model from the model directory.

import torch
from transformers import BertModel, BertTokenizer

# Path to the local directory created by git clone
model_path = '/xxx/bert-base-chinese'
tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertModel.from_pretrained(model_path)

# Encode one sentence as a batch of size 1, with [CLS] and [SEP] added
input_ids = torch.tensor([tokenizer.encode("五福临门", add_special_tokens=True)])
with torch.no_grad():
    output = model(input_ids)
    last_hidden_state = output[0]  # shape: (batch, seq_len, hidden)
    pooler_output = output[1]      # shape: (batch, hidden)
    print(last_hidden_state[:, 0, :])  # hidden state of the [CLS] token

Let's walk through this step by step. First, the tokenizer encodes the sentence into ids; pay attention to [CLS] and [SEP].

input_ids = torch.tensor([tokenizer.encode("五福临门", add_special_tokens=True)])

input_ids
tensor([[ 101,  758, 4886,  707, 7305,  102]])

Notice that input_ids contains six ids, while the input "五福临门" has only four characters. Why is that? Let's take a closer look:

tokenizer.convert_ids_to_tokens(tokenizer.encode('五福临门'))
['[CLS]', '五', '福', '临', '门', '[SEP]']

It turns out that when the tokenizer converts the sentence to ids, it has already added the special tokens [CLS] and [SEP] for us.
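
To make the wrapping step concrete, here is a toy sketch of character-level encoding with special tokens. It is not the library's implementation: toy_encode and TOY_VOCAB are hypothetical names, and the vocabulary contains only the ids that appear in the output above plus [UNK]. The real tokenizer uses WordPiece over the full vocab.txt, which for Chinese text amounts to splitting on individual characters.

```python
# Toy vocabulary: only the ids printed in the output above (hypothetical subset;
# the real vocab.txt of bert-base-chinese has about 21k entries).
TOY_VOCAB = {"[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
             "五": 758, "福": 4886, "临": 707, "门": 7305}


def toy_encode(text, add_special_tokens=True):
    """Map each character to its id, optionally wrapping with [CLS] ... [SEP]."""
    ids = [TOY_VOCAB.get(ch, TOY_VOCAB["[UNK]"]) for ch in text]
    if add_special_tokens:
        ids = [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]
    return ids


print(toy_encode("五福临门"))                            # [101, 758, 4886, 707, 7305, 102]
print(toy_encode("五福临门", add_special_tokens=False))  # [758, 4886, 707, 7305]
```

Passing add_special_tokens=False to the real tokenizer.encode has the same effect of dropping the two extra ids.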

With input_ids in hand, we can run them through the model.

output = model(input_ids)
last_hidden_state = output[0]
pooler_output = output[1]

print(last_hidden_state.shape)
torch.Size([1, 6, 768])
print(pooler_output.shape)
torch.Size([1, 768])

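last_hidden_state holds the encoding of every token in the sentence, including [CLS] and [SEP]. pooler_output is the pooled output: the [CLS] hidden state passed through a linear layer and a tanh activation.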
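
As a rough illustration of the relationship between the two outputs, the sketch below reproduces the pooling step in plain Python: take the hidden state of the first token and apply tanh(W h + b). The function name toy_pooler and all the numbers are made up for the example; the real model uses its trained 768-dimensional weights.

```python
import math


def toy_pooler(last_hidden_state, weight, bias):
    """Pool one sequence: tanh(W @ h_cls + b), where h_cls is the first token."""
    h_cls = last_hidden_state[0]  # hidden state of [CLS], position 0
    pooled = []
    for row, b in zip(weight, bias):
        z = sum(w * h for w, h in zip(row, h_cls)) + b
        pooled.append(math.tanh(z))
    return pooled


# Hypothetical numbers: 3 tokens, hidden size 2 (the real model uses 768).
hidden = [[0.5, -1.0], [0.1, 0.2], [0.3, 0.4]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, for easy checking
bias = [0.0, 0.0]

pooled = toy_pooler(hidden, weight, bias)
print(len(pooled))  # 2: one value per hidden dimension, independent of length
```

This is why pooler_output has shape [1, 768] regardless of sentence length, while last_hidden_state grows with the number of tokens.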

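Some readers may wonder: doesn't BERT also take attention_mask and token_type_ids as inputs? The forward method of BertModel fills them in when they are not passed: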

if attention_mask is None:
    attention_mask = torch.ones(((batch_size, seq_length + past_key_values_length)), device=device)

if token_type_ids is None:
    if hasattr(self.embeddings, "token_type_ids"):
        buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
        buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
        token_type_ids = buffered_token_type_ids_expanded
    else:
        token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)

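As you can see, the framework already handles this internally: a missing attention_mask defaults to all ones, and missing token_type_ids default to all zeros.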
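
The same fallback can be sketched in a few lines of plain Python, which also shows when the default is not what we want. fill_defaults and padding_mask are hypothetical helpers written for this example, operating on nested lists instead of tensors. For a single unpadded sentence the all-ones mask is fine, but once a batch is padded, the default would let the model attend to the padding positions.

```python
def fill_defaults(input_ids, attention_mask=None, token_type_ids=None):
    """Mimic the fallback: all-ones mask and all-zeros segment ids."""
    if attention_mask is None:
        attention_mask = [[1] * len(seq) for seq in input_ids]
    if token_type_ids is None:
        token_type_ids = [[0] * len(seq) for seq in input_ids]
    return attention_mask, token_type_ids


def padding_mask(input_ids, pad_id=0):
    """Build a mask that is 0 on padding positions and 1 elsewhere."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in input_ids]


# Two sequences padded to the same length with [PAD] (id 0 in bert-base-chinese).
batch = [[101, 758, 4886, 707, 7305, 102],
         [101, 758, 102, 0, 0, 0]]

default_mask, default_types = fill_defaults(batch)
print(default_mask[1])         # [1, 1, 1, 1, 1, 1]
print(padding_mask(batch)[1])  # [1, 1, 1, 0, 0, 0]
```

In practice the tokenizer can produce the padding-aware mask itself, so there is rarely a need to build it by hand.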

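If we don't want the default values, we can use tokenizer.encode_plus(), which returns token_type_ids and attention_mask along with input_ids.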

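# Encode a sentence pair; the result is a dict with input_ids, token_type_ids and attention_mask
sent_code = tokenizer.encode_plus('今天是周末', '要在家好好學習哦')
input_ids = torch.tensor([sent_code['input_ids']])
token_type_ids = torch.tensor([sent_code['token_type_ids']])
attention_mask = torch.tensor([sent_code['attention_mask']])

# Pass the segment ids and the mask explicitly instead of relying on the defaults
with torch.no_grad():
    output = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
    last_hidden_state, pooler_output = output[0], output[1]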
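
For a sentence pair, the interesting part is how the two sentences are laid out and which segment id each position receives. The sketch below shows the layout in plain Python. build_pair is a hypothetical helper, and the token ids for the two sentences are placeholders rather than real vocabulary lookups; only 101 ([CLS]) and 102 ([SEP]) come from the outputs shown earlier.

```python
def build_pair(ids_a, ids_b, cls_id=101, sep_id=102):
    """Lay out [CLS] A [SEP] B [SEP] and the matching segment ids."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    return input_ids, token_type_ids


# Hypothetical token ids standing in for two short sentences.
ids, types = build_pair([11, 12, 13], [21, 22])
print(ids)    # [101, 11, 12, 13, 102, 21, 22, 102]
print(types)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Printing sent_code['input_ids'] and sent_code['token_type_ids'] from the real tokenizer should show the same pattern: a run of zeros ending at the first [SEP], followed by ones.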
