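As the saying goes, there is only as much intelligence as there is human labor behind it: data annotation is the key step that lets most AI algorithms work effectively. Put simply, data annotation is the process of taking raw speech, image, text, video and other data and turning it into information a machine can recognize.
A craftsman who wants to do good work must first sharpen his tools
I have used quite a few annotation tools. labelimg and labelme cover only basic image annotation such as boxes, with little support for data management or collaboration. ppocrlabel only handles OCR-related tasks. cvat is very powerful for images, has good collaboration and data management and the best usability, but it is troublesome to install on Windows, the open-source version limits the amount of data, the web interface is not very stable, and it only covers CV tasks.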

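label-studio is very easy to install and use, and OpenMMLab recently added SAM-based semi-automatic annotation assistance for it. More importantly, there is no need to switch between annotation tools: text, OCR, detection, segmentation and similar tasks can all be annotated in one place.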

This post follows:
1. Playground official GitHub repository:
https://github.com/open-mmlab/playground

2. SAM official GitHub repository: https://github.com/facebookresearch/segment-anything

Environment setup

Create an anaconda virtual environment and activate it

conda create -n labelsam  python=3.9 -y
conda activate labelsam

Clone the project locally

git clone https://github.com/open-mmlab/playground

Install PyTorch

# Linux and Windows CUDA 11.3
pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu113/torch_stable.html



# Linux and Windows CPU only
pip install torch==1.10.1+cpu torchvision==0.11.2+cpu torchaudio==0.10.1 -f https://download.pytorch.org/whl/cpu/torch_stable.html

# OSX
pip install torch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1
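
Before moving on, it can help to confirm which PyTorch build actually ended up in the environment. The following is a minimal sketch, not part of the original tutorial; the helper name describe_torch is made up for illustration.

```python
def describe_torch():
    """Return a one-line summary of the installed PyTorch build."""
    try:
        import torch  # imported lazily so the check also works when torch is missing
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # A CUDA build with a visible GPU: report the first device
        return f"torch {torch.__version__}, CUDA device: {torch.cuda.get_device_name(0)}"
    return f"torch {torch.__version__}, CPU only"


print(describe_torch())
```

If this reports "CPU only" on a machine with a GPU, the CPU wheel was probably installed, and the backend below should be started with device=cpu.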


Install SAM and download the pretrained model
Change into playground\label_anything
Install the required packages

pip install opencv-python pycocotools matplotlib onnxruntime onnx

Install the SAM project

pip install git+https://github.com/facebookresearch/segment-anything.git

Download the model
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
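
wget is often not available on Windows, so the same download can be done from Python's standard library. This is only a sketch: the function names checkpoint_name and download_checkpoint are hypothetical, and the URL is the one used in the wget command above.

```python
import os
import urllib.request
from urllib.parse import urlparse

SAM_VIT_B_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"


def checkpoint_name(url):
    """Derive the local file name from the last path segment of the URL."""
    return os.path.basename(urlparse(url).path)


def download_checkpoint(url=SAM_VIT_B_URL, target_dir="."):
    """Download the checkpoint unless a file with that name already exists."""
    target = os.path.join(target_dir, checkpoint_name(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


# Usage (downloads into the current directory):
#   path = download_checkpoint()
```

The resulting path is what sam_checkpoint_file should point to when the backend is started.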


Install Label-Studio and label-studio-ml-backend

pip install label-studio==1.7.3
pip install label-studio-ml==1.0.9

Start the backend service

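label-studio-ml start sam --port 8003 --with \
sam_config=vit_b \
sam_checkpoint_file=./sam_vit_b_01ec64.pth \
out_mask=True \
out_bbox=True \
device=cuda:0
# device=cuda:0 runs inference on the GPU; for CPU inference, replace cuda:0 with cpu
# out_poly=True returns the annotation as an enclosing polygon
# In Windows cmd, put the whole command on one line (backslash continuation is a Unix shell feature)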
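
Loading the model can take a while, so it is useful to know when the backend has started listening. The sketch below only checks that something accepts TCP connections on the port; it does not verify that the service is the SAM backend. The function name port_is_open is made up for illustration, and 8003 is the port from the command above.

```python
import socket


def port_is_open(host="localhost", port=8003, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures
        return False


# Usage:
#   print("backend reachable:", port_is_open("localhost", 8003))
```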


Start the Label-Studio web service

Set an environment variable so that a long model loading time does not cause an error

On Windows, use the following command

set ML_TIMEOUT_SETUP=40

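Start Label-Studio in a terminal separate from the backend's (run the set command above in this same terminal)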

label-studio start --port 8008
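
The set command is specific to Windows cmd. A platform-independent alternative is to set the variable from a small Python launcher. This is a sketch under assumptions: the helper label_studio_launch is hypothetical, and it only reuses the ML_TIMEOUT_SETUP variable and the label-studio start --port command shown above.

```python
import os


def label_studio_launch(port=8008, setup_timeout=40, base_env=None):
    """Build the command and environment for starting Label-Studio."""
    # Copy the environment so the current process is left untouched
    env = dict(os.environ if base_env is None else base_env)
    env["ML_TIMEOUT_SETUP"] = str(setup_timeout)
    cmd = ["label-studio", "start", "--port", str(port)]
    return cmd, env


# Usage (blocks until Label-Studio exits):
#   import subprocess
#   cmd, env = label_studio_launch()
#   subprocess.run(cmd, env=env, check=True)
```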

Open http://localhost:8008/ in a browser to see the Label-Studio interface

Create a new project
Import images

In Settings/Labeling Interface, configure the Label-Studio keypoint and Mask annotation

<View>
  <Image name="image" value="$image" zoom="true"/>
  <KeyPointLabels name="KeyPointLabels" toName="image">
    <Label value="cat" smart="true" background="#e51515" showInline="true"/>
    <Label value="person" smart="true" background="#412cdd" showInline="true"/>
  </KeyPointLabels>
  <RectangleLabels name="RectangleLabels" toName="image">
   <Label value="cat" background="#FF0000"/>
   <Label value="person" background="#0d14d3"/>
  </RectangleLabels>
  <PolygonLabels name="PolygonLabels" toName="image">
   <Label value="cat" background="#FF0000"/>
   <Label value="person" background="#0d14d3"/>
  </PolygonLabels>
  <BrushLabels name="BrushLabels" toName="image">
   <Label value="cat" background="#FF0000"/>
   <Label value="person" background="#0d14d3"/>
  </BrushLabels>
</View>

Here KeyPointLabels is keypoint annotation, BrushLabels is Mask annotation, PolygonLabels is enclosing-polygon annotation, and RectangleLabels is rectangle annotation
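
With more than a couple of classes, writing every Label entry four times by hand gets tedious. The sketch below generates a config with the same four label groups from a label-to-color mapping. The function name build_labeling_config is made up, and for simplicity it reuses one color per label in all groups, unlike the hand-written config above.

```python
import xml.etree.ElementTree as ET


def build_labeling_config(labels):
    """Build an image labeling config from a {label: color} mapping."""
    view = ET.Element("View")
    ET.SubElement(view, "Image", name="image", value="$image", zoom="true")
    # Keypoint labels carry smart="true" so they can act as prompts for the model
    keypoints = ET.SubElement(view, "KeyPointLabels", name="KeyPointLabels", toName="image")
    for value, color in labels.items():
        ET.SubElement(keypoints, "Label", value=value, smart="true",
                      background=color, showInline="true")
    # Rectangle, polygon and brush groups receive the model's output
    for tag in ("RectangleLabels", "PolygonLabels", "BrushLabels"):
        group = ET.SubElement(view, tag, name=tag, toName="image")
        for value, color in labels.items():
            ET.SubElement(group, "Label", value=value, background=color)
    ET.indent(view)  # pretty-print; requires Python 3.9+
    return ET.tostring(view, encoding="unicode")


print(build_labeling_config({"cat": "#FF0000", "person": "#0d14d3"}))
```

The printed XML can be pasted into the Code tab of the Labeling Interface settings.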

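Select the model for inference
Click Add Model to add the backend inference service (named test here), and set the URL of the SAM backend inference service to the backend address started earlier (http://localhost:8003 if the backend runs on the same machine with --port 8003)

Then turn on Use for interactive preannotations and click Validate and Save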

Start semi-automatic annotation

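Turn on the Auto-Annotation switch (checking Auto accept annotation suggestions is recommended), click the Smart tool on the right, switch it to Point, then select the label of the object to annotate below; here it is cat. To use a BBox as the prompt instead, switch the Smart tool to Rectangle.

Click a point on the cat and recognition starts automatically, then a bbox appears

When the annotation is done, click Submit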

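Click Export and choose the COCO format to export the annotated dataset as a compressed archive. Note: this export contains only bounding-box annotations. To export instance segmentation annotations, set out_poly=True when starting the SAM backend service.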
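
A quick way to check what the export contains is to count annotations per category and see whether any of them carry a segmentation. This is a sketch based on the standard COCO layout (categories, annotations, category_id, segmentation); the file name result.json in the usage comment is an assumption about the exported archive, and the function names are made up.

```python
from collections import Counter


def count_annotations(coco):
    """Count annotations per category name in a COCO-style dict."""
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    return Counter(names.get(a["category_id"], "unknown")
                   for a in coco.get("annotations", []))


def has_segmentation(coco):
    """Return True if at least one annotation has a non-empty segmentation."""
    return any(a.get("segmentation") for a in coco.get("annotations", []))


# Usage (after unpacking the exported archive; the file name is an assumption):
#   import json
#   with open("result.json", encoding="utf-8") as f:
#       coco = json.load(f)
#   print(count_annotations(coco), has_segmentation(coco))
```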


Text annotation

Here, create a project directly and then import the data

During import you must choose between List of tasks and Time Series; I chose List of tasks

You can then pick your own task type
Here I use Natural Language Processing and select Named Entity Recognition

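The raw data comes from
https://github.com/JackHCC/Chinese-Keyphrase-Extraction
The dataset is based on Sina news data from 8 domains (sports, entertainment, lottery, real estate, education, games, technology, stocks)
First column: content holds the news title and the news body
Second column: type is the topic category of the news item.
During model training only the content column of the csv file is needed; the second column is used to measure the accuracy of the extracted keywords.
The data can be annotated by treating the keywords as the entity words to recognize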
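
Since only the content column is needed, the csv can be reduced to a task list before import. The sketch below assumes the project's labeling config reads the text from a field named text (as in $text) and uses the {"data": {...}} task layout of Label Studio's JSON import; the function name csv_to_tasks and the file names in the usage comment are hypothetical.

```python
import csv
import json


def csv_to_tasks(csv_file, text_column="content"):
    """Turn the rows of an open csv file into Label Studio task dicts."""
    reader = csv.DictReader(csv_file)
    # Rows with an empty text column are skipped
    return [{"data": {"text": row[text_column]}}
            for row in reader if row.get(text_column)]


# Usage (file names are placeholders):
#   with open("news.csv", encoding="utf-8", newline="") as f:
#       tasks = csv_to_tasks(f)
#   with open("tasks.json", "w", encoding="utf-8") as f:
#       json.dump(tasks, f, ensure_ascii=False, indent=2)
```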

As you can see, managing annotations for text like this is very convenient

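Document information extraction annotation

Document information extraction involves text detection in images, text recognition, and information extraction, so it is annotation of data across both the image and text modalities

Create a new project, fill in the project name and description, then select
Optical Character Recognition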

Delete the original labels

Add your own labels

Add Relation type labels

Information extraction is annotated by adding a relation between two labels

For example

<View>
  <Relations>
    <Relation value="similar" />
    <Relation value="dissimilar" />
  </Relations>

  <Text name="txt-1" value="$text" />
  <Labels name="lbl-1" toName="txt-1">
    <Label value="Relevant" />
    <Label value="Not Relevant" />
  </Labels>
</View>
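
Once relations have been annotated, they can be read back from the JSON export as (source text, relation, target text) triples. This is a sketch under assumptions: it expects each annotation's result list to contain labeled regions with an id and a value, plus entries of type relation with from_id, to_id and labels, which is how I understand Label Studio's JSON export. The function name extract_relations is made up.

```python
def extract_relations(results):
    """Return (source_text, relation_label, target_text) triples from a result list."""
    # Index the labeled regions by id so relations can be resolved
    regions = {r["id"]: r for r in results
               if r.get("type") != "relation" and "id" in r}
    triples = []
    for r in results:
        if r.get("type") != "relation":
            continue
        src = regions.get(r.get("from_id"))
        dst = regions.get(r.get("to_id"))
        if src is None or dst is None:
            continue  # relation points at a region missing from this list
        # A relation without labels still yields one triple, with label None
        for label in r.get("labels") or [None]:
            triples.append((src["value"].get("text"), label, dst["value"].get("text")))
    return triples
```

For an OCR project the region text may live under a different key in value, so the lookup of "text" would need adjusting.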

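Of course, annotation for more modalities is supported as well

For example video, audio, time series and other kinds of data

Multimodal support plus intelligent annotation means labeling no longer has to be a headache.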
