Elasticsearch Write Testing

1. Understanding the ES storage structure

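ES organizes data on four main levels. Compared with a traditional relational database:
index (Indices) corresponds to a database
type corresponds to a table
document corresponds to a row
properties (Fields) corresponds to a column

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

This analogy reflects the older ES versions this post was written against. Mapping types were later restricted to one per index (from 6.0) and removed from the APIs in 8.0, so in current releases an index holds documents directly.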

2. ES write test

Write one document (one record):

PUT http://192.168.1.32:9200/twitter/tweet/377827236
{
"tweet_id": "555555555555555555555666",
"user_screen_name": "kanazawa_mj",
"tweet": "blog3444444",
"user_id": "377827236",
"id": 214019
}

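The path /twitter/tweet/377827236 contains three pieces of information:

Name        Description
twitter     index name
tweet       type name
377827236   document ID (here the tweet's user_id is reused as the ID)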
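
The same PUT can be issued from Python with only the standard library. A minimal sketch, assuming Python 3; `build_put_request` is a hypothetical helper that composes the path from the three parts in the table and builds, but does not send, the request, so it runs without a cluster:

```python
import json
import urllib.request

ES_BASE = "http://192.168.1.32:9200"  # ES node used in this test


def build_put_request(index, doc_type, doc_id, doc):
    """Build (but do not send) PUT /{index}/{type}/{id} with a JSON body."""
    url = "%s/%s/%s/%s" % (ES_BASE, index, doc_type, doc_id)
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_put_request("twitter", "tweet", "377827236", {
    "tweet_id": "555555555555555555555666",
    "user_screen_name": "kanazawa_mj",
    "tweet": "blog3444444",
    "user_id": "377827236",
    "id": 214019,
})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it to the cluster.
```

Because the path reuses the user_id as the document ID, sending another PUT to the same path replaces the stored document instead of adding a second one.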

3. ES query test

Search the tweet field for "love", returning up to 50 hits as pretty-printed JSON:

GET http://192.168.1.32:9200/twitter/tweet/_search?q=tweet:love&size=50&pretty=true

Response (abridged to the first three hits):
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 11639,
    "max_score" : 8.448289,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "AV0fnFOX6PBTXc6mRjpL",
        "_score" : 8.448289,
        "_source" : {
          "tweet_id" : "843105177913757697",
          "user_screen_name" : "jessicapalapal",
          "tweet" : "Love, love, love ",
          "user_id" : "740434015",
          "id" : 474551
        }
      },
      {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "AV0fni__6PBTXc6mSeyR",
        "_score" : 8.436986,
        "_source" : {
          "tweet_id" : "695096306763583488",
          "user_screen_name" : "SampsonMariel",
          "tweet" : "Love love love^_^ #ALDUB29thWeeksary",
          "user_id" : "2483556636",
          "id" : 723297
        }
      },
      {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "AV0fmxvV6PBTXc6mQ8Mb",
        "_score" : 8.425938,
        "_source" : {
          "tweet_id" : "835676311637086209",
          "user_screen_name" : "thedaveywavey",
          "tweet" : "Love is love is love is love. ",
          "user_id" : "17191297",
          "id" : 311967
        }
      }
    ]
  }
}
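
A sketch of the same URI search from Python, plus a helper that pulls the total and the hits out of a response like the one above. `build_search_url` and `summarize_hits` are hypothetical names. `urlencode` percent-encodes the colon in `tweet:love`, which ES decodes as usual. The `hits.total` check covers both the plain number shown here and the `{"value": ..., "relation": ...}` object returned by ES 7 and later:

```python
from urllib.parse import urlencode

ES_BASE = "http://192.168.1.32:9200"


def build_search_url(index, doc_type, query, size=50):
    """URI search: /{index}/{type}/_search?q=...&size=...&pretty=true"""
    params = urlencode({"q": query, "size": size, "pretty": "true"})
    return "%s/%s/%s/_search?%s" % (ES_BASE, index, doc_type, params)


def summarize_hits(response):
    """Return (total, [(score, user_screen_name, tweet), ...]) from a search response."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # ES 7+ reports {"value": n, "relation": "eq"}
        total = total["value"]
    rows = [
        (hit["_score"], hit["_source"]["user_screen_name"], hit["_source"]["tweet"])
        for hit in response["hits"]["hits"]
    ]
    return total, rows


url = build_search_url("twitter", "tweet", "tweet:love")
print(url)

# Trimmed copy of the response above
sample = {
    "hits": {
        "total": 11639,
        "max_score": 8.448289,
        "hits": [
            {"_score": 8.448289,
             "_source": {"user_screen_name": "jessicapalapal", "tweet": "Love, love, love "}},
            {"_score": 8.436986,
             "_source": {"user_screen_name": "SampsonMariel",
                         "tweet": "Love love love^_^ #ALDUB29thWeeksary"}},
        ],
    }
}
total, rows = summarize_hits(sample)
print(total, len(rows))  # 11639 2
```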

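4. ES bulk write test

  • Writer: a Python script using the producer-consumer pattern that reads data from a MySQL database and writes it to ES in batches of 1,000 rows
  • Client machine: Windows, 100 MB memory usage, 15% CPU usage
  • ES server: Ubuntu 14.04, 5% CPU usage, low memory usage
  • Single process, 5 writer threads, 1 million rows: 500 seconds (about 2,000 rows/s)
  • Single process, 20 writer threads, 1 million rows: 500 seconds (about 2,000 rows/s), so adding threads did not increase throughput in this setup
  • Note: reportedly, changing the ES configuration to turn off data indexing first can speed up writes; not yet tested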

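5. Next steps

  • Study and test ES sharding and search configuration (mapping, filter, etc.) in more depth, according to project requirements.
  • Explore additional ES features, such as time-range search, Simplified/Traditional Chinese handling, Pinyin search, GIS (geo-location) search, and English tense handling (stemming).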
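
For the mapping/filter and time-range items, a sketch of a Query DSL request body for `_search` that combines a scored full-text `match` with unscored `filter` clauses inside a `bool` query. The `created_at` date field is hypothetical (the twitter index above has no date field), and `build_filtered_query` is an illustrative name, not part of any client library:

```python
import json


def build_filtered_query(text, user_id=None, since=None, until=None, size=50):
    """Bool query: full-text match on "tweet" plus exact, unscored filters.

    "created_at" is a hypothetical date field used only for illustration.
    """
    filters = []
    if user_id is not None:
        filters.append({"term": {"user_id": user_id}})
    if since is not None or until is not None:
        date_range = {}
        if since is not None:
            date_range["gte"] = since
        if until is not None:
            date_range["lt"] = until
        filters.append({"range": {"created_at": date_range}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"tweet": text}}],  # contributes to _score
                "filter": filters,                      # yes/no conditions, not scored
            }
        },
    }


body = build_filtered_query("love", user_id="740434015", since="now-7d/d")
print(json.dumps(body, indent=2))
```

The resulting dict would be sent as the JSON body of a request to /twitter/tweet/_search.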

6. References

ES storage structure overview
https://es.xiaoleilu.com/010_Intro/25_Tutorial_Indexing.html
Working with Elasticsearch from Python
http://www.cnblogs.com/yxpblog/p/5141738.html
Elasticsearch: The Definitive Guide - Retrieving a document
https://es.xiaoleilu.com/010_Intro/30_Tutorial_Search.html

7. Appendix (Python code for writing to ES)

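from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import argparse

# Written for an older (pre-6.0) Elasticsearch cluster and elasticsearch-py client:
# the "string" field type was replaced by text/keyword in 5.0, and mapping types
# (_type / doc_type) were removed in later releases.

# ES index and type names
INDEX_NAME = "twitter"
TYPE_NAME = "tweet"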

# ES helper class
class es_tool():
    # Create the ES client with the given hosts and timeout
    def __init__(self, hosts, timeout):
        self.es = Elasticsearch(hosts, timeout=timeout)

    # Write rows to ES in one bulk call.
    # Each row is (id, tweet_id, user_id, user_screen_name, tweet).
    def set_data(self, fields_data=None, index_name=INDEX_NAME, doc_type_name=TYPE_NAME):
        # Build one bulk action per row
        actions = []
        for fields in fields_data or []:
            action = {
                "_index": index_name,
                "_type": doc_type_name,
                "_source": {
                    "id": fields[0],
                    "tweet_id": fields[1],
                    "user_id": fields[2],
                    "user_screen_name": fields[3],
                    "tweet": fields[4]
                }
            }
            actions.append(action)

        # Send the actions; helpers.bulk splits them into requests of chunk_size
        # (default 500) and, with raise_on_error=True, raises BulkIndexError on failures
        success, _ = bulk(self.es, actions, index=index_name, raise_on_error=True)
        print('Performed %d actions' % success)

# Parse command-line arguments (not used by the main block below)
def read_args():
    parser = argparse.ArgumentParser(description="Search Elastic Engine")
    parser.add_argument("-i", dest="input_file", action="store", help="input file", required=False, default="./data.txt")
    # parser.add_argument("-o", dest="output_file", action="store", help="output file", required=True)
    return parser.parse_args()

# Initialize ES: recreate the index and set the mapping
def init_es(hosts=None, timeout=5000, index_name=INDEX_NAME, doc_type_name=TYPE_NAME):
    es = Elasticsearch(hosts, timeout=timeout)
    my_mapping = {
        doc_type_name: {
            "properties": {
                "id": {
                    "type": "string"
                },
                "tweet_id": {
                    "type": "string"
                },
                "user_id": {
                    "type": "string"
                },
                "user_screen_name": {
                    "type": "string"
                },
                "tweet": {
                    "type": "string"
                }
            }
        }
    }
    try:
        # Delete the old index, then create the index and mapping.
        # ignore=404 keeps the first run (index does not exist yet) from jumping
        # to the except branch and skipping index and mapping creation.
        es.indices.delete(index=index_name, ignore=404)
        create_index = es.indices.create(index=index_name)  # {u'acknowledged': True}
        mapping_index = es.indices.put_mapping(index=index_name, doc_type=doc_type_name,
                                               body=my_mapping)  # {u'acknowledged': True}
        if not create_index.get("acknowledged") or not mapping_index.get("acknowledged"):
            print("Index creation failed...")
    except Exception as e:
        print("set_mapping except: %s" % e)

# Main entry point
if __name__ == '__main__':
    # args = read_args()
    # Recreate the index and mapping
    init_es(hosts=["192.168.1.32:9200"], timeout=5000)
    # Create the ES helper
    es = es_tool(hosts=["192.168.1.32:9200"], timeout=5000)
    # Write two sample rows
    tweet_list = [("111", "222", "333", "444", "555"), ("11", "22", "33", "44", "55")]
    es.set_data(tweet_list)
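
The action-building loop in `set_data` can be checked without a running cluster by pulling it into a pure function. A minimal sketch; `row_to_action` and `FIELDS` are hypothetical names, and the column order is the one `set_data` assumes. To make each bulk request carry the 1,000 rows per write described in section 4, the actions could be passed as `bulk(es, actions, chunk_size=1000)`: `chunk_size` is a parameter of `elasticsearch.helpers.bulk` and defaults to 500.

```python
INDEX_NAME = "twitter"
TYPE_NAME = "tweet"

# Column order assumed by set_data: (id, tweet_id, user_id, user_screen_name, tweet)
FIELDS = ("id", "tweet_id", "user_id", "user_screen_name", "tweet")


def row_to_action(row, index_name=INDEX_NAME, doc_type_name=TYPE_NAME):
    """Turn one row tuple into a bulk action dict (no ES connection needed)."""
    if len(row) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(row)))
    return {
        "_index": index_name,
        "_type": doc_type_name,
        "_source": dict(zip(FIELDS, row)),
    }


tweet_list = [("111", "222", "333", "444", "555"), ("11", "22", "33", "44", "55")]
actions = [row_to_action(r) for r in tweet_list]
print(actions[0]["_source"])
# {'id': '111', 'tweet_id': '222', 'user_id': '333', 'user_screen_name': '444', 'tweet': '555'}
```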