Integrating Elasticsearch with Spring Boot 1.5.x

Full-match queries when searching with an analyzer
Problems integrating ES with an older Spring Boot version
Logstash synchronization problems

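Elasticsearch is currently one of the most popular search engines and provides fast full-text search. This article does not cover how ES works or other fundamentals; it is just a simple introductory tutorial on integrating Elasticsearch with Spring Boot and on using Logstash to synchronize data from a MySQL database into Elasticsearch.

Version compatibility

Spring Boot provides spring-boot-starter-data-elasticsearch, which wraps Elasticsearch so that you can work with it quickly and conveniently through the provided API. This is the simplest way to integrate and operate Elasticsearch, but it imposes requirements on both the Spring Boot and Elasticsearch versions. Our company still uses Spring Boot 1.5.3, whose starter only supports Elasticsearch versions below 5.0, so we cannot use the latest 7.x Elasticsearch with it.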

SpringBoot Version x	spring-boot-starter-data-elasticsearch Version y	Elasticsearch Version z
x < 2.0.0	y < 2.0.0	z < 5.0
x >= 2.0.0	y >= 2.0.0	z > 5.0

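Upgrading Spring Boot in the project is not realistic, yet we still want to use the latest Elasticsearch, so we have to integrate another way: through the REST client, which is much more tolerant of version differences. There are two REST clients, low-level and high-level. They work on the same principle (the high-level client is built on top of the low-level one), and the main difference is how much they encapsulate. The official recommendation is the high-level client, so we integrate with elasticsearch-rest-high-level-client.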

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
  <version>7.6.0</version>
</dependency>

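When integrating frameworks like this, the versions of all the components must match. If they do not, all kinds of unexpected problems appear; I took many detours here before I figured this out.

Basic Elasticsearch usage

Configuring the host and port

Elasticsearch uses ports 9200 and 9300 by default. Port 9200 serves HTTP connections and 9300 serves TCP (transport) connections; here we use 9200.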

spring:
  elasticsearch:
    host: 192.168.3.75
    port: 9200

Injecting the RestHighLevelClient

Create a configuration class that reads the host and port and registers a RestHighLevelClient bean in the Spring container.

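import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsConfig {
    // Both values are read from the spring.elasticsearch.* properties.
    @Value("${spring.elasticsearch.port}")
    private int port;
    @Value("${spring.elasticsearch.host}")
    private String host;

    // Registers one shared client that talks to Elasticsearch over HTTP.
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(RestClient.builder(new HttpHost(host, port)));
    }
}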

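Querying with the client

// Imports used by the snippet below
import com.alibaba.fastjson.JSONObject;
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// restHighLevelClient is the bean from EsConfig, injected into the calling class.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Full-text match on the name field
searchSourceBuilder.query(QueryBuilders.matchQuery("name", "123"));
SearchRequest searchRequest = new SearchRequest("test_index");
searchRequest.source(searchSourceBuilder);
try {
  SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
  searchResponse.getHits().forEach(i -> {
    String jsonString = i.getSourceAsString();
    // Use fastjson to convert the JSON source of each hit into a model object
    EsGoodsModel esGoodsModel = JSONObject.parseObject(jsonString, EsGoodsModel.class);
  });
} catch (IOException e) {
  e.printStackTrace();
}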

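This only shows the basic usage. Building more specific queries, such as pagination, sorting and conditional queries, follows much the same pattern.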
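As a rough illustration of what pagination and sorting add to a search, the sketch below builds the JSON body of a `_search` request by hand, using only the JDK. With the high-level client the same settings would normally be made on `SearchSourceBuilder` (`from`, `size`, `sort`). The class name `SearchBodySketch` and the choice of sorting by `price` are assumptions made for illustration, and the hand-written string building is not how the client itself serializes requests.

```java
// Hypothetical helper, not part of the Elasticsearch client: it only shows
// which fields of the search body carry pagination and sorting.
final class SearchBodySketch {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // page is zero-based; "from" is the offset of the first hit to return.
    static String matchPage(String field, String text, int page, int pageSize,
                            String sortField, boolean ascending) {
        int from = page * pageSize;
        return "{"
            + "\"from\":" + from + ","
            + "\"size\":" + pageSize + ","
            + "\"query\":{\"match\":{" + quote(field) + ":" + quote(text) + "}},"
            + "\"sort\":[{" + quote(sortField) + ":{\"order\":"
            + quote(ascending ? "asc" : "desc") + "}}]"
            + "}";
    }

    public static void main(String[] args) {
        // Third page of 10 hits matching name against "123", cheapest first.
        System.out.println(matchPage("name", "123", 2, 10, "price", true));
    }
}
```

The third page with a page size of 10 starts at offset 20, which is why `from` is computed as `page * pageSize`.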

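Synchronizing data with Logstash

About Logstash

To import the data to be searched, I use Logstash, which is the officially recommended tool. There are other approaches as well, which I won't go into here.

The most important part of using Logstash is writing the conf file. Its format is as follows: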

input {
  
}
filter {
  
}
output {
  
}

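input specifies where the data comes from. It can be file, jdbc, http, kafka, log4j, redis and many other sources (see the official documentation for the full list).

filter processes the incoming data and converts it into JSON before it is saved to Elasticsearch. There are many filter plugins, all described in detail on the official site; this tutorial mainly uses Aggregate to aggregate the data.

output specifies where the data is written. There are again many options; here we output to Elasticsearch.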

Data requirements

In our project we currently search over goods. Some attributes of a goods item come from other tables and may span multiple rows, which gives a data structure like the one below.

{
  "stock_info" : "300公斤",
  "name" : "黃瓜",
  "address" : "上海市普陀區",
  "price" : "1.59",
  "company_name" : "供應商",
  "number" : "SP484",
  "plant_area" : "50",
  "id" : "55010b5154f84a2fbec4056c185789ac",
  "sl_url" : "黃瓜1_1584424256774.jpg",
  "type" : 1,
  "goodsLabelList" : [
    {
      "dictionary_value" : "綠色"
    },
    {
      "dictionary_value" : "有機"
    }
  ],
  "attributeValueList" : [
    {
      "attribute_value" : "密刺黃瓜",
      "attribute_id" : "68a40212b85c41019f843f8934bbbda5"
    },
    {
      "attribute_value" : "嚴重皺縮",
      "attribute_id" : "d50368f5fab442808dd27ee2c5361048"
    },
    {
      "attribute_value" : "15~25cm",
      "attribute_id" : "58771cbc4caf41789cf747c13fe755bb"
    }
  ]
}
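On the Java side, a document of this shape could be modeled roughly as follows. This is only a sketch: the article never shows its EsGoodsModel class, so `GoodsDocSketch`, its nested classes and its camelCase field names are assumptions, and mapping the snake_case JSON keys onto these fields is left to whichever JSON library is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model class: the shape is inferred from the sample document,
// not taken from the article's own EsGoodsModel, which is never shown.
final class GoodsDocSketch {

    // One element of goodsLabelList.
    static final class Label {
        final String dictionaryValue;
        Label(String dictionaryValue) { this.dictionaryValue = dictionaryValue; }
    }

    // One element of attributeValueList; id and value must stay paired.
    static final class AttributeValue {
        final String attributeId;
        final String attributeValue;
        AttributeValue(String attributeId, String attributeValue) {
            this.attributeId = attributeId;
            this.attributeValue = attributeValue;
        }
    }

    final String id;
    final String name;
    // Rows from the joined tables become list entries on the single goods document.
    final List<Label> goodsLabelList = new ArrayList<>();
    final List<AttributeValue> attributeValueList = new ArrayList<>();

    GoodsDocSketch(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public static void main(String[] args) {
        GoodsDocSketch doc = new GoodsDocSketch("55010b5154f84a2fbec4056c185789ac", "cucumber");
        doc.goodsLabelList.add(new Label("green"));
        doc.goodsLabelList.add(new Label("organic"));
        doc.attributeValueList.add(new AttributeValue("58771cbc4caf41789cf747c13fe755bb", "15~25cm"));
        System.out.println(doc.name + " has " + doc.goodsLabelList.size() + " labels");
    }
}
```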

A field like goodsLabelList corresponds to the nested data type in Elasticsearch. Next we need to write the Logstash conf file.

The input uses the jdbc plugin

input {
  jdbc {
    # Location of the MySQL connector jar
    jdbc_driver_library => "../mysql-connector-java-5.1.43-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://192.168.3.78:3306/guoxn_bab_test?useSSL=false&serverTimezone=UTC&rewriteBatchedStatements=true&characterEncoding=utf8"
    jdbc_user => "**"
    jdbc_password => "**"
    # Schedule: run once a minute
    schedule => "* * * * *"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    record_last_run => true
    use_column_value => true
    # Set the time zone; with the default there is an 8-hour offset
    jdbc_default_timezone => "Asia/Shanghai"
    # File in which the last last_update_time is stored; this can be left unspecified
    last_run_metadata_path => "../last_goods_record.txt"
    # Incremental updates are driven by last_update_time
    tracking_column => "last_update_time"
    tracking_column_type => "timestamp"
    # The actual query is omitted; the main thing to note is how :sql_last_value is written
    statement => "
                SELECT .... and t1.last_update_time > :sql_last_value and t1.last_update_time < NOW() AND t1.is_delete=0 AND t1.type=1 order by t1.id desc"
  }
}
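The incremental part of this input can be pictured as follows. This is a minimal plain-Java sketch of the idea behind `:sql_last_value` with a timestamp `tracking_column`, not Logstash's actual implementation: each run selects the rows whose `last_update_time` lies after the stored value and before "now", then remembers the newest timestamp it saw. The `Row` class, `IncrementalSyncSketch` and the in-memory list standing in for the MySQL table are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation of incremental polling; names are hypothetical.
final class IncrementalSyncSketch {

    // Stand-in for one table row; lastUpdateTime is in epoch milliseconds.
    static final class Row {
        final String id;
        final long lastUpdateTime;
        Row(String id, long lastUpdateTime) {
            this.id = id;
            this.lastUpdateTime = lastUpdateTime;
        }
    }

    // Plays the role of the value persisted in last_run_metadata_path.
    private long lastValue = 0L;

    // One scheduled run: pick rows with lastValue < last_update_time < now,
    // then remember the largest timestamp that was returned.
    List<Row> runOnce(List<Row> table, long now) {
        List<Row> changed = new ArrayList<>();
        long newest = lastValue;
        for (Row row : table) {
            if (row.lastUpdateTime > lastValue && row.lastUpdateTime < now) {
                changed.add(row);
                newest = Math.max(newest, row.lastUpdateTime);
            }
        }
        lastValue = newest;
        return changed;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a", 100L));
        table.add(new Row("b", 200L));
        IncrementalSyncSketch sync = new IncrementalSyncSketch();
        System.out.println(sync.runOnce(table, 250L).size()); // first run picks up both rows
        table.add(new Row("c", 300L));
        System.out.println(sync.runOnce(table, 350L).size()); // second run picks up only the new row
    }
}
```

Rows that have not changed since the previous run are skipped, which is what keeps a once-a-minute schedule cheap.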

The jdbc input is mostly unproblematic; the tricky part is the filter section below.

filter {
        # Use aggregate to merge the joined rows into one document
        aggregate {
            # task_id identifies the task; it comes from the id column of the jdbc query above. The rows must be ordered by id before aggregation, otherwise data may be lost
            task_id => "%{id}"
            code => "
                map['id'] = event.get('id')
                map['name'] = event.get('name')
                map['sl_url'] = event.get('sl_url')
                map['company_name'] = event.get('company_name')
                map['description'] = event.get('description')
                map['price'] = event.get('price')
                map['stock_info'] = event.get('stock_info')
                map['address'] = event.get('address')
                map['number'] = event.get('number')
                map['shelf_time'] = event.get('shelf_time')
                map['shelf_stock'] = event.get('shelf_stock')
                map['company_id'] = event.get('company_id')
                map['plant_area'] = event.get('plant_area')
                map['standard'] = event.get('standard')
                map['shelf_state'] = event.get('shelf_state')
                map['type'] = event.get('type')
                # This is the key part: areaListTemp is a temporary collection used for de-duplication; each new value is then appended to areaList
                map['areaListTemp'] ||= []
                map['areaList'] ||= []
                if(event.get('area_id') != nil)
                    if !(map['areaListTemp'].include? event.get('area_id'))
                        map['areaListTemp'] << event.get('area_id')
                        map['areaList'] << {
                            'area_id' => event.get('area_id')
                        }
                    end
                end
                map['labelList'] ||= []
                map['goodsLabelList'] ||= []
                if(event.get('dictionary_value') != nil)
                    if !(map['labelList'].include? event.get('dictionary_value'))
                        map['labelList'] << event.get('dictionary_value')
                        map['goodsLabelList'] << {
                            'dictionary_value' => event.get('dictionary_value')
                        }
                    end
                end
                map['attributeList'] ||= []
                map['attributeValueList'] ||= []
                if(event.get('attribute_id') != nil)
                    if !(map['attributeList'].include? event.get('attribute_id'))
                        map['attributeList'] << event.get('attribute_id')
                        map['attributeValueList'] << {
                            'attribute_id' => event.get('attribute_id'),
                            'attribute_value' => event.get('attribute_value')
                        }
                    end
                end
                map['cateList'] ||= []
                map['categoryList'] ||= []
                if(event.get('category_id') != nil)
                    if !(map['cateList'].include? event.get('category_id'))
                        map['cateList'] << event.get('category_id')
                        map['categoryList'] << {
                            'category_id' => event.get('category_id')
                        }
                    end
                end
            event.cancel()"
            # When a row with a new task_id arrives, push the previous task's map as an event
            push_previous_map_as_event => true
            # Timeout: without it Logstash cannot tell when the input has ended, and the last record is lost.
            # There should really be a proper end condition; setting 5 seconds is only a rough workaround
            timeout => 5
        }
        # Remove the temporary collections and some of the default generated fields
        mutate {
            remove_field => ["@version","labelList","attributeList","cateList","areaListTemp"]
        }
}

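Pitfalls

For the aggregation I went through a lot of material and tried many variants, but something was always wrong: either data was lost or it got mixed up. A few points need attention here.

task_id is the id returned by the SQL query, so each id is one task. A join query normally produces several rows per id because of the one-to-many relationships. Temporary collections such as areaListTemp and labelList hold the values already seen across the rows of one id so that they can be de-duplicated. If the rows of an id are not contiguous, the temporary collections may be discarded before all of that id's rows have been collected, and data is lost.
Logstash runs the aggregation on multiple worker threads. If the rows of one aggregation task are split across threads, data may be lost when they are aggregated, so set the number of worker threads to 1 in pipelines.yml. (This may cause performance problems.)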
```
pipeline.workers: 1
```
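Why the ordering matters can be seen in a small simulation. The sketch below is plain Java, not Logstash code: it mimics the behaviour of `push_previous_map_as_event`, where the map collected for one `task_id` is emitted as soon as a row with a different id arrives, and the last map is only emitted by a final flush (the role the timeout plays in the filter configuration). `AggregateSketch`, the two-element row layout and the sample ids and labels are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified simulation of the aggregate filter; names are hypothetical.
final class AggregateSketch {

    // Each row is {id, label}, like one line of a one-to-many join result.
    // Returns one string per emitted event in the form id=[labels].
    static List<String> aggregate(List<String[]> rows) {
        List<String> events = new ArrayList<>();
        String currentId = null;
        Set<String> labels = new LinkedHashSet<>();
        for (String[] row : rows) {
            if (currentId != null && !currentId.equals(row[0])) {
                // A new id arrived: push the previous map as an event and start over.
                events.add(currentId + "=" + labels);
                labels = new LinkedHashSet<>();
            }
            currentId = row[0];
            labels.add(row[1]); // the set drops duplicate labels
        }
        if (currentId != null) {
            // Final flush of the last map; nothing else would emit it.
            events.add(currentId + "=" + labels);
        }
        return events;
    }

    public static void main(String[] args) {
        List<String[]> sorted = Arrays.asList(
            new String[] {"1", "green"}, new String[] {"1", "organic"}, new String[] {"2", "green"});
        List<String[]> unsorted = Arrays.asList(
            new String[] {"1", "green"}, new String[] {"2", "green"}, new String[] {"1", "organic"});
        System.out.println(aggregate(sorted));
        System.out.println(aggregate(unsorted));
    }
}
```

With the unsorted input, id 1 is emitted twice with partial label lists. Because the Elasticsearch output uses the id as the document id, the later partial event would overwrite the earlier one, which is how labels go missing.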

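Feeding in several data sources at once also has its pitfalls. I need to maintain two indices and therefore two jdbc inputs. Everything I found online puts them into a single conf file and uses type to tell the data sources apart and handle them separately, roughly like this: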

input {
  jdbc {
    # Location of the MySQL connector jar
    jdbc_driver_library => "../mysql-connector-java-5.1.43-bin.jar"
    ....
    type => "goods"
    statement => "
                SELECT .... and t1.last_update_time > :sql_last_value and t1.last_update_time < NOW() AND t1.is_delete=0 AND t1.type=1 order by t1.id desc"
  }
  jdbc {
    ...
    type => "category"
    ...
  }
}
output {
  # Route each event according to the type configured on its input
  if [type] == "goods" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "goods"
      document_id => "%{id}"
    }
  }
  if [type] == "category" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "category"
      document_id => "%{id}"
    }
  }
}

However, with version 7.6 this layout only ever produced one index for me, without any error. In the end I wrote two conf files and configured multiple pipelines in pipelines.yml:

- pipeline.id: goods
  path.config: "../config/goods.conf"
  pipeline.workers: 1
  # pipeline.batch.size: 1000
  # pipeline.output.workers: 3
  # queue.type: persisted

- pipeline.id: category
  path.config: "../config/category.conf"

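Each pipeline parses its own conf file, which finally solved the problem.

For nested data structures, the index mapping must be created in Elasticsearch beforehand; otherwise the nested type is not recognized automatically. The mapping looks like this: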

{
    "mappings": {
        "properties": {
            "address": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "areaList": {
                "type": "nested",
                "properties": {
                    "area_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            },
            "area_id": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "attributeValueList": {
                "type": "nested",
                "properties": {
                    "attribute_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "attribute_value": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            },
            "categoryList": {
                "type": "nested",
                "properties": {
                    "category_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            },
            "company_id": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "company_name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "description": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "goodsLabelList": {
                "type": "nested",
                "properties": {
                    "dictionary_value": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            },
            "id": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "name": {
                "type": "text",
                "analyzer": "ik_max_word",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "number": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "plant_area": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "price": {
                "type": "float",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "shelf_state": {
                "type": "long"
            },
            "shelf_stock": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "shelf_time": {
                "type": "date"
            },
            "sl_url": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "standard": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "stock_info": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "tags": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "type": {
                "type": "long"
            }
        }
    }
}

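The main thing to note here is that the type of areaList (and of the other list fields) is set to nested. If you need an analyzer, it can be configured here as well; for example, the name field uses ik, a Chinese analyzer that works quite well.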
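Fields inside a nested mapping are searched with a `nested` query that names the `path` of the nested field. The sketch below builds such a request body by hand with the JDK only, to show its shape. With the high-level client the equivalent is usually built with `QueryBuilders.nestedQuery`. `NestedQuerySketch` is a hypothetical helper and the label value is a placeholder.

```java
// Hypothetical helper that only illustrates the shape of a nested query body.
final class NestedQuerySketch {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a match query on <path>.<field>, wrapped in a nested query on <path>.
    static String nestedMatch(String path, String field, String text) {
        String fullField = path + "." + field;
        return "{\"query\":{\"nested\":{"
            + "\"path\":" + quote(path) + ","
            + "\"query\":{\"match\":{" + quote(fullField) + ":" + quote(text) + "}}"
            + "}}}";
    }

    public static void main(String[] args) {
        // Find goods that carry a given label; "organic" is a placeholder value.
        System.out.println(nestedMatch("goodsLabelList", "dictionary_value", "organic"));
    }
}
```

The inner field name is written with its full path (`goodsLabelList.dictionary_value`), since the inner query is evaluated against each nested object separately.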
