DataEngine Data Processing Flow
The DataEngine data processing flow consists of the following steps:
- change
- validate
- push&relation
- stat
- info
- order
change
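change reads data from HDFS and from HBase tables, compares it field by field to extract change information, and saves the result to HBase. It currently covers three categories of business-registration (gs) data and four categories of non-business-registration (non-gs) data.
Prerequisites: none
1. Execution flow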
Title: change sequence diagram - gs
hdfs->HDFSDataProvider: 1. Read the compare\nfile from HDFS
HDFSDataProvider->DataModel: 2. Compare each line and\nsave the changes to\nthe dataModel object
DataModel->DataModel: 3. Filter the dataModel on\nnull values and date\nanomalies in the specified fields
HDFSDataProvider->rb_gs_change: 4. Save to the HBase tables\nrb_gs_change and\nrb_non_gs_change
Title: change sequence diagram - nongs
hdfs->HDFSDataProvider: 1. Read the compare\nfile from HDFS
HDFSDataProvider->DataModel: 2. Compare each line and\nsave the changes to the dataModel object
DataModel->DataModel: 3. Filter the dataModel on\nnull values and date anomalies\nin the specified fields
DataModel-->HDFSDataProvider:
HDFSDataProvider->rb_gs_change: 4. Save to the HBase\nrb_gs_change table
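Steps 2 and 3 of the diagrams above (field-by-field comparison, then filtering on null values and date anomalies) can be sketched as follows. This is only an illustration in Python, not the DataEngine implementation: the function names, the field names `ENTNAME`, `ESDATE` and `REGCAP`, the `%Y-%m-%d` date format, and the rule that a change is dropped when its new value is empty or unparseable are all assumptions.

```python
from datetime import datetime


def diff_fields(old, new):
    """Return {field: (old_value, new_value)} for every field whose value differs."""
    changes = {}
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


def is_valid_date(value, fmt="%Y-%m-%d"):
    """True if value parses as a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False


def filter_changes(changes, required_fields=(), date_fields=()):
    """Drop changes with an empty new value (required fields)
    or an unparseable new date (date fields)."""
    kept = {}
    for field, (before, after) in changes.items():
        if field in required_fields and after in (None, ""):
            continue
        if field in date_fields and not is_valid_date(after):
            continue
        kept[field] = (before, after)
    return kept


# Hypothetical records: the name became empty and the date became invalid,
# so only the REGCAP change survives the filter.
old_row = {"ENTNAME": "A Co.", "ESDATE": "2010-01-01", "REGCAP": "100"}
new_row = {"ENTNAME": "", "ESDATE": "2010-13-01", "REGCAP": "200"}
surviving = filter_changes(
    diff_fields(old_row, new_row),
    required_fields={"ENTNAME"},
    date_fields={"ESDATE"},
)
```

Keeping the comparison and the filter as separate functions mirrors the two separate arrows in the diagrams; which fields are checked would come from configuration that this page does not describe.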
2. Execution script
/opt/data-engine/current/bin/data_loader.sh E_ENT_BASEINFO 20160314000000 hdfs://chinadaas11:8020/hive1/user/hive/warehouse/enterprisebaseinfocollect_20160314_compare/ &
/opt/data-engine/current/bin/data_loader.sh E_INV_INVESTMENT 20160314000000 hdfs://chinadaas11:8020/hive1/user/hive/warehouse/e_inv_investment_20160314_compare/ &
/opt/data-engine/current/bin/data_loader.sh E_PRI_PERSON 20160314000000 hdfs://chinadaas11:8020/hive1/user/hive/warehouse/e_pri_person_20160314_compare/ &
/opt/data-engine/current/bin/data_loader.sh DIS_SXBZXR 20160314000000 dis_sxbzxr_new_name &
/opt/data-engine/current/bin/data_loader.sh FROST 20160314000000 frost_pripid &
/opt/data-engine/current/bin/data_loader.sh IMPAWN 20160314000000 impawn_pripid &
/opt/data-engine/current/bin/data_loader.sh XZCF 20160314000000 xzcf_pripid &
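The seven commands above differ only in the data type, the timestamp, and the source location, so they could be generated for any day. The helper below is a hypothetical sketch: it assumes the timestamp is always the day followed by `000000` and that the compare directories always follow the `<name>_<YYYYMMDD>_compare/` pattern seen in the examples, neither of which is confirmed beyond this one date.

```python
LOADER = "/opt/data-engine/current/bin/data_loader.sh"
WAREHOUSE = "hdfs://chinadaas11:8020/hive1/user/hive/warehouse"

# Data type -> compare directory prefix (gs data, read from HDFS).
GS_COMPARE_DIRS = {
    "E_ENT_BASEINFO": "enterprisebaseinfocollect",
    "E_INV_INVESTMENT": "e_inv_investment",
    "E_PRI_PERSON": "e_pri_person",
}

# Data type -> source table name (non-gs data).
NON_GS_TABLES = {
    "DIS_SXBZXR": "dis_sxbzxr_new_name",
    "FROST": "frost_pripid",
    "IMPAWN": "impawn_pripid",
    "XZCF": "xzcf_pripid",
}


def change_commands(day):
    """Build the argument lists of the change commands for a day given as YYYYMMDD."""
    stamp = day + "000000"  # assumed: timestamp is always midnight of that day
    commands = [
        [LOADER, data_type, stamp, f"{WAREHOUSE}/{prefix}_{day}_compare/"]
        for data_type, prefix in GS_COMPARE_DIRS.items()
    ]
    commands += [
        [LOADER, data_type, stamp, table]
        for data_type, table in NON_GS_TABLES.items()
    ]
    return commands


# Print the commands for the day used in the examples above.
for argv in change_commands("20160314"):
    print(" ".join(argv))
```

Each argument list could be passed to `subprocess.Popen` to start the loaders in the background, which corresponds to the trailing `&` in the commands above.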
validate
validate filters the change data of the enterprises that are covered by orders.
Prerequisites: change, order
1. Execution flow
Title:validate sequence diagram
rb_..._change->ValidateDataProvider: 1. Read change data from HBase
ValidateDataProvider->DataModel: 2. Create DataModel objects
DataModel->DataModel: 3. Filter out duplicate and\nback-and-forth data
DataModel-->ValidateDataProvider:
ValidateDataProvider->rb_validated_change: 4. Save the data to HBase
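Step 3 of the diagram above is not specified further on this page. One possible reading, sketched below in Python, is that exact duplicate records are dropped and that changes to a field are dropped when the field ends up back at its starting value. The record layout `(enterprise id, field, old value, new value)` and both rules are assumptions made for illustration.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def drop_reverted(records):
    """Drop every change to an (enterprise, field) pair whose final value
    equals its initial value, i.e. the field changed and then changed back."""
    first_old = {}
    last_new = {}
    for ent_id, field, old, new in records:
        key = (ent_id, field)
        first_old.setdefault(key, old)  # value before the first change
        last_new[key] = new             # value after the latest change
    return [
        record
        for record in records
        if first_old[(record[0], record[1])] != last_new[(record[0], record[1])]
    ]


# Hypothetical change records in chronological order.
records = [
    ("e1", "ENTNAME", "A", "B"),
    ("e1", "ENTNAME", "A", "B"),    # exact duplicate
    ("e2", "REGCAP", "100", "200"),
    ("e2", "REGCAP", "200", "100"),  # reverts the previous change
]
validated = drop_reverted(drop_duplicates(records))
```

With these inputs only the single `e1` name change remains in `validated`.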
2. Execution script
/opt/data-engine/current/bin/data_loader.sh VALIDATE 20160314000000 rb_gs_change SCAN_BY_DATE &
/opt/data-engine/current/bin/data_loader.sh VALIDATE 20160314000000 rb_non_gs_change SCAN_BY_DATE &
push&relation
push&relation extracts enterprise change information per user. The results are saved to HBase and Elasticsearch (es).
Prerequisites: validate
1. Processing flow
order_index -> changeEntInfo.txt : 1. Generate the user-monitored\nenterprises from the order index
rb_validated_change -> ChangePushDataProvider : 2. Read from rb_validated_change\nby date
changeEntInfo.txt -> ChangePushDataProvider : 3. Read the user-monitored enterprises
ChangePushDataProvider -> ChangePushDataProvider : 4. Look up the users monitoring each enterprise
ChangePushDataProvider -> column.txt : 5. Read the fields each user monitors
ChangePushDataProvider -> rb_push: 6. Write the change information\nfor monitored fields to HBase
ChangePushDataProvider -> push&relation_index: 7. Write the change information\nfor monitored fields to es
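Steps 3 to 5 of the flow above amount to a fan-out: each validated change is matched against the users who monitor that enterprise, and then against the fields each user monitors. The sketch below illustrates this in Python with in-memory dictionaries standing in for `changeEntInfo.txt` and `column.txt`, whose actual formats are not described here. The record layout and all names are hypothetical, and the writes to HBase and es are omitted.

```python
def build_push_records(changes, monitored_by, monitored_fields):
    """Match changes to the users who monitor the enterprise and the field.

    changes:          list of (ent_id, field, old, new) tuples
    monitored_by:     ent_id -> list of user ids monitoring that enterprise
    monitored_fields: user id -> set of field names that user monitors
    """
    pushes = []
    for ent_id, field, old, new in changes:
        for user in monitored_by.get(ent_id, []):
            if field in monitored_fields.get(user, set()):
                pushes.append(
                    {"user": user, "ent_id": ent_id, "field": field,
                     "old": old, "new": new}
                )
    return pushes


# Hypothetical inputs: e3 is monitored by nobody, u1 ignores REGCAP.
changes = [
    ("e1", "ENTNAME", "A", "B"),
    ("e1", "REGCAP", "1", "2"),
    ("e3", "ENTNAME", "X", "Y"),
]
monitored_by = {"e1": ["u1", "u2"]}
monitored_fields = {"u1": {"ENTNAME"}, "u2": {"ENTNAME", "REGCAP"}}
pushes = build_push_records(changes, monitored_by, monitored_fields)
```

Here `pushes` holds three records: the name change for `u1` and `u2`, and the capital change for `u2` only.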
2. Execution script
/opt/data-engine/current/bin/data_loader.sh CHANGE_PUSH 20160314000000 rb_validated_change SCAN_BY_DATE &
stat
stat merges the relation information with the basic enterprise information.
Prerequisites: relation, info
1. Execution script
/opt/data-engine/current/bin/data_loader.sh STATISTICS 20160314000000 p_change &
info
info is the index of basic business-registration information.
Prerequisites: none
1. Execution script
/opt/data-engine/current/bin/data_loader.sh INDEX_ENT_INFO 20160314000000 ENTERPRISEBASEINFOCOLLECT_20160314 &
order
order is the order index, synchronized from MySQL.
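The prerequisites listed for each step form a small dependency graph, and a valid run order can be derived from it by topological sorting. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). It assumes that the "relation" prerequisite of stat refers to the push&relation step, which this page does not state explicitly.

```python
from graphlib import TopologicalSorter

# Step -> set of steps that must finish first, as listed in each section.
PREREQUISITES = {
    "change": set(),
    "validate": {"change", "order"},
    "push&relation": {"validate"},
    "stat": {"push&relation", "info"},  # assumed: "relation" means push&relation
    "info": set(),
    "order": set(),
}


def run_order(prerequisites=PREREQUISITES):
    """Return the steps in an order that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


# Print one valid execution order for the six steps.
print(run_order())
```

Several orders are valid, since change, info and order have no prerequisites and can run in any order or in parallel. `TopologicalSorter` raises `CycleError` if the prerequisites ever become circular.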