
ODS.jpg
更多信息,請關(guān)注個人博客不圓的珠子
儲),是企業(yè)數(shù)據(jù)的抽取與交換平臺。通常ODS的數(shù)據(jù)不做任何轉(zhuǎn)化,只保留源系統(tǒng)數(shù)據(jù)。并將數(shù)據(jù)分發(fā)給數(shù)據(jù)倉庫,數(shù)據(jù)集市等下游系統(tǒng),并在ODS數(shù)據(jù)基礎(chǔ)上開發(fā)各類報表。目前接觸到的項目中,將數(shù)據(jù)倉庫與ODS集成到一個項目中,共用同一個數(shù)據(jù)庫。減少了數(shù)據(jù)傳輸,提高了資源的利用率。在ODS層數(shù)據(jù)中,我們將源系統(tǒng)數(shù)據(jù)變化的過程保留下來,由于源系統(tǒng)和ODS不在同一服務(wù)器上,源系統(tǒng)通常通過文件交換的方式傳輸數(shù)據(jù),為了減少文件傳輸,數(shù)據(jù)文件可分為增量文件和全量文件。根據(jù)保留數(shù)據(jù)類型的不同,又分為流水?dāng)?shù)據(jù)與歷史數(shù)據(jù)。通常我們在開發(fā)ODS時,涉及到以下幾種算法,全量歷史加載算法,增量歷史加載算法,流水全量加載算法,流水增量加載算法,全量覆蓋加載算法。
2.相關(guān)變量說明
| 變量名 | 變量描述 |
|---|---|
| ${DB} | 數(shù)據(jù)庫實例名 |
| ${TBNAME} | 表名 |
| $COLS | 表字段,以逗號分隔,col1,col2.. |
| $DATA_DT | 數(shù)據(jù)加載日期 |
| ${SRC_DB} | 源數(shù)據(jù)庫實例名 |
| ${SRC_TBNAME} | 源表名 |
| ND | 臨時表,用于加載當(dāng)日數(shù)據(jù) |
| OD | 臨時表,用于加載上日數(shù)據(jù) |
| ${HT_S_DT} | 拉鏈開始日期 |
| ${HT_E_DT} | 拉鏈結(jié)束日期 |
| ${TX_DATE} | 交易日期,作業(yè)運(yùn)行日期 |
| WT_U | 臨時表,用于保存需要關(guān)鏈數(shù)據(jù) |
| WT_I | 臨時表,用于保存需要插入的數(shù)據(jù) |
3.全量覆蓋加載算法
全量覆蓋加載是所有加載中最簡單的一種數(shù)據(jù)加載方式。它是指直接將目標(biāo)表中數(shù)據(jù)刪除并將源系統(tǒng)提供的數(shù)據(jù)文件直接加載到目標(biāo)表中的過程。偽代碼如下:
//清除表數(shù)據(jù)
TRUNCATE TABLE ${DB}.${TBNAME} ;
//將源表數(shù)據(jù)插入到目標(biāo)表中
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT)
SELECT
$COLS
,TO_DATE('$DATA_DT', 'YYYYMMDD')
FROM ${SRC_DB}.${SRC_TBNAME};
4.全量歷史加載算法
- 將當(dāng)日數(shù)據(jù)加載到ND表中
INSERT INTO ${ND}( $COLS )
SELECT $COLS
FROM ${SRC_DB}.${SRC_TBNAME}
;
- 將上日正常數(shù)據(jù)加載到臨時表OD中
INSERT INTO OD
SELECT * FROM ${DB}.${TBNAME}
WHERE ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 產(chǎn)生插入結(jié)果數(shù)據(jù)(新增數(shù)據(jù),當(dāng)日數(shù)據(jù)不在昨日數(shù)據(jù)中),并將數(shù)據(jù)加載到臨時表WT_I中
INSERT INTO WT_I(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,$TX_DATE
,$MAX_DT
,$TX_DATE
FROM ${ND}
WHERE ($COLS_NOT_NULL)
NOT IN (SELECT $COLS_NOT_NULL
FROM OD)
;
- 產(chǎn)生關(guān)鏈數(shù)據(jù)(昨日數(shù)據(jù)不在當(dāng)日數(shù)據(jù)中,已經(jīng)被源系統(tǒng)刪除的數(shù)據(jù)),并將數(shù)據(jù)加在到臨時表WT_U中
INSERT INTO WT_U(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,${HT_S_DT}
,${TX_DATE}
,DATA_DT
FROM OD
WHERE ( $COLS_NOT_NULL )
NOT IN (SELECT $COLS_NOT_NULL
FROM ${ND})
;
- 將目標(biāo)表中已經(jīng)失效的數(shù)據(jù)關(guān)鏈
UPDATE ${DB}.${TBNAME}
SET ${HT_E_DT} = ${TX_DATE}
WHERE ($COLS_NOT_NULL) IN (SELECT $COLS_NOT_NULL FROM WT_U)
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 將新增數(shù)據(jù)插入插入到目標(biāo)表中
INSERT INTO ${DB}.${TBNAME}(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT)
SELECT
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT
FROM WT_I;
5.增量歷史加載算法
1.將當(dāng)日數(shù)據(jù)加載到臨時表ND表中
INSERT INTO ${ND}( $COLS )
SELECT $COLS
FROM ${DB}.${SRC_TBNAME}
;
- 將上日正常數(shù)據(jù)加載到臨時表OD中(根據(jù)表主鍵查找當(dāng)日數(shù)據(jù))
INSERT INTO OD
SELECT * FROM ${DB}.${TBNAME}
WHERE ($PRI_KEY_NAME) IN (SELECT $PRI_KEY_NAME FROM ${ND})
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
- 產(chǎn)生插入結(jié)果數(shù)據(jù)(當(dāng)日數(shù)據(jù)不在昨日數(shù)據(jù)中),并將數(shù)據(jù)加載到臨時表WT_I中
INSERT INTO WT_I(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,$TX_DATE
,$MAX_DT
,$TX_DATE
FROM ${ND}
WHERE ($COLS_NOT_NULL)
NOT IN (SELECT $COLS_NOT_NULL
FROM OD)
;
- 產(chǎn)生關(guān)鏈數(shù)據(jù)(昨日數(shù)據(jù)不在當(dāng)日數(shù)據(jù)中,源系統(tǒng)已經(jīng)失效數(shù)據(jù)),并將數(shù)據(jù)加在到臨時表WT_U中
INSERT INTO WT_U(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,${HT_S_DT}
,${TX_DATE}
,DATA_DT
FROM OD
WHERE ( $COLS_NOT_NULL )
NOT IN (SELECT $COLS_NOT_NULL
FROM ${ND})
;
- 將目標(biāo)表中失效的數(shù)據(jù)關(guān)鏈
UPDATE ${DB}.${TBNAME}
SET ${HT_E_DT} =${TX_DATE}
WHERE ($COLS_NOT_NULL) IN (SELECT $COLS_NOT_NULL FROM WT_U)
AND ($PRI_KEY_NAME) IN (SELECT $PRI_KEY_NAME FROM ${ND})
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 插入當(dāng)日新增數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT)
SELECT
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT
FROM WT_I;
6.流水增量加載算法
- 清除當(dāng)日數(shù)據(jù)
DELETE FROM ${DB}.${TBNAME}
WHERE DATA_DT = '$DATA_DT'
;
- 插入數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT)
SELECT
$COLS
,'$DATA_DT'
FROM ${DB}.${SRC_TBNAME}
MINUS ALL
SELECT
$COLS
,'$DATA_DT'
FROM ${DB}.${TBNAME}
WHERE DATA_DT <= ${TX_DATE}
;
7.流水全量加載算法
- 清除當(dāng)日數(shù)據(jù)
DELETE FROM ${DB}.${TBNAME}
WHERE DATA_DT = $DATA_DT
- 插入數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT
,HT_S_DT
,HT_E_DT
)
SELECT
$COLS
,'$DATA_DT'
,'$DATA_DT'
,'$MAX_DT'
FROM ${DB}.${SRC_TBNAME};