跑通MetricFlow官方示例項(xiàng)目:jaffle_shop

背景

MetricFlow 目前國內(nèi)的資料比較少,在學(xué)習(xí)官方文檔過程中給了經(jīng)典的煎餅店的 GitHub 示例代碼。在Windows環(huán)境按其 Readme 說明跑通比較困難。因此主要記錄自己將項(xiàng)目運(yùn)行起來的過程。

環(huán)境信息

數(shù)據(jù)庫:PostgreSQL 15
Python版本:3.12.4
官方文檔:https://docs.getdbt.com/docs/build/about-metricflow
官方示例項(xiàng)目:https://github.com/dbt-labs/jaffle-sl-template

運(yùn)行流程

特別說明:參考官方文檔的Readme

項(xiàng)目下載

方式一:通過Git下載
在文件系統(tǒng)shift + 右鍵 打開菜單,打開 PowerShell 或使用Win + X打開CMD
克隆項(xiàng)目:git clone https://github.com/dbt-labs/jaffle-sl-template.git
方式二:下載Zip包并解壓(推薦)

GitHub

安裝 Metricflow

采用虛擬環(huán)境安裝,打開CMD命令行并切換到的項(xiàng)目目錄下:jaffle-sl-template-main
1)創(chuàng)建虛擬環(huán)境:python -m venv .env
2)切換到虛擬環(huán)境下:.env\Scripts\active.sh

cmd

前面顯示(.env) 即成功。
3)安裝dbt和postgres適配器:pip install dbt-core dbt-postgres
4)安裝metricflow根據(jù)官方readme的說明:pip install "dbt-metricflow[<YOUR_DBT_ADAPTER_NAME>]" 我是用PostgreSQL適配器,因此執(zhí)行執(zhí)行:pip install "dbt-metricflow[postgres]"
5)驗(yàn)證dbt版本:dbt --version

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt --version
Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!

Plugins:
  - postgres: 1.8.2 - Up to date!

6)驗(yàn)證metricflow版本:mf --version

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf --version
mf, version 0.7.0
連接測試

根據(jù)dbt_project文件知道profile是:snowflake
因此在用戶目錄下,找到:.dbt\profiles.yml文件,添加項(xiàng)目的連接信息。

snowflake:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 192.168.19.128 # PG數(shù)據(jù)庫地址
      user: postgres # PG數(shù)據(jù)庫用戶名
      password: postgres # PG數(shù)據(jù)庫密碼
      port: 5432 # PG數(shù)據(jù)庫端口號(hào)
      dbname: jaffle_shop # PG數(shù)據(jù)庫名稱
      schema: public # PG數(shù)據(jù)庫模式
      threads: 4
      connect_timeout: 10

需提前在PG數(shù)據(jù)庫創(chuàng)建jaffle_shop庫:create database jaffle_shop
運(yùn)行dbt debug 測試連通性:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt debug
13:14:59  Running with dbt=1.8.3
13:14:59  dbt version: 1.8.3
13:14:59  python version: 3.12.4
13:14:59  python path: C:\Users\admin\Downloads\jaffle-sl-template-main\.env\Scripts\python.exe
13:14:59  os info: Windows-10-10.0.19045-SP0
13:14:59  Using profiles dir at C:\Users\admin\.dbt
13:14:59  Using profiles.yml file at C:\Users\admin\.dbt\profiles.yml
13:14:59  Using dbt_project.yml file at C:\Users\admin\Downloads\jaffle-sl-template-main\dbt_project.yml
13:14:59  adapter type: postgres
13:14:59  adapter version: 1.8.2
13:14:59  Configuration:
13:14:59    profiles.yml file [OK found and valid]
13:14:59    dbt_project.yml file [OK found and valid]
13:14:59  Required dependencies:
13:14:59   - git [OK found]

13:14:59  Connection:
13:14:59    host: 192.168.19.128
13:14:59    port: 5432
13:14:59    user: postgres
13:14:59    database: jaffle_shop
13:14:59    schema: public
13:14:59    connect_timeout: 10
13:14:59    role: None
13:14:59    search_path: None
13:14:59    keepalives_idle: 0
13:14:59    sslmode: None
13:14:59    sslcert: None
13:14:59    sslkey: None
13:14:59    sslrootcert: None
13:14:59    application_name: dbt
13:14:59    retries: 1
13:14:59  Registered adapter: postgres=1.8.2
13:14:59    Connection test: [OK connection ok]

13:14:59  All checks passed!
安裝dbt依賴

1)下載項(xiàng)目依賴的2個(gè)dbt包:dbt_utils 和 dbt_date:dbt deps
發(fā)現(xiàn)運(yùn)行到這里就卡住了,日志說讓更新到最新版本:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:17:03  Running with dbt=1.8.3
13:17:04  Installing dbt-labs/dbt_utils
13:17:05  Installed from version 1.0.0
13:17:05  Updated version available: 1.2.0
13:17:05  Installing calogica/dbt_date
13:17:06  Installed from version 0.8.1
13:17:06  Updated version available: 0.10.1
13:17:06
13:17:06  Updates available for packages: ['dbt-labs/dbt_utils', 'calogica/dbt_date']
Update your versions in packages.yml, then run dbt deps

2(因此根據(jù)提示的版本修改了package.yml 文件:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.2.0
  - package: calogica/dbt_date
    version: 0.10.1

再次運(yùn)行:dbt deps

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:20:35  Running with dbt=1.8.3
13:20:36  Updating lock file in file path: C:\Users\admin\Downloads\jaffle-sl-template-main/package-lock.yml
13:20:36  Installing dbt-labs/dbt_utils
13:20:37  Installed from version 1.2.0
13:20:37  Up to date!
13:20:37  Installing calogica/dbt_date
13:20:38  Installed from version 0.10.1
13:20:38  Up to date!
加載數(shù)據(jù)

執(zhí)行:dbt seed

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt seed
13:25:15  Running with dbt=1.8.3
13:25:15  Registered adapter: postgres=1.8.2
13:25:15  Unable to do partial parsing because saved manifest not found. Starting full parse.
13:25:16  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
13:25:17  Found 10 models, 6 seeds, 18 data tests, 15 sources, 17 metrics, 664 macros, 1 group, 5 semantic models, 3 saved queries
13:25:17
13:25:17  Concurrency: 4 threads (target='dev')
13:25:17
13:25:17  1 of 6 START seed file public.raw_customers .................................... [RUN]
13:25:17  2 of 6 START seed file public.raw_items ........................................ [RUN]
13:25:17  3 of 6 START seed file public.raw_orders ....................................... [RUN]
13:25:17  4 of 6 START seed file public.raw_products ..................................... [RUN]
13:25:22  1 of 6 OK loaded seed file public.raw_customers ................................ [INSERT 939 in 4.09s]
13:25:22  4 of 6 OK loaded seed file public.raw_products ................................. [INSERT 10 in 4.55s]
13:25:22  5 of 6 START seed file public.raw_stores ....................................... [RUN]
13:25:22  6 of 6 START seed file public.raw_supplies ..................................... [RUN]
13:25:28  5 of 6 OK loaded seed file public.raw_stores ................................... [INSERT 5 in 5.26s]
13:25:28  6 of 6 OK loaded seed file public.raw_supplies ................................. [INSERT 65 in 5.03s]
13:25:55  2 of 6 OK loaded seed file public.raw_items .................................... [INSERT 95368 in 37.87s]
13:26:03  3 of 6 OK loaded seed file public.raw_orders ................................... [INSERT 59652 in 45.51s]
13:26:03
13:26:03  Finished running 6 seeds in 0 hours 0 minutes and 45.77 seconds (45.77s).
13:26:03
13:26:03  Completed successfully
13:26:03
13:26:03  Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6

將jaffle-data目錄下的csv文件加載到PG數(shù)據(jù)庫中:


PostgreSQL
運(yùn)行你的 dbt 項(xiàng)目并查詢指標(biāo)

1)運(yùn)行:dbt build --exclude path:jaffle-data
執(zhí)行爆了很多錯(cuò)誤,但是同時(shí)類似的。節(jié)選其中一個(gè):

13:29:37    Database Error in model stg_supplies (models\staging\stg_supplies.sql)
  relation "dbt_sl_test.raw_supplies" does not exist
  LINE 11:     select * from "jaffle_shop"."dbt_sl_test"."raw_supplies"

從報(bào)錯(cuò)的select語句看出來,我們沒有dbt_sl_test這個(gè) schema。前面測試連接的時(shí)候我們PG數(shù)據(jù)庫的 schema 的是:public
根據(jù)報(bào)錯(cuò)定位到模型里面用了 source 。發(fā)現(xiàn) source 的配置寫死了 schema,因此需要調(diào)整 source 的 schema。
文件路徑:models\staging\_source.yml

sources:
  - name: ecom
    schema: public # dbt_sl_test
    description: E-commerce data

再次執(zhí)行:dbt build --exclude path:jaffle-data

13:40:51  Finished running 6 view models, 4 table models, 18 data tests, 3 saved queries in 0 hours 0 minutes and 2.44 seconds (2.44s).
13:40:51
13:40:51  Completed successfully
13:40:51
13:40:51  Done. PASS=31 WARN=0 ERROR=0 SKIP=0 TOTAL=31

2)運(yùn)行:mf validate-configs
不出意外,報(bào)錯(cuò):

? ERROR: with metric `revenue`  - Unable to query metric `revenue`.
Received following error from data warehouse:
Database Error
  type "datetime" does not exist
  LINE 15:       DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS met...

看樣子是PG數(shù)據(jù)庫不支持使用cast將 timestamp 強(qiáng)轉(zhuǎn)轉(zhuǎn)為 datetime 類型。測試了一下,可以不需要強(qiáng)轉(zhuǎn)。
查找代碼,發(fā)現(xiàn)是 models\marts\customer360\order_items.yml 的語義模型的維度字段 ordered_at 定義的。修改expr

semantic_models:
  - name: order_item
      ... ...
    dimensions:
      - name: ordered_at
        expr: ordered_at  # cast(ordered_at as DATETIME)
        type: time

保存后重新構(gòu)建:dbt build --exclude path:jaffle-data
再次執(zhí)行:mf validate-configs

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf validate-configs
(To see warnings and future-errors, run again with flag `--show-all`)
v ?? Successfully parsed manifest from dbt project
v ?? Successfully validated the semantics of built manifest (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v ?? Successfully validated semantic models against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v ?? Successfully validated dimensions against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v ?? Successfully validated entities against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v ?? Successfully validated measures against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v ?? Successfully validated metrics against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)

成功!根據(jù)官方文檔查詢指標(biāo)large_orders:mf query --metrics large_orders
不出意外,又報(bào)錯(cuò)了:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_orders
| Initiating query…
ERROR: Got errors while resolving the query.

Error #1:
  Message:

    The given input does not exactly match any known metrics.

    Suggestions:
      ['large_order', 'orders', 'food_orders', 'order_cost', 'customers_with_orders', 'order_total']

  Query Input:

    large_orders

  Issue Location:

    [Resolve Query(['large_orders'])]
Log file: C:\Users\admin\Downloads\jaffle-sl-template-main\logs\metricflow.log

看了一下,命令的字面意思是查詢指標(biāo):large_orders,但是從報(bào)錯(cuò)的來看,壓根就沒有large_orders,只有large_order。
文檔錯(cuò)誤???改命令執(zhí)行:mf query --metrics large_order

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_order
v Success ?? - query completed after 0.30 seconds
  large_order
-------------
        10504

成功!?。?br> 初學(xué)dbt,如有不對(duì)請(qǐng)指正

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容