Google BigQuery ML vs. StreamingPro MLSQL

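Preface

Today I came across an AI前線 article, "Google BigQuery ML Officially Launches: You Can Do Machine Learning Knowing Only SQL!". I happen to be promoting StreamingPro's MLSQL myself,
so let's compare the two products.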

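About StreamingPro

StreamingPro is a data platform built on Spark, and MLSQL is an algorithm platform built on StreamingPro. With MLSQL you can use SQL-like statements to carry out a complete ML pipeline: data ETL, algorithm training, model deployment, and so on. MLSQL merges the data platform and the algorithm platform, so you can get all of this done in one place.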

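Run Modes

MLSQL supports both Run as Application and Run as Service. Running MLSQL as a service is simple, and you can try it on your own machine: see the Five Minute Quick Tutorial.
BigQuery ML, by contrast, is a cloud product; from the outside it also appears to be Run as Service.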

Syntax and Usage

BigQuery ML trains a model like this:

CREATE OR REPLACE MODEL flights.arrdelay
OPTIONS
 (model_type='linear_reg', labels=['arr_delay']) AS
SELECT
 arr_delay,
 carrier,
 origin,
 dest,
 dep_delay,
 taxi_out,
 distance
FROM
 `cloud-training-demos.flights.tzcorr`
WHERE
 arr_delay IS NOT NULL

BigQuery ML also extends standard SQL with new keywords, but overall it stays close to the original shape of SQL.

The same task in MLSQL looks like this:

select arr_delay, carrier, origin, dest, dep_delay,
taxi_out, distance from db.table 
as lrCorpus;

train lrCorpus as LogisticRegressor.`/tmp/linear_regression_model`
where inputCol="features"
and labelCol="label"
;

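Likewise, MLSQL extends and changes SQL, and for model training the changes are larger. Once training finishes you can load the result data to inspect how it went. The output looks something like this: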

+--------------------+--------+--------------------+-------------------+-------+-------------+-------------+--------------------+
|           modelPath|algIndex|                 alg|              score| status|    startTime|      endTime|         trainParams|
+--------------------+--------+--------------------+-------------------+-------+-------------+-------------+--------------------+
|/tmp/william/tmp/...|       1|org.apache.spark....|-1.9704115113779945|success|1532659750073|1532659757320|Map(ratingCol -> ...|
|/tmp/william/tmp/...|       0|org.apache.spark....|-1.8446490919033698|success|1532659757327|1532659760394|Map(ratingCol -> ...|
+--------------------+--------+--------------------+-------------------+-------+-------------+-------------+--------------------+
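
As a rough illustration of how such a result table might be read, the sketch below computes the training time of each row and picks one row by score. It rests on two assumptions that the output itself does not confirm: that startTime and endTime are epoch milliseconds, and that a higher score is better. The function name summarize and the field durationMs are hypothetical.

```python
# Two rows copied from the sample output above (only the numeric columns).
rows = [
    {"algIndex": 1, "score": -1.9704115113779945,
     "startTime": 1532659750073, "endTime": 1532659757320},
    {"algIndex": 0, "score": -1.8446490919033698,
     "startTime": 1532659757327, "endTime": 1532659760394},
]


def summarize(rows):
    """Return (rows with a durationMs field, row with the highest score)."""
    # Assumption: both timestamps are epoch milliseconds.
    timed = [dict(r, durationMs=r["endTime"] - r["startTime"]) for r in rows]
    # Assumption: a higher score means a better model.
    best = max(timed, key=lambda r: r["score"])
    return timed, best


if __name__ == "__main__":
    timed, best = summarize(rows)
    for r in timed:
        print(r["algIndex"], r["score"], r["durationMs"])
    print("best algIndex:", best["algIndex"])
```

Note that the second row starts a few milliseconds after the first one ends, which is what you would expect if the two parameter groups ran one after the other.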

For prediction, the BigQuery ML syntax is:

SELECT * FROM ML.PREDICT(MODEL flights.arrdelay,
(
SELECT
 carrier,
 origin,
 dest,
 dep_delay,
 taxi_out,
 distance,
 arr_delay AS actual_arr_delay
FROM
 `cloud-training-demos.flights.tzcorr`
WHERE
 arr_delay IS NOT NULL
LIMIT 10))

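With ML.PREDICT you only need to name the model to call the corresponding prediction function. In MLSQL it takes two steps:

First register the model, which gives you a function (pa_lr_predict here); you choose the name yourself.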

register LogisticRegressor.`/tmp/linear_regression_model` as pa_lr_predict options
modelVersion="1" ;

Then you can use it:

select pa_lr_predict(features) from lrCorpus limit 10 as predict_result;

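Integration with the Data Platform

BigQuery ML also supports complex data processing in SQL, so it is well suited to preparing data for models. MLSQL likewise supports very complex data processing.

Beyond Algorithms

"Data-Processing Models" and SQL Functions

It is worth mentioning that MLSQL provides a large number of "data-processing models" and SQL functions. For example, converting text data to TF-IDF takes a single statement: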

-- convert a text column to a TF/IDF vector; a custom dictionary can be supplied
train orginal_text_corpus as TfIdfInPlace.`/tmp/tfidfinplace`
where inputCol="content"
-- tokenization settings
and ignoreNature="true"
and dicPaths="...."
-- stop-word path
and stopWordPath="/tmp/tfidf/stopwords"
-- high-priority word path
and priorityDicPath="/tmp/tfidf/prioritywords"
-- weight multiplier for high-priority words
and priority="5.0"
-- n-gram settings
and nGram="2,3"
-- split setting: tokenize using the value of split as the delimiter
and split=""
;

-- the content column of lwys_corpus_with_featurize is now a vector
load parquet.`/tmp/tfidf/data` 
as lwys_corpus_with_featurize;
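
To make the options above more concrete, here is a minimal pure-Python sketch of TF-IDF weighting with stop words and a priority multiplier. It is not the TfIdfInPlace implementation: the function tfidf and its parameters are hypothetical, the input is assumed to be already tokenized, n-grams are left out, and the smoothed IDF formula log((N + 1) / (df + 1)) is the one used by Spark MLlib, which MLSQL may or may not apply here.

```python
import math
from collections import Counter


def tfidf(docs, stop_words=frozenset(), priority_words=frozenset(), priority=1.0):
    """Return one {term: weight} dict per tokenized document."""
    # Drop stop words first (the role stopWordPath plays in the statement above).
    cleaned = [[t for t in doc if t not in stop_words] for doc in docs]
    n_docs = len(cleaned)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in cleaned for term in set(doc))
    vectors = []
    for doc in cleaned:
        weights = {}
        for term, count in Counter(doc).items():
            # Assumed formula: smoothed IDF as in Spark MLlib.
            idf = math.log((n_docs + 1) / (df[term] + 1))
            # Boost terms from the priority dictionary (cf. priority="5.0").
            boost = priority if term in priority_words else 1.0
            weights[term] = count * idf * boost
        vectors.append(weights)
    return vectors


if __name__ == "__main__":
    docs = [["spark", "sql", "the"], ["spark", "ml", "the"]]
    for vec in tfidf(docs, stop_words={"the"}, priority_words={"ml"}, priority=5.0):
        print(vec)
```

A term that appears in every document ("spark" here) gets weight 0 under this formula, while the boosted term "ml" ends up with five times the weight of "sql".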

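Custom Algorithm Support

Besides the algorithms already implemented in MLSQL, you can write custom algorithms as Python scripts. The PythonAlg module currently supports many Python frameworks, including scikit-learn, TensorFlow, XGBoost, and fastText; TensorFlow additionally supports cluster mode. For details, see the MLSQL custom algorithm documentation.

Deployment

Both BigQuery ML and MLSQL let you use prediction directly in SQL. MLSQL additionally supports deploying a model as an API service. The procedure is very simple: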

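Run StreamingPro in standalone (single-machine) mode.
Register the model through the API or through configuration: register NaiveBayes.`/tmp/bayes_model` as bayes_predict;
Call the prediction endpoint:

http://127.0.0.1:9003/model/predict?pipeline=bayes_predict&data=[[1,2,3...]]&dataType=vector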
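
A client usually has to URL-encode the data parameter, since it contains brackets and commas. The sketch below builds such a request URL using only the endpoint and the three query parameters shown above (pipeline, data, dataType). The helper name build_predict_url is hypothetical, and the response format is not documented here, so the sketch stops at constructing the URL.

```python
import json
from urllib.parse import urlencode


def build_predict_url(pipeline, vectors,
                      base="http://127.0.0.1:9003/model/predict"):
    """Build a prediction URL for a registered function such as bayes_predict."""
    query = urlencode({
        "pipeline": pipeline,
        # Serialize the feature vectors compactly, e.g. [[1,2,3]].
        "data": json.dumps(vectors, separators=(",", ":")),
        "dataType": "vector",
    })
    return f"{base}?{query}"


if __name__ == "__main__":
    print(build_predict_url("bayes_predict", [[1, 2, 3]]))
```

With a running service, the resulting URL could be fetched with urllib.request.urlopen or any HTTP client.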

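MLSQL supports end-to-end deployment that reuses the entire data-processing pipeline. For more, see the MLSQL deployment documentation.

Model Version Management

Set keepVersion="true" at training time, and each run preserves the previous version. For details, see the model version management documentation.

Running Multiple Algorithms / Parameter Groups in Parallel

If the algorithm itself is already distributed, MLSQL runs multiple parameter groups sequentially. For example: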

train data as ALSInPlace.`/tmp/als` where
-- first parameter group
`fitParam.0.maxIter`="5"
and `fitParam.0.regParam` = "0.01"
and `fitParam.0.userCol` = "userId"
and `fitParam.0.itemCol` = "movieId"
and `fitParam.0.ratingCol` = "rating"
-- second parameter group
and `fitParam.1.maxIter`="1"
and `fitParam.1.regParam` = "0.1"
and `fitParam.1.userCol` = "userId"
and `fitParam.1.itemCol` = "movieId"
and `fitParam.1.ratingCol` = "rating"
-- compute RMSE
and evaluateTable="test"
and ratingCol="rating"
-- recommend items to users, 10 recommendations each
and `userRec` = "10"
-- recommend users for items, 10 recommendations each
-- and `itemRec` = "10"
and coldStartStrategy="drop"
;

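This is a collaborative-filtering recommendation algorithm, configured with two parameter groups. Because the algorithm is itself distributed, the two groups run serially.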
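
One way to read the `fitParam.<index>.<name>` keys is as a flat encoding of a list of parameter groups. The sketch below regroups such keys by index. It only illustrates the naming convention and is not MLSQL's own parser; the function name group_fit_params is hypothetical.

```python
from collections import defaultdict


def group_fit_params(params):
    """Collect fitParam.<index>.<name> entries into one dict per index."""
    groups = defaultdict(dict)
    for key, value in params.items():
        # Split into at most three parts so names containing dots stay intact.
        parts = key.split(".", 2)
        if len(parts) == 3 and parts[0] == "fitParam" and parts[1].isdigit():
            groups[int(parts[1])][parts[2]] = value
    # Return the groups ordered by index; other keys are ignored.
    return [groups[i] for i in sorted(groups)]


if __name__ == "__main__":
    where_clause = {
        "fitParam.0.maxIter": "5",
        "fitParam.0.regParam": "0.01",
        "fitParam.1.maxIter": "1",
        "fitParam.1.regParam": "0.1",
        "evaluateTable": "test",
    }
    for index, group in enumerate(group_fit_params(where_clause)):
        print(index, group)
```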

-- train sklearn model
train data as PythonAlg.`${modelPath}` 

-- specify the location of the training script 
where pythonScriptPath="${sklearnTrainPath}"

-- kafka params for log
and `kafkaParam.bootstrap.servers`="${kafkaDomain}"
and `kafkaParam.topic`="test"
and `kafkaParam.group_id`="g_test-2"
and `kafkaParam.userName`="pi-algo"
-- distribute training data, so the python training script can read 
and  enableDataLocal="true"
and  dataLocalFormat="json"

-- sklearn params
-- use SVC
and `fitParam.0.moduleName`="sklearn.svm"
and `fitParam.0.className`="SVC"
and `fitParam.0.featureCol`="features"
and `fitParam.0.labelCol`="label"
and `fitParam.0.class_weight`="balanced"
and `fitParam.0.verbose`="true"

and `fitParam.1.moduleName`="sklearn.naive_bayes"
and `fitParam.1.className`="GaussianNB"
and `fitParam.1.featureCol`="features"
and `fitParam.1.labelCol`="label"
and `fitParam.1.class_weight`="balanced"
and `fitParam.1.labelSize`="26"

-- python env
and `systemParam.pythonPath`="python"
and `systemParam.pythonParam`="-u"
and `systemParam.pythonVer`="2.7";

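The example above runs two algorithms, SVC and GaussianNB, in parallel. Because neither algorithm can run in a distributed fashion on its own, MLSQL lets you run the two side by side.

Summary

BigQuery ML is only one part of the Google BigQuery service, so comparing it directly with MLSQL is somewhat unfair. MLSQL merges the data platform and the algorithm platform: on it you can do ETL and stream processing as well as machine learning, all with one SQL syntax. MLSQL also provides many practical "data-processing models" and SQL functions, which help a great deal in both training and prediction. Preprocessing logic can be reused between training and prediction with essentially no extra development, which enables end-to-end deployment and lowers costs for companies.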
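
The scheduling rule described in this section can be sketched as follows: parameter groups for an already-distributed algorithm run one after another, while groups for single-node algorithms run concurrently. This is only an illustration of the idea under that reading, not how MLSQL schedules work internally; run_groups, train_fn, and algorithm_is_distributed are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_groups(train_fn, groups, algorithm_is_distributed):
    """Train once per parameter group and return results in group order."""
    if algorithm_is_distributed:
        # The algorithm already spreads over the cluster: run groups serially.
        return [train_fn(group) for group in groups]
    # Single-node algorithms: run the groups concurrently, one worker each.
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        return list(pool.map(train_fn, groups))


if __name__ == "__main__":
    groups = [{"className": "SVC"}, {"className": "GaussianNB"}]

    def fake_train(group):
        # Stand-in for real training; just reports which class was used.
        return "trained " + group["className"]

    print(run_groups(fake_train, groups, algorithm_is_distributed=False))
```

Because pool.map preserves input order, the results line up with the fitParam indexes in both modes.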
