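Data Mining
This is a typical data mining workflow:
- Business understanding: what the background is and what the problem aims to solve
- Data understanding: what data is available, which of it is relevant, whether there is enough of it, and whether it is correct
- Data preprocessing: cleaning and transforming the data, including feature selection
- Model building: build a classification or regression model
- Model evaluation: how well the model performs, e.g. KS and AUC
- Model deployment: put the trained model to use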
Data processing
Print the number of rows and columns of the data
# simple show rows x columns function
nelems=function(d) paste(nrow(d),"x",ncol(d))
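As a quick check, the helper can be tried on any data frame. The sketch below uses R's built-in iris data set, which is an assumption made for illustration and is not part of the original example.

```r
# simple show rows x columns function (same definition as above)
nelems=function(d) paste(nrow(d),"x",ncol(d))

# iris ships with base R (datasets package): 150 rows, 5 columns
print(nelems(iris))
# prints [1] "150 x 5"
```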
Handling missing values
# 1. Delete the rows that contain missing values
bank4=na.omit(bank3)
# 2. Fill with the mean (computed from the non-missing ages)
meanage=mean(bank3$age,na.rm=TRUE)
bank5=imputation("value",bank3,"age",Value=meanage)
# 3. Substitute NA values with the values found in the most similar case (1-nearest neighbor):
bank6=imputation("hotdeck",bank3,"age")
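To see what the first two strategies do to the data, here is a minimal base R sketch on a tiny made-up data frame. The data frame toy and its values are hypothetical, and the mean fill is written by hand rather than through rminer's imputation function, so treat it as an illustration of the idea rather than of the package internals.

```r
# toy data frame with one missing age (hypothetical data, for illustration only)
toy=data.frame(age=c(20,30,NA,40),balance=c(100,200,300,400))

# 1. drop every row that contains an NA
toy_omit=na.omit(toy)

# 2. replace NA ages with the mean of the observed ages
meanage=mean(toy$age,na.rm=TRUE)
toy_mean=toy
toy_mean$age[is.na(toy_mean$age)]=meanage

print(nrow(toy_omit))   # 3 rows remain
print(toy_mean$age)     # 20 30 30 40
```

Deleting rows is the simplest option but discards the other columns of those rows; filling with the mean keeps every row at the cost of shrinking the variance of the filled column.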
Modeling
fit: trains the model and tunes its parameters
predict: makes predictions with a fitted model
mining: runs several fit and predict executions, according to the validation method and the number of runs.
library(rminer)
# ctree
B2=fit(schoolsup~.,math[,c(inputs,bout)],model="ctree")
# rpart
B1=fit(schoolsup~.,math[,c(inputs,bout)],model="rpart")
B3=fit(schoolsup~.,math[,c(inputs,bout)],model="mlpe")
B4=fit(schoolsup~.,math[,c(inputs,bout)],model="ksvm")
C3=fit(Mjob~.,cmath,model="randomForest")
To try a different algorithm, just change the model argument.
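The snippets above only call fit. As a rough sketch of how mining could be used, the example below runs repeated cross-validation in a single call. It assumes the built-in iris data set instead of the math data (which is not shown here), and the choice of 3 runs of 3-fold cross-validation is arbitrary.

```r
library(rminer)
set.seed(123)  # make the random fold assignment repeatable

# 3 runs of 3-fold cross-validation with a decision tree
M=mining(Species~.,iris,Runs=3,method=c("kfold",3),model="rpart")

# classification accuracy (in %) for each run
acc=mmetric(M,metric="ACC")
print(acc)
```

Because mining fits and predicts internally, its result can be passed to mmetric directly, without a separate predict call.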
Evaluation
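# fit a decision tree on the selected columns
B1=fit(schoolsup~.,math[,c(inputs,bout)],model="rpart")
# note: the model is scored here on the same rows it was trained on
test <- math[,c(inputs,bout)]
y <- test$schoolsup.1
# predictions of the fitted model
P1=predict(B1,test)
# compute all metrics that mmetric offers
m=mmetric(y,P1,metric=c("ALL"))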
This returns all of the available metrics.
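The evaluation above scores the model on the data it was fitted to, which tends to give optimistic results. A hedged alternative is to hold out part of the data for testing. The sketch below assumes the built-in iris data set and a 2/3 training ratio, neither of which comes from the original example.

```r
library(rminer)
set.seed(123)  # make the random split repeatable

# stratified holdout: about 2/3 of the rows for training, the rest for testing
H=holdout(iris$Species,ratio=2/3)

# fit on the training rows only
M=fit(Species~.,iris[H$tr,],model="rpart")

# predict and score on rows the model has not seen
P=predict(M,iris[H$ts,])
acc=mmetric(iris$Species[H$ts],P,metric="ACC")
print(acc)
```

Replacing metric="ACC" with metric="ALL" gives the full set of metrics, as in the example above.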
Values accepted by the model argument, as listed in the rminer help:
- naive: most common class (classification) or mean output value (regression)
- ctree: conditional inference tree (classification and regression, uses ctree from the party package)
- cv.glmnet: generalized linear model with lasso or elasticnet regularization (classification and regression, uses cv.glmnet from the glmnet package; note: cross-validation is used to automatically set the lambda parameter that is needed to compute the predictions)
- rpart or dt: decision tree (classification and regression, uses rpart from the rpart package)
- kknn or knn: k-nearest neighbor (classification and regression, uses kknn from the kknn package)
- ksvm or svm: support vector machine (classification and regression, uses ksvm from the kernlab package)
- mlp: multilayer perceptron with one hidden layer (classification and regression, uses nnet from the nnet package)
- mlpe: multilayer perceptron ensemble (classification and regression, uses nnet from the nnet package)
- randomForest or randomforest: random forest algorithm (classification and regression, uses randomForest from the randomForest package)
- xgboost: eXtreme Gradient Boosting (Tree) (classification and regression, uses xgboost from the xgboost package; note: the nrounds parameter is set by default to 2)
- bagging: bagging (classification, uses bagging from the adabag package)
- boosting: boosting (classification, uses boosting from the adabag package)
- lda: linear discriminant analysis (classification, uses lda from the MASS package)
- multinom or lr: logistic regression (classification, uses multinom from the nnet package)
- naiveBayes or naivebayes: naive Bayes (classification, uses naiveBayes from the e1071 package)
- qda: quadratic discriminant analysis (classification, uses qda from the MASS package)
- cubist: M5 rule-based model (regression, uses cubist from the Cubist package)
- lm: standard multiple/linear regression (uses lm)
- mr: multiple regression (regression, equivalent to lm but uses nnet from the nnet package with zero hidden nodes and a linear output function)
- mars: multivariate adaptive regression splines (regression, uses mars from the mda package)
- pcr: principal component regression (regression, uses pcr from the pls package)
- plsr: partial least squares regression (regression, uses plsr from the pls package)
- cppls: canonical powered partial least squares (regression, uses cppls from the pls package)
- rvm: relevance vector machine (regression, uses rvm from the kernlab package)
Further reading:
https://repositorium.sdum.uminho.pt/bitstream/1822/36210/1/rminer-tutorial.pdf
