M.L. notes


  1. Principal Component Analysis (PCA): reduces the dimensionality of a system while keeping enough information to describe the features of each data point; the newly generated dimensions are called principal components.
    The first principal component of the data is the direction in which the data varies the most.
  • In scikit-learn, the dimensionality reduction is performed by the fit_transform() method of the PCA object.
  • First import PCA from the sklearn.decomposition module, then call the PCA() constructor with the n_components option to specify the target number of dimensions, and finally pass the data to fit_transform().
  • Using the famous iris dataset as an example:
    from sklearn.decomposition import PCA
    x_reduced = PCA(n_components=3).fit_transform(iris.data)
  • Plotting a 3D scatterplot:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
species = iris.target   # class labels, used to color the points
x_reduced = PCA(n_components=3).fit_transform(iris.data)

# SCATTERPLOT 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_title('Iris Dataset by PCA', size=14)
ax.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=species)
ax.set_xlabel('First Eigenvector')
ax.set_ylabel('Second Eigenvector')
ax.set_zlabel('Third Eigenvector')
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_zticklabels([])
plt.show()
  2. Support Vector Machine (SVM)

Refers to a family of machine learning methods. The most basic task is to decide which of two classes a new observation belongs to. During the learning phase, these classifiers map the training data into a multidimensional space called the decision space and create a separating surface called the decision boundary, which divides the decision space into two regions. They come in two flavors: SVR (Support Vector Regression) and SVC (Support Vector Classification).
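A quick sketch of the classification side with scikit-learn's SVC (the kernel and split here are arbitrary choices, not from the original notes):
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = SVC(kernel='linear')   # a linear decision boundary in the decision space
clf.fit(X_train, y_train)    # learning phase: build the separating surface
print(clf.score(X_test, y_test))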

  3. Standardize vs. Normalize
  • Standardize: generally means rescaling to a standard normal form, i.e. mean 0 and variance 1, typically via the z-score.
  • Normalize: generally means constraining the data to [0, 1], typically via min-max scaling, a linear transformation of the original data: X* = (X - Xmin) / (Xmax - Xmin). Both are sketched in code below.
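A minimal sketch with scikit-learn's two scalers (the toy data column is made up):
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # toy data, one feature column

# Standardize: z-score, so the result has mean 0 and standard deviation 1
print(StandardScaler().fit_transform(X).ravel())

# Normalize: min-max scaling, X* = (X - Xmin) / (Xmax - Xmin), into [0, 1]
print(MinMaxScaler().fit_transform(X).ravel())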
  4. Backpropagation (BP)
  • to calculate the slope for a weight (i.e. the partial derivative of the loss function with respect to that weight); see the sketch below:


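A minimal sketch, assuming a single linear unit with squared-error loss (the simplest case; the general multi-layer rule chains these derivatives layer by layer):
import numpy as np

input_data = np.array([1.0, 2.0, 3.0])   # activations feeding into the weights
weights = np.array([0.5, -0.5, 1.0])
target = 4.0

prediction = (input_data * weights).sum()
error = prediction - target

# Slope for each weight: d/dw_i (prediction - target)^2 = 2 * error * input_i
slope = 2 * error * input_data

# One gradient-descent step with learning rate 0.01
weights = weights - 0.01 * slope
print(slope, weights)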
  5. Iteration

Refers to repeating a feedback process, usually with the goal of approaching and eventually reaching a desired target or result. Each repetition of the process is called an "iteration", and the result of each iteration is used as the starting value for the next.

  • Validation Dataset
    The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
  • Test Dataset:
    The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
    The test dataset provides the gold standard used to evaluate the model. It is only used once a model is completely trained (using the train and validation sets). The test set is generally what is used to evaluate competing models.
  6. An example (a Keras model with early stopping, using a validation split):
# Import the model, layer, and early-stopping classes
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers import Dense

# predictors (feature array) and target (one-hot labels) are assumed to be defined

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)   # stop if val_loss fails to improve for 2 epochs

# Fit the model
model.fit(predictors, target, epochs=30, validation_split=0.3, callbacks=[early_stopping_monitor])
  7. Networks
  • Degree: the degree of a node is the number of neighbors it has.
  • Degree centrality: the number of neighbors divided by the number of possible neighbors the node could have. Depending on whether self-loops are allowed, the set of possible neighbors may also include the node itself.
  • Betweenness centrality: how often a node serves as a bridge on shortest paths, divided by the number of shortest paths. It is defined as the fraction of all possible shortest paths between any pair of nodes that pass through the node.
  • Cliques: groups of nodes that are fully connected to one another.
  • Maximal clique: a clique that cannot be extended by adding another node in the graph. All of these are computable with networkx, as sketched below.
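A minimal networkx sketch on a toy graph (the edges are made up for illustration):
import networkx as nx

# Toy undirected graph: a triangle (0, 1, 2) with a tail 2-3-4
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])

print(dict(G.degree()))              # degree: number of neighbors per node
print(nx.degree_centrality(G))       # degree divided by the (n - 1) possible neighbors
print(nx.betweenness_centrality(G))  # fraction of shortest paths through each node
print(list(nx.find_cliques(G)))      # maximal cliques, e.g. [0, 1, 2]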
  8. Supervised learning tips:
    Pairwise relationships between continuous variables
    We typically want to avoid using variables that have strong correlations with each other (hence avoiding feature redundancy) for a few reasons (a quick correlation check is sketched after this list):
  • To keep the model simple and improve interpretability (with many features, we run the risk of overfitting).
  • When our datasets are very large, using fewer features can drastically speed up our computation time.
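A pandas sketch of the pairwise check on the iris data:
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Pearson correlation matrix; pairs near +/-1 are candidates for removal
print(df.corr())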
  9. Since PCA uses the absolute variance of a feature to rotate the data, a feature with a broader range of values will overpower and bias the algorithm relative to the other features. To avoid this, we must first normalize our data. There are a few methods to do this, but a common way is through standardization, such that all features have a mean = 0 and standard deviation = 1 (the resultant is a z-score).
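A minimal sketch of this standardize-then-PCA order using a scikit-learn pipeline:
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Scale every feature to mean 0 / std 1 first, so no single feature dominates the rotation
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
x_scaled_reduced = pipeline.fit_transform(iris.data)
print(x_scaled_reduced[:3])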

  10. random_state ensures that every run of the program produces the same train/test split; otherwise, the same model will perform differently on different training and test sets.
    After splitting the data with sklearn and fixing the model and its initial parameters, you will find that each run yields a different accuracy, which makes tuning impossible. That is because random_state was not set; once it is set, you can tune parameters (see the sketch below).
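A minimal sketch (the test size and seed value are arbitrary):
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# With random_state fixed, every run produces the identical split,
# so changes in accuracy come from the parameters, not from the split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)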

  11. Bootstrapping: bootstrap sampling, also known as repeated sampling / sampling with replacement.

  • Given a dataset D containing m samples, we sample from it to produce a dataset D': each time, randomly pick one sample from D, copy it into D', and then put it back into D so that it can still be picked in the next draw; after repeating this process m times, we obtain a dataset D' containing m samples.
  • Clearly, some samples in D will appear multiple times in D', while others will not appear at all (for large m, the fraction of D that never appears in D' approaches 1/e, about 36.8%).


  • Bootstrapping is useful when the dataset is small and hard to split effectively into training/test sets; however, the datasets it produces change the distribution of the original dataset, which introduces estimation bias. Therefore, when there is enough data to start with, hold-out and cross-validation are more commonly used. A single bootstrap draw is sketched below.
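A minimal numpy sketch of one bootstrap draw (the "dataset" is just the integers 0-9):
import numpy as np

rng = np.random.default_rng(0)
D = np.arange(10)   # a dataset of m = 10 samples

# Sample m times with replacement: duplicates appear, some samples are missed
D_prime = rng.choice(D, size=len(D), replace=True)
print(D_prime)

# Fraction of D that never appears in D' (tends to 1/e for large m)
print(1 - len(np.unique(D_prime)) / len(D))
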
  12. Precision and recall:
    precision = TP / (TP + FP), recall = TP / (TP + FN)
    F1: the harmonic mean of precision and recall:
    F1 = 2 * precision * recall / (precision + recall)

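All three metrics are available in scikit-learn; a small check on made-up labels:
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]   # toy ground truth, for illustration only
y_pred = [1, 1, 0, 1, 0, 0]   # TP = 2, FP = 1, FN = 1

print(precision_score(y_true, y_pred))   # 2 / (2 + 1) = 0.667
print(recall_score(y_true, y_pred))      # 2 / (2 + 1) = 0.667
print(f1_score(y_true, y_pred))          # harmonic mean, also 0.667
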
  13. Hyperparameters can be tuned with GridSearchCV, as sketched below.
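A minimal sketch (the grid values are arbitrary):
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()

# Exhaustively try every parameter combination with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(iris.data, iris.target)
print(grid.best_params_, grid.best_score_)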

  14. Deep Learning Layers:

  • MaxPooling. This passes a (2, 2) moving window over the image and downscales the image by outputting the maximum value within the window.
  • Conv2D. A convolutional layer; deeper models, i.e. models with more convolutional layers, are better able to learn features from images.
  • Dropout. This prevents the model from overfitting, i.e. perfectly remembering each image, by randomly setting 25% of the input units to 0 at each update during training.
  • Flatten. As its name suggests, this flattens the output from the convolutional part of the CNN into a one-dimensional feature vector which can be passed into the following fully connected layers.
  • Dense. Fully connected layer where every input is connected to every output.
  • Dropout. Another dropout layer to safeguard against overfitting, this time with a rate of 50%. The full stack is sketched below.
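Put together as a Keras model (a sketch assuming 28x28 grayscale inputs and 10 classes; the filter counts are arbitrary):
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))           # downscale via a (2, 2) max window
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Dropout(0.25))                  # zero 25% of inputs during training
model.add(Flatten())                      # to a one-dimensional feature vector
model.add(Dense(128, activation='relu'))  # fully connected
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])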
  15. RBM: restricted Boltzmann machine. A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
    Stochastic neural networks are a type of artificial neural network built by introducing random variations into the network, either by giving the network's neurons stochastic transfer functions or by giving them stochastic weights. This makes them useful tools for optimization problems, since the random fluctuations help the network escape local minima.
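scikit-learn ships one implementation, BernoulliRBM; a minimal sketch on random binary data (all sizes here are made up):
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.RandomState(0).randint(0, 2, size=(100, 16))   # toy binary inputs

rbm = BernoulliRBM(n_components=4, learning_rate=0.05, n_iter=20, random_state=0)
hidden = rbm.fit_transform(X)   # learned hidden-unit representation
print(hidden.shape)             # (100, 4)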

  16. GPU

  17. SVD: singular value decomposition; see the sketch below.
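A minimal numpy sketch (the matrix is made up):
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: the factors reconstruct A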

  18. HAC: Hierarchical Agglomerative Clustering; see the sketch below.
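A minimal sketch with scikit-learn's AgglomerativeClustering (the cluster count is arbitrary):
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering

iris = datasets.load_iris()

# Bottom-up: each point starts as its own cluster, then the closest clusters merge
labels = AgglomerativeClustering(n_clusters=3).fit_predict(iris.data)
print(labels[:10])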

  19. t-SNE: t-distributed Stochastic Neighbor Embedding. It is a nonlinear dimensionality reduction technique well suited to embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object as a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects by distant points, with high probability. A sketch follows.
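A minimal sketch with scikit-learn's TSNE on the iris data:
from sklearn import datasets
from sklearn.manifold import TSNE

iris = datasets.load_iris()

# Embed the 4-D measurements into 2-D; similar flowers land near each other
embedding = TSNE(n_components=2, random_state=0).fit_transform(iris.data)
print(embedding.shape)   # (150, 2)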
