Advice for Applying Machine Learning (Part 2)

Diagnosing Bias and Variance

High bias and high variance correspond, respectively, to underfitting and overfitting of the learning model.

To tell which of the two problems we are facing, we typically plot the training error Jtrain(Θ) and the cross-validation error JCV(Θ), as in the chart below, and apply the following rules:

High bias (underfitting)

  • Jtrain(Θ) is large
  • JCV(Θ) ≈ Jtrain(Θ)

High variance (overfitting)

  • Jtrain(Θ) is small
  • JCV(Θ) >> Jtrain(Θ)
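
Expressed as a minimal Python sketch (an illustration, not course material; the thresholds high_err and gap_tol are assumptions you would normally read off the plotted curves):

```python
def diagnose(j_train, j_cv, high_err=0.3, gap_tol=0.1):
    """Classify a fitted model from its training and cross-validation errors.

    high_err and gap_tol are illustrative thresholds (assumptions); in
    practice you judge "large" and ">>" from the plotted error curves.
    """
    if j_train > high_err and j_cv - j_train <= gap_tol:
        return "high bias (underfitting): J_train large, J_cv ~ J_train"
    if j_train <= high_err and j_cv - j_train > gap_tol:
        return "high variance (overfitting): J_train small, J_cv >> J_train"
    return "no clear bias/variance problem from these two numbers alone"

print(diagnose(j_train=0.40, j_cv=0.43))  # -> high bias
print(diagnose(j_train=0.02, j_cv=0.45))  # -> high variance
```
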
Supplementary Notes
Diagnosing Bias vs. Variance

In this section we examine the relationship between the degree of the polynomial d and the underfitting or overfitting of our hypothesis.

  • We need to distinguish whether bias or variance is the problem contributing to bad predictions.
  • High bias is underfitting and high variance is overfitting. Ideally, we need to find a golden mean between these two.

The training error will tend to decrease as we increase the degree d of the polynomial.

At the same time, the cross validation error will tend to decrease as we increase d up to a point, and then it will increase as d is increased, forming a convex curve.

High bias (underfitting): both Jtrain(Θ) and JCV(Θ) will be high. Also, JCV(Θ)≈Jtrain(Θ).

High variance (overfitting): Jtrain(Θ) will be low and JCV(Θ) will be much greater than Jtrain(Θ).

This is summarized in the figure below.
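
To see these curves concretely, here is a minimal numpy sketch (the cubic target, noise level, and train/CV split are assumptions made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a noisy cubic, split into training and CV sets.
x = rng.uniform(-2, 2, 60)
y = x**3 - x + rng.normal(0, 0.5, x.size)
x_tr, y_tr, x_cv, y_cv = x[:40], y[:40], x[40:], y[40:]

def j(coef, x, y):
    """Squared-error cost of a fitted polynomial (no regularization)."""
    return np.mean((np.polyval(coef, x) - y) ** 2) / 2

for d in range(1, 9):
    coef = np.polyfit(x_tr, y_tr, deg=d)   # least-squares fit of degree d
    print(d, j(coef, x_tr, y_tr), j(coef, x_cv, y_cv))
# J_train keeps shrinking as d grows, while J_cv typically bottoms out
# near the true degree (3 here) and then rises again: the convex curve.
```
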

Regularization and Bias/Variance

When training a model, we usually apply regularization to avoid overfitting. The regularization parameter λ, however, needs to be chosen carefully.

Previously, when choosing λ, we considered only the single-variable case. Now we consider how to choose λ for a polynomial model.

For example, suppose we regularize some polynomial model with candidate values λ = 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10, and want to find the best value of λ.

First, we split the dataset into three parts: a training set, a cross-validation set, and a test set.

Then, for each of these candidate values of λ, we learn the parameters θ and compute Jtrain(θ) and JCV(θ).

Finally, we take the value of λ at which JCV(θ) is smallest and evaluate the corresponding model on the test set to obtain Jtest(θ).

In the figure, suppose JCV(θ) is smallest at λ = 0.08.

To make this easier to follow, and to help locate the best λ, we can plot Jtrain(θ) and JCV(θ) against λ as shown below:
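
Here is a minimal numpy sketch of that procedure (the toy data, the degree-8 feature map, and the ridge-style normal equation are all assumptions for illustration; note that, as the supplementary note below also stresses, Jtrain and JCV are evaluated without the regularization term):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 60)
y = x**3 - x + rng.normal(0, 0.5, x.size)   # assumed toy data
X = np.vander(x, 9)                          # degree-8 polynomial features
X_tr, y_tr, X_cv, y_cv = X[:40], y[:40], X[40:], y[40:]

def ridge_fit(X, y, lam):
    # Regularized normal equation (for simplicity the bias term is
    # penalized too, which the course itself does not do).
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def j(theta, X, y):
    """Cost without the regularization term (i.e. evaluated at lambda = 0)."""
    return np.mean((X @ theta - y) ** 2) / 2

best = None
for lam in [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]:
    theta = ridge_fit(X_tr, y_tr, lam)
    j_cv = j(theta, X_cv, y_cv)
    print(lam, j(theta, X_tr, y_tr), j_cv)   # J_train rises with lambda, J_cv is U-shaped
    if best is None or j_cv < best[1]:
        best = (lam, j_cv)
print("best lambda by J_cv:", best[0])
```
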

Supplementary Notes
Regularization and Bias/Variance

In the figure above, we see that as λ increases, our fit becomes more rigid. On the other hand, as λ approaches 0, we tend to overfit the data. So how do we choose our parameter λ to get it 'just right'? In order to choose the model and the regularization term λ, we need to:

  1. Create a list of lambdas (i.e. λ∈{0,0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28,2.56,5.12,10.24});
  2. Create a set of models with different degrees or any other variants.
  3. Iterate through the λs and for each λ go through all the models to learn some Θ.
  4. Compute the cross-validation error JCV(Θ) using the learned Θ (which was trained with regularization), but evaluate it without the regularization term (i.e. with λ = 0).
  5. Select the best combo that produces the lowest error on the cross validation set.
  6. Using the best combo Θ and λ, apply it on Jtest(Θ) to see if it has a good generalization of the problem.
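
Taken together, these six steps are a small grid search over (model, λ) pairs. A minimal sketch under the same toy-data assumptions as above, touching the test set only once at the very end:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 90)
y = x**3 - x + rng.normal(0, 0.5, x.size)    # assumed toy data
tr, cv, te = slice(0, 50), slice(50, 70), slice(70, 90)

def fit(x, y, d, lam):
    X = np.vander(x, d + 1)                   # degree-d polynomial features
    return np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)

def j(theta, x, y, d):
    return np.mean((np.vander(x, d + 1) @ theta - y) ** 2) / 2

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
# Step 5: pick the (degree, lambda) combo with the lowest CV error.
d, lam = min(((d, lam) for d in range(1, 9) for lam in lambdas),
             key=lambda c: j(fit(x[tr], y[tr], *c), x[cv], y[cv], c[0]))
# Step 6: report generalization of the chosen combo on the test set.
theta = fit(x[tr], y[tr], d, lam)
print("degree:", d, "lambda:", lam, "J_test:", j(theta, x[te], y[te], d))
```
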
Learning Curves

Plotting learning curves helps us check whether a learning algorithm is working properly. A learning curve plots the training error and the cross-validation error as functions of the number of training examples m.

In the figure above, the hypothesis is hθ(x) = θ0 + θ1x + θ2x², and regularization is not considered here. When m = 1, the hypothesis fits the training set perfectly and Jtrain(θ) = 0, but it generalizes poorly to the cross-validation set, so JCV(θ) is large. When m = 2, the hypothesis still fits the training set well; Jtrain(θ) grows slightly, while JCV(θ) shrinks slightly but remains poor; and so on. Once m is large enough, Jtrain(θ) rises to some level and flattens out, JCV(θ) falls to some level and flattens out, and the two values end up very close to each other.

Therefore, when a learning algorithm suffers from high bias, adding more training examples is of little use on its own.

In the figure above, the hypothesis is hθ(x) = θ0 + θ1x + θ2x² + ... + θ100x¹??, this time with regularization and a very small λ. When m = 5, the hypothesis fits the training set well, so Jtrain(θ) is small, but it generalizes poorly, so JCV(θ) is large. When m = 12, the hypothesis still fits the training set well, with Jtrain(θ) slightly larger and JCV(θ) slightly smaller; and so on. When m is large enough, Jtrain(θ) keeps increasing gradually while JCV(θ) keeps decreasing, with a clear gap remaining between them.

Therefore, when a learning algorithm suffers from high variance, adding more training examples is likely to help.

Note: the video does not make clear whether, as m keeps growing, the gradually increasing Jtrain(θ) and the gradually decreasing JCV(θ) will eventually meet.
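
A minimal sketch of how such a learning curve is computed (toy data assumed): for each size m we retrain on only the first m training examples, but JCV is always measured on the full cross-validation set:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 120)
y = x**3 - x + rng.normal(0, 0.5, x.size)    # assumed toy data
x_tr, y_tr, x_cv, y_cv = x[:80], y[:80], x[80:], y[80:]

def j(coef, x, y):
    return np.mean((np.polyval(coef, x) - y) ** 2) / 2

degree = 2                                    # quadratic hypothesis, as in the text
for m in range(3, 81, 7):
    coef = np.polyfit(x_tr[:m], y_tr[:m], degree)   # train on the first m examples
    print(m, j(coef, x_tr[:m], y_tr[:m]),           # J_train on those same m examples
          j(coef, x_cv, y_cv))                       # J_cv always on the full CV set
# High bias: both curves plateau high and close together as m grows.
# High variance: J_train stays low and a sizeable gap to J_cv persists.
```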

Supplementary Notes
Learning Curves

Training an algorithm on a very small number of data points (such as 1, 2 or 3) will easily yield 0 error, because we can always find a quadratic curve that passes exactly through those points. Hence:

  • As the training set gets larger, the error for a quadratic function increases.
  • The error value will plateau out after a certain m, or training set size.

Experiencing high bias:

Low training set size: causes Jtrain(Θ) to be low and JCV(Θ) to be high.

Large training set size: causes both Jtrain(Θ) and JCV(Θ) to be high with Jtrain(Θ)≈JCV(Θ).

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

Experiencing high variance:

Low training set size: Jtrain(Θ) will be low and JCV(Θ) will be high.

Large training set size: Jtrain(Θ) increases with training set size and JCV(Θ) continues to decrease without leveling off. Also, Jtrain(Θ) < JCV(Θ) but the difference between them remains significant.

If a learning algorithm is suffering from high variance, getting more training data is likely to help.

Deciding What to Do Next

At the beginning of Advice for Applying Machine Learning (Part 1), we proposed the following remedies for predictions that suffer from large errors:

  • Get more training examples
  • Try smaller sets of features
  • Try getting additional features
  • Try adding polynomial features
  • Try decreasing the regularization parameter λ
  • Try increasing the regularization parameter λ

Having examined each of these remedies, we reached the following conclusions:

  • Get more training examples: fixes high variance (overfitting)
  • Try smaller sets of features: fixes high variance (overfitting)
  • Try getting additional features: fixes high bias (underfitting)
  • Try adding polynomial features: fixes high bias (underfitting)
  • Try decreasing the regularization parameter λ: fixes high bias (underfitting)
  • Try increasing the regularization parameter λ: fixes high variance (overfitting)

For neural networks, a 'small' model is prone to high bias (underfitting), but has the advantage of being computationally cheap; a 'large' model (with more activation units per hidden layer, or more hidden layers) is prone to high variance (overfitting) and is computationally expensive. In general, though, the larger a regularized neural network is, the better it performs.

We usually default to a network with a single hidden layer, but that is not always the optimal model. We can therefore split the data into training, cross-validation, and test sets, train networks with different numbers of hidden layers, and keep the one with the smallest JCV(Θ), as in the sketch below.
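
A minimal sketch of that selection loop, using scikit-learn's MLPRegressor as a stand-in (an assumption; the course builds its own network in Octave). The candidate architectures and all hyperparameters here are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, (300, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, 300)   # assumed toy data
X_tr, y_tr, X_cv, y_cv = X[:200], y[:200], X[200:], y[200:]

best = None
# Candidate architectures: 1, 2, or 3 hidden layers of 25 units each (assumed).
for layers in [(25,), (25, 25), (25, 25, 25)]:
    net = MLPRegressor(hidden_layer_sizes=layers, alpha=0.01,   # alpha = L2 penalty
                       max_iter=2000, random_state=0).fit(X_tr, y_tr)
    j_cv = np.mean((net.predict(X_cv) - y_cv) ** 2) / 2         # J_cv(Theta)
    print(layers, j_cv)
    if best is None or j_cv < best[1]:
        best = (layers, j_cv)
print("selected architecture:", best[0])
```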

Supplementary Notes
Deciding What to Do Next Revisited

Our decision process can be broken down as follows:

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance

Diagnosing Neural Networks

  • A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
  • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Using a single hidden layer is a good starting default. You can train your neural network on a number of hidden layers using your cross validation set. You can then select the one that performs best.

Model Complexity Effects:

  • Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model fits poorly consistently.
  • Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. These have low bias on the training data, but very high variance.
  • In reality, we would want to choose a model somewhere in between, that can generalize well but also fits the data reasonably well.