
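We have learned about regression and even written our own simple linear regression algorithm. We also built a coefficient of determination algorithm to check the accuracy and reliability of the best-fit line. We discussed and showed earlier that the best-fit line may not be a good fit, and explained why our example was directionally correct even though it was not accurate. Now, though, we are relying on two top-level algorithms that are themselves made up of several smaller ones. As we keep building this hierarchy of algorithms, a small error in any one of them will cause trouble, so we are going to test our assumptions.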

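In the programming world, systematic testing of programs is usually called "unit testing". This is how large programs are built: every small subsystem is checked continually. As a large program is upgraded and updated, any utility that conflicts with earlier parts of the system can be caught and removed easily. Machine learning has the same problem, but our main concern here is simply to test our assumptions. Eventually you would be wise to create unit tests for your entire machine learning system, but for now we will keep things as simple as possible.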

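Our assumption is that we have created a best-fit line, which we then measure with the coefficient of determination. We know (mathematically) that the lower the R squared value, the worse the best-fit line, and the higher the value (the closer to 1), the better. Our assumption is that we have built a system that works this way. That system has many parts, and even a small mistake in one operation can cause big trouble. How can we test the algorithm's behavior to make sure everything works as intended?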
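To make the two ends of that scale concrete, here is a minimal sketch, using only the standard library, of the coefficient of determination computed as 1 minus the ratio of the candidate line's squared error to the mean line's squared error. The helper names `squared_error` and `r_squared` are illustrative and are not taken from the earlier tutorials.

```python
from statistics import mean


def squared_error(ys_orig, ys_line):
    # Sum of squared vertical distances between the data and a line.
    return sum((line - orig) ** 2 for orig, line in zip(ys_orig, ys_line))


def r_squared(ys_orig, ys_line):
    # Compare the candidate line against the flat line at the mean of y.
    y_mean_line = [mean(ys_orig)] * len(ys_orig)
    return 1 - squared_error(ys_orig, ys_line) / squared_error(ys_orig, y_mean_line)


ys = [1, 3, 5, 7, 9]                          # points lying exactly on y = 2x + 1
perfect_line = [2 * x + 1 for x in range(5)]  # passes through every point
flat_line = [mean(ys)] * len(ys)              # a line no better than the mean

print(r_squared(ys, perfect_line))  # a perfect fit scores 1
print(r_squared(ys, flat_line))     # the mean line itself scores 0
```

Any line that fits the data worse than the perfect one but better than the flat mean line lands between these two values.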

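The idea here is to create a sample dataset that we define ourselves. If we have a positively correlated dataset whose correlation is very strong, the points sit tightly around the line and R squared should be high; if the correlation is weak, the points are not as tight and R squared should be lower. It is easy for us to judge such a line by eye, but the machine should do a better job. Let's build a system that generates sample data, with parameters we can adjust.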

To start, we build a skeleton function that mimics our end goal:

def create_dataset(hm,variance,step=2,correlation=False):
    # skeleton only: xs and ys get built here in the steps below
    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

Looking at the function's signature, it takes the following parameters:

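hm (how much): how many data points to generate. We could choose 10, for example, or 10 million.
variance: how far each data point may vary from the underlying value. The larger the variance, the less tightly the data is packed.
step: how much the underlying value moves, on average, from one point to the next. Defaults to 2.
correlation: either False, 'pos', or 'neg', for no correlation, positive correlation, or negative correlation.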

Note that we also import random, which helps us generate (pseudo-)random datasets.

Now we start filling in the function.

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)

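Simple enough: we iterate over the range given by the hm variable, appending the current value plus a random offset between negative variance and positive variance. This gives us data, but it has no correlation, should we want one. Let's add that: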
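One detail worth checking is the exact range of that random offset. `random.randrange(a, b)` excludes `b`, so the offset is an integer from `-variance` through `variance - 1`, and `variance` needs to be a positive integer. A small sketch (the sample size and the seed are arbitrary choices for illustration):

```python
import random

random.seed(0)  # arbitrary seed, only so repeated runs draw the same offsets
variance = 5

offsets = [random.randrange(-variance, variance) for _ in range(10_000)]

# randrange excludes its upper bound, so +variance itself is never drawn.
print(min(offsets) >= -variance)     # True
print(max(offsets) <= variance - 1)  # True
```

The offsets are therefore very slightly skewed toward the negative side, which does not matter for this test but is worth knowing when reading the generated data.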

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

Great, now we have our y values defined. Next, let's create the xs, which is even simpler, and then return everything.

def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

    xs = [i for i in range(len(ys))]
    
    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

We're ready. To create a sample dataset, all we need is:

xs, ys = create_dataset(40,40,2,correlation='pos')
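In the spirit of testing our assumptions, the generator itself can be checked before its output is fed to the regression code. The sketch below repeats `create_dataset` but returns plain lists instead of NumPy arrays, purely so the snippet runs on its own. The helper `check_dataset` is hypothetical. It relies on the fact that point `i` is drawn around the value `1 + step*i` for `'pos'`, `1 - step*i` for `'neg'`, and `1` when there is no correlation.

```python
import random


def create_dataset(hm, variance, step=2, correlation=False):
    # Same logic as above, but returning lists to avoid the NumPy dependency.
    val = 1
    ys = []
    for _ in range(hm):
        ys.append(val + random.randrange(-variance, variance))
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step
    xs = list(range(len(ys)))
    return xs, ys


def check_dataset(xs, ys, variance, step, correlation):
    # Hypothetical helper: True if the xs are 0, 1, 2, ... and every y sits
    # within the random offset range around the value it was generated from.
    direction = {'pos': 1, 'neg': -1}.get(correlation, 0)
    for x, y in zip(xs, ys):
        trend = 1 + direction * step * x
        if not (trend - variance <= y <= trend + variance - 1):
            return False
    return xs == list(range(len(ys)))


xs, ys = create_dataset(40, 40, 2, correlation='pos')
print(len(xs), len(ys))                     # 40 40
print(check_dataset(xs, ys, 40, 2, 'pos'))  # True
```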

Let's put this together with the code from the earlier linear regression tutorials:

from statistics import mean
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')


def create_dataset(hm,variance,step=2,correlation=False):
    val = 1
    ys = []
    for i in range(hm):
        y = val + random.randrange(-variance,variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val+=step
        elif correlation and correlation == 'neg':
            val-=step

    xs = [i for i in range(len(ys))]
    
    return np.array(xs, dtype=np.float64),np.array(ys,dtype=np.float64)

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    
    b = mean(ys) - m*mean(xs)

    return m, b


def coefficient_of_determination(ys_orig,ys_line):
    y_mean_line = [mean(ys_orig) for y in ys_orig]

    squared_error_regr = sum((ys_line - ys_orig) * (ys_line - ys_orig))
    squared_error_y_mean = sum((y_mean_line - ys_orig) * (y_mean_line - ys_orig))

    print(squared_error_regr)
    print(squared_error_y_mean)

    r_squared = 1 - (squared_error_regr/squared_error_y_mean)

    return r_squared


xs, ys = create_dataset(40,40,2,correlation='pos')
m, b = best_fit_slope_and_intercept(xs,ys)
regression_line = [(m*x)+b for x in xs]
r_squared = coefficient_of_determination(ys,regression_line)
print(r_squared)

plt.scatter(xs,ys,color='#003F72', label = 'data')
plt.plot(xs, regression_line, label = 'regression line')
plt.legend(loc=4)
plt.show()

Run the code and you will see a scatter plot of the data with the regression line drawn through it.

The coefficient of determination is 0.516508576011 (note that your result will not be the same, because we are using random number ranges).

Good. So our assumption is that if we generate a more tightly correlated dataset, our R squared, or coefficient of determination, should be better. How do we do that? Simple: lower the variance.

xs, ys = create_dataset(40,10,2,correlation='pos')
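To test that assumption without depending on one lucky or unlucky draw, the comparison can be repeated several times and averaged. The following is a sketch rather than the tutorial's own code: it re-implements the helpers with plain lists and the standard library so that it runs on its own, and the name `average_r_squared`, the seed, and the choice of 20 trials are assumptions made for illustration.

```python
import random
from statistics import mean


def create_dataset(hm, variance, step=2, correlation=False):
    # List-based copy of the generator, to avoid the NumPy dependency.
    val = 1
    ys = []
    for _ in range(hm):
        ys.append(val + random.randrange(-variance, variance))
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step
    return list(range(len(ys))), ys


def best_fit_slope_and_intercept(xs, ys):
    # Same least-squares formulas as the NumPy version, written with lists.
    xy_mean = mean([x * y for x, y in zip(xs, ys)])
    xx_mean = mean([x * x for x in xs])
    m = (mean(xs) * mean(ys) - xy_mean) / (mean(xs) ** 2 - xx_mean)
    b = mean(ys) - m * mean(xs)
    return m, b


def coefficient_of_determination(ys_orig, ys_line):
    y_mean = mean(ys_orig)
    se_regr = sum((line - orig) ** 2 for orig, line in zip(ys_orig, ys_line))
    se_mean = sum((y_mean - orig) ** 2 for orig in ys_orig)
    return 1 - se_regr / se_mean


def average_r_squared(variance, correlation, trials=20):
    # Hypothetical helper: mean R squared over several generated datasets.
    scores = []
    for _ in range(trials):
        xs, ys = create_dataset(40, variance, 2, correlation)
        m, b = best_fit_slope_and_intercept(xs, ys)
        line = [m * x + b for x in xs]
        scores.append(coefficient_of_determination(ys, line))
    return mean(scores)


random.seed(1)  # arbitrary seed so the run is repeatable
loose = average_r_squared(40, 'pos')   # wide scatter around the trend
tight = average_r_squared(10, 'pos')   # narrow scatter around the trend
none = average_r_squared(10, False)    # no trend at all
print(loose, tight, none)
```

If the pieces fit together as we assume, `tight` should come out well above `loose`, and the uncorrelated data should score close to zero.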


2020-05-21
