
Linear regression models are a good starting point for regression tasks.

Simple linear regression

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
rng = np.random.RandomState(1)
x = 10*rng.rand(50)
y = 2*x-5+rng.randn(50)
plt.scatter(x,y);
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)

model.fit(x[:,np.newaxis],y)
xfit = np.linspace(0,10,1000)
yfit = model.predict(xfit[:,np.newaxis])

plt.scatter(x,y)
plt.plot(xfit,yfit);
print("Model slope: ", model.coef_[0]) 
print("Model intercept:", model.intercept_)
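
As a cross-check on the fit above, the same slope and intercept can be recovered directly from the least-squares problem. The following is a minimal sketch using only NumPy; the design matrix A and the variable names slope and intercept are illustrative choices, not part of the original code.

```python
import numpy as np

# Same synthetic data as above: y = 2x - 5 plus unit-variance Gaussian noise
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

# Design matrix with a column of ones so that an intercept is fitted too
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ [slope, intercept] - y||^2
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print("slope:", slope)
print("intercept:", intercept)
```

The recovered values should land close to the true slope 2 and intercept -5 used to generate the data, with the gap coming from the added noise.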

Multidimensional linear regression:

rng = np.random.RandomState(1)
X = 10*rng.rand(100,3)
y = 0.5+np.dot(X,[1.5,-2.,1.])

model.fit(X,y)
model.intercept_,model.coef_
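
Because this data contains no noise, a least-squares fit should reproduce the generating coefficients almost exactly. Here is a small NumPy-only sketch of that check; true_coef, A and beta are names introduced for illustration.

```python
import numpy as np

rng = np.random.RandomState(1)
X = 10 * rng.rand(100, 3)
true_coef = np.array([1.5, -2.0, 1.0])
y = 0.5 + X @ true_coef          # noise-free targets, as in the code above

# Prepend a column of ones so the first fitted value is the intercept
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = beta[0], beta[1:]

print("intercept:", intercept)
print("coefficients:", coef)
```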

Basis function regression

Polynomial basis functions

from sklearn.preprocessing import PolynomialFeatures
x = np.array([2,3,4])
poly = PolynomialFeatures(3,include_bias=False)
poly.fit(x[:,None])
from sklearn.pipeline import make_pipeline
poly_model = make_pipeline(PolynomialFeatures(7),LinearRegression())
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = np.sin(x) + 0.1*rng.randn(50)

poly_model.fit(x[:,np.newaxis],y)
yfit = poly_model.predict(xfit[:,np.newaxis])

plt.scatter(x,y)
plt.plot(xfit,yfit)
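
To make the role of PolynomialFeatures concrete, here is a short sketch of what the transformer produces for the three-element array used above. The comparison array manual is an illustrative addition.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([2, 3, 4])
poly = PolynomialFeatures(3, include_bias=False)

# One row per input value, one column per power: x, x**2, x**3
features = poly.fit_transform(x[:, None])
print(features)

# The same matrix built by hand with broadcasting
manual = x[:, None] ** np.arange(1, 4)
print(np.array_equal(features, manual))
```

The linear model in the pipeline is then fitted on these columns, which is how a model that is linear in its coefficients can follow a nonlinear curve.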

Gaussian basis functions

from sklearn.base import BaseEstimator, TransformerMixin

class GaussianFeatures(BaseEstimator, TransformerMixin):
    """Uniformly spaced Gaussian features for one-dimensional input"""
    
    def __init__(self, N, width_factor=2.0): 
        self.N = N 
        self.width_factor = width_factor 
    
    @staticmethod 
    def _gauss_basis(x, y, width, axis=None): 
        arg = (x - y) / width 
        return np.exp(-0.5 * np.sum(arg ** 2, axis)) 
    
    def fit(self, X, y=None): 
        # Create N Gaussian centers spread evenly across the data range
        self.centers_ = np.linspace(X.min(), X.max(), self.N) 
        self.width_ = self.width_factor * (self.centers_[1] - self.centers_[0]) 
        return self 
    
    def transform(self, X): 
        return self._gauss_basis(X[:, :, np.newaxis], self.centers_, 
         self.width_, axis=1)
         
gauss_model = make_pipeline(GaussianFeatures(20),LinearRegression())

gauss_model.fit(x[:,np.newaxis],y)
yfit = gauss_model.predict(xfit[:,np.newaxis])

plt.scatter(x,y)
plt.plot(xfit,yfit)
plt.xlim(0,10);
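
The broadcasting inside _gauss_basis can be hard to read at first. The sketch below repeats the same arithmetic outside the class, using only NumPy, to show how the array shapes evolve; the variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
X = x[:, np.newaxis]                          # shape (50, 1)

N = 20
centers = np.linspace(X.min(), X.max(), N)    # shape (20,)
width = 2.0 * (centers[1] - centers[0])       # width_factor = 2.0

# (50, 1, 1) - (20,) broadcasts to (50, 1, 20); summing over axis=1
# collapses the single input dimension, leaving one column per center
arg = (X[:, :, np.newaxis] - centers) / width
features = np.exp(-0.5 * np.sum(arg ** 2, axis=1))
print(features.shape)
```

Each column measures how close a sample is to one Gaussian center, so the smallest sample gets a value of 1 in the first column because it coincides with the first center.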

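When you call fit on a Scikit-Learn pipeline that contains transformers (such as the GaussianFeatures class defined above), each transformer's fit method is called automatically, followed immediately by its transform method to transform the data. The transformed data is then passed to the next step in the pipeline. If the next step is another transformer, the process repeats; if it is an estimator (such as a linear regression model), only the estimator's fit method is called, because the estimator is normally the last step of the pipeline.
This automation is there to simplify data preprocessing and model training. Calling the pipeline's fit method actually performs two main steps:
1. For each transformer in the pipeline, call its fit method and then its transform method, in order, to fit the transformer and transform the data.
2. For the last step of the pipeline (usually an estimator), call its fit method on the data produced by all the preceding transformers.
So in the example above, gauss_model.fit(x[:, np.newaxis], y), gauss_model is a pipeline containing the GaussianFeatures transformer. Executing this fit call first invokes GaussianFeatures.fit to determine the centers and width of the Gaussian basis functions, then invokes transform to map the original data x into the new feature space built on those basis functions. The transformed data is then used to train the next step in the pipeline (here, the linear regression model).
This mechanism lets data flow automatically through the sequence of transformations defined in the pipeline and reach the model in a suitable form, without the user having to call fit and transform on each step by hand.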
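
The two steps described above can be written out by hand and compared with the pipeline. This is a sketch under the assumption that a PolynomialFeatures transformer stands in for GaussianFeatures, to keep the example short; the names pipe, transformer, regressor and X_test are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = np.sin(x) + 0.1 * rng.randn(50)
X = x[:, np.newaxis]

# What the pipeline does for us in a single call
pipe = make_pipeline(PolynomialFeatures(3), LinearRegression())
pipe.fit(X, y)

# The same steps written out by hand
transformer = PolynomialFeatures(3)
X_new = transformer.fit(X).transform(X)          # step 1: fit, then transform
regressor = LinearRegression().fit(X_new, y)     # step 2: fit the final estimator

# At prediction time only transform (not fit) is applied before predict
X_test = np.linspace(0, 10, 5)[:, np.newaxis]
by_hand = regressor.predict(transformer.transform(X_test))
print(np.allclose(pipe.predict(X_test), by_hand))
```

Internally the pipeline calls fit_transform on each transformer, which for a class deriving from TransformerMixin defaults to fit followed by transform.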

Example: predicting bicycle traffic

import pandas as pd
counts = pd.read_csv('SeattleBike-master/FremontHourly.csv', index_col='Date',parse_dates=True)
weather = pd.read_csv('SeattleBike-master/SeaTacWeather.csv', index_col='DATE', parse_dates=True)
daily = counts.resample('d').sum()
daily['Total'] = daily.sum(axis=1)
daily = daily['Total'].to_frame()
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'] 
for i in range(7): 
    daily[days[i]] = (daily.index.dayofweek == i).astype(float)

from pandas.tseries.holiday import USFederalHolidayCalendar 
cal = USFederalHolidayCalendar() 
holidays = cal.holidays('2012', '2016') 
daily = daily.join(pd.Series(1, index=holidays, name='holiday')) 
daily['holiday'] = daily['holiday'].fillna(0)

def hours_of_daylight(date, axis=23.44, latitude=47.61):
    """Compute the hours of daylight for the given date"""
    days = (date - pd.Timestamp('2000-12-21')).days
    m = (1. - np.tan(np.radians(latitude)) 
    * np.tan(np.radians(axis) * np.cos(days * 2 * np.pi / 365.25))) 
    return 24. * np.degrees(np.arccos(1 - np.clip(m, 0, 2))) / 180. 
daily['daylight_hrs'] = list(map(hours_of_daylight, daily.index)) 
daily[['daylight_hrs']].plot();
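
A quick sanity check of the daylight formula is to evaluate it at the two solstices. The sketch below copies the same formula into a standalone function that takes datetime.date objects instead of pandas timestamps (an assumption made only to keep the example dependency-light) and evaluates it for two illustrative dates in 2014.

```python
from datetime import date
import numpy as np

def hours_of_daylight(day, axis=23.44, latitude=47.61):
    """Approximate hours of daylight, using the same formula as above."""
    # Days elapsed since a reference winter solstice
    days = (day - date(2000, 12, 21)).days
    m = (1. - np.tan(np.radians(latitude))
         * np.tan(np.radians(axis) * np.cos(days * 2 * np.pi / 365.25)))
    return 24. * np.degrees(np.arccos(1 - np.clip(m, 0, 2))) / 180.

winter = hours_of_daylight(date(2014, 12, 21))
summer = hours_of_daylight(date(2014, 6, 21))
print(f"winter solstice: {winter:.1f} h, summer solstice: {summer:.1f} h")
```

At the default latitude of 47.61 degrees the formula gives a little over 8 hours of daylight in late December and a little under 16 hours in late June.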

# Temperatures are recorded in tenths of a degree Celsius; convert to degrees Celsius
weather['TMIN'] /= 10
weather['TMAX'] /= 10
weather['Temp (C)'] = 0.5 * (weather['TMIN'] + weather['TMAX'])
# Precipitation is recorded in tenths of a millimeter; convert to inches
weather['PRCP'] /= 254 
weather['dry day'] = (weather['PRCP'] == 0).astype(int) 
daily = daily.join(weather[['PRCP', 'Temp (C)', 'dry day']])
daily['annual'] = (daily.index - daily.index[0]).days / 365.
print(daily.head())

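# Drop rows with missing values, which LinearRegression cannot handle
daily = daily.dropna(axis=0, how='any')
column_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'holiday',
    'daylight_hrs', 'PRCP', 'dry day', 'Temp (C)', 'annual']
X = daily[column_names]
y = daily['Total']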

model = LinearRegression(fit_intercept=False)
model.fit(X,y)
daily['predicted'] = model.predict(X)

daily[['Total', 'predicted']].plot(alpha=0.5);
params = pd.Series(model.coef_, index=X.columns)
print(params)

from sklearn.utils import resample 
np.random.seed(1) 
err = np.std([model.fit(*resample(X, y)).coef_ for i in range(1000)], 0)
print(pd.DataFrame({'effect': params.round(0), 'error': err.round(0)}))

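Bootstrap resampling is a statistical method that estimates the distribution of a statistic by repeatedly sampling from the original data with replacement. It is particularly suited to estimating the standard error or confidence interval of a statistic when the theoretical distribution is hard to compute directly. In machine learning and data science, the bootstrap can be used to estimate the uncertainty of model parameters.
The line err = np.std([model.fit(*resample(X, y)).coef_ for i in range(1000)], 0) uses the bootstrap to estimate the uncertainty of the linear regression coefficients. It works as follows:

1. Resample: resample(X, y) draws samples from the original dataset (X, y) at random with replacement, so some original samples may appear several times in the resampled dataset while others may not appear at all.

2. Fit the model: for each resampled dataset, fit the linear regression model with model.fit() and read off its coefficients (.coef_).

3. Compute the standard deviation: repeat steps 1 and 2 a total of 1000 times, then compute the standard deviation of each coefficient across the 1000 fits. This gives an uncertainty estimate for each model parameter, that is, how stable or variable its estimate is.
In this way, err reflects how much each parameter varies across resampled datasets. A large err for a parameter means its estimate is sensitive to the particular sample and therefore more uncertain; a small err means the estimate is stable and little affected by random sampling.
The bootstrap is a nonparametric method that provides uncertainty estimates for many statistics without relying on strict distributional assumptions, which makes it especially useful when the sample is small or the theoretical distribution is unknown.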
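
The three steps can also be spelled out as an explicit loop. The following is a minimal sketch on synthetic data, using NumPy only; the helper fit_coefs and the index-based resampling are illustrative stand-ins for model.fit and sklearn.utils.resample, not the code used above.

```python
import numpy as np

rng = np.random.RandomState(1)
x = 10 * rng.rand(200)
y = 2 * x - 5 + rng.randn(200)
A = np.column_stack([x, np.ones_like(x)])   # columns: slope term, intercept term

def fit_coefs(A, y):
    """Least-squares coefficients (hypothetical helper for this sketch)."""
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

n = len(y)
boot = []
for _ in range(1000):
    idx = rng.randint(0, n, size=n)             # step 1: sample rows with replacement
    boot.append(fit_coefs(A[idx], y[idx]))      # step 2: refit on the resampled data

err = np.std(boot, axis=0)                      # step 3: spread of each coefficient
print("bootstrap std of [slope, intercept]:", err)
```

The intercept usually shows a larger spread than the slope here, because it is extrapolated to x = 0 at the edge of the data.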

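In Python, the * operator in a function call unpacks a sequence (such as a list or tuple) into the function's positional arguments. In model.fit(*resample(X, y)).coef_, resample returns two objects (for example two arrays or two DataFrames), and * passes them as the first and second arguments of model.fit() rather than passing a single sequence containing both.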
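
A tiny standalone illustration of argument unpacking may help. The functions fit and resample_pair below are hypothetical stand-ins written for this sketch; they are not the scikit-learn functions.

```python
def fit(X, y):
    """Hypothetical stand-in for model.fit: just reports what it received."""
    return f"X={X!r}, y={y!r}"

def resample_pair(X, y):
    """Hypothetical stand-in for resample: returns two objects together."""
    return X[::-1], y[::-1]

pair = resample_pair([1, 2, 3], [10, 20, 30])

unpacked = fit(*pair)             # * spreads the pair into two positional arguments
explicit = fit(pair[0], pair[1])  # the equivalent call written out by hand
print(unpacked == explicit)

try:
    fit(pair)                     # without *, the whole pair is passed as X and y is missing
except TypeError as exc:
    print("TypeError:", exc)
```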

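In addition, a random forest model can fit the data somewhat more closely (note that the plot below compares predictions against the same data the model was trained on):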

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)
# model = LinearRegression(fit_intercept=False)
model.fit(X,y)
daily['predicted'] = model.predict(X)
daily[['Total', 'predicted']].plot(alpha=0.5);
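
Since the comparison above is made on the training data, a held-out split gives a fairer picture of accuracy. The sketch below shows one way to do that on synthetic data; the dataset, the 25% test size and the random_state values are assumptions chosen for illustration, not taken from the bicycle example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data (an assumption for this sketch)
rng = np.random.RandomState(1)
X = 10 * rng.rand(300, 1)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# score() returns R^2 on data the models have not seen
linear_r2 = linear.score(X_test, y_test)
forest_r2 = forest.score(X_test, y_test)
print("linear R^2:", linear_r2)
print("forest R^2:", forest_r2)
```

On data like this, where the relationship is strongly nonlinear, the forest scores higher on the held-out set; on the bicycle data the gap would need to be measured the same way before drawing conclusions.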

References:
[1] VanderPlas, Jake. Python Data Science Handbook (Chinese edition) [M]. Posts & Telecom Press, 2018.


2024-03-15 Linear Regression
