Preliminary Analysis of Mobike Data

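This post is a preliminary analysis of the riding data provided by Mobike.
The training set is a sample of the data from one area of Beijing over a period of time; the test set covers the same area over a later period.
The labeled data contains 3 million trip records, covering more than 300,000 users and 400,000 Mobike bikes. Each record includes the ride's start time and location, the bike ID, the bike type, the user ID, and other fields.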

First, import the data analysis packages:

import pandas as pd 
import seaborn as sns
import geohash
import matplotlib.pyplot as plt
from math import radians, cos, sin, asin, sqrt
%matplotlib inline

train = pd.read_csv("train.csv",sep = ',',parse_dates=['starttime'])
test = pd.read_csv("test.csv",sep = ',',parse_dates=['starttime'])

Take a first look at the data:

train.head()

[Output: first rows of train]

print(train.shape)
print(test.shape)

[Output: shapes of train and test]

train = train.sample(frac=0.3)  # keep a random 30% sample

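GeoHash analysis

GeoHash converts a two-dimensional latitude/longitude pair into a string. The figure below shows the GeoHash strings of nine areas in Beijing (WX4ER, WX4G2, WX4G3, and so on); each string stands for one rectangular area. In other words, every point (latitude/longitude coordinate) inside that rectangle shares the same GeoHash string. This protects privacy, because the string gives only an approximate area rather than an exact point, and it makes caching easier. For example, suppose users in the top-left area keep sending their positions to request restaurant data. Since all of these users have the GeoHash string WX4ER, WX4ER can serve as the cache key and the restaurant information for that area as the value. Without GeoHash, every user in the area sends slightly different coordinates, which makes caching difficult.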
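The longer the string, the more precise the area it denotes. A 5-character code denotes a cell of about 4.9 km by 4.9 km at the equator, while a 6-character code denotes a finer cell of about 1.2 km by 0.6 km; cells get narrower east-west at higher latitudes such as Beijing's.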
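Strings that share a long prefix denote nearby locations, so prefix matching can be used to query nearby POIs. The converse does not always hold: two points on opposite sides of a cell boundary can be close yet share only a short prefix, so neighboring cells are worth checking as well.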
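To make the encoding concrete, here is a minimal pure-Python sketch of GeoHash encoding. The helper names `encode` and `cell_size_deg` are illustrative only and are not part of the `geohash` package imported above, and the coordinates are an assumed example point in central Beijing. In the analysis itself the locations already arrive geohashed, so nothing here is needed to run the notebook.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a GeoHash string (illustrative sketch)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)

def cell_size_deg(precision):
    """Return (height in degrees of latitude, width in degrees of longitude)."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits

point = (39.9042, 116.4074)  # assumed example point in central Beijing
code5 = encode(*point, 5)
code6 = encode(*point, 6)
print(code5, code6)  # the 6-character code extends the 5-character one
print(cell_size_deg(5), cell_size_deg(6))
```

Multiplying the latitude size by roughly 111 km per degree gives the cell height; the east-west width additionally shrinks with the cosine of the latitude.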

[Figure: GeoHash cells over Beijing]

GeoHash code length and error

[Table: GeoHash code length versus error]

Decoding the geohash information

def processData(df):
    # Time features
    df['weekday'] = df['starttime'].dt.weekday  # 0 = Monday ... 6 = Sunday
    df['hour'] = df['starttime'].dt.hour
    df['day'] = df['starttime'].dt.strftime('%Y-%m-%d')
    print('time process successfully')

    # Geohash: decode each code to (latitude, longitude) and list its neighboring cells
    df['start_lat_lng'] = df['geohashed_start_loc'].apply(lambda s: geohash.decode(s))
    df['end_lat_lng'] = df['geohashed_end_loc'].apply(lambda s: geohash.decode(s))
    df['start_neighbors'] = df['geohashed_start_loc'].apply(lambda s: geohash.neighbors(s))

    # Same information on the coarser 6-character grid
    df['geohashed_start_loc_6'] = df['geohashed_start_loc'].str[:6]
    df['geohashed_end_loc_6'] = df['geohashed_end_loc'].str[:6]
    df['start_neighbors_6'] = df['geohashed_start_loc_6'].apply(lambda s: geohash.neighbors(s))

    # ... and on the 5-character grid
    df['geohashed_start_loc_5'] = df['geohashed_start_loc'].str[:5]
    df['geohashed_end_loc_5'] = df['geohashed_end_loc'].str[:5]
    df['start_neighbors_5'] = df['geohashed_start_loc_5'].apply(lambda s: geohash.neighbors(s))

    print('geohash process successfully')

    # 1 if the trip ends in the start cell or one of its neighboring cells, else 0.
    # The neighbor list is only read, never modified, so the stored column stays intact.
    def inGeohash(start_geohash, end_geohash, names):
        return int(end_geohash == start_geohash or end_geohash in names)
    df['inside'] = df.apply(lambda s: inGeohash(s['geohashed_start_loc'], s['geohashed_end_loc'], s['start_neighbors']), axis=1)
    df['inside_6'] = df.apply(lambda s: inGeohash(s['geohashed_start_loc_6'], s['geohashed_end_loc_6'], s['start_neighbors_6']), axis=1)
    df['inside_5'] = df.apply(lambda s: inGeohash(s['geohashed_start_loc_5'], s['geohashed_end_loc_5'], s['start_neighbors_5']), axis=1)
    print('geo_inside process successfully')

    # Haversine distance in meters between two longitude/latitude points, start -> end
    def haversine(lon1, lat1, lon2, lat2):
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dislon = lon2 - lon1
        dislat = lat2 - lat1
        a = sin(dislat/2)**2 + cos(lat1)*cos(lat2)*sin(dislon/2)**2
        c = 2*asin(sqrt(a))
        r = 6371  # mean Earth radius in kilometers
        return c*r*1000
    df['start_end_distance'] = df.apply(lambda s: haversine(s['start_lat_lng'][1], s['start_lat_lng'][0],
                                                            s['end_lat_lng'][1], s['end_lat_lng'][0]), axis=1)
    print('distance process successfully')
    return df

train =processData(train)
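As a sanity check on the distance formula used in `processData`, the sketch below repeats the same haversine computation as a standalone function and applies it to two points exactly one degree of latitude apart, which should come out at about 111 km. The function name `haversine_m` and the test coordinates are assumptions made for illustration.

```python
from math import radians, cos, sin, asin, sqrt

EARTH_RADIUS_KM = 6371  # mean Earth radius, same value as in processData

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * EARTH_RADIUS_KM * 1000

# Two assumed points on the same meridian, one degree of latitude apart
one_degree_lat = haversine_m(116.0, 39.0, 116.0, 40.0)
same_point = haversine_m(116.0, 39.0, 116.0, 39.0)
print(round(one_degree_lat), same_point)
```

Because each geohash is decoded to a single representative point of its cell, `start_end_distance` measures the distance between cells rather than between the exact start and end positions of a ride.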

Inspect the processed data:

[Output: train after processing]

Analysis by time period

def timeanalysis(df):
    # Days covered by the dataset
    print("Days in the dataset:")
    print(df['day'].unique())
    print("*"*60)

    # Trips per day of week (0 = Monday ... 6 = Sunday)
    g1 = df.groupby('weekday')
    print("Trips per day of week")
    print(pd.DataFrame(g1['orderid'].count()))
    print("*"*60)

    # Flag weekends (Saturday = 5, Sunday = 6) and group by weekend flag and hour
    df['isweekend'] = (df['weekday'] >= 5).astype(int)
    g2 = df.groupby(['isweekend','hour'])

    # Count the distinct weekend days and weekdays in the data
    g3 = df.groupby(['day','weekday'])
    w = 0  # number of weekend days
    c = 0  # number of weekdays
    for i, j in list(g3.groups.keys()):
        if j >= 5:
            w += 1
        else:
            c += 1

    # Average trips per hour per day, separately for weekdays and weekends
    temp = pd.DataFrame(g2['orderid'].count()).reset_index()
    temp['orderid'] = temp['orderid'].astype(float)
    temp.loc[temp['isweekend']==0,'orderid'] = temp['orderid']/c
    temp.loc[temp['isweekend']==1,'orderid'] = temp['orderid']/w
    print("Average trips per hour per day: weekend vs. weekday")
    fig = plt.figure(figsize=(12,6))
    # Keyword arguments: recent seaborn versions no longer accept x and y positionally
    sns.barplot(x='hour', y='orderid', hue='isweekend', data=temp)

timeanalysis(train)

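[Output: days in the dataset and trips per day of week]

Average trips per hour per day: weekend vs. weekday

[Figure: bar chart of average hourly trips, weekend vs. weekday]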

Data visualization

# Descriptive statistics of trip distance
train['start_end_distance'].describe()

[Output: summary statistics of start_end_distance]

sns.distplot(train['start_end_distance'])

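[Figure: distribution of start_end_distance]

Exclude extreme values (trips of 5,000 m or more) to see the bulk of the distribution: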

start_end_distance = train['start_end_distance']
start_end_distance=start_end_distance.loc[start_end_distance<5000]
sns.distplot(start_end_distance)

[Figure: distribution of start_end_distance below 5,000 m]

# Does trip distance differ by hour of day?
hour_group = train.groupby('hour')
hour_distance = hour_group['start_end_distance'].mean().reset_index()
sns.barplot(x='hour',y='start_end_distance',data=hour_distance)

[Figure: mean start_end_distance by hour]

Time of day has little effect on the mean trip distance.

# Number of trips by hour
hour_id_num = hour_group['orderid'].count().reset_index()
sns.barplot(x='hour',y='orderid',data=hour_id_num)

[Figure: number of trips by hour]

The morning and evening rush hours see the most trips.

isw_hour_group =train.groupby(['isweekend','hour'])
isw_hour_id_num =isw_hour_group['orderid'].count().reset_index()
fig = plt.figure(figsize=(10,6))
sns.barplot(x='hour',y='orderid',hue='isweekend',data=isw_hour_id_num)

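plt.title("Total trips per hour: weekend vs. weekday")

[Figure: total trips per hour, weekend vs. weekday]

Weekdays show clear morning and evening peaks, while on weekends trips are spread fairly evenly across the daytime.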
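Because a week has five weekdays but only two weekend days, raw totals favor weekdays. Dividing by the number of distinct days in each group makes the two curves comparable, which is the normalization `timeanalysis` applies. A minimal standard-library sketch of that idea follows; the timestamps are made up and `avg_trips_per_hour` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter
from datetime import datetime

def avg_trips_per_hour(start_times):
    """Average trips per hour per day, keyed by (is_weekend, hour)."""
    trips = Counter()
    days = {0: set(), 1: set()}  # distinct calendar days seen in each group
    for t in start_times:
        is_weekend = int(t.weekday() >= 5)  # Saturday = 5, Sunday = 6
        trips[(is_weekend, t.hour)] += 1
        days[is_weekend].add(t.date())
    return {key: n / len(days[key[0]]) for key, n in trips.items()}

# Hypothetical sample: 2017-05-15 and 2017-05-16 are weekdays, 2017-05-20 is a Saturday
sample = [
    datetime(2017, 5, 15, 8, 5),
    datetime(2017, 5, 15, 8, 40),
    datetime(2017, 5, 16, 8, 15),
    datetime(2017, 5, 16, 8, 55),
    datetime(2017, 5, 20, 8, 30),
    datetime(2017, 5, 20, 14, 0),
]
averages = avg_trips_per_hour(sample)
print(averages)
```

In this toy sample the weekday total at 08:00 is four trips, but the per-day average is two, since those trips are spread over two weekdays.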

Analysis of trip origins and destinations

How many users and bikes depart from or arrive at each location per day

def analysis_1(data,target):
    g1 = data.groupby(['day',target])
    group_data = g1.agg({'orderid':'count','userid':'nunique','bikeid':'nunique'}).reset_index()
    for each in ['orderid','userid','bikeid']:
        sns.distplot(group_data[each])
        plt.show()
    return group_data

group_data = analysis_1(train,'geohashed_start_loc')

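[Figures: distributions of daily orders, users, and bikes per start location]

group_data_6 = analysis_1(train,'geohashed_start_loc_6')

[Figures: distributions of daily orders, users, and bikes per 6-character start cell]

Origin-destination pair analysis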

start_end = train.groupby(['day','geohashed_start_loc','geohashed_end_loc'])
# Orders, distinct users, distinct bikes, and mean distance for each origin-destination pair per day
start_end.agg({'orderid':'count','userid':'nunique','bikeid':'nunique',
               'start_end_distance':'mean'}).reset_index().sort_values(by='orderid',ascending=False)

[Output: origin-destination pairs sorted by number of orders]

Number of trips whose start and end points fall in different cells:

# Trips whose start and end differ at the 5-character geohash level
train.loc[train['geohashed_start_loc_5']!=train['geohashed_end_loc_5']].shape[0]

225562

# Trips whose start and end differ at the 6-character geohash level
train.loc[train['geohashed_start_loc_6']!=train['geohashed_end_loc_6']].shape[0]

772933
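The two counts above compare start and end cells at a fixed prefix length. The same comparison can be sketched for any prefix length in plain Python; the trip pairs below are made up for illustration and `share_leaving_cell` is a hypothetical helper, not part of the original notebook.

```python
def share_leaving_cell(trips, prefix_len):
    """Fraction of (start, end) geohash pairs whose first prefix_len characters differ."""
    leaving = sum(1 for start, end in trips if start[:prefix_len] != end[:prefix_len])
    return leaving / len(trips)

# Hypothetical 7-character geohash pairs (start, end)
trips = [
    ("wx4g0bm", "wx4g0bq"),  # same 6-character cell
    ("wx4g0bm", "wx4g0cx"),  # same 5-character cell, different 6-character cell
    ("wx4g0bm", "wx4g2p1"),  # different 5-character cell
    ("wx4ermc", "wx4ermc"),  # start and end identical
]
for n in (5, 6, 7):
    print(n, share_leaving_cell(trips, n))
```

A coarser grid can never show more trips leaving than a finer one, since sharing a longer prefix implies sharing every shorter prefix. This is consistent with the counts above (225562 at 5 characters versus 772933 at 6).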

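That is as far as this visual analysis of the Mobike data goes for now. The main steps were:

Decoding the area geohashes and computing distances from the decoded coordinates
Visualizing rides by time period, which showed how weekday and weekend usage differ, while time of day had little effect on riding distance
Analyzing trip origins to identify high-frequency locations
Analyzing origin-destination pairs to identify high-frequency routes
Checking, at several geohash precisions (the full code, 6 characters, and 5 characters), whether users ride out of the geohash cell they start in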
