A Simple Statistical Analysis of WeChat Friends with Python

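A while ago someone asked me how many friends I have on WeChat, so I scrolled to the bottom of my contact list and found the total. Then he asked: how many of these people do you actually know? That question left me speechless. With more than a thousand friends, I honestly had no idea how many I knew. He kept pressing: do you know the male-to-female ratio of your WeChat friends? Do you know where most of them come from?

I don't know, I don't know, I don't know! Then one day I came across a Moments post by an expert that was roughly an analysis of their WeChat friends, and it occurred to me that I could run some simple statistics too. Hence today's post, which is organized around brief code walkthroughs and the results they produce.

The code below involves only basic Python, so anyone with a little Python knowledge can follow along. If you don't know Python, you may want to learn some first. You can also skip that and just set up a Python environment; you may need to install a few libraries yourself along the way, and these are described below.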

How should a complete beginner start learning Python? - answer by 路人甲

Which Python tutorials on NetEase Cloud Classroom are worth recommending? - answer by 路人甲

How to learn Python web scraping [beginner's guide] - 學習編程 - Zhihu column

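Step 1: Fetch your WeChat friends' profile data

Since the goal is statistics and analysis, the first step is to fetch every piece of friend data that can be fetched. The useful fields are roughly the following:

Nickname, WeChat ID, city, gender, starred-friend flag, avatar, signature, remark name

Statistics that each field, or a combination of fields, supports:

Gender: gender breakdown of friends

City: geographic distribution of friends

Remark name + nickname: a rough estimate of the share of friends I actually know

Avatar: face detection

So how do we fetch the data? I reused code that an expert wrote earlier for finding friends who have deleted you. My changes start at the point where the JSON response is parsed: from the returned JSON I extract the fields listed above.

The modified code: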

#!/usr/bin/env python
# encoding=utf-8
from __future__ import print_function

import os
import requests
import re
import time
import xml.dom.minidom
import json
import sys
import subprocess
import ssl
import threading

try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

DEBUG = False

MAX_GROUP_NUM = 2  # members per group
# Interval between API calls. If it is too short, WeChat reports
# "operation too frequent" and restricts the account for about half an hour.
INTERFACE_CALLING_INTERVAL = 5
MAX_PROGRESS_LEN = 50

QRImagePath = os.path.join(os.getcwd(), 'qrcode.jpg')

tip = 0
uuid = ''

base_uri = ''
redirect_uri = ''
push_uri = ''

skey = ''
wxsid = ''
wxuin = ''
pass_ticket = ''
deviceId = 'e000000000000000'

BaseRequest = {}

ContactList = []
My = []
SyncKey = []

try:
    xrange
    range = xrange
    input = raw_input
except NameError:
    # python 3
    pass


def responseState(func, BaseResponse):
    ErrMsg = BaseResponse['ErrMsg']
    Ret = BaseResponse['Ret']
    if DEBUG or Ret != 0:
        print('func: %s, Ret: %d, ErrMsg: %s' % (func, Ret, ErrMsg))

    if Ret != 0:
        return False

    return True


def getUUID():
    global uuid

    url = 'https://login.weixin.qq.com/jslogin'
    params = {
        'appid': 'wx782c26e4c19acffb',
        'fun': 'new',
        'lang': 'zh_CN',
        '_': int(time.time()),
    }

    r = myRequests.get(url=url, params=params)
    r.encoding = 'utf-8'
    data = r.text

    # print(data)

    # window.QRLogin.code = 200; window.QRLogin.uuid = "oZwt_bFfRg==";
    regx = r'window.QRLogin.code = (\d+); window.QRLogin.uuid = "(\S+?)"'
    pm = re.search(regx, data)

    code = pm.group(1)
    uuid = pm.group(2)

    if code == '200':
        return True

    return False


def showQRImage():
    global tip

    url = 'https://login.weixin.qq.com/qrcode/' + uuid
    params = {
        't': 'webwx',
        '_': int(time.time()),
    }

    r = myRequests.get(url=url, params=params)

    tip = 1

    f = open(QRImagePath, 'wb')
    f.write(r.content)
    f.close()
    time.sleep(1)

    if sys.platform.find('darwin') >= 0:
        subprocess.call(['open', QRImagePath])
    else:
        subprocess.call(['xdg-open', QRImagePath])

    print('Please scan the QR code with WeChat to log in')


def waitForLogin():
    global tip, base_uri, redirect_uri, push_uri

    url = 'https://login.weixin.qq.com/cgi-bin/mmwebwx-bin/login?tip=%s&uuid=%s&_=%s' % (
        tip, uuid, int(time.time()))

    r = myRequests.get(url=url)
    r.encoding = 'utf-8'
    data = r.text

    # print(data)

    # window.code=500;
    regx = r'window.code=(\d+);'
    pm = re.search(regx, data)

    code = pm.group(1)

    if code == '201':  # scanned
        print('Scan successful; please tap Confirm on your phone to log in')
        tip = 0
    elif code == '200':  # logged in
        print('Logging in...')
        regx = r'window.redirect_uri="(\S+?)";'
        pm = re.search(regx, data)
        redirect_uri = pm.group(1) + '&fun=new'
        base_uri = redirect_uri[:redirect_uri.rfind('/')]

        # Mapping from base_uri to push_uri (order matters, odd as that is)
        services = [
            ('wx2.qq.com', 'webpush2.weixin.qq.com'),
            ('qq.com', 'webpush.weixin.qq.com'),
            ('web1.wechat.com', 'webpush1.wechat.com'),
            ('web2.wechat.com', 'webpush2.wechat.com'),
            ('wechat.com', 'webpush.wechat.com'),
            ('web1.wechatapp.com', 'webpush1.wechatapp.com'),
        ]
        push_uri = base_uri
        for (searchUrl, pushUrl) in services:
            if base_uri.find(searchUrl) >= 0:
                push_uri = 'https://%s/cgi-bin/mmwebwx-bin' % pushUrl
                break

        # closeQRImage
        if sys.platform.find('darwin') >= 0:  # for OSX with Preview
            os.system("osascript -e 'quit app \"Preview\"'")
    elif code == '408':  # timed out
        pass
    # elif code == '400' or code == '500':

    return code


def login():
    global skey, wxsid, wxuin, pass_ticket, BaseRequest

    r = myRequests.get(url=redirect_uri)
    r.encoding = 'utf-8'
    data = r.text

    # print(data)

    doc = xml.dom.minidom.parseString(data)
    root = doc.documentElement

    for node in root.childNodes:
        if node.nodeName == 'skey':
            skey = node.childNodes[0].data
        elif node.nodeName == 'wxsid':
            wxsid = node.childNodes[0].data
        elif node.nodeName == 'wxuin':
            wxuin = node.childNodes[0].data
        elif node.nodeName == 'pass_ticket':
            pass_ticket = node.childNodes[0].data

    # print('skey: %s, wxsid: %s, wxuin: %s, pass_ticket: %s' % (skey, wxsid,
    # wxuin, pass_ticket))

    if not all((skey, wxsid, wxuin, pass_ticket)):
        return False

    BaseRequest = {
        'Uin': int(wxuin),
        'Sid': wxsid,
        'Skey': skey,
        'DeviceID': deviceId,
    }

    return True


def webwxinit():
    url = (base_uri +
           '/webwxinit?pass_ticket=%s&skey=%s&r=%s' % (
               pass_ticket, skey, int(time.time())))
    params = {'BaseRequest': BaseRequest}
    headers = {'content-type': 'application/json; charset=UTF-8'}

    r = myRequests.post(url=url, data=json.dumps(params), headers=headers)
    r.encoding = 'utf-8'
    data = r.json()

    if DEBUG:
        f = open(os.path.join(os.getcwd(), 'webwxinit.json'), 'wb')
        f.write(r.content)
        f.close()

    # print(data)

    global ContactList, My, SyncKey
    dic = data
    ContactList = dic['ContactList']
    My = dic['User']
    SyncKey = dic['SyncKey']

    state = responseState('webwxinit', dic['BaseResponse'])
    return state


def webwxgetcontact():
    url = (base_uri +
           '/webwxgetcontact?pass_ticket=%s&skey=%s&r=%s' % (
               pass_ticket, skey, int(time.time())))
    headers = {'content-type': 'application/json; charset=UTF-8'}

    r = myRequests.post(url=url, headers=headers)
    r.encoding = 'utf-8'
    data = r.json()

    if DEBUG:
        f = open(os.path.join(os.getcwd(), 'webwxgetcontact.json'), 'wb')
        f.write(r.content)
        f.close()

    dic = data
    MemberList = dic['MemberList']

    # Iterate in reverse, otherwise removing items breaks the traversal
    SpecialUsers = ["newsapp", "fmessage", "filehelper", "weibo", "qqmail",
                    "tmessage", "qmessage", "qqsync", "floatbottle", "lbsapp",
                    "shakeapp", "medianote", "qqfriend", "readerapp", "blogapp",
                    "facebookapp", "masssendapp", "meishiapp", "feedsapp", "voip",
                    "blogappweixin", "weixin", "brandsessionholder",
                    "weixinreminder", "wxid_novlwrv3lqwv11", "gh_22b87fa7cb3c",
                    "officialaccounts", "notification_messages", "wxitil",
                    "userexperience_alarm"]
    for i in range(len(MemberList) - 1, -1, -1):
        Member = MemberList[i]
        if Member['VerifyFlag'] & 8 != 0:  # official / service account
            MemberList.remove(Member)
        elif Member['UserName'] in SpecialUsers:  # special account
            MemberList.remove(Member)
        elif Member['UserName'].find('@@') != -1:  # group chat
            MemberList.remove(Member)
        elif Member['UserName'] == My['UserName']:  # myself
            MemberList.remove(Member)

    return MemberList


def syncKey():
    SyncKeyItems = ['%s_%s' % (item['Key'], item['Val'])
                    for item in SyncKey['List']]
    SyncKeyStr = '|'.join(SyncKeyItems)
    return SyncKeyStr


def syncCheck():
    url = push_uri + '/synccheck?'
    params = {
        'skey': BaseRequest['Skey'],
        'sid': BaseRequest['Sid'],
        'uin': BaseRequest['Uin'],
        'deviceId': BaseRequest['DeviceID'],
        'synckey': syncKey(),
        'r': int(time.time()),
    }

    r = myRequests.get(url=url, params=params)
    r.encoding = 'utf-8'
    data = r.text

    # print(data)

    # window.synccheck={retcode:"0",selector:"2"}
    regx = r'window.synccheck={retcode:"(\d+)",selector:"(\d+)"}'
    pm = re.search(regx, data)

    retcode = pm.group(1)
    selector = pm.group(2)

    return selector


def webwxsync():
    global SyncKey

    url = base_uri + '/webwxsync?lang=zh_CN&skey=%s&sid=%s&pass_ticket=%s' % (
        BaseRequest['Skey'], BaseRequest['Sid'], quote_plus(pass_ticket))
    params = {
        'BaseRequest': BaseRequest,
        'SyncKey': SyncKey,
        'rr': ~int(time.time()),
    }
    headers = {'content-type': 'application/json; charset=UTF-8'}

    r = myRequests.post(url=url, data=json.dumps(params), headers=headers)
    r.encoding = 'utf-8'
    data = r.json()

    # print(data)

    dic = data
    SyncKey = dic['SyncKey']

    state = responseState('webwxsync', dic['BaseResponse'])
    return state


def heartBeatLoop():
    while True:
        selector = syncCheck()
        if selector != '0':
            webwxsync()
        time.sleep(1)


def main():
    global myRequests

    if hasattr(ssl, '_create_unverified_context'):
        ssl._create_default_https_context = ssl._create_unverified_context

    headers = {
        'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36'}
    myRequests = requests.Session()
    myRequests.headers.update(headers)

    if not getUUID():
        print('Failed to get uuid')
        return

    print('Fetching QR code image...')
    showQRImage()

    while waitForLogin() != '200':
        pass

    os.remove(QRImagePath)

    if not login():
        print('Login failed')
        return

    if not webwxinit():
        print('Initialization failed')
        return

    MemberList = webwxgetcontact()

    # The thread object is created but never started, so no heartbeat is sent.
    threading.Thread(target=heartBeatLoop)

    MemberCount = len(MemberList)
    print('Contact list has %s friends' % MemberCount)

    d = {}
    imageIndex = 0
    for Member in MemberList:
        # Download the avatar to image<N>.jpg; the target directory must exist.
        imageIndex = imageIndex + 1
        name = '/root/Desktop/friendImage/image' + str(imageIndex) + '.jpg'
        imageUrl = 'https://wx.qq.com' + Member['HeadImgUrl']
        r = myRequests.get(url=imageUrl, headers=headers)
        imageContent = (r.content)
        fileImage = open(name, 'wb')
        fileImage.write(imageContent)
        fileImage.close()
        print('Downloading avatar of friend #' + str(imageIndex))

        d[Member['UserName']] = (Member['NickName'], Member['RemarkName'])

        # Replace empty fields with placeholders so every column has a value.
        city = Member['City']
        city = 'nocity' if city == '' else city
        name = Member['NickName']
        name = 'noname' if name == '' else name
        sign = Member['Signature']
        sign = 'nosign' if sign == '' else sign
        remark = Member['RemarkName']
        remark = 'noremark' if remark == '' else remark
        alias = Member['Alias']
        alias = 'noalias' if alias == '' else alias
        nick = Member['NickName']
        nick = 'nonick' if nick == '' else nick

        # One record per line, fields separated by ' ^+*+^ '
        print(name, ' ^+*+^ ', city, ' ^+*+^ ', Member['Sex'], ' ^+*+^ ',
              Member['StarFriend'], ' ^+*+^ ', sign, ' ^+*+^ ', remark,
              ' ^+*+^ ', alias, ' ^+*+^ ', nick)


if __name__ == '__main__':
    main()
    print('Press Enter to exit...')
    input()
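The contact-filtering step in webwxgetcontact() is the part most worth understanding, because it decides who counts as a "friend" in every later statistic. The following is a minimal sketch of the same four rules (official accounts, special accounts, group chats, yourself) written as a standalone predicate. The helper names is_personal_contact and filter_friends, the shortened SPECIAL_USERS set, and the sample records are illustrative assumptions, not part of the original script.

```python
# Abbreviated for the example; the script above lists the full set.
SPECIAL_USERS = frozenset(['filehelper', 'fmessage', 'weixin', 'newsapp'])


def is_personal_contact(member, my_username, special_users=SPECIAL_USERS):
    """Return True if a MemberList entry looks like an ordinary friend."""
    username = member['UserName']
    if member['VerifyFlag'] & 8 != 0:   # official / service account
        return False
    if username in special_users:       # built-in system account
        return False
    if '@@' in username:                # group chat
        return False
    if username == my_username:         # the logged-in user
        return False
    return True


def filter_friends(member_list, my_username):
    """Build a new list instead of removing items from the list in place."""
    return [m for m in member_list if is_personal_contact(m, my_username)]


# Made-up records that carry only the two fields the filter reads.
sample_members = [
    {'UserName': '@aaa', 'VerifyFlag': 0},
    {'UserName': '@@group1', 'VerifyFlag': 0},
    {'UserName': 'filehelper', 'VerifyFlag': 0},
    {'UserName': '@news', 'VerifyFlag': 24},
    {'UserName': '@me', 'VerifyFlag': 0},
]
friends = filter_friends(sample_members, my_username='@me')
print([m['UserName'] for m in friends])  # prints ['@aaa']
```

Building a new list avoids the reverse iteration that in-place removal requires, at the cost of a second list in memory, which is negligible for a few thousand contacts.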

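The JSON returned by the contact request is shown in the figure below.

Nickname, WeChat ID, city, gender, starred-friend flag, avatar, signature, remark name: I extract these fields, download the avatar images, and do some simple data cleaning. The last column is the WeChat ID, which is not shown here for privacy reasons.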
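The post does not show how the printed records become the CSV file that the next step reads, so here is one possible way to bridge that gap, offered as a sketch only. It assumes each record is a single line whose fields are separated by ^+*+^, as in the script's final print call. The column names are assumptions chosen to line up with the pandas script in Step 2, and parse_line and lines_to_csv are hypothetical helper names.

```python
import csv
import io

SEPARATOR = '^+*+^'
# Assumed column names, in the order the fetch script prints the fields.
COLUMNS = ['name', 'city', 'gender', 'star', 'sign', 'remark', 'alias', 'nick']


def parse_line(line):
    """Split one printed record into a dict; return None for other lines."""
    parts = [part.strip() for part in line.split(SEPARATOR)]
    if len(parts) != len(COLUMNS):
        return None  # progress messages and broken records end up here
    return dict(zip(COLUMNS, parts))


def lines_to_csv(lines, out):
    """Write every parsable record to `out` and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    written = 0
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
            written += 1
    return written


# Made-up sample output: one progress line and one record.
sample = [
    'Downloading avatar of friend #1',
    'Alice  ^+*+^  Nanjing  ^+*+^  2  ^+*+^  0  ^+*+^  nosign'
    '  ^+*+^  noremark  ^+*+^  noalias  ^+*+^  Alice',
]
buffer = io.StringIO()
written = lines_to_csv(sample, buffer)
print(written)
print(buffer.getvalue())
```

Stripping each field here removes the padding spaces around the separator, so later comparisons such as remark == 'noremark' do not depend on stray whitespace. A signature that contains a line break would be split across lines and skipped by this sketch.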

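Step 2: Gender statistics and geographic distribution

I use Python's pandas data analysis library for some simple statistics. If you haven't used it before, the following link covers installation and the basics: 【原】十分鐘搞定pandas (10 Minutes to pandas)

Very basic pandas knowledge is enough to follow along and compute the statistics below:

(1) Male-to-female ratio of all friends

(2) City distribution of all friends

(3) Number of friends I actually know, and their share of all friends

Method: all friends - friends without a remark name - friends whose remark name is identical to their nickname

(4) Male-to-female ratio among the friends I know

Method: split the result of (3) by gender.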

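# -*- coding: UTF-8 -*-
from __future__ import print_function

import pandas as pd

df = pd.read_csv('/root/Desktop/friend02.csv')


def city():
    '''Cities of WeChat friends'''
    address = df['city'].value_counts()
    print(address)


def gender():
    '''Gender breakdown of WeChat friends

    1: male  2: female  anything else: unknown
    '''
    # NOTE: this function reads the 'male' column while judgeGender() reads
    # 'gender'; use whichever name your CSV header actually has.
    gender = df['male'].value_counts()
    print(gender)


def star():
    '''Starred friends

    1: starred  0: not starred
    '''
    star = df['star'].value_counts()
    print(star)


def remark():
    '''Estimate how many friends I know, and split them by gender'''
    remark = df['remark']
    name = df['name']
    remarkCount = 0  # friends with no remark name, or remark equal to nickname
    maleCount = 0
    femaleCount = 0
    for i in range(len(remark)):
        remarkText = str(remark[i]).strip()
        if remarkText == str(name[i]).strip() or remarkText == 'noremark':
            remarkCount = remarkCount + 1
        else:
            if judgeGender(i) == 'male':
                maleCount = maleCount + 1
            elif judgeGender(i) == 'female':
                femaleCount = femaleCount + 1
    print('Total WeChat friends:', len(remark), '\n')
    print('Estimated number of friends I know:', len(remark) - remarkCount, '\n')
    print('Men among them:', maleCount, 'Women among them:', femaleCount)


def judgeGender(index):
    '''Return the gender of the user at the given row

    Parameter: int, row index
    Returns: str
    '''
    gender = df['gender']
    # Compare as stripped strings so both 1 and ' 1 ' are recognized
    value = str(gender[index]).strip()
    if value == '1':
        return 'male'
    elif value == '2':
        return 'female'
    else:
        return 'unknown'


if __name__ == '__main__':
    remark()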
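The row-by-row loop above can also be expressed with boolean masks, which is the more common pandas style. The sketch below is one way to do it, under the assumption that the DataFrame has name, remark and gender columns and that missing remarks are stored as the placeholder noremark. The function name summarize and the demo data are made up for illustration.

```python
import pandas as pd


def summarize(df):
    """Apply the 'known friend' rule from Step 2 with boolean masks."""
    # Compare as stripped strings so padding and integer dtypes do not matter.
    remark = df['remark'].astype(str).str.strip()
    name = df['name'].astype(str).str.strip()
    gender = df['gender'].astype(str).str.strip()

    # Known = has a remark name, and that remark differs from the nickname.
    known = (remark != 'noremark') & (remark != name)
    return {
        'total': len(df),
        'known': int(known.sum()),
        'known_share': float(known.mean()),
        'known_male': int((known & (gender == '1')).sum()),
        'known_female': int((known & (gender == '2')).sum()),
        # Share of each gender code among all friends.
        'gender_share': gender.value_counts(normalize=True).to_dict(),
    }


# Made-up rows, only to exercise the function.
demo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'remark': ['Alice Wang', 'noremark', 'Carol', 'Uncle Dave'],
    'gender': [2, 1, 2, 1],
})
result = summarize(demo)
print(result['known'], result['known_share'])  # prints 2 0.5
```

The heuristic itself is unchanged: it still treats "has a distinct remark name" as a proxy for "someone I know", so it undercounts acquaintances you never bothered to rename.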

I turned the results into simple charts, mainly with Baidu's ECharts. (I have to say that whatever you think of Baidu's other products, ECharts is pretty good. Its website: http://echarts.baidu.com/)
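The post does not include the charting code. As a rough sketch of the hand-off, an ECharts pie series takes its data as a list of objects with name and value keys, so the counts only need to be reshaped and serialized as JSON. The helper to_echarts_pie_data, the label mapping, and the sample values below are assumptions for illustration.

```python
import json
from collections import Counter

# Assumed display labels for the gender codes used in this post.
GENDER_LABELS = {'1': 'Male', '2': 'Female'}


def to_echarts_pie_data(values, labels=None, top=None):
    """Count values and return [{'name': ..., 'value': ...}], largest first."""
    labels = labels or {}
    counts = Counter(str(v).strip() for v in values)
    return [{'name': labels.get(key, key), 'value': count}
            for key, count in counts.most_common(top)]


# Made-up gender column.
genders = ['1', '2', '1', '0', '1']
pie_data = to_echarts_pie_data(genders, GENDER_LABELS)

# Paste the resulting JSON into the `data` field of a pie series.
print(json.dumps(pie_data, ensure_ascii=False))
```

The same helper works for the city column; passing top=10, for example, keeps only the ten most common cities so the chart stays readable.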

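The distribution of friends within Jiangsu Province was drawn with Dituhui (地圖慧). I'm not sure what is going on with the text encoding in the map; it may be a browser problem, and I'll check with another browser later. (Dituhui is very easy to use: http://www.dituhui.com/)

Finally, a map of friends by province.

The last step uses OpenCV to detect faces and count how many WeChat friends use a portrait photo as their avatar. OpenCV stands for Open Source Computer Vision Library. It is a cross-platform computer vision library released under a BSD (open source) license, and it runs on Linux, Windows, and Mac OS. It is lightweight and efficient, consisting of a set of C functions and a small number of C++ classes, with interfaces for Python, Ruby, MATLAB, and other languages, and it implements many general-purpose algorithms in image processing and computer vision.

If you are not familiar with OpenCV, the following resources are a good place to start.

Its official website: http://opencv.org/ (requires some English)

There are also some good blog resources in Chinese, such as these two:

[OpenCV Beginner's Guide] Part 1: Installing OpenCV

[OpenCV] Getting Started Tutorial

The code below iterates over the downloaded avatars and checks whether each one contains a face.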

#!/usr/bin/env python
'''
face detection using haar cascades

USAGE:
    facedetect.py [--cascade <cascade_fn>] [--nested-cascade <cascade_fn>] [<video_source>]
'''

# Python 2/3 compatibility
from __future__ import print_function

import numpy as np
import cv2

# local modules (shipped with the OpenCV Python samples)
from video import create_capture
from common import clock, draw_str


def detect(img, cascade):
    rects = cascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
                                     flags=cv2.CASCADE_SCALE_IMAGE)
    if len(rects) == 0:
        return []
    # Convert (x, y, w, h) to (x1, y1, x2, y2)
    rects[:,2:] += rects[:,:2]
    return rects


def draw_rects(img, rects, color):
    for x1, y1, x2, y2 in rects:
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)


if __name__ == '__main__':
    import sys, getopt
    print(__doc__)

    # Parse the options and load the classifiers once, before the loop
    args, video_src = getopt.getopt(sys.argv[1:], '', ['cascade=', 'nested-cascade='])
    try:
        video_src = video_src[0]
    except IndexError:
        video_src = 0
    args = dict(args)
    cascade_fn = args.get('--cascade', "../../data/haarcascades/haarcascade_frontalface_alt.xml")
    nested_fn = args.get('--nested-cascade', "../../data/haarcascades/haarcascade_eye.xml")

    cascade = cv2.CascadeClassifier(cascade_fn)
    nested = cv2.CascadeClassifier(nested_fn)

    count = 0  # number of avatars in which a face was found
    for i in range(1,1192):
        print(str(i))

        # When video_src cannot be opened, create_capture falls back to a
        # synthetic source whose background is avatar number i.
        cam = create_capture(video_src, fallback='synth:bg=../data/friend/friendImage/image'+str(i)+'.jpg:noise=0.05')
        ret, img = cam.read()
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.equalizeHist(gray)

        rects = detect(gray, cascade)
        vis = img.copy()
        draw_rects(vis, rects, (0, 255, 0))

        if not nested.empty():
            if len(rects) == 0:
                print('none')
            else:
                count = count + 1
                print(str(count))

    input()
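The script above depends on the video and common helper modules from the OpenCV samples and reads each avatar through a synthetic capture. A possibly simpler route, sketched here under stated assumptions, is to read each file directly with cv2.imread and reuse the same detectMultiScale parameters. It assumes the opencv-python package, which exposes the bundled cascade directory as cv2.data.haarcascades. The names make_haar_detector and count_portraits are hypothetical. The detector is passed in as a callable so that the counting logic can be exercised without OpenCV installed.

```python
import os


def make_haar_detector(cascade_path=None):
    """Return a function path -> bool that reports whether a face is found."""
    import cv2  # imported lazily so count_portraits works without OpenCV

    if cascade_path is None:
        # Assumption: opencv-python ships its cascades in this directory.
        cascade_path = os.path.join(cv2.data.haarcascades,
                                    'haarcascade_frontalface_alt.xml')
    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError('could not load cascade: %s' % cascade_path)

    def has_face(path):
        img = cv2.imread(path)
        if img is None:  # missing or unreadable file
            return False
        gray = cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
        rects = cascade.detectMultiScale(gray, scaleFactor=1.3,
                                         minNeighbors=4, minSize=(30, 30))
        return len(rects) > 0

    return has_face


def count_portraits(paths, has_face):
    """Return (number of images with a face, list of those paths)."""
    hits = [path for path in paths if has_face(path)]
    return len(hits), hits


# Intended use (the directory name is an assumption):
#   paths = ['friendImage/image%d.jpg' % i for i in range(1, 1192)]
#   total, portraits = count_portraits(paths, make_haar_detector())
```

Haar cascades are tuned for frontal faces, so side profiles, cartoons, and very small avatars will often be missed; the resulting count is best read as a lower bound.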

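Running the code above gives the final result.

Friends who use a portrait as their avatar: 59, which leaves 1133 who do not. Portrait avatars are evidently rare: 59 out of 1192, or roughly 5%.

The avatars picked out by the portrait-detection code are shown below. I have to say that Python's libraries are really useful. (For privacy reasons, I won't show many avatars here.)

I'm still exploring what else can be done with signatures and avatars, and everyone is welcome to learn and discuss together. I hope this post boosts your interest in learning. More analysis of WeChat friend data will follow:

(1) The relationship between portrait avatars and age (WeChat has no age field, so I plan to infer it via Zhihu)

(2) The relationship between signatures and age or personality

(3) Inferring the age group from information contained in the WeChat ID, and predicting the age of a given WeChat account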

