Scraping Zhilian Zhaopin job listings with Python 3.6 (handling dynamic loading)

I needed to scrape job listings from Zhilian Zhaopin (zhaopin.com) for work.

1. Investigation

Zhilian no longer requires a login to browse listings, so the cookie can be dropped from the request headers. But the page content is dynamically loaded, so the listing data can't be found in the page source directly; instead, open the browser's developer tools and look for the request that returns the data.



From that request we can get the URL, and opening the URL directly returns the JSON data.
Before sending the request, build the request headers.
A note on the URL's query parameters:
kw: the search keyword
cityId: the city ID
kt: no idea why, but it has to be 3; other values return poorly related results
the remaining parameters don't matter much
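Rather than hard-coding the long query string shown below, requests can assemble and URL-encode it from a dict. A hypothetical sketch using the parameter names listed above (the values come from the captured request; `requests.Request(...).prepare()` only builds the URL and sends nothing over the network):

```python
import requests

# Illustrative parameters, matching the query-string fields described above.
base_url = 'https://fe-api.zhaopin.com/c/i/sou'
params = {
    'start': 0,        # result offset (60 results per page)
    'pageSize': 60,    # results per page
    'cityId': 763,     # city ID
    'kw': 'python',    # search keyword
    'kt': 3,           # must be 3, or relevance degrades
}
# Build (but do not send) the request, just to get the encoded URL.
req = requests.Request('GET', base_url, params=params).prepare()
print(req.url)
```

This keeps the tracking parameters (`at`, `rt`, `_v`, and so on) out of the code entirely; in my testing the API did not require them, but that is an observation, not a guarantee.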

import requests
import pandas as pd

# URL of the first page, used to fetch "python" job listings
url = r'https://fe-api.zhaopin.com/c/i/sou?pageSize=60&cityId=763&workExperience=-1&education=-1&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3&lastUrlQuery=%7B%22jl%22:%22489%22,%22kw%22:%22%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%B8%88%22,%22kt%22:%223%22%7D&at=9c5682b1a4f54de89c899fb7efc7e359&rt=54eaf1be1b8845c089439d53365ea5dd&_v=0.84300214&x-zp-page-request-id=280f6d80d733447fbebafab7b8158873-1541403039080-617179'
# Build the request headers to get past basic anti-scraping checks
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
jobs = []  # one DataFrame per page will be appended here
2. Scraping

Send the request with requests.get and parse the JSON from the response.
The field extraction is shown in the code below.

# Loop over the result offsets to build each page's URL, then request and parse it
for i in range(0, 20001, 60):
    url = 'https://fe-api.zhaopin.com/c/i/sou?start=' + str(i) + r'&pageSize=60&cityId=763&workExperience=-1&education=-1&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3&lastUrlQuery=%7B%22p%22:5,%22jl%22:%22489%22,%22kw%22:%22%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%B8%88%22,%22kt%22:%223%22%7D&at=17a95e7000264c3898168b11c8f17193&rt=57a342d946134b66a264e18fc60a17c6&_v=0.02365098&x-zp-page-request-id=a3f1b317599f46338d56e5d080a05223-1541300804515-144155'
    response = requests.get(url, headers=headers)
    print('Downloading:', 'https://fe-api.zhaopin.com/c/i/sou?start=' + str(i) + '&pageSize=60', '......')
    results = response.json()['data']['results']  # parse the JSON once per page
    name = 'python'
    company = [r['company']['name'] for r in results]
    size = [r['company']['size']['name'] for r in results]
    companyType = [r['company']['type']['name'] for r in results]  # avoid shadowing built-in type()
    positionURL = [r['positionURL'] for r in results]
    workingExp = [r['workingExp']['name'] for r in results]
    eduLevel = [r['eduLevel']['name'] for r in results]
    salary = [r['salary'] for r in results]
    jobName = [r['jobName'] for r in results]
    welfare = [r['welfare'] for r in results]
    city = [r['city']['items'][0]['name'] for r in results]
    createDate = [r['createDate'] for r in results]
    jobs.append(pd.DataFrame({'name': name, 'company': company, 'size': size, 'type': companyType,
                              'positionURL': positionURL, 'workingExp': workingExp, 'eduLevel': eduLevel,
                              'salary': salary, 'jobName': jobName, 'welfare': welfare,
                              'city': city, 'createDate': createDate}))
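The per-field list comprehensions above can also be collapsed into a single pass over the results, building one row dict per posting, which is roughly the optimization hinted at in the closing note. A hypothetical sketch, where `payload` stands in for `response.json()` and the field names follow the JSON keys used above:

```python
import pandas as pd

def parse_page(payload, keyword='python'):
    """Turn one page of the API's JSON payload into a DataFrame, one pass."""
    rows = []
    for item in payload['data']['results']:
        rows.append({
            'name': keyword,
            'company': item['company']['name'],
            'size': item['company']['size']['name'],
            'type': item['company']['type']['name'],
            'positionURL': item['positionURL'],
            'workingExp': item['workingExp']['name'],
            'eduLevel': item['eduLevel']['name'],
            'salary': item['salary'],
            'jobName': item['jobName'],
            'welfare': item['welfare'],
            'city': item['city']['items'][0]['name'],
            'createDate': item['createDate'],
        })
    return pd.DataFrame(rows)
```

Inside the loop this becomes `jobs.append(parse_page(response.json()))`, and a malformed posting fails in one obvious place instead of twelve.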

Finally, export the data to an Excel file; it could also be stored in a database.
Concatenate the listings from all pages:
jobs2 = pd.concat(jobs)

Export the data to an Excel file:
jobs2.to_excel(r'G:\python.xlsx', index = False)
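For the database option mentioned above, a minimal sketch using the standard-library sqlite3 driver and pandas' DataFrame.to_sql. The table name `jobs` and the in-memory database are illustrative assumptions; point `connect()` at a file path for real use:

```python
import sqlite3
import pandas as pd

# Stand-in for the concatenated scrape results (jobs2 in the text above).
jobs2 = pd.DataFrame({'jobName': ['Python Engineer'],
                      'salary': ['10K-15K'],
                      'company': ['Acme']})

conn = sqlite3.connect(':memory:')          # use a file path to persist
jobs2.to_sql('jobs', conn, if_exists='append', index=False)
n = conn.execute('SELECT COUNT(*) FROM jobs').fetchone()[0]
print(n)  # 1
conn.close()
```

`if_exists='append'` means each run adds rows to the existing table, which matches the page-by-page accumulation pattern of the scraper.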

Done. The code above could certainly be optimized; I adapted parts of it from others' work, and since the boss needed results urgently, I left it as is. I'll clean it up when I get a chance.
