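This is my first time writing a post, so the code style may be ugly and the wording clumsy; please bear with me.
On to the topic. I was handed a last-minute task: scrape the trending search terms from Qichacha and refresh them on a schedule. Below is the page to be scraped.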

[Image: the Qichacha top-search page to be scraped]
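I had written parsing code for this page before, but that was so long ago that I can no longer find it, which is a bit annoying. On the bright side, the page has no anti-scraping measures. Without further ado, here is the code.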
import time

import requests
from lxml import etree

# `headers` (the request headers) and `conn` (an open database connection)
# are assumed to be defined elsewhere; the post does not show them.


def qcc_reci():
    # Wrapped in qcc_reci() because that is the name the scheduler code calls.
    url = 'https://www.qichacha.com/cms_topsearch'
    ht = requests.get(url=url, headers=headers)
    et = etree.HTML(ht.text)
    uls = et.xpath('//ul[@class="list-group topsearch-list"][1]/a')
    # All three rankings come back in one list of links; slice it per ranking.
    groups = [
        ('今日熱搜', uls[:51]),     # today's top searches
        ('一周熱搜', uls[51:101]),  # this week's top searches
        ('一月熱搜', uls[101:]),    # this month's top searches
    ]
    # %s placeholders assume a driver such as pymysql; adjust for other drivers.
    sql = ('insert into top_search(type_,company,search_num,company_url,sj_time) '
           'values(%s,%s,%s,%s,%s)')
    cursor = conn.cursor()
    for type_, links in groups:
        for ul in links:
            search_num = ul.xpath('./span[last()]/text()')[0]
            company = ul.xpath('./span[last()-1]/text()')[0]
            company_url = 'https://www.qichacha.com' + ul.xpath('./@href')[0]
            date = time.strftime('%Y-%m-%d %H:%M:%S')
            print(company, search_num, company_url, date)
            # Pass the values as parameters instead of formatting them with %r.
            cursor.execute(sql, (type_, company, search_num, company_url, date))
            conn.commit()
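Here is a minimal sketch of the top_search insert with the values passed as query parameters, so the database driver handles the quoting. It assumes an in-memory SQLite database as a stand-in, since the post does not say which database or driver conn uses. The save_rows helper, the table column types, and the sample row are all hypothetical. Note that sqlite3 uses ? placeholders where drivers such as pymysql use %s.

```python
import sqlite3
import time


def save_rows(conn, rows):
    """Insert (type_, company, search_num, company_url) tuples into top_search."""
    sql = (
        "insert into top_search(type_, company, search_num, company_url, sj_time) "
        "values (?, ?, ?, ?, ?)"
    )
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    # executemany binds each tuple to the placeholders; no manual quoting needed.
    conn.executemany(sql, [row + (now,) for row in rows])
    conn.commit()


# In-memory stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table top_search("
    "type_ text, company text, search_num text, company_url text, sj_time text)"
)

# Hypothetical sample row; the quote in the name would break naive string formatting.
save_rows(conn, [("今日熱搜", "O'Example Ltd", "1234", "https://www.qichacha.com/firm_x")])
print(conn.execute("select company, search_num from top_search").fetchall())
```

The row containing an apostrophe is stored unchanged, which is the main reason to prefer parameters over building the SQL string by hand.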
The page parsing is fairly simple; as a beginner, I mainly wanted to get familiar with the workflow.
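To make the XPath expressions concrete, here is a small offline sketch. The HTML fragment is a made-up stand-in for the real page, whose actual markup may differ. The sketch uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages. ElementTree supports the last() and last()-1 predicates but not text() or @href steps, so those values are read through .text and .get() instead. The parse_links name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the list the scraper expects.
SAMPLE = """
<ul class="list-group topsearch-list">
  <a href="/firm_a"><span>1</span><span>Company A</span><span>1200</span></a>
  <a href="/firm_b"><span>2</span><span>Company B</span><span>950</span></a>
</ul>
"""

BASE = "https://www.qichacha.com"


def parse_links(fragment):
    """Return (company, search_num, company_url) for every <a> in the fragment."""
    root = ET.fromstring(fragment)
    rows = []
    for a in root.findall("./a"):
        search_num = a.find("./span[last()]").text   # last span: search count
        company = a.find("./span[last()-1]").text    # second-to-last span: name
        rows.append((company, search_num, BASE + a.get("href")))
    return rows


for row in parse_links(SAMPLE):
    print(row)
```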
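Next, turn it into a scheduled task. I used the schedule library, which is a third-party package (installed with pip install schedule), not part of the Python standard library. Some typical ways to schedule a job: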
schedule.every(1).minutes.do(job)               # run the job every minute
schedule.every().hour.do(job)                   # run the job every hour
schedule.every().day.at("10:30").do(job)        # run the job every day at 10:30
schedule.every(5).to(10).days.do(job)           # run the job at a random interval of 5 to 10 days
schedule.every().monday.do(job)                 # run the job every Monday at this time of day
schedule.every().wednesday.at("13:15").do(job)  # run the job every Wednesday at 13:15
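import time

import schedule


def seach():
    # Run the scraper (qcc_reci) every 20 seconds.
    schedule.every(20).seconds.do(qcc_reci)
    while True:
        schedule.run_pending()  # run every job that is due
        time.sleep(1)           # poll once per second


seach()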
run_pending: runs every job that is currently due to run.
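To show what a run_pending-style loop does, here is a simplified sketch written from scratch. It is not the schedule library's implementation. The Job and MiniScheduler names are made up for illustration, and the clock is passed in explicitly so the behaviour can be checked without waiting.

```python
class Job:
    """A function plus the interval (in seconds) at which it should run."""

    def __init__(self, interval, func, now):
        self.interval = interval
        self.func = func
        self.next_run = now + interval  # first run is one interval from now

    def should_run(self, now):
        return now >= self.next_run

    def run(self, now):
        self.func()
        self.next_run = now + self.interval  # reschedule after each run


class MiniScheduler:
    def __init__(self):
        self.jobs = []

    def every(self, interval, func, now):
        self.jobs.append(Job(interval, func, now))

    def run_pending(self, now):
        # Run only the jobs whose next_run time has been reached.
        for job in self.jobs:
            if job.should_run(now):
                job.run(now)


calls = []
scheduler = MiniScheduler()
scheduler.every(20, lambda: calls.append("tick"), now=0)

# Simulate polling once per second for 60 seconds.
for second in range(1, 61):
    scheduler.run_pending(now=second)

print(len(calls))  # prints 3 (runs at seconds 20, 40 and 60)
```

In the real loop, time.sleep(1) plays the role of the one-second step: the loop wakes up, runs whatever is due, and goes back to sleep.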
This is my first post on Jianshu, so there is a lot of formatting I do not know how to use yet.