Scraping Weather Conditions Across China

First, open the North China text-forecast page on China Weather Network (中國天氣網) [http://www.weather.com.cn/textFC/hb.shtml].

Next, right-click the page and choose Inspect to view its HTML in the browser's developer tools.

Find the tag that contains the content we want to scrape:

    conMidtab = soup.find('div', class_="conMidtab")
    tables = conMidtab.find_all('table')
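
To see what these two calls return without touching the network, here is a minimal sketch that runs them on a made-up HTML fragment. The fragment is an assumption for illustration and far simpler than the real page, and the sketch uses Python's built-in 'html.parser' backend instead of html5lib so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the real page markup.
SAMPLE_HTML = """
<div class="conMidtab">
  <table><tr><td>北京</td></tr></table>
  <table><tr><td>天津</td></tr></table>
</div>
<div class="conMidtab">
  <table><tr><td>ignored</td></tr></table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching element (or None if nothing matches).
conMidtab = soup.find("div", class_="conMidtab")

# find_all() returns every <table> nested inside that first div.
tables = conMidtab.find_all("table")
first_cells = [table.find("td").get_text() for table in tables]
print(len(tables), first_cells)
```

Because find() stops at the first match, the table in the second div is never visited. If the real page contains several divs with this class, the same applies there.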

Then, in each table, take the rows that hold the data, skipping the first two rows (the table headers):

    for table in tables:
        trs = table.find_all('tr')[2:]

Next, extract the content:

        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            # The first row of each table has an extra leading cell,
            # so its columns are shifted one position to the right.
            offset = 1 if index == 0 else 0
            city_td = tds[offset]
            weather_td = tds[offset + 1]
            # Take the first non-empty text fragment in each cell.
            city = list(city_td.stripped_strings)[0]
            weather = list(weather_td.stripped_strings)[0]
            print({"city": city, "weather": weather})
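
The special handling of `index == 0` is needed because the first data row of each table carries one extra leading cell, presumably the province name spanning the rows below it, which shifts every column one position to the right. The following is a minimal sketch of that offset logic on plain Python lists. The function name `pick_city_and_weather` and the row contents are invented for illustration and are not taken from the live page.

```python
def pick_city_and_weather(cells, is_first_row):
    """Return (city, weather) from the cell texts of one row.

    The first row has an extra leading cell, so its columns are shifted by one.
    """
    offset = 1 if is_first_row else 0
    return cells[offset], cells[offset + 1]


# Hypothetical rows: the first one starts with an extra province cell.
rows = [
    ["北京", "北京", "晴", "南风"],
    ["海淀", "晴", "南风"],
]

results = [
    pick_city_and_weather(cells, is_first_row=(index == 0))
    for index, cells in enumerate(rows)
]
print(results)
```

Keeping the offset in one place avoids repeating the same condition for every column that is read.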


The complete code is as follows:

import requests
from bs4 import BeautifulSoup


def parse_page(url):
    # Send a browser-like User-Agent with the request.
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"}
    response = requests.get(url, headers=headers, timeout=10)
    text = response.content.decode('utf-8')
    # html5lib is a lenient parser (installed separately) that tolerates malformed HTML.
    soup = BeautifulSoup(text, 'html5lib')
    # Use the first div with class "conMidtab" and every table inside it.
    conMidtab = soup.find('div', class_="conMidtab")
    tables = conMidtab.find_all('table')
    for table in tables:
        # Skip the first two rows, which are table headers.
        trs = table.find_all('tr')[2:]
        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            # The first row of each table has an extra leading cell,
            # so its columns are shifted one position to the right.
            offset = 1 if index == 0 else 0
            city_td = tds[offset]
            weather_td = tds[offset + 1]
            # Take the first non-empty text fragment in each cell.
            city = list(city_td.stripped_strings)[0]
            weather = list(weather_td.stripped_strings)[0]
            print({"city": city, "weather": weather})



def main():
    # A list (rather than a set) keeps the pages in a fixed, predictable order.
    urls = ['http://www.weather.com.cn/textFC/hb.shtml',
            'http://www.weather.com.cn/textFC/hn.shtml',
            'http://www.weather.com.cn/textFC/db.shtml',
            'http://www.weather.com.cn/textFC/hd.shtml',
            'http://www.weather.com.cn/textFC/hz.shtml',
            'http://www.weather.com.cn/textFC/xb.shtml',
            'http://www.weather.com.cn/textFC/xn.shtml',
            'http://www.weather.com.cn/textFC/gat.shtml']
    for url in urls:
        parse_page(url)

if __name__ == '__main__':
    main()
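
The eight URLs in main() differ only in the short region code before `.shtml`, so the list could also be generated rather than typed out. This is a sketch only: it assumes the codes are exactly those that appear in the URLs above, and the names `BASE_URL`, `REGION_CODES`, and `build_urls` are made up for this example.

```python
# URL template; {code} is replaced by a region code such as "hb".
BASE_URL = "http://www.weather.com.cn/textFC/{code}.shtml"

# Region codes copied from the URLs used in main().
REGION_CODES = ["hb", "hn", "db", "hd", "hz", "xb", "xn", "gat"]


def build_urls(codes=REGION_CODES):
    """Return one text-forecast URL per region code, in the given order."""
    return [BASE_URL.format(code=code) for code in codes]


urls = build_urls()
for url in urls:
    print(url)
```

With this in place, adding or removing a region means editing one short code instead of a full URL.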

The output looks like this (partial results; the Beijing rows are shown as an example):
{'city': '北京', 'weather': '晴'}
{'city': '海淀', 'weather': '晴'}
{'city': '朝陽', 'weather': '晴'}
{'city': '順義', 'weather': '晴'}
{'city': '懷柔', 'weather': '晴'}
{'city': '通州', 'weather': '晴'}
{'city': '昌平', 'weather': '晴'}
{'city': '延慶', 'weather': '晴'}
{'city': '豐臺', 'weather': '晴'}
{'city': '石景山', 'weather': '晴'}
{'city': '大興', 'weather': '晴'}
{'city': '房山', 'weather': '晴'}
{'city': '密云', 'weather': '晴'}
{'city': '門頭溝', 'weather': '晴'}
{'city': '平谷', 'weather': '晴'}
{'city': '東城', 'weather': '晴'}
{'city': '西城', 'weather': '晴'}
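
Printing dictionaries is fine for a quick check, but collecting them in a list makes the results reusable. Here is a small sketch, assuming parse_page were changed to return its records instead of printing them (the code above does not do this). The two sample records are copied from the output shown here.

```python
import json

# Sample records in the same shape as the printed output.
records = [
    {"city": "北京", "weather": "晴"},
    {"city": "海淀", "weather": "晴"},
]

# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
text = json.dumps(records, ensure_ascii=False, indent=2)
print(text)

# Round-trip check: parsing the text gives back the same records.
restored = json.loads(text)
```

The resulting string can be written to a file opened with `encoding='utf-8'` so the city names survive intact.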
