
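Requirements

To begin with, each province's name, administrative code, and latitude/longitude had to be imported into a database so that later APIs would have data to serve.
The page to scrape the coordinates from: (it was simply the first one I found)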

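Approach

First fetch the page with WebDriver, then inspect its structure and parse the table that is needed, and finally save the scraped data to an Excel file and import that into the database.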

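Preparation:

Install Selenium WebDriver
pip install selenium
Selenium WebDriver provides programming interfaces in a range of languages for web automation.
Once installation finishes, start the Python interpreter and run import selenium; if no exception is raised, the installation succeeded.
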
Download the browser driver
The web driver for Chrome (chromedriver.exe) can be downloaded from:
http://npm.taobao.org/mirrors/chromedriver/

The web driver for Firefox (geckodriver.exe) is available here:
https://github.com/mozilla/geckodriver/releases

Drivers for other browsers:

Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver

Safari: https://webkit.org/blog/6900/webdriver-support-in-safari-10/

Download the version that matches your browser.

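Install BeautifulSoup
BeautifulSoup4 is an HTML/XML parsing library, like lxml; its main job is to parse HTML/XML and extract data from it.
Parsing HTML with BeautifulSoup4 is fairly simple: the API is very friendly, CSS selectors are supported, and it can use either the HTML parser from the Python standard library or the lxml parser.
pip install beautifulsoup4
The code below passes 'lxml' as the parser, so lxml has to be installed as well:
pip install lxml
Install openpyxl
openpyxl is a Python module for working with Excel spreadsheets.
To install it, just run:
pip install openpyxl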
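
Before moving on, it can help to confirm that every package installed above is importable. The following is a minimal sketch using only the standard library; the helper name check_modules is hypothetical, and the module list simply mirrors the packages mentioned in this post.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


# Top-level import names of the packages used in this post
for name, ok in check_modules(["selenium", "bs4", "openpyxl", "lxml"]).items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Any entry reported as missing can be installed with pip before running the scripts that follow.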

Implementation steps

Import the required modules first

from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl.workbook import Workbook

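Point Selenium at the Chrome driver and maximize the window

# This is my driver path; change it to yours. The raw string keeps the backslashes literal.
# Newer Selenium 4 releases no longer accept the path positionally; there, pass
# service=Service(path) with Service imported from selenium.webdriver.chrome.service.
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
driver.maximize_window()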
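
Since the driver path differs from machine to machine, one option is to resolve it at runtime instead of hard-coding it. This is only a sketch under assumptions: resolve_driver_path is a hypothetical helper, and it assumes the driver is either at a path you pass in or reachable through the PATH environment variable.

```python
import os
import shutil


def resolve_driver_path(explicit_path=None, name="chromedriver"):
    """Return a usable driver path, or None if nothing is found.

    An explicit path wins if the file exists; otherwise PATH is searched.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    return shutil.which(name)


path = resolve_driver_path(
    r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"
)
print(path if path else "chromedriver not found; download it first")
```

The returned path can then be handed to webdriver.Chrome in place of the literal string.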

Open the target URL with the get method

driver.get(
    "https://blog.csdn.net/abcmaopao/article/details/79554904")
html = driver.execute_script("return document.documentElement.outerHTML")

Parse with BeautifulSoup
Parse the page source with the BeautifulSoup class to get a BeautifulSoup object, which can then be output in a normalized format.

soup = BeautifulSoup(html, 'lxml')
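
To make the row/cell extraction that follows easier to picture, here is a dependency-free stand-in that collects the text of every td inside every tr using the standard library's html.parser. It is not the BeautifulSoup code used in this post, only a sketch of the same idea; the class name TableRows, the function table_rows, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample_html = (
    "<table>"
    "<tr><td>Name</td><td>Code</td></tr>"
    "<tr><td> Beijing </td><td>110000</td></tr>"
    "</table>"
)
print(table_rows(sample_html))
```

The result has the same shape as the list built with soup.find_all('tr') in the next step: one inner list per table row.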

Build the city-level Excel sheet

if soup:
    # Create a workbook, take its active sheet, and give the sheet a title
    wb = Workbook()
    ws = wb.active
    ws.title = '省份經緯度'
    # rows holds the scraped table data
    rows = []
    # Turn every table row into a list of its cell texts
    # and append that list to rows
    for tr in soup.find_all('tr'):
        col = []
        for td in tr.find_all('td'):
            col.append(td.get_text().strip())
        if col:  # skip rows that contain no <td> cells
            rows.append(col)
    print(rows)
    # Write the header row, then only the rows whose code (column 1)
    # is not a multiple of 10000, i.e. skip the province-level rows
    i = 0
    for r in rows:
        if i == 0 or int(r[1]) % 10000 != 0:
            for j, c in enumerate(r):
                ws.cell(row=i + 1, column=j + 1).value = c
                print(i, j, c)
            i += 1
    # Save the workbook
    wb.save('市級.xlsx')
    print("Saved successfully!")
    driver.quit()

[Screenshot: structure of the saved sheet]
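
The end goal stated at the top is getting these rows into a database. The post does not show that step, so the following is only a rough sketch using SQLite from the standard library. Everything specific here is an assumption: the table name region, the column order (name, code, longitude, latitude), and the sample values, which are illustrative rather than scraped.

```python
import sqlite3


def import_rows(conn, rows):
    """Insert (name, code, lng, lat) rows and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region ("
        "name TEXT NOT NULL, code TEXT PRIMARY KEY, lng REAL, lat REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region (name, code, lng, lat) VALUES (?, ?, ?, ?)",
        [(name, code, float(lng), float(lat)) for name, code, lng, lat in rows],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM region").fetchone()[0]


# Illustrative rows only; real rows would come from the saved sheet
conn = sqlite3.connect(":memory:")
sample_rows = [
    ("Sample City A", "110100", "116.40", "39.90"),
    ("Sample City B", "130100", "114.51", "38.04"),
]
print(import_rows(conn, sample_rows))
```

Using the code as the primary key with INSERT OR REPLACE means re-running the import updates existing rows instead of duplicating them.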

Full code

from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl.workbook import Workbook

# Adjust the driver path for your machine
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
driver.maximize_window()

driver.get(
    "https://blog.csdn.net/abcmaopao/article/details/79554904")
html = driver.execute_script("return document.documentElement.outerHTML")

soup = BeautifulSoup(html, 'lxml')
if soup:
    wb = Workbook()
    ws = wb.active
    ws.title = '省份經緯度'
    # Collect the cell texts of every table row
    rows = []
    for tr in soup.find_all('tr'):
        col = []
        for td in tr.find_all('td'):
            col.append(td.get_text().strip())
        if col:  # skip rows that contain no <td> cells
            rows.append(col)
    print(rows)
    # Keep the header row plus every row whose code is not a multiple of 10000
    i = 0
    for r in rows:
        if i == 0 or int(r[1]) % 10000 != 0:
            for j, c in enumerate(r):
                ws.cell(row=i + 1, column=j + 1).value = c
                print(i, j, c)
            i += 1

    wb.save('市級.xlsx')
    print("Saved successfully!")
    driver.quit()

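Extracting DataV.GeoAtlas JSON map data for every province and city

Next, the data behind the national map JSON API (http://datav.aliyun.com/tools/atlas/#&lat=30.316551722910077&lng=104.20306438764393&zoom=3.5) needs to be downloaded locally, this time at province level.
Following the same approach, scrape the province-level administrative codes.

In other words, change the condition int(r[1])%10000!=0 to int(r[1])%10000==0, so that only rows whose code is a multiple of 10000 are kept.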
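
The modulo test can be pulled out into a small helper to make the difference between the two runs explicit. This is a sketch, not the code from the post: is_province and split_by_level are hypothetical names, and the sample rows contain only a name and a code.

```python
def is_province(code):
    """Province-level codes end in four zeros, e.g. 110000."""
    return int(code) % 10000 == 0


def split_by_level(rows):
    """Split data rows (header excluded) by the code in column 1."""
    provinces = [r for r in rows if is_province(r[1])]
    others = [r for r in rows if not is_province(r[1])]
    return provinces, others


sample = [
    ["Beijing", "110000"],
    ["Beijing municipal districts", "110100"],
    ["Hebei", "130000"],
    ["Shijiazhuang", "130100"],
]
provinces, others = split_by_level(sample)
print(provinces)
print(others)
```

The first script in this post keeps the rows in others; the province-level run keeps the rows in provinces.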

This time the script reads each province's code, then fetches and saves the corresponding JSON dynamically.

Full code

from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl import load_workbook

def getJson(code):
    # Open the GeoJSON endpoint for this code and save the response text
    path = "https://geo.datav.aliyun.com/areas/bound/geojson?code="
    driver.get(path + code + '_full')
    html = driver.execute_script("return document.documentElement.outerHTML")
    soup = BeautifulSoup(html, 'lxml')
    text = soup.get_text()
    print(text)
    if text:
        with open(code + '_full.json', 'w', encoding='utf-8') as f:
            f.write(text)
        print("Saved " + code)

# Adjust the driver path for your machine
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
driver.maximize_window()
# 100000 is the code for the whole country
getJson("100000")
ws = load_workbook('test1.xlsx')["省份經緯度"]
# Skip the header row, then fetch the GeoJSON for every province code
for i, row in enumerate(ws.rows):
    if i > 0:
        code = str(row[1].value)  # the cell may hold a number, so convert to str
        print(code)
        getJson(code)

driver.quit()
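
Text pulled out of a rendered browser page is not guaranteed to be valid JSON, so it may be worth checking it before writing the file. The sketch below builds the URL the same way as the script above and validates the text with the standard json module. The helper names geojson_url and save_geojson, and the out_dir parameter, are hypothetical.

```python
import json
import os

BASE_URL = "https://geo.datav.aliyun.com/areas/bound/geojson?code="


def geojson_url(code):
    """Build the request URL for a region code, as in the script above."""
    return f"{BASE_URL}{code}_full"


def save_geojson(text, code, out_dir="."):
    """Parse text as JSON and write it to <code>_full.json.

    Raises ValueError (json.JSONDecodeError) if text is not valid JSON,
    so a broken response never ends up on disk.
    """
    data = json.loads(text)
    path = os.path.join(out_dir, f"{code}_full.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    return path


print(geojson_url("100000"))
```

Inside getJson, a call such as save_geojson(soup.get_text(), code) could replace the plain write, with a try/except ValueError around it to log the codes that failed.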

[Python scraper demo archive] Scraping nationwide province codes and extracting nationwide GeoJSON data from DataV.GeoAtlas
