Scraping Bank Information for Every City via Amap (Gaode Maps)

Goal: use the Amap search API to collect the branch information of every bank in every city.

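Approach:

  1. The local MySQL database stores the name and code of every city in the country

  2. Read the city codes into a list

  3. For each code in the list, assemble the URL and call the API (the code below sends a plain GET request through urllib.request.urlopen)

  4. Parse the fields we need out of the returned XML and insert them into MySQL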

Step 1: define the basic parameters for calling the API

file_name='result.txt'          # write result to this file
url_header='http://restapi.amap.com/v3/place/text?&keyword=&types=160100&'
url_end='&citylimit=true&&output=xml&offset=20&page=1&key=c787ae8e49424a657127c3ed64cfe053&extensions=base'
url_amap='city='
each_page_rec=20          # number of results per page (matches offset=20 in url_end)
xml_file='tmp.xml'           # temporary XML file name
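The URL above is assembled from string fragments, which makes it easy to end up with a stray `&` or to forget to escape a value. As a rough sketch only, the same query string could be built with `urllib.parse.urlencode`. The function name `build_url`, the placeholder key `YOUR_AMAP_KEY`, and the sample city code `110100` are hypothetical and used for illustration; the parameter names are copied from the URL fragments above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://restapi.amap.com/v3/place/text'


def build_url(city, page=1, key='YOUR_AMAP_KEY', offset=20):
    """Build the search URL for one city and one result page (illustrative helper)."""
    params = {
        'keyword': '',          # same (empty) keyword as in url_header
        'types': '160100',      # type code used in url_header
        'city': city,
        'citylimit': 'true',
        'output': 'xml',
        'offset': offset,       # results per page
        'page': page,
        'key': key,             # placeholder, replace with your own key
        'extensions': 'base',
    }
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    # 110100 is only a sample city code for the demo
    print(build_url('110100', page=2))
```

With a helper like this, moving to the next page means calling it again with a different `page` value instead of rewriting the URL string.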

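Step 2: connect to the local database and fetch all city codes
First, create a region table in the local MySQL database. A nationwide province/city/district code table can be downloaded from the web; its structure is shown in the figure below:

[Figure: province/city/district code table]

For convenience, I exported the table structure and data as SQL statements that can be pasted straight into MySQL and executed. The link is http://www.itdecent.cn/p/0b9b0e3cda5f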

def getallcity():
    cityarr = []
    connection = pymysql.connect(host='127.0.0.1', user='root', passwd='123456', port=3306,
                                 db='icoachu', charset="utf8")
    cursor = connection.cursor()
    # cities are the direct children of the top-level rows (parent_id=0)
    sql = "select id from region where parent_id in (select id from region where parent_id=0)"
    try:
        cursor.execute(sql)
        for row in cursor.fetchall():
            cityarr.append(row[0])
    finally:
        # always release the cursor and the connection, even on error
        cursor.close()
        connection.close()
    return cityarr

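Connecting to a local MySQL database is straightforward, so I will not go into it here. The one point worth stressing is that when the query runs inside a try statement, the cursor and the connection must be closed in the finally clause.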
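To make the query and the cleanup rule concrete without needing a MySQL server, here is a small sketch that uses the standard-library `sqlite3` module instead of pymysql. The table layout (`id`, `name`, `parent_id`) and the sample rows are assumptions for the demo, since the real structure is only shown in the figure; `contextlib.closing` plays the role of the finally block.

```python
import sqlite3
from contextlib import closing


def city_ids(connection):
    """Return ids of rows whose parent is a top-level row (parent_id = 0)."""
    sql = ("select id from region "
           "where parent_id in (select id from region where parent_id = 0) "
           "order by id")
    # closing() guarantees cursor.close() runs even if execute() raises
    with closing(connection.cursor()) as cursor:
        cursor.execute(sql)
        return [row[0] for row in cursor.fetchall()]


# closing() also guarantees the connection is closed at the end of the block
with closing(sqlite3.connect(':memory:')) as conn:
    conn.execute("create table region (id integer primary key, name text, parent_id integer)")
    conn.executemany("insert into region values (?, ?, ?)", [
        (1, 'Province A', 0),      # top level
        (2, 'City A1', 1),         # child of a top-level row
        (3, 'City A2', 1),
        (4, 'District A1a', 2),    # grandchild, must not be returned
    ])
    demo_ids = city_ids(conn)

print(demo_ids)
```

Only the two city rows come back; the province and the district are filtered out by the subquery.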

Step 3: call the API to fetch the response data and write it to a file

# fetch the response for url and save it to the xml file
def gethtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    # print(html)

    try:
        # open xml file and save data to it
        with open(xml_file, 'wb+') as xml_file_handle:
            xml_file_handle.write(html)
    except IOError as err:
        print("IO error: " + str(err))
        return -1
    return 0
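The function above only guards the file write; a network failure in `urlopen` would still raise and stop the whole run. One possible variation, sketched here under the assumption that skipping a failed request is acceptable, adds a timeout and catches `urllib.error.URLError`. The name `fetch_bytes` and the 10-second default are hypothetical choices for illustration.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        # the with-statement closes the response object when done
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # URLError also covers HTTPError, which is its subclass
        print("request failed: " + str(err))
        return None
```

The caller can then test the result against None before writing it to `tmp.xml` or parsing it.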

Step 4: once the XML data has been saved, parse the relevant fields and insert them into MySQL

# parse data from xml
def parsexml():
    total_rec = 1  # record number
    # parameterized statement: the driver escapes the values
    sql = ("insert into bankinfo(branch_id, branch_name, branch_type, bank_name, "
           "bank_type, pname, cityname, aname) values(%s, %s, %s, %s, %s, %s, %s, %s)")

    # open xml file and get data record
    try:
        dom = minidom.parse(xml_file)
        root = dom.getElementsByTagName("response")  # The function getElementsByTagName returns NodeList.

        # one connection per call instead of one per record
        connection = pymysql.connect(host='127.0.0.1', user='root', passwd='123456', port=3306,
                                     db='icoachu', charset="utf8")
        try:
            for node in root:
                total_rec = node.getElementsByTagName('count')[0].childNodes[0].nodeValue

                pois = node.getElementsByTagName("pois")
                for poi in pois[0].getElementsByTagName('poi'):
                    branch_id = poi.getElementsByTagName("id")[0].childNodes[0].nodeValue
                    branch_name = poi.getElementsByTagName("name")[0].childNodes[0].nodeValue
                    branch_type = poi.getElementsByTagName("type")[0].childNodes[0].nodeValue
                    bank_type = poi.getElementsByTagName("typecode")[0].childNodes[0].nodeValue
                    pname = poi.getElementsByTagName("pname")[0].childNodes[0].nodeValue
                    cityname = poi.getElementsByTagName("cityname")[0].childNodes[0].nodeValue
                    aname = poi.getElementsByTagName("adname")[0].childNodes[0].nodeValue
                    # address = poi.getElementsByTagName("address")[0].childNodes[0].nodeValue
                    # biz_type = poi.getElementsByTagName("biz_type")[0].childNodes[0].nodeValue
                    # tel = poi.getElementsByTagName("tel")[0].childNodes[0].nodeValue
                    # distance = poi.getElementsByTagName("distance")[0].childNodes[0].nodeValue
                    # the bank name is the last segment of the type string
                    bank_name = branch_type.split(';')[-1]
                    params = (branch_id, branch_name.replace('(', '').replace(')', ''), branch_type,
                              bank_name, bank_type, pname, cityname, aname)

                    cursor = connection.cursor()
                    try:
                        print(params)
                        cursor.execute(sql, params)
                        connection.commit()
                        if cursor.rowcount != 1:
                            raise Exception("insert failed: %s" % (params,))
                    finally:
                        cursor.close()
        finally:
            connection.close()

    except IOError as err:
        print("IO error: " + str(err))

    return total_rec
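Every field above is read with `getElementsByTagName(...)[0].childNodes[0].nodeValue`, which raises IndexError when a tag is missing or empty. As a sketch, a small helper can return a default instead. The name `first_text` and the sample XML are made up for illustration; the sample only mimics the tag names used in the parsing code and is not real API output.

```python
import xml.dom.minidom as minidom

# hand-written sample, shaped after the tags read in parsexml()
SAMPLE_XML = """<response>
  <count>1</count>
  <pois>
    <poi>
      <id>B000000000</id>
      <name>Example Bank Branch</name>
      <type>Finance;Bank;Example Bank</type>
      <adname></adname>
    </poi>
  </pois>
</response>"""


def first_text(node, tag, default=''):
    """Text of the first <tag> under node, or default if missing or empty."""
    elements = node.getElementsByTagName(tag)
    if not elements or not elements[0].childNodes:
        return default
    return elements[0].childNodes[0].nodeValue


dom = minidom.parseString(SAMPLE_XML)
poi = dom.getElementsByTagName('poi')[0]

branch_name = first_text(poi, 'name')
bank_name = first_text(poi, 'type').split(';')[-1]   # last segment of the type string
district = first_text(poi, 'adname')                  # empty tag
phone = first_text(poi, 'tel', default='n/a')         # missing tag
print(branch_name, bank_name, repr(district), phone)
```

Parsing from a string with `minidom.parseString` also shows that the temporary file is optional: the bytes returned by the request could be parsed directly.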

Step 5: tie the calls together in the main block

if __name__ == '__main__':
    cityarr = getallcity()
    for cityId in cityarr:
        url = r'%scity=%s%s' % (url_header, cityId, url_end)
        if gethtml(url) == 0:
            total_record = int(parsexml())
            # number of pages = ceil(total_record / each_page_rec), in integer arithmetic
            page_number = (total_record + each_page_rec - 1) // each_page_rec

            # page 1 was parsed above; fetch pages 2..page_number
            for each_page in range(2, page_number + 1):
                print('parsing page ' + str(each_page) + ' ... ...')
                url = url.replace('page=' + str(each_page - 1), 'page=' + str(each_page))
                print(url)
                if gethtml(url) == 0:
                    parsexml()
        else:
            print('error: fail to get xml from amap')
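The page arithmetic is the part of this loop that is easiest to get wrong, especially because `/` is float division in Python 3. The number of pages is the ceiling of the record count divided by the page size. A minimal sketch with hypothetical helper names `page_count` and `remaining_pages`:

```python
def page_count(total_records, page_size=20):
    """Ceiling of total_records / page_size, using integer arithmetic only."""
    if total_records <= 0:
        return 0
    return (total_records + page_size - 1) // page_size


def remaining_pages(total_records, page_size=20):
    """Page numbers still to fetch after page 1 has been processed."""
    return list(range(2, page_count(total_records, page_size) + 1))


if __name__ == '__main__':
    for total in (0, 20, 45):
        print(total, page_count(total), remaining_pages(total))
```

For example, 45 records at 20 per page give 3 pages, so only pages 2 and 3 remain after the first request; exactly 20 records need no further requests.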

The complete code is as follows

# coding:utf-8


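# Goal: use the Amap search API to collect the branch information of every bank in every city
# Approach: 1. The local MySQL database stores the name and code of every city
#           2. Read the city codes into a list
#           3. For each code, assemble the URL and call the API with a GET request
#           4. Parse the fields we need out of the returned XML and insert them into MySQL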


import xml.dom.minidom as minidom
import urllib.request
import pymysql

file_name = 'result.txt'  # write result to this file
url_header = 'http://restapi.amap.com/v3/place/text?&keyword=&types=160100&'
url_end = '&citylimit=true&&output=xml&offset=20&page=1&key=c787ae8e49424a657127c3ed64cfe053&extensions=base'
url_amap = 'city='
each_page_rec = 20  # number of results per page (matches offset=20 in url_end)
xml_file = 'tmp.xml'  # temporary XML file name


# fetch the response for url and save it to the xml file
def gethtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    # print(html)

    try:
        # open xml file and save data to it
        with open(xml_file, 'wb+') as xml_file_handle:
            xml_file_handle.write(html)
    except IOError as err:
        print("IO error: " + str(err))
        return -1
    return 0


# parse data from xml
def parsexml():
    total_rec = 1  # record number
    # parameterized statement: the driver escapes the values
    sql = ("insert into bankinfo(branch_id, branch_name, branch_type, bank_name, "
           "bank_type, pname, cityname, aname) values(%s, %s, %s, %s, %s, %s, %s, %s)")

    # open xml file and get data record
    try:
        dom = minidom.parse(xml_file)
        root = dom.getElementsByTagName("response")  # The function getElementsByTagName returns NodeList.

        # one connection per call instead of one per record
        connection = pymysql.connect(host='127.0.0.1', user='root', passwd='123456', port=3306,
                                     db='icoachu', charset="utf8")
        try:
            for node in root:
                total_rec = node.getElementsByTagName('count')[0].childNodes[0].nodeValue

                pois = node.getElementsByTagName("pois")
                for poi in pois[0].getElementsByTagName('poi'):
                    branch_id = poi.getElementsByTagName("id")[0].childNodes[0].nodeValue
                    branch_name = poi.getElementsByTagName("name")[0].childNodes[0].nodeValue
                    branch_type = poi.getElementsByTagName("type")[0].childNodes[0].nodeValue
                    bank_type = poi.getElementsByTagName("typecode")[0].childNodes[0].nodeValue
                    pname = poi.getElementsByTagName("pname")[0].childNodes[0].nodeValue
                    cityname = poi.getElementsByTagName("cityname")[0].childNodes[0].nodeValue
                    aname = poi.getElementsByTagName("adname")[0].childNodes[0].nodeValue
                    # address = poi.getElementsByTagName("address")[0].childNodes[0].nodeValue
                    # biz_type = poi.getElementsByTagName("biz_type")[0].childNodes[0].nodeValue
                    # tel = poi.getElementsByTagName("tel")[0].childNodes[0].nodeValue
                    # distance = poi.getElementsByTagName("distance")[0].childNodes[0].nodeValue
                    # the bank name is the last segment of the type string
                    bank_name = branch_type.split(';')[-1]
                    params = (branch_id, branch_name.replace('(', '').replace(')', ''), branch_type,
                              bank_name, bank_type, pname, cityname, aname)

                    cursor = connection.cursor()
                    try:
                        print(params)
                        cursor.execute(sql, params)
                        connection.commit()
                        if cursor.rowcount != 1:
                            raise Exception("insert failed: %s" % (params,))
                    finally:
                        cursor.close()
        finally:
            connection.close()

    except IOError as err:
        print("IO error: " + str(err))

    return total_rec


def frange(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step


def getallcity():
    cityarr = []
    connection = pymysql.connect(host='127.0.0.1', user='root', passwd='123456', port=3306,
                                 db='icoachu', charset="utf8")
    cursor = connection.cursor()
    # cities are the direct children of the top-level rows (parent_id=0)
    sql = "select id from region where parent_id in (select id from region where parent_id=0)"
    try:
        cursor.execute(sql)
        for row in cursor.fetchall():
            cityarr.append(row[0])
    finally:
        # always release the cursor and the connection, even on error
        cursor.close()
        connection.close()
    return cityarr


if __name__ == '__main__':
    cityarr = getallcity()
    for cityId in cityarr:
        url = r'%scity=%s%s' % (url_header, cityId, url_end)
        if gethtml(url) == 0:
            total_record = int(parsexml())
            # number of pages = ceil(total_record / each_page_rec), in integer arithmetic
            page_number = (total_record + each_page_rec - 1) // each_page_rec

            # page 1 was parsed above; fetch pages 2..page_number
            for each_page in range(2, page_number + 1):
                print('parsing page ' + str(each_page) + ' ... ...')
                url = url.replace('page=' + str(each_page - 1), 'page=' + str(each_page))
                print(url)
                if gethtml(url) == 0:
                    parsexml()
        else:
            print('error: fail to get xml from amap')

The resulting data in the database looks like this:

[Figure: query result]