BeautifulSoup: Introduction and Installation

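Introduction to Beautiful Soup

BeautifulSoup is a Python library whose main job is extracting data from web pages (the official site describes it as designed for quick-turnaround screen scraping projects).

The official description lists three main features: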

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn’t take much code to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don’t have to think about encodings, unless the document doesn’t specify an encoding and Beautiful Soup can’t autodetect one. Then you just have to specify the original encoding.

In other words:

Beautiful Soup offers simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands you the data you want to extract; because it is simple, a complete application takes very little code.

Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it automatically; in that case you only need to state the original encoding.
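The encoding behaviour described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not part of the original post; the sample bytes are made up, and html.parser (Python's built-in parser) is chosen only so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Made-up sample: Latin-1 bytes with no charset declaration in the markup.
raw = "<p>caf\u00e9</p>".encode("latin-1")

# from_encoding states the original encoding explicitly, for the case
# where the document does not declare one.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

text = soup.p.string        # text inside the tree is already Unicode
utf8_bytes = soup.encode()  # serializing produces UTF-8 bytes by default

print(soup.original_encoding)
print(utf8_bytes)
```

If from_encoding is left out, Beautiful Soup tries to detect the encoding itself, which is the automatic path the paragraph above refers to.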

The third feature, in the official wording:

Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.

That is, Beautiful Soup is not a parser itself: it delegates to parsers such as lxml and html5lib, so you can pick one for leniency or for speed.
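As a rough illustration of the navigating, searching, and modifying mentioned in the feature list, here is a minimal sketch. It is written for Python 3 and is not from the original post; the HTML snippet is made up, and html.parser is used only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Made-up sample document, for illustration only.
html = """
<html><head><title>Demo page</title></head>
<body>
<p class="intro">Hello, <b>world</b>!</p>
<a href="/first">First</a>
<a href="/second">Second</a>
</body></html>
"""

# The second argument names the underlying parser; "lxml" or "html5lib"
# can be passed instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access walks down to the first matching tag.
title = soup.title.string

# Searching: find_all returns every matching tag.
hrefs = [a.get("href") for a in soup.find_all("a")]

# Modifying: rename a tag and replace its text in place.
bold = soup.find("b")
bold.name = "strong"
bold.string = "Soup"
intro_html = str(soup.find("p", class_="intro"))

print(title)
print(hrefs)
print(intro_html)
```

Swapping the parser name is the only change needed to try a different parsing strategy; the rest of the code stays the same.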

Installing Beautiful Soup (Windows)

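1. Download it from http://www.crummy.com/software/BeautifulSoup/ ; the latest version at the time of writing was 4.3.2.
2. Unpack the archive after downloading, for example under C:/python27/.
3. Open cmd and run "cd c:\python27\beautifulsoup4-4.3.2" to switch to that directory (adjust the path and version number to match where you unpacked the archive and which release you downloaded).
4. Run these commands:
- python setup.py build
- python setup.py install
5. At the Python prompt, run import bs4; if no error is raised, the installation succeeded.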
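The check in the last step can be taken slightly further by printing the installed version and parsing a trivial document. This is only a sketch for Python 3, added for illustration; the one-line document is made up.

```python
import bs4
from bs4 import BeautifulSoup

# If the import above succeeded, the package is installed;
# __version__ shows which release was picked up.
print(bs4.__version__)

# Parse a tiny made-up document as a smoke test.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)
```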

In newer releases the maintainers renamed the package to bs4, so the old import statement no longer works:

from BeautifulSoup import BeautifulSoup

Use this instead (module and class names are case-sensitive):

from bs4 import BeautifulSoup

What a trap! I wasted more than an hour on this.
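For code that may run against either package name, one common pattern is to try the new import first and fall back to the old one. This is a sketch, not something from the original post; the fallback branch only helps on a Python 2 system that still has Beautiful Soup 3 installed.

```python
try:
    # Beautiful Soup 4: the package is named bs4.
    from bs4 import BeautifulSoup
except ImportError:
    # Beautiful Soup 3 (Python 2 only): the old package name.
    from BeautifulSoup import BeautifulSoup

print(BeautifulSoup.__name__)
```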

A crawler that lists every domain on a given NS server

(From the original article.)

A quick search turns up sitedossier.com, a site that provides name-server information. The next step is a crawler that scrapes its query results.

import urllib2  # Python 2 standard library (urllib.request in Python 3)
import re
import argparse
from bs4 import BeautifulSoup 

class Crawler(object):

    def __init__(self, args):
        self.ns = args.ns

    def _getSoup(self, url):
        req = urllib2.Request(
            url = url,
            headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4'}
        )
        content = urllib2.urlopen(req).read()
        return BeautifulSoup(content)

    def _isLastPage(self, soup):
        # The last page of results contains the text "End of list."
        return soup.find(text=re.compile(r"End of list\.")) is not None

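    def _getItem(self, soup):
        # Each result is an <li> whose <a> text is a domain name
        for item in soup.findAll('li'):
            link = item.find('a')
            if link is not None:  # skip <li> entries without a link
                print link.string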

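    def _getNextPage(self, soup):
        # The "next page" link is taken to be the second sibling after the <ol>;
        # the first sibling is presumably the whitespace between the tags
        nextUrl = 'http://www.sitedossier.com' + soup.ol.nextSibling.nextSibling.get('href')
        self.soup = self._getSoup(nextUrl)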

    def start(self):
        url = 'http://www.sitedossier.com/nameserver/' + self.ns
        self.soup = self._getSoup(url)
        self._getItem(self.soup)
        while not self._isLastPage(self.soup):
            self._getNextPage(self.soup)
            self._getItem(self.soup)

def main():
    parser = argparse.ArgumentParser(description='A crawler for sitedossier.com') 
    parser.add_argument('-ns', type=str, required=True, metavar='NAMESERVER', dest='ns', help='Specify the nameserver')
    args = parser.parse_args()

    crawler = Crawler(args)
    crawler.start()

if __name__ == '__main__':
    main()
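The script above targets Python 2 (urllib2 and the print statement). A rough Python 3 sketch of the same paging loop might look like the following. It assumes the page layout the original code relies on: results in li elements, a "next" link following the ol element, and the text "End of list." on the last page. The names fetch, crawl, fetch_page, and max_pages are hypothetical additions for this sketch, introduced so the loop can be exercised without network access and cannot run forever.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

BASE = "http://www.sitedossier.com"


def fetch(url):
    """Download a page, sending a browser-like User-Agent as the original does."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return resp.read()


def crawl(ns, fetch_page=fetch, max_pages=50):
    """Yield each domain listed for nameserver `ns`, following 'next' links.

    fetch_page and max_pages are hypothetical parameters, not part of the
    original script.
    """
    url = BASE + "/nameserver/" + ns
    for _ in range(max_pages):
        soup = BeautifulSoup(fetch_page(url), "html.parser")
        # Each result is assumed to be an <li> containing an <a>.
        for item in soup.find_all("li"):
            link = item.find("a")
            if link is not None and link.string:
                yield link.string.strip()
        # The last page is assumed to contain this marker text.
        if "End of list." in soup.get_text():
            return
        # The next-page link is assumed to be the first <a> sibling after <ol>.
        next_link = soup.ol.find_next_sibling("a") if soup.ol else None
        if next_link is None or not next_link.get("href"):
            return
        url = BASE + next_link["href"]


if __name__ == "__main__":
    for domain in crawl("dns.baidu.com"):
        print(domain)
```

Passing the download function in as an argument keeps the parsing logic separate from the network call, so the loop can be tried against saved pages before it is pointed at the live site.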

Usage:

$ python crawler_ns.py -ns dns.baidu.com

Save the results to a file (>> appends to result.txt):

$ python crawler_ns.py -ns dns.baidu.com >> result.txt
