Node.js Request+Cheerio: building a small crawler - Basic features 1: fetching content
Node.js Request+Cheerio: building a small crawler - Basic features 2: writing files
Node.js Request+Cheerio: building a small crawler - Basic features 3: flow control and concurrency control
Node.js Request+Cheerio: building a small crawler - Extra: proxy settings

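The previous posts covered how to build a simple crawler. Among the ways to avoid getting a 403 from the target site, the most important one, apart from limiting concurrency, is to use a proxy.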

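The Request module used in those posts also has proxy support.
My English is poor, though, and I could not really follow the proxy section of the official documentation. Searching online, the only thing I found was this post: "How do I set a proxy for https requests in nodejs?".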

After making the same changes, the program still failed to fetch the site. I don't know why; if anyone does, please let me know.


    var proxy = 'http://114.215.241.176:8080';
    var option = {
        url: url,
        proxy: proxy,
        headers: {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.6',
            'Host': 'www.dianping.com',
            'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Mobile Safari/537.36',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive'
        }
    };

    request(url, option, function(error, response, body) {
        console.info(response.statusCode);  // throws here: the request was never established, so `response` is undefined
        if (!error && response.statusCode == 200) {
    ......remaining code omitted
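One thing that can be said about the snippet above: when the connection fails, a request-style callback receives an `error` and no `response`, so reading `response.statusCode` first throws and hides the real error. This does not explain why the proxy request itself failed, but checking `error` first at least surfaces the reason. A minimal sketch, assuming the usual `(error, response, body)` callback shape; `summarizeResult` is a hypothetical helper name, not part of the Request module.

```javascript
// Hypothetical helper: summarize the (error, response, body) triple that
// request-style callbacks receive, without assuming `response` exists.
function summarizeResult(error, response, body) {
    if (error) {
        // Connection-level failure (dead proxy, timeout, refused connection...).
        return 'request failed: ' + (error.code || error.message);
    }
    if (response.statusCode !== 200) {
        // The proxy or the target answered, but not with the page we wanted.
        return 'unexpected status: ' + response.statusCode;
    }
    return 'ok, ' + String(body).length + ' characters received';
}

// Intended use with the snippet above (not run here):
// request(url, option, function(error, response, body) {
//     console.info(summarizeResult(error, response, body));
// });
```

With this in place, a dead proxy would most likely show up as an error code such as a refused connection or a timeout instead of a crash inside the callback.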

Still, even though I could not get the Request module to work, we can fetch pages through a proxy with Node's built-in modules or with another module.

Native approach
First, the way to use a proxy that ships with Node.js itself. It lives in the http/https modules.
The code below is enough to do it.


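var http = require('http'); // http module; switch to https only if the proxy itself speaks HTTPS
var opt = {
    host: '58.246.194.70', // address of the proxy server
    port: 808, // port of the proxy server
    method: 'GET', // HTTP method to send
    path: 'url', // placeholder: the full (absolute) URL of the page to fetch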
    headers: {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.6',
        'Host': 'www.dianping.com',
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Mobile Safari/537.36',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive'
    }
};

var body = '';
var req = http.request(opt, function(res) {
    console.log("Got response: " + res.statusCode);
    // Decode chunks as text (assuming a UTF-8 page) so multi-byte characters are not split across chunks
    res.setEncoding('utf8');
    res.on('data', function(d) {
        body += d;
    }).on('end', function() {
        console.info('============');
        console.log(body);
    });

}).on('error', function(e) {
    console.log("Got error: " + e.message);
});

req.end();
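The native approach above boils down to two things: connect to the proxy, and put the absolute target URL in `path` with the target's host in the `Host` header. The sketch below wraps that in two helpers so the proxy and target are passed as URLs. `buildProxyOptions` and `fetchViaProxy` are hypothetical names, the 10-second default timeout is an arbitrary assumption, and the sketch only covers plain `http://` targets, since HTTPS through a proxy needs a CONNECT tunnel, which is not shown here.

```javascript
const http = require('http');

// Hypothetical helper: build http.request options that send a plain-HTTP
// request for `targetUrl` through the forward proxy at `proxyUrl`.
function buildProxyOptions(proxyUrl, targetUrl, headers) {
    const proxy = new URL(proxyUrl);
    const target = new URL(targetUrl);
    if (target.protocol !== 'http:') {
        throw new Error('this sketch only handles http:// targets');
    }
    return {
        host: proxy.hostname,               // connect to the proxy...
        port: Number(proxy.port) || 80,
        method: 'GET',
        path: target.href,                  // ...and ask it for the absolute URL
        headers: Object.assign({}, headers, { Host: target.host })
    };
}

// Hypothetical helper: fetch a page through the proxy and resolve with
// the status code and body, rejecting on connection errors or timeout.
function fetchViaProxy(proxyUrl, targetUrl, headers, timeoutMs) {
    return new Promise(function(resolve, reject) {
        const req = http.request(buildProxyOptions(proxyUrl, targetUrl, headers), function(res) {
            let body = '';
            res.setEncoding('utf8'); // assumes the page is UTF-8
            res.on('data', function(chunk) { body += chunk; });
            res.on('end', function() {
                resolve({ statusCode: res.statusCode, body: body });
            });
        });
        // Free proxies often hang, so give up after a while (default is an assumption).
        req.setTimeout(timeoutMs || 10000, function() {
            req.destroy(new Error('proxy request timed out'));
        });
        req.on('error', reject);
        req.end();
    });
}

// Example call (not run here); the proxy address is the one from the article:
// fetchViaProxy('http://58.246.194.70:808', 'http://www.dianping.com/', {})
//     .then(function(r) { console.log(r.statusCode); })
//     .catch(function(e) { console.log('Got error: ' + e.message); });
```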

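Using SuperAgent with the superagent-proxy module
The example above uses the native approach. For convenience and faster development, though, we usually pull in a module. SuperAgent is another ready-made HTTP client module, with much the same features as Request. To use a proxy with it, you also need the extension module superagent-proxy.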

SuperAgent official site
superagent-proxy official site
I won't cover installation; anyone who works with Node should already know how.

Now let's look at the code that uses SuperAgent.

const superagent = require('superagent');  // load SuperAgent
require('superagent-proxy')(superagent);  // extend it with superagent-proxy (adds .proxy())

var proxy = 'http://114.215.241.176:8080'; // the proxy to use

var header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.6',
    'Host': 'www.dianping.com',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Mobile Safari/537.36',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive'
};

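superagent  // send the request
    .get('target URL')  // placeholder: the URL of the page to fetch
    .set(header)  // pass the object itself; .set('header', header) would set a single field named "header"
    .proxy(proxy)
    .end(onresponse);

// Handle the returned response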
function onresponse(err, res) {
    if (err) {
        console.log(err);
    } else {
        console.log(res.status, res.headers);
        //console.log(res.body);
    }
}

That is how to fetch pages through a proxy. Combined with concurrency control, it should be more effective at avoiding 403s.
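As a rough idea of what combining the two could look like, the sketch below rotates through a list of proxies in round-robin order while keeping at most `limit` requests in flight. Everything here is an assumption for illustration: `createProxyRotator` and `crawlWithProxies` are hypothetical names, and `task(url, proxy)` stands for whichever fetch function you use (the native one or the SuperAgent one).

```javascript
// Hypothetical helper: hand out proxies from a list in round-robin order.
function createProxyRotator(proxies) {
    let index = 0;
    return function nextProxy() {
        const proxy = proxies[index % proxies.length];
        index += 1;
        return proxy;
    };
}

// Hypothetical helper: run `task(url, proxy)` for every URL,
// with never more than `limit` tasks running at the same time.
async function crawlWithProxies(urls, proxies, limit, task) {
    const nextProxy = createProxyRotator(proxies);
    const results = new Array(urls.length);
    let cursor = 0;

    // Each worker keeps taking the next unclaimed URL until none are left.
    async function worker() {
        while (cursor < urls.length) {
            const i = cursor;
            cursor += 1;
            try {
                results[i] = { url: urls[i], value: await task(urls[i], nextProxy()) };
            } catch (err) {
                // One failed page (or dead proxy) should not stop the whole crawl.
                results[i] = { url: urls[i], error: err.message };
            }
        }
    }

    const workers = [];
    for (let n = 0; n < Math.min(limit, urls.length); n += 1) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}

// Example call (not run here), using the two proxy addresses from this post:
// crawlWithProxies(urls, ['http://114.215.241.176:8080', 'http://58.246.194.70:808'], 2, fetchPage);
```

Results come back in the same order as `urls`, each with either a `value` or an `error`, so failed pages can be retried later through a different proxy.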
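Proxies still feel rather tricky to me (though that may just be because my English is not good enough to follow the docs T T)
Then again, simply fetching pages does not necessarily require a proxy, which is why this post is an extra in this Node crawler series.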
