Crawler in Practice 0: Scraping Shared Vulnerabilities from the CNVD Vulnerability Database (Bypassing Anti-Scraping with JS)

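I needed data from the CNVD vulnerability database for a paper. The site has anti-scraping measures that stop a crawler from fetching all of the data, so I used JavaScript to get around them. The result looks like this: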

(Figure: the CNVD shared-vulnerability crawler in action)

The complete code is available on GitHub. Comments, likes, tips, issues, and stars are all welcome.

Environment

  • Windows 10
  • Chrome browser
  • Sublime Text 3 code editor

Preparation

Register an account on the site and log in. Nothing else is needed.

Requirements analysis

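  1. First, we need all of the vulnerability data in the database. However, a crawler written in Python is detected by the anti-scraping mechanism, so the data cannot be downloaded automatically in bulk.

  2. Next, I noticed that the site provides shared XML data:

    (Figure: the shared vulnerability data page)

    So this is where we focus our efforts.

  3. However, clicking each download link by hand takes a long time, so I decided to download the files with a JS script instead.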

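  4. There are two possible approaches:

  • Have the script click each link in turn and then move to the next page.
  • Request each link directly to get the data.
  5. I chose the second approach. Inspecting the links shows that each one has the form https://www.cnvd.org.cn/shareData/download/ followed by a number. A simple loop that requests each URL in turn is therefore enough.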
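The following Python sketch makes the second approach concrete. It only builds the list of download URLs for an inclusive range of IDs and sends no requests. The helper name share_data_urls is hypothetical. The base URL is the one observed above, and 242 to 733 is the ID range the author arrives at later in this post.

```python
from typing import List

# Base URL observed on the CNVD share page; the download ID is appended to it
BASE_URL = "https://www.cnvd.org.cn/shareData/download/"


def share_data_urls(first_id: int, last_id: int, base: str = BASE_URL) -> List[str]:
    """Return the download URL for every ID from first_id to last_id, inclusive."""
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    return [f"{base}{i}" for i in range(first_id, last_id + 1)]


urls = share_data_urls(242, 733)
print(len(urls))  # 492
print(urls[0])    # https://www.cnvd.org.cn/shareData/download/242
print(urls[-1])   # https://www.cnvd.org.cn/shareData/download/733
```

The range is inclusive on both ends, so 242 to 733 means 492 requests.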

Writing the code

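With the approach settled, I started writing code but ran into a problem. A browser cannot save the response of a JS request directly as a local file. Following a blog post, I used the FileSaver.js script so that JS could save files locally.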

FileSaver.js

Its code is as follows:

/* FileSaver.js
 * A saveAs() FileSaver implementation.
 * 1.3.2
 * 2016-06-16 18:25:19
 *
 * By Eli Grey, http://eligrey.com
 * License: MIT
 *   See https://github.com/eligrey/FileSaver.js/blob/master/LICENSE.md
 */
 
/*global self */
/*jslint bitwise: true, indent: 4, laxbreak: true, laxcomma: true, smarttabs: true, plusplus: true */
 
/*! @source http://purl.eligrey.com/github/FileSaver.js/blob/master/FileSaver.js */
 
var saveAs = saveAs || (function(view) {
    "use strict";
    // IE <10 is explicitly unsupported
    if (typeof view === "undefined" || typeof navigator !== "undefined" && /MSIE [1-9]\./.test(navigator.userAgent)) {
        return;
    }
    var
          doc = view.document
          // only get URL when necessary in case Blob.js hasn't overridden it yet
        , get_URL = function() {
            return view.URL || view.webkitURL || view;
        }
        , save_link = doc.createElementNS("http://www.w3.org/1999/xhtml", "a")
        , can_use_save_link = "download" in save_link
        , click = function(node) {
            var event = new MouseEvent("click");
            node.dispatchEvent(event);
        }
        , is_safari = /constructor/i.test(view.HTMLElement) || view.safari
        , is_chrome_ios =/CriOS\/[\d]+/.test(navigator.userAgent)
        , throw_outside = function(ex) {
            (view.setImmediate || view.setTimeout)(function() {
                throw ex;
            }, 0);
        }
        , force_saveable_type = "application/octet-stream"
        // the Blob API is fundamentally broken as there is no "downloadfinished" event to subscribe to
        , arbitrary_revoke_timeout = 1000 * 40 // in ms
        , revoke = function(file) {
            var revoker = function() {
                if (typeof file === "string") { // file is an object URL
                    get_URL().revokeObjectURL(file);
                } else { // file is a File
                    file.remove();
                }
            };
            setTimeout(revoker, arbitrary_revoke_timeout);
        }
        , dispatch = function(filesaver, event_types, event) {
            event_types = [].concat(event_types);
            var i = event_types.length;
            while (i--) {
                var listener = filesaver["on" + event_types[i]];
                if (typeof listener === "function") {
                    try {
                        listener.call(filesaver, event || filesaver);
                    } catch (ex) {
                        throw_outside(ex);
                    }
                }
            }
        }
        , auto_bom = function(blob) {
            // prepend BOM for UTF-8 XML and text/* types (including HTML)
            // note: your browser will automatically convert UTF-16 U+FEFF to EF BB BF
            if (/^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i.test(blob.type)) {
                return new Blob([String.fromCharCode(0xFEFF), blob], {type: blob.type});
            }
            return blob;
        }
        , FileSaver = function(blob, name, no_auto_bom) {
            if (!no_auto_bom) {
                blob = auto_bom(blob);
            }
            // First try a.download, then web filesystem, then object URLs
            var
                  filesaver = this
                , type = blob.type
                , force = type === force_saveable_type
                , object_url
                , dispatch_all = function() {
                    dispatch(filesaver, "writestart progress write writeend".split(" "));
                }
                // on any filesys errors revert to saving with object URLs
                , fs_error = function() {
                    if ((is_chrome_ios || (force && is_safari)) && view.FileReader) {
                        // Safari doesn't allow downloading of blob urls
                        var reader = new FileReader();
                        reader.onloadend = function() {
                            var url = is_chrome_ios ? reader.result : reader.result.replace(/^data:[^;]*;/, 'data:attachment/file;');
                            var popup = view.open(url, '_blank');
                            if(!popup) view.location.href = url;
                            url=undefined; // release reference before dispatching
                            filesaver.readyState = filesaver.DONE;
                            dispatch_all();
                        };
                        reader.readAsDataURL(blob);
                        filesaver.readyState = filesaver.INIT;
                        return;
                    }
                    // don't create more object URLs than needed
                    if (!object_url) {
                        object_url = get_URL().createObjectURL(blob);
                    }
                    if (force) {
                        view.location.href = object_url;
                    } else {
                        var opened = view.open(object_url, "_blank");
                        if (!opened) {
                            // Apple does not allow window.open, see https://developer.apple.com/library/safari/documentation/Tools/Conceptual/SafariExtensionGuide/WorkingwithWindowsandTabs/WorkingwithWindowsandTabs.html
                            view.location.href = object_url;
                        }
                    }
                    filesaver.readyState = filesaver.DONE;
                    dispatch_all();
                    revoke(object_url);
                }
            ;
            filesaver.readyState = filesaver.INIT;
 
            if (can_use_save_link) {
                object_url = get_URL().createObjectURL(blob);
                setTimeout(function() {
                    save_link.href = object_url;
                    save_link.download = name;
                    click(save_link);
                    dispatch_all();
                    revoke(object_url);
                    filesaver.readyState = filesaver.DONE;
                });
                return;
            }
 
            fs_error();
        }
        , FS_proto = FileSaver.prototype
        , saveAs = function(blob, name, no_auto_bom) {
            return new FileSaver(blob, name || blob.name || "download", no_auto_bom);
        }
    ;
    // IE 10+ (native saveAs)
    if (typeof navigator !== "undefined" && navigator.msSaveOrOpenBlob) {
        return function(blob, name, no_auto_bom) {
            name = name || blob.name || "download";
 
            if (!no_auto_bom) {
                blob = auto_bom(blob);
            }
            return navigator.msSaveOrOpenBlob(blob, name);
        };
    }
 
    FS_proto.abort = function(){};
    FS_proto.readyState = FS_proto.INIT = 0;
    FS_proto.WRITING = 1;
    FS_proto.DONE = 2;
 
    FS_proto.error =
    FS_proto.onwritestart =
    FS_proto.onprogress =
    FS_proto.onwrite =
    FS_proto.onabort =
    FS_proto.onerror =
    FS_proto.onwriteend =
        null;
 
    return saveAs;
}(
       typeof self !== "undefined" && self
    || typeof window !== "undefined" && window
    || this.content
));
// `self` is undefined in Firefox for Android content script context
// while `this` is nsIContentFrameMessageManager
// with an attribute `content` that corresponds to the window
 
if (typeof module !== "undefined" && module.exports) {
  module.exports.saveAs = saveAs;
} else if ((typeof define !== "undefined" && define !== null) && (define.amd !== null)) {
  define("FileSaver.js", function() {
    return saveAs;
  });
}

Downloading the shared vulnerabilities

First, wrap FileSaver.js in a helper function:

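// Save a string as a local file named "<name>.txt" using FileSaver.js's saveAs()
var downloadTextFile = function (content, name) {
    if (!content) {
        content = '';  // write an empty file instead of the text "undefined"
    }
    var file = new File([content], name + ".txt", { type: "text/plain;charset=utf-8" });
    saveAs(file);      // saveAs() takes the download name from file.name
};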
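The small Python function below shows what the wrapper does, for readers following along outside the browser. A missing or empty response becomes an empty file, the file name is the ID plus .txt, and the text is written as UTF-8. The optional byte-order mark mirrors FileSaver.js's auto_bom() step, which prepends one to a "text/plain;charset=utf-8" blob. This is only an illustration: the function name save_text_file and its parameters are hypothetical, and in this post the actual downloads still have to happen in the logged-in browser.

```python
import codecs
from pathlib import Path


def save_text_file(content, name, out_dir=".", add_bom=True):
    """Write content to <out_dir>/<name>.txt, mirroring the JS downloadTextFile helper.

    Falsy content (None or "") becomes an empty string, as in the JS version.
    With add_bom=True a UTF-8 byte-order mark is written first, matching what
    FileSaver.js's auto_bom() does for a "text/plain;charset=utf-8" blob.
    """
    text = content or ""
    data = (codecs.BOM_UTF8 if add_bom else b"") + text.encode("utf-8")
    path = Path(out_dir) / f"{name}.txt"
    path.write_bytes(data)
    return path
```

For example, save_text_file(xml_text, 242, out_dir="CNVD") would write CNVD/242.txt. The output folder must already exist.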

Next, the page already uses jQuery, so we can call its built-in ajax wrapper to request each resource link and loop over the whole database:

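var a = 242;                                     // first download ID
var timer = setInterval(function () {
  if (a > 733) { clearInterval(timer); return; } // 733 = last download ID
  var id = a;                                    // capture the ID so the callback names the file correctly
  a = a + 1;
  $.ajax({
    method: 'GET',
    url: '/shareData/download/' + id,
    dataType: 'text',                            // keep the raw text; don't let jQuery parse it as XML
    success: function (res) { downloadTextFile(res, id); }
  });
}, 2000);                                        // one request every 2 seconds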

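a is the number at the end of the resource link. From inspection, the IDs start at 242 and end at 733. The end value depends on the newest vulnerability XML link. Hover the mouse over that link and the browser shows the URL in the bottom-left corner of the page: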


(Figure: viewing the latest resource link)
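You can also read the end value from the page source instead of hovering over links. This minimal Python sketch assumes that the share page's links contain the /shareData/download/<number> path described earlier. latest_download_id is a hypothetical helper. Its input is whatever page source you saved from the page that lists the newest file, for example by copying document.documentElement.outerHTML from the console.

```python
import re
from typing import Optional

# Assumption: download links contain the path /shareData/download/<number>,
# either as a relative href or inside a full https://www.cnvd.org.cn/... URL.
DOWNLOAD_LINK = re.compile(r"/shareData/download/(\d+)")


def latest_download_id(html: str) -> Optional[int]:
    """Return the largest download ID linked from html, or None if no link is found."""
    ids = [int(match) for match in DOWNLOAD_LINK.findall(html)]
    return max(ids) if ids else None


page = '<a href="/shareData/download/732">a</a> <a href="/shareData/download/733">b</a>'
print(latest_download_id(page))  # 733
```

The IDs are converted to int before taking the maximum. Comparing them as strings would rank "99" above "733".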

The final argument, 2000, is the delay in milliseconds, so one request is sent every 2 seconds.
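The pacing behind that 2000 ms delay can be sketched in Python as a sequential loop that also records failures. This illustrates the control flow only. The fetch and save callables are hypothetical placeholders for however the content is obtained and stored. In this post the requests must come from the logged-in browser session, so the loop is not a drop-in replacement. Unlike setInterval, which fires on schedule whether or not the previous request has finished, this loop waits for each request before pausing. It also returns the IDs that failed, whereas the JS code has no error callback and skips them silently.

```python
import time
from typing import Callable, Iterable, List


def fetch_all(ids: Iterable[int],
              fetch: Callable[[int], str],
              save: Callable[[str, int], None],
              interval: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Fetch and save each ID in order, waiting interval seconds between requests.

    Returns the IDs whose fetch raised an exception so they can be retried later.
    """
    failed = []
    for n, i in enumerate(ids):
        if n > 0:
            sleep(interval)  # pause between requests, like the 2000 ms setInterval delay
        try:
            content = fetch(i)
        except Exception:
            failed.append(i)  # remember the failure instead of skipping it silently
            continue
        save(content, i)
    return failed
```

To skip the waits in a quick test, pass sleep=lambda seconds: None.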

Running the code

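  1. Open the CNVD vulnerability database page.

  2. Right-click on the page and choose Inspect.

  3. Switch to the Console tab.

  4. Copy the code above (combine the three snippets into one), paste it into the console, and press Enter to start it. Alternatively, download the complete code from GitHub and copy it from there. spider.js contains the complete JS code, and filter contains the code for filtering the results afterwards (comments, likes, tips, issues, and stars are welcome).

  5. Wait for the downloads to finish. The files are saved to the browser's configured download folder.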

(Figure: steps for running the code)

Filtering the results

After the downloads finished, some of the resources turned out to be empty, only 1 KB in size:


(Figure: the initial results)

I wrote a Python script to filter these files out:

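import os

# Files smaller than this many bytes are treated as empty downloads
# (the empty ones observed here are about 1 KB)
SIZE_THRESHOLD = 2 * 1024


def file_path(path):
    """Walk path recursively and delete every file smaller than SIZE_THRESHOLD."""
    for root, dirs, files in os.walk(path):
        for file in files:
            del_small_file(os.path.join(root, file))


def del_small_file(file_name):
    """Delete file_name if it is smaller than SIZE_THRESHOLD bytes."""
    if os.path.getsize(file_name) < SIZE_THRESHOLD:
        os.remove(file_name)


if __name__ == '__main__':
    path = r'./CNVD'  # folder that holds the downloaded files
    file_path(path)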

Here, path is the folder where the downloaded files are stored.
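File size is only a proxy for "empty". The stricter check sketched below parses each file instead. It treats a file as empty if the file is blank, is not well-formed XML, or has a root element with no child elements. This rests on an assumption: that useful share files are XML whose root contains at least one child element, such as one per vulnerability record. CNVD's schema has not been verified here, so check a few files by hand before relying on it. The helper looks_empty is hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET
from pathlib import Path


def looks_empty(path) -> bool:
    """Return True if the file is blank, not well-formed XML, or has a childless root.

    Assumption: a useful share file is XML whose root element contains at least
    one child element (for example, one per vulnerability record).
    """
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF8):  # FileSaver.js may have prepended a BOM
        data = data[len(codecs.BOM_UTF8):]
    if not data.strip():
        return True
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return True  # e.g. an HTML error page or a truncated download
    return len(root) == 0  # len() of an Element is its number of children
```

To use it in the filter, call os.remove(file_name) when looks_empty(file_name) is true, in place of the size test.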

Final results

That completes the CNVD crawl. It took roughly 10 minutes, and after filtering, 311 files had been retrieved successfully:


(Figure: crawl results)

Compare this with the original data on the site:


(Figure: the CNVD shared data page)

The counts match, which shows that we have retrieved all of the shared data on the page.
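You can run the count comparison with a few lines of Python instead of checking the folder by eye. This sketch assumes that the kept files are those at or above the same 2 KB threshold used by the filter script. count_kept_files and check_complete are hypothetical helpers. The expected total (311 in this post) must be read off the CNVD share page at the time you run the check.

```python
import os


def count_kept_files(path, threshold=2 * 1024):
    """Count files under path (recursively) whose size is at least threshold bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) >= threshold:
                total += 1
    return total


def check_complete(path, expected, threshold=2 * 1024):
    """Return True if the number of kept files equals the total shown on the share page."""
    return count_kept_files(path, threshold) == expected
```

For the run described here, check_complete("./CNVD", 311) should return True.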
© Copyright belongs to the author. For reposting or content collaboration, please contact the author.
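Platform statement: the content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.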
