Python: One-Click Scraping of Image, Audio, and Video Resources

Preface

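Python can scrape resource files from any web page, such as images, audio, and video. The usual approach is to request the page's HTML and then extract the resources you want with XPath or regular expressions. I built a scraper tool that collects media files in one click. Note, however, that it only picks up files already referenced in the HTML. Resources that require a second request cannot be scraped, for example those on the KuGou Music player page. This is a consequence of building a general-purpose tool that has to work across different websites.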
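The request-then-extract approach described above can be sketched with the standard library alone. This is only an illustration, not the tool's code: the name `find_image_urls`, the extension list, and the sample HTML are assumptions, and the HTML is passed in as a string so the example runs without network access.

```python
import re

# Hypothetical extension list for this sketch; the tool's own list is not shown in the post.
IMAGE_EXTENSIONS = ("png", "jpg", "jpeg", "gif", "webp")

# Match an absolute http(s) URL that ends in one of the extensions.
# The character class stops the match at quotes, whitespace, and angle brackets.
IMAGE_URL_RE = re.compile(
    r"https?://[^\s\"'<>]+?\.(?:%s)\b" % "|".join(IMAGE_EXTENSIONS),
    re.IGNORECASE,
)


def find_image_urls(html):
    """Return absolute image URLs found in html, in order, without duplicates."""
    found = []
    for match in IMAGE_URL_RE.finditer(html):
        url = match.group(0)
        if url not in found:
            found.append(url)
    return found


sample_html = """
<img src="https://example.com/a.png">
<div style="background:url('https://example.com/bg.jpg')"></div>
<img src="https://example.com/a.png">
<a href="https://example.com/page.html">link</a>
"""

image_urls = find_image_urls(sample_html)
print(image_urls)
```

Only URLs present in the HTML text are found, which matches the limitation stated above: anything loaded by a later request never appears in the string being searched.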

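The main focus here is image scraping: if you need image material, enter a URL and scrape it in one click.

When scraping video, magnet links are collected as well, and you can download them with a third-party download tool.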

Code

Scraping resource files

One thing to note: some image resources are not URL links but data:image URIs, which have to be decoded before they can be stored.
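The data:image conversion mentioned above can be sketched as follows. The helper name `decode_data_image` and the sample URI are assumptions made for illustration; the sample payload is the text "hello" rather than real image bytes, with its trailing padding removed to show why padding has to be restored.

```python
import base64


def decode_data_image(data_uri):
    """Split a base64 data:image URI into (extension, raw_bytes)."""
    header, _, payload = data_uri.partition("base64,")
    # header looks like "data:image/png;" so the subtype follows the slash
    extension = header.split("/")[-1].rstrip(";")
    # Restore stripped padding so the payload length is a multiple of 4
    payload += "=" * (-len(payload) % 4)
    return extension, base64.b64decode(payload)


# "hello" in base64 is "aGVsbG8="; the "=" is dropped here on purpose
ext, raw = decode_data_image("data:image/png;base64,aGVsbG8")
print(ext, raw)
```

The returned bytes can then be written to disk in binary mode, for example `open('image.' + ext, 'wb')`.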

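import base64

# requestsDataBase, repairUrl, zfjTools and the *Type_list globals are defined elsewhere in the project.
def getResourceUrlList(url, isImage, isAudio, isVideo):
    global imgType_list, audioType_list, videoType_list
    imageUrlList = []
    audioUrlList = []
    videoUrlList = []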
 
    url = url.rstrip().rstrip('/')
    htmlStr = str(requestsDataBase(url))

    # Keep a copy of the fetched HTML on disk
    with open('reptileHtml.txt', 'w', encoding='utf-8') as htmlFile:
        htmlFile.write(htmlStr)

    for line in htmlStr.splitlines():
        # Normalize single quotes so every quoted value is delimited by double quotes
        line = line.replace("'", '"')
        segmenterStr = '"'

        lineList = line.split(segmenterStr)
        for partLine in lineList:
        for partLine in lineList:
            if isImage:
                # Inline images: decode the base64 payload of a data:image URI
                if 'data:image' in partLine:
                    base64List = partLine.split('base64,')
                    # Add padding only when it is actually missing
                    imgData = base64.urlsafe_b64decode(base64List[-1] + '=' * (-len(base64List[-1]) % 4))
                    base64ImgType = base64List[0].split('/')[-1].rstrip(';')
                    imageName = zfjTools.getTimestamp() + '.' + base64ImgType
                    imageUrlList.append(imageName + '$==$' + base64ImgType)
 
                # Find images by file extension
                for imageType in imgType_list:
                    if imageType in partLine:
                        imgUrl = partLine[:partLine.find(imageType) + len(imageType)].split(segmenterStr)[-1]

                        # Repair the URL
                        imgUrl = repairUrl(imgUrl, url)

                        # Drop the "_{size}" placeholder if present
                        sizeType = '_{size}'
                        if sizeType in imgUrl:
                            imgUrl = imgUrl.replace(sizeType, '')

                        imgUrl = imgUrl.strip()

                        # startswith takes a tuple, so the duplicate check applies to both schemes
                        if imgUrl.startswith(('http://', 'https://')) and imgUrl not in imageUrlList:
                            imageUrlList.append(imgUrl)
 
            if isAudio:
                # Find audio by file extension
                for audioType in audioType_list:
                    if audioType in partLine or audioType.lower() in partLine:
                        audioType = audioType.lower() if audioType.lower() in partLine else audioType
                        audioUrl = partLine[:partLine.find(audioType) + len(audioType)].split(segmenterStr)[-1]

                        # Repair the URL
                        audioUrl = repairUrl(audioUrl, url)

                        # startswith takes a tuple, so the duplicate check applies to both schemes
                        if audioUrl.startswith(('http://', 'https://')) and audioUrl not in audioUrlList:
                            audioUrlList.append(audioUrl)
 
            if isVideo:
                # Find video by file extension
                for videoType in videoType_list:
                    if videoType in partLine or videoType.lower() in partLine:
                        videoType = videoType.lower() if videoType.lower() in partLine else videoType
                        videoUrl = partLine[:partLine.find(videoType) + len(videoType)].split(segmenterStr)[-1]

                        # Repair the URL
                        videoUrl = repairUrl(videoUrl, url)

                        # Accept web, ed2k, magnet and FTP links; the duplicate check applies to all of them
                        if videoUrl.startswith(('http://', 'https://', 'ed2k://', 'magnet:?', 'ftp://')) and videoUrl not in videoUrlList:
                            videoUrlList.append(videoUrl)
 
    return (imageUrlList, audioUrlList, videoUrlList)
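The function above calls `repairUrl`, which is not shown in this post. One plausible way to write such a helper is to resolve each link against the page URL with `urllib.parse.urljoin`. The sketch below is an assumption about what a helper of this kind could do, not the author's implementation, and `repair_url_sketch` is a hypothetical name.

```python
from urllib.parse import urljoin


def repair_url_sketch(resource_url, page_url):
    """Resolve a possibly relative resource link against the page it came from."""
    resource_url = resource_url.strip()
    # Links with their own non-HTTP scheme are returned unchanged
    if resource_url.startswith(("magnet:?", "ed2k://", "ftp://")):
        return resource_url
    # urljoin handles relative paths, root-relative paths and "//host/..." links
    return urljoin(page_url, resource_url)


page = "https://example.com/news/index.html"
repaired = [
    repair_url_sketch("img/a.png", page),
    repair_url_sketch("/static/b.mp3", page),
    repair_url_sketch("//cdn.example.com/c.mp4", page),
    repair_url_sketch("magnet:?xt=urn:btih:abc", page),
]
print(repaired)
```

After this step every HTTP resource is an absolute URL, which is what the `startswith` checks in `getResourceUrlList` expect.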

Scraping custom nodes

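from lxml import etree  # assumed: etree.HTML and .xpath match lxml's API

# Generic node scraping
def getNoteInfors(url, fatherNode, childNode):
    url = url.rstrip().rstrip('/')
    htmlStr = requestsDataBase(url)

    # Keep a copy of the fetched HTML on disk
    with open('reptileHtml.txt', 'w', encoding='utf-8') as htmlFile:
        htmlFile.write(htmlStr)

    html_etree = etree.HTML(htmlStr)

    dataArray = []

    if html_etree is not None:
        # fatherNode selects the repeating parent nodes; childNode is evaluated relative to each one
        nodes_list = html_etree.xpath(fatherNode)
        for k_value in nodes_list:
            partValue = k_value.xpath(childNode)
            if len(partValue) > 0:
                dataArray.append(partValue[0])

    return dataArray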
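The parent-node and child-node pattern used by `getNoteInfors` can be tried without any network access. To keep the sketch dependency-free it swaps in the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup. The function name `get_node_texts` and the sample markup are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def get_node_texts(markup, parent_path, child_path):
    """Collect the text of the first child_path match under each parent_path match."""
    root = ET.fromstring(markup.strip())
    results = []
    for parent in root.findall(parent_path):
        child = parent.find(child_path)
        # Parents without a matching child are skipped, as in getNoteInfors
        if child is not None and child.text:
            results.append(child.text.strip())
    return results


sample = """
<html><body>
  <div class="item"><h2>First</h2><p>one</p></div>
  <div class="item"><p>no title</p></div>
  <div class="item"><h2>Third</h2></div>
</body></html>
"""

titles = get_node_texts(sample, ".//div[@class='item']", "h2")
print(titles)
```

With the function from the post, the equivalent call would pass a parent XPath such as `//div[@class="item"]` and a relative child XPath such as `./h2/text()`. Those two expressions are examples only and do not come from the original post.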

Software

Download: https://gitee.com/zfj1128/ZFJObsLib_dmg

Tutorial videos

Resource scraping: link: https://pan.baidu.com/s/1xa9ruF_hMcN49716BJUx2w password: 1zpg

Node scraping: link: https://pan.baidu.com/s/1ebWWYtjoKkiH9mqakR6EMQ password: cosa

A screenshot of the tool in use:

WX20190802-162443@2x.png

Closing remarks

Comments and suggestions are welcome!
