Python + selenium: Cracking the Geetest Click CAPTCHA

0x00 First attempt

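I originally planned to locate the button from a screenshot, and had already written the template-matching function (see "python+opencv brute-force template matching"). Then I found that this CAPTCHA can in fact be located directly through XPath, so having selenium simulate a user click should be enough. The code is on GitHub.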

Figure: the click CAPTCHA

0x01 Starting over


Since the element can be located, let's try moving the mouse straight to it and clicking:

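# By comes from selenium.webdriver.common.by; find_element_by_xpath is gone from newer Selenium 4 releases
btn = driver.find_element(By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
ActionChains(driver).move_to_element(btn).click().perform()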

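Of course, it couldn't be that simple. Sure enough, verification failed and a secondary verification was required. The secondary verification is also a Geetest CAPTCHA, and one of two kinds shows up at random: a choose-the-characters CAPTCHA or a slider CAPTCHA. This time, though, we focus on the click CAPTCHA.
So I guessed that it tells humans from bots by how the mouse moves. To test this, I moved the mouse quickly onto the button and stopped dead without slowing down, so that I looked like a machine. Sure enough, every one of several attempts failed. Then I accelerated slowly at the start, slowed down when approaching the button, and came to a gentle stop, like a car obeying traffic rules, and passed verification with no trouble.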

0x02 Approach

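Now that the guess has been tentatively confirmed, the next step is to find a way to imitate the mouse movement. However, selenium moves the mouse by relative offsets, and it offers no function or method for reading the current mouse position. So the initial plan is: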

  1. Move the mouse to the element first
  2. Imitate a user's mouse movement: wander around in a loop and come back
  3. Click to pass the CAPTCHA

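The plan is clear, so on to the implementation. Steps 1 and 3 are simple; the real work is step 2, simulating the mouse movement. Since we only need the relative position of each step, we can treat the user's initial mouse position as (0, 0). What remains is to record the mouse position periodically and compute the difference between adjacent points, which is the relative displacement.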
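As a small worked example of the differencing step described above, here is a minimal sketch. The helper name to_relative_offsets and the sample coordinates are made up for illustration and are not part of the original code.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def to_relative_offsets(points: List[Point]) -> List[Point]:
    """Return the step from each point to the next; the first entry is (0, 0)."""
    if not points:
        return []
    # the first sample has no predecessor, so its offset is zero
    offsets = [(0, 0)]
    for prev, cur in zip(points, points[1:]):
        offsets.append((cur[0] - prev[0], cur[1] - prev[1]))
    return offsets


# hypothetical absolute screen positions sampled at a fixed period
samples = [(100, 200), (103, 202), (110, 205), (110, 205), (104, 201)]
offsets = to_relative_offsets(samples)
print(offsets)  # [(0, 0), (3, 2), (7, 3), (0, 0), (-6, -4)]
```

Summing all offsets gives the last point minus the first, so the absolute start position never needs to be known during replay.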

import pyautogui as pag
import json
import time

class MouseTracker(object):
    """
    Record the user's mouse track so that selenium can replay it like a human.
    The data is saved to a text file as JSON. The format of the result:
    [[(x1,y1), delay1], [(x2, y2), delay2], [(x3, y3), delay3]...]
    where (x, y) is the offset from the previous point and delay is in seconds.
    """
    def __init__(self, filename='track.txt', period=0.01, max_stop_time=0.5):
        """
        :param filename: the file in which to save the mouse track.
        :param period: the fixed interval, in seconds, at which the mouse position is sampled.
        :param max_stop_time: how long, in seconds, the mouse must stay still before recording is considered finished.
        """

        self.period = period
        self.filename = filename
        self.stop_num = int(max_stop_time/period)
        self.res = []

        # record start point of mouse
        self.start_point = tuple(pag.position())
        # the previously sampled point
        self.previous_point = self.start_point
        # save the record of track
        self.track = []
        # save the interval between each point
        self.sleep_time = []
        # the track begins at the start point
        self.track.append(self.start_point)

    def record(self):
        """
        Record the relative displacement of user's mouse each fixed time.
        """

        print('Moving your mouse to start record, stop moving to finish')

        # record the number of same position.
        num = 0
        # loop forever; break once the mouse has stayed still longer than max_stop_time
        while True:
            new = tuple(pag.position())
            time.sleep(self.period)
            if new == self.start_point:
                continue
            if new == self.previous_point:
                num = num + 1
            else:
                self.track.append(new)
                self.sleep_time.append(num*self.period)
                num = 1
            self.previous_point = new
            if num > self.stop_num:
                break
        self.sleep_time.append(0)

        # subtract two points, e.g. (3,2)-(2,1) is (1,1)
        tuple_minus = lambda x,y:(x[0]-y[0],x[1]-y[1])
        # indices of every point that has a predecessor
        _range = range(1,len(self.track))
        # get relative displacement, i.e. the coordinate difference between neighbours
        diff = [tuple_minus(self.track[x],self.track[x-1]) for x in _range]
        # make sure the length of the diff list is equal to sleep_time's
        diff.insert(0,(0,0))
        # get results list
        for i in range(len(self.track)):
            self.res.append((diff[i], self.sleep_time[i]))


    def print_res(self):
        for i in self.res:
            print(i)

    # save results to file
    def save(self):
        with open(self.filename, 'w') as f:
            json.dump(self.res, f)
            
    def generate(self):
        # record first; printing before recording would show an empty result
        self.record()
        self.print_res()
        self.save()

if __name__ == "__main__":
    # the class is named MouseTracker; mouseTracker would raise NameError
    MouseTracker().generate()

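Then run the program and play the part of a mouse that obeys the "traffic rules": accelerate gently, stop slowly, and move the mouse around in a loop back to somewhere near the starting point. This records the relative displacement of the mouse in each period. After a fair amount of tuning I settled on a default period of 0.01 s; a custom recording period can also be passed in at initialization. Recording is considered finished once the mouse has stayed still for a certain time, which is the max_stop_time parameter of the class initializer. A value of 0.5 to 1 s feels about right without dragging on; the code above defaults to 0.5 s.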
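Because the replay starts at the button and ends with a click, it helps if the recorded loop really does end close to where it began. The sketch below checks a recorded track for that. The function summarize_track and the sample data are hypothetical; the only assumption taken from the recorder above is the saved shape [[[dx, dy], delay], ...].

```python
import json
import math


def summarize_track(res):
    """Return (net_dx, net_dy, drift, total_delay) for a recorded track."""
    # net displacement is the sum of all relative offsets
    net_dx = sum(offset[0] for offset, _ in res)
    net_dy = sum(offset[1] for offset, _ in res)
    # total time, in seconds, spent pausing between moves
    total_delay = sum(delay for _, delay in res)
    # straight-line distance between the end point and the start point
    drift = math.hypot(net_dx, net_dy)
    return net_dx, net_dy, drift, total_delay


# hypothetical sample, shaped like what json.load() returns for track.txt
sample = json.loads('[[[0, 0], 0], [[3, 4], 0.02], [[5, -1], 0.01], [[-8, 0], 0.03]]')
net_dx, net_dy, drift, total_delay = summarize_track(sample)
print(net_dx, net_dy, drift, round(total_delay, 2))
```

A drift of only a few pixels suggests the final click should still land on the button; a large drift is a hint to record the track again.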

0x03 Cracking

Next, we use selenium to simulate a user login: first open the login page, then simulate the user's input and submit the form. I won't go over selenium usage here; straight to the code.

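import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class Crack(object):
    """
    Crack geetest click CAPTCHA and auto login.
    Please make sure you have generated the mouse-track file in this path.
    """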
    def __init__(self, username, password, trackfilename='track.txt', proxy=''):                               
        # Get login information and init variable.                                                             
        self.username = username                                                                               
        self.password = password                                                                               
        self.cookies = ''                                                                                      
                                                                                                               
        # This is login entrance                                                                               
        self.url = 'https://passport.weibo.cn/signin/login'                                                    
                                                                                                               
                                                                                                               
        # read track data                                                                                      
        self.trackfilename = trackfilename                                                                     
        self.track = []                                                                                        
        with open(self.trackfilename,'r') as f:                                                                
            self.track = json.load(f)                                                                          
                                                                                                               
        # start Chrome, optionally through a proxy (uncomment --headless to hide the window).
        chrome_options = Options()
        #chrome_options.add_argument('--headless')
        if proxy:
            chrome_options.add_argument('--proxy-server='+proxy)
            print('proxy set success')
        # 'options=' replaces the deprecated 'chrome_options=' keyword
        self.driver = webdriver.Chrome(options=chrome_options)
        self.wait = WebDriverWait(self.driver, 6)

        # Waiting for chrome to open and open login entrance.
        self.driver.implicitly_wait(5)
        # get() accepts only the URL, so the 10 s limit is set as the page-load timeout
        self.driver.set_page_load_timeout(10)
        try:
            self.driver.get(self.url)
        except Exception as e:
            print("Target URL cannot be reached: ", e)
            self.__del__()
                                                                                                               
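    def __del__(self):
        """ Destroy the web browser """
        # guard against a second call and against a driver that never started
        driver = self.__dict__.pop('driver', None)
        if driver is not None:
            print(self.cookies)
            driver.quit()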
                                                                                                               
    def wait_for_main_page(self):                                                                              
        """                                                                                                    
        Waiting for the loading of main page                                                                   
        :return : If the main page load in given time, return True.                                            
        """                                                                                                    
        try:                                                                                                   
            self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'main-wrap')))                      
            return True                                                                                        
        except Exception:
            return False                                                                                       
                                                                                                               
    def move(self):                                                                                            
        """ Move mouse by using given track """                                                                
        for offset, sleeptime in self.track:                                                                   
            x, y = offset                                                                                      
            ActionChains(self.driver).move_by_offset(x,y).perform()                                            
            time.sleep(sleeptime)                                                                              
        ActionChains(self.driver).click().perform()                                                            
                                                                                                               
    def login(self):                                                                                           
        """ Login weibo by selenium """                                                                        
                                                                                                               
        # Waiting until the presence of login button.
        WebDriverWait(self.driver, 15, 0.5).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="loginAction"]')))
                                                                                                               
        # Input username and password.
        print('Inputting username and password...', end='')
        username_area = self.driver.find_element(By.XPATH, '//*[@id="loginName"]')
        username_area.send_keys(self.username)
        time.sleep(1)
        psw_area = self.driver.find_element(By.XPATH, '//*[@id="loginPassword"]')
        psw_area.send_keys(self.password)
        print('Ok')

        # Submit login form.
        print('Posting form data...', end='')
        btn = self.driver.find_element(By.XPATH, '//*[@id="loginAction"]')
        btn.click()
        print('Ok')
                                                                                                               
        # If there is a CAPTCHA, then crack it.
        # str.find() returns -1 (truthy) when the text is absent, so test with 'in' instead
        if 'CAPTCHA' in self.driver.current_url:
            print('CAPTCHA has been detected, need crack.')
            self.crack()
        else:
            ret = self.wait_for_main_page()
            self.cookies = self.driver.get_cookies()
                                                                                                               
    def crack(self):                                                                                           
        """ Crack the click CAPTCHA """                                                                        
        # Waiting for page loading.
        print('Loading CAPTCHA...', end='')
        locator = (By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
        btn = WebDriverWait(self.driver, 10, 0.5).until(
            EC.presence_of_element_located(locator))
        print('Complete')

        # move to the button, replay the recorded track, then click
        print('Cracking...', end='')
        ActionChains(self.driver).move_to_element(btn).perform()
        # move() already ends with a click, so no second click is issued here
        self.move()
        print('Complete')
                                                                                                               
        # wait for the main page, then get cookies
        ret = self.wait_for_main_page()
        if ret:
            print('Cracking success!')
        # Choose-Word-CAPTCHA has appeared, need second verification.
        # find_elements() returns an empty (falsy) list when nothing matches
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_commit_tip'):
            print('Cracking failed, Choose-Word-CAPTCHA has appeared!')
            self.__del__()
            # crack_choose_CAPTCH()
        # Slide-CAPTCHA has appeared, need second verification.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_slider_track'):
            print('Cracking failed, Slider-CAPTCHA has appeared!')
            self.__del__()
            # crack_slide_CAPTCH()
        else:
            print('Unknown Error!')
            self.__del__()
            # log_error()
                                                                                                               
        if ret:                                                                                                
            cookies_dict = {}                                                                                  
            cookies = self.driver.get_cookies()                                                                
            for d in cookies:                                                                                  
                cookies_dict[d['name']] = d['value']                                                           
            print('Get cookies:', cookies_dict)                                                                
            self.cookies = json.dumps(cookies_dict)                                                            
                                                                                                               
                                                                                                               
if __name__ == '__main__':                                                                                                                            
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx').login()                                          
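crack() flattens the cookie list into a name-to-value dict. If the cookies are meant to be reused outside the browser, one possible next step is to join them into a single Cookie header value. This is only a sketch: the helper cookies_to_header and the sample cookie names are hypothetical, and the sample merely mimics the list-of-dicts shape that driver.get_cookies() returns.

```python
def cookies_to_header(cookies):
    """Join Selenium-style cookie dicts into a 'name=value; ...' header value."""
    return '; '.join('{}={}'.format(c['name'], c['value']) for c in cookies)


# hypothetical cookies; real ones carry more keys (domain, path, expiry, ...)
sample_cookies = [
    {'name': 'sid', 'value': 'abc123', 'domain': '.example.com'},
    {'name': 'token', 'value': 'xyz789', 'domain': '.example.com'},
]
header = cookies_to_header(sample_cookies)
print(header)  # sid=abc123; token=xyz789
```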

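Note that I added a proxy parameter to the class, so a proxy can be supplied for cracking. Repeated experiments showed that logging in again and again from the same IP within a short time tends to trigger the secondary verification. The secondary CAPTCHA comes in two forms: a point-and-select CAPTCHA, where you pick out the characters shown in an image, and a slider CAPTCHA (relatively easy to crack). Either one may appear, but when logins are too frequent it is usually the first. The code leaves an extension point for each of these two secondary verifications, to be filled in later.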

Proxies can be obtained by building and maintaining your own proxy pool, although not many usable ones are left. To use a proxy, pass it in when initializing the class:

if __name__ == '__main__':     
    # the proxy, as ip:port
    proxy = "xxx.xxx.xxx.xxx:xxxxx"
    proxy = "http://" + proxy                                                                              
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx', proxy=proxy).login() 
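
The snippet above prepends "http://" by hand. When proxies come from a pool, some entries may already carry a scheme and others may not, so a small normalizing step can help. The following is a sketch under that assumption; the helper normalize_proxy is hypothetical and not part of the original code.

```python
from urllib.parse import urlsplit


def normalize_proxy(proxy, default_scheme='http'):
    """Return proxy as 'scheme://host:port', adding default_scheme when missing."""
    proxy = proxy.strip()
    if '://' not in proxy:
        proxy = '{}://{}'.format(default_scheme, proxy)
    parts = urlsplit(proxy)
    # both a host and a numeric port are required
    if not parts.hostname or parts.port is None:
        raise ValueError('expected host:port, got {!r}'.format(proxy))
    return '{}://{}:{}'.format(parts.scheme, parts.hostname, parts.port)


print(normalize_proxy('127.0.0.1:8080'))          # http://127.0.0.1:8080
print(normalize_proxy('socks5://10.0.0.1:1080'))  # socks5://10.0.0.1:1080
```

The result can be passed as the proxy argument of Crack, which hands it to Chrome's --proxy-server option.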