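0x00 First attempt
I originally planned to locate the CAPTCHA from a screenshot and had already written the template-matching function (see "python+opencv brute-force template matching"). It then turned out that this CAPTCHA can be located directly with XPath, so simulating a user click with selenium is enough. The code is on GitHub.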

0x01 Starting over

Since the element can be located, the obvious thing to try is moving the mouse straight to it and clicking:
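from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
# the aria-label text "点击按钮进行验证" means "click the button to verify"
btn = driver.find_element(By.XPATH, '//div[@aria-label="点击按钮进行验证"]')
ActionChains(driver).move_to_element(btn).click().perform()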
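Naturally, it is not that simple. Sure enough, the verification failed and a second verification was required. The second verification is also a Geetest CAPTCHA, and one of two kinds appears at random: a pick-the-characters CAPTCHA or a slider CAPTCHA. This time, though, the focus is on the click CAPTCHA.
So I guessed that it uses the mouse movement to tell humans from bots. To test this, I moved the mouse quickly onto the button and stopped abruptly without slowing down, making myself look like a machine. Sure enough, verification failed on every attempt. Then I accelerated slowly at the start, slowed down when approaching the button and came to a gentle stop, like a car obeying traffic rules, and passed the verification easily.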
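To make the two movement styles in this test concrete, here is a minimal sketch that compares per-step distances for a constant-speed move and for a move that accelerates and brakes gently. The smoothstep curve and the function names are assumptions chosen for illustration only; nothing here is known about the criterion Geetest actually applies.

```python
def linear_positions(distance, steps):
    """Positions along one axis for a constant-speed move that stops dead."""
    return [distance * i / steps for i in range(steps + 1)]


def eased_positions(distance, steps):
    """Positions for a move that starts slowly and brakes before the target.

    Uses the smoothstep curve 3t^2 - 2t^3 (an illustrative assumption).
    """
    positions = []
    for i in range(steps + 1):
        t = i / steps
        positions.append(distance * (3 * t * t - 2 * t * t * t))
    return positions


def step_sizes(positions):
    """Distance covered in each sampling period."""
    return [b - a for a, b in zip(positions, positions[1:])]


robot = step_sizes(linear_positions(300, 10))
human = step_sizes(eased_positions(300, 10))
print([round(s, 1) for s in robot])
print([round(s, 1) for s in human])
```

The first list is flat, while the second starts small, peaks in the middle and shrinks again, which is the "traffic rules" shape described above.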
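0x02 Approach
Now that the guess has been tentatively confirmed, the next step is to imitate the mouse movement. However, selenium moves the mouse by relative offsets, and it provides no function or method for reading the mouse position. So the initial plan is:
- Move the mouse to the element first
- Imitate a user's mouse movement: wander around in a loop and come back
- Click to pass the CAPTCHA
The plan is clear, so on to the implementation. Steps one and three are simple; the focus is on step two, simulating the mouse movement. Because only the relative position of each step is needed, assume the user's initial mouse position is (0, 0), record the mouse position periodically, and compute the difference between adjacent coordinates. Those differences are the relative displacements.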
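As a minimal illustration of that last step, the sketch below turns a list of absolute positions into per-step offsets. The function name `to_relative` and the sample coordinates are made up for this example.

```python
def to_relative(samples):
    """Convert absolute (x, y) samples into per-step (dx, dy) offsets.

    The first sample is treated as the origin, so its offset is (0, 0).
    """
    if not samples:
        return []
    offsets = [(0, 0)]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        offsets.append((x1 - x0, y1 - y0))
    return offsets


# Hypothetical positions sampled at a fixed period
samples = [(100, 200), (101, 200), (104, 202), (110, 205), (112, 206)]
offsets = to_relative(samples)
print(offsets)
```

Summing the offsets gives the distance from the first sample to the last, so the absolute starting position never needs to be known when the track is replayed.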
import json
import time

import pyautogui as pag


class MouseTracker(object):
    """
    Record mouse tracks that can be replayed in selenium to move the mouse like a human.
    The data is saved to a text file as JSON. The format of the result:
    [[[dx1, dy1], delay1], [[dx2, dy2], delay2], [[dx3, dy3], delay3], ...]
    """

    def __init__(self, filename='track.txt', period=0.01, max_stop_time=0.5):
        """
        :param filename: the file in which to save the mouse track.
        :param period: the fixed interval, in seconds, at which the mouse position is sampled.
        :param max_stop_time: how long the mouse must stay still for the recording to be considered finished.
        """
        self.period = period
        self.filename = filename
        # number of consecutive unchanged samples that ends the recording
        self.stop_num = int(max_stop_time / period)
        self.res = []
        # record the start point of the mouse
        self.start_point = tuple(pag.position())
        # the most recently recorded point
        self.previous_point = self.start_point
        # the recorded track, beginning with the start point
        self.track = [self.start_point]
        # the time spent at each recorded point
        self.sleep_time = []

    def record(self):
        """
        Record the relative displacement of the user's mouse at a fixed interval.
        """
        print('Move your mouse to start recording, stop moving to finish')
        # number of consecutive samples at the same position
        num = 0
        # loop until the mouse has stayed still longer than max_stop_time
        while True:
            new = tuple(pag.position())
            time.sleep(self.period)
            # recording does not start until the mouse leaves the start point
            if len(self.track) == 1 and new == self.start_point:
                continue
            if new == self.previous_point:
                num = num + 1
            else:
                self.track.append(new)
                self.sleep_time.append(num * self.period)
                num = 1
                self.previous_point = new
            if num > self.stop_num:
                break
        self.sleep_time.append(0)
        # relative displacement is the difference between neighbouring coordinates,
        # e.g. (3, 2) - (2, 1) is (1, 1); the leading (0, 0) keeps diff the same
        # length as sleep_time
        diff = [(0, 0)]
        for prev, cur in zip(self.track, self.track[1:]):
            diff.append((cur[0] - prev[0], cur[1] - prev[1]))
        # build the results list
        self.res = list(zip(diff, self.sleep_time))

    def print_res(self):
        for i in self.res:
            print(i)

    # save results to file
    def save(self):
        with open(self.filename, 'w') as f:
            json.dump(self.res, f)

    def generate(self):
        self.record()
        self.print_res()
        self.save()


if __name__ == "__main__":
    MouseTracker().generate()
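Then run the program and play a mouse that obeys the "traffic rules": accelerate gently, stop slowly, and move the mouse in a loop back to a position near the starting point. This records the relative displacement of the mouse in each period. After some tuning I chose a default period of 0.01 s; a custom period can also be passed at initialization. When the mouse stays still for a certain time the recording is considered finished; this is the max_stop_time parameter of the initializer. A value between 0.5 and 1 s feels about right without being tedious. The default in the code above is 0.5 s.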
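Because the replay issues only relative moves, a recorded loop that does not end near its starting point would leave the pointer away from the button when the final click happens. A quick sanity check of a saved track might look like the sketch below. The helper name `summarize` and the sample data are hypothetical; note that JSON has no tuples, so each offset comes back as a two-element list.

```python
import json


def summarize(track):
    """Return (net_dx, net_dy, total_delay) for [[dx, dy], delay] entries."""
    net_dx = sum(offset[0] for offset, _ in track)
    net_dy = sum(offset[1] for offset, _ in track)
    total_delay = sum(delay for _, delay in track)
    return net_dx, net_dy, total_delay


# Made-up data in the same shape that json.dump writes for self.res
text = '[[[0, 0], 0.0], [[2, 1], 0.01], [[5, 3], 0.01], [[-4, -2], 0.02], [[-3, -2], 0.0]]'
track = json.loads(text)
net_dx, net_dy, total_delay = summarize(track)
print(net_dx, net_dy, round(total_delay, 2))
```

A net displacement of only a few pixels means the loop closed well. A large one suggests recording again.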
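0x03 Cracking
Next, use selenium to simulate a user login: first find the login page, then simulate the user input and submit the form. I won't go over selenium usage here, so straight to the code: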
import json
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# the aria-label text "点击按钮进行验证" means "click the button to verify"
CAPTCHA_BUTTON_XPATH = '//div[@aria-label="点击按钮进行验证"]'


class Crack(object):
    """
    Crack geetest click CAPTCHA and auto login.
    Please make sure you have generated the mouse-track file in this path.
    """

    def __init__(self, username, password, trackfilename='track.txt', proxy=''):
        # Get login information and init variables.
        self.username = username
        self.password = password
        self.cookies = ''
        self.driver = None
        # This is the login entrance
        self.url = 'https://passport.weibo.cn/signin/login'
        # read track data
        self.trackfilename = trackfilename
        with open(self.trackfilename, 'r') as f:
            self.track = json.load(f)
        # start Chrome, adding the proxy if one was given;
        # uncomment the next line to run in headless mode.
        chrome_options = Options()
        #chrome_options.add_argument('--headless')
        if proxy:
            chrome_options.add_argument('--proxy-server=' + proxy)
            print('proxy set success')
        self.driver = webdriver.Chrome(options=chrome_options)
        self.wait = WebDriverWait(self.driver, 6)
        # Wait for Chrome to open, then open the login entrance
        # (giving the page at most 10 seconds to load).
        self.driver.implicitly_wait(5)
        self.driver.set_page_load_timeout(10)
        try:
            self.driver.get(self.url)
        except Exception as e:
            print("Target URL cannot be reached: ", e)
            self.close()

    def close(self):
        """ Destroy the web browser """
        if getattr(self, 'driver', None) is not None:
            print(self.cookies)
            self.driver.quit()
            self.driver = None

    def __del__(self):
        self.close()

    def wait_for_main_page(self):
        """
        Wait for the main page to load.
        :return: True if the main page loads in the given time, otherwise False.
        """
        try:
            self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'main-wrap')))
            return True
        except TimeoutException:
            return False

    def move(self):
        """ Move the mouse along the recorded track """
        for offset, sleeptime in self.track:
            x, y = offset
            ActionChains(self.driver).move_by_offset(x, y).perform()
            time.sleep(sleeptime)

    def login(self):
        """ Log in to weibo with selenium """
        # Wait until the login button is present.
        WebDriverWait(self.driver, 15, 0.5).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="loginAction"]')))
        # Input username and password.
        print('Inputting username and password...', end='')
        username_area = self.driver.find_element(By.XPATH, '//*[@id="loginName"]')
        username_area.send_keys(self.username)
        time.sleep(1)
        psw_area = self.driver.find_element(By.XPATH, '//*[@id="loginPassword"]')
        psw_area.send_keys(self.password)
        print('Ok')
        # Submit the login form.
        print('Posting form data...', end='')
        btn = self.driver.find_element(By.XPATH, '//*[@id="loginAction"]')
        btn.click()
        print('Ok')
        # If there is a CAPTCHA, crack it. A membership test is used because
        # str.find() returns -1 (which is truthy) when the text is absent.
        if 'captcha' in self.driver.current_url.lower():
            print('CAPTCHA has been detected, need crack.')
            self.crack()
        else:
            ret = self.wait_for_main_page()
            self.cookies = self.driver.get_cookies()

    def crack(self):
        """ Crack the click CAPTCHA """
        # Wait for the CAPTCHA button to load.
        print('Loading CAPTCHA...', end='')
        btn = WebDriverWait(self.driver, 10, 0.5).until(
            EC.presence_of_element_located((By.XPATH, CAPTCHA_BUTTON_XPATH)))
        print('Complete')
        # move to the button, replay the recorded track, then click
        print('Cracking...', end='')
        ActionChains(self.driver).move_to_element(btn).perform()
        self.move()
        ActionChains(self.driver).click().perform()
        print('Complete')
        # wait for the main page and get cookies
        ret = self.wait_for_main_page()
        if ret:
            print('Cracking success!')
        # The choose-word CAPTCHA has appeared, a second verification is needed.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_commit_tip'):
            print('Cracking failed, Choose-Word-CAPTCHA has appeared!')
            self.close()
            # crack_choose_CAPTCH()
        # The slide CAPTCHA has appeared, a second verification is needed.
        elif self.driver.find_elements(By.CLASS_NAME, 'geetest_slider_track'):
            print('Cracking failed, Slider-CAPTCHA has appeared!')
            self.close()
            # crack_slide_CAPTCH()
        else:
            print('Unknown Error!')
            self.close()
            # log_error()
        if ret:
            cookies_dict = {}
            cookies = self.driver.get_cookies()
            for d in cookies:
                cookies_dict[d['name']] = d['value']
            print('Get cookies:', cookies_dict)
            self.cookies = json.dumps(cookies_dict)


if __name__ == '__main__':
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx').login()
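Note that the class takes a proxy parameter, so the cracking can go through a proxy. Repeated experiments showed that logging in several times within a short period from the same IP tends to trigger the second verification. It comes in two forms: a click-to-select CAPTCHA, where you pick the characters that appear in the picture, and a slider CAPTCHA (relatively easy to crack). Either may appear, but if logins are too frequent it is usually the first. The code leaves an extension point for each of the two, to be filled in later.
Proxies can be obtained by building and maintaining your own proxy pool, although not many usable ones are left. To use a proxy, pass it as an argument when initializing the class: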
if __name__ == '__main__':
    # the proxy, in ip:port form
    proxy = "xxx.xxx.xxx.xxx:xxxxx"
    proxy = "http://" + proxy
    Crack('xxxxxxxxxx@sina.com', 'xxxxxxxx', proxy=proxy).login()