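Features: rules are loaded dynamically, so the software does not need to be recompiled. Rules are simple to write and can be added freely, which suits lightweight scraping projects.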
xxx.pholcus.html
<Spider>
<Name>HTML dynamic rule example</Name>
<Description>HTML dynamic rule example [Auto Page] [http://xxx.xxx.xxx]</Description>
<Pausetime>300</Pausetime>
<EnableLimit>false</EnableLimit>
<EnableCookie>true</EnableCookie>
<EnableKeyin>false</EnableKeyin>
<NotDefaultField>false</NotDefaultField>
<Namespace>
<Script></Script>
</Namespace>
<SubNamespace>
<Script></Script>
</SubNamespace>
<Root>
<Script param="ctx">
console.log("Root");
ctx.JsAddQueue({
Url: "http://xxx.xxx.xxx",
Rule: "登錄頁"
});
</Script>
</Root>
<Rule name="登錄頁">
<AidFunc>
<Script param="ctx,aid">
</Script>
</AidFunc>
<ParseFunc>
<Script param="ctx">
console.log(ctx.GetRuleName());
ctx.JsAddQueue({
Url: "http://xxx.xxx.xxx",
Rule: "登錄后",
Method: "POST",
PostData: "username=44444444@qq.com&password=44444444&login_btn=login_btn&submit=login_btn"
});
</Script>
</ParseFunc>
</Rule>
<Rule name="登錄后">
<ParseFunc>
<Script param="ctx">
console.log(ctx.GetRuleName());
ctx.Output({
"全部": ctx.GetText()
});
ctx.JsAddQueue({
Url: "http://accounts.xxx.xxx/member",
Rule: "個人中心",
Header: {
"Referer": [ctx.GetUrl()]
}
});
</Script>
</ParseFunc>
</Rule>
<Rule name="個人中心">
<ParseFunc>
<Script param="ctx">
console.log("Member center: " + ctx.GetRuleName());
ctx.Output({
"全部": ctx.GetText()
});
</Script>
</ParseFunc>
</Rule>
</Spider>
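In the 登錄頁 rule above, PostData is a hard-coded form-encoded string, and the request for the member page passes its Referer header value as an array. The sketch below shows one way to build such a request object from plain fields instead. buildPostData and buildLoginRequest are hypothetical helper names, not part of Pholcus; they percent-encode each key and value so that characters such as "&", "=" or "+" in a password cannot corrupt the body. The code sticks to ES5 syntax in the hope that it could be pasted into a <Script> block, but that has not been checked against Pholcus's script engine.

```javascript
// Hypothetical helpers, not part of Pholcus. ES5 syntax only.

// Serialize a flat object into an application/x-www-form-urlencoded body.
// encodeURIComponent escapes reserved characters such as "@", "&", "=" and "+",
// and turns spaces into %20, which standard form decoders also read as a space.
function buildPostData(fields) {
  var parts = [];
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(fields[key])));
    }
  }
  return parts.join("&");
}

// Build a POST request object shaped like the literal objects the example
// passes to ctx.JsAddQueue. The optional referer is wrapped in an array,
// following the example's "Referer": [ctx.GetUrl()] pattern.
function buildLoginRequest(url, rule, fields, referer) {
  var req = {
    Url: url,
    Rule: rule,
    Method: "POST",
    PostData: buildPostData(fields)
  };
  if (referer) {
    req.Header = { "Referer": [referer] };
  }
  return req;
}

// Usage with the placeholder values from the example rule.
var loginReq = buildLoginRequest("http://xxx.xxx.xxx", "登錄后", {
  username: "44444444@qq.com",
  password: "44444444",
  login_btn: "login_btn",
  submit: "login_btn"
});
console.log(loginReq.PostData);
// username=44444444%40qq.com&password=44444444&login_btn=login_btn&submit=login_btn
```

Inside a ParseFunc, the returned object would be passed to ctx.JsAddQueue(...) in the same way as the literal objects in the example.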
Tag: meaning
<Spider>: spider (the element that defines one crawler)
<Name>: name of the spider
<Description>: description
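<Pausetime>: pause time between requests
<EnableLimit>: enable the crawl limit
<EnableCookie>: enable cookies
<EnableKeyin>: enable custom input (Keyin)
<NotDefaultField>: do not output the default fields
<Namespace>: namespace
<SubNamespace>: sub-namespace
<Root>: root (the entry script that queues the first request)
<Rule>: rule (a named step; its name attribute is what the Rule field in JsAddQueue() refers to)
<AidFunc>: auxiliary function
<ParseFunc>: parse function (handles the response of a request queued for this rule)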
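JavaScript: meaning
param: parameter names the script receives (e.g. ctx, aid)
JsAddQueue(): add a request to the crawl queue from JavaScript
GetRuleName(): get the name of the current rule
Output(): output a result record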
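To see how Root, Rule, JsAddQueue(), GetRuleName() and Output() fit together, the following offline sketch imitates the example's flow: Root queues the first request, each queued request names the rule whose ParseFunc handles its response, and Output() collects result records. runSpider, the fake ctx and the pages map are hypothetical stand-ins written for illustration only. The real Pholcus scheduler also downloads pages over the network, applies Pausetime and cookies, and may run downloads concurrently, none of which is modeled here.

```javascript
// Hypothetical offline imitation of a Pholcus-style rule chain; not Pholcus code.
// "pages" maps a URL to the text a fake download would return.
function runSpider(spider, pages) {
  var queue = [];    // pending requests, handled first in, first out
  var visited = [];  // rule names in the order their ParseFunc ran
  var outputs = [];  // records passed to ctx.Output()

  // Build a fake ctx exposing only the methods the example scripts call.
  function makeCtx(req) {
    return {
      JsAddQueue: function (r) { queue.push(r); },
      GetRuleName: function () { return req ? req.Rule : ""; },
      GetUrl: function () { return req ? req.Url : ""; },
      GetText: function () { return req ? (pages[req.Url] || "") : ""; },
      Output: function (record) {
        outputs.push({ rule: req ? req.Rule : "", data: record });
      }
    };
  }

  spider.root(makeCtx(null)); // Root runs once, with no response to parse
  while (queue.length > 0) {
    var req = queue.shift();
    var rule = spider.rules[req.Rule]; // the request's Rule field picks the parser
    if (!rule) {
      throw new Error("unknown rule: " + req.Rule);
    }
    visited.push(req.Rule);
    rule.parse(makeCtx(req));
  }
  return { visited: visited, outputs: outputs };
}

// The example's Root and ParseFunc scripts, transcribed as functions.
var exampleSpider = {
  root: function (ctx) {
    ctx.JsAddQueue({ Url: "http://xxx.xxx.xxx", Rule: "登錄頁" });
  },
  rules: {
    "登錄頁": {
      parse: function (ctx) {
        ctx.JsAddQueue({
          Url: "http://xxx.xxx.xxx",
          Rule: "登錄后",
          Method: "POST",
          PostData: "username=44444444@qq.com&password=44444444&login_btn=login_btn&submit=login_btn"
        });
      }
    },
    "登錄后": {
      parse: function (ctx) {
        ctx.Output({ "全部": ctx.GetText() });
        ctx.JsAddQueue({
          Url: "http://accounts.xxx.xxx/member",
          Rule: "個人中心",
          Header: { "Referer": [ctx.GetUrl()] }
        });
      }
    },
    "個人中心": {
      parse: function (ctx) {
        ctx.Output({ "全部": ctx.GetText() });
      }
    }
  }
};

var result = runSpider(exampleSpider, {
  "http://xxx.xxx.xxx": "<html>home</html>",
  "http://accounts.xxx.xxx/member": "<html>member</html>"
});
console.log(result.visited.join(" -> ")); // 登錄頁 -> 登錄后 -> 個人中心
```

The queue is first in, first out only to keep this trace deterministic; it is not a statement about the order Pholcus itself uses.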
Source: GitHub / Author: henrylee