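Foreword
Perfect[1], a company that provides server-side Swift technology, has released Perfect Assistant (PA)[2], a helper tool for creating, developing, and deploying Swift server projects in a more "Swift" way.
If you have any questions about Perfect or PA, join the Chinese channel on Slack[3] and ping @rockford, the resident expert~
Server side
We need to add a route to the Swift server. The server has not been deployed yet, so visiting http://127.0.0.1/data is enough.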
//Add the route
routes.add(method: .get, uri: "/data", handler: dataHandler)
private func dataHandler(request:HTTPRequest,_ response:HTTPResponse)
{
    //Create a crawler inside the request and run one crawl
    var crawler = myCrawler(url:"https://movie.douban.com/")
    crawler.start()
    //results stays an empty string when nothing was crawled, so it can be appended as-is
    response.appendBody(string: crawler.results)
    response.completed()
}
This is only a tiny little crawler.
So the way to call it is simple too :]
//A url property that receives the main URL passed in
private var url:String
//A results property that holds the output
internal var results = ""
//Initializer and start method of the myCrawler struct
init(url:String)
{
    self.url = url
}
internal mutating func start()
{
    do
    {
        try handleData(data: setUp(urlString: url))
    }
    catch
    {
        debugPrint(error)
    }
}
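So what does it actually do? It helps to recall how a web crawler works: fetch a page, pull out the links you care about, then visit each link and extract the data. Following that idea, I tried my hand at a simple and rather lazy crawler. Let's use it to crawl the weekly word-of-mouth chart on Douban Movie. (The code goes down best with the comments.)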
private func setUp(urlString:String) throws ->[String]
{
    //The goal is to collect the URLs listed on the word-of-mouth chart
    guard let url = URL(string:urlString) else
    {
        throw crawlerError(msg:"Failed to initialize the query URL")
    }
    var URLArray = [String]()
    debugPrint("Start fetching URLs")
    //Create a Scanner over the page source
    let scanner = Scanner(string: try String(contentsOf:url))
    while !scanner.isAtEnd
    {
        //and pull out each URL by locating the text before and after it
        URLArray.append(scanWith(head:"{from:'mv_rk'})\" href=\"",foot:"\">",scanner:scanner))
    }
    //scanWith returns "" when nothing matches, so drop empty entries before checking for failure
    URLArray = URLArray.filter{ !$0.isEmpty }
    if URLArray.isEmpty
    {
        throw crawlerError(msg:"Failed to initialize the data")
    }
    debugPrint("Finished fetching URLs")
    return URLArray
}
The core function is
private func scanWith(head:String,foot:String,scanner:Scanner)->String
Its code is below. It takes the Scanner passed in and returns the string sandwiched between head and foot. The captured string still includes the head part, so we have to strip it.
private func scanWith(head:String,foot:String,scanner:Scanner)->String
{
    var str:NSString?
    //Skip ahead to head, then capture everything up to foot
    scanner.scanUpTo(head, into: nil)
    scanner.scanUpTo(foot, into: &str)
    //The captured text still starts with head, so remove it
    return str?.replacingOccurrences(of: head, with: "") ?? ""
}
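To see the head/foot idea in isolation, here is a minimal sketch that swaps Scanner for plain `range(of:)` searches. It is not the code used in this post: the `extractAll` name and the sample HTML are made up for illustration, and it assumes Swift 4 or later for the multi-line string literal. Unlike `scanWith`, it skips a head that has no matching foot instead of returning the rest of the text.

```swift
import Foundation

/// Hypothetical helper: returns every substring found between `head` and `foot`, in order.
func extractAll(from text: String, head: String, foot: String) -> [String] {
    var matches = [String]()
    var searchStart = text.startIndex
    // Find the next head, then the first foot after it; stop when either is missing.
    while let headRange = text.range(of: head, range: searchStart..<text.endIndex),
          let footRange = text.range(of: foot, range: headRange.upperBound..<text.endIndex) {
        matches.append(String(text[headRange.upperBound..<footRange.lowerBound]))
        // Continue searching after the foot that was just consumed.
        searchStart = footRange.upperBound
    }
    return matches
}

// Made-up fragment shaped like the links the crawler looks for.
let html = """
<a onclick="moreurl(this, {from:'mv_rk'})" href="https://example.com/subject/1/">A</a>
<a onclick="moreurl(this, {from:'mv_rk'})" href="https://example.com/subject/2/">B</a>
"""
let links = extractAll(from: html, head: "{from:'mv_rk'})\" href=\"", foot: "\">")
print(links)
```

Because the head is never part of the returned slice, there is nothing to strip afterwards.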
Once we have all the URLs, we visit each page and look for the data we need. For example, I want the movie model (title, director, rating, and so on) and the movie synopsis, so I only have to change head and foot to match each piece of information.
//mutating because it changes the results property
private mutating func handleData(data:[String]) throws
{
    debugPrint("Start fetching details")
    var index = 0
    //Map the strings to URLs
    for url in data.map({ URL(string:$0) })
    {
        guard let url = url else { throw crawlerError(msg:"Failed to initialize item \(index)") }
        DispatchQueue.global().sync
        {
            do
            {
                //Download the page once and reuse it for both scans
                let html = try String(contentsOf:url)
                let scanner = Scanner(string: html)
                //A (head, foot) tuple keeps the two markers together
                var (head,foot) = ("data-name=",".jpg")
                //Movie model
                var tempStr = (head + self.scanWith(head:head,foot:foot,scanner:scanner) + foot).components(separatedBy: "data-").map{
                    "\"\($0)".replacingOccurrences(of: "=", with: "\":").trim(string:" ")
                }
                tempStr.removeFirst()
                var content = ""
                tempStr.forEach{ content += "\($0),\n" }
                //Turn the trailing comma into a closing quote (the scan stopped at ".jpg", before the attribute's own quote)
                content = content.replace(of: ",", with: "\"")
                //Movie synopsis
                var intro = ""
                (head,foot) = html.contains(string: "<span class=\"all hidden\">") ? ("<span class=\"all hidden\">","</span>") : ("<span property=\"v:summary\" class=\"\">","</span>")
                self.scanWith(head:head,foot:foot,scanner:scanner).components(separatedBy: "<br />").forEach{
                    intro += $0.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
                }
                //Assemble the JSON by hand
                results += "\"\(index)\":{\"content\":{\(content)},\"intro\":\"\(intro)\"},"
            }
            catch
            {
                debugPrint(error)
            }
        }
        index += 1
    }
    debugPrint("Finished fetching details")
    //Drop the trailing comma and wrap the entries in braces; stay empty when nothing was collected
    results = results.isEmpty ? "" : "{\(results.replace(of: ",", with: ""))}"
}
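Hand-built JSON breaks as soon as a synopsis contains a double quote or a line break. As an alternative, here is a rough sketch that builds the same `{"0":{"content":{...},"intro":"..."}}` shape with Foundation's `JSONSerialization`, which takes care of the escaping. The `MovieEntry` type, the `makeJSON` function, and the sample values are assumptions made for this illustration and are not part of the crawler above.

```swift
import Foundation

// Hypothetical container for what one movie page yields.
struct MovieEntry {
    let content: [String: String]   // attribute pairs such as name or score
    let intro: String               // synopsis text
}

// Builds {"0": {"content": {...}, "intro": "..."}, "1": ...}; returns "" when there is nothing to encode.
func makeJSON(from movies: [MovieEntry]) -> String {
    guard !movies.isEmpty else { return "" }
    var payload = [String: Any]()
    for (index, movie) in movies.enumerated() {
        let entry: [String: Any] = ["content": movie.content, "intro": movie.intro]
        payload[String(index)] = entry
    }
    // JSONSerialization escapes quotes and newlines inside the strings.
    guard let data = try? JSONSerialization.data(withJSONObject: payload),
          let json = String(data: data, encoding: .utf8) else { return "" }
    return json
}

let sampleJSON = makeJSON(from: [
    MovieEntry(content: ["name": "Example", "score": "8.5"],
               intro: "He said \"hello\".\nSecond line.")
])
print(sampleJSON)
```

Key order in the output is not guaranteed, so a client should look entries up by key rather than by position.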
Finally, here is the utility code
//A custom error type
struct crawlerError:Error
{
    var message:String
    init(msg:String)
    {
        message = msg
    }
}
extension String
{
    //Trim the given characters (spaces and the like) from both ends
    func trim(string:String) -> String
    {
        return trimmingCharacters(in: CharacterSet(charactersIn: string))
    }
    //Replace the given string where it occurs within the last two characters
    func replace(of pre:String,with next:String)->String
    {
        //Clamp the range so strings shorter than two characters do not crash
        let start = index(endIndex, offsetBy: -2, limitedBy: startIndex) ?? startIndex
        return replacingOccurrences(of: pre, with: next, options: .backwards, range: start..<endIndex)
    }
}
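The `replace(of:with:)` helper is easy to misread, so here is a small worked example of what it does to the two strings the crawler feeds it. `replaceInTail` is a free-function stand-in written for this sketch (with the search range clamped for short strings), and the sample strings are invented.

```swift
import Foundation

// Stand-in for the String extension above: only the last two characters are searched.
func replaceInTail(_ text: String, of target: String, with replacement: String) -> String {
    let start = text.index(text.endIndex, offsetBy: -2, limitedBy: text.startIndex) ?? text.startIndex
    return text.replacingOccurrences(of: target, with: replacement,
                                     options: .backwards, range: start..<text.endIndex)
}

// Attributes are joined with ",\n"; the poster URL was cut at ".jpg", so its closing quote is missing.
let rawContent = "\"name\":\"Example\",\n\"poster\":\"https://example.com/p.jpg,\n"
// The comma in the tail becomes the missing quote; the earlier comma is outside the range and survives.
let fixedContent = replaceInTail(rawContent, of: ",", with: "\"")

// Entries are joined with ","; the same helper removes the trailing one.
let rawResults = "\"0\":{},\"1\":{},"
let trimmedResults = replaceInTail(rawResults, of: ",", with: "")

print(fixedContent)
print(trimmedResults)
```

Restricting the search to the tail is what keeps the commas between earlier entries intact.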
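So... time to wrap up?
Mobile side
Well, the crawler that got tossed onto the server is already working. But that did not feel like enough; seeing the result only on a web page left the whole thing feeling incomplete. So yesterday I spent a little more time adding a data view to the demo app I had been using to test login and registration. It looks roughly like this, ugly as it is.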

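(Inner monologue) There is still plenty to think through carefully, and not only in the crawler: the server and the mobile side each have a lot that has to be considered together. I can now collect data, store data, and display data, but that is only the first step on the road to becoming a full-stack Swifter. I also hope this little crawler grows into a butterfly rather than a moth (Absolutely Not!)
Thank you so much for putting up with my rambling all the way to the end.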