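Notes on HttpClient
HttpClient began as a subproject of Apache Jakarta Commons (it is now maintained under Apache HttpComponents). It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol and supports the latest HTTP versions and recommendations.
1. First look at HttpClient: scraping a website with HttpClient
Prerequisites
- Java basics
- Networking basics
The code is as follows: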
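// Apache HttpClient 4.x classes used by the snippets in this post
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

@Test
public void helloHttpClient() throws Exception {
    String uri = "https://www.tuicool.com/";
    HttpGet httpGet = new HttpGet(uri);
    // try-with-resources closes the response and the client even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        // DEFAULT_CHARASET is a constant of the test class (its definition is not shown here)
        String entityStr = EntityUtils.toString(entity, DEFAULT_CHARASET);
        logger.info(entityStr);
    }
}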
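The request is blocked

[Screenshot: request blocked]
Why? A browser can open the same page without any trouble.

[Screenshot: normal access from the browser]

[Screenshot: normal access from the browser, key details]
The request headers sent by the browser include a User-Agent field, so let's imitate the browser.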
/**
 * Simulates a browser visit.
 * Sets a request header on the HttpGet instance as a key-value pair so the request carries a browser's identifier.
 * @throws Exception if the request fails
 */
@Test
public void analogBrowser() throws Exception {
    String uri = "https://www.tuicool.com/";
    HttpGet httpGet = new HttpGet(uri);
    // Set the User-Agent header on the HttpGet instance so the request identifies itself as a browser;
    // it could just as well be set to an Android or iOS client string.
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
    // try-with-resources closes the response and the client even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        logger.info("Response status code: [ {} ]", response.getStatusLine().getStatusCode());
        logger.info("Response content type: [ {} ]", entity.getContentType().getValue());
        String entityStr = EntityUtils.toString(entity, DEFAULT_CHARASET);
        logger.info(entityStr);
    }
}
Access is allowed once we imitate the browser

[Screenshot: access allowed after imitating the browser]
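The content-type log line above prints a value such as `text/html; charset=utf-8`, and that value can tell us which charset to decode the body with. As far as I know, `EntityUtils.toString(entity, defaultCharset)` already prefers the charset declared by the entity and only falls back to the default passed in. The sketch below shows the same idea with the JDK alone. `ContentTypeCharset` and `charsetOf` are hypothetical names, and falling back to UTF-8 is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {

    /** Returns the charset named in a Content-Type value, or UTF-8 (assumed default) if none is usable. */
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        // A Content-Type value looks like "text/html; charset=utf-8": parameters follow ';'
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8"));
        System.out.println(charsetOf("image/jpeg"));
    }
}
```

In the test method above, the input would be `entity.getContentType().getValue()`.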
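Next, let's have some fun: to get ready for downloading certain unmentionable material from a certain site later on, we download an image.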
/**
 * Downloads an image.
 * @throws Exception if the request or the file write fails
 */
@Test
public void dowloadPicture() throws Exception {
    String uri = "https://aimg2.tuicool.com/YnmA3y3.jpg!index";
    HttpGet httpGet = new HttpGet(uri);
    // Set the User-Agent header on the HttpGet instance so the request identifies itself as a browser;
    // it could just as well be set to an Android or iOS client string.
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
    // try-with-resources closes the response and the client even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        if (null != entity) {
            logger.info("Response status code: [ {} ]", response.getStatusLine().getStatusCode());
            logger.info("Response content type: [ {} ]", entity.getContentType().getValue());
            // Read the image as a stream and write it to a local file
            InputStream inputStream = entity.getContent();
            FileUtils.copyToFile(inputStream, new File("/home/left/aaa.jpg"));
        }
    }
}
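Console output

[Screenshot: console output of the image download]
The image is downloaded successfully

[Screenshot: the downloaded image]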
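The download above hands the response stream to `FileUtils.copyToFile` from Commons IO. If you would rather not pull in that library, the JDK's `Files.copy` can do the same job. Below is a minimal sketch under that assumption: `StreamToFile` and `save` are hypothetical names, and a `ByteArrayInputStream` stands in for `entity.getContent()` so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamToFile {

    /** Copies the whole stream to target, replacing an existing file, and returns the byte count. */
    static long save(InputStream in, Path target) {
        // try-with-resources closes the stream whether or not the copy succeeds
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(): four bytes instead of a real image
        byte[] fakeImage = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
        Path target = Files.createTempFile("download-", ".jpg");
        long written = save(new ByteArrayInputStream(fakeImage), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

In the test method, the call would look like `StreamToFile.save(entity.getContent(), Paths.get("/home/left/aaa.jpg"))`.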
Using a proxy
/**
 * Sends the request through a proxy IP.
 * @throws Exception if the request fails
 */
@Test
public void helloProxyip() throws Exception {
    String uri = "http://www.i2finance.net/";
    HttpGet httpGet = new HttpGet(uri);
    // Set the User-Agent header on the HttpGet instance so the request identifies itself as a browser;
    // it could just as well be set to an Android or iOS client string.
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
    HttpHost proxy = new HttpHost("103.235.199.93", 49176);
    RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
    // Apply the proxy settings to the request
    httpGet.setConfig(config);
    // try-with-resources closes the response and the client even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        if (null != entity) {
            logger.info("This request went through a proxy: [{}]", httpGet.getConfig().getProxy());
            logger.info("Response status code: [ {} ]", response.getStatusLine().getStatusCode());
            logger.info("Response content type: [ {} ]", entity.getContentType().getValue());
        }
    }
}

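[Screenshot: access through the configured proxy]
On timeouts: setting timeout values (in milliseconds) lets the client decide when to give up on the current request instead of waiting for nothing.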
public static void main(String[] args) throws Exception {
    String uri = "http://www.i2finance.net/";
    HttpGet httpGet = new HttpGet(uri);
    // Set the User-Agent header on the HttpGet instance so the request identifies itself as a browser;
    // it could just as well be set to an Android or iOS client string.
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
    RequestConfig config = RequestConfig.custom()
            // Max time (ms) to wait for a connection from the connection manager
            .setConnectionRequestTimeout(100)
            // Max time (ms) to wait for data on the socket, i.e. the read timeout
            .setSocketTimeout(10)
            .build();
    // Apply the timeout settings to the request
    httpGet.setConfig(config);
    // try-with-resources closes the response and the client even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        if (null != entity) {
            logger.info("Socket timeout for this request: [{}] ms", httpGet.getConfig().getSocketTimeout());
            logger.info("Response status code: [ {} ]", response.getStatusLine().getStatusCode());
            logger.info("Response content type: [ {} ]", entity.getContentType().getValue());
        }
    }
}
No timeout

[Screenshot: no timeout]
Timeout

[Screenshot: timeout]

[Screenshot: read timeout]
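To make the read timeout less abstract: the value given to `setSocketTimeout` is the longest the client waits for data once the connection is open, while `setConnectionRequestTimeout` only limits the wait for a connection from the connection manager. `RequestConfig.Builder` also offers `setConnectTimeout` for the connect phase, which the code above leaves at its default. The sketch below reproduces a read timeout with plain JDK sockets against a local server that accepts the connection but never replies. It does not use HttpClient: `ReadTimeoutDemo` and `readTimesOut` are hypothetical names, and `Socket.setSoTimeout` stands in for `setSocketTimeout`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {

    /** Connects to a local server that never sends data; returns true if the read times out. */
    static boolean readTimesOut(int socketTimeoutMillis) {
        // Port 0 lets the OS pick a free port; the server never calls accept() or writes anything
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            // Same idea as setSocketTimeout: the maximum wait for incoming data
            client.setSoTimeout(socketTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks, because no data ever arrives
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(10));
    }
}
```

With HttpClient the same situation surfaces as a `java.net.SocketTimeoutException` thrown from `execute`, so a caller that wants to retry or skip a slow site can catch that exception.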