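I finally have some free time, so I can look at how the excellent third-party frameworks I use every day are actually implemented, and learn from them. Take OkHttp: in Android development today, OkHttp + Retrofit has become the standard networking stack. I wrote some wrappers around it for our company's project. They haven't caused any problems so far, but I wouldn't claim they're well designed. I have never seriously looked at how this wheel was built, let alone how it runs, and that leaves me short on confidence. So while I have the time, I'm going to recharge.
Write-ups by others
Before reading the source myself, I searched for a lot of posts and read them one by one, and I got a lot out of them. I'd recommend the blog of one Android engineer whose analysis is particularly clear:
Taking Apart Wheels series: Taking Apart OkHttp
I also borrowed a diagram from him (yes, I'm lazy) to start untangling things thread by thread.
Below is a flowchart of the whole OkHttp request process. The way I read it, the diagram splits down the middle into two halves. The left half wraps the client, and the right half wraps the HTTP protocol. Put simply, the left side is a browser-like client. The right side is an object-oriented model of HTTP: requests, responses, URLs, methods like GET and POST, request headers, response headers, and so on.
(Don't you agree?)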

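Unraveling the HTTP protocol model in OkHttp
So many people online have analyzed OkHttp's request flow and walked through its source that I got sick of reading them, and I felt I had it all figured out. But reading is one thing, and something always felt missing. Over time at work I realized what it was. All those analyses are someone else's understanding poured into you, and how much you actually absorb is uncertain. Many posts also copy each other, so they all look alike. Read enough of them and you feel you know the material. Yet when a real problem shows up, you still can't solve it quickly. (Okay, enough rambling.)
So I decided to read the source myself, working through the authors' hard-won ideas bit by bit to build up my fundamentals. HTTP is basic, must-know knowledge for anyone making network requests. Yet in day-to-day work, many people only half understand it. So that is what I'll chew on first.
Let's go in order. Step one: URL
The first item in the top right of the diagram is URL. In network programming, "URL" is probably the term we use most. So what exactly is it? To answer that, we also need a related term, URI. One book puts it like this: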
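Compared with the URI (Uniform Resource Identifier), we are more familiar with the URL (Uniform Resource Locator). A URL is exactly the web page address you type in when visiting a page in a web browser. For example, http://baidu.com/ in the figure below is a URL.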
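URI is short for Uniform Resource Identifier. RFC 2396 defines each of these three words as follows.
Uniform
A uniform format makes it convenient to handle many different kinds of resources, without having to work out from context how a given resource should be accessed. It also makes it easier to add new protocol schemes (such as http: or ftp:).
Resource
A resource is defined as "anything that has identity." Anything that can be distinguished from other things can be a resource: a document, an image, or a service (for example, today's weather forecast). A resource need not be a single thing; it can also be a collection of many.
Identifier
An object that can be identified; also called an identifier.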
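In short, a URI is an identifier for a resource, expressed under some protocol scheme. The protocol scheme is the name of the protocol type used to access the resource.
When HTTP is used, the protocol scheme is http. Others include ftp, mailto, telnet, file, and so on. There are roughly 30 standard URI protocol schemes. They are managed and published by IANA (Internet Assigned Numbers Authority), which belongs to ICANN (Internet Corporation for Assigned Names and Numbers), the non-profit organization that manages Internet resources.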
IANA - Uniform Resource Identifier (URI) SCHEMES
http://www.iana.org/assignments/uri-schemes
A URI identifies an Internet resource with a string, while a URL indicates where the resource is (its location on the Internet). So a URL is a subset of URI.
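To make "a URL is a subset of URI" concrete, here is a small JDK-only check (no OkHttp involved). A URN such as urn:isbn:0451450523 names a book without saying where to fetch it. java.net.URI accepts it, but converting it to a java.net.URL fails because the JDK has no protocol handler for urn. The class and method names are my own, for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class UriVsUrl {
    // True if the JDK can turn this URI into a java.net.URL, i.e. it knows a
    // protocol handler that could actually locate the resource.
    static boolean isLocatable(String uriString) {
        URI uri = URI.create(uriString); // any syntactically valid URI is accepted here
        try {
            uri.toURL();
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol: urn"
        }
    }

    public static void main(String[] args) {
        URI urn = URI.create("urn:isbn:0451450523");
        System.out.println(urn.getScheme() + " opaque=" + urn.isOpaque()); // urn opaque=true
        System.out.println(isLocatable("urn:isbn:0451450523"));            // false
        System.out.println(isLocatable("http://baidu.com/"));              // true
    }
}
```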
So the URL plays a central role in the whole system, and OkHttp naturally has a class that models it: okhttp3.HttpUrl.
Rather than opening with code, I'll start with the class's Javadoc:
A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:
HttpUrl url = new HttpUrl.Builder()
.scheme("https")
.host("www.google.com")
.addPathSegment("search")
.addQueryParameter("q", "polar bears")
.build();
System.out.println(url);
which prints:
https://www.google.com/search?q=polar%20bears
As another example, this code prints the human-readable query parameters of a Twitter search:
HttpUrl url = HttpUrl.parse("https://twitter.com/search?q=cute%20%23puppies&f=images");
for (int i = 0, size = url.querySize(); i < size; i++) {
System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
}
which prints:
q: cute #puppies
f: images
In addition to composing URLs from their component parts and decomposing URLs into their component parts, this class implements relative URL resolution: what address you'd reach by clicking a relative link on a specified page. For example:
HttpUrl base = HttpUrl.parse("https://www.youtube.com/user/WatchTheDaily/videos");
HttpUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
System.out.println(link);
which prints:
https://www.youtube.com/watch?v=cbP2N1BQdYc
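For comparison, the JDK's own java.net.URI can also resolve relative references. Here is a quick check against the same YouTube example (plain JDK; the class name is mine):

```java
import java.net.URI;

final class RelativeResolve {
    // Resolves a relative reference against a base URL, the way a browser
    // resolves a relative link found on the page at `base`.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Expected: https://www.youtube.com/watch?v=cbP2N1BQdYc
        System.out.println(resolve(
                "https://www.youtube.com/user/WatchTheDaily/videos",
                "../../watch?v=cbP2N1BQdYc"));
    }
}
```

java.net.URI follows the older RFC 2396 resolution rules, so in edge cases its results may differ from what browsers (and HttpUrl) produce.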
What's in a URL?
A URL has several components.
Scheme
Sometimes referred to as protocol, A URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.
Username and Password
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components are popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.
Host
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.
Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.
Port
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
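java.net.URI is the counterexample here: it reports -1 when a URL has no explicit port. The "never -1" behavior described above can be sketched as a helper that falls back to the scheme's default. The helper names are hypothetical, and this is not OkHttp's code.

```java
import java.net.URI;

final class EffectivePort {
    // Default port for the two schemes HttpUrl supports; -1 for anything else.
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1;
    }

    // Never -1 for http/https: a missing port falls back to the scheme default.
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        return uri.getPort() != -1 ? uri.getPort() : defaultPort(uri.getScheme());
    }

    public static void main(String[] args) {
        System.out.println(URI.create("https://square.com/").getPort()); // -1 ("absent")
        System.out.println(effectivePort("https://square.com/"));        // 443
        System.out.println(effectivePort("http://localhost:8080/"));     // 8080
    }
}
```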
Path
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].
This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".
If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
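The segment rules above are easy to get wrong around trailing slashes. Here is a minimal sketch of composing and decomposing a path that follows the rules as described: a "/" before every segment, and trailing empty segments kept. The class name is hypothetical, and segments are assumed to be already encoded.

```java
import java.util.Arrays;
import java.util.List;

final class PathSegments {
    // Alternate "/" with each segment: ["a","b"] -> "/a/b", ["a","b",""] -> "/a/b/",
    // and the default [""] -> "/".
    static String compose(List<String> segments) {
        StringBuilder path = new StringBuilder();
        for (String segment : segments) {
            path.append('/').append(segment);
        }
        return path.toString();
    }

    // Drop the leading "/" and split on "/", keeping trailing empty segments
    // (limit -1), so "/" -> [""] and "/a/b/" -> ["a","b",""].
    static List<String> decompose(String path) {
        if (!path.startsWith("/")) throw new IllegalArgumentException("path must start with /");
        return Arrays.asList(path.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        System.out.println(compose(Arrays.asList("a", "b", "")));    // /a/b/
        System.out.println(decompose("/square/okhttp/issues/1486")); // [square, okhttp, issues, 1486]
    }
}
```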
Query
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as the single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.
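"Values are optional and names may be repeated" means a query cannot be modeled as a simple Map. A minimal sketch models it as an ordered list of pairs, with a null value when there is no '='. This is my own simplification. Unlike HttpUrl, it treats a null query and an empty query the same, and it decodes with java.net.URLDecoder (form rules, so '+' becomes a space).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class QueryPairs {
    // Splits a raw query into ordered name/value pairs. Names may repeat, and a
    // value is null when the pair has no '=' (e.g. "b" in "a=1&b&a=2").
    static List<Map.Entry<String, String>> parse(String rawQuery) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (rawQuery == null || rawQuery.isEmpty()) return pairs; // "no parameters" in this sketch
        for (String part : rawQuery.split("&", -1)) {
            int eq = part.indexOf('=');
            String name = decode(eq == -1 ? part : part.substring(0, eq));
            String value = eq == -1 ? null : decode(part.substring(eq + 1));
            pairs.add(new AbstractMap.SimpleEntry<>(name, value)); // SimpleEntry allows null values
        }
        return pairs;
    }

    // Form-style decoding: "%XX" escapes are read as UTF-8 bytes and '+' becomes ' '.
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parse("a=1&b&a=2"));                    // [a=1, b=null, a=2]
        System.out.println(parse("q=cute%20%23puppies&f=images")); // [q=cute #puppies, f=images]
    }
}
```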
Fragment
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it's private to the client.
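To see all of these components in one place, java.net.URI can decompose a full URL. The sketch also shows why the fragment stays on the client. requestTarget (a hypothetical helper) builds what goes on the HTTP request line, path plus query, and simply never looks at the fragment.

```java
import java.net.URI;

final class UrlComponents {
    // Path plus query: the part of a URL sent on the HTTP request line.
    // The fragment is deliberately left out; it never reaches the server.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath().isEmpty() ? "/" : uri.getRawPath();
        return uri.getRawQuery() == null ? path : path + "?" + uri.getRawQuery();
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://user:secret@www.google.com:8443/search?q=polar%20bears#results");
        System.out.println(uri.getScheme());   // https
        System.out.println(uri.getUserInfo()); // user:secret
        System.out.println(uri.getHost());     // www.google.com
        System.out.println(uri.getPort());     // 8443
        System.out.println(uri.getRawPath());  // /search
        System.out.println(uri.getRawQuery()); // q=polar%20bears
        System.out.println(uri.getFragment()); // results
        System.out.println(requestTarget(uri.toString())); // /search?q=polar%20bears
    }
}
```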
Encoding
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.
Percent encoding
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise it could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.
HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
.addPathSegment("_Who?_")
.query("_Who?_")
.fragment("_Who?_")
.build();
System.out.println(url);
This prints:
http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.
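The JDK has a partial equivalent of this per-component rule. Its multi-argument java.net.URI constructors (not the single-string one) quote each component using that component's own set of legal characters. Feeding them the same "_Who?_" inputs gives the same string as the HttpUrl example. The class name is mine.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ComponentQuoting {
    // Each component is quoted with its own legal character set: '?' must be
    // escaped inside the path (it would start the query) but not in query or fragment.
    static String build(String scheme, String authority, String path, String query, String fragment) {
        try {
            return new URI(scheme, authority, path, query, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Expected: http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
        System.out.println(build("http", "who-let-the-dogs.out", "/_Who?_", "_Who?_", "_Who?_"));
    }
}
```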
IDNA Mapping and Punycode encoding
Hostnames have different requirements and use a different encoding scheme. It consists of IDNA mapping and Punycode encoding.
In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.
Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
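The JDK ships a Punycode converter too, java.net.IDN, which makes the σ example easy to reproduce. One caveat: java.net.IDN implements IDNA2003 (RFC 3490), while the mapping described above follows the newer UTS #46 rules. The two agree on these inputs but can differ on others. The class name is mine.

```java
import java.net.IDN;

final class PunycodeDemo {
    public static void main(String[] args) {
        // U+03C3 is the Greek small letter sigma.
        System.out.println(IDN.toASCII("\u03C3"));               // xn--4xa
        System.out.println(IDN.toUnicode("xn--4xa"));           // the sigma again
        // Multi-label hostnames are converted label by label.
        System.out.println(IDN.toASCII("b\u00FCcher.example")); // xn--bcher-kva.example
    }
}
```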
Why another URL model?
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.
Different URLs should be different
Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:
http://square.github.io/
http://google.github.io/
This is because those two hosts share the same IP address. This is an old, bad design decision that makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may be equal because of how they are hosted.
Equal URLs should be equal
These two URLs are semantically identical, but java.net.URI disagrees:
http://host:80/
http://host
Both the unnecessary port specification (:80) and the absent trailing slash (/) cause URI to bucket the two URLs separately. This harms URI's usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
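Here is what "canonicalize manually" might look like for a java.net.URI user. This hypothetical helper lowercases the scheme and host, drops the scheme's default port, and turns an empty path into "/". It assumes an http/https URL with a host. It rebuilds the URI from decoded components, which the constructor re-quotes. That is fine for this sketch, but it can change meaning for inputs such as an encoded "%2F" inside a segment.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

final class CanonicalUri {
    static URI canonical(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        // An explicit default port means the same thing as no port at all.
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;
        }
        // "http://host" and "http://host/" name the same resource.
        String path = uri.getPath().isEmpty() ? "/" : uri.getPath();
        try {
            // Decoded components are re-quoted by this constructor (see caveat above).
            return new URI(scheme, uri.getUserInfo(), uri.getHost().toLowerCase(Locale.ROOT),
                    port, path, uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(URI.create("http://host:80/").equals(URI.create("http://host"))); // false
        System.out.println(canonical("http://host:80/").equals(canonical("http://host")));   // true
    }
}
```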
Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.
String attack = "http://example.com/static/images/../../../../../etc/passwd";
System.out.println(new URL(attack).getPath());
System.out.println(new URI(attack).getPath());
System.out.println(HttpUrl.parse(attack).encodedPath());
which prints:
/static/images/../../../../../etc/passwd
/static/images/../../../../../etc/passwd
/etc/passwd
By not canonicalizing the input paths, java.net.URL and java.net.URI are complicit in directory traversal attacks. Code that checks only the path prefix may suffer!
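Here is a sketch of the canonicalization that protects HttpUrl users in this example. It is my own simplified dot-segment removal, in the spirit of RFC 3986 section 5.2.4, for paths that start with '/'. It is not OkHttp's implementation. isAllowed is a hypothetical version of the /static/images/ prefix check from the Javadoc scenario, applied after normalization.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PathGuard {
    // "." is dropped; ".." pops the previous segment and can never climb above the root.
    static String removeDotSegments(String path) {
        if (!path.startsWith("/")) throw new IllegalArgumentException("path must start with /");
        Deque<String> out = new ArrayDeque<>();
        String[] segments = path.split("/", -1); // segments[0] is "" because of the leading '/'
        for (int i = 1; i < segments.length; i++) {
            String s = segments[i];
            boolean last = i == segments.length - 1;
            if (s.equals(".") || s.equals("..")) {
                if (s.equals("..") && !out.isEmpty()) out.removeLast();
                if (last) out.addLast(""); // "/a/b/.." -> "/a/" keeps the trailing slash
                continue;
            }
            out.addLast(s);
        }
        return "/" + String.join("/", out);
    }

    // Check the prefix only after normalizing, never on the raw path.
    static boolean isAllowed(String path) {
        return removeDotSegments(path).startsWith("/static/images/");
    }

    public static void main(String[] args) {
        String attack = "/static/images/../../../../../etc/passwd";
        System.out.println(removeDotSegments(attack)); // /etc/passwd
        System.out.println(isAllowed(attack));         // false
    }
}
```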
If it works on the web, it should work in your application
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.
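The difference is easy to observe. The strict single-string java.net.URI constructor rejects abc|def, but percent-encoding the offending characters first makes it parse. The sketch below is hypothetical. Its character set is a small illustrative subset of ASCII characters that browsers tolerate, not OkHttp's actual rules.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LenientParse {
    // Illustrative subset of characters browsers tolerate but RFC 2396 rejects.
    private static final String TOLERATED = " \"<>^`{}|";

    static boolean strictAccepts(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false; // e.g. "Illegal character in path"
        }
    }

    // Percent-encode the tolerated characters, then parse strictly.
    static URI lenientParse(String url) {
        StringBuilder fixed = new StringBuilder();
        for (char c : url.toCharArray()) {
            if (TOLERATED.indexOf(c) != -1) {
                fixed.append('%').append(String.format("%02X", (int) c));
            } else {
                fixed.append(c);
            }
        }
        return URI.create(fixed.toString());
    }

    public static void main(String[] args) {
        System.out.println(strictAccepts("http://example.com/abc|def")); // false
        System.out.println(lenientParse("http://example.com/abc|def"));  // http://example.com/abc%7Cdef
    }
}
```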
Paths and Queries should decompose
Neither of the built-in URL models offer direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, application developers are saved from the hassles of encoding and decoding.
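The '+' and '&' pitfalls are exactly what bites hand-built query strings. A minimal sketch with the JDK's java.net.URLEncoder encodes each name and value separately, so a '&', '=', '#' or '+' inside a value cannot break the structure. The helper is hypothetical. Note that URLEncoder uses form encoding (space becomes '+'), whereas HttpUrl printed the space in polar%20bears as %20.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryPitfalls {
    // Builds "name=value&..." from alternating names and values.
    static String buildQuery(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) throw new IllegalArgumentException("expected name/value pairs");
        StringBuilder query = new StringBuilder();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            if (query.length() > 0) query.append('&');
            query.append(URLEncoder.encode(namesAndValues[i], StandardCharsets.UTF_8))
                 .append('=')
                 .append(URLEncoder.encode(namesAndValues[i + 1], StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("q", "cute #puppies", "f", "images")); // q=cute+%23puppies&f=images
        System.out.println(buildQuery("expr", "1+1 & more"));                 // expr=1%2B1+%26+more
        // The pitfall in reverse: a literal '+' that was never encoded decodes to a space.
        System.out.println(URLDecoder.decode("1+1", StandardCharsets.UTF_8)); // 1 1
    }
}
```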
Plus a modern API
The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.
Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!
This class has a modern API. It avoids punitive checked exceptions: parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.
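The "parse() returns null instead of throwing" contract can be sketched on top of java.net.URI. The helper below is hypothetical. It returns null for syntax errors, for schemes other than http/https, and for host-less inputs like http:/. Because it inherits URI's strictness, it also rejects abc|def, which HttpUrl would accept.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ParseOrNull {
    static URI parse(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            if (scheme == null) return null; // relative reference, not a full URL
            if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) return null;
            if (uri.getHost() == null) return null; // e.g. "http:/": scheme and path, no host
            return uri;
        } catch (URISyntaxException e) {
            return null; // no checked exception leaks to the caller
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("https://example.com/a")); // https://example.com/a
        System.out.println(parse("ftp://example.com/"));    // null
        System.out.println(parse("http:/"));                // null
    }
}
```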
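That's a lot of comments! (I'm not cutting corners; I pasted them precisely because they are so thorough.) Once you have read them, there is little left for me to explain.
To summarize the Javadoc:
- HttpUrl builds URLs through a chained Builder, which keeps the resulting URL string safe and well-formed.
- It models each element of an HTTP URL: scheme, username and password, host, port, path, query, and fragment.
- To keep the URL string well-formed, it also provides percent encoding, IDNA mapping, and Punycode encoding. (Which reminds me: when I used Picasso to load images, requests whose paths contained Chinese characters failed. Picasso was using OkHttp underneath, so in theory that shouldn't happen. Something to look into later.)
- Java's own java.net.URL and java.net.URI have a number of problems (described above), and HttpUrl is designed to fix them.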
Constructor
HttpUrl(Builder builder) {
this.scheme = builder.scheme;
this.username = percentDecode(builder.encodedUsername, false);
this.password = percentDecode(builder.encodedPassword, false);
this.host = builder.host;
this.port = builder.effectivePort();
this.pathSegments = percentDecode(builder.encodedPathSegments, false);
this.queryNamesAndValues = builder.encodedQueryNamesAndValues != null
? percentDecode(builder.encodedQueryNamesAndValues, true)
: null;
this.fragment = builder.encodedFragment != null
? percentDecode(builder.encodedFragment, false)
: null;
this.url = builder.toString();
}
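For now, we can ignore the rest of HttpUrl. Skim its constructor and what it offers, so you remember it exists when a project needs it. The constructor shows that the Builder supplies the required pieces such as scheme, host and port. The encoded username, password, path segments, query names and values (queryNamesAndValues) and fragment are percent-decoded as they are copied in.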
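In the constructor, percentDecode(..., true) is used only for the query names and values, and false everywhere else. Here is a minimal JDK-only sketch of such a decoder. It is my own version, not OkHttp's percentDecode. It turns "%XX" escapes into bytes, reads them as UTF-8, maps '+' to a space only when plusIsSpace is true, and leaves malformed escapes untouched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class PercentDecode {
    static String percentDecode(String encoded, boolean plusIsSpace) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            int codePoint = encoded.codePointAt(i);
            // A well-formed escape: '%' followed by two hex digits.
            if (codePoint == '%' && i + 2 < encoded.length()
                    && hexValue(encoded.charAt(i + 1)) != -1
                    && hexValue(encoded.charAt(i + 2)) != -1) {
                bytes.write(hexValue(encoded.charAt(i + 1)) * 16 + hexValue(encoded.charAt(i + 2)));
                i += 3;
                continue;
            }
            if (codePoint == '+' && plusIsSpace) {
                bytes.write(' '); // query components: '+' stands for a space
            } else {
                // Copy the character through as its UTF-8 bytes ("%zz" and a lone "%" land here).
                byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
                bytes.write(utf8, 0, utf8.length);
            }
            i += Character.charCount(codePoint);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // ASCII-only hex digit value, or -1.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```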
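A few of this class's methods seem especially useful:
parse(java.lang.String url)
get(java.net.URI uri)
get(java.net.URL url)
getChecked(java.lang.String url)
These return a canonical HttpUrl object that has already been encoded and validated (parse() returns null rather than throwing if the input is not a valid URL).
isHttps()
Returns true if the scheme is https.
newBuilder()
Returns a new Builder initialized with this URL's components.
In addition, toString() returns the canonical URL string for this object.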
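How HttpUrl keeps URLs valid
As we saw above, OkHttp's URL model has many advantages over Java's own URL. So how does it achieve them?
Let's start with the Builder.
Every setter on the Builder null-checks its argument. For example: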
public Builder username(String username) {
if (username == null) throw new NullPointerException("username == null");
this.encodedUsername = canonicalize(username, USERNAME_ENCODE_SET, false, false, false, true);
return this;
}
public Builder encodedUsername(String encodedUsername) {
if (encodedUsername == null) throw new NullPointerException("encodedUsername == null");
this.encodedUsername = canonicalize(
encodedUsername, USERNAME_ENCODE_SET, true, false, false, true);
return this;
}
Next, it calls canonicalize(). "Canonicalize" means to bring something into a standard form; here it encodes the string so that it conforms to URL rules. Let's see what the method does.
/**
* Returns a substring of {@code input} on the range {@code [pos..limit)} with the following
* transformations:
* <ul>
* <li>Tabs, newlines, form feeds and carriage returns are skipped.
* <li>In queries, ' ' is encoded to '+' and '+' is encoded to "%2B".
* <li>Characters in {@code encodeSet} are percent-encoded.
* <li>Control characters and non-ASCII characters are percent-encoded.
* <li>All other characters are copied without transformation.
* </ul>
*
* @param alreadyEncoded true to leave '%' as-is; false to convert it to '%25'.
* @param strict true to encode '%' if it is not the prefix of a valid percent encoding.
* @param plusIsSpace true to encode '+' as "%2B" if it is not already encoded.
* @param asciiOnly true to encode all non-ASCII codepoints.
* @param charset which charset to use, null equals UTF-8.
*/
static String canonicalize(String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))
|| codePoint == '+' && plusIsSpace) {
// Slow path: the character at i requires encoding!
Buffer out = new Buffer();
out.writeUtf8(input, pos, i);
canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
asciiOnly, charset);
return out.readUtf8();
}
}
// Fast path: no characters in [pos..limit) required encoding.
return input.substring(pos, limit);
}
static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
Buffer encodedCharBuffer = null; // Lazily allocated.
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (alreadyEncoded
&& (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
// Skip this character.
} else if (codePoint == '+' && plusIsSpace) {
// Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
out.writeUtf8(alreadyEncoded ? "+" : "%2B");
} else if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
// Percent encode this character.
if (encodedCharBuffer == null) {
encodedCharBuffer = new Buffer();
}
if (charset == null || charset.equals(Util.UTF_8)) {
encodedCharBuffer.writeUtf8CodePoint(codePoint);
} else {
encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
}
while (!encodedCharBuffer.exhausted()) {
int b = encodedCharBuffer.readByte() & 0xff;
out.writeByte('%');
out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
out.writeByte(HEX_DIGITS[b & 0xf]);
}
} else {
// This character doesn't need encoding. Just copy it over.
out.writeUtf8CodePoint(codePoint);
}
}
}
static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
boolean plusIsSpace, boolean asciiOnly, Charset charset) {
return canonicalize(
input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly,
charset);
}
static String canonicalize(String input, String encodeSet, boolean alreadyEncoded, boolean strict,
boolean plusIsSpace, boolean asciiOnly) {
return canonicalize(
input, 0, input.length(), encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly, null);
}
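Once again, the Javadoc tells us what this method does: it URL-encodes the input string and returns the result. The rules are:
- Tabs, newlines, form feeds, and carriage returns are skipped rather than encoded (in the code, this only happens when alreadyEncoded is true).
- Per the Javadoc, in queries a space ' ' is encoded as '+' and a plus sign '+' as %2B (in the code shown, '+' becomes %2B when plusIsSpace is true and the input is not already encoded).
- Characters in encodeSet and control characters are percent-encoded; asciiOnly additionally forces all non-ASCII characters to be percent-encoded.
- All other characters are copied unchanged.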
Let's start with the first method:
static String canonicalize(String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
int codePoint;
// Examine the input one code point at a time, from pos up to (but not including) limit.
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (codePoint < 0x20 // control characters below the space (0x20), such as tab, newline and carriage return: invisible and meaningless in a URL
|| codePoint == 0x7f // DEL (delete)
|| codePoint >= 0x80 && asciiOnly // outside the ASCII range while the caller asked for ASCII only
|| encodeSet.indexOf(codePoint) != -1 // listed in encodeSet, so it must be encoded
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit)) // '%': decided by alreadyEncoded and strict
|| codePoint == '+' && plusIsSpace) { // '+': decided by plusIsSpace
// Slow path: the character at i requires encoding!
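// An optimization worth learning from: once the first character that needs encoding is found at i, the prefix [pos, i) is copied into the buffer unchanged and encoding resumes from i, so the characters before i are never re-scanned from the start.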
Buffer out = new Buffer();
out.writeUtf8(input, pos, i);
canonicalize(out, input, i, limit, encodeSet, alreadyEncoded, strict, plusIsSpace,
asciiOnly, charset); // explained below
return out.readUtf8();
}
}
// Fast path: no characters in [pos..limit) required encoding.
// Nothing needed encoding, so return the substring as-is.
return input.substring(pos, limit);
}
The scan above finds where encoding has to start; the following method does the actual encoding.
static void canonicalize(Buffer out, String input, int pos, int limit, String encodeSet,
boolean alreadyEncoded, boolean strict, boolean plusIsSpace, boolean asciiOnly,
Charset charset) {
Buffer encodedCharBuffer = null; // Lazily allocated: only created once a character actually needs percent-encoding.
int codePoint;
for (int i = pos; i < limit; i += Character.charCount(codePoint)) {
codePoint = input.codePointAt(i);
if (alreadyEncoded
&& (codePoint == '\t' || codePoint == '\n' || codePoint == '\f' || codePoint == '\r')) {
// Skip this character: tabs, newlines, form feeds and carriage returns are dropped.
} else if (codePoint == '+' && plusIsSpace) {
// Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
// '+' becomes %2B; but if the input is already encoded, a '+' may stand for an encoded space, so it is written back unchanged.
out.writeUtf8(alreadyEncoded ? "+" : "%2B");
} else if (codePoint < 0x20
|| codePoint == 0x7f
|| codePoint >= 0x80 && asciiOnly
|| encodeSet.indexOf(codePoint) != -1
|| codePoint == '%' && (!alreadyEncoded || strict && !percentEncoded(input, i, limit))) {
// Same checks as in the first method: percent-encode control characters, DEL, non-ASCII (when asciiOnly), encodeSet members and stray '%'.
// Percent encode this character.
if (encodedCharBuffer == null) {
encodedCharBuffer = new Buffer();
}
if (charset == null || charset.equals(Util.UTF_8)) {
encodedCharBuffer.writeUtf8CodePoint(codePoint);
} else {
encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset);
}
// Percent encoding: write each byte as '%' followed by two hex digits.
while (!encodedCharBuffer.exhausted()) {
int b = encodedCharBuffer.readByte() & 0xff;
out.writeByte('%');
out.writeByte(HEX_DIGITS[(b >> 4) & 0xf]);
out.writeByte(HEX_DIGITS[b & 0xf]);
}
} else {
// This character doesn't need encoding. Just copy it over.
out.writeUtf8CodePoint(codePoint);
}
}
}
That is roughly the core of the whole percent-encoding algorithm.
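To check my understanding, I re-sketched the two canonicalize methods with only the JDK. The sketch keeps the same fast-path scan and slow-path encoding, with a few differences from OkHttp: StringBuilder replaces okio's Buffer, only UTF-8 is supported (the charset parameter is dropped), and pos/limit always cover the whole string. The class name and the encode sets in the tests are my own. They are not OkHttp constants such as USERNAME_ENCODE_SET.

```java
import java.nio.charset.StandardCharsets;

final class CanonicalizeSketch {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    static String canonicalize(String input, String encodeSet, boolean alreadyEncoded,
                               boolean strict, boolean plusIsSpace, boolean asciiOnly) {
        int codePoint;
        for (int i = 0; i < input.length(); i += Character.charCount(codePoint)) {
            codePoint = input.codePointAt(i);
            if (mustEncode(input, i, codePoint, encodeSet, alreadyEncoded, strict, asciiOnly)
                    || (codePoint == '+' && plusIsSpace)) {
                // Slow path: copy the clean prefix once, then encode from i onward.
                StringBuilder out = new StringBuilder(input.length() + 16);
                out.append(input, 0, i);
                encodeFrom(out, input, i, encodeSet, alreadyEncoded, strict, plusIsSpace, asciiOnly);
                return out.toString();
            }
        }
        return input; // Fast path: nothing needed encoding.
    }

    private static void encodeFrom(StringBuilder out, String input, int from, String encodeSet,
                                   boolean alreadyEncoded, boolean strict, boolean plusIsSpace,
                                   boolean asciiOnly) {
        int codePoint;
        for (int i = from; i < input.length(); i += Character.charCount(codePoint)) {
            codePoint = input.codePointAt(i);
            if (alreadyEncoded && (codePoint == '\t' || codePoint == '\n'
                    || codePoint == '\f' || codePoint == '\r')) {
                // Dropped, as in OkHttp.
            } else if (codePoint == '+' && plusIsSpace) {
                out.append(alreadyEncoded ? "+" : "%2B");
            } else if (mustEncode(input, i, codePoint, encodeSet, alreadyEncoded, strict, asciiOnly)) {
                byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) { // each UTF-8 byte becomes %XX
                    out.append('%').append(HEX_DIGITS[(b >> 4) & 0xf]).append(HEX_DIGITS[b & 0xf]);
                }
            } else {
                out.appendCodePoint(codePoint);
            }
        }
    }

    // The shared "does this code point need percent-encoding?" test.
    private static boolean mustEncode(String input, int i, int codePoint, String encodeSet,
                                      boolean alreadyEncoded, boolean strict, boolean asciiOnly) {
        return codePoint < 0x20
                || codePoint == 0x7f
                || (codePoint >= 0x80 && asciiOnly)
                || encodeSet.indexOf(codePoint) != -1
                || (codePoint == '%' && (!alreadyEncoded || (strict && !percentEncoded(input, i))));
    }

    // True if input[i..i+2] is '%' followed by two ASCII hex digits.
    private static boolean percentEncoded(String input, int i) {
        return i + 2 < input.length() && input.charAt(i) == '%'
                && isHex(input.charAt(i + 1)) && isHex(input.charAt(i + 2));
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```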
Beyond this, HttpUrl also provides a few methods that percent-encode directly, as shown in the figure:

There are also some validity checks, such as verifying that the scheme is http or https.
Finally, build() assembles all of this into a well-formed HttpUrl object.
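To tie the pieces together, here is a hypothetical, heavily simplified stand-in for HttpUrl. It is immutable, and it is built only through a Builder. The setters null-check their arguments (like username() above), scheme() only accepts http/https, and build() refuses incomplete input. MiniUrl is my own name, and it skips canonicalization entirely; the real Builder canonicalizes each component as it is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

final class MiniUrl {
    final String scheme;
    final String host;
    final int port;                  // never -1: falls back to the scheme default
    final List<String> pathSegments; // never empty: defaults to [""], i.e. "/"

    private MiniUrl(Builder b) {
        this.scheme = b.scheme;
        this.host = b.host;
        this.port = b.port != -1 ? b.port : defaultPort(scheme);
        this.pathSegments = List.copyOf(b.pathSegments.isEmpty() ? List.of("") : b.pathSegments);
    }

    private static int defaultPort(String scheme) {
        return scheme.equals("https") ? 443 : 80;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port != defaultPort(scheme)) sb.append(':').append(port);
        for (String segment : pathSegments) sb.append('/').append(segment);
        return sb.toString();
    }

    static final class Builder {
        private String scheme;
        private String host;
        private int port = -1;
        private final List<String> pathSegments = new ArrayList<>();

        Builder scheme(String scheme) {
            Objects.requireNonNull(scheme, "scheme == null");
            if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
                throw new IllegalArgumentException("unexpected scheme: " + scheme);
            }
            this.scheme = scheme.toLowerCase(Locale.ROOT);
            return this;
        }

        Builder host(String host) {
            this.host = Objects.requireNonNull(host, "host == null");
            return this;
        }

        Builder port(int port) {
            if (port <= 0 || port > 65535) throw new IllegalArgumentException("unexpected port: " + port);
            this.port = port;
            return this;
        }

        Builder addPathSegment(String segment) {
            pathSegments.add(Objects.requireNonNull(segment, "segment == null")); // no encoding in this sketch
            return this;
        }

        MiniUrl build() {
            if (scheme == null) throw new IllegalStateException("scheme == null");
            if (host == null) throw new IllegalStateException("host == null");
            return new MiniUrl(this);
        }
    }
}
```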
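That's about it. These are my study notes; corrections and suggestions are welcome.
The next part will cover the HTTP method.
Next post: Unraveling okhttp3 (Part 2) http://www.itdecent.cn/p/77f71946ef44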