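HTTP is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and
server establish a connection, how the client requests data from the server, how the server responds to that request,
and finally, how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer. For each request
from client to server, there is a sequence of four steps: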
Making the connection
The client establishes a TCP connection to the server, on port 80 by default; a different port can be specified in the URL.
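The port rule above can be sketched in a few lines of Java. This is only an illustration: the class name HttpPort and the method effectivePort are hypothetical, and port 8080 is an arbitrary example value. It relies on java.net.URI.getPort(), which returns -1 when the URL names no port.

```java
import java.net.URI;

final class HttpPort {
    // Default HTTP port, as stated in the text.
    static final int DEFAULT_HTTP_PORT = 80;

    // Returns the port named in the URL, or 80 when the URL names none.
    static int effectivePort(String url) {
        int port = URI.create(url).getPort(); // -1 means "no port in the URL"
        return port == -1 ? DEFAULT_HTTP_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://www.cafeaulait.org/index.html"));
        System.out.println(effectivePort("http://www.cafeaulait.org:8080/index.html"));
    }
}
```

The resulting number is what a client would pass, together with the host name, when opening its TCP socket.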
Making a request
The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:
GET /index.html HTTP/1.0
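GET is a keyword specifying the operation requested. The operation requested here is for the server to return a resource.
/index.html is a relative URL that identifies the resource requested from the server. This resource is assumed to reside
on the machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is
the version of the protocol that the client understands. The request is terminated with two carriage return/linefeed
pairs (\r\n\r\n in Java parlance), regardless of how lines are terminated on the client or server platform.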
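To make the three parts of the request line concrete, here is a minimal Java sketch that splits a request line into its operation, resource, and protocol version, and writes it back out with the \r\n\r\n terminator. The class RequestLine and its members are hypothetical names chosen for illustration; real servers are more tolerant of malformed input than this.

```java
final class RequestLine {
    final String method;   // e.g. GET
    final String target;   // e.g. /index.html
    final String version;  // e.g. HTTP/1.0

    RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Splits "GET /index.html HTTP/1.0" on single spaces into its three parts.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    // A header-less request: the request line followed by a blank line.
    // \r\n is written explicitly so the result never depends on the platform.
    String toWireFormat() {
        return method + " " + target + " " + version + "\r\n\r\n";
    }
}
```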
Although the GET line is all that is required, a client request can include other information as well. This takes the following form:
Keyword: Value
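The most common such keyword is Accept, which tells the server what kinds of data the client can handle. For example, the
following line says that the client can handle four MIME media types, corresponding to HTML documents, plain text, and
JPEG and GIF images: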
Accept: text/html, text/plain, image/gif, image/jpeg
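User-Agent is another common keyword. It lets the server know what browser is sending the request, so the server can send
files optimized for that particular browser type. The following line says that the request comes from version 2.4 of the Lynx browser: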
User-Agent: Lynx/2.4 libwww/2.1.4
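All but the oldest first-generation browsers also include a Host field specifying the server's name, which allows web
servers to distinguish between differently named hosts served from the same IP address. For example: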
Host: www.cafeaulait.org
Finally, the request is terminated with a blank line, that is, two carriage return/linefeed pairs, \r\n\r\n. A complete request might look like:
GET /index.html HTTP/1.0
Accept: text/html, text/plain, image/gif, image/jpeg
User-Agent: Lynx/2.4 libwww/2.1.4
Host: www.cafeaulait.org
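As a rough sketch, the complete request above could be assembled in Java as follows. The class HttpRequestBuilder and its methods are hypothetical names introduced for illustration only; the header names and values are the ones from the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HttpRequestBuilder {
    private static final String CRLF = "\r\n";

    // Builds an HTTP/1.0 request: request line, one "Keyword: Value" line
    // per header, then a blank line that ends the request.
    static String build(String method, String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path).append(" HTTP/1.0").append(CRLF);
        for (Map.Entry<String, String> header : headers.entrySet()) {
            sb.append(header.getKey()).append(": ").append(header.getValue()).append(CRLF);
        }
        sb.append(CRLF); // the terminating blank line
        return sb.toString();
    }

    // Reproduces the sample request from the text.
    static String example() {
        Map<String, String> headers = new LinkedHashMap<>(); // keeps insertion order
        headers.put("Accept", "text/html, text/plain, image/gif, image/jpeg");
        headers.put("User-Agent", "Lynx/2.4 libwww/2.1.4");
        headers.put("Host", "www.cafeaulait.org");
        return build("GET", "/index.html", headers);
    }
}
```

A client would then write the bytes of this string to the output stream of its socket.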
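In addition to GET, there are several other request types. HEAD retrieves only the header for the file, not the actual data.
This is commonly used to check the modification date of a file, to see whether a copy stored in the local cache is still valid.
POST sends form data to the server, PUT uploads a resource to the server, and DELETE removes a resource from the server.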
The response
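The server sends a response to the client. The response begins with a status line containing a response code, followed by
a header full of metadata, a blank line, and the requested document or an error message. Assuming the requested document
is found, a typical response looks like this: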
HTTP/1.1 200 OK
Date: Mon, 15 Sep 2003 21:06:50 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Tue, 15 Apr 2003 17:28:57 GMT
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-length: 107
The rest of the document goes here
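The first line indicates the protocol the server is using (HTTP/1.1), followed by a response code. 200 OK is the most
common response code, indicating that the request was successful. Table 3-1 is a complete list of the response codes used
by HTTP 1.0; HTTP 1.1 adds many more to this list. The other header lines identify the date the request was made in the
server's time frame, the server software (Apache 2.0.40), the date the document was last modified, a promise that the
server will close the connection when it has finished sending, the MIME content type, and the length of the document
delivered (not counting this header), in this case 107 bytes.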
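The layout just described (status line, header lines, blank line, document) can be taken apart with a small Java sketch. The class HttpResponseSketch is a hypothetical name, and the code assumes the whole response is already in memory as one string with \r\n line endings; a real client would read from a socket stream and honor Content-length. Header names are stored in lowercase because HTTP header names are case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HttpResponseSketch {
    final String version;    // e.g. HTTP/1.1
    final int statusCode;    // e.g. 200
    final String reason;     // e.g. OK
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpResponseSketch(String raw) {
        // The first blank line separates the header from the document.
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("No blank line after the header");
        }
        String[] lines = raw.substring(0, split).split("\r\n");

        // Status line: protocol version, response code, optional reason text.
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + lines[0]);
        }
        version = status[0];
        statusCode = Integer.parseInt(status[1]);
        reason = status.length == 3 ? status[2] : "";

        // Remaining lines have the form "Keyword: Value".
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue; // ignore lines that are not header fields
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
        body = raw.substring(split + 4); // skip the four bytes of \r\n\r\n
    }
}
```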
Closing the connection
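Either the client or the server or both close the connection. Thus, a separate network connection is used for each request.
If the client reconnects, the server retains no memory of the previous connection or its results. A protocol that retains
no memory of past requests is called stateless; in contrast, a stateful protocol such as FTP can process many requests
before the connection is closed. The lack of state is both a strength and a weakness of HTTP.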
Table 3-1. HTTP 1.0 response codes
Response code
Meaning
2xx Successful
Response codes between 200 and 299 indicate that the request was received, understood, and accepted.
200 OK
This is the most common response code. If the request used GET or POST, the requested data is contained in the response along with the usual headers. If the request used HEAD, only the header information is included.
201 Created
The server has created a data file at a URL specified in the body of the response. The web browser should now attempt to load that URL. This is sent only in response to POST requests.
202 Accepted
This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete so no response can be returned. The server should return an HTML page that explains the situation to the user, provides an estimate of when the request is likely to be completed, and, ideally, has a link to a status monitor of some kind.
204 No Content
The server has successfully processed the request but has no information to send back to the client. This is usually the result of a poorly written form-processing program that accepts data but does not return a response to the user indicating that it has finished.
3xx Redirection
Response codes from 300 to 399 indicate that the web browser needs to go to a different page.
300 Multiple Choices
The page requested is available from one or more locations. The body of the response includes a list of locations from which the user or web browser can pick the most appropriate one. If the server prefers one of these locations, the URL of this choice is included in a Location header, which web browsers can use to load the preferred page.
301 Moved Permanently
The page has moved to a new URL. The web browser should automatically load the page at this URL and update any bookmarks that point to the old URL.
302 Moved Temporarily
This unusual response code indicates that a page is temporarily at a new URL but that the document's location will change again in the foreseeable future, so bookmarks should not be updated.
304 Not Modified
The client has performed a GET request but used the If-Modified-Since header to indicate that it wants the document only if it has been recently updated. This status code is returned because the document has not been updated. The web browser will now load the page from a cache.
4xx Client Error
Response codes from 400 to 499 indicate that the client has erred in some fashion, although the error may as easily be the result of an unreliable network connection as of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a HEAD request, the server should explain the error status in the body of its response.
400 Bad Request
The client request to the server used improper syntax. This is rather unusual, although it is likely to happen if you're writing and debugging a client.
401 Unauthorized
Authorization, generally username and password controlled, is required to access this page. Either the username and password have not yet been presented or the username and password are invalid.
403 Forbidden
The server understood the request but is deliberately refusing to process it. Authorization will not help. One reason this occurs is that the client asks for a directory listing but the server is not configured to provide it, as shown in Figure 3-1.
404 Not Found
This most common error response indicates that the server cannot find the requested page. It may indicate a bad link, a page that has moved with no forwarding address, a mistyped URL, or something similar.
5xx Server Error
Response codes from 500 to 599 indicate that something has gone wrong with the server, and the server cannot fix the problem.
500 Internal Server Error
An unexpected condition occurred that the server does not know how to handle.
501 Not Implemented
The server does not have the feature that is needed to fulfill this request. A server that cannot handle POST requests might send this response to a client that tried to POST form data to it.
502 Bad Gateway
This response is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request.
503 Service Unavailable
The server is temporarily unable to handle the request, perhaps as a result of overloading or maintenance.
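HTTP 1.1 more than doubles the number of responses. However, a response code from 200 to 299 always indicates success,
a response code from 300 to 399 always indicates redirection, one from 400 to 499 always indicates a client error, and
one from 500 to 599 always indicates a server error.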
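Because the ranges are stable across protocol versions, a client can classify a response code it has never seen before. A minimal Java sketch, where the class name StatusClass, the method classify, and the fallback label "Other" are hypothetical choices for this illustration:

```java
final class StatusClass {
    // Maps a response code to the class names used in Table 3-1.
    // Codes outside the four ranges in the table are reported as "Other".
    static String classify(int code) {
        if (code >= 200 && code <= 299) return "Successful";
        if (code >= 300 && code <= 399) return "Redirection";
        if (code >= 400 && code <= 499) return "Client Error";
        if (code >= 500 && code <= 599) return "Server Error";
        return "Other";
    }

    public static void main(String[] args) {
        System.out.println(classify(200));
        System.out.println(classify(404));
    }
}
```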
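HTTP 1.0 is documented in RFC 1945; it is not an official Internet standard because it was originally developed outside
the IETF by browser and server vendors. HTTP 1.1 is a proposed standard developed by the W3C and the HTTP working group of
the IETF. It provides more flexible and more powerful communication between client and server, and it is also more
extensible. It is documented in RFC 2616.
HTTP 1.0 is the basic version of the protocol. All current web servers and browsers support it. HTTP 1.1 adds numerous
features to HTTP 1.0 but does not significantly change the underlying design or architecture.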
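The primary improvement in HTTP 1.1 is connection reuse. HTTP 1.0 opens a new connection for every request. In practice,
the time taken to open and close all the connections in a web session can outweigh the time taken to transmit the data,
especially for sessions with many small documents. HTTP 1.1 allows a browser to send many different requests over a single
connection; the connection remains open until it is explicitly closed. Requests and responses are asynchronous: a browser
does not need to wait for the response to its first request before sending a second or a third. However, the protocol
still follows the pattern of one server response for each client request. Each request and response has the same basic
form as before.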
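One way to picture the difference in connection handling is the decision a client makes after reading a response: keep the socket open or close it. The Java sketch below is an illustration, not part of the original text. It assumes the usual convention that HTTP/1.1 connections stay open unless a Connection: close header is sent (as in the sample response earlier), and that HTTP/1.0 connections close unless the peer sent the widely used but unofficial Connection: keep-alive extension. The class and method names are hypothetical, and a Connection header carrying several tokens is not handled.

```java
import java.util.Locale;

final class ConnectionPolicy {
    // Decides whether the connection may be reused for another request.
    // version is the protocol version from the status line, e.g. "HTTP/1.1";
    // connectionHeader is the value of the Connection header, or null if absent.
    static boolean keepOpen(String version, String connectionHeader) {
        String token = connectionHeader == null
                ? ""
                : connectionHeader.trim().toLowerCase(Locale.ROOT);
        if ("HTTP/1.1".equals(version)) {
            // Assumed default for HTTP 1.1: reuse unless told to close.
            return !token.equals("close");
        }
        // Assumed default for HTTP 1.0: close unless keep-alive was requested.
        return token.equals("keep-alive");
    }
}
```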
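HTTP 1.1 has many other improvements as well. Requests include a Host header field so that one web server can serve
different sites at different URLs. Servers and browsers can exchange compressed files and particular byte ranges of a
document, both of which reduce network traffic. HTTP 1.1 is designed to work better with proxy servers. HTTP 1.1 is a
superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with older browsers that speak only HTTP 1.0,
and vice versa.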