What is Hashcash?
Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin (and other cryptocurrencies) as part of the mining algorithm. The original idea was first proposed by Cynthia Dwork and Moni Naor and Eli Ponyatovski in their 1992 paper "Pricing via Processing or Combatting Junk Mail". Later a similar proposal called Hashcash was proposed in 1997 by Adam Back.
哈?,F(xiàn)金是一個工作量證明系統(tǒng)。這個系統(tǒng)用來限制垃圾郵件(email spam)和拒絕服務(wù)攻擊(denial-of-service attacks)。最近由于被用于比特幣以及其他加密數(shù)字貨幣中作為挖礦算法的一部分而備受矚目。這個想法最初是1992年由 Cynthia Dwork、Moni Naor 和 Eli Ponyatovski 在他們的論文《通過處理和對抗垃圾郵件來定價》中提出(propose)。隨后在1997年,一個相似的名為“哈?,F(xiàn)金”的概念被Adam Back提出。
How it works
Hashcash is a proof-of-work algorithm that requires a selectable amount of work to compute, but the proof can be verified efficiently. For email uses, a textual encoding of a hashcash stamp is added to the header of an email to prove the sender has expended a modest amount of CPU time calculating the stamp prior to sending the email. In other words, as the sender has taken a certain amount of time to generate the stamp and send the email, it is unlikely that they are a spammer. The receiver can, at negligible computational cost, verify that the stamp is valid. However, the only known way to find a header with the necessary properties is brute force, trying random values until the answer is found; though testing an individual string is easy, if satisfactory answers are rare enough it will require a substantial number of tries to find the answer.
哈?,F(xiàn)金是一個工作量證明算法。這個算法需要定量的計算工作,但是證據(jù)的驗證過程非常高效。為了用于電子郵件中,需要把哈?,F(xiàn)金戳的文本編碼添加到郵件的頭部。這樣就能證明發(fā)送方在發(fā)送郵件前花費了適當(dāng)?shù)?a modest amount of)CPU時鐘來計算哈?,F(xiàn)金戳。換句話說(In other words),由于發(fā)送方花費了一定時間去生成哈?,F(xiàn)金戳并發(fā)送了郵件,這說明他不太可能是一個垃圾郵件發(fā)送者。接收方可以憑借微不足道的(negligible)計算開銷驗證受到的哈?,F(xiàn)金戳是否有效。但是,已知的尋找一個包含必備屬性的頭部的唯一方法就是暴力求解(brute force)。嘗試不同的隨機數(shù)直至找到合適的答案。雖然驗證一個字符串是容易的,但是合格的(satisfactory)答案是非常稀少的,需要進(jìn)行大量的(substantial number of)嘗試才能找到。
The hypothesis is that spammers, whose business model relies on their ability to send large numbers of emails with very little cost per message, will cease to be profitable if there is even a small cost for each spam they send. Receivers can verify whether a sender made such an investment and use the results to help filter email.
這個假設(shè)是基于這樣的事實:垃圾郵件發(fā)送者為了盈利必須發(fā)送大量的電子郵件。每封郵件消耗少量的資源,但是一旦加在一起就會是個不小的數(shù)字,使得他們無法借此盈利(cease to be profitable)。接收方可以驗證郵件的發(fā)送方是否付出了這種小型“投資”,借此過濾垃圾郵件。
Technical details
The header line looks something like this:
頭部格式如下:
X-Hashcash: 1:20:1303030600:adam@cypherspace.org::McMybZIhxKXu57jd:ckvi
The header contains:
頭部包含如下部分:
ver: Hashcash format version, 1 (which supersedes version 0).
ver: 哈?,F(xiàn)金格式版本, 填1(替換了版本0)
bits: Number of "partial pre-image" (zero) bits in the hashed code.
bits: 哈希值中前象的位數(shù)(即哈希值前幾位為全0)
date: The time that the message was sent, in the format YYMMDD[hhmm[ss]].
date: 郵件發(fā)送時間,格式為 YYMMDD[hhmm[ss]]。
resource: Resource data string being transmitted, e.g., an IP address or email address.
resource: 資源數(shù)據(jù)串。例如:IP地址或郵箱地址。
ext: Extension (optional; ignored in version 1).
ext: 擴展名(可選填,如果是版本1則不需填寫)
rand: String of random characters, encoded in base-64 format.
rand: 隨機數(shù),以base-64編碼。
counter: Binary counter (up to 220 ), encoded in base-64 format.
counter: 計數(shù)器(最大為220),以base-64編碼。
The header contains the recipient's email address, the date of the message, and information proving that the required computation has been performed. The presence of the recipient's email address requires that a different header be computed for each recipient. The date allows the recipient to record headers received recently and to ensure that the header is unique to the email message.
頭部包含接收方的郵箱地址,郵件發(fā)送時間和用于證明發(fā)送方已經(jīng)執(zhí)行相關(guān)計算的信息。接收方郵箱地址的存在使得不同的接收方可以計算產(chǎn)生不同的頭部。日期使得接收方可以記錄下近期受到的頭部并保證這個頭部是當(dāng)前的郵件專屬的(is unique to the email message)。
Sender's side
The sender prepares a header and appends a counter value initialized to a random number. It then computes the 160-bit SHA-1 hash of the header. If the first 20 bits (i.e. the 5 most significant hex digits) of the hash are all zeros, then this is an acceptable header. If not, then the sender increments the counter and tries the hash again. Out of 2160 possible hash values, there are 2140 hash values that satisfy this criterion. Thus the chance of randomly selecting a header that will have 20 zeros as the beginning of the hash is 1 in 220. The number of times that the sender needs to try to get a valid hash value is modeled by geometric distribution. Hence the sender will on average have to try 220 values to find a valid header. Given reasonable estimates of the time needed to compute the hash, this would take about 1 second to find. At this time, no more efficient method is known to find a valid header.
發(fā)送方準(zhǔn)備了一個頭部,并且在頭部末尾加上一個計數(shù)器值。這個值使用一個隨機數(shù)初始化。隨后使用這個頭部計算出一個160位的SHA-1哈希值。如果前20位(亦即前5位十六進(jìn)制數(shù))全為0,則這是一個合格的頭部。否則,發(fā)送方遞增計數(shù)器的值并重新計算哈希值。在 2160 個可能出現(xiàn)的哈希值中(out of),有 2140 個哈希值符合條件(criterion)。那么隨機選擇一個開頭20位全0的頭部的概率為 1/220。發(fā)送方要選中一個有效哈希值所需的嘗試次數(shù)遵從幾何分布(geometric distribution)。因此發(fā)送方平均需要嘗試 220 個值才能找到一個合格的頭部。估計計算這樣一個有效哈希值所需要的時間大概是1秒。當(dāng)前,獲取一個合格的頭部沒有更行之有效的方法。
A normal user on a desktop PC would not be significantly inconvenienced by the processing time required to generate the Hashcash string. However, spammers would suffer significantly due to the large number of spam messages sent by them.
一個PC端的普通用戶不會因為生成一個哈?,F(xiàn)金串而消耗的處理時間產(chǎn)生巨大不便(significantly inconvenienced)。但是,一個垃圾郵件發(fā)送者則不得不為(due to)發(fā)送大量的垃圾郵件承受巨大的折磨(suffer significantly)。
Recipient's side
Technically the system is implemented with the following steps:
一般系統(tǒng)會經(jīng)歷如下幾步:
- The recipient's computer calculates the 160-bit SHA-1 hash of the entire string.
1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
This takes about two microseconds on a 1 GHz machine, far less time than the time it takes for the rest of the e-mail to be received. If the first 20 bits are not all zero, the hash is invalid. (Later versions may require more bits to be zero as machine processing speeds increase.)
- 接收方的電腦計算整個字串的160位SHA-1哈希值。
1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
在一臺主頻1GHz的機器上大概會消耗兩毫秒的時間來計算這個哈希值,遠(yuǎn)低于收取剩余郵件的時間。如果開頭的20位非全0,那么此哈希值無效。(隨著硬件處理速度的提升,后續(xù)版本可能會要求更多的位數(shù)為0)
- The recipient's computer checks the date in the header (e.g., "060408", which represents the date 8 Apr 2006). If it is not within two days of the current date, it is invalid. (The two-day window compensates for clock skew and network routing time between different systems.)
- 接收者的計算檢查頭部的時間戳(如"060408",代表2006年4月8日)。如果這個日期不在當(dāng)前日期的兩天以內(nèi),那么哈希值無效。(兩天長度的時間窗口補償了(compensate for)時鐘偏差(clock skew)和郵件在雙方之間網(wǎng)絡(luò)路由的時間(network routing time))
- The recipient's computer checks whether the e-mail address in the hash string matches any of the valid e-mail addresses registered by the recipient, or matches any of the mailing lists to which the recipient is subscribed. If a match is not found, the hash string is invalid.
- 接收方的電腦檢查哈希值中的郵件地址是否符合接收方登記的有效郵件地址之一,或者是否匹配接收方注冊的郵箱列表之一。如果沒有匹配的選項,那么這個哈希串是無效的。
- The recipient's computer inserts the hash string into a database. If the string is already in the database (indicating that an attempt is being made to re-use the hash string), it is invalid.
- 接收方的電腦將哈希串插入數(shù)據(jù)庫中。如果該串已經(jīng)存在與數(shù)據(jù)庫中,說明發(fā)送方有意重用這枚哈希串,那么該串無效。
If the hash string passes all of these tests, it is considered a valid hash string. All of these tests take far less time and disk space than receiving the body content of the e-mail.
如果哈希串通過了以上所有驗證,那么可以認(rèn)定這是一條合格的哈希串。所有以上測試所使用的時間和空間遠(yuǎn)低于接收郵件內(nèi)容(body content)的消耗。
Required effort
The time needed to compute such a hash collision is exponential with the number of zero bits. So zero bits can be added (doubling the amount of time needed to compute a hash with each additional zero bit) until it is too expensive for spammers to generate valid header lines.
用于計算哈希碰撞(hash collision)的時間是哈希值規(guī)定全0位數(shù)的指數(shù)級別。全0位數(shù)可以遞增(每增加一個0位則計算一個合格哈希值所需的時間將會翻番)直到令垃圾郵件發(fā)送者感到生成合格的頭部開銷太大。
Confirming that the header is valid always takes the same amount of time, no matter how many zero bits are required for a valid header, since this requires only a single hashing operation.
無論條件中要求多少個全0位,我們總是消耗相同的時間來確認(rèn)頭部是否合格,因為接收方只需要執(zhí)行一次哈希運算(hashing operation)。