從系統(tǒng)性能優(yōu)化談對(duì)象相等性

公司系統(tǒng)中有一接口訪問量大,內(nèi)部計(jì)算邏輯較為復(fù)雜。在優(yōu)化時(shí)打算把Request中的參數(shù)做為Key,Response做為Value放到進(jìn)程內(nèi)緩存中,以降低服務(wù)器壓力,提高接口響應(yīng)速度。因?yàn)镽esponse中一些數(shù)據(jù)時(shí)效性要求較高,所以緩存設(shè)置一個(gè)較短的過期時(shí)間(比如10s)。

但這里牽涉到一個(gè)問題,如何有效的判斷兩次請(qǐng)求的參數(shù)是相等的。C#中自定義類型會(huì)從Object類繼承Equals和GetHashCode兩個(gè)方法,可以根據(jù)實(shí)際需求來重寫這兩個(gè)方法實(shí)現(xiàn)對(duì)象相等性比較。

Object.Equals(Object)

.NET 中不同類型對(duì)于Equals方法的默認(rèn)實(shí)現(xiàn)如下:

Type category Equality defined by Comments
Class derived directly from Object Object.Equals(Object) Reference equality; equivalent to calling Object.ReferenceEquals.
Structure ValueType.Equals Value equality; either direct byte-by-byte comparison or field-by-field comparison using reflection.
Enumeration Enum.Equals Values must have the same enumeration type and the same underlying value.
Delegate MulticastDelegate.Equals Delegates must have the same type with identical invocation lists.
Interface Object.Equals(Object) Reference equality.

Object

通過源碼,可以看到Object中Equals方法的實(shí)現(xiàn),即.NET中所有類型的默認(rèn)實(shí)現(xiàn):

ValueType

反編譯之后,可以看到ValueType中Equals方法的實(shí)現(xiàn),即值類型的默認(rèn)實(shí)現(xiàn),它重寫了Object.Equals方法:

上面可以看到,ValueType中Equals實(shí)現(xiàn)思路如下:

  • obj==null返回false

  • 若this和obj的運(yùn)行時(shí)類型不同則返回false

  • 如果值類型中包含的字段均是值類型則逐字節(jié)比較字段值

  • 若含有引用類型字段,則使用使用反射獲取字段信息,然后調(diào)用字段的Equals方法來逐字段比較相等性

重寫Equals

Object的Equals僅通過引用來比較相等性。應(yīng)該說是identity而非equality,與Python中的is、== 操作符類似;ValueType的Equals中使用了反射性能較差。這種默認(rèn)實(shí)現(xiàn)通常不能滿足需求,自定義實(shí)現(xiàn)Equals思路如下:

  • obj為null,返回false,因?yàn)镋quals是實(shí)例方法,this不會(huì)為null

  • 對(duì)于引用類型,this和obj引用同一個(gè)對(duì)象返回true

  • 調(diào)用GetType方法來判斷this和obj在運(yùn)行時(shí)是否是相同類型

  • 必要時(shí)調(diào)用基類的Equals方法來比較基類中字段的相等性(通常不調(diào)用Object類的Equals)

  • 調(diào)用Equals方法逐字段進(jìn)行比較

根據(jù)上述思路,實(shí)現(xiàn)自定義類型的Equals方法:

public class Entity
{
    public Entity(string tag, int count, IDictionary<string, string> descriptioins)
    {
        this.Tag = tag;
        this.Count = count;
        this.Descriptions = descriptioins;
    }

    public string Tag { private set; get; }

    public int Count { private set; get; }

    public IDictionary<string, string> Descriptions { private set; get; }
    /// <summary>
    /// 逐字段比較相等性
    /// </summary>
    public override bool Equals(object obj)
    {
        if (obj == null)
        {
            return false;
        }

        if (object.ReferenceEquals(this, obj))
        {
            return true;
        }

        // 這里判斷this與obj在運(yùn)行時(shí)類型是否一樣
        // 使用is關(guān)鍵字進(jìn)行類型判斷的話,如果obj是Entity的子類也會(huì)返回true
        // 如果類型被標(biāo)記為sealed,可以使用is來判斷
        if (this.GetType().Equals(obj.GetType()) == false)
        {
            return false;
        }

        var other = obj as Entity;
        if (other == null)
        {
            return false;
        }
        if (this.Tag != other.Tag)
        {
            return false;
        }
        if (this.Count != other.Count)
        {
            return false;
        }
        if (this.Descriptions.FieldsEquales(other.Descriptions) == false)
        {
            return false;
        }

        return true;
    }
    /// <summary>
    /// 得到的哈希值應(yīng)在對(duì)象生命周期中保持不變
    /// </summary>
    public override int GetHashCode() => this.ToString().GetHashCode();
    /// <summary>
    /// 含義同Equals(object obj)
    /// </summary>
    public static bool operator ==(Entity left, Entity right)
    {
        // The null keyword is a literal that represents a null reference, one that does not refer to any object. 
        // null is the default value of reference - type variables.Ordinary value types cannot be null, except for nullable value types.
       if (object.ReferenceEquals(left, null))
       {
           return object.ReferenceEquals(right, null);
       }

       return left.Equals(right);
    }
    /// <summary>
    /// 含義與==相反
    /// </summary>
    public static bool operator !=(Entity left, Entity right) => !(left == right);

    public override string ToString() => JsonConvert.SerializeObject(this);
}
public static class DictionaryExtension
{
    /// <summary>
    /// 調(diào)用Object.Equals(Object)方法逐個(gè)字段進(jìn)行相等性比較
    /// <para>雙方均為null時(shí)返回true,一方為null是返回false</para>
    /// </summary>
    public static bool FieldsEquals<TKey, TValue>(this IDictionary<TKey, TValue> source, IDictionary<TKey, TValue> target)
    {
        if (source == null && target == null)
        {
            return true;
        }
        if (source == null || target == null)
        {
            return false;
        }
        if (object.ReferenceEquals(source, target))
        {
            return true;
        }
        if (source.Keys.Count != target.Keys.Count)
        {
            return false;
        }
        foreach (var key in source.Keys)
        {
            if (target.ContainsKey(key) == false)
            {
                return false;
            }
            var sourceValue = source[key];
            var targetValue = target[key];
            if (object.ReferenceEquals(sourceValue, null))
            {
               if (object.ReferenceEquals(targetValue, null))
               {
                   continue;
               }

                return false;
            }

           if (sourceValue.Equals(targetValue))
           {
                continue;
           }
           return false;
        }
        return true;
    }
}

?? 要調(diào)用GetType方法來判斷this與obj在運(yùn)行時(shí)類型是否相同。若使用is關(guān)鍵字進(jìn)行類型判斷的話,如果obj是Entity的子類也會(huì)返回true。當(dāng)類型不能做為基類時(shí),如被標(biāo)記為sealed或值類型(struct、enum),可以使用is來判斷。

重寫Equals方法應(yīng)滿足以下幾點(diǎn):

  • 自反:x.Equals(x)返回true

  • 對(duì)稱:x.Equals(y)==y.Equals(x)

  • 可傳遞:若x.Equals(y)==true且y.Equals(z)==true,則x.Equals(z)==true

  • 一致性:x,y的值不發(fā)生變化,則x.Equals(y)的結(jié)果也不變

  • x.Equals(null) 返回false

  • x.Equals(y)返回true,如果x,y都是NaN的話

  • Equals方法不要拋出異常

有關(guān)String及StringBuilder對(duì)于Equals的實(shí)現(xiàn),或更多重寫Equals方法的細(xì)節(jié)可參考:Object.Equals。

==

有關(guān)==操作符,請(qǐng)參閱:Equality operators

Object.GetHashCode()

Object

默認(rèn)實(shí)現(xiàn)根據(jù)對(duì)象在內(nèi)存中的地址,即引用來計(jì)算哈希碼。換言之, ReferenceEquals方法返回true的兩個(gè)對(duì)象的哈希碼也相同。

ValueType

默認(rèn)實(shí)現(xiàn)通過反射基于字段的值來計(jì)算哈希碼。換言之,兩個(gè)值類型實(shí)例的所有字段值都相等,那么它們的哈希碼也相等。

重寫GetHashCode

重寫Equals方法后,通常也需要重寫GetHashCode方法,反之亦然。因?yàn)樵诠=Y(jié)構(gòu)(如字典)中,存取數(shù)據(jù)時(shí)需要用到鍵的哈希碼。如下圖是Github上Dictionary根據(jù)key獲取value的一段源碼,代碼中先比較了hashCode是否相等,然后再調(diào)用Enquals方法對(duì)key做相等性判斷:

重寫GetHashCode方法應(yīng)注意以下事項(xiàng):

  • 算法至少使用對(duì)象的一個(gè)實(shí)例字段,不要使用靜態(tài)字段

    保證哈希碼和實(shí)例對(duì)象相關(guān)

  • 算法使用的實(shí)例字段應(yīng)盡可能保持不變

    盡可能保證在對(duì)象生命周期中哈希碼保持不變

  • 兩個(gè)相等的對(duì)象(使用Equals方法判斷)應(yīng)返回相同的哈希碼,但反過來則不成立

  • 如果影響到Euqals方法的字段值未發(fā)生變化,GetHashCode返回的哈希碼也不應(yīng)變化

  • 生成的哈希值隨機(jī)均勻分布

  • 良好的性能

通常,對(duì)于可變引用對(duì)象,應(yīng)重寫GetHashCode方法,除非能保證以下兩點(diǎn):

  • 用于計(jì)算哈希碼的字段不可變

  • 對(duì)象存儲(chǔ)在依賴哈希碼的集合中,對(duì)象的哈希碼不變

如果要重寫可變對(duì)象的GetHashCode方法,盡可能在文檔中指出:如果對(duì)象要用作哈希結(jié)構(gòu)的key,盡可能不要修改該對(duì)象,否則,在讀取數(shù)據(jù)時(shí)可能會(huì)引發(fā)KeyNotFoundException。

?? 不同的.NET版本、不同的平臺(tái)(32位、64位系統(tǒng))對(duì)于GetHashCode的默認(rèn)實(shí)現(xiàn)可能會(huì)有差異。因此,若使用默認(rèn)的GetHashCode方法,須注意以下兩點(diǎn):

  • 不能僅通過哈希碼來判斷對(duì)象是否相等
  • 因?yàn)閷?duì)象可以在應(yīng)用程序域、進(jìn)程、平臺(tái)間傳遞,不要持久化或在生成哈希碼的應(yīng)用程序域之外使用哈希碼

下面是微軟官方文檔中對(duì)于GetHashCode的一段總結(jié),人太懶水平又差,就不翻譯了,抄錄在這里備以后查詢:

A hash function must have the following properties:

  • If two objects compare as equal, the GetHashCode() method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode() methods for the two objects do not have to return different values.
  • The GetHashCode() method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's System.Object.Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
  • For the best performance, a hash function should generate an even distribution for all input, including input that is heavily clustered. An implication is that small modifications to object state should result in large modifications to the resulting hash code for best hash table performance. - Hash functions should be inexpensive to compute.
  • The GetHashCode() method should not throw exceptions.

For example, the implementation of the GetHashCode() method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).

Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with keys that provide a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation).

In a hash table with a poor implementation of a hash function, the performance of a search depends on the number of items in the hash table (for example, an O(n) operation, where n is the number of items in the hash table).

A malicious user can input data that increases the number of collisions, which can significantly degrade the performance of applications that depend on hash tables, under the following conditions:

  • When hash functions produce frequent collisions.
  • When a large proportion of objects in a hash table produce hash codes that are equal or approximately equal to one another.
  • When users input the data from which the hash code is computed.

Derived classes that override GetHashCode() must also override Equals(Object) to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.

系統(tǒng)優(yōu)化思路

  • 性能滿足當(dāng)前需求就好,莫要追求極致性能

  • 性能與代碼可讀性之間要有一個(gè)權(quán)衡,喪失了可讀性也就增加了維護(hù)成本

  • 減少I/0(磁盤、網(wǎng)絡(luò))

    優(yōu)化數(shù)據(jù)庫查詢,只查詢必要的字段,即可減少磁盤I/O又能節(jié)省帶寬資源;

    合理使用緩存;

    適當(dāng)拆分一次返回大量數(shù)據(jù)的請(qǐng)求為多個(gè)請(qǐng)求(如,分頁查詢)。適當(dāng)合并多次結(jié)果集較小的查詢(如,Redis中的Pipline);

  • 避免計(jì)算機(jī)做無用功

    使用合理的數(shù)據(jù)結(jié)構(gòu);

    盡可能減少循環(huán)次數(shù);

  • 充分利用CPU(多線程、并行運(yùn)算)

    將一次運(yùn)算拆分為多個(gè)獨(dú)立的運(yùn)算單元,但要注意,不是所有的運(yùn)算任務(wù)都能拆分。同時(shí),也要在單線程的簡(jiǎn)單安全運(yùn)行較慢和多線程的復(fù)雜較為高效之間做適當(dāng)取舍。

  • 異步替換同步,避免線程阻塞

  • 適當(dāng)重構(gòu)代碼,盡可能降低代碼的混亂程度以保持系統(tǒng)的簡(jiǎn)潔

推薦閱讀

RuntimeHelpers.GetHashCode(Object) Method

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容