6 Cisco VxLAN Control Plane with BGP EVPN: MAC-IP Learning and Host Route Advertisement

1. Overview

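  • This article describes the control plane operations that let hosts in different VNIs of the same tenant communicate over BGP EVPN VxLAN (inter-VNI), and it also covers how hosts within the same VNI communicate.
  • It also describes the data plane forwarding process.
  • The topology and configuration are based entirely on the two previous articles, "4 Cisco VxLAN Lab with BGP EVPN & Distributed Anycast Gateway" and "5 Cisco VxLAN Control Plane with BGP EVPN: MAC Learning".
  • This article adds the ARP suppression configuration. Unlike the previous articles, the VRF name has been changed from "Tenant-A" to "ta".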

2. Topology

[Figure: lab topology diagram (image.png)]

3. Control Plane Operations

3.1 MAC-IP Learning Process

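  • This section details how the local VTEP switch learns the IP address of a locally attached host from the Gratuitous ARP message that the host generates, and how the Host Mobility Manager (HMM) component installs this information into the L2RIB of the relevant VNI (the L2RIB database that holds MAC-IP address information is also referred to as the IP VRF).
  • It shows how the route is exported from the L2RIB into the BGP Loc-RIB as a BGP EVPN Route Type 2 (MAC/MAC-IP advertisement route) and then advertised to the remote VTEP switches through the BGP Adj-RIB-Out.
  • It shows how the route information finally reaches the L2RIB of the remote VTEP.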

3.1.1 ARP Learning on the Local VTEP

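  • After PC1 boots, it sends a Gratuitous ARP (GARP) to verify that its IP address is unique. VTEP switch Leaf-1 receives the GARP message on interface E1/3 and installs the MAC-IP binding into the ARP table, taking the MAC address from PC1's source MAC and the IP address from the PC1 IP field in the GARP payload.
  • The output below shows the ARP table of VRF ta. In NX-OS, the default aging time of a locally learned ARP entry is 1500 seconds, which is 300 seconds shorter than the MAC address aging timer (1800 seconds). When the ARP aging timer expires, the switch checks whether the host is still present by sending it an ARP request. If the host replies, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table, but it is kept in the BGP EVPN table for a further 1800 seconds (the MAC aging timer) before the withdraw message is sent. The MAC aging timer should be longer than the ARP aging timer because the ARP refresh process also refreshes the MAC table, which avoids unnecessary flooding.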
Leaf-1# sh ip arp vrf ta

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context ta
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
172.16.1.1      00:02:00  0050.7966.6806  Vlan10 
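
The timer relationship described above can be sketched in a few lines of Python. This is an illustration only, not NX-OS code: the function name `host_timeline` and the event strings are hypothetical, and only the two default timer values (1500 s for ARP, 1800 s for MAC) come from the text.

```python
ARP_AGE = 1500  # default ARP aging time in seconds (value from the text)
MAC_AGE = 1800  # default MAC aging time in seconds (value from the text)


def host_timeline(host_alive: bool) -> list[tuple[int, str]]:
    """Return (seconds, event) pairs for one ARP aging cycle of a local host.

    Hypothetical helper: it only mirrors the sequence described in the text.
    """
    events = [(ARP_AGE, "ARP timer expired, switch sends ARP request to host")]
    if host_alive:
        # The reply refreshes the ARP entry, and with it the MAC entry.
        events.append((ARP_AGE, "host replied, aging timers reset"))
    else:
        events.append((ARP_AGE, "no reply, entry removed from ARP table"))
        # The EVPN route is kept for one more MAC aging period.
        events.append((ARP_AGE + MAC_AGE, "BGP EVPN withdraw sent"))
    return events


if __name__ == "__main__":
    for when, what in host_timeline(host_alive=False):
        print(f"t={when:>4}s  {what}")
```

Because the ARP timer fires 300 seconds before the MAC timer would, a host that still answers never lets its MAC entry age out.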

3.1.2 MAC-IP on the Local VTEP

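  • The Host Mobility Manager (HMM) component learns the MAC-IP information as a local route.
  • HMM installs the information into the local host database and forwards the MAC-IP information to the L2RIB.
  • The local host database contains the IP address (/32), MAC address, SVI, and local interface. The L2RIB holds the same information except for the SVI.
  • The output below shows part of the MAC-IP learning process on Leaf-1.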
Leaf-1# show system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:31.870 CST 1 29704] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:31.870 CST 2 29704] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 13960, 
[10/12/20 14:25:31.870 CST 3 29704] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:31.870 CST 4 29704] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:31.871 CST 5 29704] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:31.871 CST 6 29704] (10,0050.7966.6806,172.16.1.1,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:31.871 CST 7 29704] (10,0050.7966.6806,172.16.1.1,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:31.871 CST 8 29704] (10,0050.7966.6806,172.16.1.1,12): esi (F), pc-ifindex 0
[10/12/20 14:25:31.875 CST 9 29704] (10,0050.7966.6806,172.16.1.1,12):Encoding MAC-IP best route (ADD, client id 5), esi: (F)
  • The output below shows the MAC-IP binding for PC1 in the local host database of VRF ta on Leaf-1.
Leaf-1# show fabric forwarding ip local-host-db vrf ta
HMM host IPv4 routing table information for VRF ta
Status: *-valid, x-deleted, D-Duplicate, DF-Duplicate and frozen, 
        c-cleaned in 00:01:49

    Host                 MAC Address        SVI        Flags      Physical Interface
*   172.16.1.1/32        0050.7966.6806     Vlan10     0x420201   Ethernet1/3
  • The output below shows that the MAC-IP entry for PC1 in the IP VRF of the L2RIB was produced by the HMM component.
Leaf-1# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated 
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops      
----------- -------------- ------ ---------- --------------- ---------------
10          0050.7966.6806 HMM    --            0          172.16.1.1     Local          
            Sent To: BGP
            L3-Info: 13960

3.1.3 BGP Route Export on the Local VTEP

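  • VTEP switch Leaf-1 installs the MAC-IP route from the L2RIB into the BGP Loc-RIB.
  • The MAC-IP information is advertised in its own BGP EVPN Route Type 2 update (MAC-only and MAC-IP routes each use a dedicated NLRI update). The NLRI of the two differs as follows: besides the host MAC address, a MAC-IP advertisement also carries the host IP address, its mask length, and the MPLS Label 2 field, which identifies the L3VNI used in VRF ta.
  • The MAC-IP update also carries two additional extended communities: RT 65234:13960 and Router MAC 5000.0003.0007.
  • The output below shows the internal process by which Leaf-1 receives the MAC-IP route and installs it into the RIB and the BGP Loc-RIB. The prefix length shown there is RD (8×8 bits) + MAC address (6×8 bits) + IP address (4×8 bits) = 18 octets, that is, 144 bits.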
Leaf-1# show bgp internal event-history events | in 6806
BRIB:
2020 Oct 12 17:36:36.317231: (default) BRIB: [L2VPN EVPN] Installing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/144 (local) via 3.3.3.3 label 10010 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
RIB:
2020 Oct 12 17:36:36.319783: (default) RIB: [L2VPN EVPN] add prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1] (flags 0x1) : OK, total 1
EVENT:
2020 Oct 12 17:36:36.316899: EVT: Received from L2RIB MAC-IP route: Add ESI 0000.0000.0000.0000.0000 topo 10010 mac 0050.7966.6806 ip 172.16.1.1 L3 VNI 13960 flags 00000000 soo 0 seq 0, reorig :0
  • The output below shows the BGP Loc-RIB entry for PC1's MAC-IP NLRI.
Leaf-1# sh bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007

  Path-id 1 advertised to peers:
    1.1.1.1            2.2.2.2  
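  • The prefix fields in the output above are explained in the following table.
Prefix field | Description | Notes
2 | BGP EVPN Route Type 2 | MAC/MAC-IP advertisement route
0 | Ethernet Segment Identifier (ESI) | All zeros = single-homed site
0 | Ethernet Tag ID | Must be 0 for EVPN routes
48 | MAC address length | /
0050.7966.6806 | MAC address | /
32 | IP address length | /
172.16.1.1 | IP address | /
/272 | Length of the MAC-IP VRF NLRI in bits | RD (8×8 bits) + MAC address (6×8 bits) + L2VNI ID (3×8 bits) + L3VNI ID (3×8 bits) + IP address (4×8 bits) + ESI (10×8 bits) = 34×8 bits = 272 bits
  • In the output above, the L2VNI appears in the "Received label" field (followed by the L3VNI). The route also carries the BGP extended communities listed in the following table.
BGP extended community | Description | Notes
RT:65234:10010 | Used by the export/import policy (L2VNI) | VNI 10010 corresponds to VLAN 10
RT:65234:13960 | Used by the export/import policy (L3VNI) | VNI 13960 corresponds to VLAN 3960
ENCAP:8 | Defines VxLAN as the data plane encapsulation type | /
Router MAC:5000.0003.0007 | Source address of the inner MAC header for routed packets | Required because VxLAN is a MAC-in-UDP encapsulation and the payload that crosses an L3 boundary does not carry the source host's MAC address, so the router MAC (RMAC) is used instead.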
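
The prefix notation and the two length calculations (/144 in the event log, /272 in the BGP table) can be checked with a short Python sketch. It is not part of the original lab: the names `parse_type2`, `FIELD_BYTES`, and `length_bits` are hypothetical, and the field widths are simply the ones listed in the text above.

```python
import re

# Field widths in bytes, as listed in the text above.
FIELD_BYTES = {"RD": 8, "ESI": 10, "MAC": 6, "L2VNI": 3, "L3VNI": 3, "IPv4": 4}


def length_bits(*fields: str) -> int:
    """Sum the widths of the named fields and return the total in bits."""
    return 8 * sum(FIELD_BYTES[name] for name in fields)


def parse_type2(prefix: str) -> dict:
    """Split '[2]:[0]:[0]:[48]:[mac]:[32]:[ip]/len' into named fields."""
    body, _, length = prefix.rpartition("/")
    fields = re.findall(r"\[([^\]]*)\]", body)
    if len(fields) != 7 or fields[0] != "2":
        raise ValueError("not a Route Type 2 prefix in the expected notation")
    return {
        "route_type": int(fields[0]),
        "esi": fields[1],
        "eth_tag": fields[2],
        "mac_len": int(fields[3]),
        "mac": fields[4],
        "ip_len": int(fields[5]),
        "ip": fields[6],
        "prefix_len": int(length),
    }


if __name__ == "__main__":
    route = parse_type2("[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272")
    print(route["mac"], route["ip"], route["prefix_len"])
    print(length_bits("RD", "MAC", "IPv4"))   # length seen in the event log
    print(length_bits(*FIELD_BYTES))          # length seen in the BGP table
```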

3.1.4 BGP Route Import on the Remote VTEP

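  • VTEP switch Leaf-2 receives the BGP EVPN MAC-IP route advertisement and installs it, unmodified, into the BGP Adj-RIB-In.
  • Leaf-2 imports the route from the Adj-RIB-In into the BGP Loc-RIB and, after best-path selection, installs it into the L2RIB.
  • When the remote VTEP switch Leaf-2 installs the route from the BGP Adj-RIB-In into the Loc-RIB, it rewrites the RD to 4.4.4.4:32777, which is derived from its own BGP router ID and the VLAN ID. The process is the same as for the MAC-only route import and relies on the same RT 65234:10010.
  • The output below shows the internal import process: Leaf-2 installs the received route under RD 3.3.3.3:32777 in the BGP Adj-RIB-In, imports it under RD 4.4.4.4:32777, installs it into the BGP Loc-RIB, and finally downloads it to the L2RIB. Note that the output also includes the L3RIB installation process.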
Leaf-2# show bgp internal event-history events | i 6806
2020 Oct 12 21:52:48.495013: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:52:48.494399: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 0 next hops, suppress 0
2020 Oct 12 21:52:48.494371: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.493006: (default) BRIB: [L2VPN EVPN] Marking imported path for dest 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 as deleted, path ibgp
2020 Oct 12 21:52:48.492893: EVT: [L2VPN EVPN] Deleting imported path [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]
2020 Oct 12 21:52:48.492506: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.491786: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 2.2.2.2 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:52:48.474282: (default) RIB: [L2VPN EVPN] Suppressing 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 download to L2RIB
2020 Oct 12 21:52:48.474255: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 1
2020 Oct 12 21:52:48.474189: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:52:48.473909: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.473593: (default) IMP: [L2VPN EVPN] Import of 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 (EVI: 0) to RD 4.4.4.4:65534 (0) inhibited, no Type2 for EAD-ES import
2020 Oct 12 21:52:48.472917: (default) IMP: [L2VPN EVPN] Importing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 to <default> RD 4.4.4.4:32777
2020 Oct 12 21:52:48.466435: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.465106: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 1.1.1.1 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:47:48.453800: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:47:48.451605: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 0
2020 Oct 12 21:47:48.451584: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:47:48.451553: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, in_rib: no
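  • The output below shows part of the BGP RIB (BRIB) on Leaf-2, covering both the Adj-RIB-In and the Loc-RIB. The first part shows the original, unmodified NLRI received from Spine-1, which is installed in the Adj-RIB-In. The middle part shows the same NLRI installed in the BGP Loc-RIB with a rewritten RD; it is imported on the basis of RT 65234:10010. The last part shows the same NLRI installed with RD 4.4.4.4:3. It is used for inter-VNI (L3VNI) forwarding and is imported into the corresponding L3VNI Loc-RIB on the basis of RT 65234:13960, which is generated automatically from the VRF context configuration.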
Leaf-2# show bgp l2vpn evpn 172.16.1.1 vrf ta
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 2.2.2.2 

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:3    (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer
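
The import behaviour visible in this output (one received route, copied under a local RD for each matching import RT) can be modelled with a minimal Python sketch. Everything here is illustrative: `Type2Route`, `l2_rd`, `l3_rd`, and `import_route` are hypothetical names, and the "router ID:(32767 + VLAN ID)" rule for the L2VNI RD is an assumption that happens to match the 32777 value shown for VLAN 10.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Type2Route:
    rd: str
    mac: str
    ip: str
    route_targets: frozenset


def l2_rd(router_id: str, vlan_id: int) -> str:
    # Assumption: auto-derived L2VNI RD = router ID : (32767 + VLAN ID).
    return f"{router_id}:{32767 + vlan_id}"


def l3_rd(router_id: str, vrf_id: int) -> str:
    # From the output: the L3VNI RD is router ID : VRF ID (4.4.4.4:3).
    return f"{router_id}:{vrf_id}"


def import_route(route: Type2Route, import_table: dict) -> list:
    """Return one copy of the route per matching import RT, with the local RD.

    import_table maps an import route target to the local RD to use.
    """
    return [
        replace(route, rd=local_rd)
        for rt, local_rd in import_table.items()
        if rt in route.route_targets
    ]


if __name__ == "__main__":
    received = Type2Route(
        rd="3.3.3.3:32777",
        mac="0050.7966.6806",
        ip="172.16.1.1",
        route_targets=frozenset({"65234:10010", "65234:13960"}),
    )
    leaf2_imports = {
        "65234:10010": l2_rd("4.4.4.4", 10),  # L2VNI 10010, VLAN 10
        "65234:13960": l3_rd("4.4.4.4", 3),   # L3VNI 13960, VRF ta (ID 3)
    }
    for copy in import_route(received, leaf2_imports):
        print(copy.rd, copy.mac, copy.ip)
```

The received route itself stays under its original RD, which is why the output lists three entries for the same NLRI.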

3.1.5 IP VRF on the Remote VTEP

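  • Remote VTEP Leaf-2 verifies that the next-hop IP address found in the NLRI is reachable and installs the MAC-IP route into the L2RIB. The local topology ID is 10 (based on VLAN 10), the producer of the route is BGP, and the next hop points to the source IP address bound to the NVE1 interface of VTEP switch Leaf-1.
  • At this stage both VTEP switches hold PC1's MAC-IP information in their L2RIB and BGP tables, but only the local VTEP switch Leaf-1 has installed the MAC-IP binding into its ARP table.
  • The output below shows part of the MAC-IP learning process on Leaf-2.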
Leaf-2# sh system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:33.711 CST 1 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 5
[10/12/20 14:25:33.711 CST 2 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 0, 
[10/12/20 14:25:33.711 CST 3 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 0, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:33.711 CST 4 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 1, pc-ifindex 0
[10/12/20 14:25:33.711 CST 5 29679] NH: 3.3.3.3
[10/12/20 14:25:33.713 CST 6 29679] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:33.713 CST 7 29679] (10,0050.7966.6806,172.16.1.1,5):MAC-IP route created with flags 0, l3 vni 0, seq 0
[10/12/20 14:25:33.713 CST 8 29679] (10,0050.7966.6806,172.16.1.1,5): admin dist 20, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:33.714 CST 9 29679] (10,0050.7966.6806,172.16.1.1,5): esi (F), pc-ifindex 0
[10/12/20 14:25:45.795 CST a 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:45.795 CST b 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6808, 172.16.1.3), l2 vni 0, l3 vni 13960, 
[10/12/20 14:25:45.795 CST c 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:45.795 CST d 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:45.795 CST e 29679] (10,0050.7966.6808,172.16.1.3):MAC-IP entry created
[10/12/20 14:25:45.795 CST f 29679] (10,0050.7966.6808,172.16.1.3,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:45.795 CST 10 29679] (10,0050.7966.6808,172.16.1.3,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:45.795 CST 11 29679] (10,0050.7966.6808,172.16.1.3,12): esi (F), pc-ifindex 0
[10/12/20 14:25:45.800 CST 12 29679] (10,0050.7966.6808,172.16.1.3,12):Encoding MAC-IP best route (ADD, client id 5), es
  • The output below shows that the MAC-IP entry in the L2RIB was produced by BGP.
Leaf-2# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated 
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops      
----------- -------------- ------ ---------- --------------- ---------------
10          0050.7966.6806 BGP    --            0          172.16.1.1     3.3.3.3        
            Sent To: ARP
  • After these steps, both VTEP switches hold the MAC-IP information for PC1.

3.2 ARP Suppression

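  • Section 3.1 explained how MAC-IP address information is propagated through the BGP EVPN VxLAN fabric. This section explains how the ARP suppression mechanism on the VTEP switches uses the MAC-IP bindings to reduce unnecessary Layer 2 BUM (broadcast, unknown unicast, multicast) traffic in the VxLAN fabric.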

3.2.1 Configuring the Leaf Switches: Enabling ARP Suppression

Leaf-1 configuration:

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp

Leaf-2 configuration:

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp

Leaf-3 configuration:

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp

3.2.2 Viewing the ARP Suppression Cache

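  • Starting from the moment PC1 boots: once PC1 is powered on, it sends GARP/ARP messages to the network and Leaf-1 installs the MAC-IP binding into the ARP table of VRF ta. The output below shows the ARP table on Leaf-1.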
Leaf-1# show  ip arp vrf ta
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context ta
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
172.16.1.1      00:01:03  0050.7966.6806  Vlan10  
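  • When per-VNI ARP suppression is enabled on the local VTEP switch, the MAC-IP binding is also installed from the ARP table into the local ARP suppression cache. The output below shows the ARP suppression cache on Leaf-1.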
Leaf-1# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:55 0050.7966.6806   10 Ethernet1/3         L
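  • When ARP suppression is enabled on the remote VTEP switch (Leaf-2), the ARP suppression cache entry is taken from the L2RIB. The output below shows the ARP suppression cache entry for PC1 on Leaf-2.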
Leaf-2# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      05:01:11 0050.7966.6806   10 (null)              R        3.3.3.3 

3.2.3 Comparison of Suppression Scenarios

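  1. No suppression: every ARP request received locally is sent to the multicast group associated with the VNI. Every VTEP switch that has joined the group receives the ARP request and forwards it out of the ports in the broadcast domain identified by the VNI ID in the VxLAN header.
  2. ARP suppression: when an ARP request arrives, the local VTEP switch checks whether the requested MAC-IP binding is in its local ARP suppression cache. On a hit, the switch sends the ARP reply directly to the requester and does not flood the request into the network. On a miss, the request is flooded into the network (it is advisable to enable ARP suppression only after intra-VNI reachability has been tested successfully).
  3. ARP and unknown-unicast suppression: on a cache hit this works the same as ARP suppression. On a miss, however, the ARP request is dropped, so this feature requires that there are no silent hosts in the VxLAN fabric.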
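
The three scenarios above reduce to a small decision function. The Python sketch below is a simplified model, not switch code: `Mode`, `handle_arp_request`, and the string results are hypothetical names, and the cache is a plain dictionary from IP address to MAC address.

```python
from enum import Enum


class Mode(Enum):
    NONE = "no suppression"
    ARP = "ARP suppression"
    ARP_AND_UNKNOWN_UNICAST = "ARP and unknown-unicast suppression"


def handle_arp_request(mode: Mode, cache: dict, target_ip: str) -> str:
    """Return what the local VTEP does with an ARP request for target_ip."""
    if mode is Mode.NONE:
        return "flood"                    # scenario 1: always flood
    mac = cache.get(target_ip)
    if mac is not None:
        return f"reply locally with {mac}"  # cache hit in scenarios 2 and 3
    # Cache miss: scenario 2 floods, scenario 3 drops the request.
    return "flood" if mode is Mode.ARP else "drop"


if __name__ == "__main__":
    cache = {"172.16.1.1": "0050.7966.6806"}
    for mode in Mode:
        print(mode.value, "->", handle_arp_request(mode, cache, "172.16.1.1"))
        print(mode.value, "->", handle_arp_request(mode, cache, "172.16.1.9"))
```

The miss branch is where the "no silent hosts" requirement comes from: a host that has never sent traffic is not in the cache, so in the third mode requests for it are dropped.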

3.3 Host Route Advertisement: Inter-VNI Routing (L3VNI)

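The previous article and the first half of this one explained how the MAC and MAC-IP information of end hosts is propagated through the VxLAN fabric, how it is used for intra-VNI switching and MAC address resolution, and how the ARP suppression mechanism reduces BUM traffic. This section explains how host routes are imported into the L3RIB and how they are used for inter-VNI routing.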

3.3.1 Host Routes in the Local VTEP RIB

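  • Section 3.1 described how the local VTEP switch installs the MAC-IP binding into the ARP table and how the HMM (Host Mobility Manager) component installs the information into the L2RIB. In addition, HMM installs the MAC-IP information from the ARP table into the L3RIB.
  • The output below shows the RIB of VRF ta on the local VTEP switch Leaf-1. The route was learned from VLAN 10 and installed into the RIB by HMM.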
Leaf-1# show  ip route  172.16.1.1 vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.16.1.1/32, ubest/mbest: 1/0, attached
    *via 172.16.1.1, Vlan10, [190/0], 1d05h, hmm

3.3.2 Host Routes in the BGP Process on the Local VTEP

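  • Section 3.1 also described how the MAC-IP information is sent from the L2RIB to the Loc-RIB and from there to the Adj-RIB-Out, and is then advertised to the remote VTEP switches as a BGP EVPN Route Type 2.
  • The output below shows the BGP Loc-RIB entry for PC1's IP address.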
Leaf-1# show bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007

  Path-id 1 advertised to peers:
    1.1.1.1            2.2.2.2  

3.3.3 Host Routes in the BGP Process on the Remote VTEP

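  • Section 3.1 did not explain how the MAC-IP route finally reaches the L3RIB of the remote VTEP switch.
  • The BGP EVPN Route Type 2 update for PC1's MAC-IP NLRI also carries RT 65234:13960 (L3VNI).
  • The received NLRI is passed through the BGP Import Policy Engine (imported on the basis of RT 65234:13960), which places the L3VNI entry into the Loc-RIB.
  • During import policy processing, the original RD 3.3.3.3:32777 is rewritten to the VRF ta specific RD 4.4.4.4:3 (3 = the VRF ID of VRF ta). The RD distinguishes overlapping IP addresses in different VRFs.
  • The Leaf-2 BGP table below shows all of the details described above (the original entry, the entry with the rewritten RD, the L3VNI entry, and so on).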
Leaf-2# show bgp l2vpn evpn 172.16.1.1 
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 2.2.2.2 

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:3    (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

  • The output below shows the VRFs on Leaf-2, including their VRF IDs.
Leaf-2# show vrf
VRF-Name                           VRF-ID State   Reason                        
default                                 1 Up      --                            
management                              2 Up      --                            
ta                                      3 Up      --   

3.3.4 Installing the Host Route into the Remote VTEP RIB

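  • The route is installed from the BGP Loc-RIB into the L3RIB. The RIB entry includes the next-hop address, tunnel ID, encapsulation type (VxLAN), segment ID, and route source (BGP).
  • At this stage both the local VTEP switch Leaf-1 and the remote VTEP switch Leaf-2 can route traffic from hosts in other L2VNIs (inter-VNI traffic) to PC1, which belongs to L2VNI 10010.
  • The output below shows the route entry for 172.16.1.1/32 in the VRF ta RIB on Leaf-2.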
Leaf-2# show ip route 172.16.1.1 vrf ta 
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.16.1.1/32, ubest/mbest: 1/0
    *via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
  • The output below shows the BGP recursive next-hop (RNH) database, in which 3.3.3.3 is the next hop for destination 172.16.1.1.
Leaf-2# show nve internal bgp rnh database vni 13960
--------------------------------------------
Total peer-vni msgs recvd from bgp: 23
Peer add requests: 14
Peer update requests: 0
Peer delete requests: 9
Peer add/update requests: 14
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 5 vni 13960 

Flag codes: 0 - ISSU Done/ISSU N/A        1 - ADD_ISSU_PENDING         
            2 - DEL_ISSU_PENDING          3 - UPD_ISSU_PENDING
        

VNI    Peer-IP            Peer-MAC            Tunnel-ID  Encap     (A/S)  FlagsPT   
13960  3.3.3.3            5000.0003.0007      0x3030303  vxlan     (1/0)    0  FAB
13960  5.5.5.5            5000.0005.0007      0x5050505  vxlan     (1/0)    0  FAB
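
In the output above, each Tunnel-ID looks like the peer VTEP address written as a 32-bit integer (0x3030303 next to 3.3.3.3, 0x5050505 next to 5.5.5.5). The short Python sketch below only checks that arithmetic. Whether NX-OS always derives the tunnel ID this way is an assumption drawn from this output, and the helper name `tunnel_id_to_ip` is hypothetical.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: str) -> str:
    """Interpret a hex tunnel ID such as '0x3030303' as a 32-bit IPv4 address."""
    return str(ipaddress.IPv4Address(int(tunnel_id, 16)))


if __name__ == "__main__":
    for tid in ("0x3030303", "0x5050505"):
        print(tid, "->", tunnel_id_to_ip(tid))
```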
  • The output below shows the complete routing table of VRF ta on Leaf-2.
Leaf-2# show  ip route vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.16.1.0/24, ubest/mbest: 1/0, attached
    *via 172.16.1.254, Vlan10, [0/0], 1d06h, direct
172.16.1.1/32, ubest/mbest: 1/0
    *via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
 
172.16.1.3/32, ubest/mbest: 1/0, attached
    *via 172.16.1.3, Vlan10, [190/0], 1d06h, hmm
172.16.1.5/32, ubest/mbest: 1/0
    *via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
 
172.16.1.254/32, ubest/mbest: 1/0, attached
    *via 172.16.1.254, Vlan10, [0/0], 1d06h, local
172.16.2.0/24, ubest/mbest: 1/0, attached
    *via 172.16.2.254, Vlan20, [0/0], 1d06h, direct
172.16.2.2/32, ubest/mbest: 1/0, attached
    *via 172.16.2.2, Vlan20, [190/0], 1d06h, hmm
172.16.2.4/32, ubest/mbest: 1/0
    *via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
 
172.16.2.254/32, ubest/mbest: 1/0, attached
    *via 172.16.2.254, Vlan20, [0/0], 1d06h, local
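
The routing table above holds both the connected subnet 172.16.1.0/24 and the BGP-learned host route 172.16.1.1/32. A minimal sketch of standard longest-prefix matching, assuming nothing beyond the routes shown, illustrates why traffic to PC1 follows the host route towards 3.3.3.3 while other addresses in the subnet use the connected route. The `ROUTES` dictionary is a hand-copied subset of the output and `lookup` is a hypothetical helper.

```python
import ipaddress

# Subset of the VRF ta routes shown above: prefix -> next hop / source.
ROUTES = {
    "172.16.1.0/24": "Vlan10, direct",
    "172.16.1.1/32": "3.3.3.3, VXLAN segid 13960",
    "172.16.1.3/32": "Vlan10, hmm",
    "172.16.2.0/24": "Vlan20, direct",
}


def lookup(dst: str):
    """Return (prefix, next hop) of the longest matching route, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), ROUTES[str(best)]


if __name__ == "__main__":
    for dst in ("172.16.1.1", "172.16.1.9", "10.0.0.1"):
        print(dst, "->", lookup(dst))
```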

4. Data Plane Operations

4.1 ARP Suppression Process

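  • When PC1 is powered on, the GARP received from PC1 is VxLAN-encapsulated and flooded to multicast group 239.0.0.1, even though ARP suppression is enabled under the NVE1 interface of VTEP Leaf-1.
  • This is because VTEP Leaf-1 has no IP/MAC information for PC1 in either its ARP table or its ARP suppression cache.
  • The debug output from VTEP Leaf-1 below reflects this: Leaf-1 receives the GARP from PC1, has no cache entry for 172.16.1.1, and must therefore flood the frame. Leaf-1 then updates its ARP suppression cache and L2RIB.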
Leaf-1# terminal monitor
Leaf-1# debug ip arp cache
Leaf-1# debug ip arp event
Leaf-1# debug ip arp suppression-event
Leaf-1# 
Leaf-1# 2020 Oct 13 20:47:51.940670 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode  
2020 Oct 13 20:47:51.940988 arp: arp_process_packet_in_l3_mode: GARP:  Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0   
2020 Oct 13 20:47:51.941107 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr 
2020 Oct 13 20:47:51.941173 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x1a000400, is_local:TRUE 
2020 Oct 13 20:47:51.941283 arp: arp_process_receive_packet_msg: GARP count on the interface Vlan10 is 1 
2020 Oct 13 20:47:51.941696 arp: arp_process_receive_packet_msg: NO GARP storm on interface Vlan10 
2020 Oct 13 20:47:51.941771 arp: arp_process_receive_packet_msg: Existing entry found for source 172.16.1.1 on Vlan10 
2020 Oct 13 20:47:51.941839 arp: arp_add_adj: arp_add_adj: Updating MAC on interface Vlan10, phy-interface Ethernet1/3, flags:0x1 
2020 Oct 13 20:47:51.941927 arp: arp_adj_update_state_get_action_on_add: Successful action on add Previous State:0x10, Current State:0x10 Received event:Data Plane Add, entry: 172.16.1.1, 0050.7966.6806, Vlan10, action to be taken send_to_am:FALSE, arp_aging:TRUE 
2020 Oct 13 20:47:51.942079 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400 
2020 Oct 13 20:47:51.942191 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400 
2020 Oct 13 20:47:51.942251 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0 
2020 Oct 13 20:47:51.942396 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists 
2020 Oct 13 20:47:51.942472 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE 
2020 Oct 13 20:47:51.942541 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10 
2020 Oct 13 20:47:51.942595 arp: arp_process_receive_packet_msg: VINCI: enhanced_proxy: 0, traditional_proxy: 1, adj_added: 0 
2020 Oct 13 20:47:51.943681 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0 
2020 Oct 13 20:47:51.944623 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists 
2020 Oct 13 20:47:51.944702 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE 
2020 Oct 13 20:47:51.945113 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10 
2020 Oct 13 20:47:51.945239 arp: arp_process_receive_packet_msg: Received ARP request on Vlan10 (Ethernet1/3) 
2020 Oct 13 20:47:51.945375 arp: arp_process_receive_packet_msg: Gratuitous ARP request received on Vlan10 (Ethernet1/3).Proxy or Anycast Gateway enabled on Vlan10.Dropping the packet 
  • The output below shows the ARP debug output on Leaf-2 that relates to PC1.
Leaf-2# terminal monitor
Leaf-2# debug ip arp cache
Leaf-2# debug ip arp event
Leaf-2# debug ip arp suppression-event
Leaf-2# 
2020 Oct 13 20:55:25.960139 arp: arp_l2rib_msg_cb: arp_l2rib_msg_cb: (Type: Route) Len: 184 Seq: 0, del: 0 (Prod: 5) , peer-id = 0 
2020 Oct 13 20:55:25.960255 arp: arp_l2rib_msg_cb: MAC address: 0050.7966.6806 Remote Host IP: 172.16.1.1 
2020 Oct 13 20:55:25.960564 arp: arp_l2rib_msg_cb: Host IP 172.16.1.1, Remote vtep addr count = 1 
2020 Oct 13 20:55:25.960647 arp: arp_l2rib_msg_cb: RNHs : 3.3.3.3 
2020 Oct 13 20:55:25.960752 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0 
2020 Oct 13 20:55:25.960893 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Failed to get phy_iod for ifindex 0x0 : Reason no such pss key 
2020 Oct 13 20:55:25.960964 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0 
2020 Oct 13 20:55:25.961034 arp: arp_cache_create_cache_node: create node for uuid:1290, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x0 is_timer: 0 
2020 Oct 13 20:55:25.961282 arp: arp_cache_create_cache_node: Host IP 172.16.1.1, Remote vtep addr count = 1 
2020 Oct 13 20:55:25.961349 arp: arp_cache_create_cache_node: RNHs : 3.3.3.3 
2020 Oct 13 20:55:25.961622 arp: arp_cache_create_cache_node: New entry: create node 0x6c13ea74 0x6c13ee1c, uuid: 1290, sw-bd: 10, ip:172.16.1.1, mac: 0050.7966.6806, is_local: FALSE, num-macs: 1 
  • The following shows the ARP suppression cache on Leaf-1:
Leaf-1# show ip arp suppression-cache detail 
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:44 0050.7966.6806   10 Ethernet1/3         L
  • The following shows the ARP suppression cache on Leaf-2:
Leaf-2# show ip arp suppression-cache detail 
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:01 0050.7966.6806   10 (null)              R        3.3.3.3

4.2 ARP Suppression Verification

  • Ping PC1 (172.16.1.1) from PC3 (172.16.1.3):
PC3> ping 172.16.1.1
84 bytes from 172.16.1.1 icmp_seq=1 ttl=64 time=58.651 ms
84 bytes from 172.16.1.1 icmp_seq=2 ttl=64 time=52.082 ms
84 bytes from 172.16.1.1 icmp_seq=3 ttl=64 time=54.362 ms
84 bytes from 172.16.1.1 icmp_seq=4 ttl=64 time=67.275 ms
84 bytes from 172.16.1.1 icmp_seq=5 ttl=64 time=50.352 ms
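  • Leaf-2, the local VTEP for PC3, answers the ARP request itself because the MAC-IP binding for 172.16.1.1 is already stored in its ARP suppression cache. The binding is there because a host sends a GARP message when it first joins the network to verify that its assigned IP address is unique, and that binding is then advertised to the other VTEPs;
  • When neither the ARP table nor the ARP suppression cache holds an entry for the requested IP-MAC binding, the ARP request is flooded to the other VTEP leaf switches. Once these tables have been updated, later communication between the hosts no longer requires flooding ARP requests;
  • The following shows Leaf-2 sending the ARP reply: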
Leaf-2# 2020 Oct 13 21:02:00.100412 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode  
2020 Oct 13 21:02:00.100797 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr 
2020 Oct 13 21:02:00.101111 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x0, is_local:FALSE 
2020 Oct 13 21:02:00.101405 arp: arp_process_packet_in_l3_mode: ARP request: iod: 139, Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0, is_local: FALSE 
2020 Oct 13 21:02:00.101802 arp: arp_send_response_internal: ARP response from 172.16.1.1 to 172.16.1.3 on Vlan10, phy iod Ethernet1/4, vlan 10, svi_flag: 1 
2020 Oct 13 21:02:00.101867 arp: arp_send_response_internal: arp_send_response_internal: VINCI: is_flood: 0, iod: 139 phyiod: 10 
2020 Oct 13 21:02:00.101953 arp: arp_send_packet: Packet for 0050.7966.6808/172.16.1.3, iod 139(Vlan10), phy_iod 10(Ethernet1/4), phy_is_mct 0, flood_bd 0, flood port 1, skip_unnumbered_flood 0 
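
The reply-or-flood decision described above can be sketched in a few lines of Python. This is only a simplified model under stated assumptions, not how NX-OS implements ARP suppression: the names `SuppressionEntry` and `handle_arp_request` are hypothetical, the cache is modelled as a plain dictionary keyed by IP address, and 172.16.1.9 is a made-up address that is not in the cache.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SuppressionEntry:
    """One MAC-IP binding in a hypothetical model of the ARP suppression cache."""
    mac: str
    vlan: int
    is_local: bool
    remote_vtep: Optional[str] = None


def handle_arp_request(cache: Dict[str, SuppressionEntry], target_ip: str) -> str:
    """Return what the VTEP does with an ARP request for target_ip (simplified)."""
    entry = cache.get(target_ip)
    if entry is None:
        # Unknown binding: the request has to be flooded to the other VTEPs.
        return "flood"
    # Known binding: the VTEP answers on behalf of the target host.
    return f"reply {target_ip} is-at {entry.mac}"


# Values taken from the Leaf-2 suppression cache output in this article.
leaf2_cache = {
    "172.16.1.1": SuppressionEntry("0050.7966.6806", 10, False, "3.3.3.3"),
}

print(handle_arp_request(leaf2_cache, "172.16.1.1"))  # cache hit: local reply
print(handle_arp_request(leaf2_cache, "172.16.1.9"))  # cache miss: flood
```

The first call models the ping from PC3 shown in this section; the second models the first-contact case in which no binding has been learned yet.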

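4.3 Host Communication Within the Same VRF Across Different VNIs

  • Communication between hosts in the same VNI was shown in the previous article and is not repeated here;
  • This section uses PC1 (172.16.1.1) pinging PC2 (172.16.2.2) as the example.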

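4.3.1 Intra-VNI Switching on Leaf-1

  • Because the destination IP address is in a different subnet, PC1 sends the ICMP request to its default gateway Leaf-1, using the Anycast Gateway MAC (AGM) 1234.1234.1234 as the destination MAC address; see the figure below;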


    image.png

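4.3.2 Routing the Packet from L2VNI 10010 to L3VNI 13960 on Leaf-1

  • The local VTEP switch Leaf-1 receives the frame. The destination IP address 172.16.2.2 (host PC2) was learned through BGP and installed in the RIB with the next-hop IP address 4.4.4.4 (Leaf-2), together with additional information needed for data-plane encapsulation, such as the L3VNI and the encapsulation type;
  • Leaf-1 performs a recursive route lookup for the next-hop address, encapsulates the original packet with a VxLAN header carrying VNI 13960, and routes the packet to Leaf-2 via Spine-1 or Spine-2 (the outer destination MAC address belongs to whichever spine is the next hop);
  • Because VxLAN is a MAC-in-UDP encapsulation, the inner frame must carry a source and a destination MAC address. The inner source MAC address is taken from the SVI used for inter-VNI routing (SVI VLAN 3960), and the inner destination MAC address is the router MAC (RMAC) received as a BGP extended community in the BGP update.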
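
To make the "VxLAN header carrying VNI 13960" step concrete, here is a minimal Python sketch that packs and unpacks the 8-byte VxLAN header as laid out in RFC 7348 (an 8-bit flags field with the I bit set, 24 reserved bits, a 24-bit VNI, and 8 reserved bits). The function names `build_vxlan_header` and `parse_vni` are hypothetical helpers for illustration; the outer Ethernet, IP, and UDP headers and the inner frame are deliberately left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VxLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VxLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


# L3VNI used for inter-VNI routing in this article.
header = build_vxlan_header(13960)
print(header.hex())
print(parse_vni(header))
```

On the receiving side, Leaf-2 reads the same 24-bit field to decide which VRF the inner packet belongs to, which is what `parse_vni` models.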

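4.3.3 Routing the Packet from L3VNI 13960 to L2VNI 10020 on Leaf-2

  • When the VTEP switch Leaf-2 receives the VxLAN-encapsulated packet, it removes the VxLAN header. Because VNI 13960 is associated with VRF ta, the routing decision is based on the RIB of VRF ta;

  • Leaf-2 routes the original ICMP request into VLAN 20 and forwards it out of interface E1/3;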

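  • The process above describes the symmetric Integrated Routing and Bridging (IRB) model: the packet is first bridged by the local VTEP and then routed across the VxLAN fabric using the common L3VNI in the VxLAN header. The receiving VTEP switch removes the VxLAN encapsulation and makes a routing decision based on the destination IP address of the original IP packet. After that routing decision, the packet is bridged to its destination (bridge-route-route-bridge). Return traffic follows the same model;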

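  • Symmetric IRB offers design flexibility because, unlike asymmetric IRB, it does not require every VNI to be configured on every VTEP switch. Asymmetric IRB follows the "bridge-route-bridge" model, in which no common L3VNI is used for inter-VNI routing. For example, if asymmetric IRB were used in this VxLAN fabric, host PC1 would send the packet to its default gateway (the "bridge" part), just as in symmetric IRB. The local VTEP switch Leaf-1 would make the routing decision, but instead of the common L3VNI it would place VNI 10020, the VNI mapped to VLAN 20, in the VxLAN header (the "route" part). The receiving VTEP switch Leaf-2 would remove the VxLAN header and, based on VNI 10020, bridge the packet into VLAN 20 to reach host PC2.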

  • Test by pinging PC2 from PC1 while capturing packets between a spine and a leaf; the capture is shown below;


    image.png
  • The above explains how a host's IP address is propagated through the VxLAN fabric and how it is installed in the L3RIB.
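
The difference between symmetric and asymmetric IRB described in this section comes down to which VNI the ingress VTEP writes into the VxLAN header and which VNIs it must therefore have configured. The Python sketch below is an illustrative model only; `IrbMode`, `vni_on_wire`, and `vnis_needed_on_ingress` are hypothetical names, and the VNI values are the ones used in this article.

```python
from enum import Enum
from typing import Set


class IrbMode(Enum):
    SYMMETRIC = "bridge-route-route-bridge"
    ASYMMETRIC = "bridge-route-bridge"


def vni_on_wire(mode: IrbMode, dst_l2vni: int, l3vni: int) -> int:
    """VNI the ingress VTEP places in the VxLAN header for inter-VNI traffic."""
    if mode is IrbMode.SYMMETRIC:
        return l3vni      # common per-tenant L3VNI
    return dst_l2vni      # L2VNI of the destination subnet


def vnis_needed_on_ingress(mode: IrbMode, src_l2vni: int,
                           dst_l2vni: int, l3vni: int) -> Set[int]:
    """VNIs that must be configured on the ingress VTEP in this model."""
    if mode is IrbMode.SYMMETRIC:
        return {src_l2vni, l3vni}
    return {src_l2vni, dst_l2vni}


# PC1 (L2VNI 10010) to the host in VLAN 20 (L2VNI 10020), tenant L3VNI 13960.
for mode in IrbMode:
    print(mode.name, vni_on_wire(mode, 10020, 13960),
          sorted(vnis_needed_on_ingress(mode, 10010, 10020, 13960)))
```

In the asymmetric case the ingress VTEP needs the destination L2VNI even if it has no local hosts in that subnet, which is the scaling cost that symmetric IRB avoids.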

5. Summary

image.png

6. References

With thanks and respect to Toni Pasanen:
https://nwktimes.blogspot.com/2018/05/vxlan-part-vii-vxlan-bgp-evpn-control.html
