1、Overview
This article presents a high-level overview of Open vSwitch* with the Data Plane Development Kit (OvS-DPDK), the high-performance, open source virtual switch, and links to further technical articles that dive deeper into individual OvS-DPDK features. It is written for OvS users who want to know more about DPDK integration.
Note: Users can download a zip file of the OvS master branch or the 2.6 branch, as well as installation steps for the master branch or the 2.6 branch.
2、OvS-DPDK High-level Architecture
Open vSwitch is a production-quality, multilayer virtual switch licensed under the open source Apache* 2.0 license. It supports SDN control semantics via the OpenFlow* protocol and its OVSDB management interface. It is available from openvswitch.org and GitHub, and is also packaged by Linux distributions.
Native Open vSwitch generally forwards packets via the kernel space data path (see Figure 1). In the kernel data path, the switching “fastpath” consists of a simple flow table that holds the forwarding/action rules for received packets. Exception packets (such as the first packet in a flow) do not match any existing entry in the kernel fastpath table and are sent to the user space daemon for processing (the “slowpath”). After user space handles the first packet in the flow, the daemon updates the flow table in kernel space so that subsequent packets in the flow are processed in the fastpath and not sent to user space. With this approach, native OvS eliminates the costly context switch between kernel and user space for a large percentage of received packets. However, the achievable packet throughput is limited by the forwarding bandwidth of the Linux network stack, which is not suited to use cases that require a high rate of packet processing, such as telco workloads.
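The fastpath/slowpath split described above can be illustrated with a small Python model. This is only a sketch of the idea, not OvS code: the class `KernelDatapathModel`, the `example_policy` function, and the flow key layout are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, proto)
FlowKey = Tuple[str, str, int, int, str]


class KernelDatapathModel:
    """Toy model of the native OvS kernel fastpath and user space slowpath."""

    def __init__(self, slowpath: Callable[[FlowKey], str]) -> None:
        self.flow_table: Dict[FlowKey, str] = {}  # kernel fastpath flow table
        self.slowpath = slowpath                  # stands in for the user space daemon
        self.upcalls = 0                          # packets handed to user space

    def receive(self, key: FlowKey) -> str:
        action = self.flow_table.get(key)
        if action is None:
            # Exception packet: no fastpath entry, so hand it to user space.
            self.upcalls += 1
            action = self.slowpath(key)
            # The daemon installs the flow so later packets stay in the fastpath.
            self.flow_table[key] = action
        return action


def example_policy(key: FlowKey) -> str:
    """Hypothetical decision: TCP goes to port 2, everything else to port 1."""
    return "output:2" if key[4] == "tcp" else "output:1"


dp = KernelDatapathModel(example_policy)
flow = ("10.0.0.1", "10.0.0.2", 12345, 80, "tcp")
actions = [dp.receive(flow) for _ in range(1000)]
print(dp.upcalls)  # 1: only the first packet of the flow reached the slowpath
```

In this model, 999 of the 1000 packets are served from the flow table alone, which is the effect the kernel fastpath is designed to achieve.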

Figure 2 below shows the high-level architecture of OvS-DPDK. OvS switching ports are represented by network devices (or netdevs). Netdev-dpdk is a DPDK-accelerated network device that uses DPDK to accelerate switch I/O, through three separate interfaces: one physical interface (handled by the librte_eth library within DPDK), and two virtual interfaces (librte_vhost and librte_ring). These interface with the physical and virtual devices connected to the virtual switch.
Other OvS architectural layers provide further functionality and interface with, for example, the SDN controller. Dpif-netdev provides user space forwarding and ofproto is the OvS library that implements an OpenFlow switch. It talks to OpenFlow controllers over the network and to switch hardware or software through an ofproto provider. The ovsdb server maintains the up-to-date switching table information for this OvS instance and communicates this to the SDN controller. The following section provides details of the switching/forwarding tables, with further information on the OvS architecture available through the openvswitch.org website.
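As a rough illustration of how the three netdev-dpdk interface kinds appear in configuration, the Python sketch below assembles (but does not run) `ovs-vsctl` command lines using the interface types that OvS 2.6 uses for DPDK-backed ports. The helper functions, the bridge name `br0`, and the port names are assumptions made for this example.

```python
from typing import Dict, List

# Interface "type" values that OvS 2.6 uses for DPDK-backed ports.
PORT_TYPES: Dict[str, str] = {
    "physical": "dpdk",       # physical NIC driven by DPDK
    "vhost": "dpdkvhostuser", # virtual interface backed by librte_vhost
    "ring": "dpdkr",          # virtual interface backed by librte_ring
}


def add_bridge_cmd(bridge: str) -> List[str]:
    """Command that creates a bridge on the user space (netdev) datapath."""
    return ["ovs-vsctl", "add-br", bridge,
            "--", "set", "bridge", bridge, "datapath_type=netdev"]


def add_port_cmd(bridge: str, port: str, kind: str) -> List[str]:
    """Command that attaches one DPDK-backed port of the given kind."""
    return ["ovs-vsctl", "add-port", bridge, port,
            "--", "set", "Interface", port, f"type={PORT_TYPES[kind]}"]


# Hypothetical bridge and port names, chosen only for illustration.
commands = [
    add_bridge_cmd("br0"),
    add_port_cmd("br0", "dpdk0", "physical"),
    add_port_cmd("br0", "vhost-user-1", "vhost"),
    add_port_cmd("br0", "dpdkr0", "ring"),
]
for cmd in commands:
    print(" ".join(cmd))
```

The script only prints the commands. In OvS 2.6, physical and ring ports are expected to follow the `dpdkN` and `dpdkrN` naming patterns, and later releases changed how physical ports are specified, so the installation steps for the release in use remain the authority.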

3、OvS-DPDK Switching Table Hierarchy
A packet entering OvS-DPDK from a physical or virtual interface receives a unique identifier or hash, based on its header fields, which is then matched against an entry in one of three main switching tables: the exact match cache (EMC), the data path classifier (dpcls), or the ofproto classifier. The packet’s identifier is checked against these three tables in order until a match is found. The actions indicated by the matching rule are then executed, and the packet is forwarded out of the switch once all actions complete. This scheme is illustrated in Figure 3.

The three tables have different characteristics and associated throughput and latency. The EMC offers the fastest processing for a limited number of table entries. For highest-speed processing, the packet’s identifier must exactly match an entry in this table for all fields (the 5-tuple of source IP and port, destination IP and port, and protocol); otherwise it will “miss” on the EMC and pass through to the dpcls. The dpcls contains many more table entries (arranged in multiple subtables) and enables wildcard matching of the packet identifier (for example, destination IP and port are specified but any source is allowed). The dpcls delivers approximately half the throughput of the EMC but caters to a much larger number of table entries. Packet flows matched in the dpcls are installed in the EMC so that subsequent packets with the same identifier can be processed at the highest speed.
A miss on the dpcls results in the packet identifier being sent to the ofproto classifier so that the OpenFlow controller can decide on the action. This path is the least performant, more than 10x slower than the EMC. Matches in the ofproto classifier result in new table entries being established in the faster switching tables so that subsequent packets in the same flow can be processed more quickly.
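To make the subtable idea concrete, here is a minimal Python sketch of wildcard matching in which each subtable groups the rules that share one mask (the set of fields that must match). It is a simplified model under stated assumptions, not the dpcls implementation: the class names, field names, and rule format are hypothetical, and rule priorities are ignored.

```python
from typing import Dict, List, Optional, Tuple

Packet = Dict[str, object]
FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port", "proto")


class Subtable:
    """All rules that share one mask, i.e. that wildcard the same fields."""

    def __init__(self, mask: Tuple[str, ...]) -> None:
        self.mask = mask                           # fields that must match
        self.rules: Dict[Tuple[object, ...], str] = {}  # masked key -> action

    def masked_key(self, pkt: Packet) -> Tuple[object, ...]:
        # Keep only the masked fields; wildcarded fields are dropped from the key.
        return tuple(pkt[f] for f in self.mask)

    def lookup(self, pkt: Packet) -> Optional[str]:
        return self.rules.get(self.masked_key(pkt))


class WildcardClassifier:
    """Toy classifier that searches one subtable per distinct mask."""

    def __init__(self) -> None:
        self.subtables: List[Subtable] = []

    def insert(self, match: Packet, action: str) -> None:
        mask = tuple(f for f in FIELDS if f in match)
        for st in self.subtables:
            if st.mask == mask:
                break
        else:
            st = Subtable(mask)
            self.subtables.append(st)
        st.rules[st.masked_key(match)] = action

    def lookup(self, pkt: Packet) -> Optional[str]:
        # One hash lookup per subtable; the first hit wins.
        for st in self.subtables:
            action = st.lookup(pkt)
            if action is not None:
                return action
        return None


cls = WildcardClassifier()
# Destination IP and port are specified; any source is allowed.
cls.insert({"dst_ip": "10.0.0.2", "dst_port": 80}, "output:2")

pkt_a = {"src_ip": "10.0.0.1", "src_port": 1111,
         "dst_ip": "10.0.0.2", "dst_port": 80, "proto": "tcp"}
pkt_b = dict(pkt_a, src_ip="10.0.0.9", src_port=2222)
pkt_c = dict(pkt_a, dst_port=443)

print(cls.lookup(pkt_a), cls.lookup(pkt_b), cls.lookup(pkt_c))
```

Because a lookup costs one hash probe per subtable, the number of distinct masks affects lookup cost in this model, which is consistent with the dpcls being slower than a single exact-match probe in the EMC.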
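The lookup order and the way matches populate the faster tables can be sketched as follows. This Python model is an illustration only: the `SwitchingTables` class and its fields are hypothetical, the dpcls is reduced to a single wildcard pattern (destination IP and port), and the ofproto classifier is stood in for by a caller-supplied function.

```python
from typing import Callable, Dict, Tuple

# Hypothetical identifier: (src_ip, src_port, dst_ip, dst_port, proto)
FiveTuple = Tuple[str, int, str, int, str]
DstKey = Tuple[str, int]


class SwitchingTables:
    """Toy model of the EMC -> dpcls -> ofproto lookup order."""

    def __init__(self, ofproto_lookup: Callable[[FiveTuple], str]) -> None:
        self.emc: Dict[FiveTuple, str] = {}   # exact match on the full 5-tuple
        self.dpcls: Dict[DstKey, str] = {}    # assumption: wildcard on source
        self.ofproto_lookup = ofproto_lookup  # slowest path
        self.hits = {"emc": 0, "dpcls": 0, "ofproto": 0}

    def process(self, pkt: FiveTuple) -> str:
        action = self.emc.get(pkt)
        if action is not None:
            self.hits["emc"] += 1
            return action

        dst = (pkt[2], pkt[3])
        action = self.dpcls.get(dst)
        if action is not None:
            self.hits["dpcls"] += 1
        else:
            # dpcls miss: fall through to the ofproto classifier,
            # then install the result in the dpcls.
            self.hits["ofproto"] += 1
            action = self.ofproto_lookup(pkt)
            self.dpcls[dst] = action

        # Install in the EMC so the next identical packet hits the fastest table.
        self.emc[pkt] = action
        return action


tables = SwitchingTables(lambda pkt: "output:2")
a = ("10.0.0.1", 1111, "10.0.0.2", 80, "tcp")
b = ("10.0.0.9", 2222, "10.0.0.2", 80, "tcp")
for pkt in (a, a, b, b):
    tables.process(pkt)
print(tables.hits)  # {'emc': 2, 'dpcls': 1, 'ofproto': 1}
```

Only the very first packet reaches the ofproto path. The first packet of the second flow differs in source only, so it is resolved by the wildcard entry in the dpcls, and every repeat packet is served by the EMC.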
4、OvS-DPDK Features and Performance
At the time of this writing, the following high-level OvS-DPDK features are available on the OvS master code branch:
- DPDK support for v16.07 (supported version increments with each new DPDK release)
- vHost user support
- vHost reconnect
- vHost multiqueue
- Native tunneling support: VxLAN, GRE, Geneve
- VLAN support
- MPLS support
- Ingress/egress QoS policing
- Jumbo frame support
- Connection tracking
- Statistics: DPDK vHost and extended DPDK stats
- Debug: DPDK pdump support
- Link bonding
- Link status
- VFIO support
- ODL/OpenStack detection of DPDK ports
- vHost user NUMA awareness
A recent performance comparison between native OvS and OvS-DPDK is highlighted in Figure 4. This shows the throughput in packets-per-second for the Phy-OvS-Phy use case, indicating a ~10x performance enhancement for OvS-DPDK over native OvS, increasing to ~12x with Intel® Hyper-Threading Technology (Intel® HT Technology) enabled (labelled 1C2T, or one physical core with two logical threads, in the figure legend). Similarly, the Phy-OvS-VM-OvS-Phy use case demonstrates a ~9x performance enhancement for OvS-DPDK over native OvS.
The hardware and software configuration for this data, along with further use case results, can be found in the Intel® Open Network Platform (Intel® ONP) performance report.

5、OvS-DPDK Availability
OvS-DPDK is available in the upstream openvswitch.org repository and is also available through Linux distributions as below. The latest milestone release is OvS 2.6 (September 2016), and releases follow a six-month cadence.
Code is available for download as follows: OvS master branch; OvS 2.6 release branch. Installation steps are available for both the master branch and the 2.6 release branch.
Packaged versions of OvS with DPDK are available from:
6、Additional Information
To learn more about OvS-DPDK, check out the following videos and articles on Intel® Developer Zone, 01.org, Intel® Network Builders and Intel® Network Builders University.
User guides:
Developer guides:
- Open vSwitch with DPDK - how to build and install Open vSwitch using a DPDK datapath
- Using Open vSwitch with DPDK - includes advanced performance tuning information
Articles and Videos:
- Rate limiting configuration and usage for OvS with DPDK
- QoS configuration and usage for OvS with DPDK
- Configure vHost User multiqueue for OvS with DPDK
- vHost User NUMA awareness in OvS with DPDK
- DPDK Pdump in OvS with DPDK
- Enabling OvS with DPDK in OpenStack
- Jumbo Frames in Open vSwitch* with DPDK
- vHost User Client Mode in Open vSwitch* with DPDK
- OVS-DPDK Datapath Classifier - Part 1
- OvS-DPDK Datapath Classifier – Part 2
- Link Aggregation Configuration and Usage in Open vSwitch* with DPDK
- Analyzing OvS with DPDK bottlenecks using Intel® VTune Amplifier
- Using OvS and DPDK with Neutron in DevStack
- Build and Test a Simple NFV Inter-VM Use Case with OVS-DPDK (YouTube video series)
OvS with DPDK milestone release webinars:
INB university:
- Open vSwitch with DPDK Architectural Deep Dive
- DPDK Open vSwitch: Accelerating the Path to the Guest
White paper:
Have a question? Feel free to follow up on the Open vSwitch discussion mailing list.