Two classic performance killers for infrastructure services are locks and cache invalidation, and dpvs optimizes against both everywhere. To reiterate: dpvs is a DPDK program, and its guiding principle is that each core interacts with other cores as little as possible. That requires shared data to either be replicated per core or be core-private. For example, the flow table (session) that stores connection state is private to each core. But this creates a problem: in full-nat mode the returning outbound packet must be dispatched to the same CPU, otherwise the conn cannot be found in that core's flow table. So how does dpvs optimize for this?
The data affinity problem

As the figure shows, the dpvs machine has two NICs: nic1 is the WAN-facing NIC and nic0 is the LAN-facing NIC. When a client sends a packet, the NIC normally uses RSS to decide which receive queue the data goes to, and each queue is usually bound to a particular lcore. RSS typically hashes the 4-tuple <dport, dip, sport, sip> to pick the queue.
However, when the packet returns from the RS to dpvs, hashing the 4-tuple again cannot possibly yield the queue that maps to the correct lcore. The lookup then misses in the local flow table, and falling back to searching other lcores' tables would mean locking shared data and trashing CPU caches. How is this solved?
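To make the mismatch concrete, here is a toy stand-in for RSS — real NICs use a Toeplitz hash, but the affinity argument is identical. The return leg carries a completely different 4-tuple, so its hash, and therefore its queue, is unrelated to the forward leg's:

#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for an RSS hash: any deterministic mix of the 4-tuple
 * will do for the argument; real NICs use a Toeplitz hash. */
static uint32_t toy_rss(uint32_t sip, uint16_t sport, uint32_t dip, uint16_t dport)
{
    return (sip * 2654435761u) ^ (dip * 2246822519u) ^ ((uint32_t)sport << 16 | dport);
}

int main(void)
{
    unsigned nqueue = 8;
    /* forward leg: client -> dpvs, dst is <vip, vport> */
    unsigned q1 = toy_rss(0x0A000001, 12345, 0xC0A80001, 80) % nqueue;
    /* return leg: rs -> dpvs, dst is <lip, lport> -- a totally different tuple */
    unsigned q2 = toy_rss(0xC0A80002, 8080, 0xC0A80003, 1029) % nqueue;
    printf("forward queue=%u, return queue=%u\n", q1, q2); /* equal only by luck */
    return 0;
}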
Steering NIC traffic
There are many schemes for deciding which queue, and which CPU, incoming NIC traffic lands on, and the options differ again depending on whether hardware queues are supported; it gets a bit messy. At the end of the day the goal is always the same: spread traffic evenly across CPUs.
In ancient times NICs had no hardware queues and every interrupt hit cpu0, hence the poor man's solution, RPS, which spreads interrupts across cores in software. Then hardware queues arrived, and RSS could steer packets to different queues directly, each bound to a CPU. Later it turned out RSS alone was not enough: if the interrupt fires on cpu0 but the data is processed on cpu1, you get cache misses, so RFS was added to keep the interrupt and the processing on the same core.
But none of these solve dpvs's full-nat return-path problem, and that is where the flow director mechanism comes in: it steers NIC traffic precisely, rather than by a simple 4-tuple hash.
The dpvs solution
dpvs adopts the fdir mechanism, and there are two possible schemes. In one, each lcore gets its own lip (local address), so fdir can steer by lip straight to the right core. In the other, all lcores share one lip and fdir steers by lport (local port). Because IPs are scarce, a single dpvs box cannot realistically own dozens of local addresses, so dpvs takes the second scheme.

As the figure above shows, the usable local ports are masked according to the CPU count, pinning each port to a particular lcore. For example, with 16 NIC queues you configure 16 CPUs and the mask is 0x0F: ANDing a port with the mask (i.e. port mod 16) yields the queue, and CPU, that port belongs to.
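A minimal sketch of that port-to-lcore mapping, assuming the 16-queue setup above (mask 0x0F); port_to_lcore is a hypothetical helper, not dpvs code:

#include <assert.h>
#include <stdint.h>

#define LCORE_MASK 0x0F  /* 16 slave lcores -> 4 mask bits */

/* Each lcore i owns every local port p with (p & LCORE_MASK) == i,
 * which is how sa_pool partitions <lip, lport> (sketch only). */
static inline uint8_t port_to_lcore(uint16_t lport)
{
    return lport & LCORE_MASK;
}

int main(void)
{
    assert(port_to_lcore(1024) == 0);   /* 1024 % 16 == 0  -> lcore 0 */
    assert(port_to_lcore(1029) == 5);   /* 1029 % 16 == 5  -> lcore 5 */
    assert(port_to_lcore(65535) == 15); /* highest port -> last lcore */
    return 0;
}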
Principles covered — straight into the source~~
fdir in the config file
First the config file; the net device dpdk0 block is all we need to look at:
<init> device dpdk0 {
    rx {
        queue_number      8
        descriptor_number 1024
        rss               all
    }
    tx {
        queue_number      8
        descriptor_number 1024
    }
    fdir {
        mode              perfect
        pballoc           64k
        status            matched
    }
    ! promisc_mode
    kni_name              dpdk0.kni
}
As you can see, the rx queues enable rss, and the dpdk0 device carries an fdir block; we'll skip the individual knobs for now.
Default fdir configuration
When the NIC is initialized, default_port_conf is used; it holds the port-level fdir configuration:
static struct rte_eth_conf default_port_conf = {
    .rxmode = {
        .mq_mode        = ETH_MQ_RX_RSS,
        .max_rx_pkt_len = ETHER_MAX_LEN,
        .split_hdr_size = 0,
        .header_split   = 0,
        .hw_ip_checksum = 1,
        .hw_vlan_filter = 0,
        .jumbo_frame    = 0,
        .hw_strip_crc   = 0,
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf  = /*ETH_RSS_IP*/ ETH_RSS_TCP,
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
    .fdir_conf = {
        .mode    = RTE_FDIR_MODE_PERFECT,
        .pballoc = RTE_FDIR_PBALLOC_64K,
        .status  = RTE_FDIR_REPORT_STATUS/*_ALWAYS*/,
        .mask    = {
            .vlan_tci_mask = 0x0,
            .ipv4_mask = {
                .src_ip = 0x00000000,
                .dst_ip = 0xFFFFFFFF,
            },
            .ipv6_mask = {
                .src_ip = { 0, 0, 0, 0 },
                .dst_ip = { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
            },
            .src_port_mask = 0x0000,
            /* to be changed according to slave lcore number in use */
            .dst_port_mask = 0x00F8,
            .mac_addr_byte_mask = 0x00,
            .tunnel_type_mask   = 0,
            .tunnel_id_mask     = 0,
        },
        .drop_queue = 127,
        .flex_conf = {
            .nb_payloads  = 0,
            .nb_flexmasks = 0,
        },
    },
};
Two things stand out. rx_adv_conf is the RSS configuration, defaulting to ETH_RSS_TCP. The crucial piece is fdir_conf, with its mode, pballoc and status fields; what matters for us is mask. fdir can steer on fields from several layers: ipv4_mask.src_ip is all zeroes while ipv4_mask.dst_ip has all bits set, so fdir considers only the destination IP and ignores the source; likewise src_port_mask is 0 while dst_port_mask is non-zero. In other words, dpvs's fdir matches only on <dst_ip, dst_port> — that is, <lip, lport> — and since there is just one lip, it effectively comes down to lport alone. So how do we set the right dst_port_mask?
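Conceptually (glossing over the byte-order details the real driver handles), a perfect-match fdir rule compares packet fields to rule fields under the configured masks. A rough sketch of the semantics, my own illustration rather than driver code:

#include <stdbool.h>
#include <stdint.h>

struct fdir_key {
    uint32_t dst_ip;    /* host byte order, for illustration only */
    uint16_t dst_port;
};

/* With the src masks all-zero, only <dst_ip, dst_port> take part in the
 * match, and dst_port_mask keeps just the bits that encode the owning
 * lcore (a sketch of the semantics, not the real driver logic). */
static bool fdir_match(struct fdir_key pkt, struct fdir_key rule,
                       uint32_t dst_ip_mask, uint16_t dst_port_mask)
{
    return (pkt.dst_ip & dst_ip_mask) == (rule.dst_ip & dst_ip_mask) &&
           (pkt.dst_port & dst_port_mask) == (rule.dst_port & dst_port_mask);
}

int main(void)
{
    struct fdir_key rule = { .dst_ip = 0x0A0A0A0A, .dst_port = 5 };    /* <lip, port_base> */
    struct fdir_key pkt  = { .dst_ip = 0x0A0A0A0A, .dst_port = 2053 }; /* 2053 & 0x07 == 5 */
    return fdir_match(pkt, rule, 0xFFFFFFFF, 0x0007) ? 0 : 1;          /* matches -> 0 */
}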
sa_pool and fdir initialization
Each lcore has its own sa_pool managing the locally assigned <lip, lport> pairs. Suppose 64 lcores are enabled: there are 65535-1024 usable ports in total, so each lcore can use at most (65535-1024)/64 ≈ 1008 ports on a given lip.
At startup, sa_pool_init is called to initialize the global fdir table:
int sa_pool_init(void)
{
    int shift, err;
    lcoreid_t cid;
    uint16_t port_base;

    /* slave_lcore_nb is the number of slave lcores; sa_lcore_mask has
     * the bit for each enabled lcore set to 1 */
    /* enabled lcore should not change after init */
    netif_get_slave_lcores(&sa_nlcore, &sa_lcore_mask);

    /* how many mask bits needed ? */
    for (shift = 0; (0x1<<shift) < sa_nlcore; shift++)
        ;
    if (shift >= 16)
        return EDPVS_INVAL; /* bad config */

    port_base = 0;
    for (cid = 0; cid < RTE_MAX_LCORE; cid++) {
        if (cid > 64 || !(sa_lcore_mask & (1L << cid)))
            continue;
        assert(rte_lcore_is_enabled(cid) && cid != rte_get_master_lcore());

        sa_fdirs[cid].mask      = ~((~0x0) << shift);
        sa_fdirs[cid].lcore     = cid;
        sa_fdirs[cid].port_base = htons(port_base);
        sa_fdirs[cid].soft_id   = 0;

        port_base++;
    }

    err = msg_type_mc_register(&sa_stats_msg);
    return err;
}
- netif_get_slave_lcores gets the number of currently enabled slave lcores and builds their bitmask.
- The for loop fills in the global fdir entry for every core; mask comes from the shift computed above. The key field is port_base: during fdir matching, a lport whose masked value equals port_base is steered to that core.
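A small self-check of that arithmetic, assuming 8 slave lcores, so the shift loop yields shift = 3, mask = 0x07, and port_base runs from 0 to 7:

#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

int main(void)
{
    /* 8 slave lcores -> the smallest shift with (1 << shift) >= 8 is 3 */
    int nlcore = 8, shift;
    for (shift = 0; (0x1 << shift) < nlcore; shift++)
        ;
    uint16_t mask = ~((~0x0) << shift);
    assert(shift == 3 && mask == 0x07);

    /* say lcore 5 got port_base = htons(5); then any local port p with
     * (p & mask) == 5 belongs to lcore 5 -- the same predicate that
     * sa_pool_alloc_hash uses when carving up the port range */
    uint16_t port_base = htons(5);
    uint16_t lport = 2053;                      /* 2053 & 0x07 == 5 */
    assert((lport & mask) == ntohs(port_base));
    return 0;
}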
When a lip is added with ipvsadm, ifa_add_set calls sa_pool_create to initialize the sa_pool:
int sa_pool_create(struct inet_ifaddr *ifa, uint16_t low, uint16_t high)
{
    struct sa_pool *ap;
    int err;
    lcoreid_t cid;

    low = low ? : DEF_MIN_PORT;
    high = high ? : DEF_MAX_PORT;

    if (!ifa || low > high || low == 0 || high >= MAX_PORT) {
        RTE_LOG(ERR, SAPOOL, "%s: bad arguments\n", __func__);
        return EDPVS_INVAL;
    }

    for (cid = 0; cid < RTE_MAX_LCORE; cid++) {
        uint32_t filtids[MAX_FDIR_PROTO];
        struct sa_fdir *fdir = &sa_fdirs[cid];

        /* skip master and unused cores */
        if (cid > 64 || !(sa_lcore_mask & (1L << cid)))
            continue;
        assert(rte_lcore_is_enabled(cid) && cid != rte_get_master_lcore());

        ap = rte_zmalloc(NULL, sizeof(struct sa_pool), 0);
        if (!ap) {
            err = EDPVS_NOMEM;
            goto errout;
        }

        ap->ifa = ifa;
        ap->low = low;
        ap->high = high;
        rte_atomic32_set(&ap->refcnt, 0);

        err = sa_pool_alloc_hash(ap, sa_pool_hash_size, fdir);
        if (err != EDPVS_OK) {
            rte_free(ap);
            goto errout;
        }

        /* if add filter failed, waste some soft-id is acceptable. */
        filtids[0] = fdir->soft_id++;
        filtids[1] = fdir->soft_id++;
        err = sa_add_filter(ifa->af, ifa->idev->dev, cid, &ifa->addr,
                            fdir->port_base, filtids);
        if (err != EDPVS_OK) {
            sa_pool_free_hash(ap);
            rte_free(ap);
            goto errout;
        }
        ap->filter_id[0] = filtids[0];
        ap->filter_id[1] = filtids[1];
        ifa->sa_pools[cid] = ap;
    }

    return EDPVS_OK;

errout:
    sa_pool_destroy(ifa);
    return err;
}
- The low and high ports default to 1025 and 65535 respectively.
- The for loop initializes a sa_pool for every lcore and sets up its fdir filter.
- rte_zmalloc allocates the sa_pool struct and fills in ifa, low and high. sa_pool_alloc_hash then allocates the hash table of socket addresses; any port for which fdir->mask && ((uint16_t)port & fdir->mask) != ntohs(fdir->port_base) holds is skipped, so each lcore keeps only its own share of ports.
- sa_add_filter calls __add_del_filter to install the fdir filters used for matching:
static int __add_del_filter(int af, struct netif_port *dev, lcoreid_t cid,
                            const union inet_addr *dip, __be16 dport,
                            uint32_t filter_id[MAX_FDIR_PROTO], bool add)
{
    struct rte_eth_fdir_filter filt[MAX_FDIR_PROTO] = {
        {
            .action.behavior      = RTE_ETH_FDIR_ACCEPT,
            .action.report_status = RTE_ETH_FDIR_REPORT_ID,
            .soft_id              = filter_id[0],
        },
        {
            .action.behavior      = RTE_ETH_FDIR_ACCEPT,
            .action.report_status = RTE_ETH_FDIR_REPORT_ID,
            .soft_id              = filter_id[1],
        },
    };

    if (af == AF_INET) {
        filt[0].input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_TCP;
        filt[0].input.flow.tcp4_flow.ip.dst_ip = dip->in.s_addr;
        filt[0].input.flow.tcp4_flow.dst_port = dport;
        filt[1].input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
        filt[1].input.flow.udp4_flow.ip.dst_ip = dip->in.s_addr;
        filt[1].input.flow.udp4_flow.dst_port = dport;
    } else if (af == AF_INET6) {
        filt[0].input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV6_TCP;
        memcpy(filt[0].input.flow.ipv6_flow.dst_ip, &dip->in6, sizeof(struct in6_addr));
        filt[0].input.flow.tcp6_flow.dst_port = dport;
        filt[1].input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV6_UDP;
        memcpy(filt[1].input.flow.ipv6_flow.dst_ip, &dip->in6, sizeof(struct in6_addr));
        filt[1].input.flow.udp6_flow.dst_port = dport;
    } else {
        return EDPVS_NOTSUPP;
    }

    queueid_t queue;
    int err;
    enum rte_filter_op op, rop;
#ifdef CONFIG_DPVS_SAPOOL_DEBUG
    char ipaddr[64];
#endif

    if (dev->netif_ops && dev->netif_ops->op_filter_supported) {
        if (dev->netif_ops->op_filter_supported(dev, RTE_ETH_FILTER_FDIR) < 0) {
            if (dev->nrxq <= 1)
                return EDPVS_OK;
            RTE_LOG(ERR, SAPOOL, "%s: FDIR is not supported by device %s. Only"
                    " single rxq can be configured.\n", __func__, dev->name);
            return EDPVS_NOTSUPP;
        }
    } else {
        RTE_LOG(ERR, SAPOOL, "%s: FDIR support of device %s is not known.\n",
                __func__, dev->name);
        return EDPVS_INVAL;
    }

    err = netif_get_queue(dev, cid, &queue);
    if (err != EDPVS_OK)
        return err;
    filt[0].action.rx_queue = filt[1].action.rx_queue = queue;

    op = add ? RTE_ETH_FILTER_ADD : RTE_ETH_FILTER_DELETE;
    netif_mask_fdir_filter(af, dev, &filt[0]);
    netif_mask_fdir_filter(af, dev, &filt[1]);

    err = netif_fdir_filter_set(dev, op, &filt[0]);
    if (err != EDPVS_OK)
        return err;

    err = netif_fdir_filter_set(dev, op, &filt[1]);
    if (err != EDPVS_OK) {
        rop = add ? RTE_ETH_FILTER_DELETE : RTE_ETH_FILTER_ADD;
        netif_fdir_filter_set(dev, rop, &filt[0]);
        return err;
    }

    return err;
}
- struct rte_eth_fdir_filter filt defines the fdir filter-rule structs.
- Filters are set up for ipv4 and ipv6 respectively (one TCP and one UDP rule each).
- Not every NIC supports fdir; op_filter_supported is called to check.
- netif_get_queue(dev, cid, &queue) looks up the queue bound to the current lcore.
- filt[0].action.rx_queue = filt[1].action.rx_queue = queue binds the filters to the corresponding hardware queue.
- netif_mask_fdir_filter fetches the current fdir configuration and applies it to the filter structs.
- netif_fdir_filter_set pushes the filter rules down to the NIC. Since netif_ops is a struct of function pointers, the operation differs depending on whether the NIC is bonded, but everything ultimately ends up in the DPDK API rte_eth_dev_filter_ctrl.
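For reference, a bare-bones sketch of what this bottoms out to against the legacy DPDK filter-control API (deprecated in newer DPDK releases in favor of rte_flow); push_fdir_rule is my own wrapper name, and it assumes a filt prepared as above:

#include <rte_ethdev.h>
#include <rte_eth_ctrl.h>

/* Sketch: install one prepared fdir rule on a port via the legacy
 * filter-control API, the same call dpvs's netif layer ends up making. */
static int push_fdir_rule(uint16_t port_id, struct rte_eth_fdir_filter *filt)
{
    int ret = rte_eth_dev_filter_supported(port_id, RTE_ETH_FILTER_FDIR);
    if (ret < 0)
        return ret; /* this NIC/driver cannot do fdir */

    return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
                                   RTE_ETH_FILTER_ADD, filt);
}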
How the flow table uses fdir
The logic that steers returning packets into queues according to the fdir filters is encapsulated below dpvs, in DPDK and the NIC it programs, so in the dpvs code all we can trace is how local addresses are allocated from the sa_pool. Every newly established connection goes through dp_vs_conn_new, which registers it in the flow table:
/* FNAT only: select and bind local address/port */
if (dest->fwdmode == DPVS_FWD_MODE_FNAT) {
    if ((err = dp_vs_laddr_bind(new, dest->svc)) != EDPVS_OK)
        goto unbind_dest;
}
As the code shows, only full-nat binds a local address/port:
int dp_vs_laddr_bind(struct dp_vs_conn *conn, struct dp_vs_service *svc)
{
    struct dp_vs_laddr *laddr = NULL;
    int i;
    uint16_t sport = 0;
    struct sockaddr_storage dsin, ssin;

    if (!conn || !conn->dest || !svc)
        return EDPVS_INVAL;
    if (svc->proto != IPPROTO_TCP && svc->proto != IPPROTO_UDP)
        return EDPVS_NOTSUPP;
    if (conn->flags & DPVS_CONN_F_TEMPLATE)
        return EDPVS_OK;

    /*
     * some time allocate lport fails for one laddr,
     * but there's also some resource on another laddr.
     * use write lock since
     * 1. __get_laddr will change svc->laddr_curr;
     * 2. we uses svc->num_laddrs;
     */
    rte_rwlock_write_lock(&svc->laddr_lock);
    for (i = 0; i < dp_vs_laddr_max_trails && i < svc->num_laddrs; i++) {
        /* select a local IP from service */
        laddr = __get_laddr(svc);
        if (!laddr) {
            RTE_LOG(ERR, IPVS, "%s: no laddr available.\n", __func__);
            rte_rwlock_write_unlock(&svc->laddr_lock);
            return EDPVS_RESOURCE;
        }

        memset(&dsin, 0, sizeof(struct sockaddr_storage));
        memset(&ssin, 0, sizeof(struct sockaddr_storage));

        if (laddr->af == AF_INET) {
            struct sockaddr_in *daddr, *saddr;
            daddr = (struct sockaddr_in *)&dsin;
            daddr->sin_family = laddr->af;
            daddr->sin_addr = conn->daddr.in;
            daddr->sin_port = conn->dport;
            saddr = (struct sockaddr_in *)&ssin;
            saddr->sin_family = laddr->af;
            saddr->sin_addr = laddr->addr.in;
        } else {
            struct sockaddr_in6 *daddr, *saddr;
            daddr = (struct sockaddr_in6 *)&dsin;
            daddr->sin6_family = laddr->af;
            daddr->sin6_addr = conn->daddr.in6;
            daddr->sin6_port = conn->dport;
            saddr = (struct sockaddr_in6 *)&ssin;
            saddr->sin6_family = laddr->af;
            saddr->sin6_addr = laddr->addr.in6;
        }

        if (sa_fetch(laddr->af, laddr->iface, &dsin, &ssin) != EDPVS_OK) {
            char buf[64];
            if (inet_ntop(laddr->af, &laddr->addr, buf, sizeof(buf)) == NULL)
                snprintf(buf, sizeof(buf), "::");
#ifdef CONFIG_DPVS_IPVS_DEBUG
            RTE_LOG(ERR, IPVS, "%s: [%d] no lport available on %s, "
                    "try next laddr.\n", __func__, rte_lcore_id(), buf);
#endif
            put_laddr(laddr);
            continue;
        }

        sport = (laddr->af == AF_INET ? (((struct sockaddr_in *)&ssin)->sin_port)
                : (((struct sockaddr_in6 *)&ssin)->sin6_port));
        break;
    }
    rte_rwlock_write_unlock(&svc->laddr_lock);

    if (!laddr || sport == 0) {
#ifdef CONFIG_DPVS_IPVS_DEBUG
        RTE_LOG(ERR, IPVS, "%s: [%d] no lport available !!\n",
                __func__, rte_lcore_id());
#endif
        if (laddr)
            put_laddr(laddr);
        return EDPVS_RESOURCE;
    }

    rte_atomic32_inc(&laddr->conn_counts);

    /* overwrite related fields in out-tuplehash and conn */
    conn->laddr = laddr->addr;
    conn->lport = sport;
    tuplehash_out(conn).daddr = laddr->addr;
    tuplehash_out(conn).dport = sport;
    conn->local = laddr;

    return EDPVS_OK;
}
- svc represents the service. Because every core may modify svc->laddr_curr, the access must be locked. Could this hurt under heavy concurrency, violating the DPDK share-nothing principle? Could it be avoided by keeping laddr_curr outside of svc?
- The for loop tries to obtain a laddr; on success it is finally written into conn->laddr and conn->lport.
- __get_laddr selects a local lip. Since the LAN NIC may carry many lips, the selection involves some load-balancing policy. With plain round-robin, could svc->laddr_lock be dropped? Apparently not: ipvsadm adding, deleting or changing lips would still race.
- sa_fetch fetches a port to completely fill in the dsin and ssin addresses; for now we only look at the ipv4 implementation:
static int sa4_fetch(struct netif_port *dev,
                     const struct sockaddr_in *daddr,
                     struct sockaddr_in *saddr)
{
    struct inet_ifaddr *ifa;
    struct flow4 fl;
    struct route_entry *rt;
    int err;
    assert(saddr);

    if (saddr && saddr->sin_addr.s_addr != INADDR_ANY && saddr->sin_port != 0)
        return EDPVS_OK; /* everything is known, why call this function ? */

    /* if source IP is assigned, we can find ifa->this_sa_pool
     * without @daddr and @dev. */
    if (saddr->sin_addr.s_addr) {
        ifa = inet_addr_ifa_get(AF_INET, dev, (union inet_addr*)&saddr->sin_addr);
        if (!ifa)
            return EDPVS_NOTEXIST;
        if (!ifa->this_sa_pool) {
            RTE_LOG(WARNING, SAPOOL, "%s: fetch addr on IP without pool.", __func__);
            inet_addr_ifa_put(ifa);
            return EDPVS_INVAL;
        }
        err = sa_pool_fetch(sa_pool_hash(ifa->this_sa_pool,
                            (struct sockaddr_storage *)daddr),
                            (struct sockaddr_storage *)saddr);
        if (err == EDPVS_OK)
            rte_atomic32_inc(&ifa->this_sa_pool->refcnt);
        inet_addr_ifa_put(ifa);
        return err;
    }

    /* try to find source ifa by @dev and @daddr */
    memset(&fl, 0, sizeof(struct flow4));
    fl.fl4_oif = dev;
    fl.fl4_daddr.s_addr = daddr ? daddr->sin_addr.s_addr : htonl(INADDR_ANY);
    fl.fl4_saddr.s_addr = saddr ? saddr->sin_addr.s_addr : htonl(INADDR_ANY);
    rt = route4_output(&fl);
    if (!rt)
        return EDPVS_NOROUTE;

    /* select source address. */
    if (!rt->src.s_addr) {
        inet_addr_select(AF_INET, rt->port, (union inet_addr *)&rt->dest,
                         RT_SCOPE_UNIVERSE, (union inet_addr *)&rt->src);
    }
    ifa = inet_addr_ifa_get(AF_INET, rt->port, (union inet_addr *)&rt->src);
    if (!ifa) {
        route4_put(rt);
        return EDPVS_NOTEXIST;
    }
    route4_put(rt);

    if (!ifa->this_sa_pool) {
        RTE_LOG(WARNING, SAPOOL, "%s: fetch addr on IP without pool.",
                __func__);
        inet_addr_ifa_put(ifa);
        return EDPVS_INVAL;
    }

    /* do fetch socket address */
    err = sa_pool_fetch(sa_pool_hash(ifa->this_sa_pool,
                        (struct sockaddr_storage *)daddr),
                        (struct sockaddr_storage *)saddr);
    if (err == EDPVS_OK)
        rte_atomic32_inc(&ifa->this_sa_pool->refcnt);

    inet_addr_ifa_put(ifa);
    return err;
}
- If saddr->sin_addr.s_addr is set, i.e. the lip is given, then the ifa network interface is already determined, and an address is allocated directly from that ifa's sa_pool.
- Without a source address, a lip is selected automatically via a route lookup, and then the same logic runs:
static inline int sa_pool_fetch(struct sa_entry_pool *pool,
                                struct sockaddr_storage *ss)
{
    assert(pool && ss);
    struct sa_entry *ent;
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

    ent = list_first_entry_or_null(&pool->free_enties, struct sa_entry, list);
    if (!ent) {
        pool->miss_cnt++;
        return EDPVS_RESOURCE;
    }

    if (ss->ss_family == AF_INET) {
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = ent->addr.in.s_addr;
        sin->sin_port = ent->port;
    } else if (ss->ss_family == AF_INET6) {
        sin6->sin6_family = AF_INET6;
        sin6->sin6_addr = ent->addr.in6;
        sin6->sin6_port = ent->port;
    } else {
        return EDPVS_NOTSUPP;
    }

    ent->flags |= SA_F_USED;
    list_move_tail(&ent->list, &pool->used_enties);
    rte_atomic16_inc(&pool->used_cnt);
    rte_atomic16_dec(&pool->free_cnt);
    return EDPVS_OK;
}
- list_first_entry_or_null takes the first element of the lcore-local sa_pool free list; that is the available address resource.
- The address and port are copied into sin.
- The entry is flagged as in use.
- The entry is moved onto the used list.
- The usage counters are updated.
Wait, isn't there a bug here? The entry never seems to be removed from the free_enties list, so wouldn't the next request hand it out again? Off to file a PR upstream...
When a connection is released, the resource is recycled back into the sa_pool; that code lives in sa_pool_release, and there too the entry never seems to leave used_enties... very strange.
Update 2018-11-19: my skills failed me — list_move_tail does remove the entry from its old list before appending it to the new one...
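For the record, here is the kernel-style list machinery in condensed form (paraphrased from the usual list.h pattern, not copied from dpvs), which makes the behavior obvious:

/* Condensed kernel-style doubly-linked list: list_move_tail is literally
 * "unlink from wherever you are, then append to the new tail", which is
 * why no separate removal from free_enties is needed. */
struct list_head { struct list_head *prev, *next; };

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;          /* unlink the entry from its current list */
    prev->next = next;
}

static inline void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;   /* splice the entry in just before head */
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static inline void list_move_tail(struct list_head *entry, struct list_head *head)
{
    __list_del(entry->prev, entry->next);   /* drop out of the old list... */
    list_add_tail(entry, head);             /* ...and join the new one */
}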
Summary
There is a lot of detail in the dpvs implementation, and the code keeps evolving — ipv6 support landed just last week... The community really is great; here's hoping iqiyi keeps this project open source~~