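Overview
Recently at work we ran into a problem: one of our backend endpoints calls an external API. The external API is exposed through a domain name and load-balanced via smart DNS, but our monitoring colleagues told us that every request was being forwarded to a single machine, so the load balancing was not working. That prompted me to take a detailed look at DNS resolution in Java.
Note: the DNS resolution implementations in JDK 7 and JDK 8 differ, and this problem may not occur on JDK 7.
In Java, DNS resolution usually goes through one of the following methods: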
public static InetAddress getByName(String host)
public static InetAddress[] getAllByName(String host)
getByName calls getAllByName and returns the first address in the resulting list.
The rest of this post focuses on the implementation of getAllByName.
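The relationship between the two methods can be sketched as follows. This is only an illustration: ResolveDemo, resolveAll and resolveFirst are hypothetical names, and the default argument is a literal IP address so that the example runs without touching DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ResolveDemo {
    // Returns every address reported for the host, in the order the JDK returns them.
    static InetAddress[] resolveAll(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }

    // Mirrors the behaviour described above: take the first entry of the full list.
    static InetAddress resolveFirst(String host) throws UnknownHostException {
        return resolveAll(host)[0];
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal IP address is parsed locally; pass a real host name to see DNS results.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        for (InetAddress address : resolveAll(host)) {
            System.out.println(address.getHostAddress());
        }
        System.out.println("first: " + resolveFirst(host).getHostAddress());
    }
}
```

Running it against a host name with several A records shows the full list, which is what a caller needs in order to choose an address itself instead of always taking the first one.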
getAllByName
getAllByName calls the getAllByName0 method:
InetAddress[] addresses = getCachedAddresses(host);
/* If no entry in cache, then do the host lookup */
if (addresses == null) {
addresses = getAddressesFromNameService(host, reqAddr);
}
if (addresses == unknown_array)
throw new UnknownHostException(host);
return addresses.clone();
As you can see, it first checks the cache; on a cache miss it calls getAddressesFromNameService to resolve the name:
private static InetAddress[] getCachedAddresses(String hostname) {
hostname = hostname.toLowerCase();
// search both positive & negative caches
synchronized (addressCache) {
cacheInitIfNeeded(); // on the first call, run the initialization
CacheEntry entry = addressCache.get(hostname);
if (entry == null) {
entry = negativeCache.get(hostname);
}
if (entry != null) {
return entry.addresses;
}
}
// not found
return null;
}
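Entries in a cache like this have to expire at some point, otherwise a DNS change would never be picked up. The following is a minimal sketch of a lookup cache with expiry, not the JDK's own implementation: TtlCache and its methods are hypothetical, the clock is passed in explicitly to keep the behaviour easy to follow, and the convention "negative TTL means cache forever" is an assumption of this sketch.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TtlCache<V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis; // Long.MAX_VALUE means the entry never expires

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis; // negative: cache forever, zero: do not cache

    TtlCache(long ttlSeconds) {
        this.ttlMillis = ttlSeconds < 0 ? -1 : ttlSeconds * 1000;
    }

    // Stores a value under the lower-cased host name unless caching is disabled.
    void put(String host, V value, long nowMillis) {
        if (ttlMillis == 0) {
            return;
        }
        long expiresAt = ttlMillis < 0 ? Long.MAX_VALUE : nowMillis + ttlMillis;
        entries.put(host.toLowerCase(Locale.ROOT), new Entry<>(value, expiresAt));
    }

    // Returns the cached value, or null if there is no entry or it has expired.
    V get(String host, long nowMillis) {
        String key = host.toLowerCase(Locale.ROOT);
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key, entry); // drop the stale entry
            return null;
        }
        return entry.value;
    }
}
```

A caller would fall back to a real lookup whenever get returns null and then store the result with put, which is the same cache-then-lookup flow as in getAllByName0.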
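Since the JDK caches resolved IP addresses, how does that caching work? The cache policy is defined in the InetAddressCachePolicy class; its initialization code is excerpted below: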
static {
Integer tmp = java.security.AccessController.doPrivileged(
new PrivilegedAction<Integer>() {
public Integer run() {
try {
// read the networkaddress.cache.ttl property from the java.security file in the JDK directory
String tmpString = Security.getProperty(cachePolicyProp);
if (tmpString != null) {
return Integer.valueOf(tmpString);
}
} catch (NumberFormatException ignored) {
// Ignore
}
try {
// read the sun.net.inetaddr.ttl system property set with -D
String tmpString = System.getProperty(cachePolicyPropFallback);
if (tmpString != null) {
return Integer.decode(tmpString);
}
} catch (NumberFormatException ignored) {
// Ignore
}
return null;
}
});
if (tmp != null) {
cachePolicy = tmp.intValue();
if (cachePolicy < 0) {
cachePolicy = FOREVER; // a negative value means cache entries never expire
}
propertySet = true;
} else {
// a security manager can be enabled with -Djava.security.manager -Djava.security.policy=security.policy
if (System.getSecurityManager() == null) {
cachePolicy = DEFAULT_POSITIVE; // no SecurityManager is installed by default, so the default cache TTL is 30s
}
}
tmp = java.security.AccessController.doPrivileged (
new PrivilegedAction<Integer>() {
public Integer run() {
try {
// read the networkaddress.cache.negative.ttl property; the default is 10s
String tmpString = Security.getProperty(negativeCachePolicyProp);
if (tmpString != null) {
return Integer.valueOf(tmpString);
}
} catch (NumberFormatException ignored) {
// Ignore
}
try {
// read the sun.net.inetaddr.negative.ttl system property set with -D
String tmpString = System.getProperty(negativeCachePolicyPropFallback);
if (tmpString != null) {
return Integer.decode(tmpString);
}
} catch (NumberFormatException ignored) {
// Ignore
}
return null;
}
});
if (tmp != null) {
negativeCachePolicy = tmp.intValue();
if (negativeCachePolicy < 0) {
negativeCachePolicy = FOREVER;
}
propertyNegativeSet = true;
}
}
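The two security properties read by the initializer above can also be set from code rather than by editing the java.security file. A minimal sketch follows; DnsCacheConfig and apply are hypothetical names and the TTL values are arbitrary examples. Because the policy is read in a static initializer, the assumption here is that this runs before the application performs its first lookup.

```java
import java.security.Security;

class DnsCacheConfig {
    // Sets the positive and negative DNS cache TTLs, in seconds, as security properties.
    static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Example values only: cache successful lookups for 5s, do not cache failures.
        apply(5, 0);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
        System.out.println(Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

The -D fallbacks shown in the excerpt (sun.net.inetaddr.ttl and sun.net.inetaddr.negative.ttl) are only consulted when the corresponding security property is not set.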
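The above covers the JVM's caching policy for resolved IP addresses and the related configuration. Next, let's see how the JVM resolves a name when the cache has no entry for it.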
getAddressesFromNameService
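As the code above shows, InetAddress calls getAddressesFromNameService, which loops over the configured name services and calls each one's lookupAllHostAddr method until one of them returns a result.
The NameService initialization code is as follows: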
impl = InetAddressImplFactory.create();
// get name service if provided and requested
String provider = null;
String propPrefix = "sun.net.spi.nameservice.provider.";
int n = 1;
nameServices = new ArrayList<NameService>();
// a custom DNS provider can be specified through sun.net.spi.nameservice.provider.n
provider = AccessController.doPrivileged(
new GetPropertyAction(propPrefix + n));
while (provider != null) {
NameService ns = createNSProvider(provider);
if (ns != null)
nameServices.add(ns);
n++;
provider = AccessController.doPrivileged(
new GetPropertyAction(propPrefix + n));
}
// if no provider is specified, create the default NameService
if (nameServices.size() == 0) {
NameService ns = createNSProvider("default");
nameServices.add(ns);
}
Java's built-in DNSNameService deserves a special mention here. It can be enabled with the following parameters:
-Dsun.net.spi.nameservice.provider.1=dns,sun
-Dsun.net.spi.nameservice.nameservers=192.168.1.188
This class resolves names using the name servers given in sun.net.spi.nameservice.nameservers, or those configured in /etc/resolv.conf.
The code that creates the default NameService is as follows:
if (provider.equals("default")) {
// initialize the default name service
nameService = new NameService() {
public InetAddress[] lookupAllHostAddr(String host)
throws UnknownHostException {
return impl.lookupAllHostAddr(host);
}
public String getHostByAddr(byte[] addr)
throws UnknownHostException {
return impl.getHostByAddr(addr);
}
};
}
A NameService for an explicitly specified provider is created as follows:
nameService = java.security.AccessController.doPrivileged(
new java.security.PrivilegedExceptionAction<NameService>() {
public NameService run() {
Iterator itr = Service.providers(NameServiceDescriptor.class);
while (itr.hasNext()) {
NameServiceDescriptor nsd
= (NameServiceDescriptor)itr.next();
if (providerName.
equalsIgnoreCase(nsd.getType()+","
+nsd.getProviderName())) {
try {
return nsd.createNameService();
} catch (Exception e) {
e.printStackTrace();
System.err.println(
"Cannot create name service:"
+providerName+": " + e);
}
}
}
return null;
}
}
);
For DNSNameServiceDescriptor, the type and provider name are dns and sun respectively.
Returning to the default provider: it resolves names through impl.lookupAllHostAddr(host), and impl is initialized as follows:
impl = InetAddressImplFactory.create();
static InetAddressImpl create() {
return InetAddress.loadImpl(isIPv6Supported() ?
"Inet6AddressImpl" : "Inet4AddressImpl");
}
Taking Inet4AddressImpl as an example, DNS resolution looks like this:
public native InetAddress[]
lookupAllHostAddr(String hostname) throws UnknownHostException;
public native String getHostByAddr(byte[] addr) throws UnknownHostException;
The methods of the Inet4AddressImpl class are native, that is, they are implemented in native code:
JNIEXPORT jobjectArray JNICALL
Java_java_net_Inet4AddressImpl_lookupAllHostAddr(JNIEnv *env, jobject this,
jstring host) {
const char *hostname;
jobjectArray ret = 0;
int retLen = 0;
int error = 0;
struct addrinfo hints, *res, *resNew = NULL;
if (!initializeInetClasses(env))
return NULL;
if (IS_NULL(host)) {
JNU_ThrowNullPointerException(env, "host is null");
return 0;
}
hostname = JNU_GetStringPlatformChars(env, host, JNI_FALSE);
CHECK_NULL_RETURN(hostname, NULL);
/* Try once, with our static buffer. */
memset(&hints, 0, sizeof(hints));
hints.ai_flags = AI_CANONNAME;
hints.ai_family = AF_INET;
error = getaddrinfo(hostname, NULL, &hints, &res);
if (error) {
/* report error */
ThrowUnknownHostExceptionWithGaiError(env, hostname, error);
JNU_ReleaseStringPlatformChars(env, host, hostname);
return NULL;
} else {
int i = 0;
struct addrinfo *itr, *last = NULL, *iterator = res;
while (iterator != NULL) {
// remove the duplicate one
int skip = 0;
itr = resNew;
while (itr != NULL) {
struct sockaddr_in *addr1, *addr2;
addr1 = (struct sockaddr_in *)iterator->ai_addr;
addr2 = (struct sockaddr_in *)itr->ai_addr;
if (addr1->sin_addr.s_addr ==
addr2->sin_addr.s_addr) {
skip = 1;
break;
}
itr = itr->ai_next;
}
if (!skip) {
struct addrinfo *next
= (struct addrinfo*) malloc(sizeof(struct addrinfo));
if (!next) {
JNU_ThrowOutOfMemoryError(env, "Native heap allocation failed");
ret = NULL;
goto cleanupAndReturn;
}
memcpy(next, iterator, sizeof(struct addrinfo));
next->ai_next = NULL;
if (resNew == NULL) {
resNew = next;
} else {
last->ai_next = next;
}
last = next;
i++;
}
iterator = iterator->ai_next;
}
retLen = i;
iterator = resNew;
ret = (*env)->NewObjectArray(env, retLen, ni_iacls, NULL);
if (IS_NULL(ret)) {
/* we may have memory to free at the end of this */
goto cleanupAndReturn;
}
i = 0;
while (iterator != NULL) {
jobject iaObj = (*env)->NewObject(env, ni_ia4cls, ni_ia4ctrID);
if (IS_NULL(iaObj)) {
ret = NULL;
goto cleanupAndReturn;
}
setInetAddress_addr(env, iaObj, ntohl(((struct sockaddr_in*)iterator->ai_addr)->sin_addr.s_addr));
setInetAddress_hostName(env, iaObj, host);
(*env)->SetObjectArrayElement(env, ret, i++, iaObj);
iterator = iterator->ai_next;
}
}
}
That is a lot of code, but the core of it is the call to getaddrinfo. The getaddrinfo man page contains this passage:
the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5).
If the address list returned by getaddrinfo has been sorted by the algorithm defined in RFC 3484, then the order of the returned list is determined by that algorithm rather than by the DNS server, which defeats the purpose of DNS-based load balancing.
This sorting behavior has been discussed widely online:
- https://lists.debian.org/debian-glibc/2007/09/msg00347.html
- https://lists.debian.org/debian-ctte/2007/09/msg00067.html
- https://daniel.haxx.se/blog/2012/01/03/getaddrinfo-with-round-robin-dns-and-happy-eyeballs/
Part of the getaddrinfo code is shown below:
int getaddrinfo (const char *__restrict name, const char *__restrict service,
const struct addrinfo *__restrict hints,
struct addrinfo **__restrict pai)
{
int i = 0, j = 0, last_i = 0;
int nresults = 0;
struct addrinfo *p = NULL, **end;
struct gaih *g = gaih, *pg = NULL;
struct gaih_service gaih_service, *pservice;
struct addrinfo local_hints;
while (g->gaih)
{
if (hints->ai_family == g->family || hints->ai_family == AF_UNSPEC)
{
j++;
if (pg == NULL || pg->gaih != g->gaih)
{
pg = g;
i = g->gaih (name, pservice, hints, end);
if (i != 0)
{
/* EAI_NODATA is a more specific result as it says that
we found a result but it is not usable. */
if (last_i != (GAIH_OKIFUNSPEC | -EAI_NODATA))
last_i = i;
if (hints->ai_family == AF_UNSPEC && (i & GAIH_OKIFUNSPEC))
{
++g;
continue;
}
freeaddrinfo (p);
return -(i & GAIH_EAI);
}
if (end)
while (*end)
{
end = &((*end)->ai_next);
++nresults;
}
}
}
++g;
}
if (j == 0)
return EAI_FAMILY;
if (nresults > 1)
{
/* Sort results according to RFC 3484. */
struct sort_result results[nresults];
struct addrinfo *q;
struct addrinfo *last = NULL;
char *canonname = NULL;
for (i = 0, q = p; q != NULL; ++i, last = q, q = q->ai_next)
{
results[i].dest_addr = q;
results[i].got_source_addr = false;
/* If we just looked up the address for a different
protocol, reuse the result. */
if (last != NULL && last->ai_addrlen == q->ai_addrlen
&& memcmp (last->ai_addr, q->ai_addr, q->ai_addrlen) == 0)
{
memcpy (&results[i].source_addr, &results[i - 1].source_addr,
results[i - 1].source_addr_len);
results[i].source_addr_len = results[i - 1].source_addr_len;
results[i].got_source_addr = results[i - 1].got_source_addr;
}
else
{
/* We overwrite the type with SOCK_DGRAM since we do not
want connect() to connect to the other side. If we
cannot determine the source address remember this
fact. */
int fd = socket (q->ai_family, SOCK_DGRAM, IPPROTO_IP);
socklen_t sl = sizeof (results[i].source_addr);
if (fd != -1
&& connect (fd, q->ai_addr, q->ai_addrlen) == 0
&& getsockname (fd,
(struct sockaddr *) &results[i].source_addr,
&sl) == 0)
{
results[i].source_addr_len = sl;
results[i].got_source_addr = true;
}
else
/* Just make sure that if we have to process the same
address again we do not copy any memory. */
results[i].source_addr_len = 0;
if (fd != -1)
close_not_cancel_no_status (fd);
}
/* Remember the canonical name. */
if (q->ai_canonname != NULL)
{
assert (canonname == NULL);
canonname = q->ai_canonname;
q->ai_canonname = NULL;
}
}
/* We got all the source addresses we can get, now sort using
the information. */
qsort (results, nresults, sizeof (results[0]), rfc3484_sort);
/* Queue the results up as they come out of sorting. */
q = p = results[0].dest_addr;
for (i = 1; i < nresults; ++i)
q = q->ai_next = results[i].dest_addr;
q->ai_next = NULL;
/* Fill in the canonical name into the new first entry. */
p->ai_canonname = canonname;
}
if (p)
{
*pai = p;
return 0;
}
if (pai == NULL && last_i == 0)
return 0;
return last_i ? -(last_i & GAIH_EAI) : EAI_NONAME;
}
The sorting is done by rfc3484_sort; I plan to study its rules in detail when I have time:
static int
rfc3484_sort (const void *p1, const void *p2)
{
const struct sort_result *a1 = (const struct sort_result *) p1;
const struct sort_result *a2 = (const struct sort_result *) p2;
/* Rule 1: Avoid unusable destinations.
We have the got_source_addr flag set if the destination is reachable. */
if (a1->got_source_addr && ! a2->got_source_addr)
return -1;
if (! a1->got_source_addr && a2->got_source_addr)
return 1;
/* Rule 2: Prefer matching scope. Only interesting if both
destination addresses are IPv6. */
int a1_dst_scope
= get_scope ((struct sockaddr_storage *) a1->dest_addr->ai_addr);
int a2_dst_scope
= get_scope ((struct sockaddr_storage *) a2->dest_addr->ai_addr);
if (a1->got_source_addr)
{
int a1_src_scope = get_scope (&a1->source_addr);
int a2_src_scope = get_scope (&a2->source_addr);
if (a1_dst_scope == a1_src_scope && a2_dst_scope != a2_src_scope)
return -1;
if (a1_dst_scope != a1_src_scope && a2_dst_scope == a2_src_scope)
return 1;
}
/* Rule 3: Avoid deprecated addresses.
That's something only the kernel could decide. */
/* Rule 4: Prefer home addresses.
Another thing only the kernel can decide. */
/* Rule 5: Prefer matching label. */
if (a1->got_source_addr)
{
int a1_dst_label
= get_label ((struct sockaddr_storage *) a1->dest_addr->ai_addr);
int a1_src_label = get_label (&a1->source_addr);
int a2_dst_label
= get_label ((struct sockaddr_storage *) a2->dest_addr->ai_addr);
int a2_src_label = get_label (&a2->source_addr);
if (a1_dst_label == a1_src_label && a2_dst_label != a2_src_label)
return -1;
if (a1_dst_label != a1_src_label && a2_dst_label == a2_src_label)
return 1;
}
/* Rule 6: Prefer higher precedence. */
int a1_prec
= get_precedence ((struct sockaddr_storage *) a1->dest_addr->ai_addr);
int a2_prec
= get_precedence ((struct sockaddr_storage *) a2->dest_addr->ai_addr);
if (a1_prec > a2_prec)
return -1;
if (a1_prec < a2_prec)
return 1;
/* Rule 7: Prefer native transport.
XXX How to recognize tunnels? */
/* Rule 8: Prefer smaller scope. */
if (a1_dst_scope < a2_dst_scope)
return -1;
if (a1_dst_scope > a2_dst_scope)
return 1;
/* Rule 9: Use longest matching prefix. */
if (a1->got_source_addr
&& a1->dest_addr->ai_family == a2->dest_addr->ai_family)
{
int bit1 = 0;
int bit2 = 0;
if (a1->dest_addr->ai_family == PF_INET)
{
assert (a1->source_addr.ss_family == PF_INET);
assert (a2->source_addr.ss_family == PF_INET);
struct sockaddr_in *in1_dst;
struct sockaddr_in *in1_src;
struct sockaddr_in *in2_dst;
struct sockaddr_in *in2_src;
in1_dst = (struct sockaddr_in *) a1->dest_addr->ai_addr;
in1_src = (struct sockaddr_in *) &a1->source_addr;
in2_dst = (struct sockaddr_in *) a2->dest_addr->ai_addr;
in2_src = (struct sockaddr_in *) &a2->source_addr;
bit1 = ffs (in1_dst->sin_addr.s_addr ^ in1_src->sin_addr.s_addr);
bit2 = ffs (in2_dst->sin_addr.s_addr ^ in2_src->sin_addr.s_addr);
}
else if (a1->dest_addr->ai_family == PF_INET6)
{
assert (a1->source_addr.ss_family == PF_INET6);
assert (a2->source_addr.ss_family == PF_INET6);
struct sockaddr_in6 *in1_dst;
struct sockaddr_in6 *in1_src;
struct sockaddr_in6 *in2_dst;
struct sockaddr_in6 *in2_src;
in1_dst = (struct sockaddr_in6 *) a1->dest_addr->ai_addr;
in1_src = (struct sockaddr_in6 *) &a1->source_addr;
in2_dst = (struct sockaddr_in6 *) a2->dest_addr->ai_addr;
in2_src = (struct sockaddr_in6 *) &a2->source_addr;
int i;
for (i = 0; i < 4; ++i)
if (in1_dst->sin6_addr.s6_addr32[i]
!= in1_src->sin6_addr.s6_addr32[i]
|| (in2_dst->sin6_addr.s6_addr32[i]
!= in2_src->sin6_addr.s6_addr32[i]))
break;
if (i < 4)
{
bit1 = ffs (in1_dst->sin6_addr.s6_addr32[i]
^ in1_src->sin6_addr.s6_addr32[i]);
bit2 = ffs (in2_dst->sin6_addr.s6_addr32[i]
^ in2_src->sin6_addr.s6_addr32[i]);
}
}
if (bit1 > bit2)
return -1;
if (bit1 < bit2)
return 1;
}
/* Rule 10: Otherwise, leave the order unchanged. */
return 0;
}
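As you can see, the results are first compared using Rule 1 through Rule 9 of RFC 3484; if none of those rules distinguishes two addresses, the comparator treats them as equal and their order is meant to stay unchanged. In short, the order of the returned results is not fixed: it may be the order returned by the DNS server, or it may not. The best approach is therefore to control address selection yourself at the Java layer.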
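One way to take control at the Java layer is to fetch the whole address list and rotate through it. The following is a minimal sketch under that assumption, not a complete load balancer: RoundRobinResolver, pick and resolve are hypothetical names, and the sketch does no health checking or retrying.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinResolver {
    private final AtomicInteger counter = new AtomicInteger();

    // Picks the next address in rotation; floorMod keeps the index valid
    // even after the counter overflows to a negative value.
    InetAddress pick(InetAddress[] addresses) {
        if (addresses == null || addresses.length == 0) {
            throw new IllegalArgumentException("no addresses to choose from");
        }
        int index = Math.floorMod(counter.getAndIncrement(), addresses.length);
        return addresses[index];
    }

    // Resolves the host through the JDK (and its cache), then rotates over the result.
    InetAddress resolve(String host) throws UnknownHostException {
        return pick(InetAddress.getAllByName(host));
    }

    public static void main(String[] args) throws UnknownHostException {
        RoundRobinResolver resolver = new RoundRobinResolver();
        String host = args.length > 0 ? args[0] : "localhost";
        for (int i = 0; i < 4; i++) {
            System.out.println(resolver.resolve(host).getHostAddress());
        }
    }
}
```

The caller then connects to the returned InetAddress instead of passing the host name to the client library. Rotation spreads requests evenly only while the list keeps the same order between calls, which is normally the case while the JVM serves it from its cache.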