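Preface
Our company has recently been looking into full-link monitoring. A request that comes in through the service gateway flows through business modules owned by many different teams, and a problem in any one of them affects the outcome of the whole call chain (response time, response result, exception handling and so on). We therefore need a mechanism that monitors a request along its entire path. At first we considered building directly on pinpoint's injection plugins, but we later found that, because of the sampling rate and related issues, this could not be used in production, so we decided to build our own standard. We also wanted to adopt the opentracing standard, which is what this article covers.
A brief introduction to opentracing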

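For the full details of opentracing, follow the links mentioned in the preface to the opentracing website and its GitHub repositories; this section only gives a brief introduction. opentracing mainly consists of the following components: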
Span
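Represents one unit of work in a distributed call chain, for example the provider side of a dubbo call or the server side of an http call. Its boundary runs from the moment a request enters the service until the request leaves the current service again through some channel (http, dubbo and so on). A span usually records information about what happens inside that unit, for example:
- log entries
- tags
- start and finish times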
SpanContext
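Represents the context that belongs to a span; a span and its SpanContext are essentially in a one-to-one relationship. The context stores the information that has to cross boundaries, for example:
- spanId: the id of the current span
- traceId: the id of the trace this span belongs to (that is, the unique id of this call chain)
- baggage: other information that can be carried across multiple units of the call chain
The SpanContext can be passed to the downstream side of the call chain through some medium, where it is used for further processing (for example to generate child span ids, or to inherit information and print it in logs).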
Tracer
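Tracer is a general-purpose interface that acts as the hub of the opentracing standard. It has the following responsibilities:
- building and starting a span
- extracting a SpanContext from, and injecting one into, some medium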
Carrier
Represents the medium that carries a SpanContext. For example, an http call scenario would have an HttpCarrier, and a dubbo call scenario would have a corresponding DubboCarrier.
Formatter
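This interface holds the concrete logic for serializing and deserializing the context in a given scenario; for example, an HttpCarrier would usually be paired with an HttpFormatter. The Tracer delegates injection and extraction to the Formatter.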
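To make the Carrier/Formatter split more concrete, here is a minimal sketch; it is not the StarAtlas implementation. A plain `Map<String, String>` stands in for the carrier (think http headers or dubbo attachments). The names `SimpleContext` and `MapFormatterSketch` are hypothetical, and the key strings are assumptions picked for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the fields that must cross a process boundary. */
final class SimpleContext {
    final String traceId;
    final String spanId;
    final boolean sampled;

    SimpleContext(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }
}

/** Sketch of a formatter that (de)serializes a context through a string map. */
final class MapFormatterSketch {
    // Assumed key names, for illustration only.
    static final String TRACE_ID_KEY = "tcid";
    static final String SPAN_ID_KEY = "spid";
    static final String SAMPLED_KEY = "sample";

    /** Inject: the caller copies the context fields into the carrier. */
    static void inject(SimpleContext context, Map<String, String> carrier) {
        carrier.put(TRACE_ID_KEY, context.traceId);
        carrier.put(SPAN_ID_KEY, context.spanId);
        carrier.put(SAMPLED_KEY, Boolean.toString(context.sampled));
    }

    /** Extract: the callee rebuilds the context, or gets null if no trace was sent. */
    static SimpleContext extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID_KEY);
        String spanId = carrier.get(SPAN_ID_KEY);
        if (traceId == null || spanId == null) {
            return null;
        }
        boolean sampled = Boolean.parseBoolean(carrier.get(SAMPLED_KEY));
        return new SimpleContext(traceId, spanId, sampled);
    }
}
```

On the calling side the map would be filled by `inject` just before the request leaves the service; on the receiving side `extract` returns either the upstream context or null, in which case a new root span would be started.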
ScopeManager
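This component was added after version 0.30. Through it we can get the Span that is currently active in the current thread, and we can activate spans that are not yet active. In some scenarios we may create several spans in one thread, but at any given moment only one span is active in that thread; the other spans may be in one of the following states:
- waiting for a child span to finish
- waiting on some blocking method
- created but not yet started
Besides the components above, a distributed full-link monitoring framework also needs a reporter component that prints or reports key information about the chain (for example span creation and finish). Only after this information has been processed can we visualize the full chain and actually monitor it.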
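As a rough illustration of what such a reporter could look like, here is a minimal sketch. The interface `SpanReporterSketch`, the class `InMemoryReporter` and the pipe-separated line format are all hypothetical; a real reporter would more likely write to a log file or send data to a collector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reporter contract: one call per key event in a span's life. */
interface SpanReporterSketch {
    void report(String event, String traceId, String spanId, long timestampMillis);
}

/** Keeps the reported lines in memory, which is enough for a demo or a unit test. */
final class InMemoryReporter implements SpanReporterSketch {
    private final List<String> lines = new ArrayList<String>();

    @Override
    public void report(String event, String traceId, String spanId, long timestampMillis) {
        // One line per event; the format is an assumption made for this sketch.
        lines.add(event + "|" + traceId + "|" + spanId + "|" + timestampMillis);
    }

    List<String> lines() {
        return lines;
    }
}
```

A span implementation could call `report("start", ...)` in its constructor and `report("finish", ...)` in `finish()`, so that a downstream system can pair the two events by traceId and spanId.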
Implementation outline
This article first covers the key logic of a few core components (Span, SpanContext, Tracer and ScopeManager). It also borrows some ideas from sofa-tracer, such as the spanId and traceId generation rules; see sofa-tracer for the details. Our project is called StarAtlas (星圖), so all our components use that name as a prefix. Package names and author/date comments are omitted here.
Let's start with Span:
import io.opentracing.Span;
import io.opentracing.SpanContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* StarAtlasSpan
* <p>
* the implementation of span
*
*/
public class StarAtlasSpan implements Span {
private StarAtlasTracer starAtlasTracer;
private long startTime;
private List<StarAtlasSpanReferenceRelationship> spanReferences;
private String operationName;
private StarAtlasSpanContext spanContext;
private Logger logger = LoggerFactory.getLogger(this.getClass());
public StarAtlasSpan(StarAtlasTracer starAtlasTracer, long startTime,
List<StarAtlasSpanReferenceRelationship> spanReferences,
String operationName, StarAtlasSpanContext spanContext,
Map<String, ?> tags) {
AssertUtils.notNull(starAtlasTracer);
AssertUtils.notNull(spanContext);
this.starAtlasTracer = starAtlasTracer;
this.startTime = startTime;
this.spanReferences = spanReferences != null ? new ArrayList<StarAtlasSpanReferenceRelationship>(
spanReferences) : null;
this.operationName = operationName;
this.spanContext = spanContext;
//tags
this.setTags(tags);
// report extension, to be implemented
//SpanExtensionFactory.logStartedSpan(this);
}
@Override
public SpanContext context() {
return this.spanContext;
}
@Override
public Span setTag(String s, String s1) {
return null;
}
@Override
public Span setTag(String s, boolean b) {
return null;
}
@Override
public Span setTag(String s, Number number) {
return null;
}
@Override
public Span log(Map<String, ?> map) {
return null;
}
@Override
public Span log(long l, Map<String, ?> map) {
return null;
}
@Override
public Span log(String s) {
return null;
}
@Override
public Span log(long l, String s) {
return null;
}
@Override
public Span setBaggageItem(String s, String s1) {
return null;
}
@Override
public String getBaggageItem(String s) {
return null;
}
@Override
public Span setOperationName(String s) {
return null;
}
@Override
public void finish() {
}
@Override
public void finish(long l) {
}
private void setTags(Map<String, ?> tags) {
if (tags == null || tags.size() <= 0) {
return;
}
for (Map.Entry<String, ?> entry : tags.entrySet()) {
String key = entry.getKey();
if (StringUtils.isBlank(key)) {
continue;
}
Object value = entry.getValue();
if (value == null) {
continue;
}
if (value instanceof String) {
// at initialization time, tags can also be used to tell client from server
this.setTag(key, (String) value);
} else if (value instanceof Boolean) {
this.setTag(key, (Boolean) value);
} else if (value instanceof Number) {
this.setTag(key, (Number) value);
} else {
logger.error("Span tags unsupported type [" + value.getClass() + "]");
}
}
}
}
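This part is fairly simple: it creates a Span and injects some information into it; the logging code is commented out for now. The constructor takes a list of StarAtlasSpanReferenceRelationship. This class identifies the relationship between this Span and other Spans and is used to maintain parent/child relationships when a Span is created.
Next, let's look at SpanContext: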
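StarAtlasSpanReferenceRelationship itself is not listed in this article. Judging only from how it is used in the Tracer code (a constructor taking a context and a reference type, plus two getters), it could look roughly like the sketch below. The class name `SpanReferenceSketch`, the generic parameter and the validation are assumptions; the two type strings are the values of `References.CHILD_OF` and `References.FOLLOWS_FROM` in opentracing-api.

```java
/**
 * Hypothetical sketch of a span reference holder; the real
 * StarAtlasSpanReferenceRelationship is not shown in this article.
 *
 * @param <C> type of the referenced span context
 */
final class SpanReferenceSketch<C> {
    // Same string values as io.opentracing.References.
    static final String CHILD_OF = "child_of";
    static final String FOLLOWS_FROM = "follows_from";

    private final C spanContext;
    private final String referenceType;

    SpanReferenceSketch(C spanContext, String referenceType) {
        if (spanContext == null) {
            throw new IllegalArgumentException("spanContext must not be null");
        }
        if (!CHILD_OF.equals(referenceType) && !FOLLOWS_FROM.equals(referenceType)) {
            throw new IllegalArgumentException("unknown reference type: " + referenceType);
        }
        this.spanContext = spanContext;
        this.referenceType = referenceType;
    }

    /** The context of the span being referenced (usually the parent). */
    C getSpanContext() {
        return spanContext;
    }

    /** Either CHILD_OF or FOLLOWS_FROM. */
    String getReferenceType() {
        return referenceType;
    }
}
```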
import io.opentracing.SpanContext;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
/**
* StarAtlasSpanContext
*
* the span context implementation to store span information
*
*/
public class StarAtlasSpanContext implements SpanContext {
// separator used inside a spanId
public static final String RPC_ID_SEPARATOR = ".";
//======================== keys used for serialized data ========================
private static final String TRACE_ID_KEY = "tcid";
private static final String SPAN_ID_KEY = "spid";
private static final String PARENT_SPAN_ID_KEY = "pspid";
private static final String SAMPLE_KEY = "sample";
private AtomicInteger childContextIndex = new AtomicInteger(0);
private String spanId;
private String traceId;
private String parentId;
/***
* not sampled by default
*/
private boolean isSampled = false;
public StarAtlasSpanContext(String traceId, String spanId, String parentId) {
// not sampled by default
this(traceId, spanId, parentId, false);
}
public StarAtlasSpanContext(String traceId, String spanId, String parentId, boolean isSampled) {
this.traceId = traceId;
this.spanId = spanId;
this.parentId = StringUtils.isBlank(parentId) ? this.genParentSpanId(spanId) : parentId;
this.isSampled = isSampled;
}
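@Override
public Iterable<Map.Entry<String, String>> baggageItems() {
// baggage is not implemented yet; return an empty view instead of null so callers can iterate safely
return java.util.Collections.<String, String>emptyMap().entrySet();
}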
/**
* get the id for the next child context
*
* @return the next spanId
*/
public String nextChildContextId() {
return this.spanId + RPC_ID_SEPARATOR + childContextIndex.incrementAndGet();
}
public String getSpanId() {
return spanId;
}
public void setSpanId(String spanId) {
this.spanId = spanId;
}
public String getTraceId() {
return traceId;
}
public void setTraceId(String traceId) {
this.traceId = traceId;
}
public String getParentId() {
return parentId;
}
public void setParentId(String parentId) {
this.parentId = parentId;
}
public boolean isSampled() {
return isSampled;
}
public void setSampled(boolean sampled) {
isSampled = sampled;
}
private String genParentSpanId(String spanId) {
return (StringUtils.isBlank(spanId) || spanId.lastIndexOf(RPC_ID_SEPARATOR) < 0) ? StringUtils.EMPTY_STRING
: spanId.substring(0, spanId.lastIndexOf(RPC_ID_SEPARATOR));
}
}
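Like Span, this class stores information such as spanId and traceId (baggage support is only stubbed out in the code above). It also has a few notable functions, including deriving the parent spanId of the current context and generating the id for the next child span.
Next, let's look at Scope and ScopeManager: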
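The id scheme is easier to see in isolation. The following sketch reproduces only the two rules used above (append `.n` for the n-th child, cut at the last separator to find the parent); the class name `SpanIdSketch` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the hierarchical spanId scheme: "0" -> "0.1", "0.2", "0.1.1", ... */
final class SpanIdSketch {
    static final String SEPARATOR = ".";

    private final String spanId;
    // Counts how many child ids have been handed out so far.
    private final AtomicInteger childIndex = new AtomicInteger(0);

    SpanIdSketch(String spanId) {
        this.spanId = spanId;
    }

    /** Each call yields a new child id: own id + separator + running index. */
    String nextChildId() {
        return spanId + SEPARATOR + childIndex.incrementAndGet();
    }

    /** The parent id is everything before the last separator; a root has none. */
    static String parentOf(String spanId) {
        int cut = spanId.lastIndexOf(SEPARATOR);
        return cut < 0 ? "" : spanId.substring(0, cut);
    }
}
```

One consequence of this scheme is that the spanId alone encodes the position of a span in the call tree, so the parent can be recovered without being transmitted separately.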
import io.opentracing.Scope;
import io.opentracing.ScopeManager;
import io.opentracing.Span;
/**
* StarAtlasScopeManager
* <p>
* the scope manager to store and manage the scope information within a thread
*
*/
public class StarAtlasScopeManager implements ScopeManager {
/**
* the thread local store for the active scope
*/
final ThreadLocal<StarAtlasScope> scopeThreadLocal = new ThreadLocal<>();
/**
* singleton method
*
* @return
*/
public static StarAtlasScopeManager getInstance() {
return StarAtlasScopeManagerSingletonHolder.INSTANCE;
}
private StarAtlasScopeManager() {
}
/**
* the method to activate a span
*
* @param span
* @param finishOnClose
* @return
*/
@Override
public Scope activate(Span span, boolean finishOnClose) {
if (!checkCanActivate(span)) {
throw new IllegalStateException("a span cannot be activated more than once");
}
return new StarAtlasScope(this, span, finishOnClose);
}
/**
* the method to get the current active span
*
* @return
*/
@Override
public Scope active() {
return this.scopeThreadLocal.get();
}
/**
* check if the span can be activated
* if the span exists in the recover chain of the current active scope
* then we know that the span has been activated before.
*
* @param span
* @return
*/
private boolean checkCanActivate(Span span) {
StarAtlasScope scope = (StarAtlasScope) this.active();
while (scope != null) {
if (scope.span() == span) {
return false;
}
scope = scope.scopeToRecover;
}
return true;
}
private static class StarAtlasScopeManagerSingletonHolder {
private static final StarAtlasScopeManager INSTANCE = new StarAtlasScopeManager();
}
}
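Here the ScopeManager mainly uses a ThreadLocal to store the current Span (wrapped in a Scope). It then implements three methods:
- activate: activates a span in the current thread and returns a scope that wraps the newly activated span
- active: returns the scope that is active in the current thread
- checkCanActivate: a method of our own. When we activate a span and wrap it in a scope, the scope that was active in the thread before is stored in the new scope's scopeToRecover field (see the Scope code that follows). Starting from the currently active scope we can therefore follow scopeToRecover all the way back to the first scope, so when a span is activated we can tell whether it is being activated a second time by checking whether it already appears on that chain.
The Scope code is as follows: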
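The chain walk behind checkCanActivate can be shown without any tracing types. In this sketch, `ScopeNode` and `ActivationCheckSketch` are hypothetical stand-ins for the scope and its manager; spans are plain objects compared by identity, as in the code above.

```java
/** One link in the chain: a span plus the scope that was active before it. */
final class ScopeNode {
    final Object span;
    final ScopeNode previous;

    ScopeNode(Object span, ScopeNode previous) {
        this.span = span;
        this.previous = previous;
    }
}

final class ActivationCheckSketch {
    /**
     * Walks back from the active scope through every earlier scope.
     * Returns false as soon as the span is found, meaning it is already
     * active somewhere on the chain and must not be activated again.
     */
    static boolean canActivate(ScopeNode active, Object span) {
        for (ScopeNode node = active; node != null; node = node.previous) {
            if (node.span == span) {
                return false;
            }
        }
        return true;
    }
}
```

The check is linear in the nesting depth, which is normally small, so the cost per activation should be negligible.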
import io.opentracing.Scope;
import io.opentracing.Span;
/**
* StarAtlasScope
* <p>
* StarAtlasScope is a wrapper class for a span.
* It represents an active span in the current thread
* and supports a close function to deactivate the span.
*
*/
public class StarAtlasScope implements Scope {
/**
* finish the span or not when we close the scope
*/
private final boolean finishOnClose;
/**
* the wrapped span
*/
private final Span span;
/**
* scope manager
*/
private final StarAtlasScopeManager scopeManager;
/**
* the scope to recover on close
*/
final StarAtlasScope scopeToRecover;
StarAtlasScope(StarAtlasScopeManager scopeManager, Span span, boolean finishOnClose) {
this.finishOnClose = finishOnClose;
this.span = span;
this.scopeManager = scopeManager;
// store the previous scope to recover
this.scopeToRecover = this.scopeManager.scopeThreadLocal.get();
// push the current scope into thread local
// may extract into a package level method in StarAtlasScopeManager
this.scopeManager.scopeThreadLocal.set(this);
}
/**
* call close means the active period for the current thread and scope comes to an end
*/
@Override
public void close() {
// if the current active scope does not equal to this
// the close operation can not continue
if (scopeManager.active() != this) {
throw new IllegalStateException("can not call scope close in an unexpected way");
}
if (finishOnClose) {
span.finish();
}
// recover the scope
this.scopeManager.scopeThreadLocal.set(this.scopeToRecover);
}
@Override
public Span span() {
return span;
}
}
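The Scope implementation basically wraps a span and, on creation, saves the previously active scope (which confirms what was said above). It supports two methods:
- close: closes the current scope, finishes the wrapped span as well when finishOnClose is set, and restores the thread's active scope to the previous one.
- span: returns the wrapped span
Finally, let's look at Tracer: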
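The save-and-restore behaviour is the core of the design, so here is a stripped-down sketch of it. `ScopeSketch` is a hypothetical class that tracks only a name instead of a span; it implements `AutoCloseable` so that nesting can be expressed with try-with-resources.

```java
/** Sketch of a thread-local scope that restores its predecessor on close. */
final class ScopeSketch implements AutoCloseable {
    private static final ThreadLocal<ScopeSketch> ACTIVE = new ThreadLocal<ScopeSketch>();

    final String name;
    // The scope that was active before this one; restored in close().
    private final ScopeSketch toRestore;

    private ScopeSketch(String name) {
        this.name = name;
        this.toRestore = ACTIVE.get();
        ACTIVE.set(this);
    }

    static ScopeSketch activate(String name) {
        return new ScopeSketch(name);
    }

    /** Name of the scope active in the current thread, or null if there is none. */
    static String activeName() {
        ScopeSketch scope = ACTIVE.get();
        return scope == null ? null : scope.name;
    }

    @Override
    public void close() {
        // Only the innermost scope may be closed; anything else breaks the chain.
        if (ACTIVE.get() != this) {
            throw new IllegalStateException("scopes must be closed in reverse order of activation");
        }
        ACTIVE.set(toRestore);
    }
}
```

With this shape, `try (ScopeSketch child = ScopeSketch.activate("child")) { ... }` guarantees that the previous scope becomes active again when the block exits, even if the body throws.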
import io.opentracing.*;
import io.opentracing.propagation.Format;
import java.util.*;
/**
*/
public class StarAtlasTracer implements Tracer {
/**
* key of the traceId
*/
public static final String KEY_TRACEID = "SA-TRACEID";
/**
* spanId at which a normal trace starts
*/
public static final String ROOT_SPAN_ID = "0";
@Override
public ScopeManager scopeManager() {
return StarAtlasScopeManager.getInstance();
}
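@Override
public Span activeSpan() {
// active() returns null when no scope is active in this thread
Scope scope = this.scopeManager().active();
return scope == null ? null : scope.span();
}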
@Override
public SpanBuilder buildSpan(String operationName) {
return new StarAtlasSpanBuilder(operationName);
}
@Override
public <C> void inject(SpanContext spanContext, Format<C> format, C c) {
}
@Override
public <C> SpanContext extract(Format<C> format, C c) {
return null;
}
/**
* the implementation of span builder
*/
private class StarAtlasSpanBuilder implements SpanBuilder {
private String operationName = StringUtils.EMPTY_STRING;
private long startTime = -1;
private List<StarAtlasSpanReferenceRelationship> references = Collections.emptyList();
private final Map<String, Object> tags = new HashMap<String, Object>();
private boolean ignoreActiveSpan = false;
public StarAtlasSpanBuilder(String operationName){
this.operationName = operationName;
}
@Override
public SpanBuilder asChildOf(SpanContext parentContext) {
return addReference(References.CHILD_OF, parentContext);
}
@Override
public SpanBuilder asChildOf(Span parentSpan) {
if(parentSpan == null){
return this;
}
return asChildOf(parentSpan.context());
}
@Override
public SpanBuilder addReference(String referenceType, SpanContext referencedContext) {
if (referencedContext == null) {
return this;
}
if (!(referencedContext instanceof StarAtlasSpanContext)) {
return this;
}
if (!References.CHILD_OF.equals(referenceType)
&& !References.FOLLOWS_FROM.equals(referenceType)) {
return this;
}
if (references.isEmpty()) {
// Optimization for 99% situations, when there is only one parent
references = Collections.singletonList(new StarAtlasSpanReferenceRelationship(
(StarAtlasSpanContext) referencedContext, referenceType));
} else {
if (references.size() == 1) {
// keep the references in insertion order
references = new ArrayList<StarAtlasSpanReferenceRelationship>(references);
}
references.add(new StarAtlasSpanReferenceRelationship(
(StarAtlasSpanContext) referencedContext, referenceType));
}
return this;
}
@Override
public SpanBuilder ignoreActiveSpan() {
throw new UnsupportedOperationException("ignoreActiveSpan is not supported yet");
}
@Override
public SpanBuilder withTag(String key, String value) {
this.tags.put(key, value);
return this;
}
@Override
public SpanBuilder withTag(String key, boolean value) {
this.tags.put(key, value);
return this;
}
@Override
public SpanBuilder withTag(String key, Number value) {
this.tags.put(key, value);
return this;
}
@Override
public SpanBuilder withStartTimestamp(long startTime) {
this.startTime = startTime;
return this;
}
@Override
public Scope startActive(boolean finishOnClose) {
Span span = this.start();
return StarAtlasTracer.this.scopeManager().activate(span, finishOnClose);
}
@Override
public Span startManual() {
// startManual() is the older name for start(); delegate instead of returning null
return this.start();
}
@Override
public Span start() {
StarAtlasSpanContext spanContext = null;
if(this.references.size() > 0){
// there is a parent context
spanContext = createChildContext();
}else if (!this.ignoreActiveSpan
&& StarAtlasTracer.this.scopeManager().active() != null){
// use the current span as default parent;
Scope currentScope = StarAtlasTracer.this.scopeManager().active();
this.asChildOf(currentScope.span());
spanContext = createChildContext();
}else {
// it should be the root
spanContext = createRootSpanContext();
}
long begin = this.startTime > 0 ? this.startTime : System.currentTimeMillis();
StarAtlasSpan span = new StarAtlasSpan(StarAtlasTracer.this, begin,
this.references, this.operationName, spanContext, this.tags);
return span;
}
private StarAtlasSpanContext createRootSpanContext(){
String traceId = TraceIdGenerator.generate();
return new StarAtlasSpanContext(traceId, ROOT_SPAN_ID, StringUtils.EMPTY_STRING);
}
private StarAtlasSpanContext createChildContext() {
StarAtlasSpanContext preferredReference = preferredReference();
StarAtlasSpanContext childContext = new StarAtlasSpanContext(
preferredReference.getTraceId(), preferredReference.nextChildContextId(),
preferredReference.getSpanId(), preferredReference.isSampled());
return childContext;
}
/**
* choose the preferred reference
* @return
*/
private StarAtlasSpanContext preferredReference() {
StarAtlasSpanReferenceRelationship preferredReference = references.get(0);
for (StarAtlasSpanReferenceRelationship reference : references) {
// childOf takes precedence as a preferred parent
String referencedType = reference.getReferenceType();
if (References.CHILD_OF.equals(referencedType)
&& !References.CHILD_OF.equals(preferredReference.getReferenceType())) {
preferredReference = reference;
break;
}
}
return preferredReference.getSpanContext();
}
}
}
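This borrows from the sofa-tracer implementation. The main logic is the SpanBuilder implementation, which takes care of creating Spans, together with the interface for activating a span.
Testing
With these pieces in place, we can write the following unit tests: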
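One detail worth isolating is how the builder picks the parent when a span has several references: the first reference wins unless it is not a child_of reference and a child_of reference appears later. The sketch below mirrors that rule with hypothetical types (`RefSketch`, `PreferredReferenceSketch`).

```java
import java.util.List;

/** Hypothetical reference: a type string plus the id of the referenced span. */
final class RefSketch {
    final String type;
    final String spanId;

    RefSketch(String type, String spanId) {
        this.type = type;
        this.spanId = spanId;
    }
}

final class PreferredReferenceSketch {
    // Same string values as io.opentracing.References.
    static final String CHILD_OF = "child_of";
    static final String FOLLOWS_FROM = "follows_from";

    /**
     * Starts with the first reference and switches to the first child_of
     * reference only if the first one is not itself child_of.
     * The list must not be empty.
     */
    static RefSketch preferred(List<RefSketch> references) {
        RefSketch preferred = references.get(0);
        for (RefSketch reference : references) {
            if (CHILD_OF.equals(reference.type) && !CHILD_OF.equals(preferred.type)) {
                preferred = reference;
                break;
            }
        }
        return preferred;
    }
}
```

The chosen reference then supplies the traceId, the parent spanId and the sampling flag for the new child context.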
import io.opentracing.Scope;
import io.opentracing.Span;
import org.junit.Assert;
import org.junit.Test;
/**
* StarAtlasTracerTest
*
*/
public class StarAtlasTracerTest {
/**
* test generating only the root span
*/
@Test
public void generateRoot(){
StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
Span root = starAtlasTracer.buildSpan("root").start();
Assert.assertNotNull(root);
StarAtlasSpanContext context = (StarAtlasSpanContext) root.context();
Assert.assertEquals("0", context.getSpanId());
Assert.assertEquals("", context.getParentId());
Assert.assertFalse(StringUtils.isBlank(context.getTraceId()));
Assert.assertNull(starAtlasTracer.scopeManager().active());
}
/**
* test generating the root span and activating it
*/
@Test
public void generateRootAndActivate(){
StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
Scope rootScope = starAtlasTracer.buildSpan("root").startActive(true);
Assert.assertNotNull(rootScope);
StarAtlasSpanContext context = (StarAtlasSpanContext) rootScope.span().context();
Assert.assertEquals("0", context.getSpanId());
Assert.assertEquals("", context.getParentId());
Assert.assertNotNull(starAtlasTracer.scopeManager().active());
Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
rootScope.close();
Assert.assertNull(starAtlasTracer.scopeManager().active());
}
/**
* test generating a child span and activating it
*/
@Test
public void generateChildAndActivate(){
StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
Scope rootScope = starAtlasTracer.buildSpan("root").startActive(true);
StarAtlasSpanContext rootContext = (StarAtlasSpanContext) rootScope.span().context();
Assert.assertNotNull(rootScope);
Span child = starAtlasTracer.buildSpan("child").asChildOf(rootScope.span()).start();
StarAtlasSpanContext context = (StarAtlasSpanContext)child.context();
Assert.assertEquals("0.1", context.getSpanId());
Assert.assertEquals(rootContext.getTraceId(), context.getTraceId());
Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
Scope childScope = starAtlasTracer.scopeManager().activate(child, true);
Assert.assertEquals(childScope, starAtlasTracer.scopeManager().active());
childScope.close();
Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
rootScope.close();
}
/**
* test activating the same span twice
*/
@Test
public void testDuplicatedActivate(){
StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
Span root = starAtlasTracer.buildSpan("root").start();
Scope rootScope = starAtlasTracer.scopeManager().activate(root, true);
Span child = starAtlasTracer.buildSpan("child").start();
Scope childScope = starAtlasTracer.scopeManager().activate(child, true);
try{
starAtlasTracer.scopeManager().activate(root, true);
// reaching this line means the duplicate activation was not rejected
Assert.fail("expected IllegalStateException");
} catch (Exception e){
System.out.println(e.getMessage());
Assert.assertTrue(e instanceof IllegalStateException);
}
childScope.close();
rootScope.close();
}
}
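The test scenarios are described in the comments; interested readers can run them on their own.
Afterword
This article explained the basic concepts of opentracing and provided a basic implementation with tests. If time and energy permit, follow-up articles may discuss how to integrate with dubbo, http and similar scenarios. If you have questions, feel free to raise them in the comments.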