Data content: identifying where germline variants and somatic variants overlap in tumor sequencing data
Data source: TCGA
Understanding the data:
① How TCGA data is generated:


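② Analysis of the main data:
Data handling rules (each rule describes a single record): a record is described as XXY-XXY-XXXX-XX (for now we only look at the first six characters). In the first group, a leading 0 means tumor and a leading 1 means normal; 10 is blood and 11 is adjacent (paracancerous) tissue.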
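As a minimal sketch of how the first six characters could be split and classified, the Python below may help. The field names `sample_type`, `vial`, `portion` and `analyte` follow the usual TCGA barcode layout and are not used in these notes; the function names are hypothetical.

```python
from typing import NamedTuple


class SampleCode(NamedTuple):
    sample_type: str  # two digits, e.g. "01"
    vial: str         # one letter, e.g. "A"
    portion: str      # two digits, e.g. "01"
    analyte: str      # one letter, e.g. "D"


def parse_sample_code(code: str) -> SampleCode:
    """Split the first two groups of an 'XXY-XXY-...' descriptor."""
    parts = code.split("-")
    if len(parts) < 2 or len(parts[0]) != 3 or len(parts[1]) != 3:
        raise ValueError(f"unexpected descriptor: {code!r}")
    first, second = parts[0], parts[1]
    return SampleCode(first[:2], first[2], second[:2], second[2])


def tissue_class(sample_type: str) -> str:
    """Map the two-digit code to the classes listed in the notes."""
    if sample_type == "10":
        return "blood"
    if sample_type == "11":
        return "adjacent tissue"
    if sample_type.startswith("0"):
        return "tumor"
    if sample_type.startswith("1"):
        return "normal"
    raise ValueError(f"unexpected sample type: {sample_type!r}")
```

For example, `parse_sample_code("01A-01D-1234-05")` yields sample type `01`, vial `A`, portion `01` and analyte `D`, and `tissue_class("01")` returns `"tumor"`.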



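(1) Choosing the first three characters
① Across the two tissue columns, the first character must be 0 in one column and 1 in the other, so that the pair contains both tumor and normal genome data
② If the first two characters in the first two columns are 01 and 10, then the first two characters in the third column are 11
③ For the third character, prefer A
(2) Choosing the last three characters (this applies only when the two records share the same first three characters)
① Prefer the 01D record; if a 01W record has a var_count greater than 3, choose the 01W record instead
(3) Special cases
① When both 01B and 02A appear as the first three characters, choose whichever has a nonzero VCF value; if both VCF values are 0, choose 01B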
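Rules (2) and (3) above can be sketched roughly as follows. This is only an illustration: the record layout (dicts with `code`, `var_count` and `vcf` keys) and the function names are hypothetical. The notes do not say what to do when no 01D record exists, or when both 01B and 02A have nonzero VCF values, so the fallbacks marked below are assumptions.

```python
def pick_by_analyte(records):
    """Rule (2): all records share the same first three characters."""
    by_suffix = {r["code"].split("-")[1]: r for r in records}
    d = by_suffix.get("01D")
    w = by_suffix.get("01W")
    # A 01W record wins only when its var_count is greater than 3.
    if w is not None and w["var_count"] > 3:
        return w
    # Otherwise prefer 01D; falling back to 01W when there is no 01D
    # record is an assumption, not something the notes specify.
    return d if d is not None else w


def pick_special(rec_01b, rec_02a):
    """Rule (3): choose between a 01B record and a 02A record."""
    # 02A is chosen only when it alone has a nonzero VCF value.
    # Keeping 01B when both are nonzero is an assumption.
    if rec_01b["vcf"] == 0 and rec_02a["vcf"] != 0:
        return rec_02a
    return rec_01b
```

With a 01D record and a 01W record whose `var_count` is 5, `pick_by_analyte` returns the 01W record; with a `var_count` of 3 it returns the 01D record.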
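Data handling method (my own approach):
① Filter the whole dataset: pick out every row whose first tissue column starts with 10- or 11-, and check whether the matching value in the second column starts with 0; if it does not, clear the row. After tidying, every remaining value in the first tissue column starts with 0
② Tidy the second tissue column: filter out the values that start with 0 and tidy them up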
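Outside Excel, step ① could be sketched in Python as below. The function name `tidy_pairs` and the tuple layout are hypothetical. The swap is an assumption: the notes only say that the first column ends up starting with 0 after tidying, and moving the tumor sample into the first column is one way to get there.

```python
def tidy_pairs(rows):
    """rows: iterable of (first_tissue, second_tissue) string pairs."""
    kept = []
    for first, second in rows:
        if first.startswith(("10", "11")):
            if not second.startswith("0"):
                # Normal sample with no tumor partner: drop the row.
                continue
            # Assumption: put the tumor sample in the first column.
            first, second = second, first
        kept.append((first, second))
    return kept


rows = [
    ("01A-01D", "10A-01D"),  # already tumor first: kept as-is
    ("10A-01D", "01A-01W"),  # normal first, tumor second: swapped
    ("11A-01D", "10A-01D"),  # no tumor sample: dropped
]
print(tidy_pairs(rows))
```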
Tip: take care when working with filtered data. Simply copying and pasting the filtered rows as a block overwrites the data in the hidden rows, so the copy has to be done with an Excel macro. Step-by-step instructions: http://jingyan.baidu.com/article/295430f12b4aef0c7e00501b.html
Macro code:
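Sub 多区域复制粘贴()
    ' Copy every area of a multi-area selection (for example the visible rows
    ' of a filtered range) and paste the values at a chosen target cell,
    ' keeping each area's offset relative to the selection's top-left corner.
    Dim SRange() As Range, TRange As Range
    Dim i As Long, AreaNum As Long
    Dim MinR As Long, MinC As Long

    AreaNum = Selection.Areas.Count
    ReDim SRange(1 To AreaNum)
    MinR = ActiveSheet.Rows.Count
    MinC = ActiveSheet.Columns.Count

    ' Record each area and find the top-most row and left-most column.
    For i = 1 To AreaNum
        Set SRange(i) = Selection.Areas(i)
        If SRange(i).Row < MinR Then MinR = SRange(i).Row
        If SRange(i).Column < MinC Then MinC = SRange(i).Column
    Next i

    ' Type:=8 makes InputBox return a Range; cancelling raises an error on Set,
    ' so errors are suppressed only around this call.
    On Error Resume Next
    Set TRange = Application.InputBox(prompt:="Select the top-left cell of the paste area", Title:="Multi-area copy and paste", Type:=8)
    On Error GoTo 0
    If TRange Is Nothing Then Exit Sub ' user cancelled

    Application.ScreenUpdating = False
    For i = 1 To AreaNum
        SRange(i).Copy
        TRange.Offset(SRange(i).Row - MinR, SRange(i).Column - MinC).PasteSpecial Paste:=xlPasteValues
    Next i
    Application.CutCopyMode = False
    Application.ScreenUpdating = True
End Sub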