之前寫過查詢數(shù)據(jù)集中的空變量的宏程序,一時沒找到。于是重寫一版,并以此簡單介紹下宏程序的構(gòu)建過程。
這篇文章從功能算法講起,然后編程實現(xiàn)算法邏輯,最后進行宏程序的構(gòu)建,宏程序完整代碼在文章第4部分匯總。
希望這篇文章可以對讀者日常SAS編程工作有所幫助。
先展示宏程序輸出的效果:



更多臨床試驗SAS編程內(nèi)容,歡迎關(guān)注:SAS茶談。
1. 程序算法設(shè)計
宏程序的構(gòu)思設(shè)計,從最小功能單位開始。對于查詢數(shù)據(jù)集中的空變量,我們從單個數(shù)據(jù)集的單個變量的判斷做起。
演示數(shù)據(jù)集使用SASHELP.Class,進行新增空變量處理。
***test dataset;
data class;
set sashelp.class;
a = "";
b = .;
run;

目前數(shù)據(jù)集中有兩個完全為空的變量,宏程序的目的就是把這兩個變量找出來。用什么程序語言來表示空變量,這個需要程序員自己摸索和嘗試。我選用的是,變量不為空的記錄數(shù)為0。聽起來有些拗口,但程序?qū)崿F(xiàn)起來比較簡單。
2. 編程實現(xiàn)算法
Proc SQL和Data步都能夠?qū)崿F(xiàn)非空記錄數(shù)的統(tǒng)計,但因為SQL的聚合函數(shù)跨記錄處理相對方便,我以SQL語句進行演示,先統(tǒng)計變量Name不為空的記錄數(shù):
***Get number of non-missing records;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "NAME" as Var length=50, sum(not missing(name)) as Sum
from class;
quit;

依次類推,我們可以統(tǒng)計出每個變量不為空的記錄數(shù),然后依次將各個結(jié)果縱向拼接在一起。
SQL中縱向拼接的查詢表達是outer union,默認是按兩個查詢表依次位置拼接的,相同變量拼接需加上關(guān)鍵字corresponding/corr,語法細節(jié)參考SAS官方文檔:SAS Help Center: query Expression。
***Get number of non-missing records for all variables;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "NAME" as Var length=50, sum(not missing(NAME)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "SEX" as Var length=50, sum(not missing(SEX)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "AGE" as Var length=50, sum(not missing(AGE)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "HEIGHT" as Var length=50, sum(not missing(HEIGHT)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "WEIGHT" as Var length=50, sum(not missing(WEIGHT)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "A" as Var length=50, sum(not missing(A)) as Sum
from class
outer union corr
select "CLASS" as Dataset length=50, "B" as Var length=50, sum(not missing(B)) as Sum
from class;
quit;

再獲取數(shù)據(jù)集中所有變量不為空的記錄數(shù)后,我們只需要篩選記錄數(shù)為0的記錄就可以獲取空變量的信息。為展示方便,我們可以把兩個變量信息綜合在一起,這個可以通過轉(zhuǎn)置后橫向拼接實現(xiàn)。
***Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result3;
set result2;
length Empvar $2000;
Empvar = catx(", ", of emp_:);
keep dataset empvar;
run;


這樣處理看起來比較簡潔,也方便后續(xù)多數(shù)據(jù)集檢查空變量結(jié)果的拼接。
關(guān)于程序,多解釋一點,SAS中函數(shù)批量處理變量序列時,需要在變量序列前添加of。如果不批量處理,也可以手動輸入每一變量名稱,例如:
empvar = catx(", ", emp_1, emp_2);
如果不知道輸出變量的數(shù)目,使用特定的前綴對變量進行標(biāo)記,再用函數(shù)批量處理,這個過程會變得簡潔許多。
3. 宏程序的構(gòu)建
在手動編程將算法實現(xiàn)后,就可以著手構(gòu)建宏程序。就我個人SAS編程經(jīng)驗來講,宏程序的作用主要有兩個:
- 單個功能的重復(fù)調(diào)用;
- 宏循環(huán)的批量處理。
關(guān)于這兩個作用,讀者可以與自己的宏程序編程經(jīng)歷對照理解,這里就不過多展開。
從以上手動編程的過程中可以看出,程序主要“重復(fù)”的地方在于各個變量不為空記錄的統(tǒng)計。除了變量名稱,拼接程序完全相同。如果我們將需要處理的變量名稱保存到宏變量序列中,就可以通過宏循環(huán)依次進行調(diào)用,并通過宏循環(huán)批量構(gòu)建程序。
3.1 生成宏變量(序列)
宏循環(huán)處理的關(guān)鍵,在于循環(huán)次數(shù)的獲取以及變量名稱宏變量序列。
宏變量的生成常用有2種方法:
- Proc SQL 中的
into :語句- Data步中的
call symputx語句
SAS數(shù)據(jù)集的元數(shù)據(jù)信息保存在SAS字典中,這里我以Data步中的call symputx語句進行舉例。
變量數(shù)目保存到宏變量中:
***Get the number of vairiables;
data tmp1;
set sashelp.vtable;
where libname = "WORK" and memname = "CLASS";
call symputx("nvar", strip(put(nvar,best.)));
run;
%put nvar= &nvar.;


變量名稱保存到宏變量序列中:
***Get variables' name;
data tmp2;
set sashelp.vcolumn;
where libname = "WORK" and memname = "CLASS";
call symputx("var"||strip(put(varnum, best.)), strip(name));
run;
%put var1= &var1.;
%put var7= &var7.;


這里一些讀者可能有這樣的想法:這里完全可以使用一個Data步,從SASHELP.vcolumn數(shù)據(jù)集獲取最后一條數(shù)據(jù)的varnum作為宏變量nvar的取值。類似這樣的處理:
***Get variables' name and nvar;
data tmp3;
set sashelp.vcolumn end=eof;
where libname = "WORK" and memname = "CLASS";
call symputx("var"||strip(put(varnum, best.)), strip(name));
if eof then call symputx("nvar1", strip(put(varnum, best.)));
%put nvar1= &nvar1.;
run;
以當(dāng)前演示數(shù)據(jù)集來看,這兩種方式處理結(jié)果相同。但如果一個數(shù)據(jù)集中沒有任何變量,這時候SASHELP.vcolumn中是沒有記錄的,而SASHELP.vtable中是有nvar=0的記錄。此時后者無法抓取數(shù)據(jù)集的變量數(shù)。
如果感興趣,讀者可以用以下空數(shù)據(jù)集進行測試:
data test;
run;
3.2 宏循環(huán)的實現(xiàn)
宏循環(huán)需要在宏程序中進行,宏程序的構(gòu)建盡可能包含可能的情形。這里根據(jù)變量數(shù)目進行分類處理。
***Temp macro;
%macro check_empty_var;
%if &nvar. = 0 %then %do;
data result;
length Dataset $50 empvar $2000;
dataset = "CLASS";
empvar = "There is no variable in the dataset Class!";
run;
%end;
%else %if &nvar. > 0 %then %do;
%if &nvar. = 1 %then %do;
proc sql noprint;
create table result1 as
select "CLASS" as domain length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from class
;
quit;
%end;
%if &nvar. > 1 %then %do;
proc sql noprint;
create table result1 as
select "CLASS" as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from class
%do i = 2 %to &nvar.;
outer union corr
select "CLASS" as Dataset length=50, "&&var&i." as Var length=50, sum(not missing(&&var&i.)) as Sum
from class
%end;
;
quit;
%end;
*Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result;
set result2;
length Empvar $2000;
Empvar = catx(", ", of emp_:);
keep dataset empvar;
run;
%end;
%mend check_empty_var;
%check_empty_var;
以上宏程序運行結(jié)果如下,與手動編程結(jié)果保持一致。

3.3 宏參數(shù)的設(shè)置
宏參數(shù)一般有3類:
- 輸入內(nèi)容(變量/數(shù)據(jù)集)
- 輸出內(nèi)容(變量/數(shù)據(jù)集)
- 特定條件
這個宏程序從簡,直接以Reslt數(shù)據(jù)集輸出,不需要額外的篩選條件,只需設(shè)置輸入數(shù)據(jù)集就好。為了省事,也不在宏里判斷輸入數(shù)據(jù)集的邏輯庫名稱、數(shù)據(jù)集名稱,直接定義到宏參數(shù)中。
%macro check_empty_var(libname=WORK, memname=);
...
%mend check_empty_var;
確定宏參數(shù)后,需要在之前初步宏程序中進行內(nèi)容替換,這樣方便以后對不同參數(shù)對象的處理調(diào)用。
4. 宏程序代碼匯總
完整宏程序需綜合以上內(nèi)容,并盡可能考慮多種可能情況,以求宏程序運行穩(wěn)定。例如,輸入數(shù)據(jù)集不存在的情況;數(shù)據(jù)集沒有空變量的情況。
如果輸入數(shù)據(jù)集不存在,最好能在Log中輸出一條Warning記錄作為提醒。為避免一些程序文本檢查機制的誤判,War ning最好能拆開處理下。
其他處理細節(jié)就不再展開描述,匯總程序如下:
%macro check_empty_var(libname=WORK, memname=);
***Dataset exists;
%if %sysfunc(exist(&libname..&memname.)) = 1 %then %do;
***Get the number of vairiables;
data _null_;
set sashelp.vtable;
where libname = upcase("&libname.") and memname = upcase("&memname.");
call symputx("nvar", strip(put(nvar, best.)));
run;
%put nvar= &nvar.;
***Macro loop;
%if &nvar. = 0 %then %do;
data result;
length Dataset $50 Empvar $2000;
dataset = upcase("&libname..&memname.");
Empvar= "There is no variable in the dataset %sysfunc(upcase(&libname..&memname.)).";
run;
%end;
%else %if &nvar. > 0 %then %do;
**Get variables name;
data _null_;
set sashelp.vcolumn;
where libname = upcase("&libname.") and memname = upcase("&memname.");
call symputx("var"||strip(put(varnum, best.)), strip(name));
run;
%put var1= &var1.;
%if &nvar. = 1 %then %do;
proc sql noprint;
create table result1 as
select upcase("&libname..&memname.") as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from &libname..&memname.
;
quit;
%end;
%if &nvar. > 1 %then %do;
proc sql noprint;
create table result1 as
select upcase("&libname..&memname.") as Dataset length=50, "&var1." as Var length=50, sum(not missing(&var1.)) as Sum
from &libname..&memname.
%do i = 2 %to &nvar.;
outer union corr
select upcase("&libname..&memname.") as Dataset length=50, "&&var&i." as Var length=50, sum(not missing(&&var&i.)) as Sum
from &libname..&memname.
%end;
;
quit;
%end;
*Display result;
proc transpose data = result1 out=result2 prefix=emp_;
by dataset;
var Var;
where sum = 0;
run;
data result;
set result2;
length emp_1 $50 Empvar$2000;
if not missing(emp_1) then empvar = catx(", ", of emp_:);
else do;
dataset = upcase("&libname..&memname.");
empvar = "There is no empty variable in the dataset %sysfunc(upcase(&libname..&memname.)).";
end;
keep dataset empvar;
run;
%end;
%end;
***Dataset does not exist;
%if %sysfunc(exist(&libname..&memname.)) ne 1 %then %do;
%put %sysfunc(compress(War ning)): Dataset %sysfunc(upcase(&libname..&memname.)) does not exist. ;
%end;
%mend check_empty_var;
%check_empty_var(libname=work, memname=class);
%check_empty_var(libname=work, memname=Yeteng);
如果數(shù)據(jù)集不存在,最后顯示結(jié)果如下:

5. 總結(jié)
文章梳理了查詢數(shù)據(jù)集中所有空變量宏程序構(gòu)建過程,希望能夠?qū)ψx者有所幫助。單個數(shù)據(jù)集處理的完成,也方便后續(xù)對多個數(shù)據(jù)集批量處理。
感謝閱讀, 歡迎關(guān)注:SAS茶談!
若有疑問,歡迎評論交流!