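Background
In a KubeVirt deployment, a data disk was wiped after some kind of failure. After analyzing where disk.img is stored inside the pod and on the host, we could basically rule out manual deletion; at the software level, only kubelet or virt-handler could have done the cleanup. The path shows that the data disk lives on a Kubernetes emptyDir volume, so the most likely explanation is that the pod failed and, when it was restarted (that is, deleted and recreated), the emptyDir was cleaned up along with it. What remains is to confirm whether kubelet really was the process that removed the file.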
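Problem analysis
How can we find out who deleted a file? Monitor the rm command? That does not help when the deletion is done from inside a program through a language's own file API, because no rm process is ever started.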
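To make the limitation concrete, here is a minimal userspace sketch of a program that deletes a file by calling unlink(2) directly. The function name delete_without_rm and the path used in the usage note are made up for illustration; the point is only that nothing here ever executes rm, so watching for rm would miss it.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file at `path`, then delete it by calling
 * unlink(2) directly. Returns 0 if the file is gone afterwards, -1 otherwise.
 */
int delete_without_rm(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs("data\n", f);
	fclose(f);

	/* The deletion happens inside this process; no rm is spawned. */
	if (unlink(path) != 0)
		return -1;

	/* access() fails once the name no longer exists. */
	return access(path, F_OK) != 0 ? 0 : -1;
}
```

Calling delete_without_rm("/tmp/kprobe_demo_direct") from any program removes the file, and the only process involved is the caller itself.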
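One approach is to hook the kernel's own deletion function, which kprobes makes possible.
Files are deleted through the unlink system call, but a call to unlink does not necessarily mean the file is gone: the call can fail, and hard links or open references can keep the file alive. So we want a function that is reached only when the removal is actually being carried out, which points to the unlink implementation of the underlying filesystem.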
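The hard-link case can be seen from userspace with a short sketch. The helper name links_left_after_unlink and the paths in the usage note are hypothetical; it creates a file, adds a second name with link(2), unlinks the first name, and reports the link count still visible through the second name.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create `path`, hard-link it as `alias`, unlink `path`,
 * and return the link count still reported for `alias` (-1 on error).
 * Both names must be on the same filesystem for link(2) to succeed.
 */
long links_left_after_unlink(const char *path, const char *alias)
{
	struct stat st;
	FILE *f;
	long n;

	/* Start from a clean state; errors are ignored on purpose. */
	unlink(path);
	unlink(alias);

	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs("payload\n", f);
	fclose(f);

	if (link(path, alias) != 0) {
		unlink(path);
		return -1;
	}
	if (unlink(path) != 0) {
		unlink(alias);
		return -1;
	}
	if (stat(alias, &st) != 0)
		return -1;

	n = (long)st.st_nlink;
	unlink(alias);	/* clean up the second name as well */
	return n;
}
```

With two names in /tmp, the function should report one remaining link: the unlink call succeeded, yet the data was still reachable through the other name.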
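For ext4, probe ext4_unlink; for XFS, probe xfs_vn_unlink.
The prototype of unlink in inode_operations is:
int (*unlink) (struct inode *,struct dentry *);
We can get the file name from the dentry and the current process from the kernel's current macro. Because dentry is the second parameter of the function, on x86-64 its address is read from the rsi register. The complete code follows.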
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/dcache.h>	/* struct dentry */
#include <linux/sched.h>	/* current, struct task_struct */
#define MAX_SYMBOL_LEN 64
/* Symbol to probe; can be overridden at load time, e.g. symbol=ext4_unlink */
static char symbol[MAX_SYMBOL_LEN] = "xfs_vn_unlink";
module_param_string(symbol, symbol, sizeof(symbol), 0644);
/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
	.symbol_name	= symbol,
};
/* kprobe pre_handler: called just before the probed instruction is executed */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
#ifdef CONFIG_X86_64
	/*
	 * x86-64 passes integer arguments in rdi, rsi, rdx, rcx, r8, r9,
	 * in that order. dentry is the second argument, so it is in rsi.
	 */
	struct dentry *dentry = (struct dentry *)regs->si;
	const unsigned char *name = dentry->d_name.name;

	pr_info("name : %s pid : %d execname : %s\n",
		name, current->pid, current->comm);
#endif
	return 0;
}
/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
			 unsigned long flags)
{
	/* Nothing to do after the probed instruction. */
}
/*
 * fault_handler: this is called if an exception is generated for any
 * instruction within the pre- or post-handler, or when Kprobes
 * single-steps the probed instruction.
 * Note: newer kernels (around v5.14 and later) no longer have this field in
 * struct kprobe; drop this handler and its assignment when building there.
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	pr_info("fault_handler: p->addr = 0x%p, trap #%d\n", p->addr, trapnr);
	/* Return 0 because we don't handle the fault. */
	return 0;
}
static int __init kprobe_init(void)
{
	int ret;

	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
	kp.fault_handler = handler_fault;
	ret = register_kprobe(&kp);
	if (ret < 0) {
		pr_err("register_kprobe failed, returned %d\n", ret);
		return ret;
	}
	pr_info("Planted kprobe at %p\n", kp.addr);
	return 0;
}
static void __exit kprobe_exit(void)
{
	unregister_kprobe(&kp);
	pr_info("kprobe at %p unregistered\n", kp.addr);
}
module_init(kprobe_init)
module_exit(kprobe_exit)
/* register_kprobe() is exported GPL-only, so the module must declare a GPL license */
MODULE_LICENSE("GPL");
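The handler above prints only the last path component (d_name.name). To get a fuller path, the usual idea is to follow the dentry's parent pointers up to the filesystem root, where a dentry is its own parent; inside the kernel, dentry_path_raw() does this for you, and the result is relative to the root of that filesystem, not to the global mount tree. The following is only a userspace mock of that walk, not kernel code: struct mock_dentry and mock_dentry_path are invented stand-ins for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the two struct dentry fields the walk relies on. */
struct mock_dentry {
	const char *name;		/* plays the role of d_name.name */
	struct mock_dentry *parent;	/* plays the role of d_parent; root points to itself */
};

/*
 * Build the path from right to left at the end of `buf`.
 * Returns a pointer to the start of the path inside `buf`,
 * or NULL if the buffer is too small.
 */
char *mock_dentry_path(const struct mock_dentry *d, char *buf, size_t len)
{
	char *p;

	if (len < 2)
		return NULL;
	p = buf + len - 1;
	*p = '\0';

	/* The root itself is just "/". */
	if (d->parent == d) {
		*--p = '/';
		return p;
	}

	while (d->parent != d) {
		size_t n = strlen(d->name);

		/* Need room for the name plus its leading '/'. */
		if ((size_t)(p - buf) < n + 1)
			return NULL;
		p -= n;
		memcpy(p, d->name, n);
		*--p = '/';
		d = d->parent;
	}
	return p;
}
```

For a chain root, "var", "lib", "disk.img", the function yields "/var/lib/disk.img". Writing from the end of the buffer avoids having to know the depth of the chain in advance.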
The corresponding Makefile
# To build modules outside of the kernel tree, we run "make"
# in the kernel source tree; the Makefile there then includes this
# Makefile once again.
# Called from the kernel build system: just declare what our modules are.
obj-m := kprobe_test.o
CROSS_COMPILE =
CC = gcc
# Assume the source tree is where the running kernel was built.
# You should set KERNELDIR in the environment if it's elsewhere.
KERNELDIR ?= /lib/modules/4.19.0/build
# The current directory is passed to sub-makes as an argument.
PWD := $(shell pwd)
all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
clean:
	rm -rf *.o *~ core .depend *.symvers .*.cmd *.ko *.mod.c .tmp_versions
Results
After loading the kernel module, the kernel log shows each deletion:
insmod kprobe_test.ko
[256838.004884] name : kprobe_test.mod.o pid : 10749 execname : rm
[256861.029438] name : .kprobe_test.c.swx pid : 10756 execname : vim
[256861.029456] name : .kprobe_test.c.swp pid : 10756 execname : vim
[256901.147068] name : .viminfo pid : 10756 execname : vim
[256901.262790] name : .kprobe_test.c.swp pid : 10756 execname : vim
[256913.980868] name : abc pid : 10767 execname : rm
[256929.192739] name : .messages.swpx pid : 10769 execname : vim
[256929.192767] name : .messages.swp pid : 10769 execname : vim
[256933.188426] name : .viminfo pid : 10769 execname : vim
[256933.289964] name : .messages.swp pid : 10769 execname : vim
[256938.301215] name : .kprobe_test.c.swx pid : 10770 execname : vim
[256938.301236] name : .kprobe_test.c.swp pid : 10770 execname : vim
[256978.605412] name : .viminfo pid : 10770 execname : vim
[256978.714638] name : .kprobe_test.c.swp pid : 10770 execname : vim
[257128.882336] name : .kprobe_test.c.swx pid : 10798 execname : vim
[257128.882354] name : .kprobe_test.c.swp pid : 10798 execname : vim
[257382.589794] name : .viminfo pid : 10798 execname : vim
[257382.699274] name : .kprobe_test.c.swp pid : 10798 execname : vim
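Summary
The code is short and does little, but writing it requires some source-level understanding of the kernel.
This capability could also be exposed as a metric, recording which process, or even which user, deleted a file and when.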
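As a first step toward such a metric, the log lines shown earlier could be parsed back into structured records. This is a minimal userspace sketch that assumes exactly the format printed by the module ("[timestamp] name : ... pid : ... execname : ..."); struct unlink_event and parse_unlink_line are hypothetical names, and file or process names containing spaces are not handled.

```c
#include <stdio.h>

/* One parsed deletion record (illustrative layout, not a kernel structure). */
struct unlink_event {
	double ts;		/* seconds since boot, from the log prefix */
	char name[256];		/* deleted file name (last path component) */
	int pid;		/* pid of the deleting process */
	char comm[16];		/* process name; the kernel's comm field is 16 bytes */
};

/*
 * Parse one log line produced by the module.
 * Returns 1 on success, 0 if the line does not match the expected format.
 */
int parse_unlink_line(const char *line, struct unlink_event *ev)
{
	int n = sscanf(line, "[%lf] name : %255s pid : %d execname : %15s",
		       &ev->ts, ev->name, &ev->pid, ev->comm);

	return n == 4;
}
```

Feeding it the line "[256913.980868] name : abc pid : 10767 execname : rm" fills in name "abc", pid 10767 and comm "rm"; an exporter could then count events per process name. Recording the user as well would need the module itself to log the uid, which the code in this post does not do.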