Analyzing a Kernel Oops with the Kernel Oops Analyzer Tool

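What is a kernel oops?

An "oops" is the error report the Linux kernel prints when it detects a fault in its own execution, such as dereferencing an invalid pointer. The name is not an acronym, and an oops is not the same thing as a kernel panic.
When the kernel hits an exception it cannot handle, it prints the oops message (registers, faulting instruction, call trace) to help developers diagnose and fix the problem.
On Linux systems, an oops is usually caused by a hardware fault, a driver bug, a memory-management problem, or some other abnormal condition.
After an oops the kernel normally kills the offending task and tries to keep running, although the system may be unstable afterwards. If the kernel.panic_on_oops sysctl is set, or the fault happens in a context the kernel cannot recover from (for example an interrupt handler), the oops escalates to a kernel panic and the system stops until it is rebooted.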
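
Whether an oops is escalated to a panic is controlled by the kernel.panic_on_oops sysctl, which is exposed at /proc/sys/kernel/panic_on_oops. The following is a minimal sketch for checking that setting on a running machine. The function name `oops_policy` and the wording of the returned strings are illustrative only, not part of any tool.

```python
from pathlib import Path


def oops_policy(sysctl_path="/proc/sys/kernel/panic_on_oops"):
    """Describe what the kernel is configured to do after an oops.

    The sysctl holds 0 (try to continue) or 1 (panic immediately).
    """
    try:
        value = Path(sysctl_path).read_text().strip()
    except OSError:
        # Not Linux, or /proc is not readable in this environment.
        return "unknown: sysctl not readable"
    if value == "0":
        return "continue: the kernel tries to keep running after an oops"
    return "panic: an oops is escalated to a kernel panic"


if __name__ == "__main__":
    print(oops_policy())
```

With kdump enabled, the panic path is what produces the vmcore and vmcore-dmesg files used in the rest of this article.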

Kernel Oops Analyzer

Kernel Oops Analyzer is an online oops analysis tool developed by Red Hat. It analyzes a crash dump by comparing the oops message against known issues in Red Hat's knowledge base.

Example

First, confirm that the OS generated a vmcore-dmesg file and that the file contains the oops message, as follows:

[ 2025.570010] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c2
[ 2025.570043] PGD 0 P4D 0 
[ 2025.570054] Oops: 0002 [#1] SMP NOPTI
[ 2025.570069] CPU: 6 PID: 10250 Comm: reboot Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[ 2025.570106] Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE180H-3.41]- 10/05/2022
[ 2025.570136] RIP: 0010:i40e_shutdown+0x11/0x120 [i40e]
[ 2025.570169] Code: 07 74 0b 48 83 c0 08 48 39 d0 75 e6 5b c3 5b e9 25 fd ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 fd 53 48 8b 9f 70 01 00 00 <f0> 80 8b c2 06 00 00 04 f0 80 8b c0 06 00 00 08 48 8d bb 10 08 00
[ 2025.570223] RSP: 0018:ffffb3dc094a7d90 EFLAGS: 00010282
[ 2025.570241] RAX: ffffffffc05b8620 RBX: 0000000000000000 RCX: 0000000000000000
[ 2025.570263] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff94a0858cd000
[ 2025.570285] RBP: ffff94a0858cd000 R08: ffffffffffffffff R09: ffffffffb6f7a180
[ 2025.570306] R10: 0000000000000001 R11: 0000000000000003 R12: ffff94a0858cd000
[ 2025.570328] R13: ffffffffb5b540fe R14: ffff94a0858cd138 R15: 0000000000000000
[ 2025.570349] FS:  00007fc1d974a980(0000) GS:ffff94a03fb80000(0000) knlGS:0000000000000000
[ 2025.570375] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2025.570393] CR2: 00000000000006c2 CR3: 0000000162086002 CR4: 00000000007706e0
[ 2025.570415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2025.570436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2025.570458] PKRU: 55555554
[ 2025.570468] Call Trace:
[ 2025.570481]  pci_device_shutdown+0x34/0x60
[ 2025.570499]  device_shutdown+0x165/0x1c5
[ 2025.570516]  kernel_restart+0xe/0x30
[ 2025.570533]  __do_sys_reboot+0x1d2/0x210
[ 2025.570547]  ? __switch_to_asm+0x35/0x70
[ 2025.570564]  ? __switch_to_asm+0x41/0x70
[ 2025.570578]  ? __switch_to_asm+0x35/0x70
[ 2025.570592]  ? __switch_to_asm+0x41/0x70
[ 2025.570606]  ? __switch_to_asm+0x35/0x70
[ 2025.570619]  ? __switch_to_asm+0x41/0x70
[ 2025.570633]  ? __switch_to_asm+0x35/0x70
[ 2025.570647]  ? __switch_to_asm+0x41/0x70
[ 2025.570661]  ? __switch_to_asm+0x35/0x70
[ 2025.570675]  ? __switch_to_asm+0x41/0x70
[ 2025.570689]  ? __switch_to_asm+0x35/0x70
[ 2025.570703]  ? __switch_to_asm+0x41/0x70
[ 2025.570716]  ? __switch_to_asm+0x35/0x70
[ 2025.570730]  ? __switch_to_asm+0x41/0x70
[ 2025.570744]  ? __switch_to+0x10c/0x450
[ 2025.570760]  ? syscall_trace_enter+0x1fb/0x2c0
[ 2025.570778]  do_syscall_64+0x5b/0x1a0
[ 2025.571314]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 2025.571830] RIP: 0033:0x7fc1d89c34b7
[ 2025.572334] Code: 01 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 a1 79 29 00 f7 d8 64 89 02 b8
[ 2025.573399] RSP: 002b:00007ffd9e0cf698 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
[ 2025.573937] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc1d89c34b7
[ 2025.574472] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 2025.575011] RBP: 00007ffd9e0cf6e0 R08: 0000000000000002 R09: 0000000000000000
[ 2025.575538] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001
[ 2025.576047] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000
[ 2025.576535] Modules linked in: binfmt_misc bonding tls resguard_linux(OE) secmodel_linux(OE) syshook_linux(OE) xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc sunrpc vfat fat sddlmfdrv(POE) sddlmadrv(POE) intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul ghash_clmulni_intel rapl ses intel_cstate enclosure scsi_transport_sas intel_uncore pcspkr joydev ioatdma st mei_me ch mei i2c_i801 ipmi_ssif dca lpc_ich wmi ipmi_si acpi_power_meter acpi_pad xfs libcrc32c sd_mod sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops qla2xxx drm nvme_fc i40e megaraid_sas nvme_fabrics tg3 crc32c_intel nvme_core ahci libahci t10_pi libata scsi_transport_fc i2c_algo_bit dm_mirror dm_region_hash dm_log
[ 2025.576615]  dm_mod ipmi_devintf ipmi_msghandler fuse
[ 2025.580737] CR2: 00000000000006c2
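
The log above is easier to read once its key fields are pulled out. The sketch below is a hedged illustration, not part of the Red Hat tool: it assumes the x86_64 message layout shown above (timestamps in square brackets, kernel-mode RIP lines prefixed with `0010:`), and the names `summarize_oops` and `decode_error_code` are made up for this example. The three decoded bits are the low bits of the x86 page-fault error code printed on the `Oops:` line.

```python
import re

# Every dmesg line in the sample starts with a "[ seconds.micros]" stamp.
STAMP = r"^\[\s*\d+\.\d+\]\s+"

# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = (
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
)


def decode_error_code(code):
    """Translate the numeric code from the 'Oops:' line into words."""
    return [on if code & mask else off for mask, on, off in PF_BITS]


def summarize_oops(text):
    """Extract the fields most useful for a first look at an oops."""
    summary = {}
    m = re.search(STAMP + r"Oops: ([0-9a-fA-F]+) \[#\d+\]", text, re.M)
    if m:
        summary["error_code"] = decode_error_code(int(m.group(1), 16))
    m = re.search(STAMP + r"CPU: (\d+) PID: (\d+) Comm: (\S+)", text, re.M)
    if m:
        summary["cpu"] = int(m.group(1))
        summary["pid"] = int(m.group(2))
        summary["comm"] = m.group(3)
    # Only the kernel-mode RIP line has the symbol+offset/size form.
    m = re.search(
        STAMP + r"RIP: 0010:([\w.]+)\+(0x[0-9a-f]+)/0x[0-9a-f]+(?: \[(\w+)\])?",
        text,
        re.M,
    )
    if m:
        summary["function"] = m.group(1)
        summary["offset"] = int(m.group(2), 16)
        summary["module"] = m.group(3)
    m = re.search(STAMP + r"CR2: ([0-9a-f]+)", text, re.M)
    if m:
        summary["fault_address"] = int(m.group(1), 16)
    return summary
```

Applied to the log above, this reports a kernel-mode write to a non-present page at address 0x6c2, in `i40e_shutdown` at offset 0x11 in the `i40e` module, triggered by the `reboot` command. Together with RBX being 0 in the register dump, that is consistent with the driver writing to a field at offset 0x6c2 of a structure whose pointer was NULL.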

Accessing the Kernel Oops Analyzer tool

Log in to the site: https://access.redhat.com/labs/kerneloopsanalyzer/

To diagnose the kernel crash, upload the kernel oops log generated alongside the vmcore.

Click DETECT to compare the oops message, based on the information from makedumpfile, against known solutions.
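
Before uploading, it can help to confirm that the vmcore-dmesg file really contains an oops and to see where it starts and ends. The following is a minimal sketch under stated assumptions: the start markers (`BUG:` or `Oops:`) and the end marker (`---[ end trace`) are taken from common kernel output, the function name `extract_oops` is hypothetical, and whether the analyzer needs the whole file or only this portion is not stated here, so the result is meant for inspection rather than as a required preprocessing step.

```python
import re

START = re.compile(r"\b(BUG:|Oops:)")   # first line of the report
END = re.compile(r"---\[ end trace")     # marker the kernel prints after the report


def extract_oops(lines):
    """Return the lines from the first oops marker through the end-trace
    marker, or through the end of input if no end marker is present."""
    block = []
    for line in lines:
        if not block and not START.search(line):
            continue  # still in ordinary boot or runtime messages
        block.append(line.rstrip("\n"))
        if END.search(line):
            break
    return block


if __name__ == "__main__":
    import sys

    # Usage: python extract_oops.py /path/to/vmcore-dmesg.txt
    with open(sys.argv[1], errors="replace") as f:
        print("\n".join(extract_oops(f)))
```

An empty result means no oops marker was found, in which case the file is probably not the one to upload.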
