grsec-unoff (RAP) related Call Traces, 171114-1000 oops
(No. 0) 171114-1000-manu 171117-1426-oops 171118-0933-rsys 171118-1030-none 171122-1348-rsys 171123-1254 171123-1530 171124-0102-none 180101-1917-rsync
Here is the Call Trace, but don't miss reading, further down in the text, the circumstances in which this happened. The time of this trace, 2017-11-14 10:00, is approximate. I took the trace down manually and only later looked in the logs, but there was no trace there, and too little logging in a sysvinit system (see the 171118-1030 oops about the logging, and about how a missed --non-recorded, "none"-- Call Trace looks in the logs; I mean just the line before and the line after where it should have appeared).
NOTES START: These lines in the text further below:
[2706.105207] RIP: 0010:[] [ ] mb_mark_used+0x14a/0x3a0 609.812919] BUG: unable to handle kernel paging request at
really did show like that on the black frozen screen. I try to record manually, and very carefully, when I can. The two lines are obviously both garbled, and the "BUG:" line is incomplete.
Also this one, a little further on from the "BUG:" line:
[2706.105455] R13: ffffc9000be77a30 R14: 00000000000010eb R15: 0000000000000001171031-19 [2706.105497] FS: 000003291ce5e740(0000) GS:ffff88032fc80000(0000) knIGS:0000000000000000IOS P2.60 11/11/2013
The "171031-19" string fused onto the end of the R15 value (which itself ends in "1") is part of the name of the kernel in which this Oops happened. It's this kernel: /boot/vmlinuz-4.9.59-unofficial+grsec171031-19
and the "IOS P2.60 11/11/2013" is part of the string
"To Be Filled By O.E.M./970 Extreme4, BIOS P2.60 11/11/2013" (see some of the other Call Traces available in this section for such strings), which for some reason made it only very partially onto the screen as it was freezing.
NOTES END
[2706.104946] IP: [] mb_mark_used+0x14a/0x3a0
[2706.104986] PGD 2c3b067
[ 2706.105001] PUD 2c3d063 PMD 0
[ 706.105024]
[2706.105036] Oops: 0002 [#2] SMP
[2706.105114] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./970 Extreme4, BIOS P2.60 11/11/2013
[2706.105172] task: ffff88031de7c240 task.stack: ffffc9000be74000 s
[2706.105207] RIP: 0010:[ ] [ ] mb_mark_used+0x14a/0x3a0 609.812919] BUG: unable to handle kernel paging request at
[2706.105256] RSP: 0018:ffffc9000be77898 EFLAGS: 00010202
[2706.105287] RAX: 0000000000000001 RBX: ffff88025d55b0b0 RCX: 0000000000000001
[2706.105329] RDX: ffffc9000be778ac RSI: 0000000000000001 RDI: ffffc9000be77a30
[2706.105371] RBP: ffffc9000be778e0 R08: ffff88031d817000 R09: 0000000000000003
[2706.105413] R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000001
[2706.105455] R13: ffffc9000be77a30 R14: 00000000000010eb R15: 0000000000000001171031-19
[2706.105497] FS: 000003291ce5e740(0000) GS:ffff88032fc80000(0000) knIGS:0000000000000000IOS P2.60 11/11/2013
[2706.105544] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2706.105578] CR2: ffffffffbfd70bec CR3: 0000000002c24000 CR4: 00000000000006f0 0x1b0
[2706.105620] Stack:
[2706.105633] 0001000100000001 00000001000021d6 000000010be77918 f1d89d5b3aaa2a4c
[2706.105678] ffff88025d55b080 ffffc9000be77a30 ffff88031df28800 ffff88031dd55000
[2706.105720] 00000000000044b8 ffffc9000be77918 ffffffff81475496 ffff88025d55b080
[2706.105763] Call Trace: 0 0 f
[2706.105781] [ ] ext4_mb_use_best_found+0x86/0x1b0
[2706.105819] [ ] ext4_mb_check_limits+0xe7/0x110 0000000
[2706.105855] [ ] ext4_mb_complex_scan_group+0x255/0x470
[2706.105896] [ ] ext4_mb_regular_allocator+0x2d5/0x6d0 006f0
[2706.105936] [ ] ? __list_add+0x59/0xc0
[2706.105968] [ ] ext4_mb_new_blocks+0x6be/0xf50 0bd73bd8
[2706.106005] [ ] ? ext4_find_extent+0x29b/0x3c0 ffffffff
[2706.106041] [ ] ext4_ext_map_blocks+0xbc4/0x1640 000000000
[2706.106079] [ ] ? __getblk_gfp+0x72/0x580
[2706.106143] [ ] ext4_map_blocks+0x24f/0xaf0
[2706.106178] [ ] ? ext4_mark_inode_dirty+0xa9/0x370
[2706.106216] [ ] ext4_getblk+0x92/0x380
[2706.106247] [ ] ext4_bread+0x58/0x150
[2706.106280] [ ] ext4_append+0xdd/0x420 x100
[2706.106313] [ ] ext4_mkdir+0x343/0x750
[2706.106346] [ ] vfs_mkdir+0x343/0x750
[2706.106377] [ ] rap_sys_mkdir+0xe9/0x220
[2706.106412] [ ] entry_SYSCALL_64_fastpath+0x1e/0xec
[2706.106450] Code: 00 00 8b 4d c4 8b 7d bc 89 c8 c1 e0 10 0b 45 b8 85 ff 0f 45 c7 85 c9 89 45 bc 0f 8e 01 02 00 00 44 8b 7d c4 48 8d 55 cc 4c 89 ef <44> 89 fe eb 0b d7 bf 27 a8 ff ff ff ff cc cc cc e8 61 91 ff ff
[2706.106664] RIP [ ] mb_mark_used+0x14a/0x3a0
[2706.106696] RSP [ ]
[2706.107917] CR2: [ ]
[2706.109137] Kernel panic - not syncing: grsec: halting the system due to suspicious kernel crash caused by root
[2706.110442] Kernel Offset: disabled
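As a reading aid for the "Oops: 0002" line in the trace: for a page-fault oops ("unable to handle kernel paging request"), the number after "Oops:" is the x86 page-fault error code, a small bit field. The following is only a minimal sketch in Python of how those bits can be decoded; the function name decode_oops_code and the wording of the descriptions are my own illustrative choices, not anything taken from the kernel source, and the result says nothing about why the fault happened.

```python
# Bits of the x86 page-fault error code, as printed after "Oops:" for
# page-fault oopses. Each entry: (mask, meaning if set, meaning if clear).
# The descriptive strings are illustrative wording, not kernel output.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return a list of human-readable meanings for a hex error code string."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    # The code from the trace above.
    print(decode_oops_code("0002"))
```

Read this way, the 0002 in the trace would mean a write, from kernel mode, to an address with no page mapped; the CR2 line gives the address that was being accessed.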
Now to the circumstances in which this happened. You won't believe it! After following this thread on the Dyne.org Devuan DNG mailing list:
Google abandons UEFI in Chromebooks
I decided to revert to the old BIOS-style booting. I had moved to EFI booting and tried it for some time --I won't deny that I never found the time to complete the full disk encryption on the HDD while it was booting EFI, i.e. the boot partition remained unencrypted-- but after reading that thread on the Devuan ML, I decided to revert to BIOS-style booting and, at the same time, to encrypt /boot as well.
You can read how full disk encryption --a very useful feature in GNU/Linux security-- is done at:
Installing to existing partitions/mount? Full disk encrypt? Feedback.
I've told you all this to make it clear that the Call Traces of 2017-11-14, and maybe a little after, all happened after I had reverted to BIOS booting and encrypted the /boot partition.
Can you believe it? I can, because I know it must have been something related to that change back to BIOS and encrypted boot that somehow caused all the traces of 2017-11-14 and a little after... Because I keep an eye on my system, so I know... But I can't stop wondering about it...
Anyway, this is what happened after the Call Trace above. Upon a reboot, or maybe it was a start from power-off, I got this notice at the point where the kernel needed to be booted:
early console in extract_kernel
input_data: 0x0000000002d703b4
input_len: 0x000000000086c7ec
output: 0x0000000001000000
output_len: 0x00000000024b9868
kernel_total_size: 0x0000000002600000
Decompressing Linux...
XZ-compressed data is corrupt
-- System halted
Before I explain what I changed and then booted successfully afterwards, let me remind you of how I maintain my systems.
I build in Air-Gap. I have some three machines with the same MBO; getting same-size HDDs is easy enough, and so is partitioning them the same way. The system that I build in Air-Gap truly --well, as far as an advanced but non-expert user can manage-- never sees the internet. It's relatively easy to dump a partition with dd and restore it with dd on another machine. That's the short explanation. If you need more, search for Air-Gapped Installation and my name. You should be able to find it in the Gentoo Forums, Debian Forums and Devuan Forums.
I remember that, after I switched back to BIOS booting (having had an unencrypted /boot partition under EFI booting), I had total system freezes with no response whatsoever, and only about once in three times did I get a trace like the one above.
Importantly --even though I no longer remember clearly-- I think there were very few of these problems on the Air-Gapped system; they mostly happened on the for-online clone of it.
I.e. maybe, or even likely, intrusions had an influence on those events.
Some attacks I can prove, but that study, "A noisy MiTM attack", is not ready --the link hasn't come to life yet at the time of posting this, and I may decide on a different title for it.
I first suspected that some corruption had really happened, and compared the /boot partition with that of the Air-Gapped system. I also copied some file from the master Air-Gapped system (the initramfs, if I remember correctly), then chrooted into the affected system and ran the usual grub-install, grub-mkconfig and update-initramfs commands. But later I realized the changes weren't even necessary. Nothing was really corrupted in /boot!
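For illustration only, here is a minimal Python sketch of how such a comparison of two /boot trees could be scripted by hashing every file with SHA-256. It is not the procedure I actually used at the time; the function names hash_tree and compare_trees are hypothetical, and the sketch assumes both /boot file systems are mounted and readable (for example the Air-Gapped copy restored from a dd image and mounted read-only).

```python
import hashlib
import os
import sys


def hash_tree(root):
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large kernels/initramfs fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_trees(reference_root, suspect_root):
    """Report files missing from, extra in, or different in the suspect tree."""
    ref = hash_tree(reference_root)
    sus = hash_tree(suspect_root)
    common = set(ref) & set(sus)
    return {
        "missing": sorted(set(ref) - set(sus)),
        "extra": sorted(set(sus) - set(ref)),
        "changed": sorted(p for p in common if ref[p] != sus[p]),
    }


if __name__ == "__main__":
    # Usage (paths are placeholders): python3 compare_boot.py REFERENCE_DIR SUSPECT_DIR
    if len(sys.argv) == 3:
        for kind, paths in compare_trees(sys.argv[1], sys.argv[2]).items():
            for p in paths:
                print(kind, p)
```

Note that files regenerated on each machine (an initramfs rebuilt by update-initramfs, for instance) may legitimately differ between clones, so a "changed" entry is a prompt to look closer, not proof of corruption.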
However, that nothing was really corrupted --except maybe in the HDD's cache or some other place in the whole system; every system is almost a tiny universe, that's how complex computing is...-- I figured out by reasoning like the above: by making guesses, and concluding that the guesses were likely correct when the outcome was the one I expected from them.
In short, I'm now pretty sure nothing was corrupted in /boot itself, because I had the same thing again just lately (I'm writing this very page some eight days after the day of the trace, i.e. at the writing of this very line it is 2017-11-22 21:00).
And I had pretty much the same
XZ-compressed data is corrupt -- System halted
notice after this morning's Call Trace with the 4.9.63 grsec-unoff kernel, which you can read locally or on GitHub.
Actually, I'm writing all this to corroborate the claim that I made in that GitHub comment, where I promised I would "hopefully be able to explain the circumstancial indications to that". Namely, after getting that "XZ-compressed data is corrupt" message this morning, I just rebooted, and after the reboot the system worked fine and has continued to do so for five hours since. I was already pretty sure the reboot would be fine, because on at least one occasion between 2017-11-14 and 2017-11-22 I needed to do the same and the reboot was successful, but this (at least somewhat vague) story is already very lengthy.
There, I have recounted as much as I have figured out (maybe only "vaguely" figured out, as I admit later in that issue #13).
Just what exactly I am dealing with here, I don't think many people could tell, even if they had my system at their fingertips. Please correct me if I'm wrong!
---
The verifiable files necessary for this study, if any, are listed in the main page of this section.
---