Sažetak |
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix tree mod log mishandling of reallocated nodes
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our e
---truncated--- |