Skip to content

can't mount a disk with writeboost cache because of kernel crash on writeboost module. #235

Description

@fbhradec

So, I'me having trouble to mount a ext4 disk with a writeboost cache after a power loss.

The disk and cache are vfio scsi, inside a KVM virtual machine.

once the machine boots, dmwriteboost starts to "mount" the cache device on top of the disk, and a few minutes after I see this in dmesg:

[  168.932006] rcu: INFO: rcu_sched self-detected stall on CPU                                                                                                                                                        
[  168.932352] rcu:     3-....: (5249 ticks this GP) idle=caa/1/0x4000000000000002 softirq=950/950 fqs=2623                                                                                                           
[  168.932691] rcu:      (t=5250 jiffies g=4717 q=6790)                                                                                                                                                                                            
[  168.933036] NMI backtrace for cpu 3                                                                                                                                                                                
[  168.933042] CPU: 3 PID: 315 Comm: dmsetup Tainted: G           OE     4.19.0-18-amd64 #1 Debian 4.19.208-1
[  168.933042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  168.933045] Call Trace:                                                                                                                                                                                                                                                                                         
[  168.933055]  <IRQ>                                                                                                                                                                                                                                                                                              
[  168.933062]  dump_stack+0x66/0x81                                                                                                                                                                                                                                                                               
[  168.933063]  nmi_cpu_backtrace.cold.4+0x13/0x50                                                                                                                                                                                                                                                                 
[  168.933068]  ? lapic_can_unplug_cpu+0x80/0x80                                                                                                                                                                                                                                                                   
[  168.933072]  nmi_trigger_cpumask_backtrace+0xf9/0x100                                                                                                                                                                                                                                                           
[  168.933073]  rcu_dump_cpu_stacks+0x9b/0xcb                                                                                                                                                                                                                                                                      
[  168.933074]  rcu_check_callbacks.cold.81+0x1db/0x335                                                                           
[  168.933076]  ? tick_sched_do_timer+0x60/0x60                                                                                                                                                                       
[  168.933078]  update_process_times+0x28/0x60                                                                                                                                                                                                                                                                    
[  168.933080]  tick_sched_handle+0x22/0x60                                                 
[  168.933081]  tick_sched_timer+0x37/0x70                                                                                                                                                                            
[  168.933083]  __hrtimer_run_queues+0x100/0x280                                                                                                                                                                      
[  168.933084]  hrtimer_interrupt+0x100/0x210                                                                                     
[  168.933086]  ? handle_irq_event+0x47/0x60                                                                                                                                                                          
[  168.933087]  smp_apic_timer_interrupt+0x6a/0x140                                                          
[  168.933090]  apic_timer_interrupt+0xf/0x20                                                                                     
[  168.933090]  </IRQ>                                                                                                                                                                                                
[  168.933093] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20                                                                                                                                                        
[  168.933094] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 e8 66 31 99 ff 66 90 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07                                                                                             
[  168.933095] RSP: 0018:ffffa67cc3acfaf0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13                                                                                                                                                                                                                              
[  168.933096] RAX: 0000000000000001 RBX: ffff89dfbcdf7868 RCX: 000000000001fffe                                                                                                                                                                                                                                   
[  168.933096] RDX: 0000000000000001 RSI: 0000000000000206 RDI: 0000000000000206                                                                                                                                                                                                                                   
[  168.933097] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff89dfbcdf7978                                                                                                                                                                                                                                   
[  168.933097] R10: ffff89dfb8419000 R11: 0000000000000001 R12: ffffa67cdd2eaf78                                                  
[  168.933097] R13: ffffa67cdd2eae60 R14: ffff89dfb8419000 R15: 0000000000000000                                                                                                                                      
[  168.933102]  mark_clean_mb+0x34/0x40 [dm_writeboost]                                                                                                                                                               
[  168.933104]  mark_clean_seg+0x3d/0x60 [dm_writeboost]                                   
[  168.933105]  resume_cache+0x558/0x910 [dm_writeboost]                                                                                                                                                                                                                                                           
[  168.933107]  writeboost_ctr+0x408/0x6a0 [dm_writeboost]                                                                                                                                                                                                                                                         
[  168.933117]  dm_table_add_target+0x17d/0x360 [dm_mod]                                                                          
[  168.933120]  table_load+0x122/0x2e0 [dm_mod]                                                                                                                                                                                                                                                                    
[  168.933123]  ? dev_status+0x40/0x40 [dm_mod]                                                                                                                                                                                                                                                                    
[  168.933125]  ctl_ioctl+0x1af/0x3f0 [dm_mod]                                                                                                                                                                        
[  168.933128]  dm_ctl_ioctl+0xa/0x10 [dm_mod]
[  168.933130]  do_vfs_ioctl+0xa4/0x630
[  168.933134]  ? ksys_semctl+0x129/0x160
[  168.933135]  ksys_ioctl+0x60/0x90
[  168.933136]  __x64_sys_ioctl+0x16/0x20
[  168.933138]  do_syscall_64+0x53/0x110
[  168.933140]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  168.933142] RIP: 0033:0x7fabe6e0d427
[  168.933144] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[  168.933144] RSP: 002b:00007ffe93cb46b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  168.933145] RAX: ffffffffffffffda RBX: 00007fabe6eeba7f RCX: 00007fabe6e0d427
[  168.933145] RDX: 00005622fbcced40 RSI: 00000000c138fd09 RDI: 0000000000000003
[  168.933146] RBP: 00007ffe93cb4770 R08: 00007fabe6f3a310 R09: 00007ffe93cb4520
[  168.933146] R10: 00007fabe6f3965a R11: 0000000000000206 R12: 00007fabe6f3965a
[  168.933147] R13: 00007fabe6f3965a R14: 00007fabe6f3965a R15: 00007fabe6f3965a
[  196.244011] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [dmsetup:315]
[  196.244369] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel qxl ttm drm_kms_helper intel_rapl_perf drm evdev joydev virtio_console serio_raw pcspkr sg virtio_balloon qemu_fw_cfg button nfsd auth_rpcgss dm_writeboost(OE) libcrc32c nfs_acl lockd dm_mod grace su
nrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb sr_mod cdrom sd_mod ata_generic virtio_net net_failover failover virtio_scsi crc32c_intel ata_piix libata uhci_hcd ehci_hcd aesni_intel scsi_mod psmouse usbcore aes_x86_64 crypto_simd cryptd glue_helper i2c_piix4 virtio_pci
 virtio_ring virtio usb_common floppy
[  196.244392] CPU: 3 PID: 315 Comm: dmsetup Tainted: G           OE     4.19.0-18-amd64 #1 Debian 4.19.208-1

this message keeps repeating for a while until the vm freezes.

I've tried to mount the disk without writeboost, but it won't mount. Even fsck can't fix it, probably because there's data on the cache that needs to be flushed to disk.

Is there any way to force the cache data to be dumped to disk (hence fixing the disk) in a situation like this?

Do you think upgrading the kernel (currently 4.19.0-18 debian) would help getting writeboost to finish the initialization without crashing?

thanks for any insights...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions