systems with SELinux fail to activate MD arrays with MD 4.5 #259

@mwilck

I have spent the last two days debugging an odd issue we found with mdadm 4.5. I'd mentioned it on #245. It has been observed on openSUSE Tumbleweed, but because openSUSE uses the same SELinux setup as Fedora, it will probably affect Fedora and possibly other distros, too.

This is not actually an mdadm "bug", but I felt that it deserves being reported here anyway.

We observed boot failures on systems with MD RAID: the MD arrays were not activated. On closer inspection, it turned out that the md_mod module was never loaded. Strangely, simply running mdadm commands from the shell would work. Looking further, we found that mdadm's attempts to call modprobe md_mod were blocked by SELinux. Consequently, we created an SELinux PR. Initially I wasn't convinced that relaxing the SELinux rules for this was a good idea, but I have since come to think it's the best option we have.

However, the respective part of the SELinux rules had been unchanged for quite some time. So why didn't we observe the problem with SLE 16.0? The reason is subtle, and related to the mdadm version update.

On 16.0, too, I observed that mdadm's attempts to load md_mod from udev rules were prevented by SELinux. The reason that MD arrays could be set up nonetheless is an old kernel feature called CONFIG_BLOCK_LEGACY_AUTOLOAD, which I hadn't heard of before:

config BLOCK_LEGACY_AUTOLOAD
        bool "Legacy autoloading support"
        default y
        help
          Enable loading modules and creating block device instances based on
          accesses through their device special file.  This is a historic Linux
          feature and makes no sense in a udev world where device files are
          created on demand, but scripts that manually create device nodes and
          then call losetup might rely on this behavior.
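To check whether a given kernel still has this option enabled, something like the following Python sketch could be used. It is only an illustration: the helper names are made up for this example, and the config file locations (/proc/config.gz and /boot/config-<release>) are assumptions that do not hold on every distro.

```python
import gzip
import os
import platform


def config_option_value(config_text, option):
    """Return the value of CONFIG_<option> in kernel config text, or None if unset."""
    key = "CONFIG_" + option
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and are skipped here.
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
    return None


def read_kernel_config():
    """Read the running kernel's config from assumed locations, or return None."""
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            return f.read()
    return None


# Example use (result depends on the system):
#   text = read_kernel_config()
#   if text is not None:
#       print(config_option_value(text, "BLOCK_LEGACY_AUTOLOAD"))
```

A result of None from config_option_value() means the option is not set in the text that was read; it does not say anything about kernels whose config could not be found.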

I realized this because on 16.0, I observed this message in the kernel logs:

kernel: block device autoloading is deprecated and will be removed.
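For anyone who wants to check their own systems for this, a small Python filter over kernel log lines might look like the sketch below. The function name is hypothetical, and the match string is simply taken from the message quoted above.

```python
AUTOLOAD_WARNING = "block device autoloading is deprecated"


def autoload_warnings(log_lines):
    """Return the log lines that contain the legacy-autoload deprecation warning."""
    return [line.rstrip("\n") for line in log_lines if AUTOLOAD_WARNING in line]


# Example use with the kernel messages of the current boot:
#   import subprocess
#   out = subprocess.run(["journalctl", "-k", "-b", "--no-pager"],
#                        capture_output=True, text=True).stdout
#   for hit in autoload_warnings(out.splitlines()):
#       print(hit)
```

If this finds a hit on a system with MD arrays, that system may be relying on the legacy autoload path rather than on an explicit modprobe.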

This message was introduced to the kernel back in 15.7 (2022) (fbdee71 ("block: deprecate autoloading based on dev_t")), but the feature is still there, and is still enabled in recent kernels. With this feature, if someone opens a block device node for which the kernel has no driver, the kernel calls request_module() to try to load the respective driver.

And this is what mdadm relied on until 4.4. I am not sure if that was intentional, but if it failed to create an array via normal means, in dev_open() it'd call mknod() to create a temporary device node with major:minor 9:0, and open() it. This open() would cause the kernel to load the md_mod module in blkdev_get_no_open(), at which point everything starts to work as if md_mod had been loaded beforehand: the MD device is created, and the array is possibly started.
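The mknod()/open() sequence can be sketched in Python as follows. This is not mdadm's code (mdadm is written in C); the function names and the temporary node name are made up for illustration, and the call needs root privileges to create a block device node.

```python
import os
import stat

MD_MAJOR = 9  # block major of MD devices, as in the 9:0 mentioned above


def md_devnum(minor=0):
    """Return the device number for MD major 9 and the given minor."""
    return os.makedev(MD_MAJOR, minor)


def open_via_temp_node(devnum, directory="/dev"):
    """Create a temporary block node for devnum, open it, then remove the node.

    Returns an open file descriptor. With CONFIG_BLOCK_LEGACY_AUTOLOAD enabled,
    the open() is what can make the kernel request the missing driver module.
    Raises OSError if mknod() or open() fails (for example without root).
    """
    # Hypothetical node name; the real name used by mdadm may differ.
    path = os.path.join(directory, ".tmp.md.%d" % os.getpid())
    os.mknod(path, stat.S_IFBLK | 0o600, devnum)
    try:
        return os.open(path, os.O_RDONLY)
    finally:
        # The node is only needed for the open() itself.
        os.unlink(path)


# Example use (as root):
#   fd = open_via_temp_node(md_devnum(0))
#   os.close(fd)
```

Note that the module load here is a side effect inside the kernel, which would explain why an SELinux rule that blocks mdadm from running modprobe does not get in the way.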

mdadm 4.5 changed this code in d354d31 ("mdadm: Create array with sync del gendisk mode"). If mdadm can't access a certain sysfs attribute of the md_mod module early on, it prints an error message and quits, and never gets to the point where earlier mdadm versions would try the mknod()/open() trick.
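The difference between the two versions can be summarized with a toy model in Python. This is a deliberate simplification of the behavior described above, not a rendering of mdadm's actual control flow; the function, its parameters, and the outcome strings are all invented for this sketch.

```python
def try_activate(check_sysfs_first, sysfs_attr_readable, autoload_enabled):
    """Toy model of array activation; returns a short outcome string.

    check_sysfs_first:   True models mdadm 4.5, False models mdadm <= 4.4.
    sysfs_attr_readable: True if the md_mod sysfs attribute is accessible,
                         which in this model implies md_mod is already loaded.
    autoload_enabled:    True if CONFIG_BLOCK_LEGACY_AUTOLOAD is active.
    """
    if check_sysfs_first and not sysfs_attr_readable:
        # 4.5: bail out early, the fallback below is never reached.
        return "error: md_mod sysfs attribute not accessible"
    if sysfs_attr_readable:
        return "activated"
    # <= 4.4 fallback: mknod()/open() on 9:0 triggers the kernel autoload.
    if autoload_enabled:
        return "activated via legacy autoload"
    return "error: md driver not available"


# modprobe blocked by SELinux, so md_mod is not loaded when mdadm starts:
print(try_activate(False, False, True))  # models mdadm <= 4.4
print(try_activate(True, False, True))   # models mdadm 4.5
```

In this model, the only case where the two versions differ is the one we hit: md_mod not loaded, modprobe blocked, and legacy autoloading still enabled in the kernel.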

To avoid misunderstanding: I am not arguing that the old behavior should be reinstated. It is indeed "legacy" behavior. I just wanted to note this as an explanation why the behavior has changed with mdadm 4.5.

CC: @XiaoNi87, @mtkaczyk, @ncroxon, @ca-hu
