pjha2908-ad/kernel_prog

Link:
========

Command:
==========

Command Description
df(disk free) The df command shows disk space usage for all mounted filesystems.
Options:
-h : human readable (MB/GB)
--local: locally mounted filesystem
echo c > /proc/sysrq-trigger
  • The Command: It writes the "c" (crash) command key to the kernel's SysRq (System Request) handler.
  • The Result: The kernel crashes on the spot. Older kernels do this with a deliberate NULL pointer
    dereference; newer ones call panic() directly.
  • Why do this? Developers use it to test whether their Kdump (Kernel Dump)
    configuration is working. If set up correctly, the system won't just freeze;
    it will save a "memory snapshot" (vmcore) to your disk so you can analyze exactly
    why it crashed later.
efibootmgr -v To view the active boot entries registered in your NVRAM, we can use the efibootmgr command.
  • efibootmgr makes managing UEFI boot targets possible.
  • You can use efibootmgr to list all current boot targets.
  • Active boot entries are marked with a *
free -m The free -m command provides a snapshot of your system's memory (RAM) usage in mebibytes (MiB).
free -mh
  • total: Your VM's total assigned RAM.
  • used: RAM currently taken by Ubuntu and your running processes.
  • free: RAM that is completely empty.
  • buff/cache: RAM the kernel is using for things like the SLAB and disk
    caching to speed things up.
  • available: The "real" amount of RAM you can still use before the system starts
    slowing down.
  • Gi stands for Gibibytes.
  • While we often use "GB" (Gigabytes) in everyday conversation, computers and Linux
    tools like free use binary units (powers of 2) rather than decimal units (powers
    of 10) to be more precise about memory.
getconf -a | grep CACHE_LINESIZE CPU cacheline detail. (The CPU reads and writes data between its caches and RAM in an atomic unit called
the CPU cacheline.) (The unit is bytes, not bits.)
grep <option> Options:
  • -v: In grep, the -v (or --invert-match) flag is used to invert the search.
    Instead of showing lines that match a pattern, it excludes them and displays only the
    lines that do not contain the specified string.
  • -n: the -n (or --line-number) flag prefixes each matching line with its 1-based
    line number from the input file.
  • -w: the -w (or --word-regexp) flag forces the search to match the whole
    word only.
grub2-mkconfig grub2-mkconfig is the actual command-line tool that generates the final grub.cfg
file.
head Makefile To know kernel version. (Check top-level Makefile)
insmod Insert module. It loads a kernel module (.ko file) directly into the running kernel.
ls -R /boot/efi Safely view /boot/efi files.
lsmod List all currently loaded kernel modules.
Each line has 3 columns:
  • Module: Name of the kernel module
  • Size: Module size in bytes
  • Used by: Usage count, followed by the names of the modules that depend on it
lspci lspci is a Linux command that lists all PCI and PCI‑Express devices on your system.
PCI devices include:
  • Network cards (e.g., e1000e, igb, ixgbe)
  • Storage controllers (NVMe, SATA)
  • GPUs
  • USB controllers
  • Audio devices
  • Bridges, root ports, PCI switches

Options:
  1. -vv: Show detailed information.
  2. -k: Show which kernel driver is attached.
  3. -nn: Show vendor and device codes as both names and numeric IDs.
lstopo Visualizes NUMA hierarchy.
man -k grub Searches man page names and short descriptions for the keyword grub (equivalent to apropos grub).
make defconfig Default Kernel Configuration.
It generates a standard, safe, general-purpose configuration.
  • Boots on nearly all systems
  • Enables commonly used filesystems (ext4, btrfs, FAT, etc.)
  • Enables basic networking, USB, PCI, etc.
  • Includes many drivers → so build is still large
  • Does not include all the modules your Ubuntu kernel has
make distclean Does everything mrproper does, plus removes editor backup and patch files. Run this command in
the root of the kernel source tree. It is useful when you want to restart
the kernel build procedure from scratch.
make help to see make command option details
make -j8 Runs up to eight build jobs in parallel. All the
build processes write to the same stdout location - the console or terminal
window. Hence, the output may be out of order or mixed up.
make localmodconfig It reads currently loaded kernel modules from /proc/modules,
hardware info from /sys, /proc, lsmod and creates a new .config file.
  • It keeps only the kernel features your system actually uses.
  • Removes thousands of unused drivers.
  • Can make kernel compilation several times faster.
make menuconfig UI to fine-tune kernel configuration.
  • [*]: On, feature compiled and built in to the
    kernel image. (y)
  • [ ]: Off, not built at all (n).
  • <*>: On, feature compiled and built in to
    the kernel image (y).
  • <M>: Module, feature compiled and built as
    a kernel module (an LKM) (m)
  • < >: Off, not built at all (n)
  • { . }: A dependency exists for this config option;
    hence, it's required to be built or compiled as either a module (m) or to
    the kernel image (y).
  • -*- : A dependency requires this item to be compiled
    in (y).
  • ( … ): Prompt: an alphanumeric input is required.
    Press the Enter key while on this option and a prompt box appears.
  • <Menu name> ---> A sub-menu follows. Press Enter
    on this item to navigate to the sub-menu.
make modules Compile .ko (kernel object) file
make modules_install Installs the kernel modules. Sudo is not required if INSTALL_MOD_PATH refers
to a location that does not require root for writing.
make oldconfig Update current config utilizing a provided .config as base
modinfo -p <module_name>
  • Shows parameters for a loadable module.
  • Shows parameter names and types.
mknod mknod is a command used to create special files in Linux/Unix, such as:
  • Character device files (e.g., /dev/null, /dev/zero, /dev/tty)
  • Block device files (e.g., /dev/sda, /dev/loop0)
  • Named pipes (FIFOs)

These special files are usually found in /dev.
mknod NAME TYPE MAJOR MINOR
where:
  • NAME: name of the device file.
  • TYPE: One of:
    • c or u: character device.
    • b: block device
    • p: FIFO (pipe)
  • MAJOR: major number of the device.
  • MINOR: minor number of the device.
mkfs make filesystem
mkfs is a Linux command used to create (format) a filesystem on a block device,
for example, a partition like /dev/sdb1. Because it destroys all data on the target
device, it must be used carefully.
ps -A Lists all processes currently running on the system.
ps -el List of the processes and their respective nice values (under the column marked NI).
ps -eo state,uid,pid,ppid,rtprio,time,comm List of the processes and their respective real-time priority (under the column marked RTPRIO).
A value of "-" means the process is not real-time.
ps -LA shows all threads of all processes on the system.
ps aux shows a detailed snapshot of all running processes using BSD-style (Berkeley
Software Distribution) options.
  • a → processes from all users
  • u → user-oriented format
  • x → include processes without a controlling TTY (daemons)
pstree pstree is a classic tool that shows your running processes as a tree structure.
It’s much easier to read than ps when you want to see which process started (parented) another.
readelf -S <module_name> | grep ksym __ksymtab/__ksymtab_gpl sections of the ELF file. A module's exported symbols are stored in these sections.
rmmod Remove Module
sed '1d' Delete 1st line.
sed '2d' Delete 2nd line.
systemctl isolate graphical.target systemctl isolate graphical.target is the command you use to tell Ubuntu to immediately start
the full desktop environment (GUI).
What happens when you run this?
  1. Starts the GUI: It launches your display manager (like GDM, SDDM,
    or LightDM).
  2. Loads User Services: It starts all the background services needed for a
    desktop, such as networking tools, Bluetooth, and sound servers (PulseAudio/PipeWire).
  3. Closes Minimal Shells: If you were in a restricted mode (like multi-user.target
    or text-only mode), it will "isolate" the graphical requirements and bring you to the login screen.
Common Use Cases:
  • Recovering from Maintenance: If you booted into Emergency Mode or Rescue Mode
    to fix a kernel bug and now you want to go back to the normal desktop without rebooting,
    this is the command to use.
  • Testing Graphics Drivers: If you just installed a new driver for your 6.8 kernel
    and want to see if it actually loads the GUI successfully.
time <cmd_name> To see how long a command takes to execute.
ulimit Views and sets resource limits. The -f option queries the maximum size of
files written by the shell process and its children. "unlimited" only means that the OS imposes no
particular limit. Of course the size is still finite, bounded by the actual
available disk space on the box.
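A program can query the same limit with getrlimit(2) and RLIMIT_FSIZE. This is a minimal sketch; the helper name file_size_limit is hypothetical.

```c
#include <sys/resource.h>

/* Query the soft file-size limit of the calling process.
 * Returns 1 if the limit is "unlimited", 0 if a finite limit is set,
 * and -1 on error. On success *bytes receives the raw soft limit value
 * (only meaningful when 0 is returned). */
int file_size_limit(unsigned long long *bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;

    *bytes = (unsigned long long)rl.rlim_cur;
    return rl.rlim_cur == RLIM_INFINITY;
}
```

A return value of 1 corresponds to ulimit -f printing unlimited.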
uname <option>
  • -r: shows the current running kernel version.
  • -a: Shows All system info
  • -s: Kernel name
  • -v: Kernel build version
  • -m: Machine architecture (x86_64, arm64, etc.)
  • -p: Processor type
  • -o: Operating system
vmstat -m Slab cache detail (vmstat --> Report virtual memory statistics).
wc -l Counts the number of lines in input.

    C Specifier:
    =========

    Type Specifier
    size_t %zu
    ssize_t %zd
    Kernel pointer, hidden from unprivileged users per kptr_restrict (plain %p prints a hashed value) %pK
    Actual pointer (don't use in production) %px
    Physical address (phys_addr_t, passed by reference) %pa
    Raw buffer as a string of hex characters %*ph (* is replaced by the number of bytes). Use it for buffers of up to 64 bytes,
    and use the print_hex_dump_bytes() routine for more.
    IPv4 address %pI4
    IPv6 address %pI6
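The %zu and %zd specifiers also work in user-space printf-style functions, so they can be tried outside the kernel; the %p extensions listed above (%pK, %px, %pa, %*ph, %pI4, %pI6) exist only in the kernel's printk family. A minimal user-space sketch, with format_sizes as a hypothetical helper name:

```c
#include <stdio.h>
#include <sys/types.h>

/* Format an unsigned size (size_t) and a signed size (ssize_t) with the
 * matching specifiers. Returns the number of characters snprintf would
 * write, excluding the terminating NUL. */
int format_sizes(char *buf, size_t buflen, size_t len, ssize_t ret)
{
    return snprintf(buf, buflen, "len=%zu ret=%zd", len, ret);
}
```

Using %d or %ld here instead would be wrong on platforms where size_t does not have the same width as int or long, which is the reason the dedicated specifiers exist.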

    printk log level:
    =============

    log level Value
    KERN_EMERG: 0
    KERN_ALERT: 1
    KERN_CRIT: 2
    KERN_ERR: 3
    KERN_WARNING: 4
    KERN_NOTICE: 5
    KERN_INFO: 6
    KERN_DEBUG: 7
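These values are visible from user space too. A minimal sketch, assuming the /dev/kmsg record layout "prio,seq,timestamp,flags;message", where prio is a syslog-style number whose low three bits hold the log level; the function names are hypothetical.

```c
#include <stdlib.h>

/* Names indexed by the numeric log level from the table above. */
static const char *const kern_level_names[8] = {
    "KERN_EMERG", "KERN_ALERT", "KERN_CRIT", "KERN_ERR",
    "KERN_WARNING", "KERN_NOTICE", "KERN_INFO", "KERN_DEBUG",
};

/* Extract the log level (0..7) from one /dev/kmsg record.
 * Returns -1 if the record does not start with "<number>,". */
int kmsg_level(const char *record)
{
    char *end;
    unsigned long prio = strtoul(record, &end, 10);

    if (end == record || *end != ',')
        return -1;
    return (int)(prio & 7);   /* strip the facility, keep the level */
}

/* Map a level to its KERN_* name, or NULL if out of range. */
const char *kmsg_level_name(int level)
{
    return (level >= 0 && level <= 7) ? kern_level_names[level] : NULL;
}
```

For a record beginning with "6,", kmsg_level() yields 6, which kmsg_level_name() maps to KERN_INFO.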

    Error Meaning:
    ==========

    Error Meaning
    ESRCH Error - No Such Process
    EINVAL Invalid Argument
    ERESTARTSYS -ERESTARTSYS is a specialized error code used to handle interruptions
    caused by signals during a blocking system call. It is primarily used in
    conjunction with interruptible sleeps (such as mutex_lock_interruptible or
    wait_event_interruptible).
    EINTR EINTR (Error code 4) stands for Interrupted System Call.
    EPERM EPERM (Error code 1) stands for Operation Not Permitted.
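From user space, a blocking call that was interrupted by a signal handler (and not restarted automatically) fails with EINTR. A common response is to retry the call. The following is a minimal sketch of that pattern; read_retry is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

/* read() wrapper that retries when the call is interrupted by a signal.
 * Any other error (or success) is returned to the caller unchanged. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

Retrying unconditionally is a design choice: a program that uses signals to cancel blocking I/O would instead want to see EINTR and stop.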

    Signals:
    ==========

    Signal Description
    SIGCHLD SIGCHLD (Signal: Child): is the notification the kernel sends to a parent
    process whenever one of its child processes terminates, stops, or continues.
    SIGSTOP SIGSTOP is the "hard pause" button for a process. SIGSTOP cannot be ignored,
    blocked, or handled by the process. When it is delivered, the kernel stops the process
    immediately, exactly where it is, until a SIGCONT resumes it.
    SIGTTIN SIGTTIN (Signal Terminal Input): is the signal sent to a background process when
    it attempts to read from its controlling terminal (keyboard).
    SIGTTOU Signal: Terminal Output. This is the signal sent to a background process
    when it tries to write data to its controlling terminal (tty).

    Flags:
    ======

    Flags Header File Description
    IRQF_SHARED <linux/interrupt.h> This allows you to share the IRQ line between several devices. Required for devices on the PCI bus.
    IRQF_ONESHOT The IRQ is not re-enabled after the hardirq handler finishes executing. This flag is typically used by threaded interrupts to ensure that the IRQ remains disabled until the threaded handler completes.
    __IRQF_TIMER It's used to mark the interrupt as a timer interrupt. The timer interrupt fires at periodic intervals and is responsible for implementing the kernel's timer/timeout mechanism, scheduler-related housekeeping and so on.
    IRQF_NO_SUSPEND It specifies that the interrupt remains enabled even when the system goes into a suspend state.
    IRQF_NO_THREAD IRQF_NO_THREAD flag specifies that this interrupt cannot use the threaded model.
    IRQF_PROBE_SHARED IRQF_PROBE_SHARED is a specialized interrupt flag used by drivers that perform IRQ probing (automatic detection of interrupt lines) on devices that share an interrupt line with other hardware. Tells the kernel that the driver is willing to share the interrupt line even during the sensitive probing phase. It allows the probe to proceed even if the IRQ is already in use by another "shareable" driver.
    IRQF_PERCPU IRQF_PERCPU is a specialized interrupt flag used to indicate that a specific interrupt line is private to each CPU core.
    IRQF_NOBALANCING IRQF_NOBALANCING is a specialized interrupt registration flag used to exclude
    a specific interrupt from the kernel's automatic IRQ balancing mechanism.

    IRQ balancing is a kernel process (often assisted by the userspace irqbalance daemon) that
    periodically redistributes hardware interrupts across different CPU cores to prevent any
    single core from being overwhelmed.
    IRQF_IRQPOLL IRQF_IRQPOLL is a specialized interrupt registration flag used to support the
    kernel's irqpoll mechanism. It is primarily a diagnostic and recovery tool used when
    hardware or firmware fails to correctly signal interrupts.

    When the irqpoll boot option is active, the kernel will poll all handlers registered with
    the IRQF_IRQPOLL flag whenever an unhandled interrupt occurs on any line.
    IRQF_FORCE_RESUME IRQF_FORCE_RESUME is a specialized interrupt flag used to ensure that a specific
    interrupt line is re-enabled immediately during the system resume process, even if the
    device itself has not yet been resumed.
    IRQF_EARLY_RESUME IRQF_EARLY_RESUME is a specialized interrupt flag used to control the timing of
    when an interrupt is re-enabled during the system's transition from sleep (suspend) back
    to a running state (resume).
    IRQF_COND_SUSPEND IRQF_COND_SUSPEND is a specialized interrupt flag used to safely share an
    interrupt line between a standard device and a "non-suspending" device (like a
    system timer) during system sleep transitions.
    IRQF_TRIGGER_NONE IRQF_TRIGGER_NONE is a flag used during interrupt registration to indicate that the
    driver is not specifying a hardware trigger style (such as edge or level). When you use
    IRQF_TRIGGER_NONE, you are telling the kernel to use the default trigger configuration
    already defined for that interrupt line.
    IRQF_TRIGGER_RISING IRQF_TRIGGER_RISING is an interrupt flag used to configure an interrupt as edge-triggered.
    It specifies that the interrupt should be generated specifically when the electrical signal on the
    IRQ line transitions from a low voltage to a high voltage (the "rising edge").
    IRQF_TRIGGER_FALLING IRQF_TRIGGER_FALLING is an interrupt flag used to configure an interrupt as edge-triggered.
    It specifies that the interrupt should be generated exactly when the electrical signal on the
    IRQ line transitions from a high voltage to a low voltage (the "falling edge").
    IRQF_TRIGGER_HIGH IRQF_TRIGGER_HIGH is an interrupt flag used to configure a level-triggered interrupt.
    It specifies that the interrupt remains active as long as the voltage on the IRQ line is held
    at a high logical level.
    IRQF_TRIGGER_LOW IRQF_TRIGGER_LOW is an interrupt flag used to configure a level-triggered interrupt.
    It specifies that the interrupt is considered active as long as the voltage on the IRQ line
    is held at a low logical level.
    TIMER_DEFERRABLE <linux/timer.h> TIMER_DEFERRABLE is a flag used when initializing a kernel timer to indicate that
    the timer does not need to wake up a CPU core from a deep sleep (idle) state. Standard
    timers are "hard" deadlines. If a timer expires while a CPU is in a power-saving C-state,
    the hardware will force the CPU to wake up just to handle the interrupt. This consumes
    significant battery/power.
    • The Logic: A deferrable timer says, "I want to run in 5 seconds, but if
      the CPU is asleep, don't wake it up. Just wait until the CPU wakes up for some
      other reason (like a different interrupt), and then run me."
    • The Benefit: It groups non-critical house-keeping tasks together, allowing
      the processor to stay in low-power mode longer.
    TIMER_PINNED <linux/timer.h> TIMER_PINNED is a flag used during timer initialization to ensure that a
    timer's callback function always executes on the same CPU core that scheduled it.
    Normally, the Linux scheduler and the timer wheel may move a timer to a different
    CPU core to balance the load or save power (especially on multi-core SoCs).
    • The Logic: TIMER_PINNED forbids this migration.
    • The Benefit: It improves cache locality. If your timer handler accesses
      data that is already in the L1/L2 cache of a specific CPU, running the timer on
      that same CPU avoids expensive "cache misses" and cross-core data synchronization overhead.
    TIMER_IRQSAFE <linux/timer.h> TIMER_IRQSAFE is a specialized flag used during timer initialization. The callback of an
    irqsafe timer runs with interrupts disabled, which makes it safe to wait for a running instance to finish from an IRQ handler.
    • The Conflict: With an ordinary timer, an IRQ handler that waits for the callback to complete (for example, by calling
      del_timer_sync()) can interrupt that very callback on the same CPU and then wait forever, which is a deadlock.
    • The Solution: Because an irqsafe callback runs with IRQs off, it cannot be interrupted on its own CPU, so that wait is safe.
      The flag exists for workqueue locking needs; it is not meant as a general way to run code with interrupts disabled.
    PF_EXITING include/linux/sched.h PF_EXITING is a process flag bit used to mark a task that has begun its termination sequence.

    Abbreviation:
    ===========

    Abbreviation Full Form Description
    .ko Kernel object Give us kernel functionality in a modular manner.
    ABI Application Binary Interface ABI refers to the low-level interface between the kernel and other software (either
    user-space applications or kernel modules). Unlike the API (Application Programming
    Interface), which is defined at the source code level, the ABI is defined
    at the binary level (registers, memory layouts, and stack conventions). Unlike the
    user-space interface, the internal kernel ABI is unstable.
    • No Stability Guarantee: There is no stable ABI for kernel modules. If you
      compile a driver for kernel v6.12, it will likely fail to load on v6.13 because
      internal data structures (like struct task_struct) frequently change their internal
      offsets.
    • Version Binding: This is why kernel modules must be recompiled for every
      specific kernel version. The vermagic string in a .ko file ensures the module's
      ABI matches the running kernel exactly.
    ASAN Address SANitizer
    ASLR Address Space Layout Randomization
    BCC BPF Compiler Collection It is a toolkit and framework for creating eBPF programs used to trace,
    profile, and observe Linux systems at runtime with very low overhead.
    It’s widely used for performance analysis, debugging, networking, and security.
    BDI Backing Device Info It is a core data structure (struct backing_dev_info) that represents the
    properties and state of a storage device (the "backing store") that sits
    underneath a filesystem.
    BIOS Basic Input Output System
    BKL Big Kernel Lock When held, it kept the kernel in a non-preemptible state for long periods
    of time. It has since been removed.
    BoF Buffer Overflow
    BSA Buddy System Allocator
    BSP Board Support Package Support files for hardware/SoC
    CFS Completely Fair Scheduler
    cgroup Control groups
    CISC Complex Instruction Set Computing
    CMA Contiguous Memory Allocator
    cmpxchg Compare and Exchange cmpxchg (Compare and Exchange) is an atomic instruction provided
    by the CPU hardware.
    1. Compare: It compares the value at a specific memory address with
      a "target value" (what you expect the value to be).
    2. Match: If the values are equal, it writes a "new value" into that
      memory address.
    3. Fail: If the values are not equal (meaning another core changed
      it first), the write is aborted, and the current value at that address is
      returned so the caller can try again.

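    #include <stdbool.h>

    /* Conceptual model only: the real instruction performs all of this as one
     * atomic step. This variant reports success as a bool; the kernel's
     * cmpxchg() instead returns the value it found at the address. */
    bool cmpxchg(int *address, int expected, int new_value) {
        if (*address == expected) {
            *address = new_value;
            return true;  // Success!
        }
        return false;     // Someone else changed it
    }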
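In user space, the same compare-and-retry idea can be tried with C11 atomics, which compilers typically lower to a cmpxchg-style instruction on x86. The following is a sketch under that assumption; add_capped and its parameters are hypothetical names chosen for illustration, not kernel API.

```c
#include <stdatomic.h>

/* Atomically add delta to *counter unless the result would exceed limit.
 * Returns 1 if the addition was applied, 0 if it was refused. */
int add_capped(atomic_int *counter, int delta, int limit)
{
    int old = atomic_load(counter);

    do {
        if (old + delta > limit)
            return 0;
        /* If another thread changed *counter since we read it, the
         * exchange fails and 'old' is refreshed with the current value,
         * so the loop re-checks the cap and tries again. */
    } while (!atomic_compare_exchange_weak(counter, &old, old + delta));

    return 1;
}
```

The loop is needed because the weak variant may also fail spuriously; each retry starts from the freshly observed value, which matches step 3 above.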
    CPIO Copy In, Copy Out CPIO is a simple archive file format used widely in Linux systems—especially for
    initramfs / initrd, embedded systems, and packaging files for the kernel.
      A .cpio file contains:
    • File metadata (permissions, uid, gid)
    • Directory structure
    • File contents
    • Often compressed as a whole afterwards (gzip, xz, lzma)
    CONFIG_MODULE_SIG Module Signature Verification CONFIG_MODULE_SIG is a kernel build-time option that controls
    module signature verification, i.e., whether the kernel requires .ko modules to be
    cryptographically signed before loading.
    CONFIG_MODULE_SIG_ALL Sign all modules automatically during kernel build.
    CONFIG_MODULE_SIG_FORCE Kernel refuses to load unsigned modules.
    cpuhp CPU Hotplug It manages the state transitions (online/offline) for each specific CPU core.
    CR3 Control Register 3 CR3 is the control register that holds the physical address of the top-level page
    table (PGD/PML4) on x86/x86-64. CR3 is the x86/x86-64 equivalent of ARM64’s TTBR0/TTBR1.
    CTF Common Trace Format
    CVE Common Vulnerabilities and Exposures A CVE is a standardized identifier for a publicly disclosed security flaw.
    CWE Common Weakness Enumeration While CVE (Common Vulnerabilities and Exposures) identifies a specific
    security flaw in a program (like a specific bug in kernel 6.17),
    CWE (Common Weakness Enumeration) identifies the type or root cause of that
    weakness.
    DAMON Data Access MONitor Capture and analyse memory access patterns of user-space process.
    dd Disc duplicator
    debugfs Debug File System
    defconfig Default Kernel Configuration
    dentry Directory entry It is a core Virtual File System (VFS) structure that represents a specific
    component in a file path.
    • d_name: The actual name of the file or directory.
    • d_inode: A pointer to the inode associated with this name.
    • d_parent: A pointer to the dentry of the parent directory.
    • d_op: A pointer to dentry_operations (methods like d_revalidate or d_delete).
    DKMS Dynamic Kernel Module Support Framework that automatically rebuilds out-of-tree kernel modules whenever a new kernel is installed.
    dm-verity Device-Mapper Verity A kernel feature that ensures the integrity of read-only partitions like /system
    and /vendor.
    DSO Dynamic Shared Object
    DSP Digital Signal Processor Special processor for signal ops
    DTB Device Tree Blob Binary hardware description
    DTS Device Tree Source Source format of DTB
    eBPF Extended Berkeley Packet Filter In-kernel programmable VM that lets you run user-defined programs inside
    the Linux kernel without loading kernel modules. It’s widely used for observability,
    networking, and security.
    ELF Executable and Linkable Format
    Epoll Event Poll epoll (Event Poll) is a scalable Linux-specific I/O event notification
    mechanism used to monitor multiple file descriptors (FDs) to see if I/O is possible
    on any of them.
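As an illustration of the mechanism, the sketch below registers a single file descriptor with an epoll instance and waits for it to become readable. It assumes a Linux system; wait_readable is a hypothetical helper name, and a real event loop would keep the epoll instance open and monitor many descriptors rather than create one per call.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if it is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Ask to be notified when input is available on fd. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int ready = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        struct epoll_event out;
        /* Returns the number of ready descriptors (at most 1 here). */
        ready = epoll_wait(epfd, &out, 1, timeout_ms);
    }

    close(epfd);
    return ready;
}
```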
    EUID Effective User ID Who the kernel trusts for access control.
    EXPORT_SYMBOL By default all symbols (static/global) are private to the kernel modules. Using
    EXPORT_SYMBOL we can make it global, visible to any and all other kernel modules.
    ext2 Second Extended File System It does not keep a log of intended changes.
    • Faster but riskier.
    • Doesn't write journal constantly, so takes less storage.
    • Stable and low overhead.
    ext3 Third Extended File System It records changes in a dedicated area (the "journal") before they are permanently
    applied to the main file system.
    f2fs Fast flash file system
    FIQ Fast Interrupt Request FIQ (Fast Interrupt Request) is a legacy hardware-level interrupt specific
    to the ARM (32-bit) architecture. It was designed to provide a higher-priority,
    lower-latency alternative to the standard IRQ.
    FPU Floating Point Unit
    FSUID File System User ID File-System specific checks.
    GFP Get Free Page
    GIC Generic Interrupt Controller On ARM
    GKI Generic Kernel Image GKI provides a generic, common Linux kernel image that works across many Android
    devices without vendors heavily modifying the core kernel.
    GPL General Public License Code that is upstreamed into the mainline kernel must be under the GNU GPL-2.0
    license.
    GPOS General Purpose Operating System
    GRUB Grand Unified Bootloader
    • Default bootloader for x86 or x86_64
    • GRUB (Grand Unified Bootloader) is the software that loads right after your
      UEFI/BIOS finishes its checks. It’s the menu you see (or that stays hidden) that
      actually starts the Linux kernel.

    • How GRUB works on your system:
      1. UEFI looks at the EFI System Partition (/dev/nvme0n1p1).
      2. It runs the GRUB binary (usually grubx64.efi).
      3. GRUB reads its configuration from your /boot partition (/dev/nvme0n1p2).
      4. It loads the Kernel and Initrd into memory and starts Ubuntu.
      Key GRUB Files on your Ubuntu:
      • /boot/grub/grub.cfg: The "master" config file. Do not edit
        this manually; it’s automatically generated.
      • /etc/default/grub: This is where you make changes (like
        changing the timeout or adding "nomodeset").
      • /etc/grub.d/: A folder of scripts used to build the final config.
    HAL Hardware Abstraction Layer Layer between hw and OS
    HID Human Interface Device
    HRT High-resolution timers It is the interrupt source for the kernel's high-precision timing subsystem, which
    allows for microsecond-level (or even nanosecond-level) accuracy, far exceeding the
    old "jiffies" system.
    I2C Inter-Integrated Circuit
    IDR Integer ID Management The IDR (Integer ID Management) is a library used to map small integer
    identifiers (IDs) to pointer-based data structures. It solves the problem of
    efficiently allocating, managing, and looking up unique IDs, such as file
    descriptors, process IDs (PIDs), or device instance numbers, without the high
    memory overhead of a large array or the slow lookup times of a linked list.
    initramfs Initial RAM filesystem initramfs (Initial RAM Filesystem) is a tiny, temporary root filesystem that
    loads into your RAM right after GRUB but before your actual Ubuntu system starts.
    Think of it as the "bridge" that helps the kernel find and mount your real hard drive.
    Why you need it (especially with your setup)
    • NVMe: The kernel needs a driver to talk to your NVMe SSD.
    • LVM: Your root partition (/) is hidden inside a Logical Volume.
    • Kernel's Problem: The kernel alone doesn't know how to "unlock" an
      LVM volume or talk to every possible SSD brand.
    • The Solution: GRUB loads the initramfs file (found in
      /boot/initrd.img-...). This file contains the basic drivers (modules) and scripts
      needed to activate your LVM and mount the real / filesystem.
    inode Index node Contains file metadata such as access permissions, size, owner, creation time etc.
    The inode object represents all the information needed by the kernel to manipulate
    a file or directory.
    An inode is created in two distinct scenarios:
    physically on the disk and logically in the kernel's memory.
    1. Physical Creation (On-Disk):
      A new inode is allocated on the storage medium whenever a new file system
      object is created.
      This happens during:
      • File/Directory Creation: When you run mkdir, touch, or use
        the open() system call with the O_CREAT flag.
      • System Calls: The VFS calls the specific filesystem method
        (like ext3_mkdir or ext4_create).
      • Mechanism: The kernel looks at the Superblock to find a free bit
        in the Inode Bitmap, marks it as used, and initializes the inode structure
        in the disk's inode table.
    2. In-Memory Creation (VFS Objects):
      Even if a file already exists on disk, a "virtual" inode object must be created
      in RAM so the OS can work with it. This happens during:
      • Path Lookup: When you access a file (e.g., cat /etc/passwd),
        the kernel finds the inode number on disk and calls alloc_inode to create
        a matching struct inode in the kernel's memory.
      • Mounting: The root inode of a partition is created in memory as soon
        as the device is mounted.
    [IO][A]PIC IO-[Advanced] Programmable Interrupt Controller IO-APIC on x86
    IOCTL Input-Output Control The ioctl system call is used to issue commands to the device (via its driver).
    IoF Integer Overflow
    IRQ Interrupt ReQuest IRQ (Interrupt Request) is a signal sent by hardware to the CPU to indicate that
    an event requires immediate attention. It allows the processor to stop its current task,
    handle the hardware event, and then resume.
    ISR Interrupt Service Routine
    IWI IRQ Work Interrupt Listed in /proc/interrupts on x86 as "IRQ work interrupts". It is a self-directed interrupt that a CPU core
    raises to run queued irq_work callbacks in hard-interrupt context.
    KASAN Kernel Address SANitizer It is a dynamic memory error detector used primarily to find out-of-bounds accesses (buffer
    overflow/underflow), use-after-free bugs, and double frees.
    KASLR Kernel ASLR
    KCSAN Kernel Concurrency SANitizer
    Kbuild Kernel Build System Makefile-based infrastructure that builds the kernel and its modules according to the selected configuration.
    Kconfig Kernel Configuration System for selecting kernel features
    KMSAN Kernel Memory Sanitizer
    kprobe Kernel probe
    kretprobe Kernel return probe
    KSE Kernel Schedulable Entity In Linux, the KSE is a thread, not a process.
    LANANA Linux Assigned Names And Numbers Authority Only these folks can officially assign the device node - the type and the major:minor
    numbers - to devices.
    LDM Linux Device Model
    LKM Loadable Kernel Module Kernel code loaded/unloaded at runtime.
    LLC Last Level Cache
    loff_t Long Offset Type loff_t is a signed 64-bit integer used to represent file positions and offsets.
    LPA Large Physical Address
    LTTng Linux Trace Toolkit: next generation Powerful and popular open-source tracing system for the Linux kernel.
    MAC Mandatory Access Control
    MBR Master Boot Record The Master Boot Record (MBR) is the first sector of a storage device (Sector 0),
    occupying exactly 512 bytes. It is the legacy standard for partitioning disks, used
    primarily by BIOS-based systems to locate and load an operating system.
    sudo xxd -l 512 /dev/nvme0n1
    The output shows a Protective MBR (Master Boot Record).

    00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000100: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000110: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000140: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000150: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000160: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000170: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    00000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000001c0: 0200 eeff ffff 0100 0000 ffff ff18 0000 ................
    000001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
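    • 000001c0 (Partition Entry): The first partition entry starts at offset 0x1BE, and
      the bytes 0200 ee... on this row belong to it. It is a single dummy entry of type
      0xEE (GPT protective). It makes the whole disk look like one occupied partition to
      MBR-only tools, so they do not treat a GPT disk as empty and overwrite it.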
    • 000001fe (Boot Signature): The 55aa at the very end is the standard
      "magic number" that marks this as a valid bootable sector.
    MBR Structure Breakdown:
    The 512 bytes are strictly divided into three main components:
    Component       | Size      | Offset (Hex) | Purpose
    Bootstrap Code  | 446 bytes | 0x000–0x1BD  | Executable code (bootloader) that finds the active partition and starts the OS.
    Partition Table | 64 bytes  | 0x1BE–0x1FD  | Contains 4 entries (16 bytes each) describing the disk's primary partitions.
    Boot Signature  | 2 bytes   | 0x1FE–0x1FF  | The "magic number" 0x55AA, which validates the sector as a bootable MBR.
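The layout above can be checked against the hexdump with a few lines of Python. This is a minimal sketch, not a partitioning tool: the function names `parse_mbr` and `sample_protective_mbr` are made up for illustration, and the sample sector is rebuilt from the dump shown earlier (all zeros except row 0x1c0 and the trailing 55 aa).

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 0x1BE   # the 446 bytes of bootstrap code come first
PARTITION_ENTRY_SIZE = 16
SIGNATURE_OFFSET = 0x1FE


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if bytes(sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2]) != b"\x55\xaa":
        raise ValueError("missing 55 aa boot signature")

    entries = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        # Entry layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # first LBA (u32 little-endian), sector count (u32 little-endian).
        status, _chs_first, ptype, _chs_last, first_lba, sectors = struct.unpack_from(
            "<B3sB3sII", sector, offset)
        if ptype != 0x00:  # type 0 marks an unused slot
            entries.append({"index": index, "status": status, "type": ptype,
                            "first_lba": first_lba, "sectors": sectors})
    return entries


def sample_protective_mbr():
    """Rebuild the sector from the hexdump: zeros, row 0x1c0, and the signature."""
    sector = bytearray(SECTOR_SIZE)
    sector[0x1C0:0x1D0] = bytes.fromhex("0200eeffffff01000000ffffff180000")
    sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for entry in parse_mbr(sample_protective_mbr()):
        print(f"entry {entry['index']}: type=0x{entry['type']:02X} "
              f"first_lba={entry['first_lba']} sectors={entry['sectors']}")
```

On the sample sector this prints `entry 0: type=0xEE first_lba=1 sectors=419430399`, i.e. one protective entry that starts at LBA 1.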
    min_flt number of minor page faults A minor page fault occurs when:
    • The page is already in RAM
    • But not mapped in the process page table yet
    MMU Memory Management Unit Hardware for memory translation
    Mutex Mutual Exclusion
    NAPI New API
    NBD Network Block Device
    NMI Non-maskable Interrupt NMI (Non-Maskable Interrupt) is a high-priority hardware interrupt that
    cannot be ignored or disabled by standard software masking techniques. It is
    reserved for critical events that must be handled immediately, even if the CPU is
    in a state where regular interrupts are disabled. NMI interrupt lines cannot be shared.
    NUMA Non-Uniform Memory Access
    nvcsw number of non-voluntary context switches A non-voluntary context switch happens when:
    • the kernel forces a task off the CPU
    • the task did not explicitly give up the CPU
    Note on min_flt: in /proc/<pid>/stat, min_flt is field number 10.
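As a rough illustration of picking field 10 out of a stat line, the sketch below parses a sample string. The helper name `parse_minflt` and the sample line are hypothetical (the numbers are not real measurements). The one subtlety is that field 2, the command name, may contain spaces or parentheses, so the split happens after the last `)`.

```python
def parse_minflt(stat_line):
    """Return field 10 (minor page faults) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split only after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:].split()
    # after_comm[0] is field 3 (state), so field 10 sits at index 10 - 3 = 7.
    return int(after_comm[7])


# Hypothetical sample line, truncated after field 12; not real measurements.
SAMPLE = "1234 (my prog) S 1 1234 1234 0 -1 4194304 987 0 3"

if __name__ == "__main__":
    print(parse_minflt(SAMPLE))
```

On a Linux system the same function should work on the contents of `/proc/self/stat`.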
    OF Open Firmware Used for device tree bindings
    PA Page Allocator
    PB Petabyte It’s a unit of digital storage size.
    PCIe PCI Express
    PFN Page Frame Number
    PGD Page Global Directory It is the top-level page table used by the Linux kernel to translate
    virtual addresses → physical addresses.
    PMD Page Middle Directory It is the third level in the Linux page table hierarchy (for most modern configs)
    and sits between PUD and PTE.
    PMI Performance Monitoring Interrupt It is a specialized interrupt generated by the CPU's Performance Monitoring Unit (PMU)
    to signal that a specific hardware counter has overflowed.
    POST Power On Self Test
    PSS Proportional Set Size Physical memory used by a process, where shared pages are divided proportionally among
    all sharers.
    Example:
    • 2 processes share a 10 MB library
    • Each process gets 5 MB PSS from that library
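The arithmetic in the example can be written out as a small function. This is a simplified sketch: the kernel accounts PSS page by page, while the hypothetical helper `proportional_set_size` below works on whole mappings.

```python
def proportional_set_size(private_kb, shared_mappings):
    """PSS = private memory + each shared mapping divided by its sharer count.

    shared_mappings is a list of (size_kb, number_of_sharers) pairs.
    """
    total = float(private_kb)
    for size_kb, sharers in shared_mappings:
        if sharers < 1:
            raise ValueError("a mapping needs at least one sharer")
        total += size_kb / sharers
    return total


if __name__ == "__main__":
    # The example above: a 10 MB (10240 KB) library shared by 2 processes.
    print(proportional_set_size(0, [(10240, 2)]))
```

With no private memory, the 10 MB library shared by two processes contributes 5120 KB (5 MB) to each process.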
    pts Pseudo-Terminal Slave number 1 Unlike /dev/tty1 (which represents a physical keyboard and monitor attached to the
    machine), a pts is a "fake" terminal created by software.
    Why are you on a PTS?
    You get a pts address whenever you connect to the system via:
    • SSH (Remote login)
    • Terminal Emulators (Gnome Terminal, xterm, Terminator)
    • Multiplexers (Tmux or Screen)
    PTE Page Table Entry It is the lowest (leaf) level of the Linux page table hierarchy and directly maps a
    virtual page to a physical page.
    Virtual Address → PGD (L0) → PUD (L1) → PMD (L2) → PTE (L3) →
    Physical Page (4 KB).
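To make the four-level walk concrete, the sketch below splits a virtual address into the index used at each level. It assumes a common configuration that the text does not spell out: 4 KB pages (12 offset bits) and 9 index bits per level, i.e. 48-bit virtual addresses. The name `split_virtual_address` is illustrative only.

```python
PAGE_SHIFT = 12        # assumption: 4 KB pages
BITS_PER_LEVEL = 9     # assumption: 512 entries per table
LEVELS = ("pgd", "pud", "pmd", "pte")


def split_virtual_address(va):
    """Return the per-level table indices and the page offset for va."""
    parts = {"offset": va & ((1 << PAGE_SHIFT) - 1)}
    index_mask = (1 << BITS_PER_LEVEL) - 1
    # The PTE index uses the lowest 9 bits above the offset, the PGD the highest.
    for depth, name in enumerate(reversed(LEVELS)):
        shift = PAGE_SHIFT + depth * BITS_PER_LEVEL
        parts[name] = (va >> shift) & index_mask
    return parts


if __name__ == "__main__":
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_virtual_address(va))
```

The MMU uses each index to select one entry in the corresponding table; the final PTE supplies the physical page, and the offset selects the byte within it.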
    pty Pseudo-terminal
    PUD Page Upper Directory It is the second level in the Linux page table hierarchy and sits between PGD
    and PMD.
    RCU Read-Copy-Update RCU (Read-Copy-Update) is a high-performance synchronization mechanism that
    allows multiple "readers" to access data concurrently with a "writer".
    • Read: Readers access data directly, without taking a lock. Read-side overhead
      is close to zero (no spinning, no sleeping on a lock).
    • Copy: When a writer wants to change the data, it doesn't modify the
      original. It makes a copy, modifies the copy, and then swaps the pointer to the
      new version.
    • Update: The old data isn't deleted immediately. The kernel waits for
      a Grace Period (until all existing readers are finished) before safely freeing
      the old memory.
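The three steps can be mimicked in a toy model. This is not the kernel RCU API: `RcuCell` is a made-up class, writers are assumed to be serialized by the caller, and Python's garbage collector stands in for the grace period (the old version is freed once no reader references it).

```python
import copy


class RcuCell:
    """Toy model of an RCU-protected pointer (not the kernel API)."""

    def __init__(self, value):
        self._current = value

    def read(self):
        # Read: grab the current pointer; no lock is taken.
        return self._current

    def update(self, mutate):
        # Copy: work on a private copy, never on the version readers may hold.
        new_version = copy.deepcopy(self._current)
        mutate(new_version)
        # Update: publish the new version with a single reference swap.
        self._current = new_version


if __name__ == "__main__":
    cell = RcuCell({"mtu": 1500})
    old_view = cell.read()                          # a reader that started earlier
    cell.update(lambda cfg: cfg.update(mtu=9000))   # writer publishes a new version
    print(old_view["mtu"], cell.read()["mtu"])
```

The earlier reader keeps seeing a consistent old version (1500) while new readers see the update (9000), which is the property RCU is built around.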
    RISC Reduced Instruction Set Computer
    RMW Read-Modify-Write
    RSS Resident Set Size The amount of physical RAM currently occupied by a process. RSS counts only pages
    that are resident in RAM, such as:
    • Code (text) pages
    • Heap
    • Stack
    • Shared libraries (counted per process)
    RTC Real Time Clock
    RTL Real Time Linux
    RUID Real User ID The user who started the process.
    SCL Serial Clock
    SCSI Small Computer System Interface
    SDA Serial Data
    sed Simple encrypt decrypt
    SELinux Security Enhanced Linux
    SEV Send Event SEV (Send Event) is the companion instruction to WFE. It acts as a signaling
    mechanism to wake up processor cores that have entered a low-power standby state. When
    a core executes SEV, it causes an event to be signaled to all cores in the
    multiprocessor system (or within a specific sharing domain).
    1. The Signal: It sets a local "event latch" (a hidden internal bit) on every
      core in the cluster.
    2. The Wakeup: Any core currently "sleeping" in a WFE (Wait For Event) state
      will see this latch set, wake up, and resume instruction execution.
    3. The Latch: If a core is not sleeping when SEV is called, the event latch
      remains set. When that core eventually reaches a WFE instruction, it will see the
      latch is already set and simply continue running without ever going to sleep (this
      prevents "missing" a signal).
    SIMD Single Instruction, Multiple Data
    SLOCs Source Lines of Code
    SMP Symmetric MultiProcessing Multiple CPUs sharing memory
    SOH Start of Header
    SPDX Software Package Data Exchange A short, concise format for expressing the license the code is under. It must
    be the first line in every source file, for example:
    // SPDX-License-Identifier: GPL-2.0
    SUID Saved User ID For temporarily dropping/regaining privilege.
    SVE Scalable Vector Extension
    systemd System Daemon
    TLB Translation Lookaside Buffer It’s a CPU hardware cache that speeds up virtual → physical
    address translation.
    TGID Thread Group ID
    TTBR0 Translation Table Base Register 0 It is an ARM64 CPU register that tells the MMU where the page tables
    for user space start. TTBR0_EL1 holds the physical base address of the page
    tables used for translating user-space virtual addresses.
    TTBR1 Translation Table Base Register 1 TTBR1_EL1 holds the base address of the kernel page tables.
    tty Teletype terminal
    UAF Use After Free
    UB Undefined Behavior
    UBSAN Undefined Behavior Sanitizer UBSAN (Undefined Behavior Sanitizer) is a runtime debugging tool for
    the Linux kernel that detects Undefined Behavior—actions in C that the
    language standard doesn't define, often leading to unpredictable crashes or
    security flaws.
    What it catches:
    It identifies common "silent" bugs that compilers usually ignore:
    • Integer Overflows: Signed integer addition/subtraction exceeding
      its bit limit.
    • Array Out-of-Bounds: Accessing an index outside the declared
      size of an array.
    • Invalid Shifts: Shifting an integer by a negative count or by a count
      equal to or greater than its width (e.g., shifting a 32-bit int by 32 or more).
    • Misaligned Pointers: Accessing memory through a pointer that
      isn't aligned with the data type.
    • Null Pointer Dereferences: Using a pointer that points to NULL.
    UEFI Unified Extensible Firmware Interface
    • UEFI (Unified Extensible Firmware Interface) is the modern replacement
      for the legacy BIOS.
    • More secure: with Secure Boot enabled, it allows only "signed" operating
      systems (apps) to be booted via it.
    • It requires a special partition called ESP (EFI System Partition);
      it holds a .efi file that contains the initialization code and data,
      unlike the BIOS, where it's written in firmware (EEPROM chip).
    • Faster than BIOS.
    • It lets you run 32- or 64-bit code.
    • Drive size: BIOS with MBR partitioning supports disks only up to about 2.2 TB,
      whereas UEFI with GPT can support disks up to about 9 ZB (zettabytes) in size.
    umh User Mode Helper
    UMR Uninitialized Memory Reads
    UTS Unix Timesharing System It provides domain name and hostname isolation.
    VDSO Virtual Dynamic Shared Object
    • Provides fast system calls (e.g. gettimeofday, clock_gettime)
    • Avoids expensive svc (syscall) transitions
    • Architecture-specific
    VFS Virtual File System
    wfe Wait For Event WFE (Wait For Event) is a hint instruction used to put a processor
    into a low-power standby state until a specific "event" occurs.
    w/w Wait/wound It is a specialized mutex implementation used to handle deadlock avoidance
    when a thread needs to acquire multiple locks at once. W/W mutexes use a Ticket
    (Timestamp) system. Every "transaction" (a set of lock attempts) gets a serial number.
    • Wait: If a "younger" thread (higher ticket number) hits a lock held
      by an "older" thread, it must wait.
    • Wound: If an "older" thread (lower ticket number) hits a lock held by
      a "younger" thread, it wounds the younger one. The younger thread must drop all
      its locks and start over (back off).
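The two rules reduce to a single comparison of ticket numbers. The sketch below is only a decision table, not the kernel's ww_mutex implementation; `resolve_contention` is a hypothetical name.

```python
def resolve_contention(requester_ticket, holder_ticket):
    """Decide what happens when a transaction hits a lock held by another.

    A lower ticket number means an older transaction.
    """
    if requester_ticket == holder_ticket:
        raise ValueError("a transaction does not contend with itself")
    if requester_ticket > holder_ticket:
        return "wait"    # younger requester waits for the older holder
    return "wound"       # older requester wounds the younger holder, which backs off


if __name__ == "__main__":
    print(resolve_contention(7, 3))  # younger hits a lock held by older
    print(resolve_contention(3, 7))  # older hits a lock held by younger
```

Because a transaction only ever waits for an older one, a cycle of waiters cannot form, which is why the scheme avoids deadlock.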

    Questions:
    ===========

    Q. When to use a Softirq?
    Ans: Use a softirq only if you are writing core kernel infrastructure that requires extreme performance and massive parallelism.
    • Parallelism: The same softirq can run on multiple CPUs simultaneously.
    • Complexity: You must ensure your code is perfectly re-entrant and uses complex fine-grained locking.
    • Usage: Reserved for Networking (NET_RX/NET_TX), Block I/O, and RCU.
    • Static: You cannot add new softirqs without modifying and recompiling the core kernel.

    Q. When to use a Tasklet?
    Ans: Use a tasklet if you are maintaining legacy driver code that requires a simple, atomic bottom half.

    • Ease of Use: Tasklets are dynamically allocatable and don't require you to worry about multi-CPU concurrency.
    • Serialization: A specific tasklet will never run on two CPUs at once. This simplifies locking significantly.
    • Execution: They always run on the same CPU that scheduled them, which is good for cache locality.

    Q. In SMP, can a critical section run in process context on one core while an interrupt handler enters the same critical section on a second core?
    Ans: Yes. Without proper synchronization, a critical section can be accessed simultaneously by a process on one core and an interrupt handler on another.
    The Scenario: The "Race Condition"
    Imagine you have a shared data structure protected by a standard mutex or a simple flag.

    • Core 1: Thread A enters the critical section (acquires a lock).
    • Core 2: A hardware interrupt occurs. The CPU stops what it's doing and jumps to the Interrupt Service Routine (ISR).
    • The Conflict: If the ISR on Core 2 tries to access the same data structure while Thread A is still holding it on Core 1, you have a collision.

    Q. Why do standard mutexes fail here?
    Ans: In the Linux kernel, an interrupt handler cannot sleep. Because standard mutexes (mutex_lock) put the caller to sleep if the lock is held, you cannot use them inside an interrupt handler. If the ISR tries to take a mutex held by Core 1, it would try to sleep in atomic context, which is a kernel bug that can hang or crash the system.

    Q. Why will spin_lock not work in this case?
    Ans: A simple spin_lock fails because it does not account for the same-core deadlock scenario. Even in an SMP (Symmetric Multiprocessing) system, an interrupt can fire on the same core that is currently holding the lock.

    The Local Deadlock (Self-Deadlock)
    If a process on Core 1 acquires a simple spin_lock, it successfully enters the critical section. However, if a hardware interrupt occurs on that same core (Core 1) before the lock is released:

    • The kernel stops the process and starts the Interrupt Service Routine (ISR).
    • If the ISR tries to acquire the same spinlock, it will see the lock is already "taken" and will begin spinning (looping) to wait for it.
    • The process that holds the lock can never run to release it because it has been preempted by the very ISR that is now spinning.
    • Result: The core is deadlocked in a permanent "spin".

    Q. Why doesn't SMP solve this?
    Ans: While you might think the ISR on Core 2 would be fine (it would just spin until Core 1 finishes), you cannot guarantee which core will receive a specific interrupt. If the interrupt happens to hit the core holding the lock, the entire system can hang.

    Q. In an SMP system, an interrupt arriving on Core 2 cannot physically preempt a process running on Core 1. So why is spin_lock still "wrong"?
    Ans: The reason you are told a "simple spin_lock will not work" is not because of Core 2; it is because of the uncertainty of interrupt routing. In most modern systems, an interrupt controller (for example, the APIC, the Advanced Programmable Interrupt Controller, on x86) decides which core gets an interrupt. You cannot guarantee the interrupt will always go to Core 2. If that same interrupt happens to be routed to Core 1 while Core 1 is holding the lock:

    • Core 1 stops the process to handle the interrupt.
    • The ISR on Core 1 tries to grab the lock Core 1 is already holding.
    • Deadlock: Core 1 spins forever waiting for itself.

    The Solution: Spinlocks with IRQ Disabling
    To protect a critical section from being accessed by both a thread and an interrupt handler across different cores, you must use a Spinlock combined with Interrupt Disabling.
    spin_lock_irqsave()
    This is the "gold standard" for this problem. When you call this:

    • On the local core (Core 1): It disables interrupts. This prevents an interrupt from firing on this core and trying to re-enter the critical section.
    • Across the system: It acquires a spinlock. If an interrupt fires on Core 2 and tries to enter the same critical section, it will "spin" (loop rapidly) waiting for Core 1 to release the lock.
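The difference between the two locking styles can be replayed with a toy state machine. This is a model of the reasoning only, not kernel code: `run_scenario` is a hypothetical function, `False` stands for a plain spin_lock() and `True` for spin_lock_irqsave() paired with spin_unlock_irqrestore(), and the interrupt is assumed to be routed to the core that holds the lock.

```python
def run_scenario(disable_irqs_while_locked):
    """Replay 'the IRQ hits the lock-holding core' and return the event log."""
    log = []
    irqs_enabled = True

    # The thread on Core 1 enters the critical section.
    if disable_irqs_while_locked:
        irqs_enabled = False
        log.append("core1: local IRQs disabled")
    lock_held = True
    log.append("core1: thread takes lock")

    # The device interrupt is routed to Core 1 while the lock is held.
    log.append("irq: raised on core1")
    if irqs_enabled and lock_held:
        # The ISR runs at once, on top of the thread that owns the lock.
        log.append("core1: ISR spins on lock held by the interrupted thread")
        log.append("DEADLOCK")
        return log

    # IRQs are masked, so the interrupt stays pending until the unlock.
    lock_held = False
    log.append("core1: thread leaves critical section, releases lock")
    irqs_enabled = True
    log.append("core1: IRQs restored, pending ISR runs and takes lock")
    return log


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, "->", run_scenario(flag)[-1])
```

With interrupts left enabled the log ends in DEADLOCK; with them disabled the ISR is simply delayed until the lock has been released.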

    Q. Does spin_lock() work in process context or atomic context?
    Ans: spin_lock() can be used in both process context and atomic context. In either case, the code holding the lock runs atomically until spin_unlock(): it must not sleep or call anything that can block.
