10 changes: 10 additions & 0 deletions .github/workflows/test.yml
@@ -55,6 +55,11 @@ jobs:
set -euxo pipefail
sudo apt-get update
sudo apt-get purge firefox passt
# Microsoft Edge (preinstalled on the runner image) ships two AppArmor
# profiles attached to the same binary (/etc/apparmor.d/msedge and
# /etc/apparmor.d/microsoft-edge-stable). That duplicate attachment makes
# aa-disable abort while parsing all profiles, so remove them first.
sudo rm -f /etc/apparmor.d/msedge /etc/apparmor.d/microsoft-edge-stable
sudo systemctl reload apparmor.service
sudo apt-get install apparmor-utils
sudo aa-disable /usr/sbin/unix_chkpwd
@@ -90,6 +95,11 @@ jobs:
set -euxo pipefail
sudo apt-get update
sudo apt-get purge firefox passt
# Microsoft Edge (preinstalled on the runner image) ships two AppArmor
# profiles attached to the same binary (/etc/apparmor.d/msedge and
# /etc/apparmor.d/microsoft-edge-stable). That duplicate attachment makes
# aa-disable abort while parsing all profiles, so remove them first.
sudo rm -f /etc/apparmor.d/msedge /etc/apparmor.d/microsoft-edge-stable
sudo systemctl reload apparmor.service
sudo apt-get install apparmor-utils
sudo aa-disable /usr/sbin/unix_chkpwd
59 changes: 57 additions & 2 deletions docs/user/reference/config/overlays.md
@@ -47,20 +47,44 @@ successfully makes a replacement to at least one matching file.
| `file-remove` | Removes a file | `file` | Glob pattern for files to remove |
| `file-rename` | Renames a file within the same directory | `file`, `replacement` | Name of file to rename |

> **Tip:** `file-remove` and `file-search-replace` can also operate inside a source archive by
> setting the `archive` field — see [Archive Overlays](#archive-overlays).

### Archive Overlays

A `file-remove` or `file-search-replace` overlay can modify files **inside** a source archive

@PawelWMS (Contributor), Jun 10, 2026:

Does this support removing or editing files in nested archives? I'm thinking of cases where `a.tar.gz` contains `a/internal/archive.tar.gz` and you want to remove `a.tar.gz/a/internal/archive.tar.gz/nested/dir/file-to-remove.txt`.

If it doesn't, I don't think it's a P0 (I believe all 10 scripts can be replaced without nested-archive support), but with the new `archive` field required to differentiate between "regular" files and archives, adding this extension in the future may be tricky.

Considering that, do you think it would be possible to use the file path alone to denote the file to be removed or edited, and automatically detect whether each element of the path is a directory or an archive, as in the `a.tar.gz/a/internal/archive.tar.gz/nested/dir/file-to-remove.txt` example? Starting with this would give us support for nested archives in one go.

PS: Cool idea to reuse the existing overlays.

Reply from the PR author (Contributor):

Thanks! The idea was originally Reuben's, but I'm glad the implementation is sound. I think it would be better to revisit nested archive support in the future. I agree it's important, but it might add complexity to the overlay application process.

instead of loose files in the sources tree. Set the `archive` field to scope it to that archive.
The archive is extracted into a temporary directory, the matching files are modified with the
same machinery as loose-file overlays, and the archive is repacked with its original compression
format. Extraction and repacking are handled natively.

> **Note:** Archive overlays are batched per archive — all overlays targeting the same archive
> share a single extract/modify/repack cycle — and the `sources` file is rehashed afterward to
> reflect the repacked archive. They are processed independently of spec and loose-file overlays.

> **Extraction root:** The `file` glob in an archive overlay is interpreted relative to the archive's extraction root. By default the root is inferred: if the archive unpacks to a single top-level directory (the conventional `%{name}-%{version}` layout) that directory is used; otherwise the archive root is used. Set `archive-root` to override this — the equivalent of rpmbuild's `%setup -n` — when an archive's top-level directory does not follow that convention.

| Type | Description | Required Fields |
|------|-------------|-----------------|
| `file-remove` + `archive` | Removes file(s) matching a glob pattern from inside an archive | `archive`, `file` |
| `file-search-replace` + `archive` | Regex-based search and replace on file(s) inside an archive | `archive`, `file`, `regex` |

## Field Reference

| Field | TOML Key | Description | Used By |
|-------|----------|-------------|---------|
| Type | `type` | **Required.** The overlay type to apply | All overlays |
| Description | `description` | Human-readable explanation documenting the need for the change; helps identify overlays in error messages | All (optional) |
| Archive | `archive` | The source archive filename to scope an overlay to (must be a basename, not a path). When set, the overlay operates on files inside that archive. | `file-remove`, `file-search-replace` (optional) |
| Archive root | `archive-root` | Top-level directory inside the archive to treat as the extraction root (mirrors `%setup -n`); inferred when unset. Must be a local relative path (no `..` or absolute paths). When multiple overlays target the same archive, any that set this must agree. | archive-scoped `file-remove` / `file-search-replace` (optional) |
| Tag | `tag` | The spec tag name (e.g., `BuildRequires`, `Requires`, `Version`) | `spec-add-tag`, `spec-insert-tag`, `spec-set-tag`, `spec-update-tag`, `spec-remove-tag` |
| Value | `value` | The tag value to set, or value to match for removal | `spec-add-tag`, `spec-insert-tag`, `spec-set-tag`, `spec-update-tag`, `spec-remove-tag` (optional for matching) |
| Value | `value` | The tag value to set, or value to match for removal. | `spec-add-tag`, `spec-insert-tag`, `spec-set-tag`, `spec-update-tag`, `spec-remove-tag` (optional for matching) |
| Section | `section` | The spec section to target (e.g., `%build`, `%install`, `%files`, `%description`) | `spec-prepend-lines`, `spec-append-lines`, `spec-search-replace` (optional), `spec-remove-section` |
| Package | `package` | The sub-package name for multi-package specs; omit to target the main package | All spec overlays (optional, except `spec-remove-subpackage` which **requires** it) |
| Regex | `regex` | Regular expression pattern to match | `spec-search-replace`, `file-search-replace` |
| Replacement | `replacement` | Literal replacement text; capture group references like `$1` are **not** expanded. Omit or leave empty to delete matched text. | `spec-search-replace`, `file-search-replace`, `file-rename` |
| Lines | `lines` | Array of text lines to insert | `spec-prepend-lines`, `spec-append-lines`, `file-prepend-lines` |
| File | `file` | The name of the non-spec file to modify or add | `file-prepend-lines`, `file-search-replace`, `file-add`, `file-remove`, `file-rename`, `patch-add` (optional), `patch-remove` |
| File | `file` | The name of the non-spec file to modify or add, or a glob pattern. For an archive-scoped overlay, it is matched against the archive's extracted contents. | `file-prepend-lines`, `file-search-replace`, `file-add`, `file-remove`, `file-rename`, `patch-add` (optional), `patch-remove` |
| Source | `source` | Path to source file for `file-add` and `patch-add`; relative paths are relative to the config file | `file-add`, `patch-add` |

> **Note:** For `file-rename`, the `replacement` field is a **filename only** (not a path). The file is renamed within its current directory.
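
The `replacement` row above says the text is literal and that capture group references such as `$1` are not expanded. The sketch below illustrates that behavior with Go's standard `regexp` package. It assumes the overlay behaves like `ReplaceAllLiteralString`; `replaceLiteral` is a hypothetical helper written for this example, not part of azldev.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceLiteral is a hypothetical helper (not azldev's API). It substitutes
// every match of pattern in text with replacement, treating replacement as
// plain text: "$1" stays "$1" instead of expanding to the first capture group.
func replaceLiteral(pattern, text, replacement string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("compiling %q: %w", pattern, err)
	}

	return re.ReplaceAllLiteralString(text, replacement), nil
}

func main() {
	// The capture group matches, but "$1" is inserted verbatim.
	out, err := replaceLiteral(`lib(\w+)`, "libfoo libbar", "$1")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "$1 $1"

	// An empty replacement deletes the matched text.
	out, err = replaceLiteral(`-Werror\s*`, "CFLAGS = -Werror -O2", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "CFLAGS = -O2"
}
```

If a replacement needs text from the match, the practical consequence is that it has to be spelled out in full in `replacement` rather than referenced through a group.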
@@ -274,6 +298,37 @@ description = "Remove CVE patches that are now upstream"
> `PatchN` tags. Macro-based tag numbering (e.g., `Patch%{n}`) is not expanded and may
> conflict with auto-assigned numbers.

### Removing a File from an Archive

Set the `archive` field on a `file-remove` overlay to delete files matching a glob pattern from
inside a source archive. The archive is extracted, matching files are removed, and the archive is
repacked.

```toml
[[components.mypackage.overlays]]
type = "file-remove"
archive = "mypackage-1.0.tar.gz"
file = "vendor/**"
description = "Remove all bundled vendor files"
```

> **Tip:** Without the `archive` field, the same `file-remove` overlay removes a loose file from
> the sources tree instead. The `archive` field is the only thing that scopes it to an archive.

### Search and Replace Inside an Archive

Set the `archive` field on a `file-search-replace` overlay to rewrite content inside an archive:

```toml
[[components.mypackage.overlays]]
type = "file-search-replace"
archive = "mypackage-1.0.tar.xz"
file = "configure.ac"
regex = "AC_CHECK_LIB\\(old_lib"
replacement = "AC_CHECK_LIB(new_lib"
description = "Update library reference in configure script"
```
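
The doubled backslash in the TOML `regex` string above decodes to a single backslash, so the pattern the regex engine sees is `AC_CHECK_LIB\(old_lib`, where `\(` matches a literal parenthesis. The Go sketch below applies that decoded pattern to sample file content. It assumes Go `regexp` syntax and literal replacement; `rewriteConfigure` and the sample `configure.ac` lines are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteConfigure is a hypothetical helper that applies the example overlay's
// pattern and replacement to file content. The pattern is written as it reads
// after TOML decoding: the escaped parenthesis matches a literal "(" instead
// of opening a capture group.
func rewriteConfigure(content string) string {
	re := regexp.MustCompile(`AC_CHECK_LIB\(old_lib`)

	return re.ReplaceAllLiteralString(content, "AC_CHECK_LIB(new_lib")
}

func main() {
	// Sample content invented for this example.
	before := "AC_CHECK_LIB(old_lib, old_init)\nAC_CHECK_LIB(other, init)\n"

	// Only the first line references old_lib, so only it is rewritten.
	fmt.Print(rewriteConfigure(before))
}
```

Without the escape, `(` would start a group and the pattern would fail to compile because the group is never closed.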

### Removing a Section

The `spec-remove-section` overlay removes an entire section from the spec, including its
28 changes: 21 additions & 7 deletions internal/app/azldev/cmds/component/preparesources.go
@@ -138,13 +138,7 @@ func PrepareComponentSources(env *azldev.Env, options *PrepareSourcesOptions) er
)
}

if options.AllowNoHashes {
preparerOpts = append(preparerOpts, sources.WithAllowNoHashes())
}

if options.SkipSources {
preparerOpts = append(preparerOpts, sources.WithSkipLookaside())
}
preparerOpts = appendPrepareSourcesOptions(env, preparerOpts, options, distro)

preparer, err := sources.NewPreparer(sourceManager, env.FS(), env, env, preparerOpts...)
if err != nil {
@@ -194,3 +188,23 @@ func CheckOutputDir(env *azldev.Env, options *PrepareSourcesOptions) error {
"use --force to delete and recreate it",
options.OutputDir)
}

// appendPrepareSourcesOptions appends conditional preparer options that control
// hashing and lookaside behavior. Extracted from
// [PrepareComponentSources] to keep cyclomatic complexity within limits.
func appendPrepareSourcesOptions(
_ *azldev.Env,
opts []sources.PreparerOption,
options *PrepareSourcesOptions,
_ sourceproviders.ResolvedDistro,
) []sources.PreparerOption {
if options.AllowNoHashes {
opts = append(opts, sources.WithAllowNoHashes())
}

if options.SkipSources {
opts = append(opts, sources.WithSkipLookaside())
}

return opts
}
238 changes: 238 additions & 0 deletions internal/app/azldev/core/sources/archiveoverlays.go
@@ -0,0 +1,238 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

package sources

import (
"fmt"
"log/slog"
"os"
"path/filepath"

"github.com/microsoft/azure-linux-dev-tools/internal/global/opctx"
"github.com/microsoft/azure-linux-dev-tools/internal/projectconfig"
"github.com/microsoft/azure-linux-dev-tools/internal/utils/archive"
"github.com/microsoft/azure-linux-dev-tools/internal/utils/rootfs"
)

// applyArchiveOverlays groups archive overlays by target archive and processes
// them in order. Multiple overlays targeting the same archive are batched into
// a single extract/modify/repack cycle. File operations inside the archive
// (file-remove and file-search-replace) reuse the same machinery as loose-file
// overlays ([applyNonSpecOverlay]).
func applyArchiveOverlays(
dryRunnable opctx.DryRunnable,
eventListener opctx.EventListener,
sourcesDirPath string,
overlays []projectconfig.ComponentOverlay,
) error {
groups, err := groupOverlaysByArchive(overlays)
if err != nil {
return err
}

if len(groups) == 0 {
return nil
}

operationCount := 0
for _, group := range groups {
operationCount += len(group.overlays)
}

event := eventListener.StartEvent("Applying archive overlays",
"archives", len(groups),
"operations", operationCount,
)
defer event.End()

for _, group := range groups {
if err := processArchive(dryRunnable, sourcesDirPath, group); err != nil {
return fmt.Errorf("archive overlay failed for %#q:\n%w", group.archive, err)
}
}

return nil
}

// archiveGroup holds overlays targeting the same archive, preserving order.
type archiveGroup struct {
archive string
root string
overlays []projectconfig.ComponentOverlay
}

// groupOverlaysByArchive groups archive overlays by their
// [projectconfig.ComponentOverlay.Archive] field, preserving insertion order
// within each group and across groups. Non-archive overlays are silently skipped.
//
// The optional [projectconfig.ComponentOverlay.ArchiveRoot] override (mirroring
// rpmbuild's `%setup -n`) is reconciled per archive: all overlays targeting the
// same archive that set it must agree, otherwise the configuration is ambiguous
// and an error is returned.
func groupOverlaysByArchive(overlays []projectconfig.ComponentOverlay) ([]archiveGroup, error) {
orderMap := make(map[string]int)

var groups []archiveGroup

for _, overlay := range overlays {
if !overlay.ModifiesArchive() {
continue
}

idx, exists := orderMap[overlay.Archive]
if !exists {
idx = len(groups)
orderMap[overlay.Archive] = idx

groups = append(groups, archiveGroup{archive: overlay.Archive})
}

if overlay.ArchiveRoot != "" {
if groups[idx].root != "" && groups[idx].root != overlay.ArchiveRoot {
return nil, fmt.Errorf(
"conflicting %#q overrides for archive %#q: %#q vs %#q",
"archive-root", overlay.Archive, groups[idx].root, overlay.ArchiveRoot,
)
}

groups[idx].root = overlay.ArchiveRoot
}

groups[idx].overlays = append(groups[idx].overlays, overlay)
}

return groups, nil
}

// processArchive extracts an archive to a temp directory, applies all overlays,
// and deterministically repacks it in-place with the original compression.
func processArchive(
dryRunnable opctx.DryRunnable,
sourcesDirPath string,
group archiveGroup,
) error {
archivePath := filepath.Join(sourcesDirPath, group.archive)

// Create a temporary directory for extraction directly on the real filesystem.
// The [archive] package operates exclusively through OS primitives ([os.Root],
// os.*), so the work directory must be a genuine on-disk path. Using
// os.MkdirTemp here makes that requirement explicit and keeps the path valid
// even when the caller's injected FS is in-memory or otherwise not OS-backed
// (e.g., in tests or alternate runners).
workDir, err := os.MkdirTemp("", "archive-overlay-")
if err != nil {
return fmt.Errorf("creating temp directory:\n%w", err)
}

defer func() {
if removeErr := os.RemoveAll(workDir); removeErr != nil {
slog.Warn("Failed to clean up archive work directory", "error", removeErr)
}
}()

// Extract the archive; compression is inferred from the filename extension.
if err := archive.ExtractAuto(archivePath, workDir); err != nil {
return fmt.Errorf("extracting archive:\n%w", err)
}

// Determine the root of the extracted content. Most source archives have
// a single top-level directory (e.g., "pkg-1.0/"); group.root overrides this
// inference when set (mirrors rpmbuild's `%setup -n`).
extractRoot, err := resolveExtractRoot(workDir, group.root)
if err != nil {
return fmt.Errorf("resolving extract root:\n%w", err)
}

// Confine an FS to the extract root so file overlays reuse the same machinery
// as loose-file overlays. The extracted tree is always on the real filesystem
// (written by the [archive] package), so it is rooted on an OS-backed FS
// regardless of which FS implementation the caller injected.
extractFS, err := rootfs.New(extractRoot)
if err != nil {
return fmt.Errorf("confining FS to extract root:\n%w", err)
}

defer func() {
if closeErr := extractFS.Close(); closeErr != nil {
slog.Warn("Failed to close extract-root FS", "error", closeErr)
}
}()

// Apply each overlay operation in order. Archive overlays are restricted to
// file-remove / file-search-replace (see [projectconfig.ComponentOverlay.ModifiesArchive]),
// which operate solely on the destination tree, so the extract-root FS is passed as
// both the source and destination FS — there is no component-source FS to read from.
for _, overlay := range group.overlays {
if err := applyNonSpecOverlay(dryRunnable, extractFS, extractFS, overlay); err != nil {
return fmt.Errorf("applying %#q operation:\n%w", overlay.Type, err)
}
}

// Deterministically repack the archive in-place, reusing the original compression.
if err := archive.CreateDeterministicArchiveAuto(archivePath, workDir); err != nil {
return fmt.Errorf("repacking archive:\n%w", err)
}

slog.Info("Archive overlay applied", "archive", group.archive)

return nil
}
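
The diff does not show how `archive.CreateDeterministicArchiveAuto` achieves determinism. As a rough illustration of what deterministic repacking usually involves, the sketch below writes an uncompressed tar stream with sorted entry order and fixed metadata, then checks that two runs produce identical bytes. Everything here is an assumption made for the example: `packDeterministic` is hypothetical, it takes in-memory files rather than a directory, and it skips the compression step.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"sort"
	"time"
)

// packDeterministic is a hypothetical sketch, not the azldev implementation.
// It writes files (name -> content) into an uncompressed tar stream using a
// sorted entry order and fixed metadata, so equal input yields equal bytes.
func packDeterministic(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	// Map iteration order is randomized, so sort for a stable entry order.
	sort.Strings(names)

	var buf bytes.Buffer
	writer := tar.NewWriter(&buf)

	for _, name := range names {
		header := &tar.Header{
			Name:     name,
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(len(files[name])),
			// Fixed timestamp; owner IDs and names are left at zero values.
			ModTime: time.Unix(0, 0),
		}
		if err := writer.WriteHeader(header); err != nil {
			return nil, fmt.Errorf("writing header for %q: %w", name, err)
		}
		if _, err := writer.Write(files[name]); err != nil {
			return nil, fmt.Errorf("writing content of %q: %w", name, err)
		}
	}

	if err := writer.Close(); err != nil {
		return nil, fmt.Errorf("closing tar stream: %w", err)
	}

	return buf.Bytes(), nil
}

func main() {
	files := map[string][]byte{
		"pkg-1.0/configure.ac": []byte("AC_INIT\n"),
		"pkg-1.0/README":       []byte("hello\n"),
	}

	first, err := packDeterministic(files)
	if err != nil {
		panic(err)
	}
	second, err := packDeterministic(files)
	if err != nil {
		panic(err)
	}

	fmt.Println(bytes.Equal(first, second)) // prints "true"
}
```

Stable output matters here because the `sources` file is rehashed after repacking: if entry order or timestamps varied between runs, the recorded hash would change even when the overlays did not.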

// resolveExtractRoot returns the effective root of an extracted archive.
// When rootOverride is set (the `%setup -n` equivalent), the named subdirectory
// of workDir is used; it must be a local path that exists as a directory. When
// rootOverride is empty, the root is inferred: if workDir contains exactly one
// entry and that entry is a directory (the common case for source archives like
// "pkg-1.0/"), that subdirectory is returned; otherwise workDir itself is
// returned.
func resolveExtractRoot(workDir, rootOverride string) (string, error) {
if rootOverride != "" {
// Defense in depth: validation already rejects non-local overrides, but
// re-check before joining so a malformed value can never escape workDir.
if !filepath.IsLocal(rootOverride) {
return "", fmt.Errorf("archive root %#q is not a local path", rootOverride)
}

target := filepath.Join(workDir, rootOverride)

info, err := os.Stat(target)
if err != nil {
return "", fmt.Errorf("archive root %#q not found after extraction:\n%w", rootOverride, err)
}

if !info.IsDir() {
return "", fmt.Errorf("archive root %#q is not a directory", rootOverride)
}

return target, nil
}

entries, err := os.ReadDir(workDir)
if err != nil {
return "", fmt.Errorf("reading extracted directory:\n%w", err)
}

if len(entries) == 1 && entries[0].IsDir() {
return filepath.Join(workDir, entries[0].Name()), nil
}

return workDir, nil
}
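
The override branch of `resolveExtractRoot` relies on `filepath.IsLocal` to keep the root inside the work directory. The standalone sketch below shows which override values that check accepts and rejects. `joinUnderWorkDir` and the `/tmp/work` path are hypothetical and exist only for this example; the existence and directory checks from the real function are left out.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// joinUnderWorkDir is a hypothetical helper mirroring only the lexical guard
// of the override branch: it refuses any root that is not a local path before
// joining it onto workDir.
func joinUnderWorkDir(workDir, root string) (string, error) {
	if !filepath.IsLocal(root) {
		return "", fmt.Errorf("archive root %q is not a local path", root)
	}

	return filepath.Join(workDir, root), nil
}

func main() {
	// Relative paths that stay inside the work directory are accepted;
	// parent references, absolute paths, and the empty string are rejected.
	for _, root := range []string{"pkg-1.0", "pkg-1.0/src", "../outside", "/etc", ""} {
		target, err := joinUnderWorkDir("/tmp/work", root)
		fmt.Printf("root=%q target=%q err=%v\n", root, target, err)
	}
}
```

`filepath.IsLocal` is purely lexical and does not resolve symbolic links, so it guards against malformed configuration values rather than against hostile archive contents.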

// archiveNamesFromOverlays returns the unique archive filenames targeted by
// archive overlays in the given overlay list. Used by [updateSourcesFile] to
// determine which 'sources' entries need rehashing after overlay application.
func archiveNamesFromOverlays(overlays []projectconfig.ComponentOverlay) []string {
seen := make(map[string]bool)

var names []string

for _, overlay := range overlays {
if overlay.ModifiesArchive() && !seen[overlay.Archive] {
seen[overlay.Archive] = true
names = append(names, overlay.Archive)
}
}

return names
}