gawd - GitHub Actions Workflow Differ

gawd stands for GitHub Actions Workflow Differ. It is a Python library and command-line tool that computes all changes (i.e., diffs) between two GitHub Actions workflow files. Given a pair of workflow files as input, and taking the specific syntax of GitHub Actions into account, gawd reports on the items that were added and removed, as well as on items that were moved, renamed or changed based on their similarity. More details can be found in the associated scientific publication.


Project relocation notice

Important: The project repository has moved.

  • Current address: https://github.com/sgl-umons/gawd
  • Old address: https://github.com/pooya-rostami/gawd

All future development, issues, pull requests, and releases are maintained under the new repository. Please update your bookmarks, CI references, and dependency links accordingly.


Installation

gawd can be used in two ways: as a command-line tool and as a reusable Python library.

You can use pip install gawd to install the latest available version from PyPI. Pre-releases are available from the main branch on GitHub and can be installed with pip install git+https://github.com/sgl-umons/gawd.

Alternatively, gawd is available via Nix.


Usage

As a command-line tool

After installation, the gawd command-line tool should be available in your shell. Otherwise, you can call it using python -m gawd.

usage: gawd [-h] [--threshold X] [--position-weight X] [--job-name-weight X]
            [--short] [--json] [--verbose]
            first second

gawd is an open source GitHub Actions Workflow Differencing tool that is aware
of the specific workflow syntax of GitHub Actions workflows. Given a pair of
workflow files as input, the tool reports on the items that were added and
removed, as well on items that were moved, renamed or changed based on their
similarity.

positional arguments:
  first                 path to first workflow (YAML) file
  second                path to second workflow (YAML) file

optional arguments:
  -h, --help            show this help message and exit
  --threshold X, -t X   ranged from 0 to 1, distance threshold to map items,
                        value closer to 1 favours "changed", value closer to 0
                        favours "added" and "removed" (default is 0.5)
  --position-weight X, -p X
                        ranged from 0 to 1, weight of item positions when
                        comparing sequences (default is 0.2)
  --job-name-weight X, -j X
                        ranged from 0 to 1, weight of job names when comparing
                        jobs (default is 0.2)
  --short, -s           limit the output of values to a few characters
  --json                output in json
  --verbose             output in more detail

Examples:

Let us take the changes made to the workflow file "main.yml" in this commit as an example.

--short Condensed output of gawd when applied to compute the changes:

>>> gawd old_main.yaml new_main.yaml --short
changed jobs.build-linux-clangpdb-gcc5.steps[1].run from "'sudo apt-get update (...) UB_PATH\n'" to "'sudo apt-get update (...) UB_PATH\n'"
renamed jobs.build-linux-clang38 to jobs.build-linux-clangdwarf
changed jobs.build-linux-clang38.name from "'Build Linux CLANG38'" to "'Build Linux CLANGDWARF'"
changed jobs.build-linux-clang38.env.TOOLCHAINS from "'CLANG38'" to "'CLANGDWARF'"
changed jobs.build-linux-clang38.steps[6].with.name from "'Linux CLANG38 Artifacts'" to "'Linux CLANGDWARF Artifacts'"

--json To output the results in JSON format:

>>> gawd old_main.yaml new_main.yaml --short --json
[{"type": "changed", "old": {"path": "jobs.build-linux-clangpdb-gcc5.steps[1].run", "value": "'sudo apt-get update (...) UB_PATH\\n'"}, "new": {"path": "jobs.build-linux-clangpdb-gcc5.steps[1].run", "value": "'sudo apt-get update (...) UB_PATH\\n'"}},
{"type": "renamed", "old": {"path": "jobs.build-linux-clang38", "value": "{'name': 'Build Linu (...) *.zip'}}]}"}, "new": {"path": "jobs.build-linux-clangdwarf", "value": "{'name': 'Build Linu (...) *.zip'}}]}"}},
{"type": "changed", "old": {"path": "jobs.build-linux-clang38.name", "value": "'Build Linux CLANG38'"}, "new": {"path": "jobs.build-linux-clangdwarf.name", "value": "'Build Linux CLANGDWARF'"}},
{"type": "changed", "old": {"path": "jobs.build-linux-clang38.env.TOOLCHAINS", "value": "'CLANG38'"}, "new": {"path": "jobs.build-linux-clangdwarf.env.TOOLCHAINS", "value": "'CLANGDWARF'"}},
{"type": "changed", "old": {"path": "jobs.build-linux-clang38.steps[6].with.name", "value": "'Linux CLANG38 Artifacts'"}, "new": {"path": "jobs.build-linux-clangdwarf.steps[6].with.name", "value": "'Linux CLANGDWARF Artifacts'"}}]
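The JSON output is convenient to post-process from a script. The following is a minimal sketch, not part of gawd itself: run_gawd_json and group_by_type are hypothetical helper names, and the code assumes that the --json output is a single JSON array on standard output, shaped like the records shown above. How the "old" and "new" fields look for added or removed items is not shown in this README, so the sketch tolerates them being missing or null.

```python
import json
import subprocess
import sys
from collections import defaultdict


def run_gawd_json(first, second, extra_args=()):
    """Run `python -m gawd first second --json` and decode its output.

    Hypothetical helper: requires gawd to be installed for this interpreter
    and assumes stdout contains only the JSON array.
    """
    cmd = [sys.executable, "-m", "gawd", str(first), str(second), "--json", *extra_args]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def group_by_type(records):
    """Group decoded records by "type", keeping (old path, new path) pairs."""
    grouped = defaultdict(list)
    for record in records:
        # Assumption: "old"/"new" may be absent or null for added/removed items.
        old = record.get("old") or {}
        new = record.get("new") or {}
        grouped[record["type"]].append((old.get("path"), new.get("path")))
    return dict(grouped)


# Two records taken from the example above (the first value is abbreviated).
sample = '''[
  {"type": "renamed", "old": {"path": "jobs.build-linux-clang38", "value": "(...)"}, "new": {"path": "jobs.build-linux-clangdwarf", "value": "(...)"}},
  {"type": "changed", "old": {"path": "jobs.build-linux-clang38.name", "value": "'Build Linux CLANG38'"}, "new": {"path": "jobs.build-linux-clangdwarf.name", "value": "'Build Linux CLANGDWARF'"}}
]'''

grouped = group_by_type(json.loads(sample))
for kind, pairs in grouped.items():
    print(kind, pairs)
```

With gawd installed, the same grouping could be applied to live output, e.g. group_by_type(run_gawd_json("old_main.yaml", "new_main.yaml", ["--short"])).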

--threshold A value between 0 and 1 representing the sensitivity of gawd in identifying changes. A higher threshold results in more instances of 'changed' or 'renamed', while a lower threshold favours 'added' or 'removed':

>>> gawd old_main.yaml new_main.yaml --short --threshold 0.1
removed jobs.build-linux-clangpdb-gcc5.steps[1] with {'name': 'Install De (...) B_PATH\n'}
added jobs.build-linux-clangpdb-gcc5.steps[1] with {'name': 'Install De (...) B_PATH\n'}
removed jobs.build-linux-clang38 with {'name': 'Build Linu (...) *.zip'}}]}
added jobs.build-linux-clangdwarf with {'name': 'Build Linu (...) *.zip'}}]}

As an importable library

gawd comes with two functions to compare workflow files, namely diff_workflow_files and diff_workflows. The former accepts the paths to two workflow files, loads them and returns the output of diff_workflows. The latter accepts two workflows as Python dictionaries (e.g., loaded with ruamel.yaml) and returns a list of 5-tuples (kind, old_path, old_value, new_path, new_value).

The kind component is one of added, removed, changed, moved, renamed and indicates the kind of change. old_path and new_path are lists of keys indicating where the change occurred; integers in a path are indices into sequences (e.g., the steps of a job). old_value and new_value are the previous and new values, respectively. Note that old_path and old_value are None in case of an "added" change, and new_path and new_value are None in case of a "removed" change.
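As an illustration of how such 5-tuples can be consumed, here is a small sketch that renders each change as text and counts changes per kind. It does not import gawd: describe and count_kinds are hypothetical helpers, and the sample list is hand-written (the lint and docs jobs are invented for the example). It relies on the old side being None for "added" and the new side being None for "removed".

```python
from collections import Counter


def describe(change):
    """Render one (kind, old_path, old_value, new_path, new_value) tuple as text."""
    kind, old_path, old_value, new_path, new_value = change
    if kind == "added":
        # The old side is None for additions.
        return f"added {new_path} with {new_value!r}"
    if kind == "removed":
        # The new side is None for removals.
        return f"removed {old_path} with {old_value!r}"
    if kind in ("moved", "renamed"):
        return f"{kind} {old_path} to {new_path}"
    return f"{kind} {old_path} from {old_value!r} to {new_value!r}"


def count_kinds(changes):
    """Count how many changes of each kind were reported."""
    return Counter(change[0] for change in changes)


# Hand-written sample in the shape returned by diff_workflows.
example_changes = [
    ("added", None, None, ["jobs", "lint"], {"runs-on": "ubuntu-latest"}),
    ("removed", ["jobs", "docs"], {"runs-on": "ubuntu-latest"}, None, None),
    ("changed",
     ["jobs", "build-linux-clang38", "env", "TOOLCHAINS"], "CLANG38",
     ["jobs", "build-linux-clangdwarf", "env", "TOOLCHAINS"], "CLANGDWARF"),
]

for change in example_changes:
    print(describe(change))
print(count_kinds(example_changes))
```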

Similarly to the extra parameters that can be provided to the CLI tool, the gawd module exposes THRESHOLD, POSITION_WEIGHT and JOB_NAME_WEIGHT.

The following example shows the output of the imported library version of gawd for the changes made to the workflow file main.yml in this commit.

>>> from gawd import diff_workflow_files
>>> old_workflow_path = ...
>>> new_workflow_path = ...
>>> diff_workflow_files(old_workflow_path, new_workflow_path)
[('changed', ['jobs', 'build-linux-clangpdb-gcc5', 'steps', 1, 'run'], 'sudo apt-get update\nsudo apt-get install nasm ...', ['jobs', 'build-linux-clangpdb-gcc5', 'steps', 1, 'run'], 'sudo apt-get update\nsudo apt-get install nasm ...'),
('renamed', ['jobs', 'build-linux-clang38'], {'name': 'Build Linux CLANG38', ...}, ['jobs', 'build-linux-clangdwarf'], {'name': 'Build Linux CLANGDWARF', ...}), 
('changed', ['jobs', 'build-linux-clang38', 'name'], 'Build Linux CLANG38', ['jobs', 'build-linux-clangdwarf', 'name'], 'Build Linux CLANGDWARF'), 
('changed', ['jobs', 'build-linux-clang38', 'env', 'TOOLCHAINS'], 'CLANG38', ['jobs', 'build-linux-clangdwarf', 'env', 'TOOLCHAINS'], 'CLANGDWARF'), 
('changed', ['jobs', 'build-linux-clang38', 'steps', 6, 'with', 'name'], 'Linux CLANG38 Artifacts', ['jobs', 'build-linux-clangdwarf', 'steps', 6, 'with', 'name'], 'Linux CLANGDWARF Artifacts')]
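The library returns paths as lists, whereas the command-line output prints them in a dotted notation such as jobs.build-linux-clang38.steps[6].with.name. If the dotted form is preferred, a conversion could look like the sketch below. path_to_string is a hypothetical helper written for this example, not a function provided by gawd, and it only mirrors the notation visible in the CLI examples.

```python
def path_to_string(path):
    """Convert a list path to dotted notation, e.g. ['jobs', 'a', 'steps', 1] -> 'jobs.a.steps[1]'."""
    if path is None:
        # "added" and "removed" changes carry None on one side.
        return None
    parts = []
    for item in path:
        if isinstance(item, int):
            # Integers are indices into sequences.
            parts.append(f"[{item}]")
        elif parts:
            parts.append(f".{item}")
        else:
            parts.append(str(item))
    return "".join(parts)


print(path_to_string(["jobs", "build-linux-clang38", "steps", 6, "with", "name"]))
```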

Understanding the output of gawd

gawd reports changes as a list of edit operations. Each operation has a kind and describes how one workflow differs from another. The possible kinds are:

  • added
  • removed
  • changed
  • moved
  • renamed

The detection logic follows the rules below.

Moved vs. renamed

moved and renamed are conceptually similar but apply to different YAML structures:

  • Moved refers to an element whose position has changed within a sequence (e.g., a step moved from index 0 to index 1).
  • Renamed refers to an element whose key has changed within a mapping (e.g., a job renamed from build-linux-clang38 to build-linux-clangdwarf).

Both moved and renamed are only detected within the same parent structure:

  • A step moved within the same job may be detected as moved.
  • A step moved from one job to another will not be detected as a move.

Detection is based on a similarity function that considers:

  • The content of the element (keys and values)
  • Its position in a sequence (for moved)
  • Its name/key (for renamed)

Elements that move farther away (e.g., index 0 → 10) are considered less similar than elements that move a short distance (e.g., index 0 → 1).

The sensitivity of this detection can be tuned using CLI parameters:

  • --threshold
  • --position-weight
  • --job-name-weight
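To build intuition for how a position weight and a threshold can interact, consider the toy model below. It is explicitly not gawd's actual similarity function: the formula, the use of difflib for content similarity, and all function names are assumptions made for illustration. The only properties taken from this README are that elements moved farther away count as less similar, that the weights range from 0 to 1, and that the threshold is a distance threshold where larger values make a mapping more likely.

```python
from difflib import SequenceMatcher


def content_similarity(a, b):
    """Textual similarity in [0, 1]; a stand-in for a real content metric."""
    return SequenceMatcher(None, repr(a), repr(b)).ratio()


def position_similarity(i, j, length):
    """1.0 when the indices match, decreasing linearly with their distance."""
    if length <= 1:
        return 1.0
    return 1.0 - abs(i - j) / (length - 1)


def blended_similarity(a, i, b, j, length, position_weight=0.2):
    """Weighted mix of content and position similarity (assumed formula)."""
    return ((1 - position_weight) * content_similarity(a, b)
            + position_weight * position_similarity(i, j, length))


def is_mapped(similarity, threshold=0.5):
    """Treat two items as the same element when their distance is within the threshold."""
    return (1 - similarity) <= threshold


step = {"name": "Checkout", "uses": "actions/checkout@v4"}
near = blended_similarity(step, 0, step, 1, length=11)   # moved by one position
far = blended_similarity(step, 0, step, 10, length=11)   # moved to the far end
print(near > far)
print(is_mapped(far, threshold=0.5), is_mapped(far, threshold=0.1))
```

In this toy model, an identical step moved to the far end of the sequence is still mapped at the default threshold but is treated as unrelated (and would show up as removed plus added) at a threshold of 0.1, which matches the direction of the --threshold behaviour described earlier.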

Additions and removals

Additions and removals are detected purely based on path presence.

Given two workflows A (old) and B (new):

  • If B contains a path that does not exist in A (excluding cases detected as moves or renames), it is reported as added.
  • If A contains a path that does not exist in B (again excluding cases detected as moves or renames), it is reported as removed.
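The path-presence rule can be sketched with plain set operations. The code below is a simplified illustration, not gawd's implementation: it ignores move and rename detection entirely, and collect_paths, topmost and naive_added_removed are hypothetical names. The two small workflows are invented for the example.

```python
def collect_paths(node, prefix=()):
    """Return all paths (as tuples) found in nested dicts and lists."""
    paths = set()
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return paths  # scalars have no sub-paths
    for key, value in items:
        path = prefix + (key,)
        paths.add(path)
        paths |= collect_paths(value, path)
    return paths


def topmost(paths):
    """Keep only paths whose parent is not itself in the set."""
    return sorted((p for p in paths if p[:-1] not in paths), key=repr)


def naive_added_removed(old, new):
    """Paths present only in `new` are added; paths present only in `old` are removed."""
    old_paths, new_paths = collect_paths(old), collect_paths(new)
    return topmost(new_paths - old_paths), topmost(old_paths - new_paths)


old = {"jobs": {"build": {"runs-on": "ubuntu-latest"},
                "docs": {"runs-on": "ubuntu-latest"}}}
new = {"jobs": {"build": {"runs-on": "ubuntu-latest"},
                "test": {"runs-on": "ubuntu-latest"}}}

added, removed = naive_added_removed(old, new)
print("added:", added)
print("removed:", removed)
```

Because this sketch has no similarity step, it reports docs as removed and test as added even though the two jobs are identical; a similarity-based differ could instead report such a pair as a rename.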

Changed

A changed operation is reported when:

  • A path exists in both workflows, and
  • The value at that path differs between the two versions.

Note that "same path" is a simplification: gawd may still report a changed operation when an element is also detected as moved or renamed.

As a result, it is possible to observe multiple change kinds for the same logical element, such as:

  • moved + changed
  • renamed + changed

This reflects the fact that the element both changed location/name and changed content.
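When post-processing results, it can help to attach each changed entry to the moved or renamed element it belongs to. The sketch below does so by prefix matching on old_path. It assumes, as in the library example earlier in this README, that a changed entry nested in a renamed element carries an old_path that starts with the old path of that element; nested_changes is a hypothetical helper, and the values in the sample are abbreviated.

```python
def nested_changes(changes):
    """Map each moved/renamed element to the 'changed' entries beneath its old path."""
    result = {}
    for kind, old_path, _, new_path, _ in changes:
        if kind not in ("moved", "renamed"):
            continue
        inner = [
            c for c in changes
            if c[0] == "changed"
            and len(c[1]) > len(old_path)
            and c[1][:len(old_path)] == old_path
        ]
        result[(kind, tuple(old_path), tuple(new_path))] = inner
    return result


# Entries modelled on the library example above (dict values abbreviated).
changes = [
    ("changed", ["jobs", "build-linux-clangpdb-gcc5", "steps", 1, "run"], "old script",
     ["jobs", "build-linux-clangpdb-gcc5", "steps", 1, "run"], "new script"),
    ("renamed", ["jobs", "build-linux-clang38"], {"name": "Build Linux CLANG38"},
     ["jobs", "build-linux-clangdwarf"], {"name": "Build Linux CLANGDWARF"}),
    ("changed", ["jobs", "build-linux-clang38", "name"], "Build Linux CLANG38",
     ["jobs", "build-linux-clangdwarf", "name"], "Build Linux CLANGDWARF"),
    ("changed", ["jobs", "build-linux-clang38", "env", "TOOLCHAINS"], "CLANG38",
     ["jobs", "build-linux-clangdwarf", "env", "TOOLCHAINS"], "CLANGDWARF"),
]

for element, inner in nested_changes(changes).items():
    print(element, "->", [c[1] for c in inner])
```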


Contributions

Contributions are very welcome! Feel free to report bugs or suggest new features using GitHub issues and/or pull requests.


License

This tool is distributed under the GNU Lesser General Public License v3.


Citing

You can refer to the scientific publication describing this tool using:

@inproceedings{gawd2024MSR,
  author = {Rostami Mazrae, Pooya and Decan, Alexandre and Mens, Tom},
  title = {gawd: A Differencing Tool for GitHub Actions Workflows},
  booktitle = {International Conference on Mining Software Repositories (MSR)- Data and Tools Showcase Track},
  doi = {https://doi.org/10.1145/3643991.3644873},
  pages = {682-682},
  year = 2024
}

You can also refer to this tool directly using:

@software{gawd2023,
  author = {Rostami Mazrae, Pooya and Decan, Alexandre},
  title = {gawd: GitHub Actions Workflow Differ},
  url = {https://github.com/sgl-umons/gawd},
  year = 2023,
  institute = {Software Engineering Lab, University of Mons, Belgium}
}
