feat(bash): add .ksh extension to bash parser (#235) #236

Open
azizur100389 wants to merge 1 commit into tirth8205:main from azizur100389:feat/bash-ksh-extension
Conversation

@azizur100389
Contributor

Summary

Register .ksh (Korn shell) with tree-sitter-bash alongside the existing .sh / .bash / .zsh entries shipped in #227. Korn shell is close enough to Bash syntactically that tree-sitter-bash handles the structural features the graph captures (function definitions, commands, source/. includes) correctly.

Closes #235.

Why this PR

In the close comment on #230 you explicitly flagged this as worth a follow-up:

If there's anything your implementation does that mine doesn't (e.g. .ksh extension, additional test coverage), please open a follow-up issue pointing at the specific feature and I'll merge it in. The .ksh extension in particular looks worth adding — I didn't include it in #227.

This PR is exactly that: the tracking issue (#235) plus the minimal change to address it.

Why it matters

Korn shell is still used in legacy AIX/Solaris operations, IBM internal tooling, and enterprise CI scripts. Repositories that ship .ksh scripts currently index to 0 nodes because the extension is unrecognized — the same failure mode that motivated #197.

Implementation

A single line added to EXTENSION_TO_LANGUAGE in code_review_graph/parser.py:

".ksh": "bash",  # Korn shell — close enough to bash for tree-sitter-bash (#235)

All of the bash parsing machinery shipped in #227 (_FUNCTION_TYPES, _CALL_TYPES, _extract_bash_source_command, name/call resolution) already handles any file routed through the "bash" language path — so no further wiring is needed.
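As a rough illustration of why one mapping entry is enough, the sketch below mimics suffix-based language detection. Only the `.ksh` entry comes from this PR; the reduced dictionary and the body of `detect_language` are assumptions for illustration, not the actual code in code_review_graph/parser.py.

```python
from pathlib import Path
from typing import Optional

# Illustrative subset of the mapping; the real table in parser.py is larger.
EXTENSION_TO_LANGUAGE = {
    ".sh": "bash",
    ".bash": "bash",
    ".zsh": "bash",
    ".ksh": "bash",  # the entry this PR adds
}


def detect_language(path: str) -> Optional[str]:
    """Hypothetical stand-in: look up the language by file suffix."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


if __name__ == "__main__":
    print(detect_language("legacy.ksh"))  # routed to the bash path
    print(detect_language("notes.txt"))   # unknown suffix, no language
```

Because every downstream step keys off the returned language name rather than the suffix, a file that resolves to "bash" should pick up the existing extraction logic without further changes.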

Tests added (tests/test_multilang.py::TestBashParsing)

  1. test_detects_language — extended with a .ksh assertion to lock the mapping in as a regression guard.
  2. test_ksh_extension_parses_as_bash — end-to-end regression test that copies tests/fixtures/sample.sh to a temp legacy.ksh, parses it through CodeParser.parse_file(), and asserts:
    • Every node's language field is "bash"
    • The set of extracted Function names is identical to the .sh run
    • The CONTAINS / CALLS / IMPORTS_FROM edge counts per kind match

The second assertion specifically proves .ksh is fully wired through the same structural extraction path as .sh, not a degenerate zero-result read.
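The shape of that end-to-end check can be sketched roughly as follows. This is not the test from tests/test_multilang.py: the tuple shapes for nodes and edges, the helper `assert_ksh_matches_sh`, and the `toy_parse` stand-in are all hypothetical, chosen so the example runs without the project installed.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed shapes, for illustration only: a node is (kind, name, language)
# and an edge is (kind, source, target). The real CodeParser types may differ.
Node = Tuple[str, str, str]
Edge = Tuple[str, str, str]
ParseFn = Callable[[Path], Tuple[List[Node], List[Edge]]]


def function_names(nodes: List[Node]) -> set:
    return {name for kind, name, _ in nodes if kind == "Function"}


def assert_ksh_matches_sh(parse: ParseFn, sh_fixture: Path) -> None:
    """Copy a .sh fixture to legacy.ksh and compare the two parse results."""
    with tempfile.TemporaryDirectory() as tmp:
        ksh_copy = Path(tmp) / "legacy.ksh"
        shutil.copy(sh_fixture, ksh_copy)
        sh_nodes, sh_edges = parse(sh_fixture)
        ksh_nodes, ksh_edges = parse(ksh_copy)

    # 1. Every node from the .ksh copy is tagged as bash.
    assert ksh_nodes and all(lang == "bash" for _, _, lang in ksh_nodes)
    # 2. Same non-empty set of Function names (rules out a zero-result read).
    assert function_names(sh_nodes)
    assert function_names(ksh_nodes) == function_names(sh_nodes)
    # 3. Edge counts per kind match.
    assert Counter(e[0] for e in ksh_edges) == Counter(e[0] for e in sh_edges)


def toy_parse(path: Path) -> Tuple[List[Node], List[Edge]]:
    """Toy stand-in parser: treats every 'name() {' line as a function."""
    if path.suffix not in {".sh", ".ksh"}:
        return [], []
    nodes: List[Node] = [("File", path.name, "bash")]
    edges: List[Edge] = []
    for line in path.read_text().splitlines():
        if line.rstrip().endswith("() {"):
            name = line.split("(")[0].strip()
            nodes.append(("Function", name, "bash"))
            edges.append(("CONTAINS", path.name, name))
    return nodes, edges


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fixture = Path(d) / "sample.sh"
        fixture.write_text("greet() {\n  echo hi\n}\nmain() {\n  greet\n}\n")
        assert_ksh_matches_sh(toy_parse, fixture)
```

Comparing against the .sh run, rather than against hard-coded counts, keeps the check valid if the fixture changes later.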

Test results

| Stage | Result |
| --- | --- |
| Stage 1: new targeted tests | 2/2 passed |
| Stage 2: tests/test_multilang.py full | 152/152 passed, zero regressions |
| Stage 3: adjacent tests/test_parser.py | 67/67 passed |
| Stage 4: full suite | 733 passed, 8 pre-existing Windows failures (test_incremental x3 + test_main async detection x1 + test_notebook Databricks x4), verified identical on unchanged main |
| Stage 5: ruff check | clean on parser.py and test_multilang.py |
| Stage 6: end-to-end smoke | detect_language("legacy.ksh") -> "bash"; parsing a real .ksh file produces 6 Function nodes, 18 edges, all tagged language=bash |

Zero regressions. Single-line extension mapping change plus a targeted regression guard.

Register .ksh (Korn shell) with tree-sitter-bash alongside the existing
.sh / .bash / .zsh entries added in tirth8205#227. Korn shell is close enough to
bash syntactically that tree-sitter-bash handles the structural features
the graph captures (function definitions, commands, source/. includes)
correctly.

Context
-------
In the close comment on PR tirth8205#230, @tirth8205 explicitly flagged .ksh as a
missing extension:

    "The .ksh extension in particular looks worth adding — I didn't
     include it in tirth8205#227."

This PR addresses exactly that gap. Issue tirth8205#235 tracks the request.

Why it matters
--------------
Korn shell is still used in legacy AIX/Solaris operations, IBM internal
tooling, and enterprise CI scripts. Repositories that ship .ksh scripts
currently index to 0 nodes because the extension is unrecognized — the
same failure mode that motivated tirth8205#197.

Implementation
--------------
One line added to EXTENSION_TO_LANGUAGE in parser.py:
    ".ksh": "bash"

All of the bash parsing machinery shipped in tirth8205#227 (_FUNCTION_TYPES,
_CALL_TYPES, _extract_bash_source_command, name/call resolution) already
supports any file parsed through the "bash" language path, so no further
changes are needed.

Tests added (tests/test_multilang.py::TestBashParsing)
------------------------------------------------------
1. test_detects_language — extended with a .ksh assertion to lock in
   the extension mapping (regression guard for tirth8205#235).
2. test_ksh_extension_parses_as_bash — end-to-end regression test that
   copies the existing tests/fixtures/sample.sh to a temp .ksh file,
   parses it through the real CodeParser, and asserts:
     - every node's language field is "bash"
     - the set of extracted Function names is identical to the .sh run
     - the CONTAINS / CALLS / IMPORTS_FROM edge counts per kind match
   The second assertion proves the .ksh path is fully wired through to
   the same structural extraction as .sh, not a degenerate zero-result
   read.

Test results
------------
Stage 1 (new targeted tests): 2/2 passed.
Stage 2 (tests/test_multilang.py full): 152/152 passed — zero regressions
  across any language.
Stage 3 (tests/test_parser.py adjacent): 67/67 passed.
Stage 4 (full suite): 733 passed. 8 pre-existing Windows failures in
  test_incremental (3) + test_main async coroutine detection (1) +
  test_notebook Databricks (4) — verified identical on unchanged main.
Stage 5 (ruff check on parser.py and test_multilang.py): clean.
Stage 6 (end-to-end smoke): detect_language("legacy.ksh") -> "bash";
  parsing a real .ksh file produces 6 Function nodes, 18 edges, all
  tagged language=bash.

Zero regressions. Single-line extension mapping change plus a targeted
regression guard against the specific issue the maintainer flagged.