feat(bash): add .ksh extension to bash parser (#235)#236
Open
azizur100389 wants to merge 1 commit intotirth8205:mainfrom
Open
feat(bash): add .ksh extension to bash parser (#235)#236azizur100389 wants to merge 1 commit intotirth8205:mainfrom
azizur100389 wants to merge 1 commit intotirth8205:mainfrom
Conversation
Register .ksh (Korn shell) with tree-sitter-bash alongside the existing .sh / .bash / .zsh entries added in tirth8205#227. Korn shell is close enough to bash syntactically that tree-sitter-bash handles the structural features the graph captures (function definitions, commands, source/. includes) correctly. Context ------- In the close comment on PR tirth8205#230, @tirth8205 explicitly flagged .ksh as a missing extension: "The .ksh extension in particular looks worth adding — I didn't include it in tirth8205#227." This PR addresses exactly that gap. Issue tirth8205#235 tracks the request. Why it matters -------------- Korn shell is still used in legacy AIX/Solaris operations, IBM internal tooling, and enterprise CI scripts. Repositories that ship .ksh scripts currently index to 0 nodes because the extension is unrecognized — the same failure mode that motivated tirth8205#197. Implementation -------------- One line added to EXTENSION_TO_LANGUAGE in parser.py: ".ksh": "bash" All of the bash parsing machinery shipped in tirth8205#227 (_FUNCTION_TYPES, _CALL_TYPES, _extract_bash_source_command, name/call resolution) already supports any file parsed through the "bash" language path, so no further changes are needed. Tests added (tests/test_multilang.py::TestBashParsing) ------------------------------------------------------ 1. test_detects_language — extended with a .ksh assertion to lock in the extension mapping (regression guard for tirth8205#235). 2. test_ksh_extension_parses_as_bash — end-to-end regression test that copies the existing tests/fixtures/sample.sh to a temp .ksh file, parses it through the real CodeParser, and asserts: - every node's language field is "bash" - the set of extracted Function names is identical to the .sh run - the CONTAINS / CALLS / IMPORTS_FROM edge counts per kind match The second assertion proves the .ksh path is fully wired through to the same structural extraction as .sh, not a degenerate zero-result read. Test results ------------ Stage 1 (new targeted tests): 2/2 passed. Stage 2 (tests/test_multilang.py full): 152/152 passed — zero regressions across any language. Stage 3 (tests/test_parser.py adjacent): 67/67 passed. Stage 4 (full suite): 733 passed. 8 pre-existing Windows failures in test_incremental (3) + test_main async coroutine detection (1) + test_notebook Databricks (4) — verified identical on unchanged main. Stage 5 (ruff check on parser.py and test_multilang.py): clean. Stage 6 (end-to-end smoke): detect_language("legacy.ksh") -> "bash"; parsing a real .ksh file produces 6 Function nodes, 18 edges, all tagged language=bash. Zero regressions. Single-line extension mapping change plus a targeted regression guard against the specific issue the maintainer flagged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Register
.ksh(Korn shell) withtree-sitter-bashalongside the existing.sh/.bash/.zshentries shipped in #227. Korn shell is close enough to Bash syntactically thattree-sitter-bashhandles the structural features the graph captures (function definitions, commands,source/.includes) correctly.Closes #235.
Why this PR
In the close comment on #230 you explicitly flagged this as worth a follow-up:
This PR is exactly that: the tracking issue (#235) plus the minimal change to address it.
Why it matters
Korn shell is still used in legacy AIX/Solaris operations, IBM internal tooling, and enterprise CI scripts. Repositories that ship
.kshscripts currently index to0 nodesbecause the extension is unrecognized — the same failure mode that motivated #197.Implementation
A single line added to
EXTENSION_TO_LANGUAGEincode_review_graph/parser.py:All of the bash parsing machinery shipped in #227 (
_FUNCTION_TYPES,_CALL_TYPES,_extract_bash_source_command, name/call resolution) already handles any file routed through the"bash"language path — so no further wiring is needed.Tests added (
tests/test_multilang.py::TestBashParsing)test_detects_language— extended with a.kshassertion to lock the mapping in as a regression guard.test_ksh_extension_parses_as_bash— end-to-end regression test that copiestests/fixtures/sample.shto a templegacy.ksh, parses it throughCodeParser.parse_file(), and asserts:languagefield is"bash"Functionnames is identical to the.shrunCONTAINS/CALLS/IMPORTS_FROMedge counts per kind matchThe second assertion specifically proves
.kshis fully wired through the same structural extraction path as.sh, not a degenerate zero-result read.Test results
tests/test_multilang.pyfulltests/test_parser.pytest_incrementalx3 +test_mainasync detection x1 +test_notebookDatabricks x4) — verified identical on unchangedmainruff checkparser.pyandtest_multilang.pydetect_language("legacy.ksh")→"bash"; parsing a real.kshfile produces 6Functionnodes, 18 edges, all taggedlanguage=bashZero regressions. Single-line extension mapping change plus a targeted regression guard.