Feature request: ObjectScript (InterSystems IRIS) language support
What problem does this solve?
InterSystems IRIS and Caché (InterSystems IRIS Documentation) are used in healthcare IT, financial services, and large enterprise systems. Codebases built on them — including some of the largest healthcare interoperability platforms — are written in ObjectScript, a language with no existing support in CBM or most other code graph tools.
ObjectScript files (.cls, .mac, .int) are common in the organizations that need code graph analysis most: large legacy systems where understanding call chains and dependency structure is critical and is still done by hand.
What we built
We implemented full ObjectScript support covering:
Two file formats
- UDL (.cls): the primary class definition format. Class, Method, ClassMethod, Property, Parameter, Index, Trigger (with body text), XData, Storage, and Query members are all extracted as nodes.
- MAC/INT routines (.mac/.int): tag-based subroutine format.
Both use vendored tree-sitter-objectscript grammars compiled into the C pipeline.
Four call dispatch patterns resolved
ObjectScript has dispatch patterns that are structurally invisible to text search — all four are resolved:
- ##class(Pkg.Class).Method(): explicit cross-class call (standard, resolved from AST).
- ..Method(): relative-dot self-call. This is how ~80% of intra-class calls are written in ObjectScript. Without it, CALLS analysis is structurally incomplete. Impact on a large (~1,200-class) corpus: CALLS edges increased ~3.5× from a 2-line change.
- $$$MacroName: macro expansion. .inc include files define macros that expand to class names and method calls; these are resolved at index time from a CBMMacroTable built from project .inc files.
- Type inference from %New/%OpenId and return type declarations: Set obj = ##class(MyApp.Patient).%New() followed by obj.Save() resolves to MyApp.Patient.Save.
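The type-inference pattern in the last bullet can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code in the fork: `VarBinding`, `infer_binding`, and `resolve_member_call` are hypothetical names, the sketch matches source text rather than AST nodes, and it only recognizes the exact spelling `Set` (ObjectScript commands are case-insensitive and can be abbreviated).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record binding one local variable to an inferred class. */
typedef struct {
    char var[64];
    char cls[128];
} VarBinding;

/* Match `Set <var> = ##class(<Class>).%New(` or `.%OpenId(`.
 * Returns 1 and fills `out` on a match, 0 otherwise. */
static int infer_binding(const char *stmt, VarBinding *out) {
    char method[32];
    assert(stmt && out);
    if (sscanf(stmt,
               " Set %63[A-Za-z0-9] = ##class(%127[A-Za-z0-9.%]).%31[A-Za-z0-9%](",
               out->var, out->cls, method) != 3) {
        return 0;
    }
    return strcmp(method, "%New") == 0 || strcmp(method, "%OpenId") == 0;
}

/* Resolve `<var>.<Method>(...)` against a binding.
 * Writes "Class.Method" into buf; returns 1 on success, 0 otherwise. */
static int resolve_member_call(const char *call, const VarBinding *b,
                               char *buf, size_t buflen) {
    assert(call && b && buf && buflen > 0);
    size_t n = strlen(b->var);
    if (strncmp(call, b->var, n) != 0 || call[n] != '.') return 0;
    const char *m = call + n + 1;
    size_t mlen = strcspn(m, "(");   /* method name ends at '(' or end of text */
    if (mlen == 0) return 0;
    int w = snprintf(buf, buflen, "%s.%.*s", b->cls, (int)mlen, m);
    return w > 0 && (size_t)w < buflen;
}
```

Feeding it the two statements from the bullet (`Set obj = ##class(MyApp.Patient).%New()` and then `obj.Save()`) yields the qualified name `MyApp.Patient.Save`, which is the CALLS target described above.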
Ensemble production topology
InterSystems Ensemble (the IRIS interoperability framework) routes messages between components via string-keyed dispatch in XML configuration — invisible to normal call analysis. We added:
- EnsembleItem nodes: one per production component
- ROUTES_TO edges: SendRequestSync("Target", msg) resolved to the target class's message handler
Both are parsed from ProductionDefinition XData blocks at index time. No live IRIS instance is required.
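As a rough illustration of the string-keyed lookup behind a ROUTES_TO edge, the sketch below maps the first string argument of a SendRequestSync call to a component's class. Everything here is an assumption for illustration: the `EnsembleItem` struct layout, the helper names, and the class names in the usage note are hypothetical, the item table is assumed to have already been filled from the production's XData block, and choosing the handler method on the target class is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one production component. */
typedef struct {
    const char *name;        /* item name used as the dispatch key */
    const char *class_name;  /* class that implements the item */
} EnsembleItem;

/* Copy the first double-quoted argument of SendRequestSync(...) into buf.
 * Returns 1 on success, 0 if the target is not a string literal. */
static int first_string_arg(const char *call, char *buf, size_t buflen) {
    static const char open[] = "SendRequestSync(\"";
    assert(call && buf && buflen > 0);
    const char *p = strstr(call, open);
    if (!p) return 0;
    p += sizeof open - 1;
    const char *q = strchr(p, '"');
    if (!q || (size_t)(q - p) >= buflen) return 0;
    memcpy(buf, p, (size_t)(q - p));
    buf[q - p] = '\0';
    return 1;
}

/* Return the class a call routes to, or NULL if it cannot be resolved. */
static const char *route_target(const char *call,
                                const EnsembleItem *items, size_t n) {
    char target[128];
    if (!first_string_arg(call, target, sizeof target)) return NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(items[i].name, target) == 0) return items[i].class_name;
    }
    return NULL;
}
```

With a table containing an item named `Target` implemented by a (made-up) class `MyApp.TargetOperation`, the call `..SendRequestSync("Target", msg)` resolves to that class, while a call whose target is a variable rather than a literal stays unresolved.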
WorkMgr parallel queue dispatch
In .Queue("##class(X).method"), the call target is a string literal in source, so it is not visible as a call in the AST. We emit CALLS edges from these sites to the target method.
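A minimal sketch of how such a literal could be split into a class and a method follows. The function name `parse_queue_target` and its signature are hypothetical, and the sketch assumes the surrounding quotes have already been stripped from the literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split the contents of a Queue() string literal of the form
 * `##class(<Class>).<method>` into its class and method parts.
 * Returns 1 on success, 0 if the text has a different shape
 * or does not fit the output buffers. */
static int parse_queue_target(const char *s,
                              char *cls, size_t cls_len,
                              char *method, size_t method_len) {
    static const char prefix[] = "##class(";
    const size_t plen = sizeof prefix - 1;
    assert(s && cls && method);
    if (strncmp(s, prefix, plen) != 0) return 0;

    const char *c = s + plen;
    const char *close = strchr(c, ')');
    if (!close || close == c || close[1] != '.') return 0;

    const char *m = close + 2;
    size_t cn = (size_t)(close - c);
    size_t mn = strcspn(m, "(");   /* tolerate a trailing "(...)" */
    if (mn == 0 || cn >= cls_len || mn >= method_len) return 0;

    memcpy(cls, c, cn);
    cls[cn] = '\0';
    memcpy(method, m, mn);
    method[mn] = '\0';
    return 1;
}
```

For the literal in the text, `##class(X).method`, this produces the pair (`X`, `method`), which is enough to look up the target method node and attach a CALLS edge to it.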
Validation
Tested against a large real-world ObjectScript codebase (~1,200 classes, ~4,150 methods) for scalability and correctness:
| Node or edge type | Approx. count |
|---|---|
| Class | ~1,200 |
| Method | ~4,150 |
| XData | ~850 |
| Storage | ~320 |
| EnsembleItem | ~275 |
| Index | ~100 |
| CALLS edges | ~3,350 |
| ROUTES_TO edges | ~290 |
All existing CBM tests pass (zero regressions). New tests cover UDL class/method extraction, all four call dispatch patterns, Ensemble topology parsing, and macro expansion.
Implementation scope
Following the infra-pass pattern in CLAUDE.md:
- internal/cbm/grammar_objectscript_udl.c / grammar_objectscript_routine.c: grammar shims
- internal/cbm/vendored/grammars/objectscript_udl/ and objectscript_routine/: vendored tree-sitter grammars (MIT licensed, from intersystems/tree-sitter-objectscript)
- internal/cbm/lang_specs.c: CBM_LANG_OBJECTSCRIPT_UDL and CBM_LANG_OBJECTSCRIPT_ROUTINE entries
- internal/cbm/cbm.h: enum additions
- internal/cbm/extract_defs.c, extract_calls.c, extract_imports.c: ObjectScript extraction logic added alongside existing language handlers
- tests/test_extraction.c: ~30 new test cases
The oref self-call resolution (..Method()) and the macro expansion pass are implemented as targeted additions to handle_calls() — no structural changes to the pipeline.
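To make the shape of those two additions concrete, here is a small standalone sketch. It is not the code in handle_calls(): `MacroEntry`, `macro_lookup`, and `resolve_callee` are hypothetical names, the flat array only stands in for whatever CBMMacroTable actually looks like, and the sketch works on callee text rather than AST nodes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro table entry: name (without "$$$") and its expansion. */
typedef struct {
    const char *name;
    const char *expansion;
} MacroEntry;

/* Linear lookup; returns the expansion or NULL if the macro is unknown. */
static const char *macro_lookup(const MacroEntry *t, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(t[i].name, name) == 0) return t[i].expansion;
    }
    return NULL;
}

/* Rewrite a callee token into something the resolver can match:
 *   "..Method"  -> "<current_class>.Method"   (relative-dot self-call)
 *   "$$$Name"   -> the macro's expansion text
 * Returns 1 and fills buf on success, 0 if the token is neither form. */
static int resolve_callee(const char *tok, const char *current_class,
                          const MacroEntry *macros, size_t n,
                          char *buf, size_t buflen) {
    int w;
    assert(tok && current_class && buf && buflen > 0);
    if (strncmp(tok, "$$$", 3) == 0) {
        const char *exp = macro_lookup(macros, n, tok + 3);
        if (!exp) return 0;
        w = snprintf(buf, buflen, "%s", exp);
    } else if (strncmp(tok, "..", 2) == 0 && tok[2] != '\0') {
        w = snprintf(buf, buflen, "%s.%s", current_class, tok + 2);
    } else {
        return 0;
    }
    return w > 0 && (size_t)w < buflen;
}
```

Inside a class `MyApp.Patient`, the token `..Validate` becomes `MyApp.Patient.Validate`; a macro token is replaced by its expansion, which can then go through the same explicit-call resolution as any other ##class(...) call.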
Questions for maintainers
- Grammar vendoring: the objectscript_udl and objectscript_routine grammars are ~2.5MB of generated C each (comparable to existing vendored grammars). They come from intersystems/tree-sitter-objectscript (MIT licensed). Is there a preferred way to vendor these, or should they follow the existing pattern in internal/cbm/vendored/grammars/?
- EnsembleItem node label: this is domain-specific (IRIS Interoperability). Would you prefer a more generic label like ServiceComponent or WorkflowNode, with ensemble_item as a property?
- PR structure: given the size, would you prefer this as one large PR or split into (a) grammar + basic extraction, (b) CALLS resolution, (c) Ensemble topology?
Happy to open the PR when you give the green light, or to share any specific files/diffs for early review.
Proposed solution
Full ObjectScript language support, as described above, is implemented in a public fork of the CBM repo.