Skip to content

Apply Avocado stub renaming to client code (i.e., code under verification)#124

Open
jyoo980 wants to merge 53 commits intomainfrom
yoo/avocado-stubs-in-client-code
Open

Apply Avocado stub renaming to client code (i.e., code under verification)#124
jyoo980 wants to merge 53 commits intomainfrom
yoo/avocado-stubs-in-client-code

Conversation

@jyoo980
Copy link
Collaborator

@jyoo980 jyoo980 commented Mar 8, 2026

Merge after #123

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
verification/cbmc_stubs/pthread_lib.c (1)

85-85: ⚠️ Potential issue | 🔴 Critical

Inconsistent callback function references.

The callback functions pthread_mutex_cleanup and pthread_rwlock_cleanup were renamed to _avocado_pthread_mutex_cleanup (line 67) and _avocado_pthread_rwlock_cleanup (line 433), but their references in __CPROVER_cleanup calls were not updated:

  • Line 85: __CPROVER_cleanup(mutex, pthread_mutex_cleanup) should reference _avocado_pthread_mutex_cleanup
  • Line 450: __CPROVER_cleanup(lock, pthread_rwlock_cleanup) should reference _avocado_pthread_rwlock_cleanup

This will cause linker errors or incorrect behavior.

🐛 Proposed fix
-  __CPROVER_cleanup(mutex, pthread_mutex_cleanup);
+  __CPROVER_cleanup(mutex, _avocado_pthread_mutex_cleanup);
-  __CPROVER_cleanup(lock, pthread_rwlock_cleanup);
+  __CPROVER_cleanup(lock, _avocado_pthread_rwlock_cleanup);

Also applies to: 450-450

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/pthread_lib.c` at line 85, Update the
__CPROVER_cleanup calls to use the renamed cleanup functions: replace references
to pthread_mutex_cleanup with _avocado_pthread_mutex_cleanup for the mutex
cleanup and replace pthread_rwlock_cleanup with _avocado_pthread_rwlock_cleanup
for the rwlock cleanup so the cleanup callbacks match the actual function names
(_avocado_pthread_mutex_cleanup and _avocado_pthread_rwlock_cleanup) and avoid
linker/behavior errors.
verification/cbmc_stubs/unistd.c (1)

275-304: ⚠️ Potential issue | 🔴 Critical

Handle str_length == 0 before the end-of-buffer probe.

When error is true, str_length is only constrained to <= nbyte, allowing it to be zero. The code then evaluates buf + str_length - 1 (which underflows to buf - 1) and declares a zero-length VLA, both undefined behavior.

Suggested fix
     if(nbyte>0)
     {
       size_type str_length=__VERIFIER_nondet_size_type();
       __CPROVER_assume(error ? str_length<=nbyte : str_length==nbyte);
-      // check that the memory is accessible
-      (void)*(char *)buf;
-      (void)*(((const char *)buf) + str_length - 1);
-      char contents_nondet[str_length];
-      __CPROVER_array_replace((char*)buf, contents_nondet);
+      if(str_length > 0)
+      {
+        // check that the memory is accessible
+        (void)*(char *)buf;
+        (void)*(((const char *)buf) + str_length - 1);
+        char contents_nondet[str_length];
+        __CPROVER_array_replace((char*)buf, contents_nondet);
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/unistd.c` around lines 275 - 304, In _avocado_read,
guard the end-of-buffer probe and VLA creation against str_length==0: after
computing and assuming str_length, add a branch so that if(str_length>0) you
perform the end probe ((void)*(((const char*)buf)+str_length-1)) and declare the
VLA/array_replace for size str_length, otherwise skip the end probe and avoid
declaring a zero-length VLA (you may still touch (void)*(char*)buf as already
done); this prevents underflow when str_length==0 while keeping the existing
assumptions about nbyte, error, and str_length.
verification/cbmc_stubs/semaphore.c (1)

121-134: ⚠️ Potential issue | 🟠 Major

Populate *sval before returning success.

_avocado_sem_getvalue returns 0 but leaves the out-parameter untouched. Callers that branch on success can then read an unconstrained value instead of a modeled semaphore count.

Suggested fix
 int _avocado_sem_getvalue(sem_t *sem, int *sval)
 {
   __CPROVER_HIDE:;
   (void)sem;
-  (void)sval;
+  if(sval != 0)
+    *sval = __VERIFIER_nondet_int();

Add one forward declaration near the top of the file:

int __VERIFIER_nondet_int(void);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/semaphore.c` around lines 121 - 134, The function
_avocado_sem_getvalue returns success but never sets the out-parameter, so
populate *sval with a nondeterministic integer before returning 0; add a forward
declaration for int __VERIFIER_nondet_int(void) near the top of the file if not
already present and then in _avocado_sem_getvalue assign *sval =
__VERIFIER_nondet_int() (optionally constrain it to non-negative if desired)
prior to the return.
verification/cbmc_stubs/stdio.c (2)

121-126: ⚠️ Potential issue | 🔴 Critical

Missing rename: fclose_cleanup should be _avocado_fclose_cleanup.

The __CPROVER_cleanup call passes the old function name fclose_cleanup, but the function was renamed to _avocado_fclose_cleanup at line 52. This will cause the cleanup callback to reference a non-existent function.

🐛 Proposed fix
-  __CPROVER_cleanup(fopen_result, fclose_cleanup);
+  __CPROVER_cleanup(fopen_result, _avocado_fclose_cleanup);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/stdio.c` around lines 121 - 126, The
__CPROVER_cleanup call is referencing the old function name fclose_cleanup which
was renamed; update the cleanup registration to use the new function name
_avocado_fclose_cleanup so the callback matches the actual cleanup function
(change __CPROVER_cleanup(fopen_result, fclose_cleanup) to
__CPROVER_cleanup(fopen_result, _avocado_fclose_cleanup)), keeping the
surrounding logic that sets fopen_result (fopen_error, _avocado_malloc, FILE)
intact.

172-175: ⚠️ Potential issue | 🔴 Critical

Missing rename: fclose_cleanup should be _avocado_fclose_cleanup.

Same issue as in _avocado__fopen - the cleanup callback references the old function name.

🐛 Proposed fix
-  __CPROVER_cleanup(fopen_result, fclose_cleanup);
+  __CPROVER_cleanup(fopen_result, _avocado_fclose_cleanup);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/stdio.c` around lines 172 - 175, The cleanup callback
reference is using the old name fclose_cleanup; update the __CPROVER_cleanup
call to use the renamed function _avocado_fclose_cleanup so the cleanup matches
the actual implementation (affecting the block that sets
__CPROVER_set_must(fopen_result, "open") and the subsequent
__CPROVER_cleanup(fopen_result, ...)). Ensure the symbol used in
__CPROVER_cleanup exactly matches _avocado_fclose_cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/data/avocado_stub/test_renaming_no_existing_avocado_names.c`:
- Around line 1-2: Add the missing `#include` <string.h> to the include block at
the top of test_renaming_no_existing_avocado_names.c so calls to strchr are
resolved; update the header list alongside <stdlib.h> and <ctype.h> to ensure
the Avocado renaming tool (and the C compiler) can find the strchr declaration
when parsing or compiling the test.

In `@verification/avocado_stub_util.py`:
- Around line 64-82: The current rename_ansi_identifiers_to_avocado_identifiers
routine performs blind regex replacements which touches declarations, macros,
comments and string literals; change it to use a parsed AST/token-driven
approach so only actual identifier/call sites are rewritten. Specifically,
replace the global re.sub loop over self._ansi_id_to_rename_data in
rename_ansi_identifiers_to_avocado_identifiers with logic that (a) parses
file_content into tokens or an AST (e.g., Python's ast/tokenize or a C parser
used elsewhere), (b) walks nodes to locate identifier and call-expression nodes
only, (c) when an identifier node matches any original_identifier from
self._ansi_id_to_rename_data, emit the renamed_identifier for that token/node
while leaving comments, strings and declarations intact, and (d) reconstruct the
source from the modified tokens/AST so only real call/identifier occurrences are
changed. Ensure you reference rename_metadata.renamed_identifier and
original_identifier during the node check so the mapping remains the same.
- Around line 55-62: The constructor AvocadoIdentifierRenamer.__init__ currently
uses pickle.load on DEFAULT_STUB_MAPPINGS which risks code execution; replace
pickle serialization with safe JSON: change DEFAULT_STUB_MAPPINGS to a .json
filename, modify generate_avocado_stubs.py to write JSON (serialize Path objects
as strings and only include the string fields of RenameData), and change
AvocadoIdentifierRenamer.__init__ to use
json.load(path_to_stub_mappings.open("r")) and reconstruct RenameData objects
(or cast dicts) from the loaded JSON, ensuring any Path fields are re-created
with Path(...) and types match the original RenameData shape.

In `@verification/cbmc_stubs/inet.c`:
- Around line 195-200: The declarations for the renamed byte-swap helpers
(_avocado___builtin_bswap16 and _avocado___builtin_bswap32) are left without
definitions causing unresolved externals; either implement those two functions
(e.g., provide definitions for _avocado___builtin_bswap16 and
_avocado___builtin_bswap32 that perform 16-/32-bit byte swaps) or revert the
hton/ntoh wrappers (e.g., _avocado_htonl, _avocado_htons, _avocado_ntohl,
_avocado_ntohs) to call the compiler builtins (__builtin_bswap16 and
__builtin_bswap32) directly so no external symbols are required. Ensure you
update all references to the renamed symbols consistently.

In `@verification/cbmc_stubs/math.c`:
- Around line 2122-2125: The long-double functions are calling double-only
classifiers which misclassify large finite long double values; replace the
double variants with the long-double variants: in _avocado_copysignl replace
_avocado_signbit(y) with _avocado_signbitl(y); in _avocado_fmaxl and
_avocado_fminl replace _avocado_isnan(g) with _avocado_isnanl(g); and in
_avocado___builtin_powil replace _avocado_isinf(x) and _avocado_isnan(x) with
_avocado_isinfl(x) and _avocado_isnanl(x) respectively so the classifiers
operate on long double inputs.

In `@verification/cbmc_stubs/pthread_lib.c`:
- Around line 301-302: The C keyword "void" was incorrectly renamed to
"_avocado_void" in several thread-local function-pointer declarations; locate
the declarations for __CPROVER_thread_key_dtors (and any other occurrences of
"_avocado_void" in pthread_lib.c, e.g., the thread-local blocks around those
declarations) and restore the correct keyword by replacing "_avocado_void" with
"void" so the signatures read like: __CPROVER_thread_local void
(*__CPROVER_thread_key_dtors[...])(void *); ensuring the function-pointer types
and parameter lists remain unchanged.
- Line 662: The local variable was mistakenly renamed to
_avocado_thread_key_dtors; revert this declaration to use the original local
name thread_key_dtors and keep the initializer __CPROVER_thread_key_dtors as-is,
i.e. replace _avocado_thread_key_dtors with thread_key_dtors in the
pthread_lib.c declaration, and update the rename script so it only targets
function names/calls (not local variable declarations) to prevent future
replacements of symbols like thread_key_dtors.

In `@verification/cbmc_stubs/semaphore.c`:
- Line 5: Rename all externally linked functions and symbols that start with the
reserved "_avocado_" prefix to a non-reserved prefix (for example "avocado_" or
another project-specific prefix) in semaphore.c (e.g., change _avocado_sem_init
to avocado_sem_init) and update all corresponding declarations/usages and any
generated stubs that reference those symbols; regenerate the stubs so the new
non-underscored prefix is used consistently to avoid ISO C reserved-identifier
collisions.

In `@verification/cbmc_stubs/threads.c`:
- Around line 10-12: Several thread stub functions (e.g., _avocado_thrd_create)
have non-void signatures but empty bodies; update each such stub to either
return a deterministic value of the correct type or mark the API as unsupported
using __CPROVER_assume(0) like _avocado_thrd_exit does. For each named stub
(including _avocado_thrd_create and the other renamed non-void thrd_* stubs
referenced in the comment), add a single return with an appropriate
constant/status value for the return type or replace the body with
__CPROVER_assume(0) to indicate unsupported behavior, ensuring no non-void
function ends without an explicit return.

In `@verification/cbmc_verification_client.py`:
- Around line 150-154: The command string construction that uses
replace_call_with_contract_args can produce an extra leading space when
replace_call_with_contract_args is empty; update the construction in
verification/cbmc_verification_client.py (where replace_call_with_contract_args
and function_name are used to build the goto-instrument command) to avoid the
stray space — e.g., build the command from a list of parts and join with spaces
or conditionally include replace_call_with_contract_args, or simply strip() the
final string so the command has no leading space before --enforce-contract.
- Around line 134-143: The command-building for headers currently builds
AVOCADO_STUB_DIR-prefixed header paths even when
verification_input.get_headers() is empty, producing extra spaces; change the
header handling to mirror the replace_call_with_contract_args pattern by setting
header_args to an empty string when verification_input.get_headers() is falsy
and otherwise to a space-joined string of f"{AVOCADO_STUB_DIR}/{header_name}"
for each header_name, then use header_args in the goto-cc command construction
alongside function_name and path_to_file_to_verify so the generated command has
no extra whitespace when there are no headers.

In `@verification/verification_input.py`:
- Around line 8-9: The HEADER_IMPORT_PATTERN is too permissive (uses greedy .*),
so replace it with a stricter regex that anchors the line start, allows optional
whitespace, matches the literal include token, and requires properly balanced
angle brackets (e.g. use a pattern like ^\s*#\s*include\s*\<[^>]+\> to capture
the header name); update HEADER_IMPORT_PATTERN accordingly (and keep
ANGLE_DELIMITERS as needed or remove if redundant), and if you rely on
multi-line input ensure you compile or use re.MULTILINE when matching.

---

Outside diff comments:
In `@verification/cbmc_stubs/pthread_lib.c`:
- Line 85: Update the __CPROVER_cleanup calls to use the renamed cleanup
functions: replace references to pthread_mutex_cleanup with
_avocado_pthread_mutex_cleanup for the mutex cleanup and replace
pthread_rwlock_cleanup with _avocado_pthread_rwlock_cleanup for the rwlock
cleanup so the cleanup callbacks match the actual function names
(_avocado_pthread_mutex_cleanup and _avocado_pthread_rwlock_cleanup) and avoid
linker/behavior errors.

In `@verification/cbmc_stubs/semaphore.c`:
- Around line 121-134: The function _avocado_sem_getvalue returns success but
never sets the out-parameter, so populate *sval with a nondeterministic integer
before returning 0; add a forward declaration for int
__VERIFIER_nondet_int(void) near the top of the file if not already present and
then in _avocado_sem_getvalue assign *sval = __VERIFIER_nondet_int() (optionally
constrain it to non-negative if desired) prior to the return.

In `@verification/cbmc_stubs/stdio.c`:
- Around line 121-126: The __CPROVER_cleanup call is referencing the old
function name fclose_cleanup which was renamed; update the cleanup registration
to use the new function name _avocado_fclose_cleanup so the callback matches the
actual cleanup function (change __CPROVER_cleanup(fopen_result, fclose_cleanup)
to __CPROVER_cleanup(fopen_result, _avocado_fclose_cleanup)), keeping the
surrounding logic that sets fopen_result (fopen_error, _avocado_malloc, FILE)
intact.
- Around line 172-175: The cleanup callback reference is using the old name
fclose_cleanup; update the __CPROVER_cleanup call to use the renamed function
_avocado_fclose_cleanup so the cleanup matches the actual implementation
(affecting the block that sets __CPROVER_set_must(fopen_result, "open") and the
subsequent __CPROVER_cleanup(fopen_result, ...)). Ensure the symbol used in
__CPROVER_cleanup exactly matches _avocado_fclose_cleanup.

In `@verification/cbmc_stubs/unistd.c`:
- Around line 275-304: In _avocado_read, guard the end-of-buffer probe and VLA
creation against str_length==0: after computing and assuming str_length, add a
branch so that if(str_length>0) you perform the end probe ((void)*(((const
char*)buf)+str_length-1)) and declare the VLA/array_replace for size str_length,
otherwise skip the end probe and avoid declaring a zero-length VLA (you may
still touch (void)*(char*)buf as already done); this prevents underflow when
str_length==0 while keeping the existing assumptions about nbyte, error, and
str_length.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5c98f7f8-00a1-4dff-9bcc-c92f66048b06

📥 Commits

Reviewing files that changed from the base of the PR and between f577a5f and 0dcebb4.

⛔ Files ignored due to path filters (1)
  • verification/cbmc_stubs/c_stub_rename_map.pkl is excluded by !**/*.pkl
📒 Files selected for processing (45)
  • generate_avocado_stubs.py
  • run.sh
  • test/data/avocado_stub/test_header_detection.c
  • test/data/avocado_stub/test_renaming_existing_avocado_names.c
  • test/data/avocado_stub/test_renaming_no_existing_avocado_names.c
  • test/verification/test_avocado_stub_util.py
  • test/verification/test_verification_input.py
  • util/__init__.py
  • util/tree_sitter_util.py
  • verification/__init__.py
  • verification/avocado_stub_util.py
  • verification/cbmc_stubs/cprover_contracts.c
  • verification/cbmc_stubs/ctype.c
  • verification/cbmc_stubs/err.c
  • verification/cbmc_stubs/errno.c
  • verification/cbmc_stubs/fcntl.c
  • verification/cbmc_stubs/fenv.c
  • verification/cbmc_stubs/float.c
  • verification/cbmc_stubs/gcc.c
  • verification/cbmc_stubs/getopt.c
  • verification/cbmc_stubs/inet.c
  • verification/cbmc_stubs/intrin.c
  • verification/cbmc_stubs/locale.c
  • verification/cbmc_stubs/math.c
  • verification/cbmc_stubs/mman.c
  • verification/cbmc_stubs/netdb.c
  • verification/cbmc_stubs/process.c
  • verification/cbmc_stubs/pthread_lib.c
  • verification/cbmc_stubs/pwd.c
  • verification/cbmc_stubs/random.c
  • verification/cbmc_stubs/semaphore.c
  • verification/cbmc_stubs/setjmp.c
  • verification/cbmc_stubs/signal.c
  • verification/cbmc_stubs/stdio.c
  • verification/cbmc_stubs/stdlib.c
  • verification/cbmc_stubs/string.c
  • verification/cbmc_stubs/strings.c
  • verification/cbmc_stubs/syslog.c
  • verification/cbmc_stubs/threads.c
  • verification/cbmc_stubs/time.c
  • verification/cbmc_stubs/unistd.c
  • verification/cbmc_stubs/windows.c
  • verification/cbmc_stubs/x86_assembler.c
  • verification/cbmc_verification_client.py
  • verification/verification_input.py

#include <semaphore.h>

int sem_init(sem_t *sem, int pshared, unsigned int value)
int _avocado_sem_init(sem_t *sem, int pshared, unsigned int value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In ISO C, are file-scope or externally linked identifiers that begin with a single underscore reserved for the implementation?

💡 Result:

Yes.

ISO C reserves all identifiers that begin with a single underscore (_) for use as identifiers with file scope (in both the ordinary and tag name spaces). Since any identifier with external linkage necessarily has file scope, an externally linked name like _foo is also reserved. [1]

(Separately, names beginning with __ or with _ followed by an uppercase letter are reserved for any use, not just file scope.) [1]

Sources: [1], [2]


🏁 Script executed:

# Check the actual declarations and their linkage in semaphore.c
cat -n verification/cbmc_stubs/semaphore.c | head -150

Repository: vikramnitin9/rust_verification

Length of output: 4484


Use a non-reserved prefix for the generated Avocado symbols.

All externally linked functions in this file start with _avocado_, which violates ISO C identifier-reservation rules. In ISO C, identifiers beginning with a single underscore are reserved for the implementation in file scope (and external linkage is within file scope). Switching to a non-reserved prefix prevents collisions with compiler and libc internals. Regenerate the stubs with the updated naming scheme.

🧰 Tools
🪛 Clang (14.0.6)

[warning] 5-5: declaration uses identifier '_avocado_sem_init', which is reserved in the global namespace

(bugprone-reserved-identifier)


[warning] 5-5: 2 adjacent parameters of '_avocado_sem_init' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters)


[note] 5-5: the first parameter in the range is 'pshared'

(clang)


[note] 5-5: the last parameter in the range is 'value'

(clang)


[note] 5-5: 'int' and 'unsigned int' may be implicitly converted

(clang)

🪛 Cppcheck (2.19.0)

[style] 5-5: The function '_avocado_sem_init' is never used.

(unusedFunction)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_stubs/semaphore.c` at line 5, Rename all externally linked
functions and symbols that start with the reserved "_avocado_" prefix to a
non-reserved prefix (for example "avocado_" or another project-specific prefix)
in semaphore.c (e.g., change _avocado_sem_init to avocado_sem_init) and update
all corresponding declarations/usages and any generated stubs that reference
those symbols; regenerate the stubs so the new non-underscored prefix is used
consistently to avoid ISO C reserved-identifier collisions.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
verification/avocado_stub_util.py (2)

94-103: ⚠️ Potential issue | 🔴 Critical

Replace the repository-controlled pickle with a data-only format.

AvocadoIdentifierRenamer() eagerly deserializes a checked-in .pkl during verification. On untrusted PRs, replacing that file is enough to execute arbitrary Python before any verification happens. Please switch the generated rename map to JSON (or another inert format) and validate the decoded schema before constructing RenameData. Ruff already flags this as S301.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 94 - 103, The constructor of
AvocadoIdentifierRenamer currently uses pickle (pkl.load) to deserialize a
repository-controlled .pkl (DEFAULT_STUB_MAPPINGS) which is unsafe; replace the
pickle path by reading an inert data format (e.g., JSON) from
DEFAULT_STUB_MAPPINGS (change extension and loading to json.load) in __init__,
then explicitly validate the decoded structure/schema (keys, value shapes,
types) before constructing or casting into RenameData entries and assigning
_ansi_id_to_rename_data; remove any use of pkl.load and ensure errors raise a
clear ValueError if schema validation fails rather than executing arbitrary code
during deserialization.

116-127: ⚠️ Potential issue | 🟠 Major

Restrict renaming to parsed identifier/call nodes.

This pass rewrites every textual \bname\b match in the file, so it will also rename comments, string literals, macro bodies, and user declarations. That is broader than stub-call rewriting and can silently change client semantics. Use the C AST/token spans to rewrite only actual identifier/call sites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 116 - 127, The current loop
in which you perform regex replacement using name_to_replace_pattern and re.subn
rewrites every textual match (comments, strings, macros) instead of limiting to
actual parsed identifier or call sites; modify the logic to use the C AST/token
spans rather than regex: for each original_identifier in
self._ansi_id_to_rename_data iterate the parsed tokens/AST nodes for that file
and collect only token spans whose type is an identifier or call-site (not
comments/strings/macro bodies), then perform replacements by applying
rename_metadata.renamed_identifier to those spans (apply edits in reverse order
to avoid offset shifts) and add rename_metadata.file_path to
avocado_stub_files_to_use only when at least one AST/token replacement occurred;
return the same RenameResult(file_content_with_renaming,
list(avocado_stub_files_to_use)).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@verification/cbmc_verification_client.py`:
- Around line 72-80: The rewritten translation unit is losing non-default
includes because include_header_lines is built only from
rename_result.get_headers() and
CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION; update the include
collection to also incorporate headers from the original VerificationInput
(vinput.get_headers() or vinput.headers) so that
file_util.ensure_lines_at_beginning(...) prepends the union of vinput headers,
rename_result.get_headers(), and DEFAULT_HEADERS_FOR_VERIFICATION (preserving
order/uniqueness) before calling _get_cbmc_verification_command.
- Around line 79-83: The subprocess invocation currently passes a shell string
from _get_cbmc_verification_command and runs with shell=True, which is unsafe
given untrusted stub paths from rename_result.get_stub_file_paths(); refactor
_get_cbmc_verification_command to build and return a list (or list of lists if
multiple commands) of argv-style arguments (including each stub path as a
separate element) instead of a single shell string, then change the call site
where result = subprocess.run(...) to call subprocess.run(cmd_list,
capture_output=True, text=True, shell=False) (or iterate and run each list if
multiple commands), ensuring all path interpolations are added as separate list
elements so no shell interpretation occurs.

---

Duplicate comments:
In `@verification/avocado_stub_util.py`:
- Around line 94-103: The constructor of AvocadoIdentifierRenamer currently uses
pickle (pkl.load) to deserialize a repository-controlled .pkl
(DEFAULT_STUB_MAPPINGS) which is unsafe; replace the pickle path by reading an
inert data format (e.g., JSON) from DEFAULT_STUB_MAPPINGS (change extension and
loading to json.load) in __init__, then explicitly validate the decoded
structure/schema (keys, value shapes, types) before constructing or casting into
RenameData entries and assigning _ansi_id_to_rename_data; remove any use of
pkl.load and ensure errors raise a clear ValueError if schema validation fails
rather than executing arbitrary code during deserialization.
- Around line 116-127: The current loop in which you perform regex replacement
using name_to_replace_pattern and re.subn rewrites every textual match
(comments, strings, macros) instead of limiting to actual parsed identifier or
call sites; modify the logic to use the C AST/token spans rather than regex: for
each original_identifier in self._ansi_id_to_rename_data iterate the parsed
tokens/AST nodes for that file and collect only token spans whose type is an
identifier or call-site (not comments/strings/macro bodies), then perform
replacements by applying rename_metadata.renamed_identifier to those spans
(apply edits in reverse order to avoid offset shifts) and add
rename_metadata.file_path to avocado_stub_files_to_use only when at least one
AST/token replacement occurred; return the same
RenameResult(file_content_with_renaming, list(avocado_stub_files_to_use)).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 95aae3cc-82b1-400b-996c-4ffac665ab9b

📥 Commits

Reviewing files that changed from the base of the PR and between 0dcebb4 and dffef74.

📒 Files selected for processing (2)
  • verification/avocado_stub_util.py
  • verification/cbmc_verification_client.py

Comment on lines +72 to +80
include_header_lines = [
f"#include <{header}>"
for header in CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION
for header in rename_result.get_headers()
+ list(CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION)
]
file_util.ensure_lines_at_beginning(default_header_include_lines, file)
file_util.ensure_lines_at_beginning(include_header_lines, file)
try:
vcommand = self._get_cbmc_verification_command(
vinput, path_to_file_to_verify=file
vinput, rename_result=rename_result, path_to_file_to_verify=file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Preserve headers from VerificationInput when building the rewritten translation unit.

This now prepends only Avocado-derived headers plus DEFAULT_HEADERS_FOR_VERIFICATION. Any non-default include carried by vinput.get_headers() disappears from both the temp source and the goto-cc inputs, so verification will start failing for files that depend on additional headers.

💡 Suggested fix
-                include_header_lines = [
-                    f"#include <{header}>"
-                    for header in rename_result.get_headers()
-                    + list(CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION)
-                ]
+                include_header_lines = [
+                    f"#include <{header}>"
+                    for header in dict.fromkeys(
+                        vinput.get_headers()
+                        + rename_result.get_headers()
+                        + list(CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION)
+                    )
+                ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_verification_client.py` around lines 72 - 80, The rewritten
translation unit is losing non-default includes because include_header_lines is
built only from rename_result.get_headers() and
CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION; update the include
collection to also incorporate headers from the original VerificationInput
(vinput.get_headers() or vinput.headers) so that
file_util.ensure_lines_at_beginning(...) prepends the union of vinput headers,
rename_result.get_headers(), and DEFAULT_HEADERS_FOR_VERIFICATION (preserving
order/uniqueness) before calling _get_cbmc_verification_command.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/verification/test_avocado_stub_util.py`:
- Around line 26-44: Update the test_apply_stub_renaming test to also assert on
the RenameResult.avocado_stub_files_to_use list: call
avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(content_pre_renaming),
capture the returned RenameResult, and add an assertion that
RenameResult.avocado_stub_files_to_use contains the expected stub filenames
corresponding to renamed identifiers (e.g., entries for isspace and strchr).
Keep the existing src_after_renaming assertion but use the same RenameResult
instance to verify both src_after_renaming and avocado_stub_files_to_use for
completeness.
- Around line 47-55: Update the test data used by
test_apply_stub_renaming_existing_avocado_name so it actually contains
already-renamed identifiers (prefix them with the underscore), e.g. change
occurrences of avocado_isspace and avocado_strchr to _avocado_isspace and
_avocado_strchr in
test/data/avocado_stub/test_renaming_existing_avocado_names.c; then run the
existing assertion in test_apply_stub_renaming_existing_avocado_name which calls
avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(...) and expects
.src_after_renaming to equal the input, confirming idempotency for functions
already using the _avocado_ prefix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5dacd37-64d6-4a2a-8262-6f6116c54231

📥 Commits

Reviewing files that changed from the base of the PR and between dffef74 and 87415ff.

📒 Files selected for processing (1)
  • test/verification/test_avocado_stub_util.py

Comment on lines +26 to +44
def test_apply_stub_renaming() -> None:
content_pre_renaming = _read_file_content(
"test/data/avocado_stub/test_renaming_no_existing_avocado_names.c"
)
expected_content_post_renaming = """#include <stdlib.h>
#include <ctype.h>

int is_separator(int c)
{
return c == '\\0' ||
_avocado_isspace(c) ||
_avocado_strchr(",.()+-/*=~%[];",c) != NULL;
}"""
assert (
avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(
content_pre_renaming
).src_after_renaming
== expected_content_post_renaming
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider verifying avocado_stub_files_to_use from the RenameResult.

The test only asserts on src_after_renaming but RenameResult also returns the list of stub files that should be included. Verifying this would ensure the renamer correctly tracks which stub files are needed when identifiers like isspace and strchr are renamed.

♻️ Proposed enhancement
 def test_apply_stub_renaming() -> None:
     content_pre_renaming = _read_file_content(
         "test/data/avocado_stub/test_renaming_no_existing_avocado_names.c"
     )
     expected_content_post_renaming = """#include <stdlib.h>
 `#include` <ctype.h>

 int is_separator(int c)
 {
     return c == '\\0' ||
         _avocado_isspace(c) ||
         _avocado_strchr(",.()+-/*=~%[];",c) != NULL;
 }"""
+    rename_result = avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(
+        content_pre_renaming
+    )
     assert (
-        avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(
-            content_pre_renaming
-        ).src_after_renaming
+        rename_result.src_after_renaming
         == expected_content_post_renaming
     )
+    assert len(rename_result.avocado_stub_files_to_use) > 0, (
+        "Expected stub files to be tracked for renamed identifiers"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/verification/test_avocado_stub_util.py` around lines 26 - 44, Update the
test_apply_stub_renaming test to also assert on the
RenameResult.avocado_stub_files_to_use list: call
avocado_renamer.rename_ansi_identifiers_to_avocado_identifiers(content_pre_renaming),
capture the returned RenameResult, and add an assertion that
RenameResult.avocado_stub_files_to_use contains the expected stub filenames
corresponding to renamed identifiers (e.g., entries for isspace and strchr).
Keep the existing src_after_renaming assertion but use the same RenameResult
instance to verify both src_after_renaming and avocado_stub_files_to_use for
completeness.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
verification/avocado_stub_util.py (2)

101-108: ⚠️ Potential issue | 🔴 Critical

Replace pickle with JSON to prevent potential code execution via repository-controlled input.

The pickle file is loaded automatically when AvocadoIdentifierRenamer() is instantiated. Since this file lives in the repository, an untrusted PR could replace c_stub_rename_map.pkl with a malicious payload to achieve code execution. The RenameData dataclass contains only JSON-serializable types (str, IdentifierNodeParentType enum, Path), so migrating to JSON is straightforward.

🔒 Proposed migration to JSON

In generate_avocado_stubs.py, change serialization:

-import pickle as pkl
+import json
...
-    try:
-        with Path(f"{rename_map_dir}/{rename_map_pkl}").open("wb") as f:
-            pkl.dump(all_renamed_functions, f)
+    rename_map_json = "c_stub_rename_map.json"
+    try:
+        # Convert RenameData to serializable dict
+        serializable = {
+            k: [{"renamed_identifier": d.renamed_identifier, 
+                 "parent_node_type": d.parent_node_type.value,
+                 "file_path": str(d.file_path)} for d in v]
+            for k, v in all_renamed_functions.items()
+        }
+        with Path(f"{rename_map_dir}/{rename_map_json}").open("w") as f:
+            json.dump(serializable, f, indent=2)

In avocado_stub_util.py, change deserialization:

-import pickle as pkl
+import json
...
-    DEFAULT_STUB_MAPPINGS = f"{AVOCADO_STUB_DIR}/c_stub_rename_map.pkl"
+    DEFAULT_STUB_MAPPINGS = f"{AVOCADO_STUB_DIR}/c_stub_rename_map.json"

     def __init__(self, path_to_stub_mappings: Path = Path(DEFAULT_STUB_MAPPINGS)) -> None:
         try:
-            data = pkl.load(path_to_stub_mappings.open(mode="rb"))
-            self._ansi_id_to_rename_data = cast("dict[str, list[RenameData]]", data)
+            with path_to_stub_mappings.open(mode="r") as f:
+                raw = json.load(f)
+            self._ansi_id_to_rename_data = {
+                k: [RenameData(
+                    renamed_identifier=d["renamed_identifier"],
+                    parent_node_type=IdentifierNodeParentType(d["parent_node_type"]),
+                    file_path=Path(d["file_path"])
+                ) for d in v]
+                for k, v in raw.items()
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 101 - 108, The constructor
AvocadoIdentifierRenamer.__init__ currently uses pickle to load
DEFAULT_STUB_MAPPINGS which allows execution of arbitrary code from
repository-controlled files; replace pickle usage with JSON: update the
generator (generate_avocado_stubs.py) to write a JSON file (serialize RenameData
fields to plain types/strings, converting Path to str and enum to its
name/value), then change verification/avocado_stub_util.py to open and parse
that JSON, reconstructing RenameData instances (convert enum names back to
IdentifierNodeParentType and paths back to Path) and assign to
self._ansi_id_to_rename_data; ensure error handling remains but no longer uses
pickle.

110-136: ⚠️ Potential issue | 🟠 Major

Regex-based renaming affects comments, strings, and macro bodies.

The re.subn with word boundary pattern (\b) will rename identifiers everywhere in the source, including comments, string literals, and macro definitions. While this may be acceptable for the CBMC stub files themselves (where comments/strings mentioning function names are rare), it could cause issues when applied to client code under verification.

For example:

  • Comment: // call isspace here// call _avocado_isspace here
  • String: "isspace""_avocado_isspace"

Consider using AST-based renaming (similar to generate_avocado_stubs.py) for more precise replacements, or document that this behavior is intentional for the verification use case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 110 - 136, The current
regex-based renaming in rename_ansi_identifiers_to_avocado_identifiers blindly
replaces word-boundary matches (via re.subn) and thus touches comments, string
literals and macro bodies; update this to perform token/AST-aware renaming
instead: parse the C source into tokens or an AST (reuse the approach from
generate_avocado_stubs.py or a C tokenizer) and only replace identifier tokens
that are actual identifiers (not in comments, strings or macro expansions),
using get_renamed_identifier(original_identifier, rename_metadata) to compute
the replacement and get_rename_data_for_file_containing_definition(...) to
collect stub files, then return the same
RenameResult(file_content_with_renaming, list(...)) once replacements are
applied; ensure RenameResult and function signatures remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@util/tree_sitter_util.py`:
- Around line 81-86: Replace the raw assertion that a call_expression has a
function child with a defensive check: in the call handling block for node.type
== "call_expression" (where you currently reference function_node and append to
result with IdentifierNodeParentType.CALL_EXPRESSION), first verify
function_node is not None and if it is None simply continue (or log a debug
message) instead of asserting; then keep the existing behavior of only appending
when function_node.type == "identifier" so non-identifier function nodes are
skipped gracefully.

In `@verification/avocado_stub_util.py`:
- Around line 59-85: Fix the typo in the RenameData docstring: change the
incorrect type name "IdentifierNodeParentTYpe" to the correct
"IdentifierNodeParentType" in the Attributes section to match the actual
annotated field and avoid confusion when reading the docs; update the docstring
near the RenameData class definition so it references IdentifierNodeParentType
for parent_node_type.
- Around line 174-203: In get_rename_data_for_file_containing_definition, add a
guard after computing rename_data_for_definition to check if it's empty and
raise a descriptive ValueError (mentioning original_identifier and that no
FUNCTION_DEFINITION was found) instead of letting an IndexError occur; keep the
existing duplicate-check that raises when len(...) > 1 and return
rename_data_for_definition[0] only after handling the empty case.

---

Duplicate comments:
In `@verification/avocado_stub_util.py`:
- Around line 101-108: The constructor AvocadoIdentifierRenamer.__init__
currently uses pickle to load DEFAULT_STUB_MAPPINGS which allows execution of
arbitrary code from repository-controlled files; replace pickle usage with JSON:
update the generator (generate_avocado_stubs.py) to write a JSON file (serialize
RenameData fields to plain types/strings, converting Path to str and enum to its
name/value), then change verification/avocado_stub_util.py to open and parse
that JSON, reconstructing RenameData instances (convert enum names back to
IdentifierNodeParentType and paths back to Path) and assign to
self._ansi_id_to_rename_data; ensure error handling remains but no longer uses
pickle.
- Around line 110-136: The current regex-based renaming in
rename_ansi_identifiers_to_avocado_identifiers blindly replaces word-boundary
matches (via re.subn) and thus touches comments, string literals and macro
bodies; update this to perform token/AST-aware renaming instead: parse the C
source into tokens or an AST (reuse the approach from generate_avocado_stubs.py
or a C tokenizer) and only replace identifier tokens that are actual identifiers
(not in comments, strings or macro expansions), using
get_renamed_identifier(original_identifier, rename_metadata) to compute the
replacement and get_rename_data_for_file_containing_definition(...) to collect
stub files, then return the same RenameResult(file_content_with_renaming,
list(...)) once replacements are applied; ensure RenameResult and function
signatures remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7bdab641-4012-4425-9a8a-7dd31b417462

📥 Commits

Reviewing files that changed from the base of the PR and between fe291a0 and c751537.

⛔ Files ignored due to path filters (1)
  • verification/cbmc_stubs/c_stub_rename_map.pkl is excluded by !**/*.pkl
📒 Files selected for processing (4)
  • generate_avocado_stubs.py
  • test/verification/test_avocado_stub_util.py
  • util/tree_sitter_util.py
  • verification/avocado_stub_util.py

Comment on lines +81 to +86
if node.type == "call_expression":
function_node = node.child_by_field_name("function")
assert function_node, "Expected a function node under a call_expression node"
if function_node.type == "identifier":
assert function_node.text, "Expected an identifier node to have text"
result.append((function_node, IdentifierNodeParentType.CALL_EXPRESSION))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider handling non-identifier function nodes gracefully.

The current implementation silently skips call expressions where function_node.type != "identifier" (e.g., function pointer calls like (*fn_ptr)(args)). This is correct for the ANSI-C stub renaming use case, but the assertion on line 83 could fail if the tree-sitter grammar produces a call_expression without a function field in malformed input. A defensive check would be safer.

♻️ Optional defensive check
         if node.type == "call_expression":
             function_node = node.child_by_field_name("function")
-            assert function_node, "Expected a function node under a call_expression node"
+            if function_node is None:
+                # Malformed AST; skip this node
+                for child in node.children:
+                    traverse(child)
+                return
             if function_node.type == "identifier":
                 assert function_node.text, "Expected an identifier node to have text"
                 result.append((function_node, IdentifierNodeParentType.CALL_EXPRESSION))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if node.type == "call_expression":
function_node = node.child_by_field_name("function")
assert function_node, "Expected a function node under a call_expression node"
if function_node.type == "identifier":
assert function_node.text, "Expected an identifier node to have text"
result.append((function_node, IdentifierNodeParentType.CALL_EXPRESSION))
if node.type == "call_expression":
function_node = node.child_by_field_name("function")
if function_node is None:
# Malformed AST; skip this node
for child in node.children:
traverse(child)
return
if function_node.type == "identifier":
assert function_node.text, "Expected an identifier node to have text"
result.append((function_node, IdentifierNodeParentType.CALL_EXPRESSION))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util/tree_sitter_util.py` around lines 81 - 86, Replace the raw assertion
that a call_expression has a function child with a defensive check: in the call
handling block for node.type == "call_expression" (where you currently reference
function_node and append to result with
IdentifierNodeParentType.CALL_EXPRESSION), first verify function_node is not
None and if it is None simply continue (or log a debug message) instead of
asserting; then keep the existing behavior of only appending when
function_node.type == "identifier" so non-identifier function nodes are skipped
gracefully.

Comment on lines +59 to +85
@dataclass(frozen=True)
class RenameData:
"""Data for renaming functions in libraries defined in the ANSI-C standard to Avocado stubs.

The CBMC verifier includes its own copy of stub functions in libraries defined in the ANSI-C
standard that it uses for verifying client code that makes use them. This can lead to
issues related to internal C functions that are used by its copy of stub functions:

https://github.com/diffblue/cbmc/issues/8844

Avocado stubs aim to mitigate this issue by ensuring only the functions directly in the copy of
the standard library are used. However, standard library functions cannot be defined twice,
and so each function (i.e., stub) is prepended with `avocado_` to avoid naming collisions.


See `generate_avocado_stubs.py` for how Avocado stubs are generated.

Attributes:
renamed_identifier (str): The function's name as an Avocado stub.
parent_node_type (IdentifierNodeParentTYpe): The type of the nearest parent declaration node
of the renamed identifier.
file_path (Path): The path to the CBMC stub file in which the function is defined.
"""

renamed_identifier: str
parent_node_type: IdentifierNodeParentType
file_path: Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor typo in docstring.

Line 78 has a typo: IdentifierNodeParentTYpe should be IdentifierNodeParentType.

📝 Proposed fix
-        parent_node_type (IdentifierNodeParentTYpe): The type of the nearest parent declaration node
+        parent_node_type (IdentifierNodeParentType): The type of the nearest parent declaration node
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@dataclass(frozen=True)
class RenameData:
"""Data for renaming functions in libraries defined in the ANSI-C standard to Avocado stubs.
The CBMC verifier includes its own copy of stub functions in libraries defined in the ANSI-C
standard that it uses for verifying client code that makes use them. This can lead to
issues related to internal C functions that are used by its copy of stub functions:
https://github.com/diffblue/cbmc/issues/8844
Avocado stubs aim to mitigate this issue by ensuring only the functions directly in the copy of
the standard library are used. However, standard library functions cannot be defined twice,
and so each function (i.e., stub) is prepended with `avocado_` to avoid naming collisions.
See `generate_avocado_stubs.py` for how Avocado stubs are generated.
Attributes:
renamed_identifier (str): The function's name as an Avocado stub.
parent_node_type (IdentifierNodeParentTYpe): The type of the nearest parent declaration node
of the renamed identifier.
file_path (Path): The path to the CBMC stub file in which the function is defined.
"""
renamed_identifier: str
parent_node_type: IdentifierNodeParentType
file_path: Path
`@dataclass`(frozen=True)
class RenameData:
"""Data for renaming functions in libraries defined in the ANSI-C standard to Avocado stubs.
The CBMC verifier includes its own copy of stub functions in libraries defined in the ANSI-C
standard that it uses for verifying client code that makes use them. This can lead to
issues related to internal C functions that are used by its copy of stub functions:
https://github.com/diffblue/cbmc/issues/8844
Avocado stubs aim to mitigate this issue by ensuring only the functions directly in the copy of
the standard library are used. However, standard library functions cannot be defined twice,
and so each function (i.e., stub) is prepended with `avocado_` to avoid naming collisions.
See `generate_avocado_stubs.py` for how Avocado stubs are generated.
Attributes:
renamed_identifier (str): The function's name as an Avocado stub.
parent_node_type (IdentifierNodeParentType): The type of the nearest parent declaration node
of the renamed identifier.
file_path (Path): The path to the CBMC stub file in which the function is defined.
"""
renamed_identifier: str
parent_node_type: IdentifierNodeParentType
file_path: Path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 59 - 85, Fix the typo in the
RenameData docstring: change the incorrect type name "IdentifierNodeParentTYpe"
to the correct "IdentifierNodeParentType" in the Attributes section to match the
actual annotated field and avoid confusion when reading the docs; update the
docstring near the RenameData class definition so it references
IdentifierNodeParentType for parent_node_type.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
verification/cbmc_verification_client.py (2)

87-92: ⚠️ Potential issue | 🔴 Critical

Preserve the non-default headers already carried by VerificationInput.

This include block now prepends only rename_result.get_headers() plus the defaults. Any headers still tracked on vinput but not embedded in contents_of_file_to_verify will disappear from the rewritten translation unit, which will break files that rely on those prototypes or macros.

💡 Proposed fix
             include_header_lines = [
                 f"#include <{header}>"
-                for header in rename_result.get_headers()
-                + list(CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION)
+                for header in dict.fromkeys(
+                    vinput.get_headers()
+                    + rename_result.get_headers()
+                    + list(CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION)
+                )
             ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_verification_client.py` around lines 87 - 92, The current
include_header_lines construction replaces any headers carried on the
VerificationInput (vinput) with only rename_result.get_headers() plus the
defaults, causing previously-tracked non-default headers to be lost; update the
logic that builds include_header_lines (used with
file_util.ensure_lines_at_beginning) to merge vinput's headers with
rename_result.get_headers() and
CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION, preserving existing
vinput headers and removing duplicates (respecting order), so all headers
tracked on the VerificationInput remain present in the rewritten translation
unit.

94-98: ⚠️ Potential issue | 🔴 Critical

Stop executing this interpolated command string via the shell.

The stub paths and temp-file path are joined into a single shell command and executed with subprocess.run(..., shell=True). That makes quoting fragile and turns any unexpected metacharacter in a path into command injection. Return argv vectors from _get_cbmc_verification_command() and run each step with shell=False.

Also applies to: 118-123, 147-175

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_verification_client.py` around lines 94 - 98, The code
currently builds a single interpolated shell command from
_get_cbmc_verification_command and executes it with subprocess.run(...,
shell=True), which is unsafe; change _get_cbmc_verification_command to return an
argv list (list[str]) instead of a single string and update all callers (the
places invoking subprocess.run at the vcommand assignment and the other
occurrences around the regions you flagged) to call subprocess.run(argv_list,
capture_output=True, text=True, shell=False) so paths are passed as arguments
(not parsed by a shell) and avoid command injection; ensure any logic that
previously concatenated paths or added quoting is moved into the list
construction and that any subprocess.run calls still handle stdout/stderr as
before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@verification/cbmc_verification_client.py`:
- Around line 87-92: The current include_header_lines construction replaces any
headers carried on the VerificationInput (vinput) with only
rename_result.get_headers() plus the defaults, causing previously-tracked
non-default headers to be lost; update the logic that builds
include_header_lines (used with file_util.ensure_lines_at_beginning) to merge
vinput's headers with rename_result.get_headers() and
CbmcVerificationClient.DEFAULT_HEADERS_FOR_VERIFICATION, preserving existing
vinput headers and removing duplicates (respecting order), so all headers
tracked on the VerificationInput remain present in the rewritten translation
unit.
- Around line 94-98: The code currently builds a single interpolated shell
command from _get_cbmc_verification_command and executes it with
subprocess.run(..., shell=True), which is unsafe; change
_get_cbmc_verification_command to return an argv list (list[str]) instead of a
single string and update all callers (the places invoking subprocess.run at the
vcommand assignment and the other occurrences around the regions you flagged) to
call subprocess.run(argv_list, capture_output=True, text=True, shell=False) so
paths are passed as arguments (not parsed by a shell) and avoid command
injection; ensure any logic that previously concatenated paths or added quoting
is moved into the list construction and that any subprocess.run calls still
handle stdout/stderr as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0bed3e57-dba0-434c-906b-929c41b8f9f0

📥 Commits

Reviewing files that changed from the base of the PR and between c751537 and e980702.

📒 Files selected for processing (1)
  • verification/cbmc_verification_client.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
verification/__init__.py (1)

10-27: 🧹 Nitpick | 🔵 Trivial

Consider exporting RenameResult for type annotation purposes.

RenameResult is returned by AvocadoIdentifierRenamer.rename_ansi_identifiers_to_avocado_identifiers() but is not included in __all__. External callers may need to import it for type annotations.

♻️ Proposed fix to export RenameResult
-from .avocado_stub_util import RenameData, AvocadoIdentifierRenamer, AVOCADO_STUB_DIR, get_stub_mappings
+from .avocado_stub_util import RenameData, RenameResult, AvocadoIdentifierRenamer, AVOCADO_STUB_DIR, get_stub_mappings
     "RenameData",
+    "RenameResult",
     "AvocadoIdentifierRenamer",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/__init__.py` around lines 10 - 27, The module fails to export
the RenameResult type used by callers of
AvocadoIdentifierRenamer.rename_ansi_identifiers_to_avocado_identifiers; update
the top-level import to also import RenameResult from avocado_stub_util and add
"RenameResult" to the __all__ list so external code can import it for type
annotations and static typing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/verification/test_avocado_stub_util.py`:
- Around line 8-9: The module instantiates AvocadoIdentifierRenamer() at import
time via the avocado_renamer variable which forces the pickle to load during
import; change this to a pytest fixture named avocado_renamer that constructs
and returns an AvocadoIdentifierRenamer instance (so pickle loading happens
per-test and failures surface as test errors), and update test functions to
accept the avocado_renamer fixture as a parameter instead of using the
module-level avocado_renamer variable.

In `@verification/avocado_stub_util.py`:
- Around line 169-176: The class method get_renamed_identifier currently only
checks for >1 and then returns next(iter(renamed_identifiers)), which yields a
StopIteration for an empty rename_data; update get_renamed_identifier to use the
same check as the standalone function (if len(renamed_identifiers) != 1) and
raise a clear ValueError when the set is empty or has multiple entries, keeping
the existing descriptive message that mentions original_identifier and the
renamed_identifiers so behavior is consistent with the standalone function.

---

Outside diff comments:
In `@verification/__init__.py`:
- Around line 10-27: The module fails to export the RenameResult type used by
callers of
AvocadoIdentifierRenamer.rename_ansi_identifiers_to_avocado_identifiers; update
the top-level import to also import RenameResult from avocado_stub_util and add
"RenameResult" to the __all__ list so external code can import it for type
annotations and static typing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bec2eaf7-8599-45c4-9014-f9d73d6eeae3

📥 Commits

Reviewing files that changed from the base of the PR and between e980702 and b40cdca.

📒 Files selected for processing (5)
  • test/data/avocado_stub/test_header_detection.c
  • test/verification/test_avocado_stub_util.py
  • test/verification/test_verification_input.py
  • verification/__init__.py
  • verification/avocado_stub_util.py

Comment on lines +8 to +9
avocado_renamer = avocado_stub_util.AvocadoIdentifierRenamer()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Module-level instantiation loads pickle at import time.

Creating AvocadoIdentifierRenamer() at module level means the pickle file is loaded when the test module is imported. If the pickle file is missing or corrupted, all tests in this module will fail to import rather than showing individual test failures. Consider using a pytest fixture for better test isolation and clearer error reporting.

♻️ Proposed fixture-based approach
-avocado_renamer = avocado_stub_util.AvocadoIdentifierRenamer()
+import pytest
+
+@pytest.fixture
+def avocado_renamer():
+    return avocado_stub_util.AvocadoIdentifierRenamer()

Then update test functions to accept avocado_renamer as a parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/verification/test_avocado_stub_util.py` around lines 8 - 9, The module
instantiates AvocadoIdentifierRenamer() at import time via the avocado_renamer
variable which forces the pickle to load during import; change this to a pytest
fixture named avocado_renamer that constructs and returns an
AvocadoIdentifierRenamer instance (so pickle loading happens per-test and
failures surface as test errors), and update test functions to accept the
avocado_renamer fixture as a parameter instead of using the module-level
avocado_renamer variable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (5)
verification/cbmc_verification_client.py (1)

93-97: ⚠️ Potential issue | 🔴 Critical

Build verifier invocations as argv lists, not a shell string.

Lines 146-177 flatten stub paths from RenameResult.get_stub_file_paths() into a shell command, and Line 97 runs it with shell=True. Those paths come from repository-controlled stub metadata, so spaces or shell metacharacters in a stub path can change the command that actually executes. Return argv lists and call subprocess.run(..., shell=False) instead.

Also applies to: 146-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/cbmc_verification_client.py` around lines 93 - 97, The verifier
command is being built as a shell string and executed with subprocess.run(...,
shell=True) which is unsafe; update _get_cbmc_verification_command to return an
argv list (list[str]) instead of a single shell string and ensure any flattening
of RenameResult.get_stub_file_paths() produces list elements (do not join/escape
into one string), then call subprocess.run(vcommand, capture_output=True,
text=True, shell=False) so the verifier is executed as an argv list; apply the
same change to the other invocation sites around the
RenameResult.get_stub_file_paths() handling (the block referenced at lines
146-177).
verification/avocado_stub_util.py (4)

196-207: ⚠️ Potential issue | 🟠 Major

Guard the no-definition path before indexing [0].

If rename_metadata contains declarations or call sites but no FUNCTION_DEFINITION, Line 207 raises IndexError and hides the real data problem. Raise a descriptive ValueError when the filtered list is empty.

Suggested fix
         rename_data_for_definition = [
             data
             for data in rename_metadata
             if data.parent_node_type == IdentifierNodeParentType.FUNCTION_DEFINITION
         ]
+        if not rename_data_for_definition:
+            msg = f"No definition found for '{original_identifier}'"
+            raise ValueError(msg)
         if len(rename_data_for_definition) > 1:
             msg = (
                 f"Expected a single definition for '{original_identifier}', "
                 f"but got {rename_data_for_definition}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 196 - 207, The code filters
rename_metadata into rename_data_for_definition but immediately indexes [0],
which can raise IndexError when no FUNCTION_DEFINITION exists; update this block
to first check if rename_data_for_definition is empty and raise a descriptive
ValueError (mentioning original_identifier and that no function definition was
found), keep the existing duplicate (>1) check and message for multiple
definitions, and then return rename_data_for_definition[0]; reference the
variables rename_data_for_definition, rename_metadata,
IdentifierNodeParentType.FUNCTION_DEFINITION, and original_identifier when
making the change.

123-140: ⚠️ Potential issue | 🟠 Major

Don't rewrite the translation unit with raw regex replacement.

Lines 125-133 run re.subn() across the entire source text, so comments, string literals, macro bodies, and user declarations named like ANSI functions get renamed too. That is broader than client call-site renaming and can change the program being verified. Drive this from parsed C identifiers/call nodes instead of whole-file text substitution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 123 - 140, The current
implementation in the loop that builds file_content_with_renaming uses raw text
substitution via re.subn over the whole file (loop over
self._ansi_id_to_rename_data, get_renamed_identifier,
get_rename_data_for_file_containing_definition) which will rename identifiers in
comments, strings, macros and other non-code locations; change this to drive
renaming from a parsed C AST/token stream instead: parse the translation unit
into tokens/AST nodes, locate only identifier tokens and call/decl nodes that
match original_identifier, and perform replacements at those token positions
(preserving surrounding text unchanged) while still collecting
rename_data_for_definition into avocado_stub_files_to_use and returning a
RenameResult; use existing methods get_renamed_identifier and
get_rename_data_for_file_containing_definition to compute new names and metadata
but apply them only to AST/token-validated identifier occurrences rather than
via regex over the entire file.

156-176: ⚠️ Potential issue | 🟡 Minor

Handle empty rename metadata before next(iter(...)).

Line 176 only rejects the > 1 case. If a mapping entry is empty, this falls through to StopIteration instead of the descriptive ValueError that the module-level helper already raises.

Suggested fix
         renamed_identifiers = {rename.renamed_identifier for rename in rename_data}
-        if len(renamed_identifiers) > 1:
+        if len(renamed_identifiers) != 1:
             msg = (
                 f"Expected exactly one renaming for {original_identifier}, "
                 f"but got {renamed_identifiers}"
             )
             raise ValueError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 156 - 176, The
get_renamed_identifier method currently only checks for >1 renamings and then
calls next(iter(renamed_identifiers)), which raises StopIteration for an empty
set; update get_renamed_identifier to explicitly handle an empty
renamed_identifiers (set created from rename_data) and raise a clear ValueError
(similar wording to the existing module-level helper) when no renaming is found
for original_identifier, otherwise keep the existing >1 check and return the
single element; reference the get_renamed_identifier function, the
renamed_identifiers set, and RenameData in your change.

102-112: ⚠️ Potential issue | 🔴 Critical

Replace repo-controlled pickle deserialization before merge.

Line 108 still does a pickle.load() from verification/cbmc_stubs/c_stub_rename_map.pkl. Because that mapping file is generated into the repo and loaded automatically by AvocadoIdentifierRenamer(), an untrusted PR can turn verification into arbitrary code execution. This needs a data-only format end-to-end (generate_avocado_stubs.py writer plus both loader call sites here).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verification/avocado_stub_util.py` around lines 102 - 112, The constructor
for AvocadoIdentifierRenamer currently uses insecure pickle.load
(DEFAULT_STUB_MAPPINGS / __init__) to deserialize a repo-controlled .pkl;
replace this with a data-only format and loader: change
generate_avocado_stubs.py to emit the stub mapping as JSON (or another safe,
text-only format) and update DEFAULT_STUB_MAPPINGS to the new
filename/extension, then in AvocadoIdentifierRenamer.__init__ open and parse the
JSON with json.load (or the chosen safe parser) and assign to
self._ansi_id_to_rename_data (preserve the dict[str, list[RenameData]] shape),
and raise the same ValueError on parse/file errors; ensure any other loader call
sites are updated to the new format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@verification/avocado_stub_util.py`:
- Around line 196-207: The code filters rename_metadata into
rename_data_for_definition but immediately indexes [0], which can raise
IndexError when no FUNCTION_DEFINITION exists; update this block to first check
if rename_data_for_definition is empty and raise a descriptive ValueError
(mentioning original_identifier and that no function definition was found), keep
the existing duplicate (>1) check and message for multiple definitions, and then
return rename_data_for_definition[0]; reference the variables
rename_data_for_definition, rename_metadata,
IdentifierNodeParentType.FUNCTION_DEFINITION, and original_identifier when
making the change.
- Around line 123-140: The current implementation in the loop that builds
file_content_with_renaming uses raw text substitution via re.subn over the whole
file (loop over self._ansi_id_to_rename_data, get_renamed_identifier,
get_rename_data_for_file_containing_definition) which will rename identifiers in
comments, strings, macros and other non-code locations; change this to drive
renaming from a parsed C AST/token stream instead: parse the translation unit
into tokens/AST nodes, locate only identifier tokens and call/decl nodes that
match original_identifier, and perform replacements at those token positions
(preserving surrounding text unchanged) while still collecting
rename_data_for_definition into avocado_stub_files_to_use and returning a
RenameResult; use existing methods get_renamed_identifier and
get_rename_data_for_file_containing_definition to compute new names and metadata
but apply them only to AST/token-validated identifier occurrences rather than
via regex over the entire file.
- Around line 156-176: The get_renamed_identifier method currently only checks
for >1 renamings and then calls next(iter(renamed_identifiers)), which raises
StopIteration for an empty set; update get_renamed_identifier to explicitly
handle an empty renamed_identifiers (set created from rename_data) and raise a
clear ValueError (similar wording to the existing module-level helper) when no
renaming is found for original_identifier, otherwise keep the existing >1 check
and return the single element; reference the get_renamed_identifier function,
the renamed_identifiers set, and RenameData in your change.
- Around line 102-112: The constructor for AvocadoIdentifierRenamer currently
uses insecure pickle.load (DEFAULT_STUB_MAPPINGS / __init__) to deserialize a
repo-controlled .pkl; replace this with a data-only format and loader: change
generate_avocado_stubs.py to emit the stub mapping as JSON (or another safe,
text-only format) and update DEFAULT_STUB_MAPPINGS to the new
filename/extension, then in AvocadoIdentifierRenamer.__init__ open and parse the
JSON with json.load (or the chosen safe parser) and assign to
self._ansi_id_to_rename_data (preserve the dict[str, list[RenameData]] shape),
and raise the same ValueError on parse/file errors; ensure any other loader call
sites are updated to the new format.

In `@verification/cbmc_verification_client.py`:
- Around line 93-97: The verifier command is being built as a shell string and
executed with subprocess.run(..., shell=True) which is unsafe; update
_get_cbmc_verification_command to return an argv list (list[str]) instead of a
single shell string and ensure any flattening of
RenameResult.get_stub_file_paths() produces list elements (do not join/escape
into one string), then call subprocess.run(vcommand, capture_output=True,
text=True, shell=False) so the verifier is executed as an argv list; apply the
same change to the other invocation sites around the
RenameResult.get_stub_file_paths() handling (the block referenced at lines
146-177).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65b37420-1053-418b-993c-86d9483b784b

📥 Commits

Reviewing files that changed from the base of the PR and between b40cdca and b130281.

📒 Files selected for processing (4)
  • test/data/avocado_stub/test_renaming_no_existing_avocado_names.c
  • test/verification/test_avocado_stub_util.py
  • verification/avocado_stub_util.py
  • verification/cbmc_verification_client.py

@jyoo980
Copy link
Collaborator Author

jyoo980 commented Mar 12, 2026

@CodeRabbit pause

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 12, 2026

✅ Actions performed

Reviews paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants