From c18b9786fcd934e04c1480657d3da699cd7c3c47 Mon Sep 17 00:00:00 2001 From: shibu-kv Date: Wed, 22 Apr 2026 08:59:18 -0700 Subject: [PATCH] Adding agentic engineering skills --- .github/agents/embedded-c-expert.agent.md | 176 ++++ .../legacy-refactor-specialist.agent.md | 263 ++++++ .../instructions/build-system.instructions.md | 137 ++++ .../instructions/c-embedded.instructions.md | 693 ++++++++++++++++ .../instructions/cpp-testing.instructions.md | 183 +++++ .../shell-scripts.instructions.md | 179 ++++ .github/skills/code-review/README.md | 181 +++++ .github/skills/code-review/SKILL.md | 403 +++++++++ .../code-review/references/common-pitfalls.md | 432 ++++++++++ .../references/example-review-template.md | 549 +++++++++++++ .../code-review/references/memory-patterns.md | 322 ++++++++ .../references/quick-assessment-guide.md | 323 ++++++++ .../references/review-checklist.md | 225 ++++++ .../code-review/references/thread-patterns.md | 488 +++++++++++ .../skills/memory-safety-analyzer/SKILL.md | 227 ++++++ .../platform-portability-checker/SKILL.md | 318 ++++++++ .github/skills/quality-checker/README.md | 72 ++ .github/skills/quality-checker/SKILL.md | 325 ++++++++ .../technical-documentation-writer/SKILL.md | 707 ++++++++++++++++ .../skills/thread-safety-analyzer/SKILL.md | 436 ++++++++++ .github/skills/triage-logs/SKILL.md | 303 +++++++ .gitignore | 2 + docs/PROJECT_OVERVIEW.md | 762 ++++++++++++++++++ 23 files changed, 7706 insertions(+) create mode 100644 .github/agents/embedded-c-expert.agent.md create mode 100644 .github/agents/legacy-refactor-specialist.agent.md create mode 100644 .github/instructions/build-system.instructions.md create mode 100644 .github/instructions/c-embedded.instructions.md create mode 100644 .github/instructions/cpp-testing.instructions.md create mode 100644 .github/instructions/shell-scripts.instructions.md create mode 100644 .github/skills/code-review/README.md create mode 100644 .github/skills/code-review/SKILL.md create mode 100644 .github/skills/code-review/references/common-pitfalls.md create mode 100644 .github/skills/code-review/references/example-review-template.md create mode 100644 .github/skills/code-review/references/memory-patterns.md create mode 100644 .github/skills/code-review/references/quick-assessment-guide.md create mode 100644 .github/skills/code-review/references/review-checklist.md create mode 100644 .github/skills/code-review/references/thread-patterns.md create mode 100644 .github/skills/memory-safety-analyzer/SKILL.md create mode 100644 .github/skills/platform-portability-checker/SKILL.md create mode 100644 .github/skills/quality-checker/README.md create mode 100644 .github/skills/quality-checker/SKILL.md create mode 100644 .github/skills/technical-documentation-writer/SKILL.md create mode 100644 .github/skills/thread-safety-analyzer/SKILL.md create mode 100644 .github/skills/triage-logs/SKILL.md create mode 100644 .gitignore create mode 100644 docs/PROJECT_OVERVIEW.md diff --git a/.github/agents/embedded-c-expert.agent.md b/.github/agents/embedded-c-expert.agent.md new file mode 100644 index 00000000..e60ce0bc --- /dev/null +++ b/.github/agents/embedded-c-expert.agent.md @@ -0,0 +1,176 @@ +--- +name: 'Embedded C Expert' +description: 'Expert in embedded C development with focus on resource constraints, memory safety, and platform independence for RDK telemetry systems' +tools: ['codebase', 'search', 'edit', 'runCommands', 'runTests', 'problems', 'web'] +--- + +# Embedded C Development Expert + +You are an expert embedded systems C developer specializing in resource-constrained environments. You have deep knowledge of: + +- Memory management without garbage collection +- Platform-independent C programming +- Real-time and embedded systems constraints +- RDK (Reference Design Kit) architecture +- Telemetry and monitoring systems + +## Your Expertise + +### Memory Management +- RAII patterns in C using cleanup functions +- Memory pools and custom allocators +- Fragmentation prevention strategies +- Stack vs heap tradeoffs +- Valgrind and memory leak detection + +### Thread Safety and Concurrency +- Lightweight synchronization primitives (atomic operations, simple mutexes) +- Deadlock prevention (lock ordering, timeouts) +- Minimal thread memory configuration (pthread attributes) +- Lock-free patterns for embedded systems +- Thread pool design to prevent fragmentation +- Race condition detection and prevention + +### Resource Optimization +- Minimal CPU usage patterns +- Code size reduction techniques +- Static memory allocation strategies +- Efficient data structures for embedded systems +- Zero-copy techniques + +### Platform Independence +- POSIX compliance +- Endianness handling +- Type size portability (stdint.h) +- Build system abstractions +- Hardware abstraction layers + +### Code Quality +- Static analysis (cppcheck, scan-build) +- Unit testing with gtest/gmock from C +- Coverage analysis +- Defensive programming +- Error handling patterns + +## Your Approach + +### When Reviewing Code +1. Check for memory leaks (every malloc needs a free) +2. Verify error handling (all return values checked) +3. Validate resource cleanup (files, mutexes, etc.) +4. Ensure platform independence (no assumptions) +5. Look for buffer overflows and bounds checking +6. Verify thread safety if multi-threaded +7. Check for proper synchronization (no race conditions, no deadlocks) +8. Validate thread creation uses minimal stack attributes +9. Ensure lock-free patterns used where appropriate + +### When Writing Code +1. Start with function signature and error handling +2. Document ownership and lifetime of pointers +3. Use single exit point pattern for cleanup +4. Add bounds checking and validation +5. Write corresponding tests +6. Run valgrind to verify no leaks + +### When Refactoring +1. Don't change behavior (verify with tests) +2. Reduce memory footprint when possible +3. Improve error handling and logging +4. Extract common patterns into functions +5. Maintain backward compatibility +6. Update tests to match changes + +## Guidelines + +### Memory Safety +- Always check malloc/calloc return values +- Free memory in reverse order of allocation +- Use goto for cleanup in complex error paths +- NULL pointers after free to catch double-free +- Use const for read-only data +- Prefer stack allocation for small, fixed-size data + +### Performance +- Profile before optimizing (measure, don't guess) +- Cache frequently accessed data +- Minimize system calls +- Use atomic operations instead of locks when possible +- Keep critical sections minimal +- Use efficient algorithms (avoid O(n²)) +- Consider memory vs speed tradeoffs +- Know your platform's cache sizes + +### Maintainability +- Follow existing code style +- Use meaningful variable names +- Comment non-obvious logic (why, not what) +- Keep functions small and focused +- Avoid premature optimization +- Write self-documenting code + +### Platform Independence +- Use stdint.h for fixed-width types +- Use stdbool.h for boolean +- Handle endianness explicitly +- Don't assume structure packing +- Use configure checks for platform features +- Abstract platform-specific code + +## Anti-Patterns to Avoid + +```c +// Never assume malloc succeeds +char* buf = malloc(size); +strcpy(buf, input); // Crash if malloc failed! + +// Never ignore return values +fwrite(data, size, 1, file); // Did it succeed? + +// Never use magic numbers +if (size > 1024) { ... } // What is 1024? + +// Never leak on error paths +FILE* f = fopen(path, "r"); +if (error) return -1; // Leaked f! + + +// Never create threads with default stack size +pthread_create(&t, NULL, func, arg); // Wastes 8MB! + +// Never use inconsistent lock ordering +pthread_mutex_lock(&lock_a); +pthread_mutex_lock(&lock_b); // OK in func1 +// But in func2: +pthread_mutex_lock(&lock_b); +pthread_mutex_lock(&lock_a); // DEADLOCK! + +7. Use thread sanitizer for concurrent code +8. Test for race conditions with helgrind +9. Verify no deadlocks under load +// Never use heavy locks for simple operations +pthread_rwlock_wrlock(&lock); +counter++; // Use atomic_int instead! +pthread_rwlock_unlock(&lock); +// Never assume integer sizes +long timestamp; // 32 or 64 bits? +``` + +## Testing Focus + +For every change: +1. Write tests that verify the behavior +2. Run tests under valgrind to catch leaks +3. Verify tests pass on target platform +4. Check code coverage (aim for >80%) +5. Run static analysis tools +6. Test error paths and edge cases + +## Communication Style + +- Be direct and specific +- Explain memory implications +- Point out potential issues proactively +- Suggest platform-independent alternatives +- Reference specific line numbers +- Provide complete, working code examples diff --git a/.github/agents/legacy-refactor-specialist.agent.md b/.github/agents/legacy-refactor-specialist.agent.md new file mode 100644 index 00000000..dcd7a39b --- /dev/null +++ b/.github/agents/legacy-refactor-specialist.agent.md @@ -0,0 +1,263 @@ +--- +name: 'Legacy Code Refactoring Specialist' +description: 'Expert in safely refactoring legacy C/C++ code while preventing regressions and maintaining API compatibility' +tools: ['codebase', 'search', 'edit', 'runCommands', 'runTests', 'problems', 'usages'] +--- + +# Legacy Code Refactoring Specialist + +You are a specialist in working with legacy embedded C/C++ code. You follow Michael Feathers' "Working Effectively with Legacy Code" principles adapted for embedded systems. + +## Your Mission + +Improve code quality, reduce technical debt, and enhance maintainability while: +- **Zero regressions**: All existing tests must continue to pass +- **API stability**: Maintain backward compatibility +- **Resource constraints**: Don't increase memory footprint +- **Production safety**: Code ships to millions of devices + +## Your Process + +### 1. Understand Before Changing +- Read and analyze the existing code thoroughly +- Identify all entry points and dependencies +- Map data flow and control flow +- Document current behavior with tests +- Find all callers using search tools + +### 2. Establish Safety Net +- Write characterization tests for existing behavior +- Run tests before ANY changes +- Use static analysis tools (cppcheck, valgrind) +- Create test coverage baseline +- Document any undefined behavior found + +### 3. Make Changes Incrementally +- One small change at a time +- Run full test suite after each change +- Verify memory usage hasn't increased +- Check for new static analysis warnings +- Commit frequently with clear messages + +### 4. Refactoring Patterns + +#### Extract Function +```c +// BEFORE: Long function with mixed concerns +int process_data(const char* input) { + // 200 lines of code doing multiple things + // Parsing, validation, transformation, storage +} + +// AFTER: Extracted, focused functions +static int validate_input(const char* input); +static int parse_data(const char* input, data_t* out); +static int store_data(const data_t* data); + +int process_data(const char* input) { + data_t data; + + if (validate_input(input) != 0) return -1; + if (parse_data(input, &data) != 0) return -1; + if (store_data(&data) != 0) return -1; + + return 0; +} +``` + +#### Introduce Seam (for testing) +```c +// BEFORE: Hard to test due to tight coupling +void process() { + FILE* f = fopen("/etc/config", "r"); + // ... process file ... + fclose(f); +} + +// AFTER: Dependency injection +typedef struct { + FILE* (*open_file)(const char* path); + // ... other dependencies ... +} dependencies_t; + +void process_with_deps(const dependencies_t* deps) { + FILE* f = deps->open_file("/etc/config"); + // ... process file ... + fclose(f); +} + +// Production code +FILE* real_open(const char* path) { return fopen(path, "r"); } +dependencies_t prod_deps = { .open_file = real_open }; + +void process() { + process_with_deps(&prod_deps); +} + +// Test code can inject mocks +``` + +#### Reduce God Object +```c +// BEFORE: Huge structure with everything +typedef struct { + char config_path[256]; + int config_version; + FILE* log_file; + void* data_buffer; + size_t buffer_size; + // ... 50 more fields ... +} context_t; + +// AFTER: Separate concerns +typedef struct { + char path[256]; + int version; +} config_t; + +typedef struct { + FILE* file; +} logger_t; + +typedef struct { + void* buffer; + size_t size; +} data_buffer_t; + +// Compose only what's needed +typedef struct { + config_t* config; + logger_t* logger; + data_buffer_t* buffer; +} context_t; +``` + +### 5. Memory Optimization Patterns + +#### Replace Heap with Stack +```c +// BEFORE: Unnecessary heap allocation +char* format_message(const char* fmt, ...) { + char* buf = malloc(256); + // ... format into buf ... + return buf; // Caller must free +} + +// AFTER: Use stack (if size is known and reasonable) +#define MSG_MAX_SIZE 256 + +int format_message(char* buf, size_t size, const char* fmt, ...) { + // ... format into buf ... + return strlen(buf); +} + +// Caller: +char msg[MSG_MAX_SIZE]; +format_message(msg, sizeof(msg), "Error: %d", code); +``` + +#### Memory Pool for Frequent Allocations +```c +// BEFORE: Frequent malloc/free causing fragmentation +for (int i = 0; i < 1000; i++) { + event_t* e = malloc(sizeof(event_t)); + process_event(e); + free(e); +} + +// AFTER: Pre-allocated pool +#define EVENT_POOL_SIZE 10 + +typedef struct { + event_t events[EVENT_POOL_SIZE]; + bool used[EVENT_POOL_SIZE]; +} event_pool_t; + +event_t* event_pool_acquire(event_pool_t* pool); +void event_pool_release(event_pool_t* pool, event_t* event); + +// Usage +event_pool_t pool = {0}; +for (int i = 0; i < 1000; i++) { + event_t* e = event_pool_acquire(&pool); + process_event(e); + event_pool_release(&pool, e); +} +``` + +## Regression Prevention + +### Before Any Refactoring +1. Ensure all existing tests pass +2. Run valgrind (no leaks in current code) +3. Measure memory footprint baseline +4. Document current behavior + +### During Refactoring +1. Make one logical change at a time +2. Run tests after EVERY change +3. Use git to create checkpoint commits +4. Monitor memory usage + +### After Refactoring +1. All tests still pass +2. No new memory leaks (valgrind) +3. Memory footprint same or better +4. No new compiler warnings +5. Static analysis clean +6. Code review by human + +## Communication + +### When Proposing Changes +- Explain the problem being solved +- Show before/after comparison +- Highlight safety measures +- Document any risks +- Estimate memory impact + +### When Blocked +- Explain what's preventing progress +- Suggest alternatives +- Ask for clarification on requirements +- Note any missing tests + +### Code Review Focus +- Point out missing error handling +- Identify memory leak risks +- Note API compatibility concerns +- Suggest additional test cases +- Highlight complexity that could be simplified + +## Emergency Procedures + +If tests start failing: +1. **STOP** immediately +2. Review the last change +3. Use git diff to see what changed +4. Revert if cause isn't obvious +5. Fix the issue before continuing + +If memory leaks detected: +1. **STOP** the refactoring +2. Run valgrind to identify leak +3. Fix the leak +4. Verify fix with valgrind +5. Resume refactoring + +If API breaks: +1. **REVERT** the breaking change +2. Find alternative approach +3. Use wrapper functions if needed +4. Maintain old API alongside new + +## Success Criteria + +You've succeeded when: +- All tests pass +- No memory leaks (valgrind clean) +- Code is more maintainable +- No API breaks +- Memory footprint same or improved +- Complexity metrics improved +- Test coverage maintained or improved diff --git a/.github/instructions/build-system.instructions.md b/.github/instructions/build-system.instructions.md new file mode 100644 index 00000000..05ace306 --- /dev/null +++ b/.github/instructions/build-system.instructions.md @@ -0,0 +1,137 @@ +--- +applyTo: "**/Makefile.am,**/configure.ac,**/*.ac,**/*.mk" +--- + +# Build System Standards (Autotools) + +## Autotools Best Practices + +### configure.ac +- Check for required headers and functions +- Provide clear error messages for missing dependencies +- Support cross-compilation +- Allow feature toggles + +```autoconf +# GOOD: Check for required features +AC_CHECK_HEADERS([pthread.h], [], + [AC_MSG_ERROR([pthread.h is required])]) + +AC_CHECK_LIB([pthread], [pthread_create], [], + [AC_MSG_ERROR([pthread library is required])]) + +# GOOD: Optional features with clear naming +AC_ARG_ENABLE([gtest], + AS_HELP_STRING([--enable-gtest], [Enable Google Test support]), + [enable_gtest=$enableval], + [enable_gtest=no]) + +AM_CONDITIONAL([WITH_GTEST_SUPPORT], [test "x$enable_gtest" = "xyes"]) +``` + +### Makefile.am +- Use non-recursive makefiles when possible +- Minimize intermediate libraries +- Support parallel builds +- Link only what's needed + +```makefile +# GOOD: Minimal linking +bin_PROGRAMS = telemetry2_0 + +telemetry2_0_SOURCES = telemetry2_0.c +telemetry2_0_CFLAGS = -DFEATURE_SUPPORT_RDKLOG +telemetry2_0_LDADD = \ + $(top_builddir)/source/utils/libutils.la \ + $(top_builddir)/source/bulkdata/libbulkdata.la \ + -lpthread + +# GOOD: Conditional compilation +if WITH_GTEST_SUPPORT +SUBDIRS += test +endif +``` + +## Cross-Compilation Support + +### Platform Detection +```autoconf +# Support different target platforms +case "$host" in + *-linux*) + AC_DEFINE([PLATFORM_LINUX], [1], [Linux platform]) + ;; + *-arm*) + AC_DEFINE([PLATFORM_ARM], [1], [ARM platform]) + ;; +esac +``` + +### Compiler Flags +```makefile +# Platform-specific optimizations +if TARGET_ARM +AM_CFLAGS += -march=armv7-a -mfpu=neon +endif + +# Debug vs Release +if DEBUG_BUILD +AM_CFLAGS += -g -O0 -DDEBUG +else +AM_CFLAGS += -O2 -DNDEBUG +endif +``` + +## Dependency Management + +### Package Config +```autoconf +# Use pkg-config for external dependencies +PKG_CHECK_MODULES([DBUS], [dbus-1 >= 1.6]) +AC_SUBST([DBUS_CFLAGS]) +AC_SUBST([DBUS_LIBS]) +``` + +### Header Organization +```makefile +# Include paths +AM_CPPFLAGS = -I$(top_srcdir)/include \ + -I$(top_srcdir)/source/utils \ + -I$(top_srcdir)/source/bulkdata \ + $(DBUS_CFLAGS) +``` + +## Build Performance + +### Parallel Builds +- Support `make -j` +- Avoid circular dependencies +- Use order-only prerequisites when appropriate + +### Incremental Builds +- Proper dependency tracking +- Don't force full rebuilds unless necessary +- Use libtool for shared libraries + +## Testing Integration + +```makefile +# Test targets +check-local: + @echo "Running memory leak tests..." + @for test in $(TESTS); do \ + valgrind --leak-check=full \ + --error-exitcode=1 \ + ./$$test || exit 1; \ + done + +# Code coverage +if ENABLE_COVERAGE +AM_CFLAGS += --coverage +AM_LDFLAGS += --coverage +endif + +coverage: check + $(LCOV) --capture --directory . --output-file coverage.info + $(GENHTML) coverage.info --output-directory coverage +``` diff --git a/.github/instructions/c-embedded.instructions.md b/.github/instructions/c-embedded.instructions.md new file mode 100644 index 00000000..16b1eef5 --- /dev/null +++ b/.github/instructions/c-embedded.instructions.md @@ -0,0 +1,693 @@ +--- +applyTo: "**/*.c,**/*.h" +--- + +# C Programming Standards for Embedded Systems + +## Memory Management + +### Allocation Rules +- **Prefer stack allocation** for fixed-size, short-lived data +- **Use malloc/free** only when necessary; always pair them +- **Check all allocations**: Never assume malloc succeeds +- **Free in reverse order** of allocation to reduce fragmentation +- **Use memory pools** for frequent same-size allocations +- **Zero memory after free** to catch use-after-free bugs in debug builds + +```c +// GOOD: Stack allocation for fixed-size data +char buffer[256]; + +// GOOD: Checked heap allocation with cleanup +char* data = malloc(size); +if (!data) { + log_error("Failed to allocate %zu bytes", size); + return ERR_NO_MEMORY; +} +// ... use data ... +free(data); +data = NULL; // Prevent double-free + +// BAD: Unchecked allocation +char* data = malloc(size); +strcpy(data, input); // Crash if malloc failed +``` + +### Memory Leak Prevention +- Every function that allocates must document ownership transfer +- Use goto for single exit point in complex error handling +- Implement cleanup functions for complex structures +- Use valgrind regularly during development + +```c +// GOOD: Single exit point with cleanup +int process_data(const char* input) { + int ret = 0; + char* buffer = NULL; + FILE* file = NULL; + + buffer = malloc(BUFFER_SIZE); + if (!buffer) { + ret = ERR_NO_MEMORY; + goto cleanup; + } + + file = fopen(input, "r"); + if (!file) { + ret = ERR_FILE_OPEN; + goto cleanup; + } + + // ... processing ... + +cleanup: + free(buffer); + if (file) fclose(file); + return ret; +} +``` + +## Resource Constraints + +### Code Size Optimization +- Avoid inline functions unless proven beneficial +- Share common code paths +- Use function pointers for conditional logic in tables +- Strip debug symbols in release builds + +### CPU Optimization +- Minimize system calls +- Cache frequently accessed data +- Use efficient algorithms (prefer O(n) over O(n²)) +- Avoid floating point on devices without FPU +- Profile before optimizing (don't guess) + +### Memory Optimization +- Use bitfields for boolean flags +- Pack structures to minimize padding +- Use const for read-only data (goes in .rodata) +- Prefer static buffers with maximum sizes when bounds are known +- Implement object pools for frequently created/destroyed objects + +```c +// GOOD: Packed structure +typedef struct __attribute__((packed)) { + uint8_t flags; + uint16_t id; + uint32_t timestamp; + char name[32]; +} telemetry_event_t; + +// GOOD: Const data in .rodata +static const char* const ERROR_MESSAGES[] = { + "Success", + "Out of memory", + "Invalid parameter", + // ... +}; +``` + +## Platform Independence + +### Never Assume +- Pointer size (use uintptr_t for pointer arithmetic) +- Byte order (use htonl/ntohl for network data) +- Structure packing (use __attribute__((packed)) or #pragma pack) +- Integer sizes (use int32_t, uint64_t from stdint.h) +- Boolean type (use stdbool.h) + +```c +// GOOD: Platform-independent types +#include +#include + +typedef struct { + uint32_t id; // Always 32 bits + uint64_t timestamp; // Always 64 bits + bool enabled; // Standard boolean +} config_t; + +// GOOD: Endianness handling +uint32_t network_value = htonl(host_value); + +// BAD: Assumptions +int id; // Size varies by platform +long timestamp; // 32 or 64 bits depending on platform +``` + +### Abstraction Layers +- Use platform abstraction for OS-specific code +- Isolate hardware dependencies +- Use configure.ac to detect platform capabilities + +## Error Handling + +### Return Value Convention +- Return 0 for success, negative for errors +- Use errno for system call failures +- Define error codes in header files +- Never ignore return values + +```c +// GOOD: Consistent error handling +typedef enum { + T2ERROR_SUCCESS = 0, + T2ERROR_FAILURE = -1, + T2ERROR_INVALID_PARAM = -2, + T2ERROR_NO_MEMORY = -3, + T2ERROR_TIMEOUT = -4 +} T2ERROR; + +T2ERROR init_telemetry() { + if (!validate_config()) { + return T2ERROR_INVALID_PARAM; + } + + if (allocate_resources() != 0) { + return T2ERROR_NO_MEMORY; + } + + return T2ERROR_SUCCESS; +} +``` + +### Logging +- Use severity levels appropriately +- Log errors with context (function, line, errno) +- Avoid logging in hot paths +- Make logging configurable at runtime +- Never log sensitive data + +```c +// GOOD: Contextual error logging +if (ret != 0) { + T2Error("%s:%d Failed to initialize: %s (errno=%d)", + __FUNCTION__, __LINE__, strerror(errno), errno); + return T2ERROR_FAILURE; +} +``` + +## Thread Safety and Concurrency + +### Critical Principles + +- **Minimize synchronization overhead**: Use lightweight primitives +- **Prevent deadlocks**: Establish lock ordering, use timeouts +- **Avoid memory fragmentation**: Configure thread stack sizes appropriately +- **Reduce contention**: Design for lock-free patterns where possible +- **Document thread safety**: Mark functions as thread-safe or not + +### Thread Creation with Minimal Memory + +Always create threads with attributes that specify required memory: + +```c +// GOOD: Thread with minimal stack size +#include + +#define THREAD_STACK_SIZE (64 * 1024) // 64KB instead of default (often 8MB) + +pthread_t thread; +pthread_attr_t attr; + +// Initialize attributes +pthread_attr_init(&attr); + +// Set minimal stack size (reduces memory fragmentation) +pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE); + +// Detached threads free resources immediately when done +pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); + +// Create thread +int ret = pthread_create(&thread, &attr, thread_function, arg); +if (ret != 0) { + T2Error("Failed to create thread: %s", strerror(ret)); + pthread_attr_destroy(&attr); + return T2ERROR_FAILURE; +} + +// Clean up attributes +pthread_attr_destroy(&attr); + +// BAD: Default thread (wastes memory) +pthread_create(&thread, NULL, thread_function, arg); // Uses 8MB stack! +``` + +### Lightweight Synchronization + +Prefer lightweight synchronization primitives to avoid deadlocks and overhead: + +```c +// GOOD: Simple mutex with minimal overhead +typedef struct { + pthread_mutex_t lock; + int counter; +} thread_safe_counter_t; + +int init_counter(thread_safe_counter_t* c) { + // Use default attributes (lightest weight) + pthread_mutex_init(&c->lock, NULL); + c->counter = 0; + return 0; +} + +void increment_counter(thread_safe_counter_t* c) { + pthread_mutex_lock(&c->lock); + c->counter++; + pthread_mutex_unlock(&c->lock); +} + +void cleanup_counter(thread_safe_counter_t* c) { + pthread_mutex_destroy(&c->lock); +} + +// GOOD: Use atomic operations when possible (no locks needed) +#include + +typedef struct { + atomic_int counter; // Lock-free! +} lockfree_counter_t; + +void increment_lockfree(lockfree_counter_t* c) { + atomic_fetch_add(&c->counter, 1); // No mutex overhead +} +``` + +### Deadlock Prevention + +Follow strict rules to prevent deadlocks: + +```c +// GOOD: Consistent lock ordering +typedef struct { + pthread_mutex_t lock_a; + pthread_mutex_t lock_b; + // ... data ... +} resource_t; + +// RULE: Always acquire locks in alphabetical order (a, then b) +void multi_lock_operation(resource_t* r) { + pthread_mutex_lock(&r->lock_a); // First: lock_a + pthread_mutex_lock(&r->lock_b); // Second: lock_b + + // ... critical section ... + + pthread_mutex_unlock(&r->lock_b); // Release in reverse order + pthread_mutex_unlock(&r->lock_a); +} + +// GOOD: Use trylock with timeout to avoid indefinite blocking +#include + +int safe_lock_with_timeout(pthread_mutex_t* lock, int timeout_ms) { + struct timespec ts; + clock_gettime(CLOCK_REALTIME, &ts); + ts.tv_sec += timeout_ms / 1000; + ts.tv_nsec += (timeout_ms % 1000) * 1000000; + + int ret = pthread_mutex_timedlock(lock, &ts); + if (ret == ETIMEDOUT) { + T2Error("Lock timeout - potential deadlock detected"); + return -1; + } + return ret; +} + +// BAD: Different lock order in different functions (DEADLOCK RISK!) +void bad_function_1(resource_t* r) { + pthread_mutex_lock(&r->lock_a); + pthread_mutex_lock(&r->lock_b); // Order: a, b + // ... +} + +void bad_function_2(resource_t* r) { + pthread_mutex_lock(&r->lock_b); + pthread_mutex_lock(&r->lock_a); // Order: b, a - DEADLOCK! + // ... +} +``` + +### Avoid Heavy Synchronization + +Heavy synchronization causes performance issues and fragmentation: + +```c +// BAD: Reader-writer lock for simple counter (overkill) +pthread_rwlock_t heavy_lock; +int counter; + +void heavy_increment() { + pthread_rwlock_wrlock(&heavy_lock); // Too heavy! + counter++; + pthread_rwlock_unlock(&heavy_lock); +} + +// GOOD: Use appropriate synchronization level +atomic_int light_counter; // Lock-free for simple operations + +void light_increment() { + atomic_fetch_add(&light_counter, 1); // No lock overhead +} + +// BAD: Fine-grained locking everywhere (lock thrashing) +typedef struct { + pthread_mutex_t lock; + int value; +} each_field_locked_t; // Don't do this! + +// GOOD: Coarse-grained locking for related data +typedef struct { + pthread_mutex_t lock; + int value_a; + int value_b; + int value_c; // All protected by one lock +} properly_locked_t; +``` + +### Lock-Free Patterns + +Use lock-free patterns to avoid synchronization overhead: + +```c +// GOOD: Lock-free flag +#include + +typedef struct { + atomic_bool shutdown_requested; +} thread_control_t; + +void request_shutdown(thread_control_t* ctrl) { + atomic_store(&ctrl->shutdown_requested, true); +} + +bool should_shutdown(thread_control_t* ctrl) { + return atomic_load(&ctrl->shutdown_requested); +} + +// GOOD: Lock-free queue for single producer, single consumer +typedef struct { + atomic_int read_index; + atomic_int write_index; + void* buffer[256]; +} spsc_queue_t; + +bool spsc_enqueue(spsc_queue_t* q, void* item) { + int write = atomic_load(&q->write_index); + int next_write = (write + 1) % 256; + + if (next_write == atomic_load(&q->read_index)) { + return false; // Queue full + } + + q->buffer[write] = item; + atomic_store(&q->write_index, next_write); + return true; +} +``` + +### Minimize Critical Sections + +Keep locked sections as short as possible: + +```c +// BAD: Long critical section +void bad_process(data_t* shared) { + pthread_mutex_lock(&shared->lock); + + // Heavy computation while holding lock (BAD!) + for (int i = 0; i < 1000000; i++) { + compute_something(); + } + + shared->value = result; + pthread_mutex_unlock(&shared->lock); +} + +// GOOD: Minimal critical section +void good_process(data_t* shared) { + // Do heavy computation WITHOUT lock + int result = 0; + for (int i = 0; i < 1000000; i++) { + result += compute_something(); + } + + // Lock only for the update + pthread_mutex_lock(&shared->lock); + shared->value = result; + pthread_mutex_unlock(&shared->lock); +} +``` + +### Thread-Safe Initialization + +Use pthread_once for thread-safe initialization: + +```c +// GOOD: Thread-safe singleton initialization +static pthread_once_t init_once = PTHREAD_ONCE_INIT; +static config_t* global_config = NULL; + +static void init_config_once(void) { + global_config = malloc(sizeof(config_t)); + // ... initialize config ... +} + +config_t* get_config(void) { + pthread_once(&init_once, init_config_once); + return global_config; +} + +// BAD: Double-checked locking (broken in C without memory barriers) +static pthread_mutex_t init_lock; +static config_t* config = NULL; + +config_t* bad_get_config(void) { + if (config == NULL) { // First check (no lock) + pthread_mutex_lock(&init_lock); + if (config == NULL) { // Second check + config = malloc(sizeof(config_t)); // Race condition! + } + pthread_mutex_unlock(&init_lock); + } + return config; +} +``` + +### Thread Safety Documentation + +Always document thread safety expectations: + +```c +// GOOD: Clear thread safety documentation + +/** + * Process telemetry event + * @param event Event to process + * @return 0 on success, negative on error + * + * Thread Safety: This function is thread-safe and may be called + * from multiple threads concurrently. + */ +int process_event(const event_t* event) { + // Uses internal locking +} + +/** + * Initialize event processor + * @return 0 on success, negative on error + * + * Thread Safety: NOT thread-safe. Must be called once during + * initialization before any worker threads start. + */ +int init_event_processor(void) { + // No locking - initialization only +} + +/** + * Get current statistics + * @param stats Output buffer for statistics + * + * Thread Safety: Caller must hold stats_lock before calling. + * Use get_stats_safe() for automatic locking. + */ +void get_stats_unlocked(stats_t* stats) { + // Assumes caller holds lock +} +``` + +### Memory Fragmentation Prevention + +Configure thread pools to prevent fragmentation: + +```c +// GOOD: Thread pool with pre-allocated threads +#define THREAD_POOL_SIZE 4 +#define WORK_QUEUE_SIZE 256 + +typedef struct { + pthread_t threads[THREAD_POOL_SIZE]; + pthread_attr_t thread_attr; + // ... work queue ... +} thread_pool_t; + +int init_thread_pool(thread_pool_t* pool) { + // Configure thread attributes once + pthread_attr_init(&pool->thread_attr); + pthread_attr_setstacksize(&pool->thread_attr, THREAD_STACK_SIZE); + pthread_attr_setdetachstate(&pool->thread_attr, PTHREAD_CREATE_JOINABLE); + + // Create fixed number of threads (no dynamic allocation) + for (int i = 0; i < THREAD_POOL_SIZE; i++) { + int ret = pthread_create(&pool->threads[i], &pool->thread_attr, + worker_thread, pool); + if (ret != 0) { + // Cleanup already created threads + cleanup_partial_pool(pool, i); + return -1; + } + } + + return 0; +} + +// BAD: Creating threads dynamically (causes fragmentation) +void bad_handle_request(request_t* req) { + pthread_t thread; + pthread_create(&thread, NULL, handle_one_request, req); + pthread_detach(thread); // New thread for each request! +} +``` + +### Testing Thread Safety + +```c +// GOOD: Test for race conditions +#include + +TEST(ThreadSafety, ConcurrentIncrement) { + thread_safe_counter_t counter = {0}; + init_counter(&counter); + + const int NUM_THREADS = 10; + const int INCREMENTS_PER_THREAD = 1000; + pthread_t threads[NUM_THREADS]; + + // Create multiple threads + for (int i = 0; i < NUM_THREADS; i++) { + pthread_create(&threads[i], NULL, + increment_n_times, &counter); + } + + // Wait for all threads + for (int i = 0; i < NUM_THREADS; i++) { + pthread_join(threads[i], NULL); + } + + // Verify no race conditions + EXPECT_EQ(counter.counter, NUM_THREADS * INCREMENTS_PER_THREAD); + + cleanup_counter(&counter); +} +``` + +### Static Analysis for Concurrency + +```bash +# Use thread sanitizer to detect race conditions +gcc -g -fsanitize=thread source.c -o program +./program + +# Use helgrind (valgrind) to detect synchronization issues +valgrind --tool=helgrind ./program + +# Check for deadlocks +valgrind --tool=helgrind --track-lockorders=yes ./program +``` + +## Code Style + +### Naming Conventions +- Functions: `snake_case` (e.g., `init_telemetry`) +- Types: `snake_case_t` (e.g., `telemetry_event_t`) +- Macros/Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_BUFFER_SIZE`) +- Global variables: `g_` prefix (avoid when possible) +- Static variables: `s_` prefix + +### File Organization +- One .c file per module +- Corresponding .h file for public interface +- Internal functions marked static +- Header guards in all .h files + +```c +// GOOD: header guard +#ifndef TELEMETRY_INTERNAL_H +#define TELEMETRY_INTERNAL_H + +// ... declarations ... + +#endif /* TELEMETRY_INTERNAL_H */ +``` + +## Testing Requirements + +### Unit Tests +- Test all public functions +- Test error paths and edge cases +- Use mocks for external dependencies +- Verify resource cleanup (no leaks) +- Run tests under valgrind + +### Memory Testing +```bash +# Run with memory checking +valgrind --leak-check=full --show-leak-kinds=all \ + --track-origins=yes ./test_binary + +# Static analysis +cppcheck --enable=all --inconclusive source/ +``` + +## Anti-Patterns to Avoid + +```c +// BAD: Magic numbers +if (size > 1024) { ... } + +// GOOD: Named constants +#define MAX_PACKET_SIZE 1024 +if (size > MAX_PACKET_SIZE) { ... } + +// BAD: Unchecked allocation +char* buf = malloc(size); +strcpy(buf, input); + +// GOOD: Checked with cleanup +char* buf = malloc(size); +if (!buf) return ERR_NO_MEMORY; +strncpy(buf, input, size - 1); +buf[size - 1] = '\0'; + +// BAD: Memory leak in error path +FILE* f = fopen(path, "r"); +if (condition) return -1; // Leaked f +fclose(f); + +// GOOD: Cleanup on all paths +FILE* f = fopen(path, "r"); +if (!f) return -1; +if (condition) { + fclose(f); + return -1; +} +fclose(f); +return 0; +``` + +## References + +- Project follows RDK coding standards +- See telemetry2_0.h for API documentation +- Review existing code in source/ for patterns +- Check test/ directory for testing examples diff --git a/.github/instructions/cpp-testing.instructions.md b/.github/instructions/cpp-testing.instructions.md new file mode 100644 index 00000000..a9ba5d95 --- /dev/null +++ b/.github/instructions/cpp-testing.instructions.md @@ -0,0 +1,183 @@ +--- +applyTo: "source/test/**/*.cpp,source/test/**/*.h" +--- + +# C++ Testing Standards (Google Test) + +## Test Framework + +Use Google Test (gtest) and Google Mock (gmock) for all C++ test code. + +## Test Organization + +### File Structure +- One test file per source file: `foo.c` → `test/FooTest.cpp` +- Test fixtures for complex setups +- Mocks in separate files when reusable + +```cpp +// GOOD: Test file structure +// filepath: source/test/utils/UtilsTest.cpp + +extern "C" { +#include +#include +} + +#include +#include + +class UtilsTest : public ::testing::Test { +protected: + void SetUp() override { + // Initialize test resources + } + + void TearDown() override { + // Clean up test resources + } +}; + +TEST_F(UtilsTest, VectorCreateAndDestroy) { + Vector* vec = Vector_Create(); + ASSERT_NE(nullptr, vec); + + Vector_Destroy(vec, nullptr); + // No memory leaks! +} +``` + +## Testing Patterns + +### Test C Code from C++ +- Wrap C headers in `extern "C"` blocks +- Use RAII in tests for automatic cleanup +- Mock C functions using gmock when needed + +```cpp +extern "C" { +#include "telemetry2_0.h" +} + +#include + +class TelemetryTest : public ::testing::Test { +protected: + void SetUp() override { + // Initialize telemetry + t2_init("test_component"); + } + + void TearDown() override { + // Clean up + t2_uninit(); + } +}; + +TEST_F(TelemetryTest, EventSubmission) { + T2ERROR result = t2_event_s("TEST_EVENT", "test_value"); + EXPECT_EQ(T2ERROR_SUCCESS, result); +} +``` + +### Memory Leak Testing +- All tests must pass valgrind +- Use RAII wrappers for C resources +- Verify cleanup in TearDown + +```cpp +// GOOD: RAII wrapper for C resource +class FileHandle { + FILE* file_; +public: + explicit FileHandle(const char* path, const char* mode) + : file_(fopen(path, mode)) {} + + ~FileHandle() { + if (file_) fclose(file_); + } + + FILE* get() const { return file_; } + bool valid() const { return file_ != nullptr; } +}; + +TEST(FileTest, ReadConfig) { + FileHandle file("/tmp/config.json", "r"); + ASSERT_TRUE(file.valid()); + // file automatically closed when test exits +} +``` + +### Mocking External Dependencies + +```cpp +// GOOD: Mock for system calls +class MockScheduler { +public: + MOCK_METHOD(T2ERROR, registerProfile, + (const char*, unsigned int, unsigned int, + bool, bool, bool, unsigned int, char*)); + MOCK_METHOD(T2ERROR, unregisterProfile, (const char*)); +}; + +TEST(SchedulerTest, ProfileRegistration) { + MockScheduler mock; + + EXPECT_CALL(mock, registerProfile( + "profile1", 300, 0, false, true, false, 0, nullptr)) + .WillOnce(testing::Return(T2ERROR_SUCCESS)); + + T2ERROR result = mock.registerProfile( + "profile1", 300, 0, false, true, false, 0, nullptr); + + EXPECT_EQ(T2ERROR_SUCCESS, result); +} +``` + +## Test Quality Standards + +### Coverage Requirements +- All public functions must have tests +- Test both success and failure paths +- Test boundary conditions +- Test error handling + +### Test Naming +```cpp +// Pattern: TEST(ComponentName, BehaviorBeingTested) + +TEST(Vector, CreateReturnsNonNull) { ... } +TEST(Vector, DestroyHandlesNull) { ... } +TEST(Vector, PushIncrementsSize) { ... } +TEST(Utils, ParseConfigInvalidJson) { ... } +``` + +### Assertions +- Use `ASSERT_*` when test can't continue after failure +- Use `EXPECT_*` when subsequent checks are still valuable +- Provide helpful failure messages + +```cpp +// GOOD: Informative assertions +ASSERT_NE(nullptr, ptr) << "Failed to allocate " << size << " bytes"; +EXPECT_EQ(expected, actual) << "Mismatch at index " << i; +EXPECT_TRUE(condition) << "Context: " << debug_info; +``` + +## Running Tests + +### Build Tests +```bash +./configure --enable-gtest +make check +``` + +### Memory Checking +```bash +valgrind --leak-check=full --show-leak-kinds=all \ + ./source/test/telemetry_gtest_report +``` + +### Test Output +- Use `GTEST_OUTPUT=xml:results.xml` for CI integration +- Check return code: 0 = all passed diff --git a/.github/instructions/shell-scripts.instructions.md b/.github/instructions/shell-scripts.instructions.md new file mode 100644 index 00000000..a25a2c69 --- /dev/null +++ b/.github/instructions/shell-scripts.instructions.md @@ -0,0 +1,179 @@ +--- +applyTo: "**/*.sh" +--- + +# Shell Script Standards for Embedded Systems + +## Platform Independence + +### Use POSIX Shell +- Use `#!/bin/sh` not `#!/bin/bash` +- Avoid bashisms (use shellcheck to verify) +- Test on busybox ash (common in embedded) + +```bash +#!/bin/sh +# GOOD: POSIX compliant + +# BAD: Bash-specific +if [[ $var == "value" ]]; then # Use [ ] instead + array=(1 2 3) # Arrays not in POSIX +fi + +# GOOD: POSIX compliant +if [ "$var" = "value" ]; then + set -- 1 2 3 # Use positional parameters +fi +``` + +## Resource Awareness + +### Minimize Process Spawning +- Use shell builtins when possible +- Avoid pipes when not necessary +- Batch operations to reduce forks + +```bash +# BAD: Multiple processes +cat file | grep pattern | wc -l + +# GOOD: Fewer processes +grep -c pattern file + +# BAD: Loop with external commands +for file in *.txt; do + cat "$file" >> output +done + +# GOOD: Single cat invocation +cat *.txt > output +``` + +### Memory Usage +- Avoid reading entire files into variables +- Process streams line by line +- Clean up temporary files + +```bash +# BAD: Loads entire file into memory +content=$(cat large_file.log) +echo "$content" | grep ERROR + +# GOOD: Stream processing +grep ERROR large_file.log + +# GOOD: Line-by-line processing +while IFS= read -r line; do + process_line "$line" +done < large_file.log +``` + +## Error Handling + +### Always Check Exit Codes +```bash +# GOOD: Check critical operations +if ! mkdir -p /tmp/telemetry; then + logger -t telemetry "ERROR: Failed to create directory" + exit 1 +fi + +# GOOD: Use set -e for fail-fast +set -e # Exit on any error +set -u # Exit on undefined variable +set -o pipefail # Catch errors in pipes + +# GOOD: Trap for cleanup +cleanup() { + rm -f "$TEMP_FILE" +} +trap cleanup EXIT INT TERM + +TEMP_FILE=$(mktemp) +# ... use temp file ... +# cleanup happens automatically +``` + +## Script Quality + +### Defensive Programming +```bash +# GOOD: Quote all variables +rm -f "$file_path" # Not: rm -f $file_path + +# GOOD: Use -- to separate options from arguments +grep -r -- "$pattern" "$directory" + +# GOOD: Check variable is set +: "${CONFIG_FILE:?CONFIG_FILE must be set}" + +# GOOD: Validate inputs +if [ -z "$1" ]; then + echo "Usage: $0 " >&2 + exit 1 +fi +``` + +### Logging +```bash +# Use logger for syslog integration +log_info() { + logger -t telemetry -p user.info "$*" +} + +log_error() { + logger -t telemetry -p user.error "$*" + echo "ERROR: $*" >&2 +} + +# Usage +log_info "Starting telemetry collection" +if ! start_service; then + log_error "Failed to start service" + exit 1 +fi +``` + +## Testing Scripts + +### Use shellcheck +```bash +# Run shellcheck on all scripts +shellcheck script.sh + +# In CI +find . -name "*.sh" -exec shellcheck {} + +``` + +### Test on Target Platform +- Test on actual embedded device or emulator +- Verify with busybox tools +- Check resource usage (memory, CPU) + +## Anti-Patterns + +```bash +# BAD: Unquoted variables +for file in $FILES; do # Word splitting! + +# GOOD: Quoted +for file in "$FILES"; do + +# BAD: Parsing ls output +for file in $(ls *.txt); do + +# GOOD: Use glob +for file in *.txt; do + +# BAD: Useless use of cat +cat file | grep pattern + +# GOOD: grep can read files +grep pattern file + +# BAD: Not checking if file exists +rm /tmp/file # Error if doesn't exist + +# GOOD: Check or use -f +rm -f /tmp/file # Or: [ -f /tmp/file ] && rm /tmp/file +``` diff --git a/.github/skills/code-review/README.md b/.github/skills/code-review/README.md new file mode 100644 index 00000000..ec6955f4 --- /dev/null +++ b/.github/skills/code-review/README.md @@ -0,0 +1,181 @@ +# Code Review Skill + +Comprehensive code review assistant for embedded C/C++ pull requests in the Telemetry 2.0 framework. + +## Quick Start + +``` +@workspace /code-review https://github.com/rdkcentral/telemetry/pull/123 +``` + +or + +``` +@workspace /code-review #123 +``` + +## What It Does + +The code review skill analyzes GitHub pull requests and generates a comprehensive `REVIEW.md` file that includes: + +- **Coverity defect integration** - Extracts and prioritizes static analysis findings from PR comments +- **Visual diff summary** with Mermaid diagrams +- **Module-by-module impact analysis** +- **Memory safety assessment** (leaks, use-after-free, buffer overflows) +- **Thread safety analysis** (race conditions, deadlocks) +- **Regression risk scoring** (LOW, MEDIUM, HIGH, CRITICAL) +- **Actionable recommendations** with line references +- **Testing coverage gaps** identification +- **Cross-cutting concerns** (build, config, docs) + +## Features + +### Coverity Integration +Automatically detects and incorporates Coverity static analysis results: +- Scans PR comments from **rdkcmf-jenkins** user +- Identifies comments with **"Coverity Issue"** title prefix +- Extracts defect type, severity, file/line locations +- Integrates findings into risk assessment and recommendations +- Prioritizes HIGH severity defects as blocking issues + +## Output Structure + +``` +reviews/PR--REVIEW.md +``` + +The generated review includes: +1. **Executive Summary** - 2-3 sentence overview with risk level +2. **Coverity Analysis** - Static analysis defects from PR comments (if present) +3. **Change Visualization** - Mermaid diagram with change indicators +4. **Module Analysis** - Memory/thread safety per module +5. **Regression Risks** - Specific concerns with file:line references +6. **Recommendations** - Prioritized action items +7. **Checklist** - Verification tasks before merge + +## Example Output + +```markdown +# Code Review: Add profile reload support + +## Overview +- PR: #123 +- Files Changed: 8 files, +245/-112 lines +- Risk Level: MEDIUM ⚠️ + +## Executive Summary +Adds dynamic profile reload capability via rbus method. Changes touch scheduler +and profile management with new mutex. Memory safety looks good but potential +race condition in reload path needs attention. + +## Changes by Module + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> test["📁 test"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + + bulkdata --> profile["📄 profile.c (+85/-42) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+12/-5)"] + + scheduler --> schedulerc["📄 scheduler.c (+95/-48) 🔴"] + + test --> profiletest["📄 profile_test.cpp (+53/-17) ✅"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class profiletest safe + class reportprof neutral +``` + +## Detailed Analysis + +### Bulk Data Collection Module + +#### Files Modified +- [source/bulkdata/profile.c](source/bulkdata/profile.c#L145-L230) (+85/-42) +- [source/bulkdata/reportprofiles.c](source/bulkdata/reportprofiles.c#L88-L100) + +#### Key Changes +- New `reloadProfile()` function to update profile configuration +- Added mutex `g_profile_reload_mutex` for protection during reload +- Profile state validation before applying changes + +#### Impact Assessment + +**Memory Safety**: ✅ GOOD +- All malloc calls have NULL checks +- Error paths properly free resources +- Old profile data freed before replacing + +**Thread Safety**: ⚠️ CONCERN +- Line 167: `reloadProfile()` acquires `g_profile_reload_mutex` but also + calls `updateScheduler()` which acquires `g_scheduler_mutex` +- Potential deadlock if scheduler tries to access profile during reload +- **Recommendation**: Document lock ordering or refactor to single lock + +**API Compatibility**: ✅ MAINTAINED +- No changes to public function signatures +- New function is additive + +**Error Handling**: ✅ GOOD +- Schema validation before applying profile +- Rollback on partial failure +- Appropriate error logging + +#### Regression Risks +⚠️ **Race condition in profile access** ([profile.c:167](source/bulkdata/profile.c#L167)) + - TimeoutThread may access profile fields while reload is updating them + - Recommend: Hold profile mutex during entire reload operation + +... +``` + +## Integration with Other Skills + +The code review skill complements: +- **quality-checker**: Run static analysis and tests after review +- **memory-safety-analyzer**: Deep dive on specific memory issues +- **thread-safety-analyzer**: Detailed concurrency analysis +- **platform-portability-checker**: Cross-platform validation + +## Tips + +1. **Run early**: Review PRs before extensive commenting to guide discussion +2. **Focus on risk**: Pay special attention to HIGH/CRITICAL risk items +3. **Verify with tools**: Follow up with `/quality-checker` for validation +4. **Iterate**: Re-run as PR evolves to track risk changes +5. **Custom focus**: Add specific concerns like "focus on memory safety" + +## Reference Files + +The skill uses these references for analysis: +- [review-checklist.md](references/review-checklist.md) - Comprehensive review standards +- [memory-patterns.md](references/memory-patterns.md) - Memory safety patterns +- [thread-patterns.md](references/thread-patterns.md) - Thread safety patterns +- [common-pitfalls.md](references/common-pitfalls.md) - Known anti-patterns + +## Limitations + +- Requires GitHub access for remote PRs +- Heuristic-based risk assessment (not perfect) +- Cannot detect all semantic bugs +- Best for C/C++ embedded code +- Manual judgment still required for complex logic + +## Contributing + +To improve the skill: +1. Update [common-pitfalls.md](references/common-pitfalls.md) with new anti-patterns +2. Add project-specific patterns to reference files +3. Refine risk scoring algorithm in SKILL.md +4. Add test cases for skill validation diff --git a/.github/skills/code-review/SKILL.md b/.github/skills/code-review/SKILL.md new file mode 100644 index 00000000..aad14766 --- /dev/null +++ b/.github/skills/code-review/SKILL.md @@ -0,0 +1,403 @@ +--- +name: code-review +description: 'Analyze GitHub PR changes for embedded C/C++ code and generate comprehensive REVIEW.md with visual diffs, impact assessment, and regression risk analysis. Use when: reviewing pull requests, analyzing code changes, identifying functional regressions, assessing memory/thread safety impact, validating embedded systems changes.' +argument-hint: 'PR URL or number, with an optional focus filter (e.g., "#123", "https://github.com/rdkcentral/telemetry/pull/123", or "#123 focus on thread safety"). Supported focus filters: "focus on memory safety", "focus on thread safety", "focus on api compatibility", "focus on error handling".' +--- + +# Code Review for Embedded Systems + +## Purpose + +Generate a comprehensive `REVIEW.md` report for GitHub pull requests that helps senior engineers quickly understand changes, assess impact, and identify potential functional regressions in embedded C/C++ codebases. + +## Usage + +Invoke this skill when: +- Reviewing a pull request before merge +- Analyzing code changes for regression risk +- Understanding the scope and impact of modifications +- Validating memory safety, thread safety, or API compatibility +- Preparing for code review meetings +- Investigating functional issues introduced by recent changes + +**Invocation**: `@workspace /code-review [focus on ]` + +- `` — PR URL (e.g., `https://github.com/rdkcentral/telemetry/pull/123`) or short form (e.g., `#123`) +- `[focus on ]` *(optional)* — restrict analysis depth to one area; accepted values: `thread safety`, `memory safety`, `api compatibility`, `error handling` + +--- + +## Output: REVIEW.md Structure + +The skill generates a markdown report with the following sections: + +```markdown +# Code Review: [PR Title] + +## Overview +- PR: # +- Author: +- Files Changed: X files, +Y/-Z lines +- Risk Level: [LOW | MEDIUM | HIGH | CRITICAL] + +## Executive Summary +[2-3 sentence summary of changes and overall risk assessment] + +## Coverity Static Analysis (if applicable) +[Table of Coverity defects found in PR comments] + +## Changes by Module +[Visual tree showing impacted modules with change indicators] + +## Detailed Analysis + +### [Module 1] +#### Files Modified +- file1.c (+X/-Y) +- file2.h (+X/-Y) + +#### Key Changes +[Bulleted summary of functional changes] + +#### Impact Assessment +- **Memory Safety**: [Analysis] +- **Thread Safety**: [Analysis] +- **API Compatibility**: [Analysis] +- **Error Handling**: [Analysis] + +#### Regression Risks +⚠️ [Specific risks with line references] + +### [Module 2] +... + +## Cross-Cutting Concerns +- Build System Impact +- Configuration Changes +- Test Coverage Gaps +- Documentation Updates + +## Recommendations +1. [Priority action items] +2. [Suggested additional tests] +3. [Areas requiring closer inspection] + +## Checklist +- [ ] Memory leaks verified (valgrind) +- [ ] Thread safety validated +- [ ] API compatibility maintained +- [ ] Error paths tested +- [ ] Unit tests added/updated +- [ ] Integration tests pass +- [ ] Documentation updated +``` + +--- + +## Analysis Process + +### Step 1: Fetch PR Metadata + +Retrieve PR metadata using the GitHub CLI or API: + +```bash +# PR details (title, description, author, reviewers, CI status) +gh pr view --json number,title,body,author,reviewRequests,statusCheckRollup + +# File change list with line counts +gh pr view --json files + +# Existing review comments +gh pr view --json reviews,comments +``` + +If the GitHub CLI is unavailable, retrieve the same information via the GitHub REST API (`GET /repos/{owner}/{repo}/pulls/{pull_number}`, `/files`, `/comments`). + +**Check for Coverity Defects:** + +After fetching PR metadata, scan the comments for Coverity static analysis reports: +- Look for comments from user **"rdkcmf-jenkins"** +- Look for comments with titles starting with **"Coverity Issue"** +- Extract defect information: + - Defect type (e.g., DEADLOCK, RESOURCE_LEAK, ATOMICITY, USE_AFTER_FREE) + - File and line number + - Severity (High, Medium, Low) + - Checker description + +If Coverity defects are found: +1. Add a **"Coverity Analysis"** section to the REVIEW.md +2. List each defect with location and severity +3. Cross-reference these defects in module analysis +4. Flag defects as **MUST FIX** in recommendations if they are High severity +5. Include in risk assessment (higher safety score if critical defects present) + +Example Coverity section: +```markdown +## Coverity Static Analysis Results + +**Defects Found**: X issue(s) + +| Severity | Type | File:Line | Description | +|----------|------|-----------|-------------| +| HIGH | DEADLOCK | datamodel.c:120 | Lock ordering violation | +| MEDIUM | RESOURCE_LEAK | profile.c:456 | Memory not freed on error path | +``` + +### Step 2: Get PR Diff + +Retrieve the complete diff for analysis using standard tools available in the review environment: +- For a local checkout, use `git diff` against the target branch (for example, `git diff origin/main...HEAD`) +- For a GitHub pull request, use `gh pr diff ` +- If neither is available, obtain the unified diff from the PR page and review the changed files directly +- Parse diff hunks to identify: + - Added lines (new functionality) + - Removed lines (deleted code) + - Modified lines (behavior changes) + +### Step 3: Categorize Changes by Module + +Map changed files to architectural modules: + +| Pattern | Module | +|---------|--------| +| `source/bulkdata/*` | Bulk Data Collection | +| `source/scheduler/*` | Scheduling Engine | +| `source/reportgen/*` | Report Generation | +| `source/ccspinterface/*` | Bus Interface (CCSP/rbus) | +| `source/protocol/*` | Transport (HTTP/rbus) | +| `source/t2parser/*` | Profile Parser | +| `source/dcautil/*` | DCA Utilities | +| `source/utils/*` | Common Utilities | +| `source/privacycontrol/*` | Privacy Control | +| `source/test/*` | Test Infrastructure | +| `*.am`, `*.ac` | Build System | +| `config/*` | Configuration | +| `schemas/*` | JSON Schemas | + +### Step 4: Analyze Each Changed File + +For each file, apply domain-specific analysis using [reference checklists](./references/review-checklist.md): + +#### C Source Files (*.c) +1. **Memory Safety** (reference: [memory-patterns.md](./references/memory-patterns.md)) + - New allocations → verify corresponding free + - Pointer assignments → check NULL before dereference + - String operations → bounds checking + - Error paths → resource cleanup + +2. **Thread Safety** (reference: [thread-patterns.md](./references/thread-patterns.md)) + - Shared data access → mutex protection + - Lock acquisitions → deadlock potential + - Condition variables → proper usage pattern + - Thread creation → stack size specified + +3. **Error Handling** + - Return values checked + - Error codes meaningful + - Logging sufficient for debugging + - Failure modes handled + +4. **Resource Constraints** + - Stack vs heap allocation + - Memory footprint impact + - CPU impact (loops, algorithms) + +#### Header Files (*.h) +1. API changes → backward compatibility +2. Struct modifications → ABI compatibility +3. New functions → documentation complete +4. Constants/enums → semantic correctness + +#### Build Files (*.am, *.ac) +1. New dependencies → justified and minimal +2. Compiler flags → appropriate for embedded +3. Link order → correct +4. Conditional compilation → platform coverage + +#### Test Files (*.cpp, test/*) +1. Test coverage → adequate for changes +2. Mock usage → appropriate +3. Edge cases covered +4. Negative tests included + +### Step 5: Assess Regression Risk + +The risk level is determined **heuristically** based on the factors below. The weights are informational guidance for where to focus attention — there is no strict arithmetic formula. After evaluating each factor, assign the overall risk level that best reflects the combined picture. + +| Factor | Weight | Indicators | +|--------|--------|-----------| +| **Scope** | 30% | # files, # modules, LOC changed | +| **Criticality** | 25% | Core logic vs peripheral, production path | +| **Complexity** | 20% | Control flow changes, algorithm modifications | +| **Safety** | 15% | Memory/thread safety issues identified | +| **Testing** | 10% | Test coverage, CI status | + +**Risk Levels** (choose the highest level whose criteria are met): +- **LOW**: <10 files, single module, tests added, no safety concerns +- **MEDIUM**: 10-30 files, 2-3 modules, or minor safety concerns +- **HIGH**: >30 files, cross-module, or safety issues present +- **CRITICAL**: Core scheduler/bus/report logic, no tests, or confirmed safety issues + +> **Note:** When factors point to different levels, escalate to the higher level. The weights indicate relative importance, not a numeric formula — a single confirmed memory-safety issue can elevate an otherwise LOW-scope change to HIGH. + +### Step 6: Generate Visual Diff Summary + +Create Mermaid diagram showing changes: + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> test["📁 test"] + root --> makefile["📄 Makefile.am (+2/-1)"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + source --> ccsp["📁 ccspinterface"] + + bulkdata --> profile["📄 profile.c (+45/-12) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+8/-3)"] + + scheduler --> schedulerc["📄 scheduler.c (+120/-80) 🔴"] + + ccsp --> rbus["📄 rbusInterface.c (+5/-2)"] + + test --> schedtest["📄 scheduler_test.cpp (+95/-0) ✅"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class schedtest safe + class reportprof,rbus,makefile neutral +``` + +**Legend:** +- 🔴 High regression risk (red) +- ⚠️ Safety concern flagged (yellow) +- ✅ Test coverage added (green) +- Neutral changes (gray) + +### Step 7: Cross-Reference with Project Context + +Load project-specific context: +1. [Review checklist](./references/review-checklist.md) - Project standards +2. [Common pitfalls](./references/common-pitfalls.md) - Known anti-patterns +3. Architecture docs (`docs/architecture/overview.md`) +4. Build instructions (`.github/instructions/*.instructions.md`) + +Check against: +- Coding standards (naming, style) +- Project architecture principles +- Known anti-patterns for this codebase +- Historical issues (if session memory available) + +### Step 8: Generate Recommendations + +Prioritize action items: + +1. **MUST FIX** (blocking issues) + - Memory leaks + - Race conditions + - API/ABI breakage + - Missing critical error handling + +2. **SHOULD FIX** (before merge) + - Test coverage gaps + - Missing documentation + - Non-optimal patterns + - Minor safety concerns + +3. **CONSIDER** (future improvements) + - Refactoring opportunities + - Performance optimizations + - Code duplication + +--- + +## Example Invocations + +### Review a PR by URL +``` +@workspace /code-review https://github.com/rdkcentral/telemetry/pull/42 +``` + +### Review a PR by number (assumes current repo) +``` +@workspace /code-review #42 +``` + +### Review with an optional focus filter + +Append `focus on ` to restrict the depth of analysis to one concern. +Accepted focus values: `thread safety`, `memory safety`, `api compatibility`, `error handling`. + +``` +@workspace /code-review #42 focus on thread safety +``` + +``` +@workspace /code-review #42 focus on memory safety +``` + +**Note**: The skill generates Mermaid diagrams for change visualization. GitHub, VS Code, and most modern markdown viewers render these automatically. + +--- + +## Quality Checks Integration + +After generating the REVIEW.md, suggest running quality checks: + +```bash +# Use the quality-checker skill for comprehensive validation +@workspace /quality-checker +``` + +This will run: +- Static analysis (cppcheck) +- Memory safety (valgrind) +- Thread safety (helgrind) +- Build verification +- Unit tests + +--- + +## Output Location + +The REVIEW.md file is generated in: +- **Active PR**: `reviews/PR--REVIEW.md` +- **Quick review**: `REVIEW.md` (workspace root) + +The active PR report is written under `reviews/`, which is git-ignored by default. The quick review output uses `REVIEW.md` at the workspace root; if you do not want to commit that file, add it to `.gitignore` or remove it after review. Any timestamp is recorded inside the report content rather than in the filename. + +--- + +## Limitations + +- Requires GitHub access for remote PRs +- Complex logic changes need manual inspection +- Cannot detect all semantic bugs +- Risk assessment is heuristic-based +- Integration test impact requires human judgment + +--- + +## Tips for Best Results + +1. **Provide context**: If the PR fixes a specific issue, mention it +2. **Focus areas**: Specify if you want deep-dive on specific aspects +3. **Compare branches**: For local changes, ensure proper git state +4. **Supplement with testing**: Use `/quality-checker` for validation +5. **Review iteratively**: Run skill multiple times as PR evolves + +--- + +## Related Skills + +- **quality-checker**: Run comprehensive quality checks +- **memory-safety-analyzer**: Deep dive on memory issues +- **thread-safety-analyzer**: Deep dive on concurrency +- **platform-portability-checker**: Validate cross-platform code diff --git a/.github/skills/code-review/references/common-pitfalls.md b/.github/skills/code-review/references/common-pitfalls.md new file mode 100644 index 00000000..858096a6 --- /dev/null +++ b/.github/skills/code-review/references/common-pitfalls.md @@ -0,0 +1,432 @@ +# Common Pitfalls in Telemetry 2.0 Codebase + +This document captures recurring anti-patterns and common mistakes found in code reviews for the Telemetry 2.0 embedded framework. + +## Memory Management + +### Pitfall 1: JSON String Leaks (cJSON) +```c +// ❌ WRONG: cJSON_Print allocates memory +cJSON* json = cJSON_CreateObject(); +cJSON_AddStringToObject(json, "key", "value"); +char* json_str = cJSON_Print(json); // Allocates +sendReport(json_str); +cJSON_Delete(json); // ⚠️ Forgot to free json_str - LEAK! + +// ✅ CORRECT: +char* json_str = cJSON_Print(json); +sendReport(json_str); +free(json_str); // Must free +cJSON_Delete(json); +``` + +### Pitfall 2: strdup in Profile Parsing +```c +// ❌ WRONG: strdup without checking/freeing +profile->name = strdup(config_name); +profile->protocol = strdup(config_protocol); +// What if strdup fails? What if reassigned? + +// ✅ CORRECT: +if (profile->name) free(profile->name); +profile->name = strdup(config_name); +if (!profile->name) { + return ERR_NO_MEMORY; +} +``` + +### Pitfall 3: Vector/List Cleanup +```c +// ❌ WRONG: Clearing vector without freeing elements +Vector* markers = profile->markers; +// ... use markers ... +vector_destroy(markers); // ⚠️ Leaked marker objects inside! + +// ✅ CORRECT: +for (int i = 0; i < markers->count; i++) { + Marker* m = (Marker*)vector_at(markers, i); + free_marker(m); +} +vector_destroy(markers); +``` + +## Thread Safety + +### Pitfall 4: Profile State Race +```c +// ❌ WRONG: Reading profile state without lock +if (profile->state == PROFILE_ENABLED) { + collectAndReport(profile); // Race: state can change +} + +// ✅ CORRECT: +pthread_mutex_lock(&profile->mutex); +bool enabled = (profile->state == PROFILE_ENABLED); +pthread_mutex_unlock(&profile->mutex); + +if (enabled) { + collectAndReport(profile); +} +``` + +### Pitfall 5: rbus Callback Thread Safety +```c +// ❌ WRONG: Modifying global state in rbus callback +rbusError_t eventReceiveHandler(rbusHandle_t handle, rbusEvent_t const* event) { + g_event_count++; // Race condition! + processEvent(event); + return RBUS_ERROR_SUCCESS; +} + +// ✅ CORRECT: +rbusError_t eventReceiveHandler(rbusHandle_t handle, rbusEvent_t const* event) { + pthread_mutex_lock(&g_event_mutex); + g_event_count++; + pthread_mutex_unlock(&g_event_mutex); + + processEvent(event); + return RBUS_ERROR_SUCCESS; +} +``` + +### Pitfall 6: TimeoutThread vs CollectAndReport Deadlock +```c +// ❌ WRONG: TimeoutThread holds lock while signaling CollectAndReport +void timeoutThread(Profile* profile) { + pthread_mutex_lock(&profile->mutex); + // Generate timeout event + pthread_cond_signal(&profile->event_cond); + // CollectAndReport wakes, tries to lock same mutex - DEADLOCK! + pthread_mutex_unlock(&profile->mutex); +} + +// ✅ CORRECT: Signal after releasing lock +void timeoutThread(Profile* profile) { + pthread_mutex_lock(&profile->mutex); + profile->timeout_occurred = true; + pthread_mutex_unlock(&profile->mutex); // Release first + pthread_cond_signal(&profile->event_cond); // Then signal +} +``` + +## Error Handling + +### Pitfall 7: Silent rbus Failures +```c +// ❌ WRONG: Not checking rbus return values +rbusError_t ret = rbus_open(&handle, "T2"); +rbusMethod_Register(handle, method); // Proceeds even if rbus_open failed! + +// ✅ CORRECT: +rbusError_t ret = rbus_open(&handle, "T2"); +if (ret != RBUS_ERROR_SUCCESS) { + T2Error("rbus_open failed: %d\n", ret); + return ERR_RBUS_INIT; +} +``` + +### Pitfall 8: Ignoring snprintf Return Value +```c +// ❌ WRONG: snprintf truncation not checked +char buffer[64]; +snprintf(buffer, sizeof(buffer), "%s_%s_%ld", prefix, name, timestamp); +// What if truncated? Report will be malformed. + +// ✅ CORRECT: +char buffer[64]; +int written = snprintf(buffer, sizeof(buffer), "%s_%s_%ld", prefix, name, timestamp); +if (written >= sizeof(buffer)) { + T2Warning("Buffer truncated: needed %d bytes\n", written); + // Handle truncation +} +``` + +## Configuration & Schema + +### Pitfall 9: Missing Schema Validation +```c +// ❌ WRONG: Trusting JSON without validation +cJSON* interval = cJSON_GetObjectItem(profile_json, "ReportingInterval"); +int interval_sec = interval->valueint; // Crash if NULL or wrong type! + +// ✅ CORRECT: +cJSON* interval = cJSON_GetObjectItem(profile_json, "ReportingInterval"); +if (!interval || !cJSON_IsNumber(interval)) { + T2Error("Invalid ReportingInterval in profile\n"); + return ERR_INVALID_CONFIG; +} +int interval_sec = interval->valueint; +if (interval_sec < 60 || interval_sec > 86400) { + T2Error("ReportingInterval out of range: %d\n", interval_sec); + return ERR_INVALID_CONFIG; +} +``` + +### Pitfall 10: Profile Name Collisions +```c +// ❌ WRONG: Not checking for duplicate profiles +void addProfile(Profile* new_profile) { + vector_push(g_profiles, new_profile); // Duplicate names possible! +} + +// ✅ CORRECT: +bool addProfile(Profile* new_profile) { + for (int i = 0; i < g_profiles->count; i++) { + Profile* p = vector_at(g_profiles, i); + if (strcmp(p->name, new_profile->name) == 0) { + T2Error("Profile %s already exists\n", new_profile->name); + return false; + } + } + vector_push(g_profiles, new_profile); + return true; +} +``` + +## Scheduler & Timing + +### Pitfall 11: Timing Drift in Periodic Reports +```c +// ❌ WRONG: Accumulating drift +void periodicReport(int interval_sec) { + while (running) { + sleep(interval_sec); // Drift accumulates! + generateReport(); + } +} + +// ✅ CORRECT: Absolute timing +void periodicReport(int interval_sec) { + time_t next_report = time(NULL) + interval_sec; + while (running) { + time_t now = time(NULL); + if (now >= next_report) { + generateReport(); + next_report = now + interval_sec; // Reset to now, not next_report + } + sleep(1); // Short sleep to avoid busy-wait + } +} +``` + +### Pitfall 12: Timeout Calculation Overflow +```c +// ❌ WRONG: Integer overflow on large intervals +int timeout_ms = interval_sec * 1000; // Overflow if interval_sec > 2147483! + +// ✅ CORRECT: +long timeout_ms = (long)interval_sec * 1000L; +if (timeout_ms > INT_MAX) { + T2Error("Interval too large: %d seconds\n", interval_sec); + timeout_ms = INT_MAX; +} +``` + +## rbus/CCSP Interface + +### Pitfall 13: Bus Method Name Typos +```c +// ❌ WRONG: Hardcoded strings prone to typos +rbus_registerMethod(handle, "Device.X_RDKCENTRAL-COM_T2.ReportProfiles"); +// vs +rbus_invokeMethod(handle, "Device.X_RDKCENTRAL_COM_T2.ReportProfiles"); +// Underscore vs hyphen - won't match! + +// ✅ CORRECT: Use constants +#define T2_REPORT_PROFILES_METHOD "Device.X_RDKCENTRAL-COM_T2.ReportProfiles" +rbus_registerMethod(handle, T2_REPORT_PROFILES_METHOD); +rbus_invokeMethod(handle, T2_REPORT_PROFILES_METHOD); +``` + +### Pitfall 14: Bus Handle Lifetime +```c +// ❌ WRONG: Using handle after close +rbusHandle_t handle; +rbus_open(&handle, "T2"); +// ... use handle ... +rbus_close(handle); +rbus_sendData(handle, data); // ⚠️ Handle invalid! + +// ✅ CORRECT: Track handle state +static rbusHandle_t g_handle = NULL; +static bool g_rbus_initialized = false; + +void sendData(const char* data) { + if (!g_rbus_initialized || !g_handle) { + T2Error("rbus not initialized\n"); + return; + } + rbus_sendData(g_handle, data); +} +``` + +## Data Collection + +### Pitfall 15: Missing NULL Checks in TR-181 Fetch +```c +// ❌ WRONG: Assuming parameter always exists +char* value = getParameterValue("Device.DeviceInfo.SerialNumber"); +snprintf(report, sizeof(report), "SN:%s", value); // Crash if NULL! + +// ✅ CORRECT: +char* value = getParameterValue("Device.DeviceInfo.SerialNumber"); +if (value) { + snprintf(report, sizeof(report), "SN:%s", value); + free(value); +} else { + T2Warning("Failed to get SerialNumber\n"); + snprintf(report, sizeof(report), "SN:UNKNOWN"); +} +``` + +### Pitfall 16: Marker Regex Compilation Every Time +```c +// ❌ WRONG: Recompiling regex on every log line +void processLogLine(const char* line) { + regex_t regex; + regcomp(®ex, marker->pattern, REG_EXTENDED); + if (regexec(®ex, line, 0, NULL, 0) == 0) { + marker->count++; + } + regfree(®ex); // ⚠️ Expensive! +} + +// ✅ CORRECT: Compile once, reuse +typedef struct { + char* pattern; + regex_t compiled_regex; + bool regex_valid; +} Marker; + +void initMarker(Marker* m) { + int ret = regcomp(&m->compiled_regex, m->pattern, REG_EXTENDED); + m->regex_valid = (ret == 0); +} + +void processLogLine(Marker* m, const char* line) { + if (m->regex_valid && regexec(&m->compiled_regex, line, 0, NULL, 0) == 0) { + m->count++; + } +} +``` + +## Build & Dependencies + +### Pitfall 17: Missing Feature Detection in configure.ac +```c +// ❌ WRONG: Assuming rbus is always available +#include // Build fails if rbus not installed + +// ✅ CORRECT: Configure script checks +AC_CHECK_LIB([rbus], [rbus_open], + [have_rbus=yes], + [have_rbus=no]) +AM_CONDITIONAL([HAVE_RBUS], [test "x$have_rbus" = "xyes"]) + +// Then in code: +#ifdef HAVE_RBUS +#include +#else +#include "rbus_stub.h" +#endif +``` + +### Pitfall 18: Library Link Order +```c +// ❌ WRONG: Random link order in Makefile.am +telemetry2_0_LDADD = -lrbus -lrt -lpthread -lcjson + +// ✅ CORRECT: Dependencies before dependents +telemetry2_0_LDADD = -lcjson -lrbus -lrt -lpthread +# cjson has no deps, rbus depends on cjson, pthread is base +``` + +## Testing + +### Pitfall 19: Non-Deterministic Tests +```c +// ❌ WRONG: Test depends on timing +TEST(SchedulerTest, TimeoutFires) { + startScheduler(1); // 1 second interval + sleep(1); + EXPECT_EQ(1, getReportCount()); // ⚠️ Flaky: may be 0 or 1 +} + +// ✅ CORRECT: Use mock time or polling +TEST(SchedulerTest, TimeoutFires) { + startScheduler(1); + // Poll with timeout + bool fired = false; + for (int i = 0; i < 20 && !fired; i++) { + usleep(100000); // 100ms + fired = (getReportCount() > 0); + } + EXPECT_TRUE(fired); +} +``` + +### Pitfall 20: Memory Leaks in Tests +```c +// ❌ WRONG: Setup creates objects, teardown doesn't clean +class ProfileTest : public ::testing::Test { +protected: + void SetUp() override { + profile = createProfile("test"); + } + // ⚠️ Missing TearDown - leaked in every test! + Profile* profile; +}; + +// ✅ CORRECT: +class ProfileTest : public ::testing::Test { +protected: + void SetUp() override { + profile = createProfile("test"); + } + void TearDown() override { + destroyProfile(profile); + profile = nullptr; + } + Profile* profile; +}; +``` + +## Platform Portability + +### Pitfall 21: Hardcoded Paths +```c +// ❌ WRONG: Linux-specific path +#define CONFIG_FILE "/etc/telemetry/T2Agent.cfg" + +// ✅ CORRECT: Configurable path +#ifndef T2_CONFIG_DIR +#define T2_CONFIG_DIR "/etc/telemetry" +#endif +#define CONFIG_FILE T2_CONFIG_DIR "/T2Agent.cfg" +``` + +### Pitfall 22: Endianness Assumptions +```c +// ❌ WRONG: Assuming little-endian +uint32_t net_value = *(uint32_t*)buffer; // Wrong on big-endian! + +// ✅ CORRECT: Use network byte order functions +uint32_t net_value = ntohl(*(uint32_t*)buffer); +``` + +## Review Checklist: Avoid These Pitfalls + +When reviewing PRs, specifically check for: +- [ ] cJSON memory management (free cJSON_Print results) +- [ ] Profile state access (always use mutex) +- [ ] rbus return value checking +- [ ] Schema validation before using JSON values +- [ ] Profile name uniqueness enforcement +- [ ] Timing calculations (avoid drift and overflow) +- [ ] Bus method name consistency (use constants) +- [ ] TR-181 parameter NULL checks +- [ ] Regex compilation (do it once, not per line) +- [ ] Feature detection in build system +- [ ] Test determinism (no timing dependencies) +- [ ] Test cleanup (TearDown matches SetUp) +- [ ] Path portability (no hardcoded Linux paths) diff --git a/.github/skills/code-review/references/example-review-template.md b/.github/skills/code-review/references/example-review-template.md new file mode 100644 index 00000000..98bc043d --- /dev/null +++ b/.github/skills/code-review/references/example-review-template.md @@ -0,0 +1,549 @@ +# Code Review: [PR Title Here] + +**Generated**: [Timestamp] +**Reviewer**: GitHub Copilot Code Review Skill +**PR**: #[NUMBER] | [AUTHOR] | [DATE] + +--- + +## Overview + +| Attribute | Value | +|-----------|-------| +| **Files Changed** | X files (+YYY/-ZZZ lines) | +| **Modules Impacted** | Module1, Module2, Module3 | +| **Risk Level** | 🟢 LOW / 🟡 MEDIUM / 🔴 HIGH / ⚫ CRITICAL | +| **CI Status** | ✅ Passing / ❌ Failing / ⏳ Pending | +| **Test Coverage** | New: X%, Overall: Y% | + +--- + +## Executive Summary + +[2-3 sentences describing the core changes and overall assessment] + +**Key Points:** +- Main functional change +- Notable API/behavior modifications +- Primary risk factor + +--- + +## Coverity Static Analysis Results + +**Source**: Comments from rdkcmf-jenkins / "Coverity Issue" comments +**Defects Found**: 3 issue(s) + +| Severity | Type | File:Line | Description | +|----------|------|-----------|-------------| +| 🔴 HIGH | DEADLOCK | [scheduler.c:342](scheduler.c#L342) | Lock ordering violation detected between `tMutex` and `scMutex` | +| 🟡 MEDIUM | RESOURCE_LEAK | [profile.c:456](profile.c#L456) | Memory allocated but not freed on error path | +| 🟢 LOW | CHECKED_RETURN | [utils.c:123](utils.c#L123) | Return value of `pthread_mutex_lock` not checked | + +**Impact on Risk Assessment:** +- HIGH severity defects: +3 to Safety Score +- MEDIUM severity defects: +2 to Safety Score +- Total Coverity impact: +5 points + +**Recommendations:** +- 🔴 HIGH defects are blocking and must be fixed before merge +- 🟡 MEDIUM defects should be addressed or justified +- 🟢 LOW defects can be tracked for future cleanup + +--- + +## Risk Assessment + +| Category | Score | Notes | +|----------|-------|-------| +| **Scope** | [0-10] | X files, Y modules, ZZZ LOC | +| **Criticality** | [0-10] | Core logic / Peripheral feature | +| **Complexity** | [0-10] | Algorithm changes / Simple edits | +| **Safety** | [0-10] | Memory/thread concerns identified | +| **Testing** | [0-10] | Coverage and test quality | +| **TOTAL** | [0-50] | **Risk Level: [LOW/MEDIUM/HIGH/CRITICAL]** | + +**Risk Scoring:** +- LOW (0-10): Minimal impact, well-tested, no safety concerns +- MEDIUM (11-20): Moderate impact, minor concerns, good tests +- HIGH (21-35): Significant impact or safety issues present +- CRITICAL (36-50): Core functionality, severe safety issues, or inadequate tests + +--- + +## Changes by Module + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> sourcetest["📁 source/test"] + root --> config["📁 config"] + root --> makefile["📄 Makefile.am (+2/-1)"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + source --> ccsp["📁 ccspinterface"] + source --> protocol["📁 protocol"] + + bulkdata --> profile["📄 profile.c (+85/-42) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+12/-5)"] + bulkdata --> markers["📄 t2markers.c (+3/-1)"] + + scheduler --> schedulerc["📄 scheduler.c (+120/-80) 🔴"] + + ccsp --> rbus["📄 rbusInterface.c (+15/-8) 🟡"] + + protocol --> http["📁 http"] + http --> client["📄 client.c (+5/-2)"] + + sourcetest --> bulktest["📄 bulkdata_test.cpp (+95/-10) ✅"] + sourcetest --> schedtest["📄 scheduler_test.cpp (+42/-0) ✅"] + + config --> cfg["📄 T2Agent.cfg (+3/-0)"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef medium fill:#ffa94d,stroke:#fd7e14,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class rbus medium + class bulktest,schedtest safe + class reportprof,markers,client,cfg,makefile neutral +``` + +**Legend:** +- 🔴 High risk / Critical (red) +- 🟡 Medium risk (orange) +- ⚠️ Caution needed (yellow) +- ✅ Good / Tests added (green) +- Neutral (gray) + +--- + +## Detailed Analysis + +### 1. Bulk Data Collection Module + +#### Files Modified +- [source/bulkdata/profile.c](source/bulkdata/profile.c) (+85/-42) +- [source/bulkdata/reportprofiles.c](source/bulkdata/reportprofiles.c) (+12/-5) +- [source/bulkdata/t2markers.c](source/bulkdata/t2markers.c) (+3/-1) + +#### Key Changes +- **New functionality**: Dynamic profile reload capability +- **Refactoring**: Profile state machine restructured for reload support +- **Bug fix**: Fixed memory leak in profile cleanup path +- **Configuration**: Added validation for new profile fields + +#### Code Quality Metrics +``` +Cyclomatic Complexity: [Average / Max] +Function Length: [Average lines] +Nesting Depth: [Max depth] +``` + +#### Impact Assessment + +##### Memory Safety: 🟡 MINOR CONCERNS + +**Positive:** +- ✅ All `malloc` calls include NULL checks ([profile.c:145](source/bulkdata/profile.c#L145)) +- ✅ Error paths properly free allocated memory +- ✅ Use of `goto cleanup` pattern for consistent resource cleanup + +**Concerns:** +- ⚠️ **Line 167**: `reloadProfile()` allocates new profile but doesn't free old profile's internal data structures before replacing pointer + - **Impact**: ~2KB memory leak per profile reload + - **Fix**: Call `cleanupProfileInternals(old_profile)` before assignment + +- ⚠️ **Line 223**: `strdup()` result not checked for NULL + - **Impact**: Potential crash if allocation fails + - **Fix**: Add NULL check and error handling + +##### Thread Safety: 🔴 SIGNIFICANT CONCERNS + +**Positive:** +- ✅ New `g_profile_reload_mutex` added for protection +- ✅ Consistent lock acquire/release pattern + +**Concerns:** +- 🔴 **CRITICAL - Line 167**: Potential deadlock scenario + ``` + reloadProfile() locks g_profile_reload_mutex + → calls updateScheduler() which locks g_scheduler_mutex + + Simultaneously: + TimeoutThread() locks g_scheduler_mutex + → accesses profile data (expects g_profile_reload_mutex) + ``` + - **Impact**: System hang, watchdog reset + - **Fix**: Document and enforce lock ordering: always acquire g_scheduler_mutex before g_profile_reload_mutex + +- ⚠️ **Line 205**: Profile state read without mutex + ```c + if (profile->state == PROFILE_ENABLED) { // Race condition + reload(profile); + } + ``` + - **Impact**: Race condition, profile might be disabled during reload + - **Fix**: Read state under mutex lock + +##### API Compatibility: ✅ MAINTAINED + +- No changes to public function signatures +- New functions are additive (`reloadProfile`, `validateProfile`) +- Struct layouts unchanged (ABI compatible) + +##### Error Handling: ✅ GOOD + +- Return values checked consistently +- Error codes are meaningful (`ERR_INVALID_CONFIG`, `ERR_PROFILE_LOCKED`) +- Appropriate logging at ERROR, WARN, and DEBUG levels +- Graceful degradation when reload fails (keeps old profile) + +##### Performance Impact: 🟢 LOW + +- Reload operation runs in separate thread (non-blocking) +- O(1) profile lookup (hash table unchanged) +- No new periodic operations +- Memory footprint increase: ~500 bytes per profile (new mutex + state fields) + +--- + +### 2. Scheduler Module + +#### Files Modified +- [source/scheduler/scheduler.c](source/scheduler/scheduler.c) (+120/-80) + +#### Key Changes +- **Modified**: Timeout calculation to support profile reload +- **Added**: `pauseScheduler()` and `resumeScheduler()` APIs +- **Refactored**: Timer wheel implementation for efficiency + +#### Impact Assessment + +##### Thread Safety: 🔴 HIGH RISK + +**Concerns:** +- 🔴 **Line 342**: Lock ordering issue (see Bulk Data section above) +- ⚠️ **Line 401**: Condition variable broadcast without checking predicates + - Multiple threads may wake up unnecessarily + - Potential for spurious wakeups causing incorrect behavior +- ⚠️ **Line 456**: `pauseScheduler()` doesn't wait for in-flight operations + - Profile reload might start while scheduled operation is running + +##### Algorithm Complexity: 🟡 MODERATE + +- Timer wheel introduces O(1) insert/delete (improvement from O(n) list) +- Edge case: Large interval values (>24 hours) might overflow wheel indices + - **Recommendation**: Add bounds checking at [scheduler.c:378](source/scheduler/scheduler.c#L378) + +--- + +### 3. CCSP Interface Module + +#### Files Modified +- [source/ccspinterface/rbusInterface.c](source/ccspinterface/rbusInterface.c) (+15/-8) + +#### Key Changes +- **New API**: `Device.X_RDKCENTRAL-COM_T2.ReloadProfile()` rbus method +- **Modified**: Error codes returned to caller (previously silent failures) + +#### Impact Assessment + +##### API Compatibility: ⚠️ BEHAVIOR CHANGE + +- **Breaking change**: RPC error codes changed + - Old: Always returned `RBUS_ERROR_SUCCESS` + - New: Returns actual error codes (`RBUS_ERROR_INVALID_INPUT`, etc.) + - **Impact**: Callers might not handle new error codes + - **Recommendation**: Document in release notes, add backward compatibility flag + +##### Error Handling: ✅ IMPROVED + +- Now validates input parameters before processing +- Appropriate error messages for field debugging + +--- + +### 4. HTTP Protocol Module + +#### Files Modified +- [source/protocol/http/client.c](source/protocol/http/client.c) (+5/-2) + +#### Key Changes +- **Bug fix**: Added timeout to HTTP POST (was infinite wait) + +#### Impact Assessment + +##### Reliability: ✅ IMPROVEMENT + +- Timeout prevents hung connections from blocking system +- Value: 30 seconds (reasonable for embedded environment) + +--- + +## Cross-Cutting Concerns + +### Build System + +**Files**: [Makefile.am](Makefile.am) + +**Changes:** +- Added `-pthread` flag (required for new mutex operations) +- No new external dependencies + +**Assessment:** ✅ Clean + +--- + +### Configuration + +**Files**: [config/T2Agent.cfg](config/T2Agent.cfg) + +**Changes:** +- New option: `EnableProfileReload=true` (default: false) + +**Assessment:** ✅ Good +- Feature flag allows gradual rollout +- Backward compatible (defaults to off) + +--- + +### Testing + +**Files**: +- [source/test/bulkdata_test.cpp](source/test/bulkdata_test.cpp) (+95/-10) +- [source/test/scheduler_test.cpp](source/test/scheduler_test.cpp) (+42/-0) + +**Coverage Analysis:** + +| Module | Before | After | Delta | +|--------|--------|-------|-------| +| profile.c | 78% | 85% | +7% | +| scheduler.c | 72% | 81% | +9% | +| rbusInterface.c | 65% | 68% | +3% | + +**Test Quality:** ✅ GOOD +- Comprehensive unit tests for reload logic +- Negative test cases included (invalid config, concurrent reloads) +- Mock objects used appropriately (rbus, scheduler) + +**Gaps:** +- ⚠️ Missing integration test for full reload flow (unit tests only) +- ⚠️ No stress test for concurrent reload + report generation +- ⚠️ Thread safety tests don't cover deadlock scenario identified above + +--- + +### Documentation + +**Status:** ⚠️ INCOMPLETE + +**Present:** +- ✅ Function-level comments for new APIs +- ✅ Updated README with reload instructions + +**Missing:** +- ❌ Architecture doc update (reload flow diagram) +- ❌ API reference for new rbus method +- ❌ Lock ordering documentation +- ❌ Performance impact analysis + +--- + +## Security & Privacy + +### Input Validation +- ✅ Profile JSON validated against schema +- ✅ rbus method parameters type-checked +- ⚠️ Profile name not sanitized (potential path traversal if used in file operations) + +### Privilege Escalation +- ✅ No new privileged operations +- ✅ rbus ACLs unchanged + +### Data Exposure +- ✅ No sensitive data in logs +- ✅ Profile content not logged at INFO level + +--- + +## Regression Analysis + +### High-Risk Areas + +1. **Scheduler Deadlock** 🔴 CRITICAL + - **Location**: [scheduler.c:342](source/scheduler/scheduler.c#L342) + - **Scenario**: Profile reload during scheduled timeout + - **Impact**: System hang, watchdog reset, loss of telemetry + - **Probability**: MEDIUM (requires precise timing, but occurs in production workload) + - **Mitigation**: Fix lock ordering before merge + +2. **Memory Leak During Reload** 🟡 MEDIUM + - **Location**: [profile.c:167](source/bulkdata/profile.c#L167) + - **Scenario**: Profile reloaded multiple times + - **Impact**: ~2KB leak per reload, OOM after ~500 reloads + - **Probability**: LOW (reloads are rare in production) + - **Mitigation**: Can be addressed post-merge with monitoring + +3. **rbus Error Code Change** 🟡 MEDIUM + - **Location**: [rbusInterface.c:89](source/ccspinterface/rbusInterface.c#L89) + - **Scenario**: Caller expects SUCCESS but gets ERROR + - **Impact**: Caller might retry unnecessarily or fail operation + - **Probability**: HIGH (any existing caller affected) + - **Mitigation**: Add backward compatibility flag or version new API + +### Potential Failure Modes + +| Scenario | Symptom | Detectability | Impact | +|----------|---------|---------------|---------| +| Deadlock during reload | System hang | High (watchdog fires) | CRITICAL | +| Memory leak accumulation | OOM after days | Medium (slow growth) | HIGH | +| Race in profile state | Wrong profile used | Low (silent) | MEDIUM | +| Invalid config accepted | Malformed reports | High (logs) | LOW | + +### Comparison with Previous Similar Changes + +**PR #87** (Add profile deletion) +- Also modified scheduler and profile modules +- Had similar lock ordering issue → caused production incident +- **Lesson**: Extra scrutiny needed for scheduler + profile interactions + +--- + +## Recommendations + +### 🔴 MUST FIX (Blocking) + +1. **Fix deadlock scenario** ([scheduler.c:342](source/scheduler/scheduler.c#L342)) + - Document lock ordering: `g_scheduler_mutex` → `g_profile_reload_mutex` + - Add runtime deadlock detection (DEBUG build) + - Refactor to use single lock or lock-free reload if possible + +2. **Fix memory leak** ([profile.c:167](source/bulkdata/profile.c#L167)) + ```c + // Before assignment: + cleanupProfileInternals(old_profile); + old_profile->data = new_profile->data; + ``` + +3. **Add NULL check** ([profile.c:223](source/bulkdata/profile.c#L223)) + ```c + char* name_copy = strdup(name); + if (!name_copy) { + return ERR_NO_MEMORY; + } + ``` + +### 🟡 SHOULD FIX (Before Merge) + +4. **Add integration test** for full reload flow + - Test: Start daemon → load profile → send reports → reload profile → verify new config active + +5. **Add stress test** for concurrent operations + - Test: Reload profile while reports generating (run 1000 iterations) + +6. **Document lock ordering** in scheduler.h and profile.h + ```c + /** + * Lock ordering: + * 1. g_scheduler_mutex (scheduler.c) + * 2. g_profile_reload_mutex (profile.c) + * + * Always acquire in this order to prevent deadlock. + */ + ``` + +7. **Fix rbus API compatibility** + - Option A: Add `v2` suffix to new method, keep old behavior + - Option B: Add config flag `UseV2ErrorCodes=false` for rollback + +8. **Update documentation** + - Add reload flow diagram to architecture doc + - Update API reference + - Add performance notes (reload takes ~50ms) + +### 🟢 CONSIDER (Future Improvements) + +9. **Add telemetry for reload operations** + - Count: successful/failed reloads + - Timing: reload duration histogram + - Errors: categorized by failure type + +10. **Optimize reload path** + - Current: Full profile teardown + rebuild + - Better: Diff old vs new, update only changed fields + +11. **Add reload rate limiting** + - Prevent rapid reload spamming + - Max 1 reload per 60 seconds per profile + +--- + +## Testing Checklist + +Before merging, verify: + +### Functionality +- [ ] Profile reload succeeds with valid config +- [ ] Profile reload fails gracefully with invalid config +- [ ] Existing profiles unaffected by reload of different profile +- [ ] Report generation continues during reload +- [ ] Reloaded profile immediately uses new configuration + +### Memory Safety +- [ ] Valgrind shows no leaks for reload operation +- [ ] Valgrind clean on error paths (invalid config, failed allocation) +- [ ] No use-after-free in concurrent profile access +- [ ] Memory usage stable over 1000+ reloads + +### Thread Safety +- [ ] Helgrind clean (no race conditions detected) +- [ ] No deadlocks under stress test (1000 iterations) +- [ ] Lock contention acceptable (profiled with perf) +- [ ] Scheduler timeouts continue firing during reload + +### Platform Testing +- [ ] Tested on target hardware (not just x86 VM) +- [ ] Tested on low-memory device (256MB RAM) +- [ ] Tested on busy system (>80% CPU) +- [ ] Cross-compiler build succeeds + +### Integration +- [ ] Unit tests pass (95%+ coverage on new code) +- [ ] Integration tests pass (full end-to-end flow) +- [ ] No CI failures +- [ ] Static analysis clean (cppcheck, scan-build) + +### Documentation +- [ ] README updated +- [ ] API docs updated +- [ ] Release notes drafted +- [ ] Known issues documented + +--- + +## Sign-Off + +**Code Review Status:** ⚠️ CONDITIONAL APPROVAL + +**Summary:** Solid implementation of profile reload feature with good test coverage. However, critical deadlock scenario must be addressed before merge. Memory leak and API compatibility concerns should also be resolved. + +**Approval Conditions:** +1. Fix deadlock issue +2. Fix memory leak +3. Add NULL check for strdup +4. Add integration test +5. Document lock ordering + +Once these items are addressed, re-run review or fast-track with senior engineer approval. + +--- + +**Review completed by GitHub Copilot Code Review Skill** +**For questions or concerns, consult the [review-checklist](references/review-checklist.md)** diff --git a/.github/skills/code-review/references/memory-patterns.md b/.github/skills/code-review/references/memory-patterns.md new file mode 100644 index 00000000..295e6e61 --- /dev/null +++ b/.github/skills/code-review/references/memory-patterns.md @@ -0,0 +1,322 @@ +# Memory Safety Patterns for Code Review + +## Pattern 1: Allocation with Error Check + +### ✅ CORRECT +```c +char* buffer = (char*)malloc(size); +if (!buffer) { + T2Error("Failed to allocate %zu bytes\n", size); + return ERR_MEMORY_ALLOCATION_FAILED; +} +// Use buffer +free(buffer); +``` + +### ❌ INCORRECT +```c +char* buffer = (char*)malloc(size); +strcpy(buffer, data); // Crash if malloc failed +free(buffer); +``` + +## Pattern 2: Error Path Cleanup (Single Exit) + +### ✅ CORRECT +```c +int process_data(const char* input) { + int ret = SUCCESS; + char* buffer = NULL; + FILE* fp = NULL; + + buffer = malloc(BUFFER_SIZE); + if (!buffer) { + ret = ERR_NO_MEMORY; + goto cleanup; + } + + fp = fopen(input, "r"); + if (!fp) { + ret = ERR_FILE_OPEN; + goto cleanup; + } + + // Process data... + +cleanup: + if (buffer) free(buffer); + if (fp) fclose(fp); + return ret; +} +``` + +### ❌ INCORRECT (Memory Leak) +```c +int process_data(const char* input) { + char* buffer = malloc(BUFFER_SIZE); + FILE* fp = fopen(input, "r"); + + if (!fp) return ERR_FILE_OPEN; // Leaked buffer + + // Process data... + + free(buffer); + fclose(fp); + return SUCCESS; +} +``` + +## Pattern 3: String Handling + +### ✅ CORRECT +```c +char dest[MAX_SIZE]; +strncpy(dest, source, MAX_SIZE - 1); +dest[MAX_SIZE - 1] = '\0'; // Ensure NULL termination +``` + +### ✅ BETTER (Helper Function) +```c +void safe_strcpy(char* dest, const char* src, size_t dest_size) { + if (dest_size > 0) { + strncpy(dest, src, dest_size - 1); + dest[dest_size - 1] = '\0'; + } +} +``` + +### ❌ INCORRECT +```c +char dest[MAX_SIZE]; +strcpy(dest, source); // Buffer overflow if source > MAX_SIZE +``` + +## Pattern 4: Dynamic String Allocation + +### ✅ CORRECT +```c +size_t len = strlen(source) + 1; // +1 for NULL terminator +char* dest = (char*)malloc(len); +if (!dest) { + return ERR_NO_MEMORY; +} +memcpy(dest, source, len); +// Later: free(dest); +``` + +### ❌ INCORRECT (Off-by-one) +```c +char* dest = (char*)malloc(strlen(source)); // Missing +1 +strcpy(dest, source); // Buffer overflow +``` + +## Pattern 5: Avoid Use-After-Free + +### ✅ CORRECT +```c +free(ptr); +ptr = NULL; // Invalidate pointer + +// Later... +if (ptr != NULL) { + // Safe: won't use freed memory +} +``` + +### ❌ INCORRECT +```c +free(ptr); +// ptr still points to freed memory + +// Later... +strcpy(ptr, data); // Use-after-free! +``` + +## Pattern 6: Double-Free Prevention + +### ✅ CORRECT +```c +void cleanup(Context* ctx) { + if (ctx) { + if (ctx->buffer) { + free(ctx->buffer); + ctx->buffer = NULL; // Prevent double-free + } + if (ctx->data) { + free(ctx->data); + ctx->data = NULL; + } + } +} +``` + +### ❌ INCORRECT +```c +void cleanup(Context* ctx) { + free(ctx->buffer); // What if already freed? + free(ctx->data); // Potential double-free +} +``` + +## Pattern 7: Array Bounds Checking + +### ✅ CORRECT +```c +for (int i = 0; i < count && i < MAX_ITEMS; i++) { + array[i] = data[i]; +} +``` + +### ❌ INCORRECT +```c +for (int i = 0; i < count; i++) { + array[i] = data[i]; // Buffer overflow if count > MAX_ITEMS +} +``` + +## Pattern 8: Safe realloc + +### ✅ CORRECT +```c +char* temp = (char*)realloc(buffer, new_size); +if (!temp) { + T2Error("realloc failed\n"); + free(buffer); // Original buffer still valid + return ERR_NO_MEMORY; +} +buffer = temp; +``` + +### ❌ INCORRECT (Memory Leak) +```c +buffer = (char*)realloc(buffer, new_size); +if (!buffer) { + // Lost reference to original buffer! + return ERR_NO_MEMORY; +} +``` + +## Pattern 9: Struct with Dynamic Members + +### ✅ CORRECT +```c +typedef struct { + char* name; + char* value; +} Item; + +void free_item(Item* item) { + if (item) { + free(item->name); + free(item->value); + free(item); + } +} + +Item* create_item(const char* name, const char* value) { + Item* item = (Item*)calloc(1, sizeof(Item)); + if (!item) return NULL; + + item->name = strdup(name); + if (!item->name) { + free(item); + return NULL; + } + + item->value = strdup(value); + if (!item->value) { + free(item->name); + free(item); + return NULL; + } + + return item; +} +``` + +## Pattern 10: Avoid Stack Overflow + +### ✅ CORRECT (Small Array) +```c +char buffer[256]; // Small, OK on stack +``` + +### ✅ CORRECT (Large Array) +```c +char* buffer = (char*)malloc(LARGE_SIZE); // Heap for large data +if (!buffer) return ERR_NO_MEMORY; +// Use buffer... +free(buffer); +``` + +### ❌ INCORRECT (Stack Overflow Risk) +```c +char buffer[1024 * 1024]; // 1MB on stack - too large! +``` + +## Pattern 11: Const Correctness (Avoid Unexpected Modifications) + +### ✅ CORRECT +```c +void process(const char* input) { + // Can't modify input + char* copy = strdup(input); + if (copy) { + modify(copy); + free(copy); + } +} +``` + +## Pattern 12: NULL Safety in Cleanup + +### ✅ CORRECT +```c +void cleanup(Data* data) { + if (data == NULL) return; // Guard against NULL + + if (data->buffer) free(data->buffer); + if (data->file) fclose(data->file); + if (data->mutex) { + pthread_mutex_destroy(data->mutex); + free(data->mutex); + } +} +``` + +## Red Flags to Look For + +1. **malloc/calloc without NULL check** +2. **strcpy/strcat/sprintf (use bounded versions)** +3. **Return before cleanup on error path** +4. **Pointer used after free (no NULL assignment)** +5. **Array index without bounds check** +6. **Large arrays on stack** +7. **realloc result assigned directly to original pointer** +8. **Missing free for every malloc** +9. **Buffer size calculations with off-by-one errors** +10. **Strdup without considering NULL return** + +## Valgrind Signals to Watch + +When reviewing changes, recommend valgrind if: +- New malloc/calloc/realloc/free calls +- Complex pointer manipulation +- String handling code +- Data structure modifications +- Error path additions + +```bash +valgrind --leak-check=full \ + --show-leak-kinds=all \ + --track-origins=yes \ + --verbose \ + ./test_binary +``` + +Look for: +- **Definitely lost**: Memory leak (must fix) +- **Indirectly lost**: Leaked structure with allocated members +- **Possibly lost**: Pointer arithmetic issue +- **Still reachable**: Not freed but still tracked (often acceptable for globals) +- **Invalid read/write**: Buffer overflow or use-after-free diff --git a/.github/skills/code-review/references/quick-assessment-guide.md b/.github/skills/code-review/references/quick-assessment-guide.md new file mode 100644 index 00000000..6aaf69b2 --- /dev/null +++ b/.github/skills/code-review/references/quick-assessment-guide.md @@ -0,0 +1,323 @@ +# Quick Assessment Guide + +Fast reference for generating code reviews. Use this to streamline the analysis process. + +## Risk Scoring Matrix + +### Scope Score (0-10) +| Files Changed | Modules | LOC | Score | +|---------------|---------|-----|-------| +| 1-5 | 1 | <100 | 1-2 | +| 6-15 | 1-2 | 100-500 | 3-5 | +| 16-30 | 2-4 | 500-1000 | 6-8 | +| >30 | >4 | >1000 | 9-10 | + +### Criticality Score (0-10) +| Component Type | Examples | Score | +|----------------|----------|-------| +| Peripheral | Test code, docs, comments | 1-2 | +| Configuration | Config files, schemas | 3-4 | +| Utilities | Helper functions, parsing | 5-6 | +| Core Logic | Scheduler, profile mgmt, report gen | 7-8 | +| Critical Path | Bus interface, data collection | 9-10 | + +### Complexity Score (0-10) +| Change Type | Indicators | Score | +|-------------|------------|-------| +| Trivial | Typo, comment, doc | 1 | +| Simple | Single function, <20 LOC | 2-3 | +| Moderate | Multi-function, refactor | 4-6 | +| Complex | Algorithm change, state machine | 7-8 | +| Very Complex | Concurrency change, protocol change | 9-10 | + +**Complexity Indicators:** +- Cyclomatic complexity >15: +2 +- Nesting depth >4: +1 +- New thread/lock: +2 +- Recursive function: +1 + +### Safety Score (0-10) +| Issues Found | Score | +|--------------|-------| +| None (clean review) | 0 | +| Minor style issues | 1-2 | +| Missing NULL checks | 3-4 | +| Potential race condition | 5-6 | +| Memory leak confirmed | 7-8 | +| Deadlock scenario | 9-10 | + +**Auto-score based on findings:** +- Each memory leak: +2 +- Each race condition: +2 +- Each potential deadlock: +3 +- Each buffer overflow: +3 +- Use-after-free: +4 + +**Coverity Defects (from PR comments):** +- Check for comments from user "rdkcmf-jenkins" or titles starting with "Coverity Issue" +- Each HIGH severity Coverity defect: +3 +- Each MEDIUM severity Coverity defect: +2 +- Each LOW severity Coverity defect: +1 +- Cap at 10 for safety score + +### Testing Score (0-10) +| Test Status | Score | +|-------------|-------| +| Comprehensive (>90% coverage, integration tests) | 0-1 | +| Good (>75% coverage, unit tests complete) | 2-3 | +| Basic (>50% coverage, some tests) | 4-6 | +| Minimal (<50% coverage, few tests) | 7-8 | +| None (no tests added) | 9-10 | + +--- + +## Risk Level Calculation + +**Total Score** = Scope + Criticality + Complexity + Safety + Testing + +| Total | Risk Level | Symbol | Action | +|-------|-----------|---------|---------| +| 0-10 | LOW | 🟢 | Standard review | +| 11-20 | MEDIUM | 🟡 | Careful review, validate with tools | +| 21-35 | HIGH | 🔴 | Deep review, testing required | +| 36-50 | CRITICAL | ⚫ | Block merge, full analysis needed | + +--- + +## Module Classification + +Map file paths to modules for change tree: + +| Path Pattern | Module | Criticality | +|--------------|--------|-------------| +| `source/bulkdata/*.c` | Bulk Data Collection | High | +| `source/scheduler/*.c` | Scheduling Engine | Critical | +| `source/reportgen/*.c` | Report Generation | High | +| `source/ccspinterface/*.c` | Bus Interface | Critical | +| `source/protocol/http/*.c` | HTTP Transport | High | +| `source/protocol/rbusMethod/*.c` | rbus Transport | High | +| `source/t2parser/*.c` | Profile Parser | Medium | +| `source/dcautil/*.c` | DCA Utilities | Medium | +| `source/utils/*.c` | Common Utilities | Medium | +| `source/privacycontrol/*.c` | Privacy Control | High | +| `source/t2dm/*.c` | Data Model Plugin | Medium | +| `source/t2ssp/*.c` | SSP Main | High | +| `source/test/*.cpp` | Unit Tests | Low | +| `source/testApp/*.c` | Test Application | Low | +| `*.am`, `*.ac` | Build System | Medium | +| `config/*` | Configuration | Medium | +| `schemas/*` | JSON Schemas | Low | +| `docs/*` | Documentation | Low | + +--- + +## Change Indicators (for Mermaid diagrams) + +**Symbols:** +- ✅ Test coverage added (positive) +- 🟢 Low risk change +- 🟡 Medium risk (review carefully) +- ⚠️ Safety concern flagged +- 🔴 High risk (requires attention) +- ⚫ Critical issue (blocking) + +**Mermaid CSS Classes:** +```css +classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff /* Red - High risk */ +classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 /* Yellow - Warning */ +classDef medium fill:#ffa94d,stroke:#fd7e14,color:#000 /* Orange - Medium */ +classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff /* Green - Safe/Tests */ +classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 /* Gray - Neutral */ +``` + +--- + +## Quick Memory Safety Check + +When analyzing C files, scan for these patterns: + +### 🔴 RED FLAGS (must fix) +- `strcpy`, `strcat`, `sprintf`, `gets` +- `malloc`/`calloc` without NULL check +- Return before `free` on error path +- Pointer used after `free` +- Array access without bounds check +- `realloc` result assigned directly to original pointer + +### 🟡 YELLOW FLAGS (verify carefully) +- `strncpy`, `snprintf` (check NULL termination) +- `memcpy` (check for overlap, size validation) +- Large stack arrays (>4KB) +- Complex pointer arithmetic +- cJSON functions (check for free) +- Vector/list operations (cleanup elements?) + +### ✅ GREEN PATTERNS (good) +- `goto cleanup` single-exit pattern +- Pointer set to NULL after free +- Defensive NULL checks before use +- Bounded string operations with explicit NULL +- RAII wrappers (C++ tests) + +--- + +## Quick Thread Safety Check + +### 🔴 RED FLAGS (must fix) +- Shared global modified without mutex +- Multiple locks acquired in different orders +- Condition variable without predicate loop +- Long operation under lock (>10ms) +- Thread created without join or detach +- Recursive mutex usage + +### 🟡 YELLOW FLAGS (verify carefully) +- New mutex or lock introduced +- Lock held across function call +- Callback invoked under lock +- Complex lock/unlock patterns +- Lock with goto (verify error paths) + +### ✅ GREEN PATTERNS (good) +- Consistent lock ordering documented +- Short critical sections (<1ms) +- Condition variable with while loop +- Thread stack size explicitly set +- Lock released before blocking operation +- Timeout on lock acquisition + +--- + +## API Compatibility Quick Check + +### Breaking Changes (high risk) +- ✗ Function signature modified +- ✗ Struct member reordered +- ✗ Enum value changed +- ✗ Public function deleted +- ✗ Return value semantics changed + +### Safe Changes (low risk) +- ✓ New function added +- ✓ New struct member at end +- ✓ New enum value at end +- ✓ Internal function changed +- ✓ Implementation detail changed + +--- + +## Priority Flagging + +When generating recommendations, prioritize: + +### P0 (MUST FIX - blocking) +- Deadlock scenarios +- Memory corruption (buffer overflow, use-after-free) +- NULL pointer dereferences +- Race conditions in critical path +- API/ABI breakage + +### P1 (SHOULD FIX - before merge) +- Memory leaks +- Missing error handling +- Test coverage gaps (<80% on new code) +- Documentation missing +- Performance regressions + +### P2 (CONSIDER - post-merge) +- Code style inconsistencies +- Minor optimization opportunities +- Refactoring suggestions +- Additional test scenarios +- Enhanced error messages + +--- + +## Review Flow Checklist + +Follow this order for efficient review generation: + +1. **Fetch PR metadata** (title, description, files changed) +2. **Calculate risk scores** (use matrices above) +3. **Classify files by module** (use module table) +4. **Generate change tree** (ASCII visualization) +5. **Analyze each file**: + - Run memory safety patterns + - Run thread safety patterns + - Check against common pitfalls + - Calculate local risk +6. **Identify cross-cutting concerns** (build, config, tests) +7. **Aggregate findings** (group by severity) +8. **Generate recommendations** (prioritize by P0/P1/P2) +9. **Create checklist** (actionable items) +10. **Write executive summary** (2-3 sentences) + +--- + +## Common Review Shortcuts + +### For small PRs (<5 files, <100 LOC) +- Skip detailed module breakdown +- Focus on changed functions only +- Brief risk assessment +- Quick checklist + +### For doc-only PRs +- Verify accuracy +- Check formatting +- Ensure completeness +- Skip code analysis sections + +### For test-only PRs +- Verify test quality (determinism, coverage) +- Check for test resource leaks +- Ensure setup/teardown balanced +- Brief risk summary only + +### For config-only PRs +- Validate against schema +- Check backward compatibility +- Verify default values +- Document impact + +--- + +## Time-Saving Tips + +1. **Use grep patterns** for quick scans: + ```bash + grep -rn "malloc\|calloc\|free" source/ + grep -rn "pthread_mutex\|pthread_cond" source/ + grep -rn "strcpy\|sprintf\|gets" source/ + ``` + +2. **Focus on diff hunks**, not entire files + +3. **Reuse analysis** from previous similar PRs + +4. **Skip generated code** (e.g., protocol buffers) + +5. **Batch similar issues** rather than listing individually + +6. **Reference line numbers** for specific issues only + +7. **Use tables** for repetitive data (better than prose) + +--- + +## Output Structure Template + +```markdown +# Title + Overview (5% of content) +## Executive Summary (5%) +## Risk Assessment (10%) +## Change Tree (5%) +## Module Analysis (50%) + - Per module: files, changes, impacts +## Cross-Cutting (10%) +## Regression Analysis (10%) +## Recommendations (5%) +## Checklist (5%) +``` + +Aim for 1500-3000 words total, adjust based on PR size. diff --git a/.github/skills/code-review/references/review-checklist.md b/.github/skills/code-review/references/review-checklist.md new file mode 100644 index 00000000..20226ba4 --- /dev/null +++ b/.github/skills/code-review/references/review-checklist.md @@ -0,0 +1,225 @@ +# Code Review Checklist for Telemetry Embedded Systems + +## General Requirements + +- [ ] Code follows project naming conventions +- [ ] Comments explain WHY, not WHAT +- [ ] No debug/dead code committed +- [ ] Commit messages are meaningful +- [ ] Changes align with PR description + +## Memory Safety (C/C++) + +### Allocations +- [ ] All `malloc`/`calloc`/`realloc` return values checked +- [ ] Every allocation has a corresponding `free` +- [ ] No memory leaks on error paths +- [ ] Error paths clean up in reverse order +- [ ] Use of `strdup` is justified (prefer stack when possible) + +### Pointers +- [ ] Pointers initialized to NULL +- [ ] NULL checks before dereference +- [ ] No use-after-free scenarios +- [ ] No double-free possible +- [ ] Pointer invalidation after free (`ptr = NULL`) + +### String Operations +- [ ] `strcpy` → replaced with `strncpy` or bounds-checked +- [ ] `sprintf` → replaced with `snprintf` +- [ ] `strcat` → buffer size validated +- [ ] `gets` → NEVER used (immediate rejection) +- [ ] String buffers always NULL-terminated + +### Buffer Safety +- [ ] Array bounds validated before access +- [ ] Loop conditions prevent overruns +- [ ] `memcpy` size validated, no overlap +- [ ] Fixed-size buffers avoid magic numbers (use sizeof) + +## Thread Safety + +### Shared Data +- [ ] Shared variables protected by mutex +- [ ] Lock ordering is consistent (prevent deadlocks) +- [ ] Critical sections are minimal +- [ ] Read-only access considered for atomics + +### Synchronization +- [ ] Mutex initialization checked (`pthread_mutex_init`) +- [ ] Mutexes properly destroyed on cleanup +- [ ] Condition variables paired with predicates +- [ ] No locks held across blocking operations unless necessary + +### Thread Management +- [ ] Thread creation specifies stack size +- [ ] Thread joins or detaches (no zombie threads) +- [ ] Thread-local storage used appropriately +- [ ] No busy-wait loops (use condition variables) + +## Resource Constraints + +### Memory Usage +- [ ] Prefer stack allocation over heap for small objects +- [ ] Memory pools considered for frequent allocations +- [ ] Large buffers justified by requirements +- [ ] No unbounded growth (arrays, queues, caches) + +### CPU Usage +- [ ] Algorithms are O(n) or better where possible +- [ ] No unnecessary loops or redundant operations +- [ ] Expensive operations avoided in hot paths +- [ ] Consider caching for repeated calculations + +## Error Handling + +### Return Values +- [ ] All function return values checked +- [ ] Error codes are meaningful and documented +- [ ] Errors propagated to caller when appropriate +- [ ] No silent failures (all errors logged) + +### Logging +- [ ] Error conditions logged with context +- [ ] Log levels appropriate (ERROR, WARN, INFO, DEBUG) +- [ ] No sensitive data in logs +- [ ] Logs sufficient for field debugging + +### Failure Modes +- [ ] All error paths tested +- [ ] System degrades gracefully +- [ ] No unbounded retries +- [ ] Timeouts implemented where needed + +## API Design + +### Backward Compatibility +- [ ] Existing function signatures unchanged (or versioned) +- [ ] Struct layouts maintain ABI compatibility +- [ ] Enum values not reordered +- [ ] New APIs clearly documented + +### Function Design +- [ ] Functions do one thing well (single responsibility) +- [ ] Function names are descriptive +- [ ] Parameter validation at entry +- [ ] Output parameters clearly documented + +### Header Files +- [ ] Include guards present +- [ ] Minimize dependencies (forward declarations) +- [ ] Public vs private APIs separated +- [ ] No implementation in headers (except inline) + +## Platform Independence + +### Portable Code +- [ ] No platform-specific types without abstraction +- [ ] Endianness handled explicitly if needed +- [ ] Size assumptions avoided (use sizeof, stdint.h) +- [ ] Conditional compilation minimized + +### System Calls +- [ ] System calls error-checked +- [ ] Alternatives available for missing APIs +- [ ] File paths use portable separators +- [ ] Network byte order handled for network data + +## Build System + +### Makefile Changes (*.am) +- [ ] Dependencies complete and minimal +- [ ] Compiler flags justified +- [ ] Link order correct +- [ ] No absolute paths + +### Configuration (*.ac) +- [ ] Feature detection used (not hard-coded) +- [ ] Optional dependencies handled +- [ ] Cross-compilation supported +- [ ] Platform detection accurate + +## Testing + +### Unit Tests +- [ ] New functionality has unit tests +- [ ] Edge cases covered +- [ ] Negative tests included (error paths) +- [ ] Mocks used appropriately (not excessively) + +### Test Quality +- [ ] Tests are deterministic (no flakiness) +- [ ] Test names describe what is tested +- [ ] Assertions have meaningful messages +- [ ] Setup/teardown prevent state leaks + +### Coverage +- [ ] All new functions tested +- [ ] Branch coverage >80% for new code +- [ ] Critical paths have integration tests +- [ ] Regression tests for bug fixes + +## Documentation + +### Code Documentation +- [ ] Public functions documented (purpose, params, return) +- [ ] Complex algorithms explained +- [ ] Assumptions documented +- [ ] TODOs tracked with tickets + +### External Documentation +- [ ] README.md updated if needed +- [ ] API documentation updated +- [ ] Configuration examples provided +- [ ] Migration guide for breaking changes + +## Embedded-Specific Concerns + +### Telemetry 2.0 Framework +- [ ] Profile configurations validated against schema +- [ ] Report generation doesn't block data collection +- [ ] Scheduler changes maintain timing guarantees +- [ ] Bus interface changes compatible with both CCSP and rbus + +### RDK Integration +- [ ] Changes tested on target hardware (or simulator) +- [ ] No increase in boot time +- [ ] Watchdog timeouts respected +- [ ] Log rotation considerations + +### Performance +- [ ] Startup time impact measured +- [ ] Memory footprint delta calculated +- [ ] CPU usage profiled for new loops/threads +- [ ] No I/O in critical paths unless unavoidable + +## Security + +### Input Validation +- [ ] All external inputs validated +- [ ] Buffer sizes enforced +- [ ] No arbitrary command execution +- [ ] Path traversal prevented + +### Data Protection +- [ ] No hardcoded credentials +- [ ] Sensitive data redacted in logs +- [ ] Privacy controls respected +- [ ] No unnecessary privilege escalation + +## Risk Assessment Questions + +1. **What is the blast radius if this change has a bug?** + - Single profile? All profiles? System stability? + +2. **Can this change cause a device to become unresponsive?** + - Deadlocks? Infinite loops? Resource exhaustion? + +3. **How will this fail in production?** + - Graceful degradation? Hard crash? Silent corruption? + +4. **What's the rollback plan?** + - Configuration revert? Firmware downgrade? Feature flag? + +5. **What operational visibility exists?** + - Logs? Metrics? Debug capabilities? diff --git a/.github/skills/code-review/references/thread-patterns.md b/.github/skills/code-review/references/thread-patterns.md new file mode 100644 index 00000000..5008819b --- /dev/null +++ b/.github/skills/code-review/references/thread-patterns.md @@ -0,0 +1,488 @@ +# Thread Safety Patterns for Code Review + +## Pattern 1: Mutex-Protected Shared Data + +### ✅ CORRECT +```c +typedef struct { + pthread_mutex_t mutex; + int counter; + char* shared_data; +} SharedContext; + +void increment_counter(SharedContext* ctx) { + pthread_mutex_lock(&ctx->mutex); + ctx->counter++; + pthread_mutex_unlock(&ctx->mutex); +} +``` + +### ❌ INCORRECT (Race Condition) +```c +typedef struct { + int counter; // No mutex protection +} SharedContext; + +void increment_counter(SharedContext* ctx) { + ctx->counter++; // Race condition! +} +``` + +## Pattern 2: Consistent Lock Ordering (Avoid Deadlock) + +### ✅ CORRECT +```c +// Always acquire in same order: mutex_a then mutex_b +void operation1() { + pthread_mutex_lock(&mutex_a); + pthread_mutex_lock(&mutex_b); + // Critical section + pthread_mutex_unlock(&mutex_b); + pthread_mutex_unlock(&mutex_a); +} + +void operation2() { + pthread_mutex_lock(&mutex_a); // Same order + pthread_mutex_lock(&mutex_b); + // Critical section + pthread_mutex_unlock(&mutex_b); + pthread_mutex_unlock(&mutex_a); +} +``` + +### ❌ INCORRECT (Deadlock Risk) +```c +void operation1() { + pthread_mutex_lock(&mutex_a); + pthread_mutex_lock(&mutex_b); + // ... +} + +void operation2() { + pthread_mutex_lock(&mutex_b); // Opposite order - DEADLOCK! + pthread_mutex_lock(&mutex_a); + // ... +} +``` + +## Pattern 3: Condition Variable Usage + +### ✅ CORRECT +```c +pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_cond_t cond = PTHREAD_COND_INITIALIZER; +bool ready = false; + +// Producer +void signal_ready() { + pthread_mutex_lock(&mutex); + ready = true; // Set predicate + pthread_cond_signal(&cond); + pthread_mutex_unlock(&mutex); +} + +// Consumer +void wait_ready() { + pthread_mutex_lock(&mutex); + while (!ready) { // WHILE loop, not IF + pthread_cond_wait(&cond, &mutex); + } + // Ready is true here + pthread_mutex_unlock(&mutex); +} +``` + +### ❌ INCORRECT (Missing Predicate) +```c +void wait_ready() { + pthread_mutex_lock(&mutex); + pthread_cond_wait(&cond, &mutex); // No predicate check! + // Spurious wakeup possible + pthread_mutex_unlock(&mutex); +} +``` + +## Pattern 4: Thread Creation with Stack Size + +### ✅ CORRECT (Embedded System) +```c +pthread_t thread; +pthread_attr_t attr; + +pthread_attr_init(&attr); +pthread_attr_setstacksize(&attr, 256 * 1024); // 256KB explicit + +int ret = pthread_create(&thread, &attr, thread_func, arg); +if (ret != 0) { + T2Error("pthread_create failed: %d\n", ret); +} +pthread_attr_destroy(&attr); +``` + +### ❌ INCORRECT (Default Stack) +```c +pthread_t thread; +pthread_create(&thread, NULL, thread_func, arg); // Unknown stack size +``` + +## Pattern 5: Initialization Safety + +### ✅ CORRECT +```c +static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +static pthread_once_t once_control = PTHREAD_ONCE_INIT; + +void init_once() { + // Initialization code (runs exactly once) +} + +void safe_init() { + pthread_once(&once_control, init_once); +} +``` + +### ✅ CORRECT (Dynamic Init) +```c +pthread_mutex_t* mutex; + +int init_mutex() { + mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t)); + if (!mutex) return -1; + + int ret = pthread_mutex_init(mutex, NULL); + if (ret != 0) { + free(mutex); + return -1; + } + return 0; +} +``` + +### ❌ INCORRECT +```c +pthread_mutex_t mutex; // Not initialized! +pthread_mutex_lock(&mutex); // Undefined behavior +``` + +## Pattern 6: Read-Write Lock for Read-Heavy Workloads + +### ✅ CORRECT (Multiple Readers) +```c +pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER; + +void read_data() { + pthread_rwlock_rdlock(&rwlock); + // Read shared data (multiple readers OK) + pthread_rwlock_unlock(&rwlock); +} + +void write_data() { + pthread_rwlock_wrlock(&rwlock); + // Modify shared data (exclusive) + pthread_rwlock_unlock(&rwlock); +} +``` + +## Pattern 7: Atomic Operations (Lockless) + +### ✅ CORRECT (C11 Atomics) +```c +#include + +atomic_int counter = ATOMIC_VAR_INIT(0); + +void increment() { + atomic_fetch_add(&counter, 1); // Thread-safe +} + +int get_value() { + return atomic_load(&counter); +} +``` + +### ✅ CORRECT (GCC Built-ins for embedded) +```c +static int counter = 0; + +void increment() { + __sync_fetch_and_add(&counter, 1); // Thread-safe because the built-in is atomic +} +``` + +## Pattern 8: Thread Cleanup + +### ✅ CORRECT +```c +typedef struct { + pthread_t thread; + bool running; + pthread_mutex_t mutex; +} ThreadContext; + +void stop_thread(ThreadContext* ctx) { + pthread_mutex_lock(&ctx->mutex); + ctx->running = false; + pthread_mutex_unlock(&ctx->mutex); + + pthread_join(ctx->thread, NULL); // Wait for termination + pthread_mutex_destroy(&ctx->mutex); +} +``` + +### ❌ INCORRECT (Resource Leak) +```c +void stop_thread(ThreadContext* ctx) { + ctx->running = false; // No synchronization + // Missing pthread_join - zombie thread! +} +``` + +## Pattern 9: Minimize Lock Hold Time + +### ✅ CORRECT +```c +void process_item(Queue* queue) { + Item* item = NULL; + + pthread_mutex_lock(&queue->mutex); + item = dequeue(queue); + pthread_mutex_unlock(&queue->mutex); // Release early + + if (item) { + expensive_operation(item); // Done outside lock + free(item); + } +} +``` + +### ❌ INCORRECT +```c +void process_item(Queue* queue) { + pthread_mutex_lock(&queue->mutex); + Item* item = dequeue(queue); + + if (item) { + expensive_operation(item); // Holding lock too long! + } + + pthread_mutex_unlock(&queue->mutex); +} +``` + +## Pattern 10: Thread-Local Storage + +### ✅ CORRECT (C11 standard — use when the toolchain/runtime supports C11 TLS) +```c +#include /* C11 threads API; availability varies by embedded toolchain */ + +_Thread_local char error_buffer[256]; /* Standard C11 TLS, but not universally available on all embedded toolchains */ + +void set_error(const char* msg) { + strncpy(error_buffer, msg, sizeof(error_buffer) - 1); + error_buffer[sizeof(error_buffer) - 1] = '\0'; +} + +const char* get_error() { + return error_buffer; /* Thread-safe */ +} +``` + +### ✅ CORRECT (POSIX — maximum portability for pre-C11 and embedded toolchains) +```c +#include + +static pthread_key_t error_key; +static pthread_once_t key_once = PTHREAD_ONCE_INIT; + +static void make_key(void) { + pthread_key_create(&error_key, free); +} + +void set_error(const char* msg) { + pthread_once(&key_once, make_key); + char* buf = pthread_getspecific(error_key); + if (!buf) { + buf = malloc(256); + if (!buf) return; + pthread_setspecific(error_key, buf); + } + strncpy(buf, msg, 255); + buf[255] = '\0'; +} + +const char* get_error() { + pthread_once(&key_once, make_key); + return pthread_getspecific(error_key); /* Thread-safe */ +} +``` + +### ⚠️ COMPILER EXTENSION ONLY (`__thread`) +```c +/* __thread is a GCC/Clang extension — not standard C. + * Avoid in new code; prefer _Thread_local (C11) or pthread_key_t (POSIX). */ +__thread char error_buffer[256]; +``` + +## Pattern 11: Lock Timeout (Avoid Infinite Wait) + +### ✅ CORRECT (use `CLOCK_MONOTONIC` to avoid NTP/time-change disruption) +```c +struct timespec timeout; +/* CLOCK_MONOTONIC is preferred over CLOCK_REALTIME: it is not affected by + * NTP adjustments or manual clock changes, so the timeout is always + * relative to wall-clock elapsed time. */ +clock_gettime(CLOCK_MONOTONIC, &timeout); +timeout.tv_sec += 5; // 5 second timeout + +int ret = pthread_mutex_timedlock(&mutex, &timeout); +if (ret == ETIMEDOUT) { + T2Error("Mutex lock timeout\n"); + return ERR_TIMEOUT; +} +``` + +> **Note:** `pthread_mutex_timedlock` requires an *absolute* time based on +> `CLOCK_REALTIME` per POSIX. To combine the monotonic elapsed measurement +> with the required real-time absolute value, compute the delta from +> `CLOCK_MONOTONIC` and add it to the current `CLOCK_REALTIME` value, or use +> a platform-specific extension such as +> `pthread_condattr_setclock(CLOCK_MONOTONIC)` with a condition variable. +> On Linux (glibc ≥ 2.30) `pthread_mutex_clocklock` accepts `CLOCK_MONOTONIC` +> directly and is the cleanest solution when available. + +## Pattern 12: Volatile for Hardware Registers + +### ✅ CORRECT +```c +volatile uint32_t* const hardware_reg = (uint32_t*)0x40000000; + +void write_register(uint32_t value) { + *hardware_reg = value; // Compiler won't optimize away +} + +uint32_t read_register() { + return *hardware_reg; // Always reads from hardware +} +``` + +### ❌ INCORRECT (Shared Variable Between Threads) +```c +volatile int counter = 0; // volatile != thread-safe! + +// Still need mutex or atomics for thread safety +``` + +## Red Flags to Look For + +1. **Shared data without mutex protection** +2. **Multiple locks acquired in different orders** +3. **Condition variable without predicate loop** +4. **Missing pthread_join (zombie threads)** +5. **Mutex not initialized before use** +6. **Lock held across blocking I/O** +7. **Recursive locks (potential for deadlock)** +8. **Global state modified without synchronization** +9. **Memory barriers missing in lockless code** +10. **ThreadSanitizer warnings ignored** + +## Common Deadlock Scenarios + +### Scenario 1: Lock Ordering Violation +```c +// Thread A: Lock(A) → Lock(B) +// Thread B: Lock(B) → Lock(A) +// Result: DEADLOCK +``` + +### Scenario 2: Lock and Wait +```c +pthread_mutex_lock(&mutex); +pthread_join(thread, NULL); // Waiting for thread that needs mutex +pthread_mutex_unlock(&mutex); // DEADLOCK +``` + +### Scenario 3: Callback Under Lock +```c +pthread_mutex_lock(&mutex); +user_callback(); // Callback tries to lock same mutex +pthread_mutex_unlock(&mutex); // DEADLOCK +``` + +## Testing for Thread Safety + +### Helgrind (Valgrind) +```bash +valgrind --tool=helgrind ./program +``` +Detects: +- Data races +- Lock order violations +- Improper use of POSIX threading primitives + +### ThreadSanitizer (TSan) +```bash +gcc -fsanitize=thread -g program.c -o program +./program +``` +Detects: +- Data races +- Deadlocks +- Use of uninitialized mutexes + +## Telemetry 2.0 Specific Patterns + +### TimeoutThread Pattern +```c +// TimeoutThread uses condition variable with timeout +pthread_mutex_lock(&profile->mutex); +while (profile->running) { + struct timespec timeout; + // Calculate next timeout + int ret = pthread_cond_timedwait(&profile->cond, + &profile->mutex, + &timeout); + if (ret == ETIMEDOUT) { + // Trigger report generation + } +} +pthread_mutex_unlock(&profile->mutex); +``` + +### rbus Async Handler Pattern +```c +// Short-lived thread for async rbus operations +void* asyncMethodHandler(void* arg) { + Message* msg = (Message*)arg; + + // Process message (no shared state modifications) + rbus_sendData(msg->method, msg->data); + + // Cleanup + free(msg->data); + free(msg); + + return NULL; +} +``` + +### Profile State Protection +```c +// All profile state changes must be protected +pthread_mutex_lock(&profile->mutex); +profile->state = STATE_COLLECTING; +profile->last_report_time = time(NULL); +pthread_mutex_unlock(&profile->mutex); +``` + +## Review Checklist for Threading Changes + +- [ ] Shared data identified and protected +- [ ] Lock ordering documented and consistent +- [ ] Condition variables used with predicate loop +- [ ] Thread stack size specified +- [ ] Thread cleanup (join/detach) implemented +- [ ] Mutexes initialized and destroyed +- [ ] No locks held across blocking operations +- [ ] Timeouts prevent infinite waits +- [ ] Error paths release locks +- [ ] Tested with helgrind/TSan diff --git a/.github/skills/memory-safety-analyzer/SKILL.md b/.github/skills/memory-safety-analyzer/SKILL.md new file mode 100644 index 00000000..5d2d9b29 --- /dev/null +++ b/.github/skills/memory-safety-analyzer/SKILL.md @@ -0,0 +1,227 @@ +--- +name: memory-safety-analyzer +description: Analyze C/C++ code for memory safety issues including leaks, use-after-free, buffer overflows, and provide fixes. Use when reviewing memory management, debugging crashes, or improving code safety. +--- + +# Memory Safety Analysis for Embedded C + +## Purpose + +Systematically analyze C/C++ code for memory safety issues that can cause crashes, security vulnerabilities, or resource exhaustion in embedded systems. + +## Usage + +Invoke this skill when: +- Reviewing new code with dynamic memory allocation +- Debugging memory-related crashes +- Analyzing legacy code for safety issues +- Preparing code for production deployment +- Investigating memory leaks or fragmentation + +## Analysis Process + +### Step 1: Identify All Allocations + +Search the code for: +- `malloc`, `calloc`, `realloc` +- `strdup`, `strndup` +- `fopen`, `open` +- `pthread_create`, `pthread_mutex_init` +- Custom allocation functions + +For each allocation, verify: +1. Return value is checked +2. Corresponding free/close exists +3. Error paths also free resources +4. No double-free possible + +### Step 2: Check Pointer Lifetimes + +For each pointer variable: +- When is it assigned? +- When is it freed? +- Can it be used after free? +- Can it outlive the data it points to? +- Is it NULL-initialized? +- Is it NULL-checked before use? + +### Step 3: Analyze Error Paths + +For each error return: +- Are all resources freed? +- Is cleanup done in correct order? +- Are error codes accurate? +- Is logging appropriate? + +### Step 4: Review Buffer Operations + +For string and memory operations: +- `strcpy` → should be `strncpy` with size check +- `sprintf` → should be `snprintf` with size +- `gets` → never use (remove immediately) +- `strcat` → verify buffer size +- `memcpy` → verify no overlap, validate size + +### Step 5: Static Analysis + +Run tools: +```bash +# Cppcheck +cppcheck --enable=all --inconclusive file.c + +# Clang static analyzer +scan-build make + +# Compiler warnings +gcc -Wall -Wextra -Werror file.c +``` + +### Step 6: Dynamic Analysis + +Run valgrind: +```bash +valgrind --leak-check=full \ + --show-leak-kinds=all \ + --track-origins=yes \ + --verbose \ + ./program +``` + +## Common Issues and Fixes + +### Issue: Unchecked malloc + +```c +// PROBLEM +char* buffer = malloc(size); +strcpy(buffer, input); // Crash if malloc failed + +// FIX +char* buffer = malloc(size); +if (!buffer) { + log_error("Failed to allocate %zu bytes", size); + return ERR_NO_MEMORY; +} +strncpy(buffer, input, size - 1); +buffer[size - 1] = '\0'; +``` + +### Issue: Memory leak on error + +```c +// PROBLEM +int process() { + char* buf = malloc(1024); + FILE* f = fopen("file.txt", "r"); + + if (!f) return -1; // Leaked buf + + // ... process ... + + free(buf); + fclose(f); + return 0; +} + +// FIX: Single exit with cleanup +int process() { + int ret = 0; + char* buf = NULL; + FILE* f = NULL; + + buf = malloc(1024); + if (!buf) { + ret = ERR_NO_MEMORY; + goto cleanup; + } + + f = fopen("file.txt", "r"); + if (!f) { + ret = ERR_FILE_OPEN; + goto cleanup; + } + + // ... process ... + +cleanup: + free(buf); + if (f) fclose(f); + return ret; +} +``` + +### Issue: Use after free + +```c +// PROBLEM +free(ptr); +if (ptr->field > 0) { ... } // Use after free! + +// FIX +int value = ptr->field; +free(ptr); +ptr = NULL; +if (value > 0) { ... } +``` + +### Issue: Double free + +```c +// PROBLEM +free(ptr); +// ... later ... +free(ptr); // Double free! + +// FIX: NULL after free +free(ptr); +ptr = NULL; +// ... later ... +free(ptr); // Safe: free(NULL) is a no-op +``` + +### Issue: Buffer overflow + +```c +// PROBLEM +char buffer[100]; +strcpy(buffer, user_input); // Overflow if input > 99 chars + +// FIX +char buffer[100]; +strncpy(buffer, user_input, sizeof(buffer) - 1); +buffer[sizeof(buffer) - 1] = '\0'; +``` + +## Output Format + +Provide findings as: + +``` +## Memory Safety Analysis + +### Critical Issues (must fix) +1. [file.c:123] Unchecked malloc - potential NULL dereference +2. [file.c:456] Memory leak on error path - buffer not freed +3. [file.c:789] Use after free - ptr used after free() + +### Warnings (should fix) +1. [file.c:234] strcpy used - prefer strncpy +2. [file.c:567] Missing NULL check before pointer use + +### Recommendations +1. Add cleanup label for resource management +2. Use RAII wrapper in tests +3. Run valgrind in CI pipeline + +### Suggested Fixes +[Provide specific code changes for each issue] +``` + +## Verification + +After fixes: +1. All static analysis warnings resolved +2. Valgrind shows no leaks +3. All tests pass +4. Code review by human +5. Memory footprint measured and acceptable diff --git a/.github/skills/platform-portability-checker/SKILL.md b/.github/skills/platform-portability-checker/SKILL.md new file mode 100644 index 00000000..354fce3c --- /dev/null +++ b/.github/skills/platform-portability-checker/SKILL.md @@ -0,0 +1,318 @@ +--- +name: platform-portability-checker +description: Verify C/C++ code is platform-independent and portable across embedded platforms. Use when reviewing code for cross-platform deployment or preparing for new hardware targets. +--- + +# Platform Portability Checker + +## Purpose + +Ensure C/C++ code is portable across different embedded platforms, architectures, and operating systems without modification. + +## When to Use + +- Reviewing new code before merge +- Porting to new hardware platform +- Preparing release for multiple architectures +- Investigating platform-specific bugs +- Refactoring legacy platform-specific code + +## Portability Checklist + +### 1. Integer Types + +**Check for**: Use of `int`, `long`, `short` without fixed sizes + +```c +// PROBLEM: Size varies by platform +int counter; // 16, 32, or 64 bits? +long timestamp; // 32 or 64 bits? +short flag; // 16 bits on most, but not guaranteed + +// FIX: Use stdint.h types +#include + +uint32_t counter; // Always 32 bits +uint64_t timestamp; // Always 64 bits +uint16_t flag; // Always 16 bits + +// For size_t operations +size_t length; // Pointer-sized unsigned +ssize_t result; // Pointer-sized signed +``` + +### 2. Pointer Assumptions + +**Check for**: Pointer arithmetic, casting, size assumptions + +```c +// PROBLEM: Assumes pointer == long +long ptr_value = (long)ptr; // Fails on 64-bit with 32-bit long + +// FIX: Use uintptr_t +#include +uintptr_t ptr_value = (uintptr_t)ptr; + +// PROBLEM: Pointer used as integer +if (ptr & 0x1) { ... } // What size is ptr? + +// FIX: Be explicit +if ((uintptr_t)ptr & 0x1) { ... } +``` + +### 3. Endianness + +**Check for**: Multi-byte values sent over network or stored to disk + +```c +// PROBLEM: Host byte order assumed +uint32_t value = 0x12345678; +fwrite(&value, 4, 1, file); // Different on LE vs BE + +// FIX: Explicit byte order +#include // For htonl, ntohl + +uint32_t host_value = 0x12345678; +uint32_t network_value = htonl(host_value); +fwrite(&network_value, 4, 1, file); + +// For reading +uint32_t network_value; +fread(&network_value, 4, 1, file); +uint32_t host_value = ntohl(network_value); +``` + +### 4. Structure Packing + +**Check for**: Structures sent over network or saved to disk + +```c +// PROBLEM: Padding varies by platform +struct { + uint8_t type; + uint32_t value; // Padding before this? + uint16_t flags; // Padding before this? +} data; + +// FIX: Explicit packing +struct __attribute__((packed)) { + uint8_t type; + uint32_t value; + uint16_t flags; +} data; + +// Or control padding explicitly +struct { + uint8_t type; + uint8_t padding[3]; // Explicit padding + uint32_t value; + uint16_t flags; + uint16_t padding2; +} data; +``` + +### 5. Boolean Type + +**Check for**: Using int/char for boolean + +```c +// PROBLEM: Non-standard boolean +int flag; // Really 3 states: 0, 1, other +char enabled; // Also used for booleans + +// FIX: Use stdbool.h +#include + +bool flag; +bool enabled; + +if (flag) { ... } // Clear intent +``` + +### 6. Character Sets + +**Check for**: Assumptions about ASCII or character encoding + +```c +// PROBLEM: Assumes ASCII +if (ch >= 'A' && ch <= 'Z') { + ch = ch + 32; // Convert to lowercase? +} + +// FIX: Use standard functions +#include + +if (isupper(ch)) { + ch = tolower(ch); +} +``` + +### 7. File Paths + +**Check for**: Hard-coded path separators + +```c +// PROBLEM: Unix-specific +const char* path = "/tmp/telemetry/data.log"; + +// FIX: Use platform-agnostic approach +#ifdef _WIN32 + #define PATH_SEP "\\" + const char* tmp_dir = getenv("TEMP"); +#else + #define PATH_SEP "/" + const char* tmp_dir = "/tmp"; +#endif + +char path[256]; +snprintf(path, sizeof(path), "%s%stelemetry%sdata.log", + tmp_dir, PATH_SEP, PATH_SEP); +``` + +### 8. System Calls + +**Check for**: Platform-specific syscalls + +```c +// PROBLEM: Linux-specific +#include +int fd = epoll_create(10); + +// FIX: Abstraction layer +// In platform.h +#if defined(__linux__) + #include "platform_linux.h" +#elif defined(__APPLE__) + #include "platform_darwin.h" +#else + #error "Unsupported platform" +#endif + +// Each platform provides same interface +event_loop_t* create_event_loop(void); +``` + +### 9. Compiler Extensions + +**Check for**: GCC/Clang specific features + +```c +// PROBLEM: GCC-specific +typeof(x) y = x; +int array[0]; // Zero-length array + +// FIX: Use C11 standard features +__auto_type y = x; // C11 + +// Or avoid non-standard features +// Define proper types instead +``` + +### 10. Include Paths + +**Check for**: Platform-specific headers + +```c +// PROBLEM: Assumes Linux headers +#include + +// FIX: Use standard headers or configure check +#ifdef HAVE_LINUX_LIMITS_H + #include +#else + #include +#endif + +// Or use autoconf to detect +// In configure.ac: +// AC_CHECK_HEADERS([linux/limits.h limits.h]) +``` + +## Build System Integration + +### configure.ac checks + +```autoconf +# Check for required features +AC_C_BIGENDIAN +AC_CHECK_SIZEOF([int]) +AC_CHECK_SIZEOF([long]) +AC_CHECK_SIZEOF([void *]) + +# Check for headers +AC_CHECK_HEADERS([stdint.h stdbool.h endian.h]) + +# Check for functions +AC_CHECK_FUNCS([htonl ntohl]) + +# Platform-specific code +case "$host" in + *-linux*) + AC_DEFINE([PLATFORM_LINUX], [1]) + ;; + arm*|*-arm*) + AC_DEFINE([PLATFORM_ARM], [1]) + ;; +esac +``` + +## Testing + +### Cross-Compilation Test + +```bash +# Test building for different architectures +./configure --host=arm-linux-gnueabihf +make clean && make + +./configure --host=x86_64-linux-gnu +make clean && make + +./configure --host=mips-linux-gnu +make clean && make +``` + +### Endianness Test + +```c +// Test endianness handling +uint32_t value = 0x12345678; +uint32_t network = htonl(value); +uint32_t restored = ntohl(network); +assert(value == restored); + +// Verify structure packing +assert(sizeof(packed_struct_t) == EXPECTED_SIZE); +``` + +## Output Format + +``` +## Platform Portability Analysis + +### Critical Issues +1. [file.c:123] Using `long` for timestamp - not fixed width +2. [file.c:456] Writing struct directly to network - endianness issue +3. [file.c:789] Assuming 32-bit pointers + +### Warnings +1. [file.c:234] Using int for boolean - prefer stdbool.h +2. [file.c:567] Hard-coded Unix path separator + +### Recommendations +1. Add configure checks for required headers +2. Create platform abstraction layer +3. Test build on multiple architectures + +### Suggested Fixes +[Specific code changes for each issue] +``` + +## Verification + +- Code compiles on target platforms +- Tests pass on all platforms +- Static analysis clean +- No endianness issues +- No alignment issues +- Structure sizes verified diff --git a/.github/skills/quality-checker/README.md b/.github/skills/quality-checker/README.md new file mode 100644 index 00000000..ce3c6b7a --- /dev/null +++ b/.github/skills/quality-checker/README.md @@ -0,0 +1,72 @@ +# Quality Checker Skill + +Run comprehensive quality checks in the standard test container through chat interface. + +## Quick Start + +Simply ask Copilot to run quality checks in natural language: + +```text +Run quality checks +``` + +```text +Check memory safety +``` + +```text +Run static analysis on source/bulkdata +``` + +## What Gets Checked + +1. **Static Analysis**: cppcheck + shellcheck +2. **Memory Safety**: valgrind leak detection +3. **Thread Safety**: helgrind race/deadlock detection +4. **Build Verification**: strict warnings compilation + +## Environment + +Runs in the same container as CI/CD: + +- Image: `ghcr.io/rdkcentral/docker-device-mgt-service-test/native-platform:latest` +- All tools pre-installed +- Consistent with automated tests + +## Example Invocations + +| What to say | What it does | +| ----------- | ------------ | +| "Run quality checks" | Full suite, summary report | +| "Quick static analysis" | cppcheck + shellcheck only | +| "Check for memory leaks" | valgrind on test binaries | +| "Verify build with strict warnings" | Build with -Werror | +| "Run all checks on source/utils" | Full suite, scoped to utils | + +## Typical Workflow + +1. **Before committing**: "Run static analysis" +2. **Before push**: "Run quality checks" +3. **Debugging crash**: "Check memory safety" +4. **Reviewing PR**: "Run all checks" + +## Output + +You'll receive: + +- Summary of issues found +- Critical problems highlighted +- Links to detailed reports +- Recommendations for fixes + +## Prerequisites + +- Docker installed and running +- Access to GitHub Container Registry (automatic in CI/CD, may need login locally) + +## Tips + +- Start with static analysis (fastest) +- Run memory checks after static analysis passes +- Scope checks to changed files for speed +- Full suite before pushing to develop branch diff --git a/.github/skills/quality-checker/SKILL.md b/.github/skills/quality-checker/SKILL.md new file mode 100644 index 00000000..7abc3b8c --- /dev/null +++ b/.github/skills/quality-checker/SKILL.md @@ -0,0 +1,325 @@ +--- +name: quality-checker +description: Run comprehensive quality checks (static analysis, memory safety, thread safety, build verification) in the standard test container. Use when validating code changes or debugging before committing. +--- + +# Container-Based Quality Checker + +## Purpose + +Execute comprehensive quality checks on the codebase using the same containerized environment as CI/CD pipelines. Ensures consistency between local development and automated testing. + +## Usage + +Invoke this skill when: +- Validating changes before committing +- Debugging build or test failures +- Running quality checks locally +- Verifying memory safety of new code +- Checking for thread safety issues +- Performing static analysis + +You can run all checks or select specific ones based on your needs. + +## What It Does + +This skill runs quality checks inside the official test container (`ghcr.io/rdkcentral/docker-device-mgt-service-test/native-platform:latest`), which includes: +- Build tools (gcc, autotools, make) +- Static analysis tools (cppcheck, shellcheck) +- Memory analysis tools (valgrind) +- Thread analysis tools (helgrind) +- Google Test/Mock frameworks + +## Available Checks + +### 1. Static Analysis +- **cppcheck**: Comprehensive C/C++ static code analyzer +- **shellcheck**: Shell script linter +- **Output**: XML report with findings + +### 2. Memory Safety (Valgrind) +- **Memory leak detection**: Finds unreleased allocations +- **Use-after-free detection**: Catches dangling pointer usage +- **Invalid memory access**: Buffer overflows, uninitialized reads +- **Output**: XML and log files per test binary + +### 3. Thread Safety (Helgrind) +- **Race condition detection**: Finds unsynchronized shared memory access +- **Deadlock detection**: Identifies lock ordering issues +- **Lock usage verification**: Validates proper synchronization +- **Output**: XML and log files per test binary + +### 4. Build Verification +- **Strict compilation**: Builds with `-Wall -Wextra -Werror` +- **Test build**: Verifies tests compile +- **Binary analysis**: Reports size and dependencies +- **Output**: Build artifacts and size report + +## Execution Process + +### Step 1: Setup Container Environment + +Pull the latest test container: +```bash +docker pull ghcr.io/rdkcentral/docker-device-mgt-service-test/native-platform:latest +``` + +Start container with workspace mounted: +```bash +docker run -d --name native-platform \ + -v /path/to/workspace:/mnt/workspace \ + ghcr.io/rdkcentral/docker-device-mgt-service-test/native-platform:latest +``` + +### Step 2: Run Selected Checks + +Execute the requested quality checks inside the container: + +**Static Analysis:** +```bash +docker exec -i native-platform /bin/bash -c " + cd /mnt/workspace && \ + cppcheck --enable=all \ + --inconclusive \ + --suppress=missingIncludeSystem \ + --suppress=unmatchedSuppression \ + --error-exitcode=0 \ + --xml \ + --xml-version=2 \ + source/ 2> cppcheck-report.xml +" +``` + +**Shell Script Checks:** +```bash +docker exec -i native-platform /bin/bash -c " + cd /mnt/workspace && \ + find . -name '*.sh' -type f -exec shellcheck {} + +" +``` + +**Memory Safety:** +```bash +docker exec -i native-platform /bin/bash -c " + cd /mnt/workspace && \ + autoreconf -fi && \ + ./configure --enable-gtest && \ + make -j\$(nproc) && \ + find source/test -type f -executable -name '*test*' | while read test_bin; do + valgrind --leak-check=full \ + --show-leak-kinds=all \ + --track-origins=yes \ + --xml=yes \ + --xml-file=\"valgrind-\$(basename \$test_bin).xml\" \ + \"\$test_bin\" 2>&1 | tee \"valgrind-\$(basename \$test_bin).log\" + done +" +``` + +**Thread Safety:** +```bash +docker exec -i native-platform /bin/bash -c " + cd /mnt/workspace && \ + find source/test -type f -executable -name '*test*' | while read test_bin; do + valgrind --tool=helgrind \ + --track-lockorders=yes \ + --xml=yes \ + --xml-file=\"helgrind-\$(basename \$test_bin).xml\" \ + \"\$test_bin\" 2>&1 | tee \"helgrind-\$(basename \$test_bin).log\" + done +" +``` + +**Build Verification:** +```bash +docker exec -i native-platform /bin/bash -c " + cd /mnt/workspace && \ + autoreconf -fi && \ + ./configure --enable-gtest CFLAGS='-Wall -Wextra -Werror' && \ + make -j\$(nproc) && \ + if [ -f 'telemetry2_0' ]; then + ls -lh telemetry2_0 + file telemetry2_0 + size telemetry2_0 + fi +" +``` + +### Step 3: Report Results + +Parse and summarize results for the user: +- Number of issues found by category +- Critical issues requiring immediate attention +- Warnings that should be addressed +- Memory leaks with stack traces +- Race conditions or deadlock risks +- Build errors or warnings + +### Step 4: Cleanup + +Stop and remove the container: +```bash +docker stop native-platform +docker rm native-platform +``` + +## Interpreting Results + +### Static Analysis (cppcheck) +- **error**: Critical issues that must be fixed +- **warning**: Potential problems to review +- **style**: Code style improvements +- **performance**: Optimization opportunities + +### Memory Safety (Valgrind) +- **definitely lost**: Memory leaks requiring fixes +- **indirectly lost**: Leaks from lost parent structures +- **possibly lost**: Potential leaks to investigate +- **still reachable**: Memory held at exit (usually OK) +- **Invalid read/write**: Buffer overflow (CRITICAL) +- **Use of uninitialized value**: Must initialize before use + +### Thread Safety (Helgrind) +- **Possible data race**: Unsynchronized access to shared data +- **Lock order violation**: Potential deadlock scenario +- **Unlocking unlocked lock**: Synchronization bug +- **Thread still holds locks**: Resource leak + +### Build Verification +- **Compilation errors**: Must fix before proceeding +- **Warnings**: Review and fix (builds with -Werror) +- **Binary size**: Monitor for embedded constraints + +## User Interaction + +When invoked, ask the user: + +1. **Which checks to run?** + - All checks (comprehensive) + - Static analysis only (fast) + - Memory safety only + - Thread safety only + - Build verification only + - Custom combination + +2. **Scope:** + - Full codebase + - Specific directories + - Recently changed files + +3. **Report detail:** + - Summary only (counts and critical issues) + - Detailed (all findings) + - Full raw output + +## Example Invocations + +**User**: "Run quality checks" +- Default: Run all checks on full codebase, provide summary + +**User**: "Check memory safety" +- Run only valgrind checks, detailed report + +**User**: "Quick static analysis" +- Run cppcheck and shellcheck, summary only + +**User**: "Verify my changes build" +- Run build verification with strict warnings + +**User**: "Full analysis on source/bulkdata" +- Run all checks scoped to bulkdata directory + +## Best Practices + +1. **Run before committing**: Catch issues early +2. **Start with static analysis**: Fastest feedback +3. **Run memory checks on test binaries**: Most effective +4. **Review thread safety for concurrent code**: Essential for multi-threaded components +5. **Monitor binary size**: Important for embedded targets + +## Integration with Development Workflow + +1. **Pre-commit**: Quick static analysis +2. **Pre-push**: Full quality check suite +3. **Debugging**: Targeted memory/thread analysis +4. **Code review**: Validate reviewer feedback +5. **Refactoring**: Ensure no regressions + +## Advantages Over Manual Testing + +- **Consistency**: Same environment as CI/CD +- **Completeness**: All tools in one command +- **Reproducibility**: Container ensures identical results +- **Efficiency**: No local tool installation needed +- **Confidence**: Pass locally = pass in CI + +## Output Files Generated + +- `cppcheck-report.xml`: Static analysis findings +- `valgrind-.xml`: Memory issues per test +- `valgrind-.log`: Detailed memory logs +- `helgrind-.xml`: Thread safety issues per test +- `helgrind-.log`: Detailed concurrency logs + +These files can be uploaded as artifacts or reviewed locally. + +## Limitations + +- Requires Docker with GitHub Container Registry access +- Container pulls can be slow on first run (cached afterward) +- Full suite can take several minutes depending on codebase size +- Valgrind slows execution significantly (expected) + +## Tips for Faster Execution + +1. Use cached container images (don't pull every time) +2. Run static analysis first (fastest) +3. Scope checks to changed directories +4. Run memory/thread checks only on affected tests +5. Use parallel execution where possible + +## Skill Execution Logic + +When user invokes this skill: + +1. **Authenticate with GitHub Container Registry** + - Use github.actor and GITHUB_TOKEN if available + - Otherwise prompt for credentials or skip private registries + +2. **Pull container image** + - Check if image exists locally + - Pull only if needed or if --force specified + +3. **Start container** + - Mount workspace at /mnt/workspace + - Use unique container name (quality-checker-) + - Run in detached mode + +4. **Execute requested checks** + - Run checks in sequence + - Capture output + - Continue on errors (collect all findings) + +5. **Collect results** + - Copy result files from container + - Parse XML/log outputs + - Categorize findings + +6. **Report to user** + - Summary count + - Critical issues highlighted + - Link to detailed reports + - Next steps recommendations + +7. **Cleanup** + - Stop container + - Remove container + - Optional: clean up result files + +## Error Handling + +- **Container pull fails**: Report error, suggest manual pull +- **Container start fails**: Check Docker daemon, ports, permissions +- **Build fails**: Report build errors, stop further checks +- **Tools missing**: Verify container version, report missing tools +- **Out of memory**: Suggest increasing Docker memory limit diff --git a/.github/skills/technical-documentation-writer/SKILL.md b/.github/skills/technical-documentation-writer/SKILL.md new file mode 100644 index 00000000..4500a80e --- /dev/null +++ b/.github/skills/technical-documentation-writer/SKILL.md @@ -0,0 +1,707 @@ +--- +name: technical-documentation-writer +description: Create and maintain comprehensive technical documentation for embedded systems projects. Use for architecture docs, API references, developer guides, and system documentation following best practices. +--- + +# Technical Documentation Writer for Embedded Systems + +## Purpose + +Create clear, comprehensive, and maintainable technical documentation for embedded C/C++ projects, with focus on architecture, APIs, threading models, memory management, and platform integration. Specifically tailored for the NetworkManager Thunder plugin project. + +## Usage + +Invoke this skill when: +- Documenting NetworkManager features (WiFi, Ethernet, connectivity) +- Creating system architecture documentation +- Writing Thunder plugin API reference documentation +- Documenting threading and synchronization models (DBus, JSONRPC) +- Creating developer onboarding guides +- Documenting debugging procedures (network issues, STUN, connectivity) +- Writing integration guides for platform vendors (Gnome NM, RDK NSM backends) + +## Documentation Structure + +### Directory Layout + +``` +project/ +├── README.md # Project overview, quick start +├── docs/ # General documentation +│ ├── README.md # Documentation index +│ ├── architecture/ # System architecture +│ │ ├── overview.md # High-level architecture +│ │ ├── component-diagram.md # Component relationships +│ │ ├── threading-model.md # Threading architecture +│ │ └── data-flow.md # Data flow diagrams +│ ├── api/ # API documentation +│ │ ├── public-api.md # Public API reference +│ │ └── internal-api.md # Internal API reference +│ ├── integration/ # Integration guides +│ │ ├── build-setup.md # Build environment setup +│ │ ├── platform-porting.md # Porting to new platforms +│ │ └── testing.md # Test procedures +│ └── troubleshooting/ # Debug guides +│ ├── memory-issues.md # Memory debugging +│ ├── threading-issues.md # Thread debugging +│ └── common-errors.md # Common error solutions +└── plugin/ # Source code + └── docs/ # Component-specific docs + ├── connectivity/ # Mirrors source structure + │ ├── README.md # Component overview + │ └── connectivity-test.md + ├── gnome/ + │ ├── README.md + │ └── dbus-architecture.md + └── wifi/ + ├── README.md + └── scanning-algorithm.md +``` + +### Document Types + +#### 1. **Architecture Documentation** (`docs/architecture/`) +- System overview and design principles +- Component relationships and dependencies +- Threading and concurrency models +- Data flow and state machines +- Memory management strategies +- Platform abstraction layers + +#### 2. **API Documentation** (`docs/api/`) +- Public API reference with examples +- Internal API documentation +- Function contracts and preconditions +- Thread-safety guarantees +- Memory ownership semantics +- Error handling conventions + +#### 3. **Component Documentation** (`source/docs/`) +- Per-component technical details +- Algorithm explanations +- Implementation notes +- Performance characteristics +- Resource usage (memory, CPU, threads) +- Dependencies and interfaces + +#### 4. **Integration Guides** (`docs/integration/`) +- Build system setup +- Platform porting guides +- Configuration options +- Testing procedures +- Deployment checklists + +#### 5. **Troubleshooting Guides** (`docs/troubleshooting/`) +- Common error scenarios +- Debug techniques +- Log analysis +- Memory profiling +- Thread race detection + +## Documentation Process + +### Step 1: Analyze the Code + +Before writing documentation: + +1. **Read the source code** - Understand implementation +2. **Identify key abstractions** - Classes, structs, modules +3. **Map dependencies** - What calls what, data flow +4. **Find synchronization** - Mutexes, conditions, atomics +5. **Trace resource lifecycle** - Allocations, ownership, cleanup +6. **Review existing docs** - Check for patterns and style + +### Step 2: Create Structure + +For each component: + +```markdown +# Component Name + +## Overview +Brief 2-3 sentence description of purpose and role. + +## Architecture +High-level design with diagrams. + +## Key Components +List main structures, functions, modules. + +## Threading Model +How threads interact, synchronization primitives. + +## Memory Management +Allocation patterns, ownership, lifecycle. + +## API Reference +Public functions with signatures and examples. + +## Usage Examples +Common use cases with code snippets. + +## Error Handling +Error codes, failure modes, recovery. + +## Performance Considerations +Resource usage, bottlenecks, optimization tips. + +## Platform Notes +Platform-specific behavior or requirements. + +## Testing +How to test, test coverage, known issues. + +## See Also +Cross-references to related documentation. +``` + +### Step 3: Add Diagrams + +Use Mermaid for visual documentation: + +#### Component Diagram +```mermaid +graph TB + A[Thunder Framework] --> B[NetworkManager Plugin] + B --> C[JSONRPC Handler] + B --> D[Implementation Layer] + D --> E[Gnome Backend] + D --> F[RDK Backend] + E --> G[DBus/NetworkManager] + F --> H[Network Service Manager] + C --> I[COM-RPC Interface] +``` + +#### Sequence Diagram +```mermaid +sequenceDiagram + participant Client + participant Plugin + participant Backend + participant DBus + + Client->>Plugin: GetAvailableInterfaces + Plugin->>Plugin: Lock mutex + Plugin->>Backend: Query interfaces + Backend->>DBus: GetDevices + DBus-->>Backend: Device list + Backend-->>Plugin: Interface data + Plugin-->>Client: JSON response + Plugin->>Plugin: Unlock mutex +``` + +#### State Diagram +```mermaid +stateDiagram-v2 + [*] --> Uninitialized + Uninitialized --> Initialized: init() + Initialized --> Running: start() + Running --> Paused: pause() + Paused --> Running: resume() + Running --> Stopped: stop() + Stopped --> [*] +``` + +#### Data Flow Diagram +```mermaid +flowchart LR + A[Network Event] --> B{Event Type} + B -->|Interface State| C[State Change] + B -->|IP Changed| D[IP Update] + C --> E[Event Handler] + D --> E + E --> F[Notification Generator] + F --> G[Thunder Event] +``` + +### Step 4: Add Code Examples + +Provide clear, compilable examples: + +#### Good Example Structure +```markdown +### Example: Getting Available Interfaces + +This example shows how to query available network interfaces. + +**Prerequisites:** +- NetworkManager plugin initialized +- Valid Thunder connection + +**Code:** +```c +#include "INetworkManager.h" +#include + +int main(void) { + INetworkManager* manager = NULL; + InterfaceList interfaces = {}; + int ret = 0; + + // Get NetworkManager instance + ret = GetNetworkManager(&manager); + if (ret != 0) { + fprintf(stderr, "Failed to get NetworkManager: %d\n", ret); + return -1; + } + + // Get available interfaces + ret = manager->GetAvailableInterfaces(&interfaces); + if (ret != 0) { + fprintf(stderr, "Failed to get interfaces: %d\n", ret); + ReleaseNetworkManager(manager); + return -1; + } + + // Process interfaces + for (size_t i = 0; i < interfaces.count; i++) { + printf("Interface: %s, Type: %s, Enabled: %d\n", + interfaces.list[i].name, + interfaces.list[i].type, + interfaces.list[i].enabled); + } + + // Cleanup + FreeInterfaceList(&interfaces); + ReleaseNetworkManager(manager); + return 0; +} +``` + +**Expected Output:** +``` +Interface: eth0, Type: ETHERNET, Enabled: 1 +Interface: wlan0, Type: WIFI, Enabled: 1 +``` + +**Notes:** +- Always check return values +- Call cleanup functions even on error paths +- Interface names are platform-dependent +``` +\`\`\` + +### Step 5: Document APIs + +For each public function: + +```markdown +### GetAvailableInterfaces() + +Retrieves the list of available network interfaces on the device. + +**Signature:** +```c +Core::hresult GetAvailableInterfaces( + INetworkManager::IInterfaceDetailsIterator*& interfaces) const; +``` + +**Parameters:** +- `interfaces` - Output iterator for interface details (must be non-NULL) + +**Returns:** +- `Core::ERROR_NONE` - Success +- `Core::ERROR_GENERAL` - Failed to query backend +- `Core::ERROR_UNAVAILABLE` - Backend not initialized + +**Thread Safety:** +Thread-safe. Uses internal mutex for backend access. + +**Memory:** +Allocates iterator object. Caller must call Release() on iterator. + +**Example:** +See [Example: Getting Available Interfaces](#example-getting-available-interfaces) + +**See Also:** +- GetPrimaryInterface() +- SetInterfaceState() +- GetInterfaceState() +``` + +### Step 6: Document Threading + +For multi-threaded components: + +```markdown +## Threading Model + +### Thread Overview + +| Thread Name | Purpose | Priority | Stack Size | +|------------|---------|----------|------------| +| Thunder Main | Plugin lifecycle, JSONRPC dispatch | Normal | Default | +| DBus Event | Backend event processing | High | 64KB | +| Connectivity | Internet connectivity tests | Low | 64KB | +| STUN Client | Public IP discovery | Low | 32KB | + +### Synchronization Primitives + +```c +// Global mutexes +static std::mutex interface_mutex; // Interface state management +static std::mutex backend_mutex; // Backend access serialization + +// Condition variables +static std::condition_variable connectivity_cv; // Connectivity test sync +static std::condition_variable stun_cv; // STUN completion +``` + +### Lock Ordering + +To prevent deadlocks, always acquire locks in this order: + +1. `backend_mutex` (backend operations) +2. `interface_mutex` (interface list) +3. Individual interface state locks + +**Example:** +```cpp +// CORRECT: Proper lock ordering +std::lock_guard backend_lock(backend_mutex); +auto interface = GetInterface(name); +std::lock_guard if_lock(interface->mutex); +// ... use both resources ... + +// WRONG: Deadlock risk! +std::lock_guard if_lock(interface->mutex); +std::lock_guard backend_lock(backend_mutex); // May deadlock! +``` + +### Thread Safety Guarantees + +| Function | Thread Safety | Notes | +|----------|---------------|-------| +| GetAvailableInterfaces() | Thread-safe | Uses backend_mutex | +| SetInterfaceState() | Thread-safe | Uses backend_mutex | +| WiFiConnect() | Thread-safe | Async operation | +| IsConnectedToInternet() | Thread-safe | Uses connectivity_cv | +``` + +### Step 7: Document Memory Management + +```markdown +## Memory Management + +### Allocation Patterns + +```mermaid +graph TD + A[NetworkManager::Initialize] --> B[Allocate backend proxy] + B --> C[Connect to DBus] + C --> D[Register signal handlers] + E[GetAvailableInterfaces] --> F[Query backend] + F --> G[Allocate interface list] + H[Deinitialize] --> I[Disconnect DBus] + I --> J[Free backend proxy] +``` + +### Ownership Rules + +1. **Interface objects**: Managed by NetworkManager implementation +2. **JSON strings**: Caller owns returned strings; must free +3. **DBus proxies**: Owned by backend; lifetime tied to plugin + +### Lifecycle Example + +```cpp +// Initialization phase +NetworkManager* nm = new NetworkManager(); +nm->Initialize(service); // Connects to backend + +// Operation phase +string result; +nm->GetAvailableInterfaces(result); // Returns JSON string +// Use result... + +// WiFi operation +nm->StartWiFiScan(); // Async operation +// Notifications will arrive via events + +// Cleanup phase +nm->Deinitialize(); // Disconnects backend +delete nm; +``` + +### Memory Budget + +Typical memory usage per component: + +| Component | Static | Dynamic (per item) | Notes | +|-----------|--------|-------------------|-------| +| Plugin Instance | 2KB | - | Single instance | +| Backend Proxy | 512 bytes | +128 bytes/interface | Max ~10 interfaces | +| WiFi Scan Results | 0 | 256 bytes/AP | Temporary, freed after delivery | +| Event Handlers | 1KB | - | Signal subscriptions | + +**Total typical footprint**: ~5-10KB (plugin + backend + interfaces) +``` + +## Best Practices + +### Writing Style + +1. **Be Concise**: Get to the point quickly +2. **Be Specific**: Use exact terms, not vague descriptions +3. **Be Accurate**: Test all code examples +4. **Be Complete**: Don't leave critical details unstated +5. **Be Consistent**: Follow established patterns + +### Code Examples + +- **Always compile-test** examples before documenting +- **Show error handling** - embedded systems need robust code +- **Include cleanup** - demonstrate proper resource management +- **Add context** - explain when/why to use the code +- **Keep focused** - one example, one concept + +### Diagrams + +- **Use Mermaid** for all diagrams (version control friendly) +- **Keep simple** - max 10-12 nodes per diagram +- **Label clearly** - all arrows and nodes need names +- **Show flow** - make direction obvious +- **Add legends** - explain symbols if needed + +### Cross-References + +Link related documentation: + +```markdown +## See Also + +- [Threading Model](../architecture/threading-model.md) - Overall thread architecture +- [Connection Pool API](connection-pool.md) - Pool management functions +- [Error Codes](../api/error-codes.md) - Complete error code reference +- [Build Guide](../integration/build-setup.md) - Compilation instructions +``` + +### Platform-Specific Notes + +Always document platform variations: + +```markdown +## Platform Notes + +### Linux +- Uses C++11 or later +- Requires libnm (NetworkManager library) or RDK Network Service Manager +- DBus communication for backend +- Thunder framework dependency + +### RDK Devices +- Integration with RDK logger (rdk_debug.h) +- IARM bus support for legacy API compatibility +- Backend selection: Gnome NetworkManager or RDK NSM +- Platform-specific WiFi drivers + +### Constraints +- **Memory**: Tested with 64MB minimum +- **CPU**: ARMv7 or better +- **Network**: Ethernet and/or WiFi hardware +- **Dependencies**: Thunder, DBus, backend (libnm or NSM) +``` + +## Output Format + +### Component Documentation Template + +```markdown +# [Component Name] + +## Overview + +[2-3 sentence description] + +## Architecture + +[High-level design explanation] + +### Component Diagram +```mermaid +[Component relationship diagram] +``` + +## Key Components + +### [Structure/Type Name] + +[Description] + +```c +typedef struct { + // Fields with comments +} structure_t; +``` + +## Threading Model + +[Thread safety and synchronization] + +## Memory Management + +[Allocation patterns and ownership] + +## API Reference + +### [function_name()] + +[Full API documentation] + +## Usage Examples + +### Example: [Use Case] + +[Complete working example] + +## Error Handling + +[Error codes and recovery] + +## Performance + +[Resource usage and bottlenecks] + +## Testing + +[Test procedures and coverage] + +## See Also + +[Cross-references] +``` + +## Quality Checklist + +Before considering documentation complete: + +- [ ] All public APIs documented with signatures +- [ ] At least one working code example per major function +- [ ] Thread safety explicitly stated +- [ ] Memory ownership clearly documented +- [ ] Error codes and meanings listed +- [ ] Diagrams for complex flows +- [ ] Cross-references to related docs +- [ ] Platform-specific notes included +- [ ] Code examples compile and run +- [ ] Grammar and spelling checked +- [ ] Reviewed by component author + +## Maintenance + +Documentation is code: + +1. **Update with code changes** - docs and code change together +2. **Version documentation** - tag with releases +3. **Review periodically** - ensure accuracy quarterly +4. **Fix broken links** - validate references +5. **Deprecate carefully** - mark old features clearly + +### Deprecation Notice Template + +```markdown +## DEPRECATED: old_function() + +⚠️ **This function is deprecated as of v2.1.0** + +**Reason**: Memory leak risk in error paths + +**Alternative**: Use new_function() instead + +**Migration Example**: +```c +// Old way (deprecated) +old_function(param); + +// New way +new_function(param); +``` + +**Removal**: Scheduled for v3.0.0 (Est. Q2 2026) +``` + +## Tools Integration + +### Generate API Docs from Code + +Use Doxygen-style comments in code: + +```c +/** + * @brief Get available network interfaces + * + * Retrieves the list of all network interfaces (Ethernet, WiFi) available + * on the device along with their current state and configuration. + * + * @param[out] interfaces Iterator for interface details + * + * @return Core::ERROR_NONE on success, error code on failure + * @retval Core::ERROR_NONE Success + * @retval Core::ERROR_GENERAL Backend query failed + * @retval Core::ERROR_UNAVAILABLE Backend not initialized + * + * @note Thread-safe + * @see GetPrimaryInterface(), SetInterfaceState() + * + * @par Example: + * @code + * IInterfaceDetailsIterator* interfaces = nullptr; + * uint32_t result = networkMgr->GetAvailableInterfaces(interfaces); + * if (result == Core::ERROR_NONE) { + * // Process interfaces... + * interfaces->Release(); + * } + * @endcode + */ +Core::hresult GetAvailableInterfaces( + IInterfaceDetailsIterator*& interfaces) const; +``` + +### Diagram Tools + +- **Mermaid Live Editor**: https://mermaid.live +- **VS Code Markdown Preview**: Built-in mermaid support +- **Documentation generators**: Can embed mermaid in output + +## Troubleshooting Common Documentation Issues + +### Issue: Code example doesn't compile + +**Solution**: Always test examples in isolation +```bash +# Extract example to test file +cat > test_example.c << 'EOF' +[paste example code] +EOF + +# Compile with project flags +gcc -Wall -Wextra -I../include test_example.c -o test_example + +# Run to verify +./test_example +``` + +### Issue: Diagram is too complex + +**Solution**: Break into multiple diagrams +- One high-level overview diagram +- Multiple focused detail diagrams +- Link them together in text + +### Issue: Outdated documentation + +**Solution**: Add CI check +```bash +# Check for TODOs in docs +grep -r "TODO\|FIXME\|XXX" docs/ && exit 1 + +# Check for broken links +markdown-link-check docs/**/*.md +``` + +## Examples From This Project + +See existing documentation for reference: +- [NetworkManager Plugin API](../../../docs/NetworkManagerPlugin.md) - Complete API reference +- [README](../../../README.md) - Project overview and design +- [Build Instructions](../../../CMakeLists.txt) - Build system integration diff --git a/.github/skills/thread-safety-analyzer/SKILL.md b/.github/skills/thread-safety-analyzer/SKILL.md new file mode 100644 index 00000000..9d413f01 --- /dev/null +++ b/.github/skills/thread-safety-analyzer/SKILL.md @@ -0,0 +1,436 @@ +--- +name: thread-safety-analyzer +description: Analyze C/C++ code for thread safety issues including race conditions, deadlocks, and improper synchronization. Use when reviewing concurrent code or debugging threading issues. +--- + +# Thread Safety Analysis for Embedded C + +## Purpose + +Systematically analyze C/C++ code for thread safety issues that can cause race conditions, deadlocks, or performance degradation in embedded systems. + +## Usage + +Invoke this skill when: +- Reviewing multi-threaded code +- Debugging race conditions or deadlocks +- Optimizing synchronization overhead +- Validating thread creation and cleanup +- Investigating lock contention issues + +## Analysis Process + +### Step 1: Identify Shared Data + +Search for global and static variables: +- Global variables (especially non-const) +- Static variables in functions +- Shared heap allocations +- Reference-counted objects + +For each shared variable, verify: +1. How is it protected (mutex, atomic, etc.)? +2. Is the protection consistent across all accesses? +3. Are reads and writes both protected? +4. Is initialization thread-safe? + +### Step 2: Review Thread Creation + +Check all pthread_create calls: +- Are thread attributes used? +- Is stack size specified? +- Are threads detached or joinable? +- Is cleanup properly handled? + +```c +// CHECK FOR: +pthread_t thread; +pthread_create(&thread, NULL, func, arg); // BAD: No attributes + +// SHOULD BE: +pthread_attr_t attr; +pthread_attr_init(&attr); +pthread_attr_setstacksize(&attr, 64 * 1024); // Explicit size +pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); +pthread_create(&thread, &attr, func, arg); +pthread_attr_destroy(&attr); +``` + +### Step 3: Analyze Lock Usage + +For each mutex/rwlock: +- Is it initialized before use? +- Is it destroyed when done? +- Are lock/unlock pairs balanced? +- What is the lock ordering? +- Are locks held during expensive operations? + +Common patterns to check: +```c +// Pattern 1: Missing unlock on error path +pthread_mutex_lock(&lock); +if (error) return -1; // LEAK! +pthread_mutex_unlock(&lock); + +// Pattern 2: Lock ordering violation +// Thread 1: +pthread_mutex_lock(&a); +pthread_mutex_lock(&b); + +// Thread 2: +pthread_mutex_lock(&b); // Different order! +pthread_mutex_lock(&a); // DEADLOCK RISK! + +// Pattern 3: Heavy lock for simple operation +pthread_rwlock_wrlock(&lock); // Too heavy +counter++; +pthread_rwlock_unlock(&lock); +// Should use atomic_int instead +``` + +### Step 4: Check for Race Conditions + +Look for unprotected accesses to shared data: + +```c +// RACE: Read-modify-write without protection +if (shared_flag == 0) { // Thread 1 reads + shared_flag = 1; // Thread 2 also reads before Thread 1 writes +} + +// FIX: Use atomic or lock +pthread_mutex_lock(&lock); +if (shared_flag == 0) { + shared_flag = 1; +} +pthread_mutex_unlock(&lock); + +// OR: Use atomic compare-and-swap +int expected = 0; +atomic_compare_exchange_strong(&shared_flag, &expected, 1); +``` + +### Step 5: Verify Atomic Usage + +For atomic variables: +- Are they declared with proper type (atomic_int, atomic_bool)? +- Is memory ordering appropriate? +- Are non-atomic operations mixed with atomic ones? + +```c +// CHECK: +atomic_int counter; + +// GOOD: Atomic operations +atomic_fetch_add(&counter, 1); +int value = atomic_load(&counter); + +// BAD: Mixing atomic and non-atomic +counter++; // Non-atomic! Use atomic_fetch_add +``` + +### Step 6: Deadlock Detection + +Check for common deadlock patterns: + +1. **Circular wait**: Lock A → Lock B, Lock B → Lock A +2. **Lock held while waiting**: Mutex held during sleep/wait +3. **Missing timeout**: Indefinite blocking without timeout +4. **Signal under lock**: Condition signal while holding mutex + +```c +// Deadlock Pattern 1: Circular dependency +// Function 1: +lock(mutex_a); +lock(mutex_b); // Order: A, B + +// Function 2: +lock(mutex_b); +lock(mutex_a); // Order: B, A - DEADLOCK! + +// Deadlock Pattern 2: Lock held during expensive operation +lock(mutex); +expensive_network_call(); // Blocks other threads! +unlock(mutex); + +// Deadlock Pattern 3: No timeout +pthread_mutex_lock(&lock); // Waits forever if deadlock +``` + +### Step 7: Check Condition Variables + +For condition variables: +- Is wait always in a loop? +- Is predicate checked before and after wait? +- Is signal/broadcast done correctly? +- Is spurious wakeup handled? + +```c +// GOOD: Proper condition variable usage +pthread_mutex_lock(&mutex); +while (!condition) { // Loop for spurious wakeups + pthread_cond_wait(&cond, &mutex); +} +// ... use protected data ... +pthread_mutex_unlock(&mutex); + +// Signal: +pthread_mutex_lock(&mutex); +condition = true; +pthread_cond_signal(&cond); +pthread_mutex_unlock(&mutex); + +// BAD: Missing loop +pthread_mutex_lock(&mutex); +if (!condition) { // Should be 'while'! + pthread_cond_wait(&cond, &mutex); +} +pthread_mutex_unlock(&mutex); +``` + +## Common Issues and Fixes + +### Issue: Default Thread Stack Size + +```c +// PROBLEM: Wastes memory (8MB per thread) +pthread_t thread; +pthread_create(&thread, NULL, worker, arg); + +// FIX: Specify minimal stack size +pthread_attr_t attr; +pthread_attr_init(&attr); +pthread_attr_setstacksize(&attr, 64 * 1024); // 64KB +pthread_create(&thread, &attr, worker, arg); +pthread_attr_destroy(&attr); +``` + +### Issue: Heavy Synchronization + +```c +// PROBLEM: Reader-writer lock overkill +pthread_rwlock_t lock; +int counter; + +void increment() { + pthread_rwlock_wrlock(&lock); + counter++; + pthread_rwlock_unlock(&lock); +} + +// FIX: Use atomic operations +atomic_int counter; + +void increment() { + atomic_fetch_add(&counter, 1); // No lock needed +} +``` + +### Issue: Lock Ordering Violation + +```c +// PROBLEM: Different lock orders cause deadlock +// Thread 1: +void process_a_then_b() { + lock(&resource_a.lock); + lock(&resource_b.lock); + // ... +} + +// Thread 2: +void process_b_then_a() { + lock(&resource_b.lock); + lock(&resource_a.lock); // DEADLOCK! + // ... +} + +// FIX: Consistent ordering everywhere +void process_a_then_b() { + lock(&resource_a.lock); // Always A first + lock(&resource_b.lock); // Then B + // ... +} + +void process_b_then_a() { + lock(&resource_a.lock); // Always A first + lock(&resource_b.lock); // Then B + // ... +} +``` + +### Issue: Race in Lazy Initialization + +```c +// PROBLEM: Non-thread-safe initialization +static config_t* config = NULL; + +config_t* get_config() { + if (!config) { // Race here! + config = malloc(sizeof(config_t)); + init_config(config); + } + return config; +} + +// FIX: Use pthread_once +static pthread_once_t init_once = PTHREAD_ONCE_INIT; +static config_t* config = NULL; + +static void init_config_once() { + config = malloc(sizeof(config_t)); + init_config(config); +} + +config_t* get_config() { + pthread_once(&init_once, init_config_once); + return config; +} +``` + +### Issue: Missing Lock on Error Path + +```c +// PROBLEM: Lock not released on error +int process_data(data_t* shared) { + pthread_mutex_lock(&shared->lock); + + if (validate(shared) != 0) { + return -1; // BUG: Lock not released! + } + + update(shared); + pthread_mutex_unlock(&shared->lock); + return 0; +} + +// FIX: Unlock on all paths +int process_data(data_t* shared) { + int ret = 0; + + pthread_mutex_lock(&shared->lock); + + if (validate(shared) != 0) { + ret = -1; + goto cleanup; + } + + update(shared); + +cleanup: + pthread_mutex_unlock(&shared->lock); + return ret; +} +``` + +### Issue: Long Critical Section + +```c +// PROBLEM: Expensive operation under lock +pthread_mutex_lock(&lock); +for (int i = 0; i < 1000000; i++) { + compute(); // Blocks other threads! +} +shared_result = final_value; +pthread_mutex_unlock(&lock); + +// FIX: Minimize critical section +int result = 0; +for (int i = 0; i < 1000000; i++) { + result += compute(); // No lock +} + +pthread_mutex_lock(&lock); +shared_result = result; // Lock only for update +pthread_mutex_unlock(&lock); +``` + +## Testing for Thread Safety + +### Compile with Thread Sanitizer + +```bash +# Build with thread sanitizer +gcc -g -fsanitize=thread -O1 source.c -o program -lpthread + +# Run +./program + +# Will report: +# - Data races +# - Lock ordering issues +# - Potential deadlocks +``` + +### Run Helgrind + +```bash +# Check for thread safety issues +valgrind --tool=helgrind \ + --track-lockorders=yes \ + ./program + +# Reports: +# - Race conditions +# - Lock order violations +# - Possible deadlocks +``` + +### Stress Testing + +```c +// Test under high concurrency +#define NUM_THREADS 100 +#define ITERATIONS 10000 + +void stress_test() { + pthread_t threads[NUM_THREADS]; + + for (int i = 0; i < NUM_THREADS; i++) { + pthread_create(&threads[i], NULL, worker, NULL); + } + + for (int i = 0; i < NUM_THREADS; i++) { + pthread_join(threads[i], NULL); + } + + // Verify invariants + assert(shared_counter == NUM_THREADS * ITERATIONS); +} +``` + +## Output Format + +Provide findings as: + +``` +## Thread Safety Analysis + +### Critical Issues (must fix) +1. [file.c:123] Race condition - unprotected access to shared_flag +2. [file.c:456] Deadlock potential - lock order violation (A→B vs B→A) +3. [file.c:789] Lock leak - mutex not released on error path + +### Warnings (should fix) +1. [file.c:234] Default thread stack - wastes 8MB per thread +2. [file.c:567] Heavy lock - use atomic_int instead of mutex +3. [file.c:890] Long critical section - holds lock during I/O + +### Recommendations +1. Establish lock ordering convention (document in header) +2. Use pthread_once for singleton initialization +3. Replace reader-writer locks with atomics for counters +4. Add thread sanitizer to CI pipeline + +### Suggested Fixes +[Provide specific code changes for each issue] +``` + +## Verification + +After fixes: +1. Thread sanitizer shows no errors +2. Helgrind reports clean +3. Stress tests pass consistently +4. Lock contention metrics acceptable +5. No deadlocks under load testing +6. Code review confirms thread safety diff --git a/.github/skills/triage-logs/SKILL.md b/.github/skills/triage-logs/SKILL.md new file mode 100644 index 00000000..d11d6be4 --- /dev/null +++ b/.github/skills/triage-logs/SKILL.md @@ -0,0 +1,303 @@ +--- +name: triage-logs +description: > + Triage any Telemetry 2.0 behavioral issue on RDK devices by correlating device + log bundles with source code. Covers hangs, under-reporting, over-reporting, + duplicate reports, CPU/memory spikes, scheduler anomalies, rbus problems, and + test gap analysis. The user states the issue; this skill guides systematic + root-cause analysis regardless of issue type. +--- + +# Telemetry 2.0 Issue Triage Skill + +## Purpose + +Systematically correlate device log bundles with Telemetry 2.0 source code to +identify root causes, characterize impact, and propose unit-test and +functional-test reproduction scenarios — for **any** behavioral anomaly reported +by the user. + +--- + +## Usage + +Invoke this skill when: +- A device log bundle is available under `logs/` (or attached separately) +- The user describes a behavioral anomaly (examples: daemon stuck, reports + missing, too many reports sent, reports arriving late, high CPU, high memory, + unexpected profile activation, marker counts wrong) +- You need to write a reproduction scenario for an existing or proposed fix + +**The user's stated issue drives the investigation.** Do not assume a specific +failure mode — read the issue description first, then follow the steps below. + +--- + +## Step 1: Orient to the Log Bundle + +**Log bundle layout** (typical RDK device): +``` +logs///logs/ + telemetry2_0.txt.0 ← Primary T2 daemon log (start here) + GatewayManagerLog.txt.0 ← WAN/gateway state machine + WanManager*.txt.0 ← WAN interface transitions + PAMlog.txt.0 ← Platform/parameter management + SelfHeal*.txt.0 ← Watchdog and recovery events + top_log.txt.0 ← CPU/memory snapshots (useful for perf issues) + messages.txt.0 ← Kernel and system messages +``` + +Include any log files surfaced by the user's issue description (e.g., `cellular*.txt.0` +for connectivity issues, `syslog` for OOM events). + +**Log timestamp prefix format**: `YYMMDD-HH:MM:SS.uuuuuu` +- Session folder names are **local-time snapshots** (format: `MM-DD-YY-HH:MMxM`) +- Log lines inside use device local time — always confirm via `[Time]` field + in telemetry reports (`"Time":"2026-03-06 07:24:23"`) +- Report JSON `"timestamp"` fields are Unix epoch UTC + +**Session ordering**: Sort session folders chronologically. Multiple sessions may +represent reboots. Alphabetical sort does NOT equal chronological order. + +--- + +## Step 2: Map Profiles and Threads + +Read the startup section of `telemetry2_0.txt.0` (first ~50 lines) to identify: + +| What to find | Log pattern | +|---|---| +| Profile name | `Profile Name : ` | +| Reporting interval | `Waiting for sec for next TIMEOUT` | +| Timeout thread TID | `TIMEOUT for profile - ` (first occurrence) | +| CollectAndReport TID | `CollectAndReport ++in profileName : ` (first occurrence) | +| Send mechanism | `methodName = Device.X_RDK_Xmidt.SendData` (rbus) or `HTTP_CODE` (curl) | + +**Thread role map** (look for TID in `TimeoutThread` context): +- `TimeoutThread` per profile — fires `TIMEOUT for profile` log lines +- `CollectAndReport` / `CollectAndReportXconf` — one per profile, generates/sends reports +- `asyncMethodHandler` — short-lived rbus handler thread, called when `SendData` is dispatched + +--- + +## Step 3: Identify the Anomaly Window + +Based on the **user's stated issue**, search for the relevant evidence pattern: + +### Hang / Stuck Daemon +A reporting hang manifests as a **timestamp gap** between `CollectAndReport ++in` and +the next report-related log line from the same TID. +``` +grep -n "CollectAndReport" telemetry2_0.txt.0 | head -40 +``` +Gap > 1 reporting interval = anomaly. During the gap, check: +- Is `asyncMethodHandler` ever logged? (no → rbus provider unresponsive) +- Does `TIMEOUT for profile` still fire? (yes → TimeoutThread alive but CollectAndReport stuck) + +### Under-Reporting / Missing Reports +Look for expected `TIMEOUT for profile` events that never trigger a `CollectAndReport`: +``` +grep -n "TIMEOUT for profile\|CollectAndReport ++in\|Return status" telemetry2_0.txt.0 +``` +- Count `TIMEOUT` events vs. `CollectAndReport` entries over a time window +- Check for `SendInterruptToTimeoutThread` logged as failed (EBUSY path) — signals silently dropped +- Check for profile deactivation or reload during expected report window + +### Over-Reporting / Duplicate Reports +Look for multiple `CollectAndReport ++in` within a single interval: +``` +grep -n "CollectAndReport ++in\|TIMEOUT for profile" telemetry2_0.txt.0 +``` +- Multiple `TIMEOUT` signals in one interval → concurrent interrupt and scheduler fire +- Report-on-condition (`T2ERROR_SUCCESS` after a marker event) firing alongside periodic report +- Check `signalrecived_and_executing` global flag race (concurrent profile callbacks) + +### CPU / Memory Spikes +Correlate `top_log.txt.0` timestamps with T2 activity: +``` +grep -n "telemetry2" top_log.txt.0 +``` +- Identify what T2 was doing (profile scan, DCA grep, report generation, rbus call) at spike time +- Check DCA log grep operations (`dca.c`, `dcautil.c`) for large log files causing high CPU +- Check marker accumulation in `t2markers.c` for memory growth +- Check if multiple profiles overlap their `CollectAndReport` window + +### Profile / Configuration Anomalies +- Unexpected profile changes: `grep -n "profile\|xconf" telemetry2_0.txt.0 | grep -i "receiv\|updat\|activ"` +- Marker count mismatches: compare report JSON marker values against grep patterns in `dca.c` +- Wrong reporting interval: confirm `Waiting for sec` matches profile definition + +--- + +## Step 4: Correlate with Other Component Logs + +Based on the anomaly window identified in Step 3, cross-reference with other logs: + +| Issue Type | Companion Log | What to Look For | +|---|---|---| +| Hang / rbus block | `GatewayManagerLog.txt.0` | WAN/interface state changes within hang window | +| Hang / rbus block | `WanManager*.txt.0` | Interface up/down transitions | +| Under-reporting | `SelfHeal*.txt.0` | Watchdog restarts of telemetry2_0 process | +| Over-reporting | `PAMlog.txt.0` | Parameter changes triggering report-on-condition | +| CPU spike | `top_log.txt.0` | CPU% at anomaly timestamps | +| Memory growth | `messages.txt.0` | OOM killer events, slab usage | +| Profile changes | Any xconf response log | Profile push or xconf poll activity | + +A tight coupling between an external event (state change, parameter update, restart) +and the T2 anomaly window is the primary indicator of cause vs. coincidence. + +--- + +## Step 5: Locate the Code Path + +Navigate to the relevant source based on the anomaly type. Key modules: + +### Scheduler (`source/scheduler/scheduler.c`) + +Controls when profiles fire. Key paths: +- **`TimeoutThread`** — per-profile thread; calls `timeoutNotificationCb` while holding `tMutex` +- **`SendInterruptToTimeoutThread`** — uses `pthread_mutex_trylock`; if `tMutex` is held + (callback in progress), the interrupt is **silently dropped** (EBUSY returns `T2ERROR_FAILURE`) +- **`signalrecived_and_executing`** — global flag with no atomic protection; susceptible + to concurrent-write races under multi-profile load + +### Profile / Report Generation (`source/bulkdata/profile.c`, `profilexconf.c`, `reportprofiles.c`) + +- `CollectAndReport` / `CollectAndReportXconf` hold `plMutex` or `reuseThreadMutex` + for the entire report lifecycle (collection + send) +- **rbus send** (`rbusMethod_Invoke` / `rbusMethod_InvokeAsync`) has **no timeout** — + a blocked rbus provider blocks the entire thread indefinitely +- Report-on-condition logic in `reportprofiles.c` can fire concurrently with a + periodic send if synchronization is missing + +### Data Collection / CPU (`source/dcautil/dca.c`, `dcautil.c`, `dcacpu.c`, `dcamem.c`) + +- DCA log-grep is I/O and CPU intensive; large log files can cause CPU spikes +- `dcacpu.c` and `dcamem.c` sample system resources; misreads can cause false markers +- Marker accumulation without cleanup (`t2markers.c`) can grow heap over time + +### Profile Configuration (`source/t2parser/`, `source/bulkdata/profilexconf.c`) + +- Profile reception, parsing, and activation path for xconf-sourced profiles +- Incorrect interval parsing or duplicate profile names can cause + over-scheduling or silent deactivation + +### Transport Layer (`source/protocol/http/`, `source/protocol/rbusMethod/`) + +- HTTP send failures, retry logic, and cached-report replay +- rbus method provider registration and response handling + +--- + +## Step 6: Characterize Root Cause + +Use this matrix to classify the issue based on observed evidence: + +| Observed Pattern | Issue Class | Primary Code Location | +|---|---|---| +| rbus call blocks > 10s, no `asyncMethodHandler` logged | Rbus provider unresponsive | `profile.c` / rbus transport | +| `Signal Thread To restart` logged but no report follows | Interrupt signal dropped (EBUSY on `tMutex`) | `scheduler.c:SendInterruptToTimeoutThread` | +| `TIMEOUT` fires but `CollectAndReport` never starts | Thread pool exhausted or profile in error state | `scheduler.c`, `reportprofiles.c` | +| `TIMEOUT` entries missing for > 2 intervals | TimeoutThread stuck, exited, or profile deregistered | `scheduler.c:TimeoutThread` | +| Multiple `CollectAndReport ++in` within one interval | Over-scheduling: concurrent interrupt + periodic fire | `scheduler.c`, `reportprofiles.c` | +| Long gap between `++in` and `--out` with HTTP errors | Network failure; cached report retry loop | `profilexconf.c`, HTTP transport | +| Report JSON marker counts lower than expected | DCA grep miss, log rotation during scan, or marker not registered | `dca.c`, `t2markers.c` | +| Report JSON marker counts higher than expected | Duplicate marker registration or over-counting in DCA | `dca.c`, `t2markers.c` | +| `signalrecived_and_executing` logic inconsistency | Unsynchronized global flag race | `scheduler.c` (global variable) | +| CPU spike during report window | Large log file DCA grep or concurrent profile collection | `dcautil.c`, `dca.c` | +| Memory growth over sessions | Marker list not freed, profile not cleaned up on deregister | `t2markers.c`, `profile.c` | +| Profile activated/deactivated unexpectedly | xconf push race or profile name collision | `profilexconf.c`, `t2parser/` | + +--- + +## Step 7: Assess L1 (Unit) Test Coverage + +**Location**: `source/test/` + +**Existing coverage** (representative): +- `schedulerTest.cpp`: basic `SendInterruptToTimeoutThread`, `TimeoutThread` single-run, + profile register/unregister lifecycle +- `profileTest.cpp`: profile creation, marker accumulation, basic report generation +- `dcaTest.cpp`: grep pattern matching, marker extraction + +**Identify gaps relevant to the issue**. For each gap, write a test template: + +``` +Test Name: +Setup: +Action: +Assert: +File: source/test// +``` + +**Common gap areas** (match to the issue class): +- Scheduler signal dropped when `tMutex` held during callback (EBUSY path) +- `CollectAndReport` blocked while scheduler fires multiple subsequent timeouts +- DCA grep on a large/rotating log file — correct marker counts +- Profile re-activation during an active `CollectAndReport` — no double-send +- Memory freed correctly when profile is deregistered mid-cycle +- `signalrecived_and_executing` flag read/write under concurrent profile load + +--- + +## Step 8: Assess L2 (Functional) Test Coverage + +**Location**: `test/functional-tests/features/` + +**Existing scenarios** (from `.feature` files): +- `telemetry_process_singleprofile.feature` — caching on send failure +- `telemetry_process_multiprofile.feature` — multi-profile interaction +- `telemetry_bootup_sequence.feature`, `telemetry_runs_as_daemon.feature` +- `telemetry_process_tempProfile.feature`, `telemetry_xconf_communication.feature` + +**Identify the missing scenario** that would catch the reported issue. Write a +Gherkin outline covering: +1. The precondition (profile active, network state, system load) +2. The triggering event (external state change, concurrent interrupt, large log file, etc.) +3. The correct observable outcome (report sent within interval, no duplicate, CPU within bounds) +4. The failure observable outcome (what the bug produces vs. what is expected) + +```gherkin +Feature: + + Scenario: + Given + And + When + Then + And +``` + +--- + +## Step 9: Document Findings + +Produce a triage report with: +1. **Issue restatement**: confirm back the user's stated problem in one sentence +2. **Device context**: MAC, firmware, session timestamp(s) examined +3. **Anomaly timeline**: exact timestamps, thread IDs, duration or frequency +4. **Root cause chain**: numbered steps, each with log evidence + source code reference +5. **L1 test gap**: which test file, test name, and what assertion it makes +6. **L2 test gap**: Gherkin scenario outline +7. **Proposed fix**: minimum-scope change — file, function, and what to change + +--- + +## Common Pitfalls + +- **Timestamp confusion**: Log header `260306-HH:MM:SS` = `2026-03-06`; report JSON + `"timestamp":"177xxxxxxx.xx"` is Unix epoch UTC — do not mix them +- **Session folder order**: Alphabetical sort does NOT equal chronological order +- **`signalrecived_and_executing`**: This global has a typo in the source ("recived") — + search for it exactly as spelled +- **EBUSY ≠ deadlock**: The `trylock` in `SendInterruptToTimeoutThread` prevents + deadlock but causes **silent signal loss** — the thread is not stuck, the interrupt + was simply never delivered +- **`asyncMethodHandler` absence**: No log of this thread during an rbus call means + the rbus provider never received the request — distinguish from a network-only issue +- **Double-log artifact**: T2 logs `methodName = ...` twice per send in some builds — + this is a logging artifact, not two actual sends; verify by counting `Return status` lines +- **Profile count vs. report count**: A profile may be registered but never reach + `CollectAndReport` if upstream conditions are not met — trace from `TIMEOUT` forward +- **DCA grep on rotated logs**: If a log file rotates mid-scan, DCA may return 0 for + a marker that was incremented — correlates to under-reporting without any error log diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..fb1536be --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +# Code review artifacts +reviews/ diff --git a/docs/PROJECT_OVERVIEW.md b/docs/PROJECT_OVERVIEW.md new file mode 100644 index 00000000..94449aca --- /dev/null +++ b/docs/PROJECT_OVERVIEW.md @@ -0,0 +1,762 @@ +# NetworkManager Thunder Plugin - Project Overview + +## Overview + +The **NetworkManager Thunder Plugin** (v2.0.0) is a unified, out-of-process Thunder plugin responsible for configuring and managing all networking and WiFi interfaces on RDK devices. It replaces the legacy Network and WiFiManager Thunder plugins, providing a consistent API across multiple backend implementations. + +**Key Capabilities:** +- Unified management of Ethernet and WiFi interfaces +- Support for multiple backends (Gnome NetworkManager, RDK Network Service Manager) +- COM-RPC and JSON-RPC API support +- Internet connectivity testing with captive portal detection +- STUN-based public IP discovery +- WiFi scanning, connection management, and signal quality monitoring + +## Architecture + +### High-Level Design + +The plugin operates out-of-process, communicating with backend network services over DBus. This design provides isolation, stability, and flexibility to support multiple platform backends. + +```mermaid +graph TB + A[Thunder Framework] --> B[NetworkManager Plugin] + B --> C[JSONRPC Handler] + B --> D[COM-RPC Interface] + B --> E[Implementation Layer] + E --> F[Backend Proxy] + F --> G{Backend Selection} + G -->|Compile Time| H[Gnome NetworkManager] + G -->|Compile Time| I[RDK NSM] + H --> J[DBus: org.freedesktop.NetworkManager] + I --> K[DBus: RDK Network Service] + E --> L[Connectivity Monitor] + E --> M[STUN Client] + B --> N[Legacy API Bridge] + N --> O[IARM Bus] +``` + +### Component Relationships + +The plugin architecture consists of several key layers: + +1. **Thunder Plugin Layer** (`NetworkManager.cpp`) + - Plugin lifecycle management (Initialize, Deinitialize) + - JSONRPC method dispatch + - Event notification to Thunder clients + +2. **Implementation Layer** (`NetworkManagerImplementation.cpp`) + - Core business logic + - Connectivity monitoring + - STUN client integration + - Platform-independent abstractions + +3. **Backend Proxy Layer** + - **Gnome Backend** (`plugin/gnome/`) + - DBus communication with NetworkManager daemon + - Event handling via GLib main loop + - WiFi scanning and connection management + - **RDK Backend** (`plugin/rdk/`) + - DBus communication with RDK Network Service Manager + - RDK-specific platform integration + +4. **Legacy Bridge Layer** (`legacy/`) + - IARM bus compatibility + - Legacy Network and WiFiManager API emulation + +5. **Support Components** + - Connectivity testing with configurable endpoints + - STUN client for public IP discovery + - Telemetry integration (T2) + +## Key Components + +### NetworkManager Plugin Class + +The main plugin class implementing Thunder's IPlugin interface: + +```cpp +class NetworkManager : public PluginHost::IPlugin, + public PluginHost::JSONRPC, + public PluginHost::ISubSystem::IInternet +``` + +**Responsibilities:** +- Plugin activation and deactivation +- JSONRPC method registration +- Event notification management +- COM-RPC interface exposure + +### NetworkManagerImplementation Class + +Core implementation providing INetworkManager interface: + +```cpp +class NetworkManagerImplementation : public Exchange::INetworkManager +``` + +**Key Methods:** +- `GetAvailableInterfaces()` - Query Ethernet and WiFi interfaces +- `SetInterfaceState()` - Enable/disable interfaces +- `GetIPSettings()` / `SetIPSettings()` - IP configuration management +- `WiFiConnect()` / `WiFiDisconnect()` - WiFi connection control +- `StartWiFiScan()` / `StopWiFiScan()` - WiFi scanning +- `IsConnectedToInternet()` - Internet connectivity verification +- `Ping()` / `Trace()` - Network diagnostics + +### Backend Proxies + +#### Gnome Backend +Files: `NetworkManagerGnomeProxy.cpp`, `NetworkManagerGnomeWIFI.cpp`, `NetworkManagerGnomeEvents.cpp` + +Communicates with Gnome NetworkManager daemon via DBus: +- Interface management using libnm APIs +- WiFi scanning and connection +- Event subscription and propagation + +#### RDK Backend +Files: `NetworkManagerRDKProxy.cpp` + +Communicates with RDK Network Service Manager: +- Platform-specific network operations +- Integration with RDK ecosystem + +### Connectivity Monitor + +File: `NetworkManagerConnectivity.cpp` + +Monitors internet connectivity by testing configured endpoints: +- Periodic HTTP requests to test endpoints (default: clients3.google.com) +- Captive portal detection +- Limited vs. full internet detection +- Configurable test interval (default: 6 seconds) + +```cpp +class ConnectivityMonitor +{ + void startContinuousConnectivityMonitor(int timeoutInSeconds); + bool isConnectedToInternet(string& ipversion, bool& isLimitedInternet); + InternetStatus getInternetStatus(); +}; +``` + +### STUN Client + +File: `NetworkManagerStunClient.cpp` + +Discovers device's public IP address using STUN protocol: +- Configurable STUN server endpoint +- Asynchronous public IP retrieval +- Timeout handling + +## Threading Model + +### Thread Overview + +| Thread Name | Purpose | Priority | Stack Size | +|------------|---------|----------|------------| +| Thunder Main | Plugin lifecycle, JSONRPC dispatch | Normal | Default | +| GLib Main Loop | DBus event processing (Gnome backend) | High | 64KB | +| Connectivity Monitor | Internet connectivity tests | Low | 64KB | +| STUN Client | Public IP discovery requests | Low | 32KB | +| WiFi Scan | Asynchronous WiFi scanning | Normal | 64KB | + +### Synchronization Primitives + +```cpp +// Backend access synchronization +static std::mutex backend_mutex; // Serializes backend operations +static std::mutex interface_mutex; // Protects interface list + +// Connectivity monitoring +static std::condition_variable connectivity_cv; // Connectivity test sync +static std::mutex connectivity_mutex; + +// STUN operations +static std::mutex stun_mutex; +``` + +### Lock Ordering + +To prevent deadlocks, always acquire locks in this order: + +1. `backend_mutex` (backend operations) +2. `interface_mutex` (interface list) +3. Component-specific locks (connectivity, STUN) + +### Thread Safety Guarantees + +| API Method | Thread Safety | Notes | +|-----------|---------------|-------| +| GetAvailableInterfaces() | Thread-safe | Backend mutex protected | +| SetInterfaceState() | Thread-safe | Backend mutex protected | +| WiFiConnect() | Thread-safe | Async operation with callback | +| IsConnectedToInternet() | Thread-safe | Connectivity mutex protected | +| StartWiFiScan() | Thread-safe | Async scan with event notification | +| GetPublicIP() | Thread-safe | STUN mutex protected | + +## Memory Management + +### Allocation Patterns + +```mermaid +graph TD + A[Plugin::Initialize] --> B[Create Implementation] + B --> C[Connect to Backend] + C --> D[Register DBus Signals] + D --> E[Start Connectivity Monitor] + E --> F[Initialize STUN Client] + + G[Plugin::Deinitialize] --> H[Stop Monitors] + H --> I[Disconnect Backend] + I --> J[Free Implementation] +``` + +### Ownership Rules + +1. **Plugin Instance**: Managed by Thunder framework +2. **Implementation Instance**: Owned by plugin, created at Initialize +3. **Backend Proxies**: Singleton pattern, lifetime tied to plugin +4. **Interface Objects**: Allocated on query, caller must release iterators +5. **Event Strings**: Allocated by implementation, freed after event dispatch +6. **JSON Results**: Caller owns returned strings + +### Lifecycle Example + +```cpp +// Plugin activation +NetworkManager* plugin = new NetworkManager(); +plugin->Initialize(service); // Creates implementation + +// Implementation lifecycle +impl = new NetworkManagerImplementation(); +impl->Initialize(config); // Connects backend, starts monitors + +// Operation phase - minimal allocations +impl->GetAvailableInterfaces(iterator); // Allocates iterator +// ... use iterator ... +iterator->Release(); // Caller releases + +// Deactivation +impl->Deinitialize(); // Stops monitors, disconnects backend +delete impl; +plugin->Deinitialize(); +delete plugin; +``` + +### Memory Budget + +| Component | Static | Dynamic (per item) | Notes | +|-----------|--------|-------------------|-------| +| Plugin Instance | 2KB | - | Single instance | +| Implementation | 4KB | - | Single instance | +| Backend Proxy | 1KB | +256 bytes/interface | Max ~10 interfaces | +| WiFi Scan Results | 0 | 512 bytes/AP | Temporary, freed after delivery | +| Connectivity Monitor | 2KB | +1KB/endpoint | Max 5 endpoints | +| STUN Client | 1KB | +512 bytes/request | Temporary buffer | +| Event Handlers | 512 bytes | - | DBus signal subscriptions | + +**Total typical footprint**: ~15-20KB (plugin + backend + monitors) + +## API Interface + +### COM-RPC Interface + +Defined in `INetworkManager.h`, provides type-safe C++ interface: + +```cpp +namespace Exchange +{ + struct INetworkManager: virtual public Core::IUnknown + { + // Interface management + virtual uint32_t GetAvailableInterfaces( + IInterfaceDetailsIterator*& interfaces) const = 0; + virtual uint32_t GetPrimaryInterface(string& interface) const = 0; + virtual uint32_t SetInterfaceState( + const string& interface, const bool enabled) = 0; + + // IP configuration + virtual uint32_t GetIPSettings( + const string& interface, + const string& ipversion, + IPAddress& result) const = 0; + virtual uint32_t SetIPSettings( + const string& interface, + const IPAddress& address) = 0; + + // WiFi operations + virtual uint32_t StartWiFiScan(const FrequencyType frequency) = 0; + virtual uint32_t WiFiConnect(const WIFIConnectParam& param) = 0; + virtual uint32_t WiFiDisconnect() = 0; + + // Connectivity + virtual uint32_t IsConnectedToInternet( + string& ipversion, + InternetStatus& result) = 0; + + // Diagnostics + virtual uint32_t Ping( + const string& endpoint, + const uint32_t packets, + string& result) = 0; + }; +} +``` + +### JSON-RPC API + +Thunder JSONRPC interface for web/JavaScript clients. Full specification in [NetworkManagerPlugin.md](NetworkManagerPlugin.md). + +**Example Request:** +```json +{ + "jsonrpc": "2.0", + "id": 1, + "method": "NetworkManager.1.GetAvailableInterfaces" +} +``` + +**Example Response:** +```json +{ + "jsonrpc": "2.0", + "id": 1, + "result": { + "interfaces": [ + { + "type": "ETHERNET", + "name": "eth0", + "mac": "00:11:22:33:44:55", + "enabled": true, + "connected": true + }, + { + "type": "WIFI", + "name": "wlan0", + "mac": "AA:BB:CC:DD:EE:FF", + "enabled": true, + "connected": false + } + ] + } +} +``` + +## Event Notifications + +The plugin sends asynchronous event notifications for state changes: + +| Event | Trigger | Data | +|-------|---------|------| +| onInterfaceStateChange | Interface enabled/disabled | interface, state | +| onActiveInterfaceChange | Primary interface changed | prevInterface, currInterface | +| onIPAddressChange | IP acquired or lost | interface, ipversion, address, status | +| onInternetStatusChange | Internet connectivity changed | prevStatus, currStatus, interface | +| onAvailableSSIDs | WiFi scan completed | JSON array of SSIDs | +| onWiFiStateChange | WiFi connected/disconnected | state, ssid | +| onWiFiSignalQualityChange | Signal strength changed | ssid, strength, quality | + +**Example Event:** +```json +{ + "jsonrpc": "2.0", + "method": "NetworkManager.1.onIPAddressChange", + "params": { + "interface": "eth0", + "ipversion": "IPv4", + "ipaddress": "192.168.1.100", + "status": "ACQUIRED" + } +} +``` + +## Backend Support + +### Compile-Time Backend Selection + +Backend is selected at build time via CMake options: + +```cmake +# Gnome NetworkManager backend (default for Linux) +cmake -DUSE_GNOME_NETWORK_MANAGER=ON .. + +# RDK Network Service Manager backend +cmake -DUSE_RDK_NSM=ON .. +``` + +### Gnome Backend Details + +**Dependencies:** +- libnm (NetworkManager library) >= 1.20 +- GLib 2.0 +- DBus system bus + +**DBus Service:** `org.freedesktop.NetworkManager` + +**Key Features:** +- Full NetworkManager API support +- WiFi WPA/WPA2/WPA3 support +- 802.1x enterprise WiFi +- Advanced IP configuration (static, DHCP, IPv6) + +### RDK Backend Details + +**Dependencies:** +- RDK Network Service Manager +- RDK IARM bus library +- RDK logger + +**DBus Service:** Platform-specific RDK service + +**Key Features:** +- RDK platform integration +- Telemetry support +- Legacy IARM compatibility + +## Platform Notes + +### Linux Desktop/Server +- Uses C++11 or later +- Requires Gnome NetworkManager backend +- DBus system bus required +- Tested on Ubuntu 20.04+, Fedora 35+ + +### RDK Devices (Set-Top Boxes, Gateways) +- ARMv7 or ARMv8 CPU +- RDK-B or RDK-V platform +- Integration with RDK logger (`rdk_debug.h`) +- IARM bus support for legacy APIs +- Backend: Gnome NetworkManager or RDK NSM +- Platform-specific WiFi drivers + +### Resource Constraints +- **Memory**: 64MB minimum, ~20KB plugin footprint +- **CPU**: ARMv7 or better +- **Network**: Ethernet and/or WiFi hardware required +- **Storage**: Minimal (logs only) + +## Build System + +### Dependencies + +**Required:** +- Thunder/WPEFramework SDK +- CMake >= 3.3 +- C++11 compiler (GCC 7+, Clang 6+) +- DBus library + +**Backend-Specific:** +- Gnome: libnm, GLib 2.0 +- RDK: IARM, RDK logger, RDK NSM + +**Optional:** +- T2 telemetry library (USE_TELEMETRY) +- GTest/GMock (ENABLE_UNIT_TESTING) + +### Build Instructions + +```bash +# Clone repository +git clone https://github.com/rdkcentral/networkmanager.git +cd networkmanager + +# Create build directory +mkdir build && cd build + +# Configure with Gnome backend +cmake -DCMAKE_INSTALL_PREFIX=/usr \ + -DUSE_GNOME_NETWORK_MANAGER=ON \ + -DUSE_TELEMETRY=OFF \ + .. + +# Build +make -j$(nproc) + +# Install +sudo make install +``` + +### CMake Options + +| Option | Default | Description | +|--------|---------|-------------| +| USE_GNOME_NETWORK_MANAGER | ON | Enable Gnome backend | +| USE_RDK_NSM | OFF | Enable RDK backend | +| ENABLE_LEGACY_PLUGINS | ON | Build legacy IARM plugins | +| USE_RDK_LOGGER | OFF | Use RDK logger instead of stdout | +| USE_TELEMETRY | OFF | Enable T2 telemetry support | +| ENABLE_UNIT_TESTING | OFF | Build unit tests (L1/L2) | + +## Testing + +### Unit Tests + +Located in `tests/` directory: + +**L1 Tests** (`tests/l1Test/`) +- Component-level tests +- Connectivity monitor tests +- STUN client tests +- Route discovery tests + +**L2 Tests** (`tests/l2Test/`) +- Integration tests with backends +- Mock backend tests +- End-to-end API tests + +**Run Tests:** +```bash +# Configure with testing enabled +cmake -DENABLE_UNIT_TESTING=ON .. +make + +# Run L1 tests +./tests/l1Test/l1_test_connectivity +./tests/l1Test/l1_test_stunclient + +# Run L2 tests +./tests/l2Test/l2_test_networkmanager +``` + +### Manual Testing + +**Test Connectivity:** +```bash +# Using Thunder Console +curl -X POST http://localhost:9998/jsonrpc \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","id":1,"method":"NetworkManager.1.IsConnectedToInternet"}' +``` + +**Monitor Events:** +```bash +# Subscribe to events using Thunder tools +ThunderClient events NetworkManager +``` + +## Legacy API Compatibility + +The plugin provides backward compatibility with legacy Network and WiFiManager plugins through IARM bridge: + +**Legacy Components:** +- `LegacyNetworkAPIs.cpp` - Network plugin compatibility +- `LegacyWiFiManagerAPIs.cpp` - WiFiManager plugin compatibility + +**IARM Methods Supported:** +- `IARM_BUS_NETSRVMGR_API_getActiveInterface` +- `IARM_BUS_NETSRVMGR_API_setIPSettings` +- `IARM_BUS_WIFI_MGR_API_getAvailableSSIDs` +- And more... + +## Performance Considerations + +### Optimization Guidelines + +1. **WiFi Scanning**: Limit scan frequency to avoid battery drain on battery-powered devices +2. **Connectivity Monitoring**: Adjust test interval based on use case (6-60 seconds) +3. **Event Throttling**: Avoid excessive event notifications during interface state transitions +4. **DBus Calls**: Batch operations when possible to reduce IPC overhead + +### Resource Usage + +**CPU Usage:** +- Idle: <1% CPU +- WiFi scanning: 2-5% CPU +- Connectivity tests: 1-2% CPU per test + +**Memory:** +- Base: 15-20KB +- Peak: 50KB (during WiFi scan with 50+ APs) + +**Network:** +- Connectivity tests: ~1KB per endpoint per test +- STUN requests: ~100 bytes per request + +## Troubleshooting + +### Common Issues + +**1. Plugin fails to activate** +- **Symptom**: Plugin state remains "Deactivated" +- **Cause**: Backend not available or DBus connection failed +- **Solution**: Check backend service status (`systemctl status NetworkManager`) + +**2. WiFi scan returns no results** +- **Symptom**: `onAvailableSSIDs` event contains empty array +- **Cause**: WiFi interface disabled or hardware issue +- **Solution**: Enable interface using `SetInterfaceState`, check `rfkill list` + +**3. Internet status always NO_INTERNET** +- **Symptom**: `IsConnectedToInternet` returns NO_INTERNET despite connection +- **Cause**: Connectivity test endpoints unreachable +- **Solution**: Configure alternative endpoints with `SetConnectivityTestEndpoints` + +**4. Memory leak over time** +- **Symptom**: Plugin memory usage increases continuously +- **Cause**: Event notifications not released, iterator not freed +- **Solution**: Always call `Release()` on iterators, check event handler cleanup + +### Debug Logging + +Enable debug logging: + +```bash +# Set log level via API +curl -X POST http://localhost:9998/jsonrpc \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","id":1,"method":"NetworkManager.1.SetLogLevel","params":{"loglevel":"DEBUG"}}' +``` + +**Log Levels:** +- FATAL: Critical errors only +- ERROR: Error conditions +- WARNING: Warning conditions +- INFO: Informational messages (default) +- DEBUG: Debug-level messages +- VERBOSE: Verbose debug messages + +**Log Locations:** +- RDK devices: `/opt/logs/wpeframework.log` +- Linux: `/var/log/wpeframework.log` or stdout + +### Diagnostic Commands + +```bash +# Check plugin status +WPEProcess -status NetworkManager + +# Monitor DBus traffic (Gnome backend) +dbus-monitor --system "sender='org.freedesktop.NetworkManager'" + +# Check interface status +ip addr show +nmcli device status + +# Test connectivity manually +curl -I http://clients3.google.com/generate_204 +``` + +## Development Guide + +### Code Structure + +``` +networkmanager/ +├── interface/ # COM-RPC interface definitions +│ └── INetworkManager.h +├── definition/ # JSON-RPC API definition +│ └── NetworkManager.json +├── plugin/ # Plugin implementation +│ ├── NetworkManager.h/cpp # Plugin class +│ ├── NetworkManagerImplementation.h/cpp # Core logic +│ ├── NetworkManagerConnectivity.h/cpp # Connectivity +│ ├── NetworkManagerStunClient.h/cpp # STUN client +│ ├── gnome/ # Gnome backend +│ └── rdk/ # RDK backend +├── legacy/ # Legacy IARM bridge +├── tests/ # Unit tests +└── tools/ # CLI tools +``` + +### Adding New Features + +1. **Update Interface** (`INetworkManager.h`) + - Add new method to COM-RPC interface + - Document parameters and return values + +2. **Update Implementation** (`NetworkManagerImplementation.cpp`) + - Implement new method + - Add platform-specific code to backend proxies + +3. **Update JSON-RPC** (`NetworkManager.json`) + - Define JSON-RPC method + - Add parameter validation + +4. **Add Tests** + - Write L1 unit tests + - Add L2 integration tests + +5. **Update Documentation** + - Update API reference + - Add usage examples + +### Coding Standards + +- **Language**: C++11 or later +- **Style**: Follow Thunder coding conventions +- **Naming**: CamelCase for classes, camelCase for methods +- **Error Handling**: Return Core::ERROR_* codes, log errors +- **Thread Safety**: Document thread safety guarantees +- **Memory**: Use RAII, smart pointers where appropriate +- **Comments**: Doxygen-style for public APIs + +## Release Process + +### Branches + +- `develop` - Active development, PR target +- `main` - Stable releases only + +### Workflow + +1. Feature development on feature branches +2. PR to `develop` with tests +3. CI validation (build, tests, static analysis) +4. Team review on RDK devices +5. Merge to `develop` +6. Periodic promotion to `main` for releases + +### Versioning + +Semantic versioning: `MAJOR.MINOR.PATCH` +- **MAJOR**: Incompatible API changes +- **MINOR**: Backward-compatible functionality +- **PATCH**: Backward-compatible bug fixes + +Current version: **2.0.0** + +## Contributors + +### Maintainers + +- Jacob Gladish (jacob_gladish@cable.comcast.com) +- Karunakaran Amirthalingam (karunakaran_amirthalingam@comcast.com) + +### Contributing + +Contributions welcome! Please: +1. Sign RDK Contributor License Agreement (CLA) +2. Fork repository +3. Create feature branch +4. Submit PR with tests and documentation +5. See [CONTRIBUTING.md](../CONTRIBUTING.md) for details + +## References + +### Documentation + +- [API Reference](NetworkManagerPlugin.md) - Complete JSON-RPC API +- [Interface Definition](../interface/INetworkManager.h) - COM-RPC interface +- [Build System](../CMakeLists.txt) - Build configuration +- [Legacy Network API](https://github.com/rdkcentral/rdkservices/blob/main/docs/api/NetworkPlugin.md) +- [Legacy WiFi API](https://github.com/rdkcentral/rdkservices/blob/main/docs/api/WifiPlugin.md) + +### External Resources + +- [Thunder Framework](https://github.com/rdkcentral/Thunder) +- [RDK Central](https://rdkcentral.com/) +- [Gnome NetworkManager](https://networkmanager.dev/) +- [DBus Specification](https://dbus.freedesktop.org/doc/dbus-specification.html) + +## License + +Copyright 2023 RDK Management + +Licensed under the Apache License, Version 2.0. See [LICENSE](../LICENSE) for full text. + +--- + +**Document Version**: 1.0 +**Last Updated**: April 22, 2026 +**Plugin Version**: 2.0.0