diff --git a/CLAUDE.md b/CLAUDE.md index f6fd22d..2bf18ef 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,71 +1,67 @@ # CLAUDE.md +Guidance for Claude Code when working in this repository. + ## Project Overview -ECHO (Enhanced Cellular Handling Operations) is the OASIS modem daemon for the SIM7600G-H 4G modem. It owns the serial port, handles all AT command traffic, publishes telemetry and events via MQTT, and receives commands from DAWN. +ECHO (Enhanced Cellular Handling Operations) is the OASIS modem daemon for the SIM7600G-H 4G modem. It owns the serial port, handles all AT command traffic, publishes telemetry and events via MQTT, and receives commands from DAWN. Template: STAT (system telemetry daemon). + +See @ARCHITECTURE.md for subsystem details and @README.md for deployment context. -Part of The OASIS Project. Template: STAT (system telemetry daemon). +## Critical Rules — Always Follow -## Building +- **NEVER delete files.** Tell the developer which files to delete. +- **NEVER run `git add`, `git commit`, or `git push`.** Suggest the command and message; let the developer run it. +- **Feedback before implementation.** Provide analysis, trade-offs, and a recommendation *first*. Wait for explicit confirmation ("go ahead", "do it", "yes") before coding. +- **Format before committing.** Every change must pass `./format_code.sh --check`. Pre-commit hook enforces this. +- **GPL header on every new `.c`/`.h`.** Template in @CODING_STYLE_GUIDE.md. +- **Design doc commit policy**: commit design docs only when they describe shipped or in-flight code. Docs for planned-but-unstarted work stay untracked. 
+ +## Build & Test ```bash -# Configure and build +# Build cmake -B build -DCMAKE_BUILD_TYPE=Debug make -C build -j8 -# Run tests +# Run tests (Unity framework, runs without hardware) ctest --test-dir build --output-on-failure -# Run individual test -./build/tests/test_sms +# Format +./format_code.sh # fix +./format_code.sh --check # CI mode +./format_code.sh --changed # only changed files ``` -### Dependencies -- `libmosquitto` — MQTT client library -- `json-c` — JSON construction and parsing -- `pthread` — threading (system) -- Unity (vendored in `tests/unity/`) — unit test framework - -## Code Formatting - -**MANDATORY**: All code MUST be formatted before committing. - -```bash -# Format all code (run from repository root) -./format_code.sh - -# Check formatting without modifying files -./format_code.sh --check +- Dependencies: `libmosquitto`, `json-c`, `pthread`, Unity (vendored). +- Requires `clang-format-14` for format checks. +- Pre-commit hook: `./install-git-hooks.sh` (one-time). -# Format only changed files (fast) -./format_code.sh --changed -``` +## Code Standards -Requires `clang-format-14`. Install: `sudo apt-get install clang-format-14` +Full standards in @CODING_STYLE_GUIDE.md. Critical gotchas specific to ECHO: -### Git Hooks -Install the pre-commit hook to automatically check formatting: -```bash -./install-git-hooks.sh -``` +- **Return codes**: `SUCCESS` (0) / `FAILURE` (1) — never negative. Specific error codes > 1. +- **Logging**: use `OLOG_INFO` / `OLOG_WARNING` / `OLOG_ERROR` (ECHO's convention). +- **Naming**: `snake_case` functions/vars, `UPPER_CASE` constants, `_t` suffix on types. +- **Memory**: prefer static allocation; null-check after malloc; `free(ptr); ptr = NULL;`. +- **Functions**: soft target < 50 lines, inputs first / outputs last. 
-## Architecture +## Threading (hard constraints) -### Threading Model +Three threads — know which one you're in: -Three threads: -- **Main thread**: Command queue drain, telemetry polling (10s), heartbeat (30s) -- **URC reader thread**: Blocking serial reads, line parsing, URC classification, AT response delivery via condvar -- **Mosquitto thread**: Network I/O, message callbacks (queues commands to main thread) +- **Main thread**: command-queue drain, telemetry polling (10s), heartbeat (30s). +- **URC reader thread**: blocking serial reads, line parsing, URC classification, AT response delivery via condvar. +- **Mosquitto thread**: network I/O, message callbacks (queues commands to main thread). -### Key Design Decisions +**Never call `at_command_send()` from the URC reader thread.** Calling it there deadlocks the condvar; use the command queue instead. -- **Single serial reader**: URC reader owns ALL reads. Main thread only writes AT commands. -- **Command queue**: MQTT commands queued to lock-free SPSC ring buffer, drained by main thread. Prevents blocking mosquitto's event loop. -- **Deferred CMTI**: SMS reads queued from URC thread to main thread to avoid condvar deadlock. -- **Atomic call state**: `__atomic` builtins for `g_call_state` (3 threads access it). +- Single serial reader: URC reader owns **all** reads. Main thread only writes AT commands. +- CMTI handling: SMS reads queued from URC → main thread to avoid deadlock. +- Call state: `__atomic` builtins on `g_call_state` (3 threads touch it). 
-### AT Command Types +## AT Command Types | Type | Function | Behavior | |------|----------|----------| @@ -73,110 +69,44 @@ Three threads: | Async | `at_command_send_async()` | Write and return; result comes as URC | | SMS | `at_command_send_sms()` | Two-phase: wait for `>` prompt, then body+Ctrl-Z | -### MQTT Topics +## MQTT Topics | Topic | Dir | Content | |-------|-----|---------| | `echo/telemetry` | out | Signal, network, call state (every 10s) | | `echo/events` | out | Incoming call, SMS, call ended | -| `echo/response` | out | Command responses with request_id | +| `echo/response` | out | Command responses with `request_id` | | `echo/status` | out | Online/offline (LWT) | | `echo/cmd` | in | Commands from DAWN | -All messages conform to OCP v1.3. - -## Coding Standards - -Follow `CODING_STYLE_GUIDE.md` strictly: - -**Naming**: `snake_case` functions/variables, `UPPER_CASE` constants, `_t` suffix on types. - -**Error Handling**: Return 0 on success. Always check return values. Log with `OLOG_ERROR()`. - -**Memory**: Prefer static allocation. Minimize malloc. Free and NULL. - -**File Headers**: GPL license block required on all `.c` and `.h` files (see CODING_STYLE_GUIDE.md). - -**Functions**: Soft target < 50 lines. Inputs first, outputs last. - -**Threading**: Never call `at_command_send()` from the URC reader thread. Use the command queue. 
- -## Important Files - -**Source modules:** -- `src/oasis-echo.c` — Main entry, command queue, URC event dispatch, MQTT command processor -- `src/at_command.c` — Serial I/O with flock, sync/async/SMS AT commands, terminator parsing -- `src/urc_handler.c` — URC reader thread, classification, RING+CLIP merge -- `src/modem.c` — Init sequence, signal polling, telemetry builder, echo cancellation -- `src/mqtt_comms.c` — MQTT lifecycle, json-c JSON builders, command parser -- `src/sms.c` — Phone number validation, SMS body sanitization, CLIP sanitization -- `src/logging.c` — Logging (copied from STAT) - -**Headers:** -- `include/echo.h` — Global types, config struct, call/reg/SIM enums, rate bucket -- `include/at_command.h` — AT context, response, pending state types -- `include/urc_handler.h` — URC event types, callback, context -- `include/modem.h` — Modem init, polling, telemetry builder -- `include/mqtt_comms.h` — MQTT topics, publish/subscribe/parse API -- `include/sms.h` — Validation and sanitization API - -**Configuration:** -- `config/echo.conf` — MQTT credentials, serial port, rate limits (systemd EnvironmentFile) -- `config/oasis-echo.service` — systemd service unit -- `config/sim7600-rndis.service` — RNDIS data path boot service -- `scripts/sim7600-rndis-up.sh` — RNDIS activation script - -**Tooling:** -- `.clang-format` — clang-format-14 config (matches DAWN) -- `format_code.sh` — Format all code (adapted from DAWN) -- `pre-commit.hook` — Git pre-commit formatting check -- `install-git-hooks.sh` — Hook installer -- `.github/workflows/ci.yml` — CI: format-check + build + tests - -## Testing - -Unity framework (vendored in `tests/unity/`, MIT license). 
Four test modules: - -| Test | Assertions | What it covers | -|------|-----------|---------------| -| `test_at_command` | 14 | Response terminator parsing, status strings | -| `test_sms` | 24 | Phone number validation, body sanitization, CLIP sanitization | -| `test_urc_handler` | 22 | URC classification, RING+CLIP merge, VOICE CALL URCs | -| `test_mqtt_messages` | 16 | Telemetry/event/response JSON, command parsing | - -Tests link against specific source files (not the full daemon binary), so they run without hardware or an MQTT broker. - -```bash -# Build and run all tests -cmake -B build -DCMAKE_BUILD_TYPE=Debug && make -C build -j8 -ctest --test-dir build --output-on-failure -``` +All messages conform to OCP v1.4 (`ocp_get_timestamp_ms()` for ms timestamps, `msg_type` field on every message). ## SIM7600 Hardware Notes Discoveries from live hardware testing: -- Modem kept in default UCS2 charset — `AT+CSMP=17,167,0,8` sets DCS=8 to tell the network body is UCS2-encoded. Enables full Unicode/emoji SMS. -- Phone numbers and SMS bodies are UCS2 hex-encoded for `AT+CMGS` and decoded from `AT+CMGR` responses. CLIP and ATD use plain ASCII. -- `AT+CPMS="ME","ME","ME"` required — default SMS read storage is "SR" (status reports) -- `AT+CHUP` for hangup instead of `ATH` — works reliably in all call states -- `AT+CECM=1` only works during active calls — sent per-call, not at init -- `VOICE CALL: BEGIN` / `VOICE CALL: END` are SIM7600-specific URCs (not standard `CONNECT`) -- Modem sends `VOICE CALL: END` + `NO CARRIER` back-to-back — duplicate suppressed in event handler +- Modem kept in default UCS2 charset — `AT+CSMP=17,167,0,8` sets DCS=8. Enables full Unicode/emoji SMS. +- Phone numbers and SMS bodies are UCS2 hex-encoded for `AT+CMGS` and decoded from `AT+CMGR`. CLIP and ATD use plain ASCII. +- `AT+CPMS="ME","ME","ME"` required — default SMS read storage is "SR" (status reports). +- `AT+CHUP` for hangup, not `ATH` — works reliably in all call states. 
+- `AT+CECM=1` only works during active calls — sent per-call, not at init. +- `VOICE CALL: BEGIN` / `VOICE CALL: END` are SIM7600-specific URCs (not standard `CONNECT`). +- Modem sends `VOICE CALL: END` + `NO CARRIER` back-to-back — duplicate suppressed in event handler. +- Current firmware (`LE20B04SIM7600G22`) does **not** include MMS AT commands. See `~/code/The-OASIS-Project/dawn/docs/UNIFIED_IMAGE_STORE_DESIGN.md` §Phase 4 for unblock paths. ## Development Lifecycle -1. **Implement**: Build and format check after each chunk: `make -C build -j8` + `./format_code.sh --check` -2. **Test**: Run `ctest --test-dir build --output-on-failure` -3. **Review**: Run review agents on the diff (architecture-reviewer, embedded-efficiency-reviewer, security-auditor) -4. **Manual test**: Verify on live hardware if touching AT commands, URC handling, or MQTT -5. **Format**: `./format_code.sh` -6. **Commit**: Provide `git add` + commit message to developer (never run git commands directly) +1. **Implement** — build + format check after each chunk: `make -C build -j8` + `./format_code.sh --check`. +2. **Test** — `ctest --test-dir build --output-on-failure`. +3. **Review** — run review agents on the diff (architecture-reviewer, embedded-efficiency-reviewer, security-auditor). +4. **Manual test** — verify on live hardware if touching AT commands, URC handling, or MQTT. +5. **Format** — `./format_code.sh` one final time. +6. **Commit** — provide `git add` + commit message; **developer runs git commands**. -## Design Document +## Design Documents -Single source of truth: `~/code/The-OASIS-Project/dawn/docs/PHONE_SMS_DESIGN.md` +Phone/SMS integration design: `~/code/The-OASIS-Project/dawn/docs/PHONE_SMS_DESIGN.md`. ## License -GPLv3 or later. All source files include GPL header block. +GPLv3 or later. Every new source file includes the GPL header block. 
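Note on the CLAUDE.md threading rules above: the URC reader thread must hand SMS reads to the main thread instead of issuing AT commands itself. A minimal sketch of that hand-off follows, assuming a single producer (the URC reader thread) and a single consumer (the main thread). The names `cmd_queue_t`, `deferred_cmd_t`, `cmd_queue_push`, `cmd_queue_pop`, and `CMD_QUEUE_CAP` are hypothetical and are not taken from the ECHO sources; the real queue may be laid out differently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CMD_QUEUE_CAP 16 /* hypothetical capacity; must be a power of two */

/* Hypothetical deferred work item: the storage slot reported by +CMTI. */
typedef struct {
   int sms_index;
} deferred_cmd_t;

/* Single-producer / single-consumer ring. head and tail only ever grow;
 * the slot is the counter masked by (CMD_QUEUE_CAP - 1). */
typedef struct {
   deferred_cmd_t items[CMD_QUEUE_CAP];
   atomic_size_t head; /* advanced only by the consumer (main thread) */
   atomic_size_t tail; /* advanced only by the producer (URC thread) */
} cmd_queue_t;

/* Producer side (URC reader thread). Never blocks and never touches the
 * AT condvar, so it cannot deadlock against a pending AT command. */
static bool cmd_queue_push(cmd_queue_t *q, deferred_cmd_t cmd) {
   size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
   size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
   if (tail - head == CMD_QUEUE_CAP) {
      return false; /* full: caller decides whether to drop or log */
   }
   q->items[tail & (CMD_QUEUE_CAP - 1)] = cmd;
   /* Release so the consumer sees the item before it sees the new tail. */
   atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
   return true;
}

/* Consumer side (main thread). Returns false when the queue is empty. */
static bool cmd_queue_pop(cmd_queue_t *q, deferred_cmd_t *out) {
   size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
   size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
   if (head == tail) {
      return false;
   }
   *out = q->items[head & (CMD_QUEUE_CAP - 1)];
   /* Release so the producer may reuse the slot only after the copy. */
   atomic_store_explicit(&q->head, head + 1, memory_order_release);
   return true;
}
```

In this sketch the main loop would call `cmd_queue_pop()` until it returns false and only then issue the blocking AT read for each popped index. The ring is safe for exactly one producer; if a second thread (for example the Mosquitto callback) also needs to enqueue work, it would need its own ring or a lock around `cmd_queue_push()`.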
diff --git a/CMakeLists.txt b/CMakeLists.txt index 05a8ea4..6c88077 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -34,7 +34,10 @@ set(SOURCES src/modem.c src/mqtt_comms.c src/oasis-echo.c + src/pdu.c src/sms.c + src/sms_io.c + src/sms_reassembly.c src/urc_handler.c ) @@ -93,4 +96,16 @@ if(BUILD_TESTS) ${MOSQUITTO_LIBRARIES}) target_include_directories(test_mqtt_messages PRIVATE include ${JSONC_INCLUDE_DIRS}) add_test(NAME test_mqtt_messages COMMAND test_mqtt_messages) + + # test_pdu — PDU encode/decode + malformed-input rejection + add_executable(test_pdu tests/test_pdu.c src/pdu.c) + target_link_libraries(test_pdu unity echo_logging pthread) + target_include_directories(test_pdu PRIVATE include) + add_test(NAME test_pdu COMMAND test_pdu) + + # test_sms_reassembly — LRU, per-sender cap, duplicate handling + add_executable(test_sms_reassembly tests/test_sms_reassembly.c src/sms_reassembly.c) + target_link_libraries(test_sms_reassembly unity echo_logging) + target_include_directories(test_sms_reassembly PRIVATE include) + add_test(NAME test_sms_reassembly COMMAND test_sms_reassembly) endif() diff --git a/include/at_command.h b/include/at_command.h index d32bf00..7dc7082 100644 --- a/include/at_command.h +++ b/include/at_command.h @@ -130,6 +130,28 @@ at_status_t at_command_send_sms(at_context_t *ctx, const char *body, at_response_t *response); +/** + * @brief Send one PDU-mode SMS segment. + * + * Two-phase AT+CMGS with the `` argument, then hex PDU + Ctrl-Z: + * Phase 1: AT+CMGS=\r → wait for '>' prompt. + * Phase 2: \x1A → wait for +CMGS / OK / ERROR. + * + * The hex string is validated for the hex alphabet before transmission — + * an accidental non-hex byte sent in this mode triggers CMS ERROR 305 at + * best and unpredictable modem state at worst. + * + * @param ctx AT context. + * @param tpdu_octets Length argument for AT+CMGS (TPDU only, not SMSC prefix). + * @param pdu_hex Full hex payload, including the "00" SMSC-default prefix. 
+ * @param response Output response. + * @return AT_OK or a failure status. + */ +at_status_t at_command_send_pdu(at_context_t *ctx, + int tpdu_octets, + const char *pdu_hex, + at_response_t *response); + /** * @brief Write raw bytes to the serial port (thread-safe). * @return Number of bytes written, or -1 on error. diff --git a/include/echo.h b/include/echo.h index 00a9dc1..51d9bb3 100644 --- a/include/echo.h +++ b/include/echo.h @@ -40,12 +40,27 @@ #define ECHO_DEFAULT_TELEMETRY_S 10 #define ECHO_DEFAULT_RATE_CALLS_H 5 #define ECHO_DEFAULT_RATE_SMS_H 20 +/* Per-segment rate bucket for PDU mode. A single concat SMS can burn up to + * PDU_MAX_SEGMENTS airtime units, so we budget this separately from the + * per-message rate so a chatty user can't exhaust the network quota. */ +#define ECHO_DEFAULT_RATE_SEGMENTS_H 200 +/* Inter-segment pacing. T-Mobile + SIM7600 can wedge on back-to-back + * concat sends without a breather. 150ms is gentle on both. */ +#define ECHO_DEFAULT_SEGMENT_DELAY_MS 150 /* AT command limits */ #define AT_RESPONSE_MAX 4096 /* large enough for UCS2 hex SMS bodies */ #define AT_TIMEOUT_DEFAULT 2000 /* ms */ #define AT_TIMEOUT_SMS 60000 /* ms — AT+CMGS waits for network */ #define AT_TIMEOUT_DIAL 5000 /* ms — ATD returns quickly, result comes as URC */ +/* Inbound SMS storage ops (CMGR read, CMGD delete) hit local modem memory + * and normally return <100ms. A short timeout here matters because a + * multi-segment SMS fires one CMTI per segment; at the 2s default, a 10- + * segment message could block the main-thread command-queue drain for up + * to 40s. 500ms caps that at ~10s and still leaves margin over real-world + * modem latency. On timeout we fall back to logging + CMGD-fire-forget so + * the inbox doesn't fill. 
*/ +#define AT_TIMEOUT_SMS_STORAGE 500 /* SMS limits */ #define SMS_BODY_MAX 800 @@ -107,26 +122,39 @@ typedef struct { int telemetry_interval_s; int rate_limit_calls_per_hour; int rate_limit_sms_per_hour; + int rate_limit_segments_per_hour; + int inter_segment_delay_ms; + bool pdu_mode; /* true = PDU (AT+CMGF=0), false = legacy text mode */ bool service_mode; } echo_config_t; -/* Rate limiter bucket */ +/* Leaky-bucket rate limiter. Fills at `max_per_hour / 3600` tokens/sec up to + * a `max_per_hour` ceiling; a `take_n()` spends N tokens atomically. + * + * Replaces an earlier ring-buffer design that silently capped active count + * at 64 — the old limiter never rejected anything when `max_per_hour > 64`. + * The counter form is correct at any configured limit and O(1) per call. */ typedef struct { - int64_t timestamps[64]; /* ring buffer of event timestamps (epoch seconds) */ - int head; /* next write position */ - int count; /* events in current window */ - int max_per_hour; /* configured limit */ + double tokens; /* current token balance (fractional) */ + int64_t last_refill_sec; /* wall clock seconds of last refill */ + int max_per_hour; /* bucket ceiling + refill rate input */ } rate_bucket_t; /** - * @brief Initialize a rate limiter bucket. + * @brief Initialize a rate limiter bucket, pre-filled to capacity. */ void rate_bucket_init(rate_bucket_t *bucket, int max_per_hour); /** - * @brief Check if an action is allowed and record it if so. + * @brief Try to consume one token; record it if available. * @return true if allowed, false if rate limited. */ bool rate_bucket_allow(rate_bucket_t *bucket); +/** + * @brief Try to consume `n` tokens atomically (no partial debit on failure). + * @return true if all N allowed, false if insufficient balance. 
+ */ +bool rate_bucket_take_n(rate_bucket_t *bucket, int n); + #endif /* ECHO_H */ diff --git a/include/modem.h b/include/modem.h index b30648e..b56f35f 100644 --- a/include/modem.h +++ b/include/modem.h @@ -32,14 +32,17 @@ /** * @brief Run the modem initialization sequence. * - * Sends: AT, ATE0, AT+CMEE=2, AT+CLIP=1, AT+CMGF=1, - * AT+CNMI=2,1,0,0,0, AT+CREG=1, AT+CSDVC=1, AT+CLVL=3, - * AT+CECM=1, AT+CSQ, AT+COPS? + * Sends: AT, ATE0, AT+CMEE=2, AT+CLIP=1, AT+CMGF=0 (PDU) or 1 (text), + * AT+CNMI=2,1,0,0,0, AT+CREG=1, AT+CSDVC=1, AT+CLVL=3, AT+CSQ, AT+COPS? * - * @param at AT context with open serial port. + * AT+CSMP (SMS parameters) is only sent in text mode — PDU mode carries the + * DCS per-frame so the init-time default is irrelevant. + * + * @param at AT context with open serial port. + * @param pdu_mode true → AT+CMGF=0 (PDU), false → AT+CMGF=1 (text). * @return 0 on success (basic AT works), -1 on failure. */ -int modem_init(at_context_t *at); +int modem_init(at_context_t *at, bool pdu_mode); /** * @brief Poll signal strength (AT+CSQ). diff --git a/include/mqtt_comms.h b/include/mqtt_comms.h index 08ed963..9dd16d9 100644 --- a/include/mqtt_comms.h +++ b/include/mqtt_comms.h @@ -83,13 +83,17 @@ int mqtt_publish_event(const char *event_json); * @param value Optional value string (for success responses). * @param err_code Error code string (for error responses, e.g. "NO_CARRIER"). * @param err_msg Error message string (for error responses). + * @param data_json Optional pre-serialized JSON object (e.g., "{\"segments_sent\":3}") + * merged into the response under the "data" key. NULL is fine. + * Caller owns the string. */ int mqtt_publish_response(const char *action, const char *request_id, bool success, const char *value, const char *err_code, - const char *err_msg); + const char *err_msg, + const char *data_json); /** * @brief Publish online status (echo/status, QoS 1, retained). 
@@ -140,6 +144,7 @@ int mqtt_build_response_json(const char *action, const char *value, const char *err_code, const char *err_msg, + const char *data_json, char *buf, size_t size); diff --git a/include/pdu.h b/include/pdu.h new file mode 100644 index 0000000..88a23fd --- /dev/null +++ b/include/pdu.h @@ -0,0 +1,172 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. These contributions become + * part of the project and are adopted by the project author(s). + * + * SMS PDU encoder/decoder for the SIM7600G-H modem. + * + * Implements 3GPP TS 23.040 SMS PDU SUBMIT (outbound) and DELIVER (inbound) + * handling for UCS2-encoded messages. Multi-segment messages are concatenated + * via the User Data Header (UDH) with IE 0x00 (8-bit reference) so the + * receiving device reassembles them into one conversation. + * + * v1 is UCS2-only: ASCII messages use 67 chars/segment instead of the 153 + * GSM7 packing would allow. GSM7 is deferred until there's a real metric + * showing it matters. 
+ */ + +#ifndef PDU_H +#define PDU_H + +#include +#include +#include +#include + +/* Limits */ +#define PDU_MAX_SEGMENTS 10 /* hard cap on concat segments */ +#define PDU_UCS2_CHARS_PER_SEG 67 /* (140 - 6 UDH) / 2 bytes */ +#define PDU_TPDU_MAX 280 /* max TPDU octets per segment */ +#define PDU_MAX_HEX_LEN (PDU_TPDU_MAX * 2 + 32) /* TPDU hex + SMSC prefix slack */ +#define PDU_SENDER_MAX 24 /* matches PHONE_NUMBER_MAX + NUL */ + +/* Data coding scheme (DCS) */ +typedef enum { + PDU_DCS_GSM7 = 0x00, + PDU_DCS_UCS2 = 0x08, +} pdu_dcs_t; + +/* Encode/decode error codes. SUCCESS (0) is the only success value; anything + * else is a distinct failure reason so callers and fuzz harness logs can + * classify without parsing strings. */ +typedef enum { + PDU_OK = 0, + PDU_ERR_NULL_ARG, + PDU_ERR_BAD_HEX, /* non-hex character or odd-length input */ + PDU_ERR_TRUNCATED, /* TPDU ended mid-field */ + PDU_ERR_BAD_LENGTH, /* length prefix > remaining buffer */ + PDU_ERR_BAD_UDH, /* malformed user data header */ + PDU_ERR_SENDER_OVERFLOW, /* decoded sender > PDU_SENDER_MAX */ + PDU_ERR_BODY_TOO_LONG, /* message needs more than PDU_MAX_SEGMENTS */ + PDU_ERR_BUFFER_TOO_SMALL, /* caller-supplied output buffer too small */ + PDU_ERR_UNSUPPORTED_DCS, /* decode of GSM7 packed body not implemented v1 */ + PDU_ERR_BAD_ADDRESS, /* TP-DA length or digits invalid */ + PDU_ERR_INTERNAL, /* should-not-happen — bug guard */ +} pdu_err_t; + +/* One encoded segment, ready to feed into AT+CMGS=. Both the + * SMSC length prefix "00" (no SMSC override) and the TPDU are in `hex`; + * `tpdu_octets` counts only the TPDU — carrier counts the same way. */ +typedef struct { + char hex[PDU_MAX_HEX_LEN + 1]; + int tpdu_octets; +} pdu_segment_t; + +/* Decoded inbound PDU metadata. Body lands in a caller-provided buffer so + * growing the body cap (UCS2 is 2048 bytes after reassembly) doesn't bloat + * this struct for every URC handler. 
*/ +typedef struct { + char sender[PDU_SENDER_MAX]; + bool has_udh; + uint8_t udh_ref_id; + uint8_t udh_total; + uint8_t udh_seq; + time_t scts; /* service centre timestamp (0 if unparseable) */ + bool is_ucs2; + size_t body_len; /* bytes written to body_out (not including NUL) */ +} pdu_decoded_t; + +/** + * @brief Decide whether a UTF-8 body needs UCS2 encoding. + * + * v1: always returns true. GSM7 packing is deferred; overseas/emoji traffic + * always forced UCS2 anyway. + * + * @param utf8_body UTF-8 string (may be NULL/empty). + * @return true if UCS2 must be used. + */ +bool pdu_needs_ucs2(const char *utf8_body); + +/** + * @brief Generate a non-zero reference id for a new concatenated message. + * + * Monotonically increments an internal 8-bit counter, skipping 0 (receivers + * interpret ref_id=0 as a special case on some handsets). Thread-safe. + */ +uint8_t pdu_new_ref_id(void); + +/** + * @brief Count segments needed to encode a body at the given DCS. + * + * @param utf8_body UTF-8 string. + * @param is_ucs2 true for UCS2 (67 chars/seg), false for GSM7 (153/seg). + * @return Segment count [1, PDU_MAX_SEGMENTS], or 0 on empty body. + */ +int pdu_segment_count(const char *utf8_body, bool is_ucs2); + +/** + * @brief Encode a UTF-8 body into one or more SMS SUBMIT PDUs. + * + * Produces up to PDU_MAX_SEGMENTS segments. Each segment's `hex` is ready + * to hand to at_command_send_pdu(); `tpdu_octets` is the value for the + * AT+CMGS= preamble. + * + * @param dest Destination number in E.164 form (e.g., "+15551234567"). + * @param utf8_body UTF-8 message body. + * @param ref_id Concatenation reference id (from pdu_new_ref_id()). + * @param is_ucs2 Encoding selector — v1 always true. + * @param out_segs Output segment array. + * @param max_segs Size of out_segs (callers pass PDU_MAX_SEGMENTS). + * @param n_segs Output: segments produced. + * @return PDU_OK or a PDU_ERR_* code. 
+ */ +pdu_err_t pdu_encode_submit(const char *dest, + const char *utf8_body, + uint8_t ref_id, + bool is_ucs2, + pdu_segment_t *out_segs, + int max_segs, + int *n_segs); + +/** + * @brief Decode a SMS-DELIVER PDU from hex (as returned by AT+CMGR). + * + * Every length-prefix field is bounds-checked before use. Sender digits and + * the user data are copied into caller-supplied buffers — the function + * refuses to overflow them and returns a distinct error so the fuzzer can + * separate bounds violations from other malformed input. + * + * UCS2 bodies are decoded to UTF-8 in body_out with light sanitization: + * U+0000 is stripped (C-string safety), C0/C1 controls (except \n, \t) are + * replaced with ' ', and bidi override controls (U+202A–U+202E, U+2066–U+2069) + * are dropped to prevent display spoofing. + * + * @param tpdu_hex TPDU hex (with optional SMSC prefix as delivered by CMGR). + * @param out Output metadata. + * @param body_out UTF-8 output buffer. + * @param body_cap Size of body_out in bytes (must be >= 2). + * @return PDU_OK or a PDU_ERR_* code. + */ +pdu_err_t pdu_decode(const char *tpdu_hex, pdu_decoded_t *out, char *body_out, size_t body_cap); + +/** + * @brief Human-readable string for a PDU error code (for logging). + */ +const char *pdu_err_str(pdu_err_t err); + +#endif /* PDU_H */ diff --git a/include/sms_io.h b/include/sms_io.h new file mode 100644 index 0000000..aeb9236 --- /dev/null +++ b/include/sms_io.h @@ -0,0 +1,107 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. These contributions become + * part of the project and are adopted by the project author(s). + * + * SMS send / inbound-dispatch orchestration. Extracted from oasis-echo.c so + * the main daemon stays under the 1,500-line soft limit and the PDU path + * can be unit-tested end-to-end. + */ + +#ifndef SMS_IO_H +#define SMS_IO_H + +#include +#include + +#include "at_command.h" +#include "echo.h" + +typedef enum { + SMS_IO_SEND_OK = 0, + SMS_IO_SEND_INVALID_NUMBER, + SMS_IO_SEND_INVALID_BODY, + SMS_IO_SEND_ENCODE_ERROR, + SMS_IO_SEND_BODY_TOO_LONG, + SMS_IO_SEND_RATE_LIMITED, + SMS_IO_SEND_SEGMENT_LIMITED, + SMS_IO_SEND_AT_ERROR, + SMS_IO_SEND_PARTIAL_FAIL, /* some segments ok, others failed */ +} sms_io_send_err_t; + +typedef struct { + at_context_t *at; + rate_bucket_t *msg_bucket; + rate_bucket_t *segment_bucket; + int inter_segment_delay_ms; + bool pdu_mode; +} sms_io_ctx_t; + +/** + * @brief Send an outbound SMS and publish the MQTT response in one call. + * + * Does everything sms_io_send would do (validate → rate-limit → encode → + * send with pacing) and then publishes the appropriate success/error + * response on echo/response, mapping internal error codes to OCP error + * strings. This keeps sms_io_send_err_t out of oasis-echo.c. + * + * Success responses include segments_sent/segments_total in the data blob; + * PDU_PARTIAL_FAIL error responses include the same so the caller knows + * how many segments reached the carrier. + * + * @param ctx I/O context (AT port, buckets, config). + * @param dest Destination in E.164 form ("+15551234567"). 
+ * @param utf8_body UTF-8 message body. + * @param action MQTT action name (echoed into the response). + * @param request_id MQTT request_id (echoed into the response). + */ +void sms_io_send_and_respond(const sms_io_ctx_t *ctx, + const char *dest, + const char *utf8_body, + const char *action, + const char *request_id); + +/** + * @brief Send an outbound SMS. Lower-level entry point for tests or callers + * that want to publish their own response shape. + * + * Output parameters mirror sms_io_send_and_respond: segments_sent is 0..total + * on PARTIAL_FAIL, 0 on other errors, total on success. + */ +sms_io_send_err_t sms_io_send(const sms_io_ctx_t *ctx, + const char *dest, + const char *utf8_body, + int *segments_sent, + int *total_segments, + char *at_err_detail, + size_t at_err_cap); + +/** + * @brief Handle a CMTI URC: read SMS from modem storage, decode, publish. + * + * Covers both legacy text-mode and the new PDU path. The function is the + * single point where inbound messages are parsed and forwarded onto + * echo/events. Multi-segment PDU messages are run through reassembly and + * published only when complete. + * + * @param ctx I/O context. + * @param sms_index Modem storage slot (from +CMTI). + */ +void sms_io_handle_cmti(const sms_io_ctx_t *ctx, int sms_index); + +#endif /* SMS_IO_H */ diff --git a/include/sms_reassembly.h b/include/sms_reassembly.h new file mode 100644 index 0000000..69983fd --- /dev/null +++ b/include/sms_reassembly.h @@ -0,0 +1,113 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. These contributions become + * part of the project and are adopted by the project author(s). + * + * Inbound multi-segment SMS reassembly with bounded state. + * + * Callers push one UDH-carrying fragment at a time. When a message is + * complete, the helper concatenates the fragments into a caller-provided + * buffer and frees the slot. State is inline (no heap) so a long-running + * daemon doesn't fragment; bounded by 8 slots, 2 slots per sender, and a + * 10-minute TTL so a spammer can't pin resources. + * + * Thread-safety: all callers run on a single thread. The CMTI handler is + * dispatched on the main thread via the command queue, and reassembly + * happens there and only there. No mutex. + */ + +#ifndef SMS_REASSEMBLY_H +#define SMS_REASSEMBLY_H + +#include +#include +#include +#include + +#include "pdu.h" + +#define REASSEMBLY_SLOTS 8 +#define REASSEMBLY_TIMEOUT_SEC 600 +#define REASSEMBLY_PER_SENDER 2 +/* UCS2 decoded fragment can reach 67 chars × 4 UTF-8 bytes = 268; round up + * for NUL and replacement characters. 
*/ +#define REASSEMBLY_FRAG_BUF_SIZE 320 + +typedef enum { + REASM_INCOMPLETE = 0, /* fragment stored, waiting for more */ + REASM_COMPLETE, /* message whole, body written to out */ + REASM_REJECTED_CAP, /* per-sender cap tripped */ + REASM_REJECTED_TOTAL, /* mismatched total across fragments */ + REASM_REJECTED_DUP, /* duplicate seq ignored */ + REASM_ERROR, /* bad args or fragment copy overflow */ +} reasm_result_t; + +typedef struct { + uint32_t slots_in_use; + uint64_t total_completed; + uint64_t total_timed_out; + uint64_t total_dropped_exhaustion; + uint64_t total_duplicates; + uint64_t total_sender_cap_exceeded; +} sms_reassembly_stats_t; + +/** + * @brief Push one fragment. Call on every inbound segment with UDH. + * + * @param sender Normalized sender (E.164 or short code). + * @param ref_id UDH reference id. + * @param total UDH total segments. + * @param seq UDH sequence number (1-based). + * @param fragment UTF-8 body for this segment. + * @param frag_len Bytes in fragment (not including NUL). + * @param now Current wall time (injection for testability). + * @param out_body Optional output buffer for assembled body (REASM_COMPLETE). + * @param out_cap Size of out_body. + * @param out_len Optional: bytes written (not including NUL) on COMPLETE. + * @return reasm_result_t classification. + */ +reasm_result_t sms_reassembly_push(const char *sender, + uint8_t ref_id, + uint8_t total, + uint8_t seq, + const char *fragment, + size_t frag_len, + time_t now, + char *out_body, + size_t out_cap, + size_t *out_len); + +/** + * @brief Evict timed-out slots. Safe to call periodically or on push. + * + * @param now Current wall time. + * @return Slots evicted. + */ +int sms_reassembly_sweep(time_t now); + +/** + * @brief Snapshot cumulative counters. + */ +void sms_reassembly_stats(sms_reassembly_stats_t *out); + +/** + * @brief Reset all state. Intended for tests. 
+ */ +void sms_reassembly_reset(void); + +#endif /* SMS_REASSEMBLY_H */ diff --git a/src/at_command.c b/src/at_command.c index f5d4737..2a201d6 100644 --- a/src/at_command.c +++ b/src/at_command.c @@ -36,6 +36,7 @@ #include #include "logging.h" +#include "pdu.h" /* ── Helpers ─────────────────────────────────────────────────────────── */ @@ -439,6 +440,151 @@ at_status_t at_command_send_sms(at_context_t *ctx, return status; } +/* ── PDU send (two-phase with arg) ──────────────────────────── */ + +static bool hex_alphabet_valid(const char *hex) { + if (!hex) + return false; + size_t len = 0; + for (const char *p = hex; *p; p++) { + char c = *p; + bool ok = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f'); + if (!ok) { + return false; + } + len++; + } + return (len > 0) && ((len & 1) == 0); +} + +at_status_t at_command_send_pdu(at_context_t *ctx, + int tpdu_octets, + const char *pdu_hex, + at_response_t *response) { + if (!ctx || ctx->fd < 0 || !pdu_hex || tpdu_octets <= 0) { + return AT_PORT_ERROR; + } + /* Re-validate before we push bytes to the modem. A bad encode that slips + * through here would otherwise corrupt modem state, not just error. */ + if (!hex_alphabet_valid(pdu_hex)) { + OLOG_ERROR("at_command_send_pdu: PDU hex failed alphabet validation"); + if (response) { + memset(response, 0, sizeof(*response)); + response->status = AT_ERROR; + } + return AT_ERROR; + } + + char cmgs[32]; + snprintf(cmgs, sizeof(cmgs), "AT+CMGS=%d", tpdu_octets); + + pthread_mutex_lock(&ctx->pending.mutex); + ctx->pending.type = AT_PENDING_SMS; + ctx->pending.completed = false; + memset(&ctx->pending.response, 0, sizeof(ctx->pending.response)); + pthread_mutex_unlock(&ctx->pending.mutex); + + if (at_write_cmd(ctx, cmgs) < 0) { + pthread_mutex_lock(&ctx->pending.mutex); + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + return AT_PORT_ERROR; + } + + /* Wait for '>' prompt. 
*/ + struct timespec ts; + clock_gettime(CLOCK_REALTIME, &ts); + ts.tv_sec += AT_TIMEOUT_SMS / 1000; + ts.tv_nsec += (AT_TIMEOUT_SMS % 1000) * 1000000L; + if (ts.tv_nsec >= 1000000000L) { + ts.tv_sec++; + ts.tv_nsec -= 1000000000L; + } + + pthread_mutex_lock(&ctx->pending.mutex); + while (!ctx->pending.completed) { + int rc = pthread_cond_timedwait(&ctx->pending.cond, &ctx->pending.mutex, &ts); + if (rc == ETIMEDOUT) { + char esc = 0x1B; + if (at_write_raw(ctx, &esc, 1) < 0) { + OLOG_WARNING("PDU abort ESC write failed after prompt timeout"); + } + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + OLOG_ERROR("PDU prompt timeout for AT+CMGS=%d", tpdu_octets); + if (response) { + response->status = AT_TIMEOUT; + } + return AT_TIMEOUT; + } + } + + if (ctx->pending.response.status != AT_OK) { + at_status_t status = ctx->pending.response.status; + if (response) { + *response = ctx->pending.response; + } + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + return status; + } + pthread_mutex_unlock(&ctx->pending.mutex); + + /* Phase 2: hex body + Ctrl-Z. Fixed stack buffer — `pdu_hex` has a + * compile-time ceiling of PDU_MAX_HEX_LEN, so no allocation needed. 
*/ + size_t hex_len = strlen(pdu_hex); + if (hex_len > PDU_MAX_HEX_LEN) { + return AT_PORT_ERROR; + } + char send_buf[PDU_MAX_HEX_LEN + 2]; + memcpy(send_buf, pdu_hex, hex_len); + send_buf[hex_len] = 0x1A; + + pthread_mutex_lock(&ctx->pending.mutex); + ctx->pending.type = AT_PENDING_SYNC; + ctx->pending.completed = false; + memset(&ctx->pending.response, 0, sizeof(ctx->pending.response)); + pthread_mutex_unlock(&ctx->pending.mutex); + + if (at_write_raw(ctx, send_buf, (int)(hex_len + 1)) < 0) { + pthread_mutex_lock(&ctx->pending.mutex); + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + return AT_PORT_ERROR; + } + + clock_gettime(CLOCK_REALTIME, &ts); + ts.tv_sec += AT_TIMEOUT_SMS / 1000; + ts.tv_nsec += (AT_TIMEOUT_SMS % 1000) * 1000000L; + if (ts.tv_nsec >= 1000000000L) { + ts.tv_sec++; + ts.tv_nsec -= 1000000000L; + } + + pthread_mutex_lock(&ctx->pending.mutex); + while (!ctx->pending.completed) { + int rc = pthread_cond_timedwait(&ctx->pending.cond, &ctx->pending.mutex, &ts); + if (rc == ETIMEDOUT) { + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + OLOG_ERROR("PDU send timeout after body (octets=%d)", tpdu_octets); + if (response) { + response->status = AT_TIMEOUT; + } + return AT_TIMEOUT; + } + } + + at_status_t status = ctx->pending.response.status; + if (response) { + *response = ctx->pending.response; + } + ctx->pending.type = AT_PENDING_NONE; + pthread_mutex_unlock(&ctx->pending.mutex); + + return status; +} + /* ── Response parsing ────────────────────────────────────────────────── */ bool at_parse_terminator(const char *line, at_status_t *status, int *err_code) { diff --git a/src/modem.c b/src/modem.c index 6290e43..d8c20f7 100644 --- a/src/modem.c +++ b/src/modem.c @@ -46,12 +46,12 @@ static int init_cmd(at_context_t *at, const char *cmd, const char *desc) { return 0; } -int modem_init(at_context_t *at) { +int modem_init(at_context_t *at, bool pdu_mode) { if (!at) { return -1; } - 
OLOG_INFO("Starting modem initialization sequence"); + OLOG_INFO("Starting modem initialization sequence (mode=%s)", pdu_mode ? "PDU" : "text"); /* Verify communication */ if (init_cmd(at, "AT", "verify comm") != 0) { @@ -62,11 +62,14 @@ int modem_init(at_context_t *at) { /* Core setup — failures are warnings, not fatal */ init_cmd(at, "ATE0", "disable echo"); init_cmd(at, "AT+CMEE=2", "verbose errors"); - /* Keep modem default UCS2 charset for full Unicode/emoji SMS support. - * AT+CSMP DCS=8 tells the network the body is UCS2-encoded. */ - init_cmd(at, "AT+CSMP=17,167,0,8", "SMS params UCS2 DCS"); init_cmd(at, "AT+CLIP=1", "caller ID"); - init_cmd(at, "AT+CMGF=1", "SMS text mode"); + if (pdu_mode) { + init_cmd(at, "AT+CMGF=0", "SMS PDU mode"); + } else { + /* Text-mode path keeps the DCS hint so UCS2 text-mode encodes still work. */ + init_cmd(at, "AT+CSMP=17,167,0,8", "SMS params UCS2 DCS"); + init_cmd(at, "AT+CMGF=1", "SMS text mode"); + } init_cmd(at, "AT+CPMS=\"ME\",\"ME\",\"ME\"", "SMS storage to ME"); init_cmd(at, "AT+CNMI=2,1,0,0,0", "SMS notification URC"); init_cmd(at, "AT+CREG=1", "network reg URC"); diff --git a/src/mqtt_comms.c b/src/mqtt_comms.c index 8ba14cb..1fa59e5 100644 --- a/src/mqtt_comms.c +++ b/src/mqtt_comms.c @@ -154,6 +154,7 @@ int mqtt_build_response_json(const char *action, const char *value, const char *err_code, const char *err_msg, + const char *data_json, char *buf, size_t size) { if (!action || !request_id || !buf || size == 0) { @@ -179,6 +180,19 @@ int mqtt_build_response_json(const char *action, json_object_object_add(obj, "error", err); } + /* Optional caller-provided data object. Parse it here so the published + * payload is a real nested object instead of a quoted string blob. Bad + * input is dropped with a warning — we don't want a caller bug to eat + * the whole response. 
*/ + if (data_json && data_json[0] != '\0') { + struct json_object *data = json_tokener_parse(data_json); + if (data) { + json_object_object_add(obj, "data", data); + } else { + OLOG_WARNING("mqtt_build_response_json: malformed data_json dropped"); + } + } + json_object_object_add(obj, "timestamp", json_object_new_int64(get_timestamp())); const char *json_str = json_object_to_json_string(obj); @@ -410,14 +424,15 @@ int mqtt_publish_response(const char *action, bool success, const char *value, const char *err_code, - const char *err_msg) { + const char *err_msg, + const char *data_json) { if (!mosq || !mqtt_initialized) { return -1; } char buf[1024]; - if (mqtt_build_response_json(action, request_id, success, value, err_code, err_msg, buf, - sizeof(buf)) < 0) { + if (mqtt_build_response_json(action, request_id, success, value, err_code, err_msg, data_json, + buf, sizeof(buf)) < 0) { return -1; } diff --git a/src/oasis-echo.c b/src/oasis-echo.c index 0f16e98..aa0a94e 100644 --- a/src/oasis-echo.c +++ b/src/oasis-echo.c @@ -37,6 +37,8 @@ #include "modem.h" #include "mqtt_comms.h" #include "sms.h" +#include "sms_io.h" +#include "sms_reassembly.h" #include "urc_handler.h" /* ── Globals ─────────────────────────────────────────────────────────── */ @@ -45,6 +47,8 @@ static volatile sig_atomic_t g_running = 1; static at_context_t g_at_ctx; static rate_bucket_t g_call_bucket; static rate_bucket_t g_sms_bucket; +static rate_bucket_t g_segment_bucket; +static sms_io_ctx_t g_sms_io; /* Thread-safe call state using __atomic builtins (GCC) */ static call_state_t g_call_state = CALL_STATE_IDLE; @@ -96,7 +100,14 @@ typedef struct { volatile int tail; /* written by consumer */ } cmd_queue_t; +/* Two queues so MQTT commands always drain ahead of deferred events. A + * multi-segment SMS burst pushes up to 10 CMTIs onto the deferred queue; + * without this split, a newly-arrived hangup/dial would wait behind those + * CMGR+CMGD operations (several seconds in the worst case). 
Both queues + * are single-producer / single-consumer: MQTT thread → g_cmd_queue; URC + * reader thread → g_deferred_queue; both consumed by main. */ static cmd_queue_t g_cmd_queue; +static cmd_queue_t g_deferred_queue; static void cmd_queue_init(cmd_queue_t *q) { memset(q, 0, sizeof(*q)); @@ -133,37 +144,40 @@ static void signal_handler(int sig) { void rate_bucket_init(rate_bucket_t *bucket, int max_per_hour) { memset(bucket, 0, sizeof(*bucket)); bucket->max_per_hour = max_per_hour; + bucket->tokens = (double)max_per_hour; /* start full */ + bucket->last_refill_sec = (int64_t)time(NULL); } -bool rate_bucket_allow(rate_bucket_t *bucket) { +static void rate_bucket_refill(rate_bucket_t *bucket) { int64_t now = (int64_t)time(NULL); - int64_t window_start = now - 3600; - - /* Count events in the last hour */ - int active = 0; - for (int i = 0; i < bucket->count && i < 64; i++) { - if (bucket->timestamps[i] >= window_start) { - active++; - } + int64_t elapsed = now - bucket->last_refill_sec; + if (elapsed <= 0) { + return; } + double add = (double)elapsed * (double)bucket->max_per_hour / 3600.0; + bucket->tokens += add; + if (bucket->tokens > (double)bucket->max_per_hour) { + bucket->tokens = (double)bucket->max_per_hour; + } + bucket->last_refill_sec = now; +} - if (active >= bucket->max_per_hour) { +bool rate_bucket_take_n(rate_bucket_t *bucket, int n) { + if (!bucket || n <= 0) { return false; } - - /* Record this event */ - bucket->timestamps[bucket->head] = now; - bucket->head = (bucket->head + 1) % 64; - if (bucket->count < 64) { - bucket->count++; + rate_bucket_refill(bucket); + if (bucket->tokens < (double)n) { + return false; } - - /* Fix: keep count accurate as old entries expire */ - bucket->count = active + 1; - + bucket->tokens -= (double)n; return true; } +bool rate_bucket_allow(rate_bucket_t *bucket) { + return rate_bucket_take_n(bucket, 1); +} + /* ── Input validation helpers ────────────────────────────────────────── */ /** @@ -230,7 +244,7 @@ static 
void on_urc_event(const urc_event_t *event, void *userdata) { cmd_entry_t audio_cmd; memset(&audio_cmd, 0, sizeof(audio_cmd)); audio_cmd.type = CMD_TYPE_CALL_CONNECTED; - cmd_queue_push(&g_cmd_queue, &audio_cmd); + cmd_queue_push(&g_deferred_queue, &audio_cmd); OLOG_INFO("Call connected"); break; } @@ -321,8 +335,8 @@ static void on_urc_event(const urc_event_t *event, void *userdata) { memset(&cmd, 0, sizeof(cmd)); cmd.type = CMD_TYPE_CMTI; cmd.sms_index = event->index; - if (!cmd_queue_push(&g_cmd_queue, &cmd)) { - OLOG_WARNING("Command queue full, dropping CMTI event for index %d", event->index); + if (!cmd_queue_push(&g_deferred_queue, &cmd)) { + OLOG_WARNING("Deferred queue full, dropping CMTI event for index %d", event->index); } break; } @@ -339,86 +353,7 @@ static void on_urc_event(const urc_event_t *event, void *userdata) { /* ── Deferred SMS read (runs on main thread) ─────────────────────────── */ static void handle_cmti(int sms_index) { - OLOG_INFO("Reading SMS at index %d...", sms_index); - char cmd[32]; - snprintf(cmd, sizeof(cmd), "AT+CMGR=%d", sms_index); - at_response_t resp; - at_status_t rc = at_command_send(&g_at_ctx, cmd, &resp, AT_TIMEOUT_DEFAULT); - if (rc != AT_OK) { - OLOG_WARNING("Failed to read SMS at index %d: %s (code=%d, data=[%s])", sms_index, - at_status_str(rc), resp.error_code, resp.data); - return; - } - - /* Parse +CMGR response — in UCS2 mode, sender and body are hex-encoded. - * Format: +CMGR: "REC UNREAD","hex_sender","","timestamp"\nhex_body - * Extract hex substrings from resp.data, then decode to UTF-8. 
*/ - char sender_hex[PHONE_NUMBER_HEX_MAX + 1] = ""; - const char *body_hex = ""; /* points into resp.data, no copy needed */ - - const char *cmgr = strstr(resp.data, "+CMGR:"); - if (cmgr) { - /* Extract sender hex (second quoted string) */ - const char *q1 = strchr(cmgr, '"'); - if (q1) { - q1 = strchr(q1 + 1, '"'); - if (q1) { - q1 = strchr(q1 + 1, '"'); - if (q1) { - const char *q2 = strchr(q1 + 1, '"'); - if (q2) { - size_t len = (size_t)(q2 - q1 - 1); - if (len > PHONE_NUMBER_HEX_MAX) { - len = PHONE_NUMBER_HEX_MAX; - } - memcpy(sender_hex, q1 + 1, len); - sender_hex[len] = '\0'; - } - } - } - } - /* Body hex is after the first newline — point directly into resp.data */ - const char *nl = strchr(cmgr, '\n'); - if (nl) { - body_hex = nl + 1; - } - /* Trim trailing whitespace in resp.data (mutate OK, we own it) */ - size_t blen = strlen(body_hex); - if (blen > 0) { - char *end = resp.data + (body_hex - resp.data) + blen; - while (end > body_hex && (*(end - 1) == '\n' || *(end - 1) == '\r' || *(end - 1) == ' ')) { - *(--end) = '\0'; - } - } - } - - /* Decode UCS2 hex to UTF-8 */ - char sender[PHONE_NUMBER_MAX + 1] = ""; - char body[SMS_BODY_MAX + 1] = ""; - sms_ucs2_hex_to_utf8(sender_hex, sender, sizeof(sender)); - sms_ucs2_hex_to_utf8(body_hex, body, sizeof(body)); - - /* Build event using json-c for proper escaping (OCP v1.4) */ - struct timespec ts; - clock_gettime(CLOCK_REALTIME, &ts); - int64_t timestamp_ms = (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000; - - struct json_object *evt = json_object_new_object(); - json_object_object_add(evt, "device", json_object_new_string("echo")); - json_object_object_add(evt, "msg_type", json_object_new_string("event")); - json_object_object_add(evt, "event", json_object_new_string("sms_received")); - json_object_object_add(evt, "index", json_object_new_int(sms_index)); - json_object_object_add(evt, "sender", json_object_new_string(sender)); - json_object_object_add(evt, "body", json_object_new_string(body)); - 
json_object_object_add(evt, "timestamp", json_object_new_int64(timestamp_ms)); - - const char *json_str = json_object_to_json_string(evt); - mqtt_publish_event(json_str); - json_object_put(evt); - - /* DAWN is responsible for sending delete_sms after committing to phone_db. - * If ECHO auto-deleted here and DAWN crashed before DB commit, the SMS - * would be lost. The index is included in the event for DAWN to reference. */ + sms_io_handle_cmti(&g_sms_io, sms_index); } /* ── MQTT command handler (queues to main thread) ────────────────────── */ @@ -445,7 +380,7 @@ static void on_mqtt_command(const char *action, if (!cmd_queue_push(&g_cmd_queue, &cmd)) { OLOG_WARNING("Command queue full, rejecting action=%s request_id=%s", action, request_id); mqtt_publish_response(action, request_id, false, NULL, "QUEUE_FULL", - "Command queue full, try again"); + "Command queue full, try again", NULL); } } @@ -461,12 +396,12 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { if (strcmp(action, "dial") == 0) { if (!sms_validate_number(value)) { mqtt_publish_response(action, request_id, false, NULL, "INVALID_NUMBER", - "Phone number validation failed"); + "Phone number validation failed", NULL); return; } if (!rate_bucket_allow(&g_call_bucket)) { mqtt_publish_response(action, request_id, false, NULL, "RATE_LIMITED", - "Call rate limit exceeded"); + "Call rate limit exceeded", NULL); return; } @@ -475,10 +410,10 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { at_status_t rc = at_command_send_async(&g_at_ctx, at_cmd); if (rc == AT_OK) { set_call_state(CALL_STATE_DIALING); - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, NULL); } else { mqtt_publish_response(action, request_id, false, NULL, "AT_ERROR", - "Failed to send dial command"); + "Failed to send dial command", NULL); } return; } @@ -487,10 +422,10 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { if 
(strcmp(action, "answer") == 0) { at_status_t rc = at_command_send_async(&g_at_ctx, "ATA"); if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, NULL); } else { mqtt_publish_response(action, request_id, false, NULL, "AT_ERROR", - "Failed to send answer command"); + "Failed to send answer command", NULL); } return; } @@ -505,7 +440,7 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { set_last_ring_time(0); /* AT+CHUP may return OK or NO CARRIER — both mean the call ended */ if (rc == AT_OK || rc == AT_NO_CARRIER) { - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, NULL); /* Publish call_ended directly — don't rely on URC which we'll suppress * since state is already IDLE by the time it arrives. */ if (prev != CALL_STATE_IDLE) { @@ -520,25 +455,14 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { OLOG_INFO("Call ended: local hangup"); } } else { - mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), "Hangup failed"); + mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), "Hangup failed", + NULL); } return; } - /* send_sms */ + /* send_sms — delegate encoding + transmission + response to sms_io. 
*/ if (strcmp(action, "send_sms") == 0) { - if (!sms_validate_number(value)) { - mqtt_publish_response(action, request_id, false, NULL, "INVALID_NUMBER", - "Phone number validation failed"); - return; - } - if (!rate_bucket_allow(&g_sms_bucket)) { - mqtt_publish_response(action, request_id, false, NULL, "RATE_LIMITED", - "SMS rate limit exceeded"); - return; - } - - /* Extract body from data_json */ char body[SMS_BODY_MAX + 1] = ""; if (data_json[0] != '\0') { struct json_object *data = json_tokener_parse(data_json); @@ -550,38 +474,7 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { json_object_put(data); } } - - /* Sanitize body */ - char clean[SMS_BODY_MAX + 1]; - int clean_len = sms_sanitize_body(body, clean, sizeof(clean)); - if (clean_len < 0) { - mqtt_publish_response(action, request_id, false, NULL, "INVALID_BODY", - "SMS body contains dangerous characters"); - return; - } - - /* Encode number and body to UCS2 hex for AT+CMGS (modem uses UCS2 charset) */ - char hex_number[PHONE_NUMBER_HEX_MAX + 1]; - if (sms_utf8_to_ucs2_hex(value, hex_number, sizeof(hex_number)) < 0) { - mqtt_publish_response(action, request_id, false, NULL, "ENCODE_ERROR", - "Failed to encode phone number"); - return; - } - char hex_body[SMS_BODY_HEX_MAX + 1]; - if (sms_utf8_to_ucs2_hex(clean, hex_body, sizeof(hex_body)) < 0) { - mqtt_publish_response(action, request_id, false, NULL, "ENCODE_ERROR", - "Failed to encode SMS body"); - return; - } - - at_response_t resp; - at_status_t rc = at_command_send_sms(&g_at_ctx, hex_number, hex_body, &resp); - if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); - } else { - mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), - "SMS send failed"); - } + sms_io_send_and_respond(&g_sms_io, value, body, action, request_id); return; } @@ -590,7 +483,7 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { long idx; if (!validate_sms_index(value, &idx)) { 
mqtt_publish_response(action, request_id, false, NULL, "INVALID_INDEX", - "SMS index must be 0-999"); + "SMS index must be 0-999", NULL); return; } char at_cmd[32]; @@ -598,10 +491,10 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { at_response_t resp; at_status_t rc = at_command_send(&g_at_ctx, at_cmd, &resp, AT_TIMEOUT_DEFAULT); if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, resp.data, NULL, NULL); + mqtt_publish_response(action, request_id, true, resp.data, NULL, NULL, NULL); } else { mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), - "Failed to read SMS"); + "Failed to read SMS", NULL); } return; } @@ -611,7 +504,7 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { long idx; if (!validate_sms_index(value, &idx)) { mqtt_publish_response(action, request_id, false, NULL, "INVALID_INDEX", - "SMS index must be 0-999"); + "SMS index must be 0-999", NULL); return; } char at_cmd[32]; @@ -619,10 +512,10 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { at_response_t resp; at_status_t rc = at_command_send(&g_at_ctx, at_cmd, &resp, AT_TIMEOUT_DEFAULT); if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, NULL); } else { mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), - "Failed to delete SMS"); + "Failed to delete SMS", NULL); } return; } @@ -635,11 +528,11 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { json_object_object_add(val, "signal_dbm", json_object_new_int(dbm)); json_object_object_add(val, "csq", json_object_new_int(csq)); const char *val_str = json_object_to_json_string(val); - mqtt_publish_response(action, request_id, true, val_str, NULL, NULL); + mqtt_publish_response(action, request_id, true, val_str, NULL, NULL, NULL); json_object_put(val); } else { mqtt_publish_response(action, request_id, false, NULL, "SIGNAL_ERROR", - "Failed to read 
signal"); + "Failed to read signal", NULL); } return; } @@ -648,12 +541,12 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { if (strcmp(action, "dtmf") == 0) { if (!value || value[0] == '\0') { mqtt_publish_response(action, request_id, false, NULL, "INVALID_VALUE", - "DTMF digit required"); + "DTMF digit required", NULL); return; } if (!validate_dtmf(value[0])) { mqtt_publish_response(action, request_id, false, NULL, "INVALID_DTMF", - "DTMF must be 0-9, *, #, A-D"); + "DTMF must be 0-9, *, #, A-D", NULL); return; } char at_cmd[32]; @@ -661,9 +554,10 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { at_response_t resp; at_status_t rc = at_command_send(&g_at_ctx, at_cmd, &resp, AT_TIMEOUT_DEFAULT); if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, NULL, NULL, NULL); + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, NULL); } else { - mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), "DTMF failed"); + mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), "DTMF failed", + NULL); } return; } @@ -673,18 +567,18 @@ static void process_mqtt_command(const cmd_entry_t *cmd) { at_response_t resp; at_status_t rc = at_command_send(&g_at_ctx, "AT+CLCC", &resp, AT_TIMEOUT_DEFAULT); if (rc == AT_OK) { - mqtt_publish_response(action, request_id, true, resp.data, NULL, NULL); + mqtt_publish_response(action, request_id, true, resp.data, NULL, NULL, NULL); } else { mqtt_publish_response(action, request_id, false, NULL, at_status_str(rc), - "Call query failed"); + "Call query failed", NULL); } return; } /* Unknown action */ OLOG_WARNING("Unknown command action: %s", action); - mqtt_publish_response(action, request_id, false, NULL, "UNKNOWN_ACTION", - "Action not recognized"); + mqtt_publish_response(action, request_id, false, NULL, "UNKNOWN_ACTION", "Action not recognized", + NULL); } /* ── Usage / Version ─────────────────────────────────────────────────── */ @@ -708,6 +602,7 @@ 
static void print_usage(const char *prog) { printf(" --mqtt-password PASS MQTT password (or env MQTT_PASSWORD, preferred)\n"); printf(" --mqtt-tls Enable MQTT TLS\n"); printf(" --mqtt-ca-cert PATH CA certificate path (implies --mqtt-tls)\n"); + printf(" --legacy-sms Use legacy text-mode SMS (AT+CMGF=1), no concat\n"); printf(" -e, --service Run in service mode (syslog)\n"); printf(" -h, --help Show this help\n"); printf(" -v, --version Show version\n"); @@ -730,6 +625,9 @@ int main(int argc, char *argv[]) { config.telemetry_interval_s = ECHO_DEFAULT_TELEMETRY_S; config.rate_limit_calls_per_hour = ECHO_DEFAULT_RATE_CALLS_H; config.rate_limit_sms_per_hour = ECHO_DEFAULT_RATE_SMS_H; + config.rate_limit_segments_per_hour = ECHO_DEFAULT_RATE_SEGMENTS_H; + config.inter_segment_delay_ms = ECHO_DEFAULT_SEGMENT_DELAY_MS; + config.pdu_mode = true; config.service_mode = false; /* Environment variable overrides (for systemd EnvironmentFile) */ @@ -767,6 +665,15 @@ int main(int argc, char *argv[]) { if ((env = getenv("RATE_LIMIT_SMS_PER_HOUR"))) { config.rate_limit_sms_per_hour = atoi(env); } + if ((env = getenv("RATE_LIMIT_SEGMENTS_PER_HOUR"))) { + config.rate_limit_segments_per_hour = atoi(env); + } + if ((env = getenv("INTER_SEGMENT_DELAY_MS"))) { + config.inter_segment_delay_ms = atoi(env); + } + if ((env = getenv("PDU_MODE"))) { + config.pdu_mode = !(strcmp(env, "0") == 0 || strcmp(env, "false") == 0); + } /* Command-line overrides */ static struct option long_options[] = { @@ -778,6 +685,7 @@ int main(int argc, char *argv[]) { { "mqtt-password", required_argument, 0, 1001 }, { "mqtt-tls", no_argument, 0, 1002 }, { "mqtt-ca-cert", required_argument, 0, 1003 }, + { "legacy-sms", no_argument, 0, 1004 }, { "service", no_argument, 0, 'e' }, { "help", no_argument, 0, 'h' }, { "version", no_argument, 0, 'v' }, @@ -812,6 +720,9 @@ int main(int argc, char *argv[]) { snprintf(config.mqtt_ca_cert, sizeof(config.mqtt_ca_cert), "%s", optarg); config.mqtt_tls = 1; break; + case 1004: + 
config.pdu_mode = false; + break; case 'e': config.service_mode = true; break; @@ -842,10 +753,19 @@ int main(int argc, char *argv[]) { signal(SIGINT, signal_handler); signal(SIGTERM, signal_handler); - /* Init command queue and rate limiters */ + /* Init command queues and rate limiters */ cmd_queue_init(&g_cmd_queue); + cmd_queue_init(&g_deferred_queue); rate_bucket_init(&g_call_bucket, config.rate_limit_calls_per_hour); rate_bucket_init(&g_sms_bucket, config.rate_limit_sms_per_hour); + rate_bucket_init(&g_segment_bucket, config.rate_limit_segments_per_hour); + sms_reassembly_reset(); + + g_sms_io.at = &g_at_ctx; + g_sms_io.msg_bucket = &g_sms_bucket; + g_sms_io.segment_bucket = &g_segment_bucket; + g_sms_io.inter_segment_delay_ms = config.inter_segment_delay_ms; + g_sms_io.pdu_mode = config.pdu_mode; /* Open serial port */ if (at_open(&g_at_ctx, config.serial_port, config.serial_baud) != 0) { @@ -864,7 +784,7 @@ int main(int argc, char *argv[]) { } /* Run modem init sequence */ - if (modem_init(&g_at_ctx) != 0) { + if (modem_init(&g_at_ctx, config.pdu_mode) != 0) { OLOG_ERROR("Modem init failed — exiting"); urc_stop(&urc_ctx); at_close(&g_at_ctx); @@ -895,17 +815,32 @@ int main(int argc, char *argv[]) { while (g_running) { time_t now = time(NULL); - /* Drain command queue (MQTT commands + deferred events) */ + /* Drain MQTT commands first so a newly-arrived hangup/dial never waits + * behind a queued CMTI burst (a 10-segment inbound SMS generates 10 + * CMTIs, each holding the main thread for a CMGR+CMGD round-trip). */ cmd_entry_t cmd; while (cmd_queue_pop(&g_cmd_queue, &cmd)) { + process_mqtt_command(&cmd); + last_at_success = time(NULL); + } + + /* Then drain deferred events one at a time. Re-checking `g_cmd_queue` + * between each ensures a command arriving mid-burst still jumps + * ahead of the remaining CMTIs.
*/ + while (cmd_queue_pop(&g_deferred_queue, &cmd)) { if (cmd.type == CMD_TYPE_CMTI) { handle_cmti(cmd.sms_index); } else if (cmd.type == CMD_TYPE_CALL_CONNECTED) { modem_call_audio_setup(&g_at_ctx); - } else { - process_mqtt_command(&cmd); } last_at_success = time(NULL); + + /* Yield back to MQTT between deferred events. */ + cmd_entry_t mqtt_cmd; + while (cmd_queue_pop(&g_cmd_queue, &mqtt_cmd)) { + process_mqtt_command(&mqtt_cmd); + last_at_success = time(NULL); + } } /* Ring timeout watchdog — if ringing for too long without a termination URC, @@ -949,6 +884,10 @@ int main(int argc, char *argv[]) { last_telemetry = now; } + /* Reassembly sweep — clears timed-out slots even during idle periods + * so the 10-min TTL behavior is deterministic regardless of traffic. */ + sms_reassembly_sweep(now); + /* Heartbeat (every 30s, but skip if recent AT success) */ if (now - last_heartbeat >= 30) { if (now - last_at_success < 30) { diff --git a/src/pdu.c b/src/pdu.c new file mode 100644 index 0000000..9f2be74 --- /dev/null +++ b/src/pdu.c @@ -0,0 +1,1001 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. 
These contributions become + * part of the project and are adopted by the project author(s). + * + * SMS PDU encoder/decoder. See pdu.h for the public API and 3GPP TS 23.040 + * for the wire format. + */ + +/* timegm() needs _DEFAULT_SOURCE on glibc; getrandom() needs . */ +#define _DEFAULT_SOURCE + +#include "pdu.h" + +#include +#include +#include +#include + +/* ── Constants ───────────────────────────────────────────────────────── */ + +#define TP_MTI_SUBMIT 0x01 +#define TP_MTI_DELIVER 0x00 +#define TP_MTI_MASK 0x03 +#define TP_UDHI_FLAG 0x40 +#define TP_VPF_RELATIVE 0x10 /* VPF=10 in bits 4-3 of first octet */ +#define VP_RELATIVE_4_DAYS 0xAA + +/* UDH for 8-bit reference concatenation: + * UDHL=05, IEI=00, IEL=03, ref, total, seq → 6 octets total (with UDHL byte). */ +#define UDH_CONCAT_TOTAL_OCTETS 6 + +static const char HEX_CHARS[] = "0123456789ABCDEF"; + +/* ── Hex helpers ─────────────────────────────────────────────────────── */ + +static inline int hex_nibble(char c) { + if (c >= '0' && c <= '9') + return c - '0'; + if (c >= 'A' && c <= 'F') + return c - 'A' + 10; + if (c >= 'a' && c <= 'f') + return c - 'a' + 10; + return -1; +} + +static bool hex_string_is_valid(const char *hex, size_t len) { + if ((len & 1) != 0) { + return false; + } + for (size_t i = 0; i < len; i++) { + if (hex_nibble(hex[i]) < 0) { + return false; + } + } + return true; +} + +static int hex_read_octet(const char *hex, uint8_t *out) { + int hi = hex_nibble(hex[0]); + int lo = hex_nibble(hex[1]); + if (hi < 0 || lo < 0) { + return -1; + } + *out = (uint8_t)((hi << 4) | lo); + return 0; +} + +static void hex_write_octet(uint8_t v, char *out) { + out[0] = HEX_CHARS[(v >> 4) & 0x0F]; + out[1] = HEX_CHARS[v & 0x0F]; +} + +/* ── UTF-8 / UCS2 helpers ────────────────────────────────────────────── */ + +/* Decode one UTF-8 code point. Returns bytes consumed (1..4), or 0 at end + * of string. On malformed input, substitutes U+FFFD and advances 1 byte. 
*/ +static int utf8_decode(const unsigned char *s, uint32_t *out_cp) { + if (s[0] == 0) { + return 0; + } + if (s[0] < 0x80) { + *out_cp = s[0]; + return 1; + } + if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) { + *out_cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F); + return 2; + } + if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) { + *out_cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F); + return 3; + } + if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80 && + (s[3] & 0xC0) == 0x80) { + *out_cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) | + ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F); + return 4; + } + *out_cp = 0xFFFD; + return 1; +} + +/* UCS2 surrogate pair helpers */ +static bool is_high_surrogate(uint16_t w) { + return w >= 0xD800 && w <= 0xDBFF; +} +static bool is_low_surrogate(uint16_t w) { + return w >= 0xDC00 && w <= 0xDFFF; +} + +/* Count UCS2 code units (16-bit words) needed for a UTF-8 body. A BMP + * character is 1 code unit; a supplementary character is 2 (surrogate pair). */ +static int utf8_to_ucs2_units(const char *utf8) { + if (!utf8) { + return 0; + } + const unsigned char *s = (const unsigned char *)utf8; + int units = 0; + while (*s) { + uint32_t cp; + int n = utf8_decode(s, &cp); + if (n == 0) { + break; + } + s += n; + units += (cp <= 0xFFFF) ? 1 : 2; + } + return units; +} + +/* ── Sanitization (inbound body) ─────────────────────────────────────── */ + +/* Strip/replace characters that shouldn't end up in displayed SMS text or in + * prompts routed to the LLM. Covers three threat classes: + * 1. C-string termination — U+0000. + * 2. Display spoofing — bidi overrides, zero-width joiners/non-joiners, + * BOM/WJ, variation selectors, Arabic/Mongolian format chars. + * 3. Prompt injection amplification — the Unicode Tag block (U+E0000.. 
+ * U+E007F) is widely used to smuggle hidden instructions into LLM + * prompts; strip it outright. */ +static bool sanitize_ucs2_cp(uint32_t *cp) { + uint32_t c = *cp; + if (c == 0x0000) { + return false; + } + /* C0/C1 controls → space (keep \n, \t). */ + if (c < 0x20 && c != 0x0A && c != 0x09) { + *cp = 0x20; + return true; + } + if (c >= 0x7F && c <= 0x9F) { + *cp = 0x20; + return true; + } + /* Bidi overrides/isolates. */ + if ((c >= 0x202A && c <= 0x202E) || (c >= 0x2066 && c <= 0x2069)) { + return false; + } + /* Zero-width + formatting characters. */ + if ((c >= 0x200B && c <= 0x200F) || /* ZWSP, ZWNJ, ZWJ, LRM, RLM */ + c == 0x2060 || /* WORD JOINER */ + c == 0xFEFF || /* BOM / ZWNBSP */ + c == 0x061C || /* ALM */ + c == 0x180E) { /* MONGOLIAN VOWEL SEPARATOR */ + return false; + } + /* Variation selectors (VS1..VS16 in BMP, VS17..VS256 in supplementary). */ + if ((c >= 0xFE00 && c <= 0xFE0F) || (c >= 0xE0100 && c <= 0xE01EF)) { + return false; + } + /* Unicode tag characters — prompt-injection smuggling vector. */ + if (c >= 0xE0000 && c <= 0xE007F) { + return false; + } + return true; +} + +/* Append UTF-8 encoding of cp to body_out[pos..]. Returns new pos, or -1 if + * it wouldn't fit (caller must have reserved room for NUL). */ +static int append_utf8(char *body_out, size_t cap, int pos, uint32_t cp) { + /* Reserve one byte for NUL. 
*/ + if ((size_t)pos >= cap) { + return -1; + } + size_t room = cap - 1 - (size_t)pos; + if (cp < 0x80) { + if (room < 1) + return -1; + body_out[pos++] = (char)cp; + } else if (cp < 0x800) { + if (room < 2) + return -1; + body_out[pos++] = (char)(0xC0 | (cp >> 6)); + body_out[pos++] = (char)(0x80 | (cp & 0x3F)); + } else if (cp < 0x10000) { + if (room < 3) + return -1; + body_out[pos++] = (char)(0xE0 | (cp >> 12)); + body_out[pos++] = (char)(0x80 | ((cp >> 6) & 0x3F)); + body_out[pos++] = (char)(0x80 | (cp & 0x3F)); + } else { + if (room < 4) + return -1; + body_out[pos++] = (char)(0xF0 | (cp >> 18)); + body_out[pos++] = (char)(0x80 | ((cp >> 12) & 0x3F)); + body_out[pos++] = (char)(0x80 | ((cp >> 6) & 0x3F)); + body_out[pos++] = (char)(0x80 | (cp & 0x3F)); + } + return pos; +} + +/* ── GSM 7-bit decode (inbound) ──────────────────────────────────────── */ +/* We decode GSM7 because most phones default to it for plain ASCII — 99% + * of real inbound traffic arrives with DCS=0x00, not UCS2. We don't encode + * GSM7 (UCS2-only for outbound per v1 plan); that's an independent bit of + * 7-bit packing that's only worth writing once bandwidth metrics justify. */ + +/* 3GPP TS 23.038 §6.2.1.1 default alphabet. 0x1B is the ESC marker for + * the extension table; everything else maps to a single Unicode code point. 
*/ +static const uint16_t gsm7_default_table[128] = { + 0x0040, 0x00A3, 0x0024, 0x00A5, 0x00E8, 0x00E9, 0x00F9, 0x00EC, 0x00F2, 0x00C7, 0x000A, 0x00D8, + 0x00F8, 0x000D, 0x00C5, 0x00E5, 0x0394, 0x005F, 0x03A6, 0x0393, 0x039B, 0x03A9, 0x03A0, 0x03A8, + 0x03A3, 0x0398, 0x039E, 0xFFFF, 0x00C6, 0x00E6, 0x00DF, 0x00C9, 0x0020, 0x0021, 0x0022, 0x0023, + 0x00A4, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F, + 0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003A, 0x003B, + 0x003C, 0x003D, 0x003E, 0x003F, 0x00A1, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, + 0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F, 0x0050, 0x0051, 0x0052, 0x0053, + 0x0054, 0x0055, 0x0056, 0x0057, 0x0058, 0x0059, 0x005A, 0x00C4, 0x00D6, 0x00D1, 0x00DC, 0x00A7, + 0x00BF, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006A, 0x006B, + 0x006C, 0x006D, 0x006E, 0x006F, 0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, + 0x0078, 0x0079, 0x007A, 0x00E4, 0x00F6, 0x00F1, 0x00FC, 0x00E0, +}; + +/* Extension table (3GPP TS 23.038 §6.2.1.1 Table 2). Triggered by ESC + * (0x1B) in the septet stream. Most values are reserved or duplicate the + * default alphabet; only the 10 below have distinct extension encodings. */ +static uint32_t gsm7_extension_lookup(uint8_t septet) { + switch (septet) { + case 0x0A: + return 0x000C; /* form feed */ + case 0x14: + return 0x005E; /* ^ */ + case 0x28: + return 0x007B; /* { */ + case 0x29: + return 0x007D; /* } */ + case 0x2F: + return 0x005C; /* \ */ + case 0x3C: + return 0x005B; /* [ */ + case 0x3D: + return 0x007E; /* ~ */ + case 0x3E: + return 0x005D; /* ] */ + case 0x40: + return 0x007C; /* | */ + case 0x65: + return 0x20AC; /* € */ + default: + return 0x003F; /* '?' — unknown extension, best-effort display */ + } +} + +/* Unpack n_septets from `data` starting at bit position `start_bit`. 
Each + * septet is 7 bits, LSB-first within the byte stream. Output is Unicode + * code points translated via the default alphabet + extension table. */ +static int unpack_gsm7_septets(const uint8_t *data, + size_t data_bytes, + size_t start_bit, + int n_septets, + uint32_t *out_cps, + int out_cap) { + int produced = 0; + bool pending_esc = false; + for (int i = 0; i < n_septets && produced < out_cap; i++) { + size_t bit_pos = start_bit + (size_t)i * 7; + size_t byte_idx = bit_pos / 8; + size_t bit_off = bit_pos % 8; + if (byte_idx >= data_bytes) { + break; + } + uint32_t window = data[byte_idx]; + if (byte_idx + 1 < data_bytes) { + window |= ((uint32_t)data[byte_idx + 1]) << 8; + } + uint8_t septet = (uint8_t)((window >> bit_off) & 0x7F); + + if (pending_esc) { + out_cps[produced++] = gsm7_extension_lookup(septet); + pending_esc = false; + } else if (septet == 0x1B) { + pending_esc = true; + } else { + out_cps[produced++] = gsm7_default_table[septet]; + } + } + /* A dangling ESC at the end is malformed — drop silently rather than + * emit an unterminated extension. */ + return produced; +} + +/* ── Address encoding ────────────────────────────────────────────────── */ + +/* Encode a phone number as BCD semi-octets with nibble swap. + * "+15551234567" → 91 (TOA) 51 55 21 43 65 F7 (odd digit count pads with F); *digit_count = 11. + * Writes to out (TOA + digits only, no length octet), returns bytes written or -1 on bad input.
*/ +static int encode_address_bcd(const char *dest, uint8_t *out, size_t out_cap, int *digit_count) { + if (!dest || out_cap < 2) { + return -1; + } + uint8_t toa = 0x81; /* unknown type, unknown numbering plan */ + const char *p = dest; + if (*p == '+') { + toa = 0x91; /* international */ + p++; + } + + char digits[20]; + int n = 0; + for (; *p; p++) { + if (*p < '0' || *p > '9') { + return -1; + } + if (n >= (int)sizeof(digits)) { + return -1; + } + digits[n++] = *p; + } + if (n == 0 || n > 15) { + return -1; + } + + size_t need = 1 + ((size_t)n + 1) / 2; + if (need > out_cap) { + return -1; + } + + out[0] = toa; + int o = 1; + for (int i = 0; i < n; i += 2) { + int hi = (i + 1 < n) ? (digits[i + 1] - '0') : 0xF; + int lo = digits[i] - '0'; + out[o++] = (uint8_t)((hi << 4) | lo); + } + + *digit_count = n; + return o; +} + +/* Decode a phone number from BCD semi-octets (with nibble swap). + * `toa` is the type-of-address byte (0x91=international → prepend '+'). + * `semi_octets` is the TP-OA length field (digit count, not byte count). + * Returns 0 on success, or a PDU_ERR_* on truncation/overflow. */ +static pdu_err_t decode_address_bcd(const uint8_t *buf, + size_t buf_len, + uint8_t toa, + int semi_octets, + char *out, + size_t out_cap) { + if (semi_octets < 0 || semi_octets > 20) { + return PDU_ERR_BAD_ADDRESS; + } + int bytes = (semi_octets + 1) / 2; + if ((size_t)bytes > buf_len) { + return PDU_ERR_TRUNCATED; + } + /* Need room for optional '+' + digits + NUL. */ + if (out_cap < (size_t)semi_octets + 2) { + return PDU_ERR_SENDER_OVERFLOW; + } + + size_t pos = 0; + if ((toa & 0x70) == 0x10) { + out[pos++] = '+'; + } + for (int i = 0; i < semi_octets; i++) { + int b = buf[i / 2]; + int nib = (i % 2 == 0) ? (b & 0x0F) : ((b >> 4) & 0x0F); + if (nib > 9) { + /* 'F' is the valid pad for odd digits; anything else is junk. 
*/ + if (nib == 0xF && i == semi_octets - 1) { + break; + } + return PDU_ERR_BAD_ADDRESS; + } + out[pos++] = (char)('0' + nib); + } + out[pos] = '\0'; + return PDU_OK; +} + +/* ── Public: heuristics ──────────────────────────────────────────────── */ + +bool pdu_needs_ucs2(const char *utf8_body) { + (void)utf8_body; + return true; /* v1 policy */ +} + +uint8_t pdu_new_ref_id(void) { + static pthread_mutex_t ref_mutex = PTHREAD_MUTEX_INITIALIZER; + static uint8_t counter = 0; + static bool seeded = false; + + pthread_mutex_lock(&ref_mutex); + if (!seeded) { + /* Seed from the kernel RNG so a local observer can't predict the next + * ref_id from an earlier outbound PDU. Fall back to the clock if + * getrandom is unavailable — still better than 0. */ + uint8_t seed = 0; + if (getrandom(&seed, 1, GRND_NONBLOCK) != 1) { + seed = (uint8_t)(time(NULL) & 0xFF); + } + counter = seed; + seeded = true; + } + uint8_t v = ++counter; + if (v == 0) { + v = ++counter; /* skip 0 */ + } + pthread_mutex_unlock(&ref_mutex); + return v; +} + +int pdu_segment_count(const char *utf8_body, bool is_ucs2) { + if (!utf8_body || utf8_body[0] == '\0') { + return 0; + } + if (!is_ucs2) { + /* GSM7 not implemented for encode path — fall back to UCS2 math so + * callers that haven't switched don't accidentally pick the wrong + * segment count. */ + is_ucs2 = true; + } + int units = utf8_to_ucs2_units(utf8_body); + if (units <= 0) { + return 0; + } + /* Single-segment UCS2 allows 70 chars (no UDH); concat uses 67. + * We always emit concat headers once seg > 1 so the receiver joins them. */ + if (units <= 70) { + return 1; + } + int segs = (units + PDU_UCS2_CHARS_PER_SEG - 1) / PDU_UCS2_CHARS_PER_SEG; + if (segs > PDU_MAX_SEGMENTS) { + segs = PDU_MAX_SEGMENTS + 1; /* sentinel: too long */ + } + return segs; +} + +/* ── Encode path ─────────────────────────────────────────────────────── */ + +/* Convert UTF-8 body to UCS2 code-unit array. Returns units written, or -1 + * on buffer overflow. 
*/ +static int utf8_to_ucs2(const char *utf8, uint16_t *units, int max_units) { + const unsigned char *s = (const unsigned char *)utf8; + int n = 0; + while (*s && n < max_units) { + uint32_t cp; + int consumed = utf8_decode(s, &cp); + if (consumed == 0) { + break; + } + s += consumed; + if (cp <= 0xFFFF) { + units[n++] = (uint16_t)cp; + } else { + if (n + 2 > max_units) { + return -1; + } + uint32_t adj = cp - 0x10000; + units[n++] = (uint16_t)(0xD800 + (adj >> 10)); + units[n++] = (uint16_t)(0xDC00 + (adj & 0x3FF)); + } + } + return n; +} + +static int write_tpdu_octet(pdu_segment_t *seg, int tpdu_pos, uint8_t v) { + /* 2 hex chars per octet, +2 for "00" SMSC prefix already written, +1 NUL. */ + int offset = 2 + tpdu_pos * 2; + if (offset + 2 >= (int)sizeof(seg->hex)) { + return -1; + } + hex_write_octet(v, seg->hex + offset); + return tpdu_pos + 1; +} + +pdu_err_t pdu_encode_submit(const char *dest, + const char *utf8_body, + uint8_t ref_id, + bool is_ucs2, + pdu_segment_t *out_segs, + int max_segs, + int *n_segs) { + if (!dest || !utf8_body || !out_segs || !n_segs || max_segs <= 0) { + return PDU_ERR_NULL_ARG; + } + if (!is_ucs2) { + /* GSM7 encode path deferred. */ + return PDU_ERR_UNSUPPORTED_DCS; + } + + /* Convert the whole body to UCS2 once. */ + uint16_t units[PDU_UCS2_CHARS_PER_SEG * PDU_MAX_SEGMENTS + 32]; + int total_units = utf8_to_ucs2(utf8_body, units, (int)(sizeof(units) / sizeof(units[0]))); + if (total_units < 0) { + return PDU_ERR_BODY_TOO_LONG; + } + if (total_units == 0) { + return PDU_ERR_NULL_ARG; + } + + int per_seg_max; + if (total_units <= 70) { + per_seg_max = total_units; + } else { + per_seg_max = PDU_UCS2_CHARS_PER_SEG; + } + + /* Precompute slice boundaries so surrogate pairs don't split across + * segments. A high surrogate at a slice boundary gets pushed to the next + * segment; if that would spill past PDU_MAX_SEGMENTS we refuse. 
*/ + int slices[PDU_MAX_SEGMENTS]; + int segs = 0; + int cursor = 0; + while (cursor < total_units) { + if (segs >= PDU_MAX_SEGMENTS || segs >= max_segs) { + return PDU_ERR_BODY_TOO_LONG; + } + int remain = total_units - cursor; + int slice = (remain > per_seg_max) ? per_seg_max : remain; + /* Only pull back a surrogate if more units follow. */ + if (cursor + slice < total_units && is_high_surrogate(units[cursor + slice - 1])) { + slice--; + if (slice <= 0) { + return PDU_ERR_INTERNAL; /* per_seg_max == 1 — impossible for UCS2 */ + } + } + slices[segs++] = slice; + cursor += slice; + } + if (segs == 0) { + return PDU_ERR_NULL_ARG; + } + + /* Encode destination address once (same for all segments). */ + uint8_t addr_buf[16]; + int addr_digits = 0; + int addr_len = encode_address_bcd(dest, addr_buf, sizeof(addr_buf), &addr_digits); + if (addr_len < 0) { + return PDU_ERR_BAD_ADDRESS; + } + + int slice_start = 0; + for (int si = 0; si < segs; si++) { + pdu_segment_t *seg = &out_segs[si]; + memset(seg, 0, sizeof(*seg)); + + /* SMSC length prefix "00" tells the modem to use the default SMSC. + * It is NOT counted in tpdu_octets for AT+CMGS. 
*/ + seg->hex[0] = '0'; + seg->hex[1] = '0'; + + int pos = 0; + int rc; + + /* TP-MTI (SUBMIT) + VPF=10 + UDHI (if concat) */ + uint8_t first = TP_MTI_SUBMIT | TP_VPF_RELATIVE; + if (segs > 1) { + first |= TP_UDHI_FLAG; + } + if ((rc = write_tpdu_octet(seg, pos, first)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + /* TP-MR — let modem assign */ + if ((rc = write_tpdu_octet(seg, pos, 0x00)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + /* TP-DA: length in semi-octets, then TOA + digits */ + if ((rc = write_tpdu_octet(seg, pos, (uint8_t)addr_digits)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + for (int j = 0; j < addr_len; j++) { + if ((rc = write_tpdu_octet(seg, pos, addr_buf[j])) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + } + + /* TP-PID = 0 */ + if ((rc = write_tpdu_octet(seg, pos, 0x00)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + /* TP-DCS = 0x08 (UCS2) */ + if ((rc = write_tpdu_octet(seg, pos, (uint8_t)PDU_DCS_UCS2)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + /* TP-VP = 0xAA (relative, 4 days) */ + if ((rc = write_tpdu_octet(seg, pos, VP_RELATIVE_4_DAYS)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + int slice = slices[si]; + int body_octets = slice * 2; + int udh_octets = (segs > 1) ? 
UDH_CONCAT_TOTAL_OCTETS : 0; + int udl = body_octets + udh_octets; + + /* TP-UDL */ + if ((rc = write_tpdu_octet(seg, pos, (uint8_t)udl)) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + + /* TP-UD: optional UDH, then UCS2 body */ + if (segs > 1) { + /* UDHL=05, IEI=00 (8-bit ref concat), IEL=03, ref, total, seq */ + const uint8_t udh[UDH_CONCAT_TOTAL_OCTETS] = { 0x05, 0x00, 0x03, + ref_id, (uint8_t)segs, (uint8_t)(si + 1) }; + for (int k = 0; k < UDH_CONCAT_TOTAL_OCTETS; k++) { + if ((rc = write_tpdu_octet(seg, pos, udh[k])) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + } + } + + for (int u = 0; u < slice; u++) { + uint16_t w = units[slice_start + u]; + if ((rc = write_tpdu_octet(seg, pos, (uint8_t)(w >> 8))) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + if ((rc = write_tpdu_octet(seg, pos, (uint8_t)(w & 0xFF))) < 0) + return PDU_ERR_BUFFER_TOO_SMALL; + pos = rc; + } + + seg->tpdu_octets = pos; + seg->hex[2 + pos * 2] = '\0'; + slice_start += slice; + } + + *n_segs = segs; + return PDU_OK; +} + +/* ── Decode path ─────────────────────────────────────────────────────── */ + +typedef struct { + uint8_t octets[PDU_TPDU_MAX + 64]; + size_t len; +} pdu_buf_t; + +static pdu_err_t hex_to_buf(const char *hex, pdu_buf_t *buf) { + if (!hex) { + return PDU_ERR_NULL_ARG; + } + size_t len = strlen(hex); + /* Trim trailing whitespace. */ + while (len > 0 && (hex[len - 1] == '\r' || hex[len - 1] == '\n' || hex[len - 1] == ' ' || + hex[len - 1] == '\t')) { + len--; + } + if (len == 0) { + return PDU_ERR_TRUNCATED; + } + if (!hex_string_is_valid(hex, len)) { + return PDU_ERR_BAD_HEX; + } + size_t bytes = len / 2; + if (bytes > sizeof(buf->octets)) { + return PDU_ERR_BAD_LENGTH; + } + for (size_t i = 0; i < bytes; i++) { + uint8_t v; + if (hex_read_octet(hex + i * 2, &v) < 0) { + return PDU_ERR_BAD_HEX; + } + buf->octets[i] = v; + } + buf->len = bytes; + return PDU_OK; +} + +/* Parse a 7-octet SCTS into a time_t. Return 0 on error. 
*/ +static time_t parse_scts(const uint8_t *s) { + /* Each octet is two BCD nibbles with nibble-swap. */ + int yr = (s[0] & 0x0F) * 10 + ((s[0] >> 4) & 0x0F); + int mo = (s[1] & 0x0F) * 10 + ((s[1] >> 4) & 0x0F); + int dy = (s[2] & 0x0F) * 10 + ((s[2] >> 4) & 0x0F); + int hr = (s[3] & 0x0F) * 10 + ((s[3] >> 4) & 0x0F); + int mn = (s[4] & 0x0F) * 10 + ((s[4] >> 4) & 0x0F); + int sc = (s[5] & 0x0F) * 10 + ((s[5] >> 4) & 0x0F); + /* s[6] = timezone in quarters of an hour, sign bit at 0x08 of low nibble. */ + + if (mo < 1 || mo > 12 || dy < 1 || dy > 31 || hr > 23 || mn > 59 || sc > 59) { + return 0; + } + + struct tm t; + memset(&t, 0, sizeof(t)); + t.tm_year = 100 + yr; /* year is 00-99, assumed 2000s */ + t.tm_mon = mo - 1; + t.tm_mday = dy; + t.tm_hour = hr; + t.tm_min = mn; + t.tm_sec = sc; + return timegm(&t); +} + +pdu_err_t pdu_decode(const char *tpdu_hex, pdu_decoded_t *out, char *body_out, size_t body_cap) { + if (!tpdu_hex || !out || !body_out || body_cap < 2) { + return PDU_ERR_NULL_ARG; + } + memset(out, 0, sizeof(*out)); + body_out[0] = '\0'; + + pdu_buf_t buf; + pdu_err_t err = hex_to_buf(tpdu_hex, &buf); + if (err != PDU_OK) { + return err; + } + + size_t pos = 0; + size_t remaining = buf.len; + + /* SMSC length prefix. 0 = no SMSC. */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t smsc_len = buf.octets[pos++]; + remaining--; + if (smsc_len > remaining) { + return PDU_ERR_BAD_LENGTH; + } + pos += smsc_len; + remaining -= smsc_len; + + /* First octet: MTI + flags */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t first = buf.octets[pos++]; + remaining--; + bool udhi = (first & TP_UDHI_FLAG) != 0; + /* Accept MTI=0 (DELIVER). Other MTIs (STATUS REPORT etc.) not handled. */ + if ((first & TP_MTI_MASK) != TP_MTI_DELIVER) { + /* Not a fatal decode failure — just nothing useful to extract. 
*/ + return PDU_ERR_UNSUPPORTED_DCS; + } + + /* TP-OA */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t oa_semi = buf.octets[pos++]; + remaining--; + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t oa_toa = buf.octets[pos++]; + remaining--; + + int oa_bytes = (oa_semi + 1) / 2; + if (oa_semi > 20 || (size_t)oa_bytes > remaining) { + return PDU_ERR_BAD_ADDRESS; + } + err = decode_address_bcd(buf.octets + pos, remaining, oa_toa, (int)oa_semi, out->sender, + sizeof(out->sender)); + if (err != PDU_OK) { + return err; + } + pos += (size_t)oa_bytes; + remaining -= (size_t)oa_bytes; + + /* TP-PID */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + pos++; + remaining--; + + /* TP-DCS */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t dcs = buf.octets[pos++]; + remaining--; + bool is_ucs2 = ((dcs & 0x0C) == 0x08); + out->is_ucs2 = is_ucs2; + + /* TP-SCTS (7 octets) */ + if (remaining < 7) + return PDU_ERR_TRUNCATED; + out->scts = parse_scts(buf.octets + pos); + pos += 7; + remaining -= 7; + + /* TP-UDL */ + if (remaining < 1) + return PDU_ERR_TRUNCATED; + uint8_t udl = buf.octets[pos++]; + remaining--; + + /* For UCS2, UDL is in octets. For GSM7, UDL is in septets. Both paths + * clamp the user data to the declared length, walk the UDH (if present) + * for concat metadata, and then decode the body. */ + + size_t ud_len = remaining; + if (is_ucs2) { + if (udl > ud_len) { + return PDU_ERR_BAD_LENGTH; + } + ud_len = udl; + } else { + /* GSM7 len in septets → bytes = ceil(udl*7/8).
*/ + size_t gsm7_bytes = ((size_t)udl * 7 + 7) / 8; + if (gsm7_bytes > ud_len) { + return PDU_ERR_BAD_LENGTH; + } + ud_len = gsm7_bytes; + } + + size_t body_off = 0; + if (udhi) { + if (ud_len < 1) { + return PDU_ERR_BAD_UDH; + } + uint8_t udhl = buf.octets[pos]; + /* udhl is length of UDH *not including itself*, so total UDH bytes + * = udhl + 1. Must not exceed user data. */ + if ((size_t)udhl + 1 > ud_len) { + return PDU_ERR_BAD_UDH; + } + size_t ie_off = pos + 1; + size_t ie_end = pos + 1 + udhl; + bool saw_concat = false; + while (ie_off + 2 <= ie_end) { + uint8_t iei = buf.octets[ie_off]; + uint8_t iel = buf.octets[ie_off + 1]; + if (ie_off + 2 + iel > ie_end) { + return PDU_ERR_BAD_UDH; + } + if (iei == 0x00 && iel == 3) { + /* A second concat IE in one PDU is a spec violation and a + * slot-confusion vector. Reject. */ + if (saw_concat) { + return PDU_ERR_BAD_UDH; + } + saw_concat = true; + out->has_udh = true; + out->udh_ref_id = buf.octets[ie_off + 2]; + out->udh_total = buf.octets[ie_off + 3]; + out->udh_seq = buf.octets[ie_off + 4]; + if (out->udh_total == 0 || out->udh_seq == 0 || out->udh_seq > out->udh_total) { + return PDU_ERR_BAD_UDH; + } + } else if (iei == 0x08 && iel == 4) { + if (saw_concat) { + return PDU_ERR_BAD_UDH; + } + saw_concat = true; + /* 16-bit ref concat — use the low 8 bits so reassembly slots + * don't need a separate key space. Collisions across 256-space + * wrap are rare and resolve via the 10-min TTL. */ + out->has_udh = true; + out->udh_ref_id = buf.octets[ie_off + 3]; + out->udh_total = buf.octets[ie_off + 4]; + out->udh_seq = buf.octets[ie_off + 5]; + if (out->udh_total == 0 || out->udh_seq == 0 || out->udh_seq > out->udh_total) { + return PDU_ERR_BAD_UDH; + } + } + ie_off += 2 + iel; + } + body_off = (size_t)udhl + 1; + } + + if (!is_ucs2) { + /* GSM 7-bit default alphabet decode (DCS=0x00, the default for plain + * ASCII traffic from most handsets). 
UDH, if present, occupies the + * first ceil((udhl+1)*8 / 7) virtual septets of the user data — with + * up to 6 fill bits between the UDH's last byte and the first body + * septet so the body lands on a septet boundary. */ + const uint8_t *ud = buf.octets + pos; + int udh_septets = udhi ? (int)(((body_off * 8) + 6) / 7) : 0; + int body_septets = (int)udl - udh_septets; + if (body_septets < 0) { + body_septets = 0; + } + size_t body_start_bit = (size_t)udh_septets * 7; + + /* Buffer sized for any single-PDU GSM7 payload (UDL ≤ 255 septets). */ + uint32_t cps[256]; + int n_cps = unpack_gsm7_septets(ud, ud_len, body_start_bit, body_septets, cps, + (int)(sizeof(cps) / sizeof(cps[0]))); + + int out_pos = 0; + for (int k = 0; k < n_cps; k++) { + uint32_t cp = cps[k]; + if (!sanitize_ucs2_cp(&cp)) { + continue; + } + int np = append_utf8(body_out, body_cap, out_pos, cp); + if (np < 0) { + break; + } + out_pos = np; + } + body_out[out_pos] = '\0'; + out->body_len = (size_t)out_pos; + return PDU_OK; + } + + /* UCS2 body walk */ + const uint8_t *body = buf.octets + pos + body_off; + size_t body_octets = ud_len - body_off; + if ((body_octets & 1) != 0) { + return PDU_ERR_BAD_LENGTH; + } + + int out_pos = 0; + for (size_t i = 0; i + 1 < body_octets; i += 2) { + uint16_t w1 = (uint16_t)((body[i] << 8) | body[i + 1]); + uint32_t cp; + if (is_high_surrogate(w1) && i + 3 < body_octets) { + uint16_t w2 = (uint16_t)((body[i + 2] << 8) | body[i + 3]); + if (is_low_surrogate(w2)) { + cp = 0x10000 + (((uint32_t)(w1 - 0xD800) << 10) | (w2 - 0xDC00)); + i += 2; + } else { + cp = 0xFFFD; + } + } else if (is_high_surrogate(w1) || is_low_surrogate(w1)) { + cp = 0xFFFD; + } else { + cp = w1; + } + + if (!sanitize_ucs2_cp(&cp)) { + continue; + } + int np = append_utf8(body_out, body_cap, out_pos, cp); + if (np < 0) { + /* Truncation — NUL-terminate what we have and warn. 
*/ + break; + } + out_pos = np; + } + body_out[out_pos] = '\0'; + out->body_len = (size_t)out_pos; + + return PDU_OK; +} + +/* ── Error strings ───────────────────────────────────────────────────── */ + +const char *pdu_err_str(pdu_err_t err) { + switch (err) { + case PDU_OK: + return "OK"; + case PDU_ERR_NULL_ARG: + return "NULL_ARG"; + case PDU_ERR_BAD_HEX: + return "BAD_HEX"; + case PDU_ERR_TRUNCATED: + return "TRUNCATED"; + case PDU_ERR_BAD_LENGTH: + return "BAD_LENGTH"; + case PDU_ERR_BAD_UDH: + return "BAD_UDH"; + case PDU_ERR_SENDER_OVERFLOW: + return "SENDER_OVERFLOW"; + case PDU_ERR_BODY_TOO_LONG: + return "BODY_TOO_LONG"; + case PDU_ERR_BUFFER_TOO_SMALL: + return "BUFFER_TOO_SMALL"; + case PDU_ERR_UNSUPPORTED_DCS: + return "UNSUPPORTED_DCS"; + case PDU_ERR_BAD_ADDRESS: + return "BAD_ADDRESS"; + case PDU_ERR_INTERNAL: + return "INTERNAL"; + } + return "UNKNOWN"; +} diff --git a/src/sms_io.c b/src/sms_io.c new file mode 100644 index 0000000..babd290 --- /dev/null +++ b/src/sms_io.c @@ -0,0 +1,461 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. 
These contributions become + * part of the project and are adopted by the project author(s). + * + * SMS send/dispatch implementation. See sms_io.h. + */ + +#define _POSIX_C_SOURCE 200809L + +#include "sms_io.h" + +#include +#include +#include +#include +#include + +#include "logging.h" +#include "mqtt_comms.h" +#include "pdu.h" +#include "sms.h" +#include "sms_reassembly.h" + +/* ── Shared reassembly body buffer. Capped so a single URC can't OOM, and + * sized large enough for max-concat UCS2 (10 * 67 chars * worst-case 4 + * UTF-8 bytes = 2680; 4 KiB covers it with headroom). ─────────────── */ +#define SMS_IO_REASM_BODY_MAX 4096 + +/* Emergency eviction watchdog: if reassembly fails with ERROR (not incomplete) + * we still want the modem slot cleared so inbox doesn't fill. Short timeout + * — CMGD on local storage is a sub-100ms op in practice. "Fire-and-forget" + * means the caller doesn't wait for completion before moving on; failures + * are still logged so a pinned inbox surfaces before it overflows. */ +static void cmgd_fire_and_forget(at_context_t *at, int sms_index) { + char cmd[32]; + snprintf(cmd, sizeof(cmd), "AT+CMGD=%d", sms_index); + at_response_t resp; + at_status_t rc = at_command_send(at, cmd, &resp, AT_TIMEOUT_SMS_STORAGE); + if (rc != AT_OK) { + OLOG_WARNING("CMGD idx %d failed: %s (inbox may fill)", sms_index, at_status_str(rc)); + } +} + +/* ── Outbound ────────────────────────────────────────────────────────── */ + +/* Legacy UCS2-hex text-mode send. Kept behind --legacy-sms until PDU mode + * is soaked on real hardware; deleting this function + the pdu_mode flag + * + the AT+CSMP init line in modem.c is all the removal work takes. 
*/ +static sms_io_send_err_t send_text_mode(const sms_io_ctx_t *ctx, + const char *dest, + const char *clean, + int *segments_sent, + int *total_segments, + char *at_err_detail, + size_t at_err_cap) { + char hex_number[PHONE_NUMBER_HEX_MAX + 1]; + char hex_body[SMS_BODY_HEX_MAX + 1]; + if (sms_utf8_to_ucs2_hex(dest, hex_number, sizeof(hex_number)) < 0 || + sms_utf8_to_ucs2_hex(clean, hex_body, sizeof(hex_body)) < 0) { + return SMS_IO_SEND_ENCODE_ERROR; + } + at_response_t resp; + at_status_t rc = at_command_send_sms(ctx->at, hex_number, hex_body, &resp); + if (rc == AT_OK) { + if (segments_sent) { + *segments_sent = 1; + } + if (total_segments) { + *total_segments = 1; + } + return SMS_IO_SEND_OK; + } + if (at_err_detail && at_err_cap) { + snprintf(at_err_detail, at_err_cap, "%s", at_status_str(rc)); + } + return SMS_IO_SEND_AT_ERROR; +} + +static sms_io_send_err_t send_pdu_mode(const sms_io_ctx_t *ctx, + const char *dest, + const char *clean, + int *segments_sent, + int *total_segments, + char *at_err_detail, + size_t at_err_cap) { + bool ucs2 = pdu_needs_ucs2(clean); + int seg_count = pdu_segment_count(clean, ucs2); + if (seg_count <= 0 || seg_count > PDU_MAX_SEGMENTS) { + return SMS_IO_SEND_BODY_TOO_LONG; + } + + /* Segment budget separate from message budget — 200/hr default covers a + * healthy conversation pace without letting one user burn the whole + * carrier plan. Atomic take: partial debit is worse than either + * reject-and-retry or allow-the-whole-message. 
*/ + if (ctx->segment_bucket && !rate_bucket_take_n(ctx->segment_bucket, seg_count)) { + return SMS_IO_SEND_SEGMENT_LIMITED; + } + + pdu_segment_t segs[PDU_MAX_SEGMENTS]; + int n_segs = 0; + uint8_t ref_id = pdu_new_ref_id(); + pdu_err_t err = pdu_encode_submit(dest, clean, ref_id, ucs2, segs, PDU_MAX_SEGMENTS, &n_segs); + if (err != PDU_OK) { + OLOG_ERROR("PDU encode failed: %s", pdu_err_str(err)); + if (at_err_detail && at_err_cap) { + snprintf(at_err_detail, at_err_cap, "%s", pdu_err_str(err)); + } + return (err == PDU_ERR_BODY_TOO_LONG) ? SMS_IO_SEND_BODY_TOO_LONG : SMS_IO_SEND_ENCODE_ERROR; + } + if (total_segments) { + *total_segments = n_segs; + } + + for (int i = 0; i < n_segs; i++) { + at_response_t resp; + at_status_t rc = at_command_send_pdu(ctx->at, segs[i].tpdu_octets, segs[i].hex, &resp); + if (rc != AT_OK) { + OLOG_WARNING("PDU segment %d/%d failed: %s", i + 1, n_segs, at_status_str(rc)); + if (at_err_detail && at_err_cap) { + snprintf(at_err_detail, at_err_cap, "%s", at_status_str(rc)); + } + if (segments_sent) { + *segments_sent = i; + } + return (i == 0) ? 
SMS_IO_SEND_AT_ERROR : SMS_IO_SEND_PARTIAL_FAIL; + } + if (i + 1 < n_segs && ctx->inter_segment_delay_ms > 0) { + struct timespec delay; + delay.tv_sec = ctx->inter_segment_delay_ms / 1000; + delay.tv_nsec = (long)(ctx->inter_segment_delay_ms % 1000) * 1000000L; + nanosleep(&delay, NULL); + } + } + + if (segments_sent) { + *segments_sent = n_segs; + } + return SMS_IO_SEND_OK; +} + +sms_io_send_err_t sms_io_send(const sms_io_ctx_t *ctx, + const char *dest, + const char *utf8_body, + int *segments_sent, + int *total_segments, + char *at_err_detail, + size_t at_err_cap) { + if (segments_sent) { + *segments_sent = 0; + } + if (total_segments) { + *total_segments = 0; + } + if (at_err_detail && at_err_cap) { + at_err_detail[0] = '\0'; + } + if (!ctx || !ctx->at || !ctx->msg_bucket || !dest || !utf8_body) { + return SMS_IO_SEND_INVALID_NUMBER; + } + if (!sms_validate_number(dest)) { + return SMS_IO_SEND_INVALID_NUMBER; + } + + /* Validate body BEFORE debiting any bucket so a rejected request doesn't + * burn rate tokens. Empty bodies explicitly flagged as INVALID_BODY — + * without this, pdu_segment_count(clean)==0 would later be classified as + * BODY_TOO_LONG which is wrong and misleading for the caller. */ + char clean[SMS_BODY_MAX + 1]; + int clean_len = sms_sanitize_body(utf8_body, clean, sizeof(clean)); + if (clean_len < 0) { + return SMS_IO_SEND_INVALID_BODY; + } + if (clean_len == 0) { + return SMS_IO_SEND_INVALID_BODY; + } + + /* For PDU mode, check the segment count first so a too-long send doesn't + * consume a message-bucket token. The segment-rate bucket is not consulted + * here: send_pdu_mode does that debit after the message-bucket check, so a + * segment-rate-limited send still spends one message token.
*/ + if (ctx->pdu_mode) { + bool ucs2 = pdu_needs_ucs2(clean); + int seg_count = pdu_segment_count(clean, ucs2); + if (seg_count <= 0 || seg_count > PDU_MAX_SEGMENTS) { + return SMS_IO_SEND_BODY_TOO_LONG; + } + } + + if (!rate_bucket_allow(ctx->msg_bucket)) { + return SMS_IO_SEND_RATE_LIMITED; + } + + if (ctx->pdu_mode) { + return send_pdu_mode(ctx, dest, clean, segments_sent, total_segments, at_err_detail, + at_err_cap); + } + return send_text_mode(ctx, dest, clean, segments_sent, total_segments, at_err_detail, + at_err_cap); +} + +static int build_segment_data_json(char *buf, size_t cap, int sent, int tot) { + if (!buf || cap == 0) { + return -1; + } + struct json_object *obj = json_object_new_object(); + if (!obj) { + return -1; + } + json_object_object_add(obj, "segments_sent", json_object_new_int(sent)); + json_object_object_add(obj, "segments_total", json_object_new_int(tot)); + const char *s = json_object_to_json_string(obj); + int len = (int)strlen(s); + if ((size_t)len >= cap) { + json_object_put(obj); + return -1; + } + memcpy(buf, s, (size_t)len + 1); + json_object_put(obj); + return len; +} + +void sms_io_send_and_respond(const sms_io_ctx_t *ctx, + const char *dest, + const char *utf8_body, + const char *action, + const char *request_id) { + int sent = 0; + int total = 0; + char at_detail[64] = ""; + sms_io_send_err_t rc = sms_io_send(ctx, dest, utf8_body, &sent, &total, at_detail, + sizeof(at_detail)); + + /* Pre-initialize so a failed build (e.g., OOM from json-c) can't publish + * uninitialized stack bytes as the response data blob. 
*/ + char data_buf[128] = ""; + const char *data_ptr = NULL; + if (build_segment_data_json(data_buf, sizeof(data_buf), sent, total) > 0) { + data_ptr = data_buf; + } + + switch (rc) { + case SMS_IO_SEND_OK: + mqtt_publish_response(action, request_id, true, NULL, NULL, NULL, data_ptr); + break; + case SMS_IO_SEND_INVALID_NUMBER: + mqtt_publish_response(action, request_id, false, NULL, "INVALID_NUMBER", + "Phone number validation failed", NULL); + break; + case SMS_IO_SEND_INVALID_BODY: + mqtt_publish_response(action, request_id, false, NULL, "INVALID_BODY", + "SMS body contains dangerous characters", NULL); + break; + case SMS_IO_SEND_ENCODE_ERROR: + mqtt_publish_response(action, request_id, false, NULL, "ENCODE_ERROR", + at_detail[0] ? at_detail : "Failed to encode SMS", NULL); + break; + case SMS_IO_SEND_BODY_TOO_LONG: + mqtt_publish_response(action, request_id, false, NULL, "BODY_TOO_LONG", + "SMS body exceeds maximum concatenated length", NULL); + break; + case SMS_IO_SEND_RATE_LIMITED: + mqtt_publish_response(action, request_id, false, NULL, "RATE_LIMITED", + "SMS rate limit exceeded", NULL); + break; + case SMS_IO_SEND_SEGMENT_LIMITED: + mqtt_publish_response(action, request_id, false, NULL, "SEGMENT_RATE_LIMITED", + "SMS segment rate limit exceeded", NULL); + break; + case SMS_IO_SEND_AT_ERROR: + mqtt_publish_response(action, request_id, false, NULL, + at_detail[0] ? at_detail : "AT_ERROR", "SMS send failed", NULL); + break; + case SMS_IO_SEND_PARTIAL_FAIL: + mqtt_publish_response(action, request_id, false, NULL, "PDU_PARTIAL_FAIL", + at_detail[0] ? at_detail : "Partial SMS send", data_ptr); + break; + } +} + +/* ── Inbound ─────────────────────────────────────────────────────────── */ + +/* Publish one reassembled message. Routes through the canonical OCP event + * builder so the device/msg_type/event/timestamp envelope stays consistent + * with every other event emission in the daemon. 
*/ +static void publish_inbound(int sms_index, const char *sender, const char *body) { + struct json_object *extra = json_object_new_object(); + if (!extra) { + OLOG_ERROR("publish_inbound: json_object_new_object failed"); + return; + } + json_object_object_add(extra, "index", json_object_new_int(sms_index)); + json_object_object_add(extra, "sender", json_object_new_string(sender ? sender : "")); + json_object_object_add(extra, "body", json_object_new_string(body ? body : "")); + + char buf[SMS_IO_REASM_BODY_MAX + 512]; + if (mqtt_build_event_json("sms_received", extra, buf, sizeof(buf)) >= 0) { + mqtt_publish_event(buf); + } + json_object_put(extra); +} + +/* Extract the hex payload line from AT+CMGR output in PDU mode. In PDU mode + * CMGR returns: + * +CMGR: <stat>,[<alpha>],<length>\r\n<pdu>\r\n\r\nOK + * We already strip the trailing OK/terminator in the response accumulator, + * so the body lives on the line after the +CMGR header. */ +static const char *extract_pdu_hex(const char *cmgr_data) { + const char *header = strstr(cmgr_data, "+CMGR:"); + if (!header) { + return NULL; + } + const char *nl = strchr(header, '\n'); + if (!nl) { + return NULL; + } + return nl + 1; +} + +/* Decode text-mode CMGR output — the pre-existing UCS2-text-mode behavior.
*/ +static void handle_cmti_text_mode(at_context_t *at, int sms_index) { + char cmd[32]; + snprintf(cmd, sizeof(cmd), "AT+CMGR=%d", sms_index); + at_response_t resp; + at_status_t rc = at_command_send(at, cmd, &resp, AT_TIMEOUT_SMS_STORAGE); + if (rc != AT_OK) { + OLOG_WARNING("CMGR text-mode failed at idx %d: %s (deleting slot)", sms_index, + at_status_str(rc)); + cmgd_fire_and_forget(at, sms_index); + return; + } + + char sender_hex[PHONE_NUMBER_HEX_MAX + 1] = ""; + const char *body_hex = ""; + const char *cmgr = strstr(resp.data, "+CMGR:"); + if (cmgr) { + const char *q1 = strchr(cmgr, '"'); + if (q1 && (q1 = strchr(q1 + 1, '"')) && (q1 = strchr(q1 + 1, '"'))) { + const char *q2 = strchr(q1 + 1, '"'); + if (q2) { + size_t len = (size_t)(q2 - q1 - 1); + if (len > PHONE_NUMBER_HEX_MAX) { + len = PHONE_NUMBER_HEX_MAX; + } + memcpy(sender_hex, q1 + 1, len); + sender_hex[len] = '\0'; + } + } + const char *nl = strchr(cmgr, '\n'); + if (nl) { + body_hex = nl + 1; + /* Trim trailing CR/LF/space on the body line. 
*/ + size_t blen = strlen(body_hex); + char *end = resp.data + (body_hex - resp.data) + blen; + while (end > body_hex && (end[-1] == '\n' || end[-1] == '\r' || end[-1] == ' ')) { + *(--end) = '\0'; + } + } + } + + char sender[PHONE_NUMBER_MAX + 1] = ""; + char body[SMS_BODY_MAX + 1] = ""; + sms_ucs2_hex_to_utf8(sender_hex, sender, sizeof(sender)); + sms_ucs2_hex_to_utf8(body_hex, body, sizeof(body)); + + publish_inbound(sms_index, sender, body); +} + +static void handle_cmti_reassemble(at_context_t *at, + int sms_index, + const pdu_decoded_t *dec, + const char *frag) { + char full[SMS_IO_REASM_BODY_MAX]; + size_t full_len = 0; + reasm_result_t r = sms_reassembly_push(dec->sender, dec->udh_ref_id, dec->udh_total, + dec->udh_seq, frag, dec->body_len, time(NULL), full, + sizeof(full), &full_len); + switch (r) { + case REASM_COMPLETE: + publish_inbound(sms_index, dec->sender, full); + cmgd_fire_and_forget(at, sms_index); + break; + case REASM_INCOMPLETE: + cmgd_fire_and_forget(at, sms_index); + break; + case REASM_REJECTED_DUP: + cmgd_fire_and_forget(at, sms_index); + break; + default: + OLOG_WARNING("Reassembly rejected idx %d: code=%d", sms_index, (int)r); + cmgd_fire_and_forget(at, sms_index); + break; + } +} + +static void handle_cmti_pdu_mode(at_context_t *at, int sms_index) { + char cmd[32]; + snprintf(cmd, sizeof(cmd), "AT+CMGR=%d", sms_index); + at_response_t resp; + at_status_t rc = at_command_send(at, cmd, &resp, AT_TIMEOUT_SMS_STORAGE); + if (rc != AT_OK) { + OLOG_WARNING("CMGR PDU failed at idx %d: %s (deleting slot)", sms_index, at_status_str(rc)); + cmgd_fire_and_forget(at, sms_index); + return; + } + + const char *pdu_hex = extract_pdu_hex(resp.data); + if (!pdu_hex || pdu_hex[0] == '\0') { + OLOG_WARNING("CMGR PDU idx %d: no hex payload in response", sms_index); + cmgd_fire_and_forget(at, sms_index); + return; + } + + pdu_decoded_t dec; + char frag[SMS_IO_REASM_BODY_MAX]; + pdu_err_t err = pdu_decode(pdu_hex, &dec, frag, sizeof(frag)); + if (err != 
PDU_OK) { + OLOG_WARNING("PDU decode idx %d failed: %s", sms_index, pdu_err_str(err)); + /* Delete anyway so the inbox doesn't fill with bad messages. */ + cmgd_fire_and_forget(at, sms_index); + return; + } + + if (!dec.has_udh) { + publish_inbound(sms_index, dec.sender, frag); + cmgd_fire_and_forget(at, sms_index); + return; + } + + /* Multi-segment reassembly runs in its own frame so the 4 KiB `full` + * buffer doesn't sit on the stack for the single-segment fast path. */ + handle_cmti_reassemble(at, sms_index, &dec, frag); +} + +void sms_io_handle_cmti(const sms_io_ctx_t *ctx, int sms_index) { + if (!ctx || !ctx->at) { + return; + } + OLOG_INFO("Reading SMS at index %d (%s)", sms_index, ctx->pdu_mode ? "PDU" : "text"); + if (ctx->pdu_mode) { + handle_cmti_pdu_mode(ctx->at, sms_index); + } else { + handle_cmti_text_mode(ctx->at, sms_index); + } +} diff --git a/src/sms_reassembly.c b/src/sms_reassembly.c new file mode 100644 index 0000000..cee4403 --- /dev/null +++ b/src/sms_reassembly.c @@ -0,0 +1,259 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project.
These contributions become + * part of the project and are adopted by the project author(s). + * + * Bounded reassembly store for multi-segment inbound SMS. + */ + +#include "sms_reassembly.h" + +#include + +#include "echo.h" +#include "logging.h" + +/* ── State ───────────────────────────────────────────────────────────── */ + +typedef struct { + bool in_use; + /* PDU_SENDER_MAX (24) rather than PHONE_NUMBER_MAX+1 (21): pdu_decode() + * produces up to '+' + 20 digits + NUL = 22 bytes, so a 21-byte field + * would truncate on max-length E.164 senders. Truncation would then make + * later fragments' sender key miss the slot, wasting slots and sometimes + * dropping reassembly entirely. Keep this in sync with pdu.h. */ + char sender[PDU_SENDER_MAX]; + uint8_t ref_id; + uint8_t total; + uint16_t received_mask; /* bit (seq-1) set when that fragment has arrived */ + char fragments[PDU_MAX_SEGMENTS][REASSEMBLY_FRAG_BUF_SIZE]; + size_t frag_len[PDU_MAX_SEGMENTS]; + time_t first_seen; +} reasm_slot_t; + +static reasm_slot_t g_slots[REASSEMBLY_SLOTS]; +static sms_reassembly_stats_t g_stats; + +/* ── Helpers ─────────────────────────────────────────────────────────── */ + +static void clear_slot(reasm_slot_t *slot) { + memset(slot, 0, sizeof(*slot)); +} + +static reasm_slot_t *find_slot(const char *sender, uint8_t ref_id) { + for (int i = 0; i < REASSEMBLY_SLOTS; i++) { + if (g_slots[i].in_use && g_slots[i].ref_id == ref_id && + strcmp(g_slots[i].sender, sender) == 0) { + return &g_slots[i]; + } + } + return NULL; +} + +static int count_slots_for_sender(const char *sender) { + int n = 0; + for (int i = 0; i < REASSEMBLY_SLOTS; i++) { + if (g_slots[i].in_use && strcmp(g_slots[i].sender, sender) == 0) { + n++; + } + } + return n; +} + +static int popcount_u16(uint16_t x) { + int n = 0; + while (x) { + n += (int)(x & 1u); + x >>= 1; + } + return n; +} + +static reasm_slot_t *claim_slot(time_t now) { + /* Free slot first. 
*/ + for (int i = 0; i < REASSEMBLY_SLOTS; i++) { + if (!g_slots[i].in_use) { + clear_slot(&g_slots[i]); + g_slots[i].in_use = true; + g_slots[i].first_seen = now; + return &g_slots[i]; + } + } + /* All slots busy — evict the slot with the fewest received fragments, + * tie-break by oldest first_seen. Naive LRU-by-first_seen would let an + * attacker pushing fresh fragments under new ref_ids evict honest + * users' mid-message slots. Preferring low-fragment-count slots keeps + * the victim-friendly bias. */ + int best = 0; + int best_frags = popcount_u16(g_slots[0].received_mask); + for (int i = 1; i < REASSEMBLY_SLOTS; i++) { + int frags = popcount_u16(g_slots[i].received_mask); + if (frags < best_frags || + (frags == best_frags && g_slots[i].first_seen < g_slots[best].first_seen)) { + best = i; + best_frags = frags; + } + } + OLOG_WARNING("SMS reassembly exhausted — evicting slot for sender=%s ref=%u seq_mask=0x%x", + g_slots[best].sender, g_slots[best].ref_id, g_slots[best].received_mask); + g_stats.total_dropped_exhaustion++; + clear_slot(&g_slots[best]); + g_slots[best].in_use = true; + g_slots[best].first_seen = now; + return &g_slots[best]; +} + +static void update_in_use_count(void) { + uint32_t n = 0; + for (int i = 0; i < REASSEMBLY_SLOTS; i++) { + if (g_slots[i].in_use) { + n++; + } + } + g_stats.slots_in_use = n; +} + +/* ── Public API ──────────────────────────────────────────────────────── */ + +int sms_reassembly_sweep(time_t now) { + int evicted = 0; + for (int i = 0; i < REASSEMBLY_SLOTS; i++) { + if (g_slots[i].in_use && (now - g_slots[i].first_seen) > REASSEMBLY_TIMEOUT_SEC) { + OLOG_WARNING("SMS reassembly timeout — sender=%s ref=%u seq_mask=0x%x age=%lds", + g_slots[i].sender, g_slots[i].ref_id, g_slots[i].received_mask, + (long)(now - g_slots[i].first_seen)); + clear_slot(&g_slots[i]); + g_stats.total_timed_out++; + evicted++; + } + } + update_in_use_count(); + return evicted; +} + +reasm_result_t sms_reassembly_push(const char *sender, + 
uint8_t ref_id, + uint8_t total, + uint8_t seq, + const char *fragment, + size_t frag_len, + time_t now, + char *out_body, + size_t out_cap, + size_t *out_len) { + if (!sender || !fragment) { + return REASM_ERROR; + } + if (total == 0 || total > PDU_MAX_SEGMENTS) { + return REASM_REJECTED_TOTAL; + } + if (seq == 0 || seq > total) { + return REASM_REJECTED_TOTAL; + } + if (frag_len >= REASSEMBLY_FRAG_BUF_SIZE) { + /* Decoded UTF-8 won't fit — bail before trashing slot state. */ + return REASM_ERROR; + } + + /* Sweep stale slots on every push — cheap (8 slots) and keeps the table + * self-cleaning without a separate scheduler. */ + sms_reassembly_sweep(now); + + reasm_slot_t *slot = find_slot(sender, ref_id); + if (slot) { + if (slot->total != total) { + /* Spec violation or spoofed segment — reject the new fragment + * without corrupting the existing slot. */ + OLOG_WARNING("SMS reassembly total mismatch — sender=%s ref=%u stored=%u incoming=%u", + sender, ref_id, slot->total, total); + return REASM_REJECTED_TOTAL; + } + uint16_t bit = (uint16_t)(1u << (seq - 1)); + if (slot->received_mask & bit) { + g_stats.total_duplicates++; + return REASM_REJECTED_DUP; + } + memcpy(slot->fragments[seq - 1], fragment, frag_len); + slot->fragments[seq - 1][frag_len] = '\0'; + slot->frag_len[seq - 1] = frag_len; + slot->received_mask |= bit; + } else { + if (count_slots_for_sender(sender) >= REASSEMBLY_PER_SENDER) { + g_stats.total_sender_cap_exceeded++; + OLOG_WARNING("SMS reassembly per-sender cap hit for %s (cap=%d)", sender, + REASSEMBLY_PER_SENDER); + return REASM_REJECTED_CAP; + } + slot = claim_slot(now); + snprintf(slot->sender, sizeof(slot->sender), "%s", sender); + slot->ref_id = ref_id; + slot->total = total; + slot->received_mask = (uint16_t)(1u << (seq - 1)); + memcpy(slot->fragments[seq - 1], fragment, frag_len); + slot->fragments[seq - 1][frag_len] = '\0'; + slot->frag_len[seq - 1] = frag_len; + update_in_use_count(); + } + + /* Complete when all (1..total) bits set. 
*/ + uint16_t full = (total == 16) ? 0xFFFFu : (uint16_t)((1u << total) - 1u); + if ((slot->received_mask & full) != full) { + return REASM_INCOMPLETE; + } + + /* Concatenate in sequence order. Bail out clean if the output buffer + * can't hold the result — caller decides how to handle. */ + size_t total_len = 0; + for (int i = 0; i < total; i++) { + total_len += slot->frag_len[i]; + } + if (!out_body || out_cap == 0 || total_len + 1 > out_cap) { + OLOG_WARNING("SMS reassembly output buffer too small (need=%zu cap=%zu)", total_len + 1, + out_cap); + clear_slot(slot); + update_in_use_count(); + return REASM_ERROR; + } + + size_t pos = 0; + for (int i = 0; i < total; i++) { + memcpy(out_body + pos, slot->fragments[i], slot->frag_len[i]); + pos += slot->frag_len[i]; + } + out_body[pos] = '\0'; + if (out_len) { + *out_len = pos; + } + + clear_slot(slot); + update_in_use_count(); + g_stats.total_completed++; + return REASM_COMPLETE; +} + +void sms_reassembly_stats(sms_reassembly_stats_t *out) { + if (!out) { + return; + } + *out = g_stats; +} + +void sms_reassembly_reset(void) { + memset(g_slots, 0, sizeof(g_slots)); + memset(&g_stats, 0, sizeof(g_stats)); +} diff --git a/tests/test_mqtt_messages.c b/tests/test_mqtt_messages.c index 010137a..4a2885d 100644 --- a/tests/test_mqtt_messages.c +++ b/tests/test_mqtt_messages.c @@ -138,7 +138,7 @@ void test_event_json_null_type(void) { void test_response_json_success(void) { char buf[512]; - int len = mqtt_build_response_json("dial", "worker_0_42", true, NULL, NULL, NULL, buf, + int len = mqtt_build_response_json("dial", "worker_0_42", true, NULL, NULL, NULL, NULL, buf, sizeof(buf)); TEST_ASSERT_GREATER_THAN(0, len); @@ -153,7 +153,7 @@ void test_response_json_success(void) { void test_response_json_success_with_value(void) { char buf[512]; int len = mqtt_build_response_json("signal", "worker_0_48", true, "{\\\"signal_dbm\\\":-67}", - NULL, NULL, buf, sizeof(buf)); + NULL, NULL, NULL, buf, sizeof(buf)); 
TEST_ASSERT_GREATER_THAN(0, len); struct json_object *root = parse_and_check(buf); @@ -165,7 +165,7 @@ void test_response_json_success_with_value(void) { void test_response_json_error(void) { char buf[512]; int len = mqtt_build_response_json("dial", "worker_0_42", false, NULL, "NO_CARRIER", - "Call failed: no carrier", buf, sizeof(buf)); + "Call failed: no carrier", NULL, buf, sizeof(buf)); TEST_ASSERT_GREATER_THAN(0, len); struct json_object *root = parse_and_check(buf); diff --git a/tests/test_pdu.c b/tests/test_pdu.c new file mode 100644 index 0000000..e0b544b --- /dev/null +++ b/tests/test_pdu.c @@ -0,0 +1,334 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. These contributions become + * part of the project and are adopted by the project author(s). + * + * PDU encode/decode unit tests + malformed-input rejection.
+ */ + +#include + +#include "pdu.h" +#include "unity.h" + +void setUp(void) { +} +void tearDown(void) { +} + +/* ── Heuristics ──────────────────────────────────────────────────────── */ + +void test_needs_ucs2_always_true_v1(void) { + TEST_ASSERT_TRUE(pdu_needs_ucs2("ASCII")); + TEST_ASSERT_TRUE(pdu_needs_ucs2("")); +} + +void test_ref_id_is_nonzero(void) { + /* Run a few times so a naive counter eventually rolls through 0. */ + for (int i = 0; i < 300; i++) { + uint8_t r = pdu_new_ref_id(); + TEST_ASSERT_NOT_EQUAL(0, r); + } +} + +void test_segment_count_empty(void) { + TEST_ASSERT_EQUAL_INT(0, pdu_segment_count("", true)); +} + +void test_segment_count_single(void) { + TEST_ASSERT_EQUAL_INT(1, pdu_segment_count("Hello", true)); +} + +void test_segment_count_boundary(void) { + /* 70 UCS2 chars fits in a single-segment PDU (no UDH). */ + char body[71]; + memset(body, 'A', 70); + body[70] = '\0'; + TEST_ASSERT_EQUAL_INT(1, pdu_segment_count(body, true)); + + /* 71 chars forces concat → 2 segments of 67 chars each. */ + char body2[72]; + memset(body2, 'A', 71); + body2[71] = '\0'; + TEST_ASSERT_EQUAL_INT(2, pdu_segment_count(body2, true)); +} + +void test_segment_count_three(void) { + /* 135 chars → 3 segments (67 + 67 + 1). */ + char body[136]; + memset(body, 'A', 135); + body[135] = '\0'; + TEST_ASSERT_EQUAL_INT(3, pdu_segment_count(body, true)); +} + +/* ── Encode ──────────────────────────────────────────────────────────── */ + +void test_encode_single_segment(void) { + pdu_segment_t segs[PDU_MAX_SEGMENTS]; + int n = 0; + pdu_err_t err = pdu_encode_submit("+15551234567", "Hi", 42, true, segs, PDU_MAX_SEGMENTS, &n); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_INT(1, n); + /* UDHI must NOT be set on single-segment. First TPDU octet is at + * offset 2 (after "00" SMSC prefix). Bits: MTI=01, VPF=10 → 0x11. 
*/ + TEST_ASSERT_EQUAL_STRING_LEN("0011", segs[0].hex, 4); + TEST_ASSERT_GREATER_THAN(0, segs[0].tpdu_octets); +} + +void test_encode_sets_udhi_for_multi_segment(void) { + char body[200]; + memset(body, 'A', 100); /* 100 UCS2 units → 2 segments */ + body[100] = '\0'; + + pdu_segment_t segs[PDU_MAX_SEGMENTS]; + int n = 0; + pdu_err_t err = pdu_encode_submit("+15551234567", body, 0x5A, true, segs, PDU_MAX_SEGMENTS, &n); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_INT(2, n); + /* UDHI bit = 0x40 → first octet = 0x51 (MTI=01, VPF=10, UDHI=1) */ + TEST_ASSERT_EQUAL_STRING_LEN("0051", segs[0].hex, 4); + TEST_ASSERT_EQUAL_STRING_LEN("0051", segs[1].hex, 4); + + /* UDH concat is 05 00 03 <ref> <total> <seq>. Preamble occupies 14 + * octets (MTI, MR, DA[1+1+6], PID, DCS, VP, UDL), so UDH starts at + * hex offset 2 + 14*2 = 30. */ + TEST_ASSERT_EQUAL_STRING_LEN("0500035A", segs[0].hex + 30, 8); + TEST_ASSERT_EQUAL_STRING_LEN("02", segs[0].hex + 38, 2); /* total=2 */ + TEST_ASSERT_EQUAL_STRING_LEN("01", segs[0].hex + 40, 2); /* seq=1 */ + TEST_ASSERT_EQUAL_STRING_LEN("02", segs[1].hex + 38, 2); + TEST_ASSERT_EQUAL_STRING_LEN("02", segs[1].hex + 40, 2); /* seq=2 */ +} + +void test_encode_bad_address_rejected(void) { + pdu_segment_t segs[PDU_MAX_SEGMENTS]; + int n = 0; + pdu_err_t err = pdu_encode_submit("not-a-number", "Hi", 1, true, segs, PDU_MAX_SEGMENTS, &n); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_ADDRESS, err); +} + +void test_encode_too_long_body_rejected(void) { + /* 750 chars need 12 segments of 67 chars (11 * 67 = 737 < 750), > PDU_MAX_SEGMENTS.
*/ + char body[760]; + memset(body, 'A', 750); + body[750] = '\0'; + pdu_segment_t segs[PDU_MAX_SEGMENTS]; + int n = 0; + pdu_err_t err = pdu_encode_submit("+15551234567", body, 1, true, segs, PDU_MAX_SEGMENTS, &n); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BODY_TOO_LONG, err); +} + +/* ── Decode: malformed-input rejection ───────────────────────────────── */ + +void test_decode_odd_hex_rejected(void) { + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode("0123456", &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_HEX, err); +} + +void test_decode_non_hex_rejected(void) { + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode("00ZZ1234", &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_HEX, err); +} + +void test_decode_empty_rejected(void) { + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode("", &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_TRUNCATED, err); +} + +void test_decode_oversized_smsc_len_rejected(void) { + /* smsc_len = 0xFF but only 2 bytes follow */ + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode("FF0000", &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_LENGTH, err); +} + +/* ── Decode: round-trip ──────────────────────────────────────────────── */ + +/* Captured DELIVER PDU. Hand-built for the test: no SMSC (len=0), MTI=0, + * sender = +15551234567, DCS = UCS2 (0x08), SCTS = fixed, body = "Hi". 
+ * 00 SMSC length = 0 (no override) + * 04 first octet: MTI=00 (DELIVER), no UDHI + * 0B TP-OA length = 11 semi-octets + * 91 TOA = international + * 51 55 21 43 65 F7 +15551234567 in nibble-swapped BCD + * 00 TP-PID + * 08 TP-DCS = UCS2 + * 26 04 21 50 00 00 00 TP-SCTS = 2026-04-12T05:00:00 UTC (placeholder) + * 04 TP-UDL = 4 octets + * 00 48 00 69 UCS2 "Hi" + */ +void test_decode_simple_ucs2(void) { + const char *hex = "00040B915155214365F70008260421500000000400480069"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_STRING("+15551234567", out.sender); + TEST_ASSERT_FALSE(out.has_udh); + TEST_ASSERT_TRUE(out.is_ucs2); + TEST_ASSERT_EQUAL_STRING("Hi", body); + TEST_ASSERT_EQUAL_INT(2, out.body_len); +} + +/* UDH concat DELIVER PDU. Same shape, UDHI set, UDHL=05, IEI=00, IEL=03, + * ref=7A, total=2, seq=1, then UCS2 body "Hi". + * 00 44 0B 91 51 55 21 43 65 F7 00 08 26 04 21 50 00 00 00 + * 0A TP-UDL = 10 = UDH(6) + body(4) + * 05 00 03 7A 02 01 UDH concat IE + * 00 48 00 69 UCS2 "Hi" + */ +void test_decode_udh_concat(void) { + /* Fields: 00 44 0B 91 5155214365F7 00 08 26042150000000 0A 050003 7A 02 01 00480069 */ + const char *hex = "00440B915155214365F7000826042150000000" + "0A" + "0500037A020100480069"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_STRING("+15551234567", out.sender); + TEST_ASSERT_TRUE(out.has_udh); + TEST_ASSERT_EQUAL_UINT8(0x7A, out.udh_ref_id); + TEST_ASSERT_EQUAL_UINT8(2, out.udh_total); + TEST_ASSERT_EQUAL_UINT8(1, out.udh_seq); + TEST_ASSERT_EQUAL_STRING("Hi", body); +} + +void test_decode_bad_udh_seq_rejected(void) { + /* UDHL=05, concat IE total=2, seq=5 → invalid. 
*/ + const char *hex = "00440B915155214365F7000826042150000000" + "0A" + "0500037A020500480069"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_UDH, err); +} + +void test_decode_sanitizes_zero_width_and_tag(void) { + /* UCS2 body "A\u200B B\uE0041C" where U+200B is ZWSP (dropped) and + * U+E0041 is a tag char (dropped). Expect "A BC". + * Surrogate encoding of U+E0041 = DB40 DC41. + * 0x0041 'A' 0x200B ZWSP 0x0020 ' ' 0x0042 'B' 0xDB40 0xDC41 0x0043 'C' + * = 7 UCS2 code units = 14 octets. */ + const char *hex = "00040B915155214365F7000826042150000000" + "0E" + "0041200B00200042DB40DC410043"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_STRING("A BC", body); +} + +void test_decode_rejects_duplicate_concat_ie(void) { + /* UDH carries two IEI=0x00 concat IEs. Should reject as BAD_UDH. + * UDHL=0B (11 bytes): 05 00 03 AA 02 01 05 00 03 BB 02 01 + * Body: 00 48 = "H" (2 octets) + * TP-UDL = 11 + 1 + 2 = 14 = 0x0E */ + const char *hex = "00440B915155214365F7000826042150000000" + "0E" + "0B" + "050003AA0201" + "050003BB0201" + "0048"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_ERR_BAD_UDH, err); +} + +void test_decode_sanitizes_nul_and_bidi(void) { + /* UCS2 body "A\u0000B\u202EC" → sanitized to "ABC" (NUL stripped, + * bidi U+202E dropped). 5 UCS2 units = 10 octets → UDL = 0x0A. */ + const char *hex = "00040B915155214365F7000826042150000000" + "0A" + "004100000042202E0043"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_STRING("ABC", body); +} + +void test_decode_gsm7_simple(void) { + /* DELIVER PDU, DCS=0x00 (GSM7), body = "hello" (5 septets packed into + * 5 bytes: E8 32 9B FD 06). 
SCTS is a filler. Sender = +15551234567. + * Real-world test: this is what arrives at the modem when an iPhone + * sends "hello". */ + const char *hex = "00040B915155214365F700002604215000000005E8329BFD06"; + pdu_decoded_t out; + char body[64]; + pdu_err_t err = pdu_decode(hex, &out, body, sizeof(body)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + TEST_ASSERT_EQUAL_STRING("+15551234567", out.sender); + TEST_ASSERT_FALSE(out.is_ucs2); + TEST_ASSERT_EQUAL_STRING("hello", body); +} + +void test_decode_buffer_too_small_graceful(void) { + /* Same body "Hi" but only a 2-byte output buffer: decoder must not overflow + * and should truncate cleanly. */ + const char *hex = "00040B915155214365F70008260421500000000400480069"; + pdu_decoded_t out; + char tiny[2]; /* room for NUL only */ + pdu_err_t err = pdu_decode(hex, &out, tiny, sizeof(tiny)); + TEST_ASSERT_EQUAL_INT(PDU_OK, err); + /* 'H' is 1 byte UTF-8, won't fit with NUL room guard → truncates. */ + TEST_ASSERT_LESS_OR_EQUAL_INT(1, out.body_len); +} + +/* ── Unity test runner ───────────────────────────────────────────────── */ + +int main(void) { + UNITY_BEGIN(); + + RUN_TEST(test_needs_ucs2_always_true_v1); + RUN_TEST(test_ref_id_is_nonzero); + RUN_TEST(test_segment_count_empty); + RUN_TEST(test_segment_count_single); + RUN_TEST(test_segment_count_boundary); + RUN_TEST(test_segment_count_three); + + RUN_TEST(test_encode_single_segment); + RUN_TEST(test_encode_sets_udhi_for_multi_segment); + RUN_TEST(test_encode_bad_address_rejected); + RUN_TEST(test_encode_too_long_body_rejected); + + RUN_TEST(test_decode_odd_hex_rejected); + RUN_TEST(test_decode_non_hex_rejected); + RUN_TEST(test_decode_empty_rejected); + RUN_TEST(test_decode_oversized_smsc_len_rejected); + + RUN_TEST(test_decode_simple_ucs2); + RUN_TEST(test_decode_udh_concat); + RUN_TEST(test_decode_bad_udh_seq_rejected); + RUN_TEST(test_decode_sanitizes_nul_and_bidi); + RUN_TEST(test_decode_sanitizes_zero_width_and_tag); +
RUN_TEST(test_decode_rejects_duplicate_concat_ie); + RUN_TEST(test_decode_gsm7_simple); + RUN_TEST(test_decode_buffer_too_small_graceful); + + return UNITY_END(); +} diff --git a/tests/test_sms_reassembly.c b/tests/test_sms_reassembly.c new file mode 100644 index 0000000..266fc00 --- /dev/null +++ b/tests/test_sms_reassembly.c @@ -0,0 +1,176 @@ +/* + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <https://www.gnu.org/licenses/>. + * + * By contributing to this project, you agree to license your contributions + * under the GPLv3 (or any later version) or any future licenses chosen by + * the project author(s). Contributions include any modifications, + * enhancements, or additions to the project. These contributions become + * part of the project and are adopted by the project author(s). + * + * Unit tests for the SMS reassembly store.
+ */
+
+#include
+
+#include "sms_reassembly.h"
+#include "unity.h"
+
+void setUp(void) {
+    sms_reassembly_reset();
+}
+void tearDown(void) {
+}
+
+/* ── Happy path ──────────────────────────────────────────────────────── */
+
+void test_complete_in_order(void) {
+    char out[256];
+    size_t out_len = 0;
+    reasm_result_t r;
+
+    r = sms_reassembly_push("+15551234567", 10, 3, 1, "AAA", 3, 1000, out, sizeof(out), &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_INCOMPLETE, r);
+    r = sms_reassembly_push("+15551234567", 10, 3, 2, "BBB", 3, 1000, out, sizeof(out), &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_INCOMPLETE, r);
+    r = sms_reassembly_push("+15551234567", 10, 3, 3, "CCC", 3, 1000, out, sizeof(out), &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_COMPLETE, r);
+    TEST_ASSERT_EQUAL_STRING("AAABBBCCC", out);
+    TEST_ASSERT_EQUAL_INT(9, out_len);
+}
+
+void test_complete_out_of_order(void) {
+    char out[256];
+    size_t out_len = 0;
+    /* Arrive 3, 1, 2 — should still concatenate in seq order. */
+    TEST_ASSERT_EQUAL_INT(REASM_INCOMPLETE, sms_reassembly_push("+15551234567", 20, 3, 3, "CCC", 3,
+                                                                1000, out, sizeof(out), &out_len));
+    TEST_ASSERT_EQUAL_INT(REASM_INCOMPLETE, sms_reassembly_push("+15551234567", 20, 3, 1, "AAA", 3,
+                                                                1000, out, sizeof(out), &out_len));
+    TEST_ASSERT_EQUAL_INT(REASM_COMPLETE, sms_reassembly_push("+15551234567", 20, 3, 2, "BBB", 3,
+                                                              1000, out, sizeof(out), &out_len));
+    TEST_ASSERT_EQUAL_STRING("AAABBBCCC", out);
+}
+
+/* ── Duplicate / mismatch / cap rejection ────────────────────────────── */
+
+void test_duplicate_rejected(void) {
+    char out[64];
+    size_t out_len = 0;
+    sms_reassembly_push("+1555", 30, 2, 1, "AA", 2, 1000, out, sizeof(out), &out_len);
+    reasm_result_t r = sms_reassembly_push("+1555", 30, 2, 1, "AA", 2, 1000, out, sizeof(out),
+                                           &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_REJECTED_DUP, r);
+
+    sms_reassembly_stats_t s;
+    sms_reassembly_stats(&s);
+    TEST_ASSERT_EQUAL_UINT64(1, s.total_duplicates);
+}
+
+void test_total_mismatch_rejected(void) {
+    char out[64];
+    size_t out_len = 0;
+    sms_reassembly_push("+1555", 40, 3, 1, "AA", 2, 1000, out, sizeof(out), &out_len);
+    /* Same sender/ref, different total → reject without corrupting slot. */
+    reasm_result_t r = sms_reassembly_push("+1555", 40, 5, 2, "BB", 2, 1000, out, sizeof(out),
+                                           &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_REJECTED_TOTAL, r);
+}
+
+void test_sender_cap(void) {
+    char out[64];
+    size_t out_len = 0;
+    /* Two concurrent messages from same sender, different ref_ids. */
+    sms_reassembly_push("+1555", 1, 2, 1, "A", 1, 1000, out, sizeof(out), &out_len);
+    sms_reassembly_push("+1555", 2, 2, 1, "B", 1, 1000, out, sizeof(out), &out_len);
+    /* Third must be rejected. */
+    reasm_result_t r = sms_reassembly_push("+1555", 3, 2, 1, "C", 1, 1000, out, sizeof(out),
+                                           &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_REJECTED_CAP, r);
+
+    sms_reassembly_stats_t s;
+    sms_reassembly_stats(&s);
+    TEST_ASSERT_EQUAL_UINT64(1, s.total_sender_cap_exceeded);
+}
+
+/* ── Eviction / timeout ──────────────────────────────────────────────── */
+
+void test_lru_eviction_when_full(void) {
+    char out[64];
+    size_t out_len = 0;
+    /* Fill all 8 slots with 4 different senders × 2 ref_ids each (per-sender
+     * cap is 2). Each slot gets exactly 1 fragment so none complete. */
+    sms_reassembly_push("+sender1", 1, 2, 1, "A", 1, 1000, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender1", 2, 2, 1, "A", 1, 1001, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender2", 1, 2, 1, "A", 1, 1002, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender2", 2, 2, 1, "A", 1, 1003, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender3", 1, 2, 1, "A", 1, 1004, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender3", 2, 2, 1, "A", 1, 1005, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender4", 1, 2, 1, "A", 1, 1006, out, sizeof(out), &out_len);
+    sms_reassembly_push("+sender4", 2, 2, 1, "A", 1, 1007, out, sizeof(out), &out_len);
+
+    sms_reassembly_stats_t s;
+    sms_reassembly_stats(&s);
+    TEST_ASSERT_EQUAL_UINT32(REASSEMBLY_SLOTS, s.slots_in_use);
+
+    /* A brand new sender arrives — LRU eviction should free the oldest
+     * slot (sender1/ref=1 at ts=1000). */
+    reasm_result_t r = sms_reassembly_push("+sender9", 7, 2, 1, "A", 1, 1100, out, sizeof(out),
+                                           &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_INCOMPLETE, r);
+
+    sms_reassembly_stats(&s);
+    TEST_ASSERT_EQUAL_UINT64(1, s.total_dropped_exhaustion);
+}
+
+void test_sweep_timeout(void) {
+    char out[64];
+    size_t out_len = 0;
+    sms_reassembly_push("+1555", 1, 2, 1, "A", 1, 1000, out, sizeof(out), &out_len);
+
+    int evicted = sms_reassembly_sweep(1000 + REASSEMBLY_TIMEOUT_SEC + 1);
+    TEST_ASSERT_EQUAL_INT(1, evicted);
+
+    sms_reassembly_stats_t s;
+    sms_reassembly_stats(&s);
+    TEST_ASSERT_EQUAL_UINT64(1, s.total_timed_out);
+    TEST_ASSERT_EQUAL_UINT32(0, s.slots_in_use);
+}
+
+void test_bad_total_rejected(void) {
+    char out[64];
+    size_t out_len = 0;
+    /* total=0 is invalid per UDH spec. */
+    reasm_result_t r = sms_reassembly_push("+1555", 1, 0, 1, "A", 1, 1000, out, sizeof(out),
+                                           &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_REJECTED_TOTAL, r);
+
+    /* seq > total */
+    r = sms_reassembly_push("+1555", 1, 2, 5, "A", 1, 1000, out, sizeof(out), &out_len);
+    TEST_ASSERT_EQUAL_INT(REASM_REJECTED_TOTAL, r);
+}
+
+/* ── Unity test runner ───────────────────────────────────────────────── */
+
+int main(void) {
+    UNITY_BEGIN();
+    RUN_TEST(test_complete_in_order);
+    RUN_TEST(test_complete_out_of_order);
+    RUN_TEST(test_duplicate_rejected);
+    RUN_TEST(test_total_mismatch_rejected);
+    RUN_TEST(test_sender_cap);
+    RUN_TEST(test_lru_eviction_when_full);
+    RUN_TEST(test_sweep_timeout);
+    RUN_TEST(test_bad_total_rejected);
+    return UNITY_END();
+}
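Reviewer note: the tests above pin down the reassembly contract (fragments may arrive in any order, duplicates and a changed total are rejected, output is joined in sequence order). The sketch below illustrates that contract with a single-slot toy. It is not ECHO's `sms_reassembly` implementation: every `toy_*` / `TOY_*` name is hypothetical, and sender/ref_id matching, the per-sender cap, LRU eviction, timeouts, and stats are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy reassembler, for illustration only. One slot, no
 * sender/ref_id matching, caps, eviction, or timeouts. */
#define TOY_MAX_PARTS 8
#define TOY_PART_MAX 160

typedef enum { TOY_INCOMPLETE, TOY_COMPLETE, TOY_REJECTED } toy_result_t;

static char toy_parts[TOY_MAX_PARTS][TOY_PART_MAX];
static size_t toy_lens[TOY_MAX_PARTS];
static unsigned char toy_seen[TOY_MAX_PARTS];
static unsigned toy_total;    /* 0 means the slot is empty */
static unsigned toy_received; /* distinct fragments stored so far */

static void toy_reset(void) {
    memset(toy_seen, 0, sizeof(toy_seen));
    toy_total = 0;
    toy_received = 0;
}

/* Store fragment `seq` (1-based) of `total`. When the last missing fragment
 * arrives, concatenate all fragments in seq order into `out`. */
static toy_result_t toy_push(unsigned total, unsigned seq, const char *data, size_t len,
                             char *out, size_t out_cap, size_t *out_len) {
    assert(data != NULL && out != NULL && out_len != NULL);
    if (total == 0 || total > TOY_MAX_PARTS || seq == 0 || seq > total || len > TOY_PART_MAX)
        return TOY_REJECTED; /* invalid header fields or oversized fragment */
    if (toy_total != 0 && toy_total != total)
        return TOY_REJECTED; /* total changed mid-message; slot left untouched */
    if (toy_seen[seq - 1])
        return TOY_REJECTED; /* duplicate fragment */

    toy_total = total;
    memcpy(toy_parts[seq - 1], data, len);
    toy_lens[seq - 1] = len;
    toy_seen[seq - 1] = 1;
    if (++toy_received < total)
        return TOY_INCOMPLETE;

    /* All fragments present: join by index, not by arrival order. */
    size_t pos = 0;
    for (unsigned i = 0; i < total; i++) {
        if (pos + toy_lens[i] + 1 > out_cap) {
            toy_reset();
            return TOY_REJECTED; /* output buffer too small */
        }
        memcpy(out + pos, toy_parts[i], toy_lens[i]);
        pos += toy_lens[i];
    }
    out[pos] = '\0';
    *out_len = pos;
    toy_reset();
    return TOY_COMPLETE;
}
```

Indexing the storage by `seq - 1` is what makes arrival order irrelevant: pushing fragments 3, 1, 2 fills slots 2, 0, 1, and the final loop reads them back as 0, 1, 2. The real module presumably distinguishes rejection reasons (`REASM_REJECTED_DUP`, `REASM_REJECTED_TOTAL`, `REASM_REJECTED_CAP`), which this toy collapses into one code.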