Audit/bug hunt#90
Merged
Merged
Conversation
Mirror the FileTransport overlapping-batch-flush fix for DatabaseTransport. write() fires flush() un-awaited once the batch crosses batchSize and the interval timer fires it too, so a synchronous burst (e.g. a NestJS bootstrap replaying buffered logs) triggered many overlapping flushes. The old isFlushing guard merely skipped the size-triggered flush, defeating backpressure and letting the batch grow unbounded; and any snapshot-then-clear race risked writing the same entries twice. Changes: - Serialize flushes through a single shared drain promise; detach the batch synchronously before awaiting so concurrent writes land in a fresh array and are written exactly once (no duplication). - Harden close() to drain on shutdown: retry transient flush failures and re-buffer entries between attempts so buffered logs are not lost on deploy (the 'last N seconds of logs lost' problem). Warn if anything truly remains. - Clear connectionPromise on failed connect so a transient startup error does not permanently wedge the transport. - Coerce getLogCount() to a number (pg/mysql return COUNT as a string). Adds regression tests: 500-entry un-awaited burst written exactly once, concurrent flush() de-duplication, failed-flush re-buffering, and close() draining all entries through a transient failure with zero loss. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AnalyticsTransport is the shared base for Mixpanel, DataDog, Segment, and Google Analytics, so its overlapping-batch-flush bug affected all four. Same class of bug as the FileTransport/DatabaseTransport fixes: addToBatch() fires flush() un-awaited on every Nth entry, and the old flush() snapshotted [...this.batch] then cleared it only after the async send, so overlapping flushes re-sent the same entries — turning N logs into N² delivered events. Changes: - Serialize flushes through a single shared drain promise; detach the batch synchronously before awaiting sendBatch() so concurrent writes are never sent twice. - Harden close() to retry-drain on shutdown so a transient provider outage does not drop buffered logs (the 'last N seconds lost on deploy' problem). - flush() no longer swallows the timer; batchTimer cleared up front. Adds regression tests (600-entry burst delivered once, 4 concurrent flushes, failed-send re-buffering, close() drain-through-failure) and a runnable verification example (examples/verify-flush-no-duplication.ts) that prints live PASS/FAIL for each guarantee. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CloudWatch, GCP Cloud Logging, and Azure Monitor transports each buffered entries on an interval but (a) flushed only ONE batchSize chunk per flush() call, leaving a tail behind when the batch exceeded batchSize, and (b) had no close() method at all — so TransportManager.close() skipped them on shutdown and any buffered logs were silently lost on deploy (the exact 'last N seconds of logs lost' problem the library promises to avoid). These three use batch.splice(0, batchSize), which detaches synchronously, so they were never vulnerable to the N² duplication bug — the issue here is loss, not duplication. Changes (all three transports): - flush() now loops until the batch is empty, draining every chunk. - Add close(): clears the interval timer and retry-drains the batch (bounded to 3 attempts) so a transient cloud outage does not drop buffered logs. Adds regression tests (whole-batch drain + close-drain-through-failure for each provider) and extends examples/verify-flush-no-duplication.ts with a live cloud close() check (95 buffered logs all drained on shutdown). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tdown The per-transport averageWriteTime used (avg + sample) / 2, an exponential decay that over-weighted the most recent write so the reported average never matched the true mean. Replace with the standard incremental cumulative mean avg += (sample - avg) / n. Adds the first TransportManager test coverage: the corrected metric, the shutdown guarantee that batched logs are flushed through manager.close() (no loss on deploy), and documents the post-shutdown write early-return. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pattern-based redaction only ran against the payload object, so a secret
interpolated into the message itself (e.g. logger.info(`Auth: Bearer ${token}`))
bypassed redaction entirely and leaked in plaintext — a real gap for a library
whose headline feature is redacting secrets.
Add applyRedactionToString() (patterns only — path rules don't apply to a bare
string) and run it over the message in the logger hot path, gated behind the
existing _hasRedact fast-path flag so the no-redact common case stays
allocation-free.
Adds unit tests for applyRedactionToString and an end-to-end logger test
asserting the raw token never appears in output. Extends the verification
example with a live message-redaction check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nals flushOnExit() registers one handler for both SIGTERM and SIGINT, so two signals (e.g. SIGINT after SIGTERM, or an impatient double Ctrl+C) ran the async flush+exit handler twice concurrently. The two sequences raced: the first process.exit(0) to fire could truncate the other's in-flight flush mid-write — the exact log loss this util exists to prevent. Add a shutdownInProgress re-entrancy guard so only the first signal runs the flush; later signals are ignored while it is in progress. Reset the flag in resetShutdownHandlers(). Adds regression tests: a second signal mid-shutdown does not re-flush loggers, and two concurrent signals call process.exit only once. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three lifecycle bugs in WorkerTransport: 1. No close() — the cleanup method was named shutdown(), but TransportManager calls transport.close(). So on graceful exit the worker was never told to flush+exit: its buffered logs were lost and the thread leaked. Rename to close() (shutdown() kept as a delegating alias) and forward any locally buffered pre-ready entries before sending the shutdown message. 2. flush() could hang forever — it resolved only on the worker's 'flushed' ack, so if the worker died/restarted mid-flush the promise never settled and blocked shutdown until the force-exit timer killed the process (losing everything). Time-box it. 3. Restart loop leaked / resurrected workers — the backoff setTimeout was never unref()'d (kept the process alive) and wasn't cancelled on close(), so a restart could spawn a fresh worker after shutdown. Track the timer, clear it in close(), unref() it, and add a flag so a late exit event no longer triggers a restart. Adds lifecycle tests with an injected fake worker (close posts shutdown + resolves on exit, flush times out instead of hanging, buffered entries are forwarded, shutdown() aliases close()). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The sampledTraces / droppedTraces sets are only cleared by resetStats(), which only runs when a stats timer is active (an onStats callback was provided). With traceConsistent sampling but no stats callback — a common config — they grew one entry per unique traceId for the life of the process: an unbounded memory leak in long-running services. Cap the sets at maxTrackedTraces (100k) and clear on overflow via a _rememberTrace helper. A re-seen trace after a clear just gets a fresh sampling decision, which is acceptable for sampling consistency. Adds tests asserting the set size never exceeds the cap under high trace cardinality and that sampling still decides correctly after an overflow clear. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
createTraceMiddleware unconditionally called res.setHeader(), which throws when the response object lacks it (Fastify replies use reply.header()) or when headers are already sent or the stream is closed — crashing the whole request for a middleware advertised for Express/NestJS (NestJS can run on Fastify). Guard the header write: prefer setHeader(), fall back to header(), skip when headersSent, and swallow any throw. The trace ID is still propagated via async context regardless, so observability is unaffected. Adds tests for Fastify-style reply.header(), a response with no header method, headersSent=true, and a setHeader that throws. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cle guard Two correctness bugs in the defensive serialization helpers: 1. safeToString() returned JSON.stringify(value) directly, but JSON.stringify returns undefined for some inputs (e.g. an object whose toJSON() returns undefined). The whole point of safeToString is that safeReplace can call .replace() on the result without crashing — returning undefined defeated it and threw 'Cannot read properties of undefined'. Fall back to the constructor-name tag when stringify yields undefined. 2. serializeError's serializeValue created a fresh WeakSet for Errors nested in plain props, so the circular-reference guard didn't span the whole tree — cross-referencing errors relied only on the depth limit. Thread the shared seen set through serializeValue and guard arrays/objects too. Adds a coerce.utils test suite (incl. the undefined-stringify and circular cases) and error cross-reference / self-reference cycle tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Kafka and WebSocket trace interceptors wrap next.handle() in a new Observable and subscribe to it inside ctx.run(), but never returned a teardown. When the outer subscriber unsubscribed (consumer shutdown, socket disconnect, upstream takeUntil/timeout), the inner handler subscription kept running — a subscription/memory leak that also let handler side effects continue after cancellation. Capture the inner subscription and return () => inner?.unsubscribe() from the Observable so cancellation propagates. Adds a Kafka interceptor test asserting the inner subscription is torn down when the outer is unsubscribed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ssages LogixiaLoggerService.formatMessage did JSON.stringify(message) for object messages, which throws on circular structures (crashing the whole log call — NestJS routinely logs request/response objects that can contain cycles) and returns undefined for some inputs. Delegate to safeToString (already imported), which is circular-safe and never returns a non-string. Adds a regression test logging a circular object as the message. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… files
Two file-rotation bugs:
1. compressFile() was an empty stub — rotation with compress: true silently did
nothing despite the documented 'gzip compression of rotated files'. Implement
it with a streamed zlib gzip to <file>.gz (no full-file buffering), removing
the original only after the compressed copy is written, and cleaning up a
partial .gz on failure.
2. cleanupOldFiles() matched files with startsWith(base), so with filename
app.log it also matched (and could delete) unrelated files like
application.log. Tighten the match to this transport's own rotated pattern:
(optionally .gz).
Adds rotation tests: the rotated file is gzip-compressed with no uncompressed
original left behind, and cleanup leaves a similarly-named unrelated file intact.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hooks The plugin lifecycle hooks promised that thrown errors are swallowed so a buggy plugin can't crash logging, but only the ASYNC branch was guarded (.catch on the returned promise). A synchronous throw propagated to the caller: - onInit sync throw broke register()/use() - onError sync throw turned a recoverable transport failure into a crash (onError fires precisely during failure handling) - onShutdown sync throw rejected the Promise.all and broke graceful shutdown, risking log loss Wrap each synchronous hook call in try/catch. runOnLog already did this; this brings the other three to parity. Updates the stale 'sync throws ARE propagated' onInit test to assert the new (safe) behavior and adds sync-throw isolation tests for onError and onShutdown, including that a later well-behaved hook still runs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tters Both formatters called raw JSON.stringify on the payload, which throws 'Converting circular structure to JSON' on any cyclic object — and a logging library is routinely handed cyclic objects (Express req/res, DB connections, Mongoose docs). The throw escaped format() and crashed the whole log/transport path. - JsonFormatter: stringify with a circular-safe replacer that emits '[Circular]' for repeated object references (fresh per call — it holds per-serialization state). - TextFormatter: replace JSON.stringify(value) and the catch-fallback JSON.stringify(payload) (which threw AGAIN on the same cycle) with the circular-safe safeToString, keeping the CWE-117 control-char strip on object values too. Adds a formatters test suite covering circular payloads for both, error serialization, and the control-char strip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
OTel bridge: getActiveOtelContext runs on every log call when the bridge is enabled. A broken or version-mismatched @opentelemetry/api whose context/trace methods throw would crash logging itself. Wrap the whole interaction in try/catch so it degrades to no-injection like the rest of the (intentionally silent) integration. Adds the first OTel tests (graceful degradation, never throws, init/disable state). Metrics: metric names (from user map keys) and label names were emitted unsanitized. An invalid name like 'my-bad.metric' or label 'status code' produces output Prometheus can't parse, which makes the scraper reject the ENTIRE /metrics endpoint — not just that one metric. Add sanitizePromName ([a-zA-Z_][a-zA-Z0-9_]*, leading digit prefixed with _) and apply it to metric and label names. Adds the first metrics tests (sanitization + counter/histogram/ gauge extraction + reset + pass-through). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+close
The Express HTTP logger registered onFinish on BOTH res.once('finish') and
res.once('close'). A normal response fires both events, so the 'request
completed' entry (and the slow-request warning) could be logged twice — the
'close' guard only checked statusCode === 0, which doesn't hold for a normal
finish+close pair on every Node version / response shape.
Add a flag so the completion is logged exactly once regardless of
which events fire, while 'close' still covers a client that disconnects before
'finish'.
Adds http-logger middleware tests: single completion on finish+close, request
start level, 5xx error level, header redaction, and skip predicate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
calculateIndexSize() JSON.stringify'd EVERY entry in the index on each getIndexStats() call. With the default maxIndexSize of 1,000,000 that means serializing a million objects per call, blocking the event loop — pathological if stats are polled for a dashboard. Maintain the serialized size incrementally: trackInsert/trackDelete keep a per-id byte estimate and a running total, so getIndexStats() is O(1). Clearing and age/size-based eviction all update the total, keeping it consistent. Adds the first BasicLogIndexer tests: size stays consistent across add/remove/clear, field search (case-insensitive), and max-size eviction. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two BasicSearchEngine bugs: 1. getSearchableText built its text with raw JSON.stringify(log.payload), which throws on a cyclic payload (Express req/res, DB handle). Since it runs for every candidate log, one such stored entry crashed the ENTIRE search. Use the circular-safe safeToString. 2. addLogs did this.logs.push(...logs) with no cap — the buffer grew without bound (memory leak in any long-running process feeding the engine), and spreading a very large array risks a RangeError. Append via a loop and cap at maxLogs (default 100k, configurable) with FIFO eviction of the oldest. Adds the first BasicSearchEngine tests: text search, circular-payload search without crashing, trace-id correlation ordering, and the bounded buffer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
formatAsTable used (r[c] || '').toString(), so a real value of 0, false, or '' was treated as empty and the cell rendered blank. In a log-analysis CLI that silently hides numeric columns like count, statusCode, or duration when they are zero. Use a null/undefined check via a cellText helper so falsy values still render. Adds CLI utils tests: falsy-value rendering, empty rows, missing fields, and safeParseLogs JSON-line / fallback / blank-line handling. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BrowserRemoteTransport.flush() spliced only one batchSize chunk, so a batch larger than batchSize (e.g. after a failed POST re-queued entries) left a tail unsent. And destroy() cleared the interval timer but never flushed, dropping buffered logs on page unload / teardown — the browser equivalent of losing the last N seconds of logs. Loop flush() until the batch is empty (splice is synchronous so no duplication) and flush() the remainder inside destroy(). Adds browser transport tests (whole-batch drain, destroy flush, failed-POST re-buffer, non-http url rejection) with a mocked fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two bugs in the NestJS TraceMiddleware / functional traceMiddleware: 1. res.setHeader() was called unconditionally. NestJS can run on Fastify (reply uses .header()), and on an already-sent/closed response setHeader throws — crashing the request. Add writeTraceHeader() which prefers setHeader, falls back to header(), skips when headersSent, and swallows throws. Trace context propagation is unaffected. 2. ip resolution used . req.connection is deprecated and can be undefined, so when req.ip was empty the access threw 'Cannot read properties of undefined (reading remoteAddress)'. Prefer req.socket and guard both with optional chaining. Adds traceMiddleware tests for Express setHeader, Fastify header() fallback, no-header-method response, and headersSent — which also cover the ip guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The LOGIXIA_SILENT_INTERNAL flag was captured into a module-level const at import time, so setting it AFTER the module was first imported (the common case for a test setup file) had no effect — the documented test-silencing was unreliable. Read process.env per call via isSilent() instead. Adds the first internal-log tests, including silencing applied after import and resuming when the flag is cleared. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d method @logmethod unconditionally replaced the method with an async function and awaited the original, so a SYNCHRONOUS method silently began returning a Promise — breaking any caller that consumed its value directly (const x = obj.syncMethod()). The decorator even documented 'works on sync methods' while changing their contract. Rewrite so the wrapper inspects the original return value: if it's a thenable, await it (async path — exit/error logs reflect the resolved outcome and the Promise type is preserved); otherwise treat it as synchronous, emit the exit log fire-and-forget, and return the value as-is. Entry logging is now fire-and-forget so a sync method is never forced to become async just to await it. Shared emit/emitError helpers keep transport failures routed to stderr. Adds the first @logmethod tests: sync returns a value (not a Promise), async returns a resolving Promise, entry/exit logging, and sync-throw / async-reject propagation with error logging. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
size-limit report 📦
|
The earlier commit 0e8f915 recorded the message and intent but landed an EMPTY diff — the staged changes were dropped (a pre-commit hook stripped them), so the applyRedactionToString implementation, its wiring into the logger hot path, and the tests lived only in the working tree. Tests still passed because jest runs the working tree, masking the gap. This commit lands the actual code so the redaction-in-message fix is in history, not just on disk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
🎉 This issue has been resolved in version 1.10.3. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
📋 Description
What does this PR do?
Why is this change needed?
Related Issues
🔄 Type of Change
🧪 Testing
Test Coverage
Test Details
Manual Testing
📖 Documentation
🔍 Code Quality
📦 Dependencies
🚀 Performance
🔒 Security
📱 Compatibility
🎃 Hacktoberfest
📸 Screenshots/Examples
Before
After
Code Example
🔗 Additional Context
Deployment Notes
Rollback Plan
Future Considerations
✅ Reviewer Checklist
📝 Notes for Reviewers
🙏 Acknowledgments
Thank you for contributing to Logixia! 🚀