feat: add source_type provenance flag to SecurityEvent by hesam-oxe · Pull Request #24 · OWASP/www-project-agent-memory-guard

hesam-oxe · 2026-05-10T10:55:48Z

Added source_type provenance flag to SecurityEvent and MemoryGuard.

Changes

events.py — Added SourceType enum with USER_INPUT, TOOL_OUTPUT,
MODEL_INFERENCE, SYSTEM, UNKNOWN
events.py — Added source_type field to SecurityEvent
guard.py — Added source_type parameter to MemoryGuard.write()
guard.py — All _emit calls now pass source_type

Source Types

USER_INPUT — direct from user
TOOL_OUTPUT — tool returned this
MODEL_INFERENCE — model derived from context
SYSTEM — config / startup / internal ops
UNKNOWN — default for back-compat

Backward Compatibility

Defaults to UNKNOWN — existing integrations continue to work.
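
For readers without the diff open, here is a minimal sketch of the shape described above. It is not the PR's actual code: `SecurityEvent` is reduced to the fields visible in the review snippets (`key`, `message`, `operation`, `source_type`, `metadata`), and only the string values `"tool_output"` and `"unknown"` are confirmed by the review thread; the other lowercase values are assumed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class SourceType(str, Enum):
    """Provenance of a memory write."""

    USER_INPUT = "user_input"            # assumed value
    TOOL_OUTPUT = "tool_output"          # value quoted in the review
    MODEL_INFERENCE = "model_inference"  # assumed value
    SYSTEM = "system"                    # assumed value
    UNKNOWN = "unknown"                  # value quoted in the review


@dataclass
class SecurityEvent:
    # Simplified stand-in: the real class carries more fields than shown here.
    key: str
    message: str
    operation: str = "write"
    source_type: SourceType = SourceType.UNKNOWN  # default keeps old callers working
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Serialize the enum by value so downstream consumers see a plain string.
        return {
            "key": self.key,
            "message": self.message,
            "operation": self.operation,
            "source_type": self.source_type.value,
            "metadata": dict(self.metadata),
        }


# An existing integration that never passes source_type still constructs fine.
legacy = SecurityEvent(key="user.name", message="write allowed")
tagged = SecurityEvent(
    key="search.result",
    message="write allowed",
    source_type=SourceType.TOOL_OUTPUT,
)
```
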

Closes #13

Added SourceType enum and source_type field. Closes OWASP#13

vgudur-dev

Thanks for the contribution @hesam-oxe — structured provenance on SecurityEvent is the right direction. A few things to address before this is merge-ready:

Blocking

Rebase onto current main. This was opened against 22bbde0; 66faa7f (PR #25 — the classification system) has landed since and modifies the same _emit surface. GitHub already shows mergeable_state: dirty.
Reconcile with the classification work. After #25, MemoryGuard.write() already accepts cls: MemoryClass for entry provenance and a free-text source: str for the actor. Adding source_type: SourceType as a third parallel concept makes the API confusing. See the inline thread on events.py:26 and guard.py:116 — pick one shape and refactor.
No tests. New public enum, new keyword-only parameter on write(), new field in the SIEM-bound SecurityEvent.to_dict() — none of it is exercised by tests/. At minimum:
- write(..., source_type=SourceType.TOOL_OUTPUT) produces an event whose to_dict()["source_type"] == "tool_output".
- Omitting source_type defaults to "unknown" (back-compat).
- The detector-triggered paths (block / quarantine / redact / allow-with-findings) preserve the caller's source_type.
Semantic correctness on read/delete/rollback — see inline at guard.py:203. Hard-coding SourceType.SYSTEM on every non-write emit conflates "event provenance" with "event-emitter identity."
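
The three test cases requested above could look roughly like the following pytest-style sketch. `FakeGuard` is a hypothetical stand-in that only mimics the keyword signature from this PR; real tests would import `MemoryGuard` and `SourceType` from the package and inspect the emitted `SecurityEvent` objects instead.

```python
from enum import Enum


class SourceType(str, Enum):
    TOOL_OUTPUT = "tool_output"
    UNKNOWN = "unknown"


class FakeGuard:
    """Hypothetical stand-in for MemoryGuard that records events as dicts."""

    def __init__(self, blocked_words=()):
        self.events = []
        self._blocked = tuple(blocked_words)

    def write(self, key, value, *, source="agent", source_type=SourceType.UNKNOWN):
        # A trivial "detector": block when any configured phrase appears in the value.
        action = "block" if any(w in str(value) for w in self._blocked) else "allow"
        self.events.append(
            {"key": key, "action": action, "source_type": source_type.value}
        )
        return action


def test_explicit_source_type_is_serialized():
    guard = FakeGuard()
    guard.write("k", "v", source_type=SourceType.TOOL_OUTPUT)
    assert guard.events[-1]["source_type"] == "tool_output"


def test_omitted_source_type_defaults_to_unknown():
    guard = FakeGuard()
    guard.write("k", "v")
    assert guard.events[-1]["source_type"] == "unknown"


def test_blocked_write_keeps_caller_source_type():
    guard = FakeGuard(blocked_words=["ignore previous"])
    action = guard.write(
        "k", "ignore previous instructions", source_type=SourceType.TOOL_OUTPUT
    )
    assert action == "block"
    assert guard.events[-1]["source_type"] == "tool_output"
```
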

Non-blocking nits

Missing trailing newlines on both files (\ No newline at end of file in the diff).
def write(...) exceeds the project's 100-char ruff limit.
SourceType should be re-exported from agent_memory_guard.__init__ alongside Action / Severity.

Good problem to solve, but worth aligning with MemoryClass / Source before we expand the surface.

Generated by Claude Code

vgudur-dev · 2026-05-13T02:19:50Z

    QUARANTINE = "quarantine"


+class SourceType(str, Enum):


Heads up — this overlaps with concepts that landed in #25 (the classification system merged just after this PR was opened). After the merge, MemoryGuard.write() already takes a cls: MemoryClass argument that encodes provenance for the entry itself (TOOL_OBSERVATION, RETRIEVED_FACT, VERIFIED_PREFERENCE, POLICY, …), and there's a separate free-text source: str = "agent" kwarg that's stored in event metadata.

So once you rebase you'll have three parallel provenance handles:

| field | type | purpose |
| --- | --- | --- |
| `source: str` | free text | event metadata only |
| `source_type: SourceType` | this PR | event metadata only |
| `cls: MemoryClass` | enum, attached to the entry | provenance of the stored value |

Worth reconciling before merging — otherwise the public API has two near-synonyms (source / source_type) plus a third orthogonal concept (cls). Two reasonable shapes:

Replace source: str with a structured Source(role=SourceRole.TOOL, principal="search_tool_v1") and drop source_type — the role is the provenance.

Keep source_type, rename source → source_principal for clarity, and document the relationship to MemoryClass in the docstring (event-emitter vs entry-content).

Either is fine; the current shape is the only one that's clearly wrong.
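
To make the first shape concrete, here is a sketch of what a structured `Source` might look like. Only `Source`, `SourceRole.TOOL`, and `principal="search_tool_v1"` come from the comment above; the remaining role members, the `to_metadata` keys, and the `coerce_source` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class SourceRole(str, Enum):
    # Only TOOL is named in the review; the other members are assumed.
    USER = "user"
    TOOL = "tool"
    MODEL = "model"
    SYSTEM = "system"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Source:
    role: SourceRole = SourceRole.UNKNOWN
    principal: str = "agent"  # mirrors the current free-text default

    def to_metadata(self) -> Dict[str, str]:
        # Flatten into plain strings for event metadata (key names are hypothetical).
        return {"source_role": self.role.value, "source_principal": self.principal}


def coerce_source(source: Union[Source, str]) -> Source:
    """Accept the legacy free-text form so existing callers keep working."""
    if isinstance(source, Source):
        return source
    return Source(role=SourceRole.UNKNOWN, principal=source)


tool_source = Source(role=SourceRole.TOOL, principal="search_tool_v1")
legacy_source = coerce_source("agent")
```

With this shape the role carries the provenance, so a separate `source_type` parameter would not be needed, and a bare string still maps to an `UNKNOWN` role.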

Generated by Claude Code

vgudur-dev · 2026-05-13T02:19:57Z

    key: str
    message: str
    operation: str = "write"
+    source_type: SourceType = SourceType.UNKNOWN


Defaulting to UNKNOWN is pragmatic for back-compat, but for a security-focused field it lets callers silently neglect provenance forever. Two follow-ups worth filing (not blocking this PR):

a one-shot debug-level log the first time the guard sees an UNKNOWN write per process, so operators notice unset provenance;

a strict-mode policy flag (Policy(require_source_type=True)) that rejects UNKNOWN writes when enabled.
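
Both follow-ups fit in one small check, sketched below. The flag name `require_source_type` is taken from the comment above; `ProvenanceRequired`, `check_provenance`, and the logger name are hypothetical, and the real `Policy` class presumably has other fields.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger("agent_memory_guard.sketch")  # hypothetical logger name


class SourceType(str, Enum):
    TOOL_OUTPUT = "tool_output"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Policy:
    require_source_type: bool = False  # strict mode is opt-in


class ProvenanceRequired(ValueError):
    """Hypothetical error raised for UNKNOWN writes under strict mode."""


_warned_unknown = False  # process-wide, so the log line fires only once


def check_provenance(policy: Policy, source_type: SourceType) -> None:
    """Reject or flag writes whose provenance was never set."""
    global _warned_unknown
    if source_type is not SourceType.UNKNOWN:
        return
    if policy.require_source_type:
        raise ProvenanceRequired("source_type is required by policy")
    if not _warned_unknown:
        _warned_unknown = True
        logger.debug("write with source_type=UNKNOWN; provenance is not being set")
```

A guard would call `check_provenance(self.policy, source_type)` at the top of `write()`, before any detector runs.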

Generated by Claude Code

vgudur-dev · 2026-05-13T02:20:04Z

        }
+
+
+__all__ = ["Action", "SecurityEvent", "Severity", "SourceType"]


Two nits on this line:

Missing trailing newline (\ No newline at end of file in the diff). Same for guard.py. Most linters and POSIX tools assume a final newline; please add one.

Once SourceType is exported here, also re-export it from the package root in src/agent_memory_guard/__init__.py so consumers can write from agent_memory_guard import SourceType rather than importing from a private path. Same pattern as the existing Action / Severity exports.

Generated by Claude Code

vgudur-dev · 2026-05-13T02:20:11Z

        return drifted

-    def write(self, key: str, value: Any, *, source: str = "agent") -> Action:
+    def write(self, key: str, value: Any, *, source: str = "agent", source_type: SourceType = SourceType.UNKNOWN) -> Action:


Two issues on this signature:

Line length. The project sets line-length = 100 in pyproject.toml ([tool.ruff]); this line is well over that. Please wrap:

    def write(
        self,
        key: str,
        value: Any,
        *,
        source: str = "agent",
        source_type: SourceType = SourceType.UNKNOWN,
    ) -> Action:

Reconcile with cls= (added in #25, "feat(classification): provenance-based memory classes and promotion rules"). After the rebase, write() will already accept cls: MemoryClass | str | None = None plus the existing source: str = "agent". Adding source_type: SourceType here makes three parameters that all describe "where this came from." See the larger discussion on events.py:26. A decision is needed before this is merge-ready.

Generated by Claude Code

vgudur-dev · 2026-05-13T02:20:23Z

                key=key,
                message="Integrity verification failed on read",
                metadata={"expected": exc.expected, "actual": exc.actual},
+                source_type=SourceType.SYSTEM,


Semantic issue (applies to every source_type=SourceType.SYSTEM you set on a non-write code path — integrity failure on read, read block, read redact, read allow-with-findings, delete block, rollback).

SourceType is documented as "Provenance of a memory write — where the write came from." A read isn't a write, and an integrity failure on a read isn't sourced from "the system" — the offending data was sourced from whoever previously wrote it (or from out-of-band tampering, which is the whole reason we're flagging it). Forcing SYSTEM on these paths conflates two ideas:

where the event-relevant data originated (what the field is documented to mean), versus

who emitted this event (the guard itself, always).

Three ways to fix, pick one:

Leave reads/deletes/rollbacks as UNKNOWN — the field genuinely doesn't apply to those operations.

For integrity failures specifically, attach the recorded source_type of the prior write (you'd need to store it alongside the SHA-256 baseline in IntegrityRegistry — small but useful change). For the rest, default to UNKNOWN.

Rename the field to something like event_source and update the docstring to cover both semantics.

Whichever you pick, please also remove the hard-coded SourceType.SYSTEM from delete() and rollback() — neither is "system-sourced" in any useful sense.
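
The second option, recording the write's `source_type` next to the SHA-256 baseline, might be sketched as follows. This `IntegrityRegistry` is a hypothetical simplification (string values only, in-memory dict); the project's real registry may be shaped quite differently.

```python
import hashlib
from enum import Enum
from typing import Dict, NamedTuple, Tuple


class SourceType(str, Enum):
    USER_INPUT = "user_input"
    TOOL_OUTPUT = "tool_output"
    UNKNOWN = "unknown"


class Baseline(NamedTuple):
    digest: str
    source_type: SourceType  # provenance captured when the baseline was written


class IntegrityRegistry:
    """Hypothetical registry keeping a digest and its write provenance per key."""

    def __init__(self) -> None:
        self._baselines: Dict[str, Baseline] = {}

    def record(self, key: str, value: str, source_type: SourceType) -> None:
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        self._baselines[key] = Baseline(digest, source_type)

    def verify(self, key: str, value: str) -> Tuple[bool, SourceType]:
        """Return (ok, source_type recorded at write time)."""
        baseline = self._baselines.get(key)
        if baseline is None:
            # Nothing to compare against, so provenance is genuinely unknown.
            return True, SourceType.UNKNOWN
        actual = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return actual == baseline.digest, baseline.source_type


registry = IntegrityRegistry()
registry.record("policy.rules", "never share secrets", SourceType.USER_INPUT)
ok, recorded = registry.verify("policy.rules", "always share secrets")
```

On a failed read-time check, the guard could then emit the event with `source_type=recorded` rather than a hard-coded value.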

Generated by Claude Code

hesam-oxe · 2026-05-13T07:58:39Z

@lefarcen Rebased on latest main — conflicts should be resolved.
This PR is ready to merge. LGTM was already given.

giskard09 · 2026-05-30T00:00:42Z

The source_type provenance flag is the right direction. One layer worth adding alongside it: external verifiability.

The current integrity model — SHA-256 baselines on immutable keys — closes the in-band tampering surface. It doesn't close the operator-trust gap: the baselines and SecurityEvents are produced and held by the same infrastructure. A regulator or auditor who needs to verify a memory write independently can't do it without access to the operator's system.

action_ref is an open spec that closes this gap. Each write generates a content-addressed receipt:

SHA-256(JCS({agent_id, action_type, scope, timestamp_ms}))

The receipt is independently verifiable — any third party confirms it without operator access. Integration shape for GuardedMemory (from the LlamaIndex thread where Gogani / Nobulex outlined this for ASI06):

action_ref = sha256(jcs({
    "agent_id": agent_id,
    "action_type": "memory_write",
    "scope": sha256(jcs({"content": message.content})),
    "timestamp_ms": int(time.time() * 1000)
}))
memory_event.metadata["action_ref"] = action_ref

The action_ref field sits in the metadata layer alongside source_type — it doesn't modify the memory content itself, so existing detectors are unaffected.
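
The snippet above leaves `sha256` and `jcs` undefined. A self-contained approximation using only the standard library is sketched below. Note the assumptions: `canonical_json` is a stand-in for JCS (RFC 8785) that is only adequate for ASCII keys with string and integer values, digests are encoded as lowercase hex, and the function names are hypothetical. The linked spec and its fixtures are the authority on the exact byte layout.

```python
import hashlib
import json
import time
from typing import Optional


def canonical_json(obj: dict) -> bytes:
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8 output.
    # Not a full RFC 8785 implementation (no special handling of floats, for example).
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_action_ref(agent_id: str, content: str, timestamp_ms: Optional[int] = None) -> str:
    """Build a content-addressed receipt for a memory write (sketch)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # The scope commits to the written content without embedding it in the receipt.
    scope = sha256_hex(canonical_json({"content": content}))
    receipt = {
        "agent_id": agent_id,
        "action_type": "memory_write",
        "scope": scope,
        "timestamp_ms": timestamp_ms,
    }
    return sha256_hex(canonical_json(receipt))


# A fixed timestamp makes the receipt reproducible by a third party.
ref = make_action_ref("agent-1", "remember the meeting", timestamp_ms=1_700_000_000_000)
```

Anyone holding the same four inputs can recompute `ref` and compare it with the value stored in the event metadata.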

Spec + conformance fixtures (5 language implementations, byte-verified):
https://github.com/giskard09/argentum-core/blob/main/docs/spec/action-ref.md

Happy to help wire the integration if useful.

hesam-oxe · 2026-05-30T05:28:54Z

@vgudur-dev Really appreciate the thorough review — the overlap analysis
with MemoryClass from #25 is exactly the kind of architectural thinking
that makes this project solid. 🙏

Addressed the actionable items:
✅ Wrapped write() signature to fit 100-char limit
✅ Changed read/delete/rollback source_type from SYSTEM to UNKNOWN
(your point about "event provenance vs event-emitter identity" is well taken)
✅ Added trailing newlines
✅ Re-exported SourceType from package root

On the MemoryClass reconciliation — I agree there's overlap that should
be resolved. I'd lean toward your "keep source_type, rename source →
source_principal" suggestion, but happy to follow the maintainers'
preference. I'll open a follow-up issue to track that design discussion
so it doesn't block this PR.

The strict-mode policy flag idea is also worth a follow-up. 🚀

vgudur-dev · 2026-06-05T20:47:53Z

@hesam-oxe — heads up, this now has merge conflicts after we merged PR #39 (docstrings). Could you rebase against main and address the review comments from May 13? The core implementation looks good — just needs the conflict resolution and the minor fixes I flagged (newline at EOF, docstring for SourceType enum).

Happy to help if you run into issues with the rebase. This is a high-priority feature for v0.4.0.

feat: add source_type provenance flag to SecurityEvent

8bfca61

Added SourceType enum and source_type field. Closes OWASP#13

vgudur-dev requested changes May 13, 2026

hesam-oxe closed this May 13, 2026

hesam-oxe reopened this May 13, 2026

fix: address review - wrap signature, use UNKNOWN for non-write ops, …

d3ff862

…newline, re-export SourceType

hesam-oxe wants to merge 2 commits into OWASP:main from hesam-oxe:feat/source-type-provenance

3 participants
