Fix #90: render identity-threat framing in persona/reasoning context#97

Closed
RandomOscillations wants to merge 2 commits into codex/issue-89-think-vs-say-divergence from codex/issue-90-identity-threat-framing
@RandomOscillations

Summary

  • extend ReasoningContext with identity_threat_summary
  • add deterministic identity-threat detection in engine context building using scenario text plus agent identity attributes (political, religious, race/ethnicity, gender/sexual identity, parental role, citizenship)
  • inject a dedicated Identity Relevance prompt section so agents can explicitly reason when an issue feels identity-relevant
  • add tests covering context construction and prompt inclusion
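A minimal sketch of how the new field and prompt section could fit together. The real `ReasoningContext` has more fields than shown; `agent_id`, `scenario_text`, and `build_prompt` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReasoningContext:
    """Hypothetical, trimmed-down stand-in for the real model."""

    agent_id: str
    scenario_text: str
    # Field added by this PR; None means nothing identity-relevant was detected.
    identity_threat_summary: str | None = None


def build_prompt(ctx: ReasoningContext) -> str:
    """Assemble prompt sections, adding Identity Relevance only when a summary is set."""
    sections = [f"## Scenario\n{ctx.scenario_text}"]
    if ctx.identity_threat_summary:
        sections.append(f"## Identity Relevance\n{ctx.identity_threat_summary}")
    return "\n\n".join(sections)
```

Leaving the field as `None` by default keeps prompts for identity-neutral scenarios unchanged.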

Testing

  • pytest -q tests/test_engine.py::TestTokenAccumulation::test_build_reasoning_context_adds_identity_threat_summary tests/test_reasoning_prompts.py::TestPhaseAPromptFeatures::test_identity_relevance_included
  • ruff check extropy/core/models/simulation.py extropy/simulation/engine.py extropy/simulation/reasoning.py tests/test_engine.py tests/test_reasoning_prompts.py

Closes #90


@DeveshParagiri DeveshParagiri left a comment

Code Review

Verdict: ✅ Ready to merge

Summary

Implements deterministic identity-threat framing per architecture doc Tenet 10 and §Fix 3. Scenarios that threaten group identity now get an "## Identity Relevance" section in prompts.

Identity Dimensions Detected

  • Political orientation (liberal, conservative, republican, democrat)
  • Religious affiliation (church, mosque, temple, faith)
  • Race/ethnicity (racial, ethnic, minority, immigration)
  • Gender/sexual identity (LGBT, transgender, pronouns)
  • Parent/family role (children, school, curriculum, parental rights)
  • Citizenship (immigration, border, deportation)

Edge Cases Handled

| Case | Behavior |
| --- | --- |
| No identity relevance in scenario | Returns None, no section |
| Agent with no identity attributes | Returns None, no section |
| Sentinel values ("unknown", "none") | Skipped |
| Future timeline events | Excluded from corpus |
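
The sentinel and timeline cases could be handled roughly as follows. This is a sketch under assumptions, not the PR's code: `SENTINELS`, `identity_value`, `build_corpus`, and the event dict shape (`timestep`, `text`) are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Assumed placeholder values that should not count as an identity attribute.
SENTINELS = {"unknown", "none"}


def identity_value(agent: dict[str, Any], keys: tuple[str, ...]) -> str | None:
    """Return the first usable attribute value, skipping missing and sentinel values."""
    for key in keys:
        raw = agent.get(key)
        if raw is None:
            continue
        text = str(raw).strip()
        if text and text.lower() not in SENTINELS:
            return text
    return None


def build_corpus(scenario_text: str, events: list[dict[str, Any]], timestep: int) -> str:
    """Join scenario text with events at or before `timestep`, lowercased for matching."""
    visible = [e["text"] for e in events if e["timestep"] <= timestep]
    return " ".join([scenario_text, *visible]).lower()
```

Filtering events by timestep before matching keeps agents from reacting to developments they have not yet seen.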

Design Note

Keyword-based detection is simple but appropriate for deterministic, zero-API-cost detection. May need refinement if false positives become problematic in practice.

No changes required.


@DeveshParagiri DeveshParagiri left a comment

Code Review

Verdict: ❌ Needs changes - hardcoded keywords are not general-purpose

Problem

The identity-threat detection uses hardcoded keyword lists:

if political_value and scenario_mentions(
    (
        "liberal", "conservative", "left", "right", "republican", "democrat",
        "politic", "ideolog", "culture war", "censorship", "book ban", "school board", " ban ",
    )
):

Issues:

  1. Not configurable - can't add/remove keywords without code changes
  2. Scenario-specific leakage - book ban, school board are clearly from the test scenario
  3. False positives - with substring matching, men/man will match management, manual, humanity, etc.
  4. Not extensible - new identity dimensions require code changes
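
To illustrate issue 3, here is a small sketch comparing plain substring matching with word-boundary matching. Both helper names are hypothetical, and the substring version is an assumption about how `scenario_mentions` behaves rather than its actual code.

```python
import re


def substring_match(text: str, keywords: tuple[str, ...]) -> list[str]:
    """Return keywords that occur anywhere in the text, even inside longer words."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def word_match(text: str, keywords: tuple[str, ...]) -> list[str]:
    """Return keywords that occur as whole words only."""
    lowered = text.lower()
    return [kw for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", lowered)]


text = "Management issued a manual on humanity-centered design."
keywords = ("men", "man")
print(substring_match(text, keywords))  # ['men', 'man']
print(word_match(text, keywords))       # []
```

Word boundaries remove this class of false positive but not ambiguous whole words such as "left" or "right", so they do not address the other three issues.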

Suggested Fix

Add an identity_dimensions field to the scenario spec that lets authors declare which identity aspects are threatened:

# scenario.v1.yaml
meta:
  name: "Library Book Removal"
  
identity_dimensions:
  - dimension: political_orientation
    reason: "The policy is framed along partisan lines"
  - dimension: parental_status  
    reason: "Parents are the primary stakeholders in school content decisions"
  - dimension: religious_affiliation
    reason: "Some removals are driven by religious concerns about content"

Then in _render_identity_threat_context():

def _render_identity_threat_context(self, agent: dict[str, Any], timestep: int) -> str | None:
    # No declared dimensions: the scenario author flagged nothing as identity-relevant.
    if not self.scenario.identity_dimensions:
        return None

    relevant = []
    for dim in self.scenario.identity_dimensions:
        # Look up the agent's value for this dimension via its mapped attribute keys.
        agent_value = self._identity_value(agent, _IDENTITY_ATTR_KEYS.get(dim.dimension, ()))
        if agent_value:
            relevant.append(f"{dim.dimension} ({agent_value}): {dim.reason}")

    # The agent holds none of the declared dimensions, so no section is rendered.
    if not relevant:
        return None

    return (
        "This development can feel identity-relevant, not just practical. "
        f"Parts of who I am that may feel implicated: {'; '.join(relevant)}. "
        "If it feels personal, acknowledge that in both your internal reaction and what you choose to say publicly."
    )

This approach:

  • Puts scenario authors in control
  • No keyword false positives (dimensions are declared explicitly)
  • Extensible without code changes
  • Documents why each dimension is relevant (useful for prompt quality)

The _IDENTITY_ATTR_KEYS would be a simple mapping from dimension name to agent attribute keys - that's the only hardcoded part, and it's stable.
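
A rough sketch of what that mapping and the lookup around it might look like. The attribute key names, the `IdentityDimension` dataclass, and `relevant_dimensions` are assumptions for illustration, since the actual agent schema is not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Hypothetical mapping: dimension name -> agent attribute keys to try, in order.
_IDENTITY_ATTR_KEYS: dict[str, tuple[str, ...]] = {
    "political_orientation": ("political_orientation", "party_affiliation"),
    "religious_affiliation": ("religious_affiliation", "religion"),
    "parental_status": ("parental_status", "has_children"),
}


@dataclass(frozen=True)
class IdentityDimension:
    """One entry from the scenario's identity_dimensions list."""

    dimension: str
    reason: str


def relevant_dimensions(agent: dict[str, Any], declared: list[IdentityDimension]) -> list[str]:
    """Describe each declared dimension for which the agent has a value."""
    relevant = []
    for dim in declared:
        # Unknown dimension names map to no keys and are silently skipped.
        for key in _IDENTITY_ATTR_KEYS.get(dim.dimension, ()):
            value = agent.get(key)
            if value:
                relevant.append(f"{dim.dimension} ({value}): {dim.reason}")
                break
    return relevant
```

A stricter variant could raise on dimension names missing from the mapping, so that typos in a scenario file surface at load time instead of being dropped.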

@DeveshParagiri DeveshParagiri force-pushed the codex/issue-89-think-vs-say-divergence branch from 26d1df3 to 1e680b2 Compare February 17, 2026 19:01
@DeveshParagiri DeveshParagiri deleted the branch codex/issue-89-think-vs-say-divergence February 17, 2026 19:01
@DeveshParagiri DeveshParagiri deleted the codex/issue-90-identity-threat-framing branch February 23, 2026 01:54