Development #21

Merged: Doble-2 merged 10 commits into main from development on Jun 7, 2026
Doble-2 (Owner) commented Jun 7, 2026

PR: development → main (v2.0.0)

🤖 Agentic AI Mode (osint-d2 agent)

The flagship feature of v2.0.0: autonomous OSINT investigations powered by LLM function calling.

  • 5 agent tools: scan_username, scan_email, fetch_url, breach_check, generate_report
  • The AI decides what to investigate, discovers new leads, and pivots automatically
  • Generates a structured 6-dimension cognitive profile when evidence is sufficient
  • Works with DeepSeek, Groq, OpenRouter, HuggingFace, and any OpenAI-compatible provider
  • Available in the CLI and the interactive wizard
osint-d2 agent "torvalds" --breach-check --export-pdf --export-json -l es
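
The loop behind this mode can be pictured roughly as follows. This is a minimal sketch, not the project's `agent_engine`: `run_agent`, the `decide` callback (standing in for the LLM's function-calling response), the `evidence` keyword, and the shape of the history entries are all assumptions made for illustration. It also shows the idea behind forcing a report when `max_steps` is exhausted.

```python
from typing import Callable

# Hypothetical types: a tool takes keyword arguments and returns a dict;
# decide() stands in for the LLM and returns (tool_name, kwargs).
Tool = Callable[..., dict]
Decide = Callable[[str, list], tuple]


def run_agent(target: str, decide: Decide, tools: dict, max_steps: int = 10) -> dict:
    """Call tools chosen by decide() until a report is produced or steps run out."""
    history: list = []
    for step in range(1, max_steps + 1):
        name, kwargs = decide(target, history)
        result = tools[name](**kwargs)
        history.append((step, name, kwargs, result))
        if name == "generate_report":
            return result
    # Step budget exhausted without a report: force one from what was gathered.
    return tools["generate_report"](evidence=[entry[3] for entry in history])
```

A real implementation would pass the tool results back to the model as function-call outputs; here the history list plays that role.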

🛡️ Trust Anchors

Define verified identity sources to automatically filter false positives.

osint-d2 agent "janedoe" --trust instagram:janedoe --trust email:jane@gmail.com
  • Compares discovered profiles against trusted sources
  • Automatically discards mismatched identities
  • Supports network:username and email:user@domain formats
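
As an illustration of the two anchor formats, the sketch below parses `--trust` values and checks a discovered profile against them. The names `TrustAnchor`, `parse_trust_anchor`, and `matches_anchor` are hypothetical, and the exact matching rules of the real trust-anchor module are not shown in this PR, so case-insensitive exact matching is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrustAnchor:
    kind: str   # "email" or a network name such as "instagram"
    value: str  # username or e-mail address, lower-cased


def parse_trust_anchor(raw: str) -> TrustAnchor:
    """Parse 'network:username' or 'email:user@domain' into a TrustAnchor."""
    kind, sep, value = raw.partition(":")
    kind, value = kind.strip().lower(), value.strip().lower()
    if not sep or not kind or not value:
        raise ValueError(f"expected 'network:username' or 'email:user@domain', got {raw!r}")
    if kind == "email" and "@" not in value:
        raise ValueError(f"invalid e-mail trust anchor: {raw!r}")
    return TrustAnchor(kind, value)


def matches_anchor(network: str, username: str, anchors: list) -> Optional[bool]:
    """True/False when an anchor exists for this network, None when none applies."""
    relevant = [a for a in anchors if a.kind == network.lower()]
    if not relevant:
        return None
    return any(a.value == username.lower() for a in relevant)
```

Returning `None` for networks without an anchor keeps "no opinion" distinct from "mismatch", so only profiles that contradict a trusted source would be discarded.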

🌐 New Scanners

  • Facebook — profile/page detection, name, avatar, bio, likes extraction
  • Instagram — public profile scraping (bio, followers, avatar via og:tags)

Both scanners use OG meta tag extraction with HTML entity unescaping for clean URLs.

🔒 ScrapingAnt Proxy Integration

  • Residential and datacenter proxy modes
  • Country-specific routing (e.g., --proxy-country us)
  • Auto-detected from .env configuration

📄 Premium PDF Dossier

  • Redesigned report template with dark theme and professional layout
  • Profile avatar on cover page (Instagram → Facebook → Telegram priority)
  • Fixed table overflow and page-break issues
  • Localized AI section titles across 5 languages (EN, ES, PT, AR, RU)
  • Suppressed noisy WeasyPrint/fontTools warnings in CLI output

🧙 Interactive Wizard Improvements

  • Agent mode integrated into the wizard flow
  • Trust anchor configuration via interactive prompts
  • Proxy settings (mode, country) in wizard
  • Streamlined CLI output: analysis-first, compact confirmed-only table

🧪 Testing & Quality

  • 143 tests passing (75 new tests for agent_engine, identity_pipeline, trust_anchor)
  • CI workflow with pytest
  • All ruff lint errors resolved (121 fixed)

🔧 UX & Polish

  • Agent CLI output no longer dumps the full AI summary in step logs
  • Confirmed profiles table is deduplicated and shows only YES results
  • AI analysis panel displayed before the table (it's the main deliverable)
  • Duplicate highlights detection (skips if already in summary)
  • Clean terminal output — no more fontTools/WeasyPrint noise

Commits (33)

Features

  • feat: add agentic AI mode (osint-d2 agent)
  • feat: add agent mode to wizard + README documentation
  • feat: add fetch_url agent tool + enrich HTML extraction
  • feat: add trust anchors for identity verification (--trust)
  • feat: add Instagram scanner (public profile scraping)
  • feat: add Facebook profile/page scanner
  • feat: add proxy, trust anchors to wizard + hunt command
  • feat: integrate ScrapingAnt proxy support (residential + datacenter)
  • feat: premium PDF dossier redesign
  • feat: show profile avatar on PDF cover page (Instagram priority)
  • feat: add pytest suite + CI workflow

Fixes

  • fix: unescape HTML entities in OG image URLs (avatar on PDF cover)
  • fix: stop dumping full AI summary twice in agent mode CLI
  • fix: AI sections used wrong titles — all showed 'Intelligence Summary'
  • fix: PDF layout overflow — table-layout:fixed, word-break, page-break-inside
  • fix: suppress WeasyPrint/fontTools warnings during PDF export
  • fix: wizard trust anchors + AI analysis error handling
  • fix: support email trust anchors + name contradiction detection
  • fix: force report generation when agent exhausts max_steps
  • fix: correct ScrapingAnt proxy URL format

UX

  • ux: agent mode — show analysis first, compact confirmed-only table
  • ux: fix duplicate highlights in panel + dedup confirmed profiles

Docs & Quality

  • docs: rewrite README + .env.example
  • docs: remove personal data from README examples
  • test: add 75 new tests for agent_engine, identity_pipeline, trust_anchor
  • fix: resolve 8 ruff lint errors in test files
  • style: fix all ruff lint errors (121 total)

angel added 10 commits June 7, 2026 11:09
…-inside

Root causes and fixes:
- Tables overflowing A4 page: added table-layout:fixed + colgroup
  widths on footprint tables (18/18/14/12/38% split)
- Long URLs breaking layout: truncated to 45 chars (was 55/60)
  + font-size:8px for URL columns
- AI-generated markdown tables: table-layout:fixed + font-size:9px
- Text not wrapping: overflow-wrap:break-word on body, td, p, code
- Table rows split across pages: page-break-inside:avoid on tbody tr
- Cards split across pages: page-break-inside:avoid on .card
- AI sections (multi-page): page-break-inside:auto override for
  .ai-section-content so large analyses flow naturally

Root cause: template used toc_entries[0].label for ALL 6 AI sections
(identity, geotemporal, psychological, technical, ideology, opsec),
which always resolved to '01 // Resumen de Inteligencia'.

Fixes:
- Added ai_section_* localized strings to all 5 languages (EN/ES/PT/AR/RU)
- Template now uses dedicated strings with proper fallbacks
- Hardcoded English titles (OCEAN Profile, Technical, etc.) now translate
- Added has_real_sections guard so 'intro' alone doesn't trigger the
  AI sections block (avoids rendering '# Perfil OSINT: ...' as a section)

Before: 🆔 Resumen de Inteligencia, 🌍 Resumen de Inteligencia, ...
After:  🆔 Identidad, 🌍 Geo-Temporal, 🧠 Perfil OCEAN, ...
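
The fallback chain and the `has_real_sections` guard described in this commit could look roughly like the sketch below. The dictionary layout and the `section_title` helper are assumptions; only the section keys and the titles quoted in the commit message are taken from the source, and the remaining translations are deliberately left out.

```python
# Only titles quoted in the commit message are filled in; the real string
# tables (all 6 sections, 5 languages) are not reproduced here.
AI_SECTION_TITLES = {
    "en": {"psychological": "OCEAN Profile", "technical": "Technical"},
    "es": {"identity": "Identidad", "geotemporal": "Geo-Temporal",
           "psychological": "Perfil OCEAN"},
}


def section_title(key: str, lang: str) -> str:
    """Look up a per-section title: requested language, then English, then the key."""
    localized = AI_SECTION_TITLES.get(lang, {}).get(key)
    if localized:
        return localized
    return AI_SECTION_TITLES["en"].get(key) or key.replace("_", " ").title()


def has_real_sections(sections: dict) -> bool:
    """True only if some section other than 'intro' has non-empty content."""
    return any(key != "intro" and text.strip() for key, text in sections.items())
```

The point of the fix is that each section resolves its own key instead of reusing the first table-of-contents label.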

The generate_report step was printing the ENTIRE summary (all 6 dimensions,
tables, highlights) as raw tool args in the step log, then showing the same
content again in the formatted AI Analysis panel below.

Now the step log shows a compact line:
  📋 Step 5/10: generate_report(confidence=0.92, highlights=5)

The full formatted analysis is only shown once in the final panel.
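
One way to produce such a compact line is sketched below. `format_step` is a hypothetical helper, and the argument names `summary`, `confidence`, and `highlights` are assumed from the log line above rather than taken from the actual tool schema.

```python
def format_step(step: int, max_steps: int, tool: str, args: dict) -> str:
    """Render one step-log line, summarizing generate_report instead of dumping it."""
    if tool == "generate_report":
        # Show only the confidence and the number of highlights; the long
        # summary text is left for the final analysis panel.
        highlights = args.get("highlights") or []
        shown = f"confidence={args.get('confidence')}, highlights={len(highlights)}"
    else:
        shown = ", ".join(f"{key}={value!r}" for key, value in args.items())
    return f"📋 Step {step}/{max_steps}: {tool}({shown})"
```

Other tools keep their full arguments, since those are short (a username, an e-mail address, a URL).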

Before: 95-row table (80+ NOs) → then full AI analysis → redundant
After:  AI analysis panel (main deliverable) → compact confirmed-only
        table (12 rows) → '12 confirmed / 95 scanned (full table in PDF)'

The full scan table with all results is still available in the PDF export.

1. build_analysis_panel: skip appending 'Highlights:' bullets when
   the summary already contains a HIGHLIGHTS section (agent reports
   always embed them). Avoids showing 5 highlights twice.

2. Confirmed profiles table: deduplicate by (network, username) so
   profiles found by both scan_username and scan_email don't appear
   twice (e.g. kissmelymarcano on twitch, pinterest, x).
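
Both changes are small enough to sketch. The helpers below are illustrative only: the profile dictionaries, the function names, and the case-insensitive "highlights" check are assumptions, not the code in `build_analysis_panel`.

```python
def dedupe_profiles(profiles: list) -> list:
    """Keep the first profile seen for each (network, username) pair."""
    seen = set()
    unique = []
    for profile in profiles:
        key = (profile["network"].lower(), profile["username"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(profile)
    return unique


def build_analysis_text(summary: str, highlights: list) -> str:
    """Append highlight bullets unless the summary already embeds them."""
    if not highlights or "highlights" in summary.lower():
        return summary
    bullets = "\n".join(f"- {item}" for item in highlights)
    return f"{summary}\n\nHighlights:\n{bullets}"
```

Keeping the first occurrence preserves scan order, so a profile found by `scan_username` is not displaced by the same profile found later via `scan_email`.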

The 'fsSelection bit 5 (bold) and head table macStyle bit 0 (bold)
should match' messages from fontTools were polluting CLI output.
Now silenced via logging level + warnings filter.
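
A minimal sketch of that combination, using only the standard library. The function name `silence_pdf_noise` is hypothetical, and the logger names `fontTools` and `weasyprint` are assumed to be the ones the two libraries log under; the commit does not show which loggers or filters it actually configures.

```python
import logging
import warnings


def silence_pdf_noise() -> None:
    """Hide warning-level chatter from the PDF toolchain; errors still surface."""
    # Child loggers such as "fontTools.subset" inherit this effective level.
    for name in ("fontTools", "weasyprint"):
        logging.getLogger(name).setLevel(logging.ERROR)
    # Also drop Python warnings raised from modules whose name starts with fontTools.
    warnings.filterwarnings("ignore", module=r"fontTools")
```

Calling this once before the export starts is enough, since both settings are process-wide.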

_extract_identity_card now picks up avatar from any confirmed profile's
image_url field, not just GitHub/GitLab metadata. Priority:
  instagram > telegram > x/twitter > any other

Also extracts name and bio from Instagram profiles for the cover card.
The template already had the <img> element — it just never received data
from non-GitHub scans.
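
The priority rule can be expressed as a ranked lookup. The sketch below is an assumption about shape only: `pick_avatar`, the `confirmed` flag, and the profile dictionaries are hypothetical, while the `image_url` field and the ordering come from the commit message.

```python
from typing import Optional

# Order stated in the commit message: instagram > telegram > x/twitter > any other.
AVATAR_PRIORITY = ("instagram", "telegram", "x", "twitter")


def pick_avatar(profiles: list) -> Optional[str]:
    """Return the image_url of the highest-priority confirmed profile, if any."""
    candidates = [p for p in profiles if p.get("confirmed") and p.get("image_url")]

    def rank(profile: dict) -> int:
        network = profile["network"].lower()
        if network in AVATAR_PRIORITY:
            return AVATAR_PRIORITY.index(network)
        return len(AVATAR_PRIORITY)  # any other network ranks last

    best = min(candidates, key=rank, default=None)
    return best["image_url"] if best else None
```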

New FacebookScanner extracts from public pages:
- Profile existence (og:title heuristic)
- Display name (og:title)
- Avatar/profile picture (og:image)
- Bio/description (og:description)
- Page likes count
- Page vs profile detection (og:type)

Handles login redirects gracefully (marks as blocked).
Residential proxy recommended for reliable results.

Registered in identity_pipeline and report identity card extractor.

Instagram/Facebook og:image meta tags contain HTML-escaped URLs with
&amp; instead of &. WeasyPrint can't fetch these broken URLs, so the
avatar never appears on the PDF cover.

Fix at two levels:
1. Source: _extract_og_content() in instagram.py and facebook.py now
   uses html.unescape() on all extracted values.
2. Defense: _extract_identity_card() in report_exporter.py also
   unescapes the final avatar_url as a safety net.
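
To make the failure concrete, here is a simplified stand-in for an OG extractor that applies `html.unescape()` to every value. It is a sketch, not the project's `_extract_og_content()`: the regex is an assumption and only handles `<meta>` tags where `property` appears before `content`.

```python
import html
import re

# Simplified pattern: property="og:..." followed by content="..." in one tag.
_OG_META = re.compile(
    r"<meta\s[^>]*?property=(?P<q1>[\"'])og:(?P<name>[\w:]+)(?P=q1)"
    r"[^>]*?content=(?P<q2>[\"'])(?P<content>.*?)(?P=q2)",
    re.IGNORECASE,
)


def extract_og_content(page: str) -> dict:
    """Map og:* property names to their unescaped content values."""
    return {
        match["name"].lower(): html.unescape(match["content"])
        for match in _OG_META.finditer(page)
    }


page = (
    '<meta property="og:title" content="Jane Doe">'
    '<meta property="og:image" content="https://cdn.example.com/a.jpg?w=320&amp;h=320">'
)
print(extract_og_content(page)["image"])  # https://cdn.example.com/a.jpg?w=320&h=320
```

Unescaping a second time in the report exporter is harmless for URLs that are already clean, which is why the safety net costs nothing.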

Doble-2 merged commit 041d004 into main on Jun 7, 2026
7 checks passed