Development #21
Merged
added 10 commits
June 7, 2026 11:09
…-inside
Root causes and fixes:
- Tables overflowing A4 page: added table-layout:fixed + colgroup widths on footprint tables (18/18/14/12/38% split)
- Long URLs breaking layout: truncated to 45 chars (was 55/60) + font-size:8px for URL columns
- AI-generated markdown tables: table-layout:fixed + font-size:9px
- Text not wrapping: overflow-wrap:break-word on body, td, p, code
- Table rows split across pages: page-break-inside:avoid on tbody tr
- Cards split across pages: page-break-inside:avoid on .card
- AI sections (multi-page): page-break-inside:auto override for .ai-section-content so large analyses flow naturally
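The URL truncation in the second bullet could look roughly like the sketch below. The helper name `truncate_url` and the use of a trailing ellipsis are assumptions for illustration; the commit only states the 45-character limit.

```python
def truncate_url(url: str, max_chars: int = 45) -> str:
    """Shorten a URL for display so it fits a fixed-width table column.

    Hypothetical helper: only the 45-char limit comes from the commit message.
    """
    if len(url) <= max_chars:
        return url
    # Reserve one character for the ellipsis marker so the result
    # never exceeds max_chars.
    return url[: max_chars - 1] + "…"


short = truncate_url("https://example.com/u")
long_ = truncate_url("https://example.com/" + "a" * 100)
```

Truncating at render time keeps the full URL available in the underlying data (for example in a JSON export) while the PDF column stays within its colgroup width.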
Root cause: template used toc_entries[0].label for ALL 6 AI sections (identity, geotemporal, psychological, technical, ideology, opsec), which always resolved to '01 // Resumen de Inteligencia'.
Fixes:
- Added ai_section_* localized strings to all 5 languages (EN/ES/PT/AR/RU)
- Template now uses dedicated strings with proper fallbacks
- Hardcoded English titles (OCEAN Profile, Technical, etc.) now translate
- Added has_real_sections guard so 'intro' alone doesn't trigger the AI sections block (avoids rendering '# Perfil OSINT: ...' as a section)
Before: 🆔 Resumen de Inteligencia, 🌍 Resumen de Inteligencia, ...
After: 🆔 Identidad, 🌍 Geo-Temporal, 🧠 Perfil OCEAN, ...
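A minimal sketch of the two ideas in this fix, per-section localized titles with a fallback and a guard that ignores 'intro'. The function signatures, the dictionary layout, and the English title "Identity" are assumptions; only the ai_section_* key prefix, the six section names, and the has_real_sections name come from the commit message.

```python
AI_SECTION_KEYS = (
    "identity", "geotemporal", "psychological",
    "technical", "ideology", "opsec",
)


def section_title(key: str, strings: dict, fallback: dict) -> str:
    """Resolve a section title: active language, then fallback, then the key."""
    name = f"ai_section_{key}"
    return strings.get(name) or fallback.get(name) or key.title()


def has_real_sections(sections: dict) -> bool:
    """True only if at least one of the six AI sections has content.

    An 'intro' entry on its own does not count.
    """
    return any(sections.get(key) for key in AI_SECTION_KEYS)


# Illustrative string tables (assumed shape, partial on purpose).
es = {"ai_section_identity": "Identidad"}
en = {"ai_section_identity": "Identity", "ai_section_psychological": "OCEAN Profile"}
```

With these tables, `section_title("psychological", es, en)` falls back to the English string because the Spanish table in this example has no entry for it.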
The generate_report step was printing the ENTIRE summary (all 6 dimensions, tables, highlights) as raw tool args in the step log, then showing the same content again in the formatted AI Analysis panel below.
Now the step log shows a compact line:
📋 Step 5/10: generate_report(confidence=0.92, highlights=5)
The full formatted analysis is only shown once in the final panel.
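One way to produce that compact line is sketched below. The function name `format_step` and the argument names `summary`, `confidence`, and `highlights` are assumptions based on the log line quoted in the commit message, not the project's actual code.

```python
def format_step(step: int, total: int, tool: str, args: dict) -> str:
    """Render one step-log line, summarizing bulky generate_report args."""
    if tool == "generate_report":
        # Show only the confidence and the number of highlights;
        # the long summary text is left for the final analysis panel.
        highlights = args.get("highlights") or []
        compact = f"confidence={args.get('confidence')}, highlights={len(highlights)}"
    else:
        compact = ", ".join(f"{k}={v!r}" for k, v in args.items())
    return f"📋 Step {step}/{total}: {tool}({compact})"


line = format_step(
    5, 10, "generate_report",
    {"summary": "very long text", "confidence": 0.92,
     "highlights": ["a", "b", "c", "d", "e"]},
)
```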
Before: 95-row table (80+ NOs) → then full AI analysis → redundant
After: AI analysis panel (main deliverable) → compact confirmed-only
table (12 rows) → '12 confirmed / 95 scanned (full table in PDF)'
The full scan table with all results is still available in the PDF export.
1. build_analysis_panel: skip appending 'Highlights:' bullets when the summary already contains a HIGHLIGHTS section (agent reports always embed them). Avoids showing 5 highlights twice.
2. Confirmed profiles table: deduplicate by (network, username) so profiles found by both scan_username and scan_email don't appear twice (e.g. kissmelymarcano on twitch, pinterest, x).
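Both changes can be sketched in a few lines. The helper names, the dict-based profile shape, the case-insensitive comparison, and the substring check for the HIGHLIGHTS section are assumptions made for this example.

```python
def needs_highlights_block(summary: str) -> bool:
    """Append separate 'Highlights:' bullets only if the summary lacks them."""
    return "HIGHLIGHTS" not in summary.upper()


def dedupe_profiles(profiles: list) -> list:
    """Keep the first profile seen for each (network, username) pair."""
    seen = set()
    unique = []
    for profile in profiles:
        # Lowercasing is an assumption so 'Twitch' and 'twitch' collapse.
        key = (profile["network"].lower(), profile["username"].lower())
        if key in seen:
            continue
        seen.add(key)
        unique.append(profile)
    return unique


rows = [
    {"network": "twitch", "username": "kissmelymarcano", "source": "scan_username"},
    {"network": "twitch", "username": "kissmelymarcano", "source": "scan_email"},
    {"network": "x", "username": "kissmelymarcano", "source": "scan_username"},
]
unique_rows = dedupe_profiles(rows)
```

Keeping the first occurrence preserves the original scan order, so the table still reads in the order the agent discovered the profiles.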
The 'fsSelection bit 5 (bold) and head table macStyle bit 0 (bold) should match' messages from fontTools were polluting CLI output. Now silenced via logging level + warnings filter.
_extract_identity_card now picks up avatar from any confirmed profile's image_url field, not just GitHub/GitLab metadata.
Priority: instagram > telegram > x/twitter > any other
Also extracts name and bio from Instagram profiles for the cover card.
The template already had the <img> element; it just never received data from non-GitHub scans.
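The priority rule could be expressed as a rank lookup, as in this sketch. The profile dict shape (`network`, `confirmed`, `image_url`) and the function name `pick_avatar` are assumptions; only the priority order and the image_url field name come from the commit message.

```python
# Lower rank wins; networks not listed share the lowest priority.
AVATAR_RANK = {"instagram": 0, "telegram": 1, "x": 2, "twitter": 2}


def pick_avatar(profiles: list):
    """Return the image_url of the best confirmed profile, or None."""
    candidates = [
        p for p in profiles
        if p.get("confirmed") and p.get("image_url")
    ]

    def rank(profile: dict) -> int:
        return AVATAR_RANK.get(profile["network"].lower(), len(AVATAR_RANK))

    # min() returns the first candidate with the smallest rank.
    best = min(candidates, key=rank, default=None)
    return best["image_url"] if best else None


found = [
    {"network": "github", "confirmed": True, "image_url": "https://example.com/gh.png"},
    {"network": "Instagram", "confirmed": True, "image_url": "https://example.com/ig.jpg"},
    {"network": "telegram", "confirmed": False, "image_url": "https://example.com/tg.jpg"},
]
avatar = pick_avatar(found)
```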
New FacebookScanner extracts from public pages:
- Profile existence (og:title heuristic)
- Display name (og:title)
- Avatar/profile picture (og:image)
- Bio/description (og:description)
- Page likes count
- Page vs profile detection (og:type)
Handles login redirects gracefully (marks as blocked). Residential proxy recommended for reliable results.
Registered in identity_pipeline and report identity card extractor.
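The Open Graph fields listed above can be read with the standard library's HTML parser, as in this sketch. The class and function names, the returned dict shape, and the rule "og:type other than profile means page" are assumptions; the likes count and login-redirect handling are left out.

```python
from html.parser import HTMLParser


class OGParser(HTMLParser):
    """Collect <meta property="og:*" content="..."> values."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.og.setdefault(prop[3:], content)


def parse_public_page(page_html: str) -> dict:
    parser = OGParser()
    parser.feed(page_html)
    og = parser.og
    return {
        "exists": "title" in og,           # og:title heuristic
        "display_name": og.get("title"),
        "avatar_url": og.get("image"),
        "bio": og.get("description"),
        "is_page": og.get("type") != "profile",  # assumed mapping
    }


sample = (
    '<head><meta property="og:title" content="Jane Doe">'
    '<meta property="og:image" content="https://example.com/a.jpg"/>'
    '<meta property="og:type" content="profile"></head>'
)
info = parse_public_page(sample)
```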
Instagram/Facebook og:image meta tags contain HTML-escaped URLs with &amp; instead of &. WeasyPrint can't fetch these broken URLs, so the avatar never appears on the PDF cover.
Fix at two levels:
1. Source: _extract_og_content() in instagram.py and facebook.py now uses html.unescape() on all extracted values.
2. Defense: _extract_identity_card() in report_exporter.py also unescapes the final avatar_url as a safety net.
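The effect of the fix can be shown with a small regex-based extractor. The function name `extract_og_image` and the regex are assumptions standing in for the project's _extract_og_content(); html.unescape() is the standard-library call named in the commit.

```python
import html
import re

# Simplified pattern: assumes 'property' appears before 'content'.
_OG_IMAGE_RE = re.compile(
    r'<meta[^>]+property=["\']og:image["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def extract_og_image(page_html: str):
    """Return the og:image URL with HTML entities decoded, or None."""
    match = _OG_IMAGE_RE.search(page_html)
    if match is None:
        return None
    # The raw attribute value still contains '&amp;'; decode it so the
    # URL can be fetched as-is.
    return html.unescape(match.group(1))


page = '<meta property="og:image" content="https://cdn.example/p.jpg?a=1&amp;b=2">'
avatar_url = extract_og_image(page)
```

In this example a second html.unescape() call on the already-clean URL leaves it unchanged, which is what makes the safety-net unescape in the exporter harmless here.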
PR: development → main - v2.0.0

🤖 Agentic AI Mode (osint-d2 agent)
The flagship feature of v2.0.0: autonomous OSINT investigations powered by LLM function calling.
scan_username, scan_email, fetch_url, breach_check, generate_report
osint-d2 agent "torvalds" --breach-check --export-pdf --export-json -l es

🛡️ Trust Anchors
Define verified identity sources to automatically filter false positives.
osint-d2 agent "janedoe" --trust instagram:janedoe --trust email:jane@gmail.com

🌐 New Scanners
Both scanners use OG meta tag extraction with HTML entity unescaping for clean URLs.
🔒 ScrapingAnt Proxy Integration
(--proxy-country us)
.env configuration

📄 Premium PDF Dossier
🧙 Interactive Wizard Improvements
🧪 Testing & Quality
🔧 UX & Polish
Commits (33)
Features
- feat: add agentic AI mode (osint-d2 agent)
- feat: add agent mode to wizard + README documentation
- feat: add fetch_url agent tool + enrich HTML extraction
- feat: add trust anchors for identity verification (--trust)
- feat: add Instagram scanner (public profile scraping)
- feat: add Facebook profile/page scanner
- feat: add proxy, trust anchors to wizard + hunt command
- feat: integrate ScrapingAnt proxy support (residential + datacenter)
- feat: premium PDF dossier redesign
- feat: show profile avatar on PDF cover page (Instagram priority)
- feat: add pytest suite + CI workflow

Fixes
- fix: unescape HTML entities in OG image URLs (avatar on PDF cover)
- fix: stop dumping full AI summary twice in agent mode CLI
- fix: AI sections used wrong titles - all showed 'Intelligence Summary'
- fix: PDF layout overflow - table-layout:fixed, word-break, page-break-inside
- fix: suppress WeasyPrint/fontTools warnings during PDF export
- fix: wizard trust anchors + AI analysis error handling
- fix: support email trust anchors + name contradiction detection
- fix: force report generation when agent exhausts max_steps
- fix: correct ScrapingAnt proxy URL format

UX
- ux: agent mode - show analysis first, compact confirmed-only table
- ux: fix duplicate highlights in panel + dedup confirmed profiles

Docs & Quality
- docs: rewrite README + .env.example
- docs: remove personal data from README examples
- test: add 75 new tests for agent_engine, identity_pipeline, trust_anchor
- fix: resolve 8 ruff lint errors in test files
- style: fix all ruff lint errors (121 total)