Overview
Add memory-efficient inspection tools for XML files to help agents avoid loading entire documents into context, similar to the CSV and JSON token-saving tools.
Motivation
XML files can be massive (thousands of elements, deeply nested). Agents often need to inspect structure or extract specific data, but loading an entire document into context to do so wastes tokens and may exceed the context window.
Proposed Functions
High Priority - Inspection Tools
get_xml_structure - Get element hierarchy/schema without loading content
count_xml_elements - Count elements by tag name without loading content
get_xml_element_at_path - Extract specific element by XPath
get_xml_attributes - List attributes for element type without loading content
search_xml_tags - Find all paths containing tags matching pattern
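As an illustration of what "without loading content" could mean for get_xml_structure, here is a minimal sketch built on the standard library's xml.etree.ElementTree.iterparse. The signature, the max_depth parameter, and the list-of-paths return shape are assumptions made for this example, not a settled design. The file is still read from start to finish; only the element tree is discarded as parsing proceeds.

```python
import xml.etree.ElementTree as ET


def get_xml_structure(file_path: str, max_depth: int) -> list:
    """Return the sorted set of distinct element paths, down to max_depth.

    Hypothetical sketch: parameter names and return shape are assumptions.
    """
    stack = []     # tags of the currently open elements, root first
    paths = set()  # distinct "/root/child/..." paths seen so far
    with open(file_path, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("start", "end")):
            if event == "start":
                stack.append(elem.tag)
                if len(stack) <= max_depth:
                    paths.add("/" + "/".join(stack))
            else:
                stack.pop()
                elem.clear()  # drop text, attributes and children once closed
    return sorted(paths)
```

Only the set of distinct paths grows with the document, so a file with many repeated records yields a short result. Namespaced tags appear in ElementTree's "{uri}local" form.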
High Priority - Selective Extraction
select_xml_elements - Get only specific elements by tag name
filter_xml_elements - Filter elements by attribute/text criteria (6 operators)
preview_xml_elements - Get first N elements of a specific type
slice_xml_elements - Get range of elements (pagination support)
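The pagination idea behind slice_xml_elements and preview_xml_elements can be sketched with a streaming parse that stops as soon as the requested range is filled. The signatures, the 0-based half-open range, and the dict shape of each returned element are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def slice_xml_elements(file_path: str, tag_name: str, start: int, end: int) -> list:
    """Return matching elements with 0-based index in [start, end) as plain dicts.

    Hypothetical sketch: nested child content is not included in the result.
    """
    selected = []
    seen = 0  # number of matching elements encountered so far
    with open(file_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == tag_name:
                if start <= seen < end:
                    selected.append({
                        "tag": elem.tag,
                        "attributes": dict(elem.attrib),
                        "text": (elem.text or "").strip(),
                    })
                seen += 1
            elem.clear()  # release the element whether or not it matched
            if seen >= end:
                break  # range filled: the rest of the file is never parsed
    return selected


def preview_xml_elements(file_path: str, tag_name: str, count: int) -> list:
    """Return the first `count` matching elements (a slice starting at 0)."""
    return slice_xml_elements(file_path, tag_name, 0, count)
```

Because the loop breaks early, a preview of the first few elements of a large file only parses as far as the last element it returns.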
Medium Priority - Analysis
get_xml_namespace_info - List all namespaces without loading content
get_xml_element_stats - Statistics for element type (count, attributes, depth)
validate_xml_structure_simple - Quick validation without full parse
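For get_xml_namespace_info, one possible approach is to listen for iterparse's "start-ns" events, which report each namespace declaration as a (prefix, uri) pair. The sketch below assumes a list-of-dicts return shape and represents the default namespace with an empty prefix; both choices are illustrative rather than part of the proposal.

```python
import xml.etree.ElementTree as ET


def get_xml_namespace_info(file_path: str) -> list:
    """Return each distinct (prefix, uri) declaration as a JSON-friendly dict.

    Hypothetical sketch: the default namespace is reported with prefix "".
    """
    seen = set()
    namespaces = []
    with open(file_path, "rb") as handle:
        for event, payload in ET.iterparse(handle, events=("start-ns", "end")):
            if event == "start-ns":
                prefix, uri = payload  # for "start-ns" the payload is a tuple
                if (prefix, uri) not in seen:
                    seen.add((prefix, uri))
                    namespaces.append({"prefix": prefix, "uri": uri})
            else:
                payload.clear()  # for "end" the payload is the closed element
    return namespaces
```

Listening for "end" events as well lets the sketch clear each element as it closes, so element content is not retained while namespaces are collected.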
Design Principles
- Google ADK compliant (JSON-serializable parameter and return types, no default parameter values)
- @strands_tool decorator
- Memory-efficient (streaming/iterative parsing where possible)
- XPath support for path notation
- Consistent with CSV/JSON token-saving patterns
- Process data without loading entire documents
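To make these principles concrete, here is a rough sketch of count_xml_elements written to follow them: every parameter is required, the return value is a plain dict that survives json.dumps, and parsing is iterative. The return keys are assumptions, and the @strands_tool decorator is left out because its import path is not given here.

```python
import json
import xml.etree.ElementTree as ET


def count_xml_elements(file_path: str, tag_name: str) -> dict:
    """Count elements named tag_name using a streaming parse.

    Hypothetical sketch: no default arguments, JSON-serializable result.
    """
    count = 0
    with open(file_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == tag_name:
                count += 1
            elem.clear()  # discard the element once it has been counted
    return {"file_path": file_path, "tag_name": tag_name, "count": count}


def to_tool_response(result: dict) -> str:
    """Serialize a tool result; raises TypeError if it is not JSON-safe."""
    return json.dumps(result)
```

Returning a small summary dict instead of the elements themselves is what keeps the token cost independent of document size.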
Related
Module
xml/parsing.py or new xml/inspection.py