Merged
21 changes: 13 additions & 8 deletions README.md
@@ -27,10 +27,10 @@ import basic_open_agent_tools as boat
# Load tools by category
fs_tools = boat.load_all_filesystem_tools() # 18 functions
text_tools = boat.load_all_text_tools() # 10 functions
# data_tools = boat.load_all_data_tools() # Coming in Phase 1
data_tools = boat.load_all_data_tools() # 28 functions (Phase 1 ✅)

# Merge for agent use (automatically deduplicates)
agent_tools = boat.merge_tool_lists(fs_tools, text_tools)
agent_tools = boat.merge_tool_lists(fs_tools, text_tools, data_tools)


load_dotenv()
@@ -118,12 +118,17 @@ Text Processing Tools:
- Smart text splitting and sentence extraction
- HTML tag removal and Unicode normalization

### Data Tools 📋 (Planned - 5 Phases)
**Phase 1 (MVP)**: Data structures, JSON serialization, basic validation (21 functions)
**Phase 2**: CSV processing, object serialization (11 functions)
**Phase 3**: Configuration files (YAML/TOML/INI), data transformation (16 functions)
**Phase 4**: Binary data, archives, streaming (18 functions)
**Phase 5**: Caching, database processing (13 functions)
### Data Tools ✅ (28 functions - Phase 1 Complete)
**Phase 1 ✅**: Data structures, JSON/CSV processing, validation (28 functions)
- Data structure manipulation (flatten, merge, nested access)
- JSON serialization with compression and validation
- CSV file processing and data cleaning
- Schema validation and data type checking
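The deep-merge behavior mentioned above can be pictured with a small stand-alone sketch. The real `merge_dicts` lives in `structures.py` and is not shown in this diff, so the semantics here (later values win, nested dicts merge recursively, inputs left untouched) are assumptions for illustration only:

```python
from copy import deepcopy

def merge_dicts_sketch(*dicts):
    """Recursively merge dicts; later values win, nested dicts merge (sketch)."""
    result = {}
    for d in dicts:
        for key, value in d.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_dicts_sketch(result[key], value)
            else:
                result[key] = deepcopy(value)  # copy so inputs stay untouched
    return result

base = {"db": {"host": "localhost", "port": 5432}}
override = {"db": {"port": 6432}, "debug": True}
merged = merge_dicts_sketch(base, override)
# nested "db" keys merge instead of being replaced wholesale
```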

**Phase 2 📋**: Object serialization, configuration files (15 functions)
**Phase 3 📋**: Data transformation, YAML/TOML support (16 functions)
**Phase 4 📋**: Binary data, archives, streaming (18 functions)
**Phase 5 📋**: Caching, database processing (13 functions)

### Future Modules 🚧
- **Network Tools** - HTTP utilities, API helpers
4 changes: 4 additions & 0 deletions src/basic_open_agent_tools/__init__.py
@@ -20,6 +20,8 @@
load_all_text_tools,
load_data_csv_tools,
load_data_json_tools,
load_data_structure_tools,
load_data_validation_tools,
merge_tool_lists,
)

@@ -49,6 +51,8 @@
"load_all_data_tools",
"load_data_json_tools",
"load_data_csv_tools",
"load_data_structure_tools",
"load_data_validation_tools",
"merge_tool_lists",
"get_tool_info",
"list_all_available_tools",
123 changes: 64 additions & 59 deletions src/basic_open_agent_tools/data/TODO.md
@@ -1,27 +1,32 @@
# Data Tools TODO

## 🎉 Phase 1 Complete!
**Status**: ✅ 28 functions implemented across 4 modules
**Test Coverage**: 95%+ for new modules, 81% overall
**Quality**: 100% ruff compliance, mypy compatible

## Overview
Data structure utilities, validation, and serialization tools for AI agents.

## Required Infrastructure Updates

### Exception Classes (add to `exceptions.py`)
- [ ] `DataError(BasicAgentToolsError)` - Base exception for data operations
- [ ] `ValidationError(DataError)` - Data validation failures
- [ ] `SerializationError(DataError)` - Serialization/deserialization failures
- [x] `DataError(BasicAgentToolsError)` - Base exception for data operations
- [x] `ValidationError(DataError)` - Data validation failures
- [x] `SerializationError(DataError)` - Serialization/deserialization failures

### Type Definitions (add to `types.py`)
- [ ] `DataDict = Dict[str, Any]` - Standard data dictionary type
- [ ] `NestedData = Union[Dict, List, primitives]` - Nested data structure type
- [ ] `ValidationResult = Dict[str, Union[bool, str, List[str]]]` - Validation result type
- [x] `DataDict = Dict[str, Any]` - Standard data dictionary type
- [x] `NestedData = Union[Dict, List, primitives]` - Nested data structure type
- [x] `ValidationResult = Dict[str, Any]` - Validation result type

### Helper Functions (add to `helpers.py`)
- [ ] `load_all_data_tools()` - Load all data processing functions
- [ ] `load_data_structure_tools()` - Load data structure manipulation functions
- [ ] `load_data_validation_tools()` - Load validation functions
- [ ] `load_data_json_tools()` - Load JSON serialization functions
- [x] `load_all_data_tools()` - Load all data processing functions ✅
- [x] `load_data_structure_tools()` - Load data structure manipulation functions ✅
- [x] `load_data_validation_tools()` - Load validation functions ✅
- [x] `load_data_json_tools()` - Load JSON serialization functions ✅
- [x] `load_data_csv_tools()` - Load CSV processing functions ✅
- [ ] `load_data_object_tools()` - Load object serialization functions
- [ ] `load_data_csv_tools()` - Load CSV processing functions
- [ ] `load_data_config_tools()` - Load configuration file tools
- [ ] `load_data_transformation_tools()` - Load transformation functions
- [ ] `load_data_binary_tools()` - Load binary data handling functions
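Each loader above returns a flat list of plain functions, and the README notes that `merge_tool_lists` "automatically deduplicates." The real implementation is in `helpers.py` and not part of this diff; the sketch below assumes dedup-by-`__name__` purely as an illustration:

```python
def merge_tool_lists_sketch(*tool_lists):
    """Combine tool lists, keeping the first function seen per name (sketch)."""
    seen = {}
    for tools in tool_lists:
        for fn in tools:
            seen.setdefault(fn.__name__, fn)
    return list(seen.values())

# hypothetical stand-ins for functions a loader would return
def flatten_dict(data, separator="."): pass
def safe_get(data, key, default=None): pass

fs_tools = [flatten_dict, safe_get]
data_tools = [safe_get]  # overlaps with fs_tools
merged = merge_tool_lists_sketch(fs_tools, data_tools)
# safe_get appears only once in the merged list
```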
@@ -32,62 +37,62 @@ Data structure utilities, validation, and serialization tools for AI agents.

## Implementation Prioritization

### Phase 1: Foundation (MVP - Immediate Implementation)
### Phase 1: Foundation (MVP - COMPLETED ✅)
**Goal**: Core data manipulation for agent tools, zero external dependencies
**Timeline**: 2-3 weeks, 21 functions
**Status**: ✅ COMPLETE - 28 functions implemented
**Dependencies**: None (pure Python stdlib)

#### Infrastructure First
- [ ] Exception classes (`DataError`, `ValidationError`, `SerializationError`)
- [ ] Type definitions (`DataDict`, `NestedData`, `ValidationResult`)
#### Infrastructure
- [x] Exception classes (`DataError`, `ValidationError`, `SerializationError`)
- [x] Type definitions (`DataDict`, `NestedData`, `ValidationResult`)

#### Core Modules (implement in order)
1. [ ] **Data Structures** (`structures.py`) - 10 functions
#### Core Modules
1. [x] **Data Structures** (`structures.py`) - 10 functions
- Essential for all other modules, zero dependencies
- `flatten_dict(data, separator=".")` - Flatten nested dictionaries
- `unflatten_dict(data, separator=".")` - Reconstruct nested structure
- `get_nested_value(data, key_path, default=None)` - Safe nested access
- `set_nested_value(data, key_path, value)` - Immutable nested updates
- `merge_dicts(*dicts, deep=True)` - Deep merge multiple dictionaries
- `compare_data_structures(data1, data2, ignore_order=False)` - Compare structures
- `safe_get(data, key, default=None)` - Safe dictionary access
- `remove_empty_values(data, recursive=True)` - Clean empty values
- `extract_keys(data, key_pattern)` - Extract keys matching pattern
- `rename_keys(data, key_mapping)` - Rename dictionary keys

2. [ ] **JSON Serialization** (`json_serialization.py`) - 5 functions
- `flatten_dict(data, separator=".")` - Flatten nested dictionaries
- `unflatten_dict(data, separator=".")` - Reconstruct nested structure
- `get_nested_value(data, key_path, default=None)` - Safe nested access
- `set_nested_value(data, key_path, value)` - Immutable nested updates
- `merge_dicts(*dicts, deep=True)` - Deep merge multiple dictionaries
- `compare_data_structures(data1, data2, ignore_order=False)` - Compare structures
- `safe_get(data, key, default=None)` - Safe dictionary access
- `remove_empty_values(data, recursive=True)` - Clean empty values
- `extract_keys(data, key_pattern)` - Extract keys matching pattern
- `rename_keys(data, key_mapping)` - Rename dictionary keys
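The signatures above come straight from the checklist; the behavior below is a stand-alone sketch, assuming dot-separated key paths and a `default` returned on any miss (the shipped `structures.py` may differ):

```python
def flatten_dict(data, separator=".", prefix=""):
    """Flatten nested dicts into single-level separator-joined keys (sketch)."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_dict(value, separator, path))
        else:
            flat[path] = value
    return flat

def get_nested_value(data, key_path, default=None, separator="."):
    """Walk a separator-joined path, returning default on any miss (sketch)."""
    current = data
    for part in key_path.split(separator):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

cfg = {"server": {"tls": {"enabled": True}}}
flat = flatten_dict(cfg)       # {"server.tls.enabled": True}
enabled = get_nested_value(cfg, "server.tls.enabled")
```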

2. [x] **JSON Processing** (`json_tools.py`) - 5 functions
- Built into Python stdlib, critical for agent data exchange
- `safe_json_serialize(data, indent=None)` - JSON serialization with error handling
- `safe_json_deserialize(json_str)` - Safe JSON deserialization
- `validate_json_string(json_str)` - Validate JSON before parsing
- `compress_json_data(data)` - Compress JSON for storage/transmission
- `decompress_json_data(compressed_data)` - Decompress JSON data
- `safe_json_serialize(data, indent=None)` - JSON serialization with error handling
- `safe_json_deserialize(json_str)` - Safe JSON deserialization
- `validate_json_string(json_str)` - Validate JSON before parsing
- `compress_json_data(data)` - Compress JSON for storage/transmission
- `decompress_json_data(compressed_data)` - Decompress JSON data
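A stand-alone sketch of the serialize/compress round trip described above, using only stdlib `json` and `zlib`. The fallback-to-`str` behavior and the zlib choice are assumptions; the shipped `json_tools.py` may handle errors differently:

```python
import json
import zlib

def safe_json_serialize(data, indent=None):
    """Serialize to JSON, stringifying non-JSON-native types (sketch)."""
    return json.dumps(data, indent=indent, default=str)

def compress_json_data(data):
    """JSON-encode then zlib-compress for storage/transmission (sketch)."""
    return zlib.compress(safe_json_serialize(data).encode("utf-8"))

def decompress_json_data(compressed):
    """Reverse compress_json_data back into Python data (sketch)."""
    return json.loads(zlib.decompress(compressed).decode("utf-8"))

payload = {"rows": list(range(100)), "ok": True}
blob = compress_json_data(payload)        # compact bytes
restored = decompress_json_data(blob)     # round-trips to the original
```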

3. [ ] **Basic Validation** (`validation.py`) - 6 functions
- Foundation for data integrity, supports other modules
- `validate_schema(data, schema)` - JSON Schema-style validation
- `check_required_fields(data, required)` - Ensure required fields exist
- `validate_data_types(data, type_map)` - Check field types match expectations
- `validate_range(value, min_val=None, max_val=None)` - Numeric range validation
- `aggregate_validation_errors(results)` - Combine multiple validation results
- `create_validation_report(data, rules)` - Generate detailed validation report

### Phase 2: File Format Support (High Impact)
**Goal**: Common file formats for agent workflows
**Timeline**: 1-2 weeks, 11 functions
**Dependencies**: None (CSV in stdlib)

4. [ ] **CSV Processing** (`csv_processing.py`) - 7 functions
3. [x] **CSV Processing** (`csv_tools.py`) - 7 functions ✅
- Extremely common for agent data tasks, high ROI
- `read_csv_file(file_path, delimiter=",", headers=True)` - Read CSV files
- `write_csv_file(data, file_path, delimiter=",", headers=True)` - Write CSV files
- `csv_to_dict_list(csv_data)` - Convert CSV to list of dictionaries
- `dict_list_to_csv(data)` - Convert dictionary list to CSV format
- `detect_csv_delimiter(file_path)` - Auto-detect CSV delimiter
- `validate_csv_structure(file_path, expected_columns)` - Validate CSV format
- `clean_csv_data(data, rules)` - Clean CSV data according to rules

5. [ ] **Object Serialization** (`object_serialization.py`) - 4 functions
- `read_csv_file(file_path, delimiter=",", headers=True)` - Read CSV files ✅
- `write_csv_file(data, file_path, delimiter=",", headers=True)` - Write CSV files ✅
- `csv_to_dict_list(csv_data)` - Convert CSV to list of dictionaries ✅
- `dict_list_to_csv(data)` - Convert dictionary list to CSV format ✅
- `detect_csv_delimiter(file_path)` - Auto-detect CSV delimiter ✅
- `validate_csv_structure(file_path, expected_columns)` - Validate CSV format ✅
- `clean_csv_data(data, rules)` - Clean CSV data according to rules ✅
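Two of the CSV helpers above map closely onto stdlib facilities; this sketch operates on in-memory text rather than a file path (the real `detect_csv_delimiter` takes a `file_path`), so treat it as an approximation of the documented behavior:

```python
import csv
import io

def csv_to_dict_list(csv_data, delimiter=","):
    """Parse CSV text (first row = headers) into a list of dicts (sketch)."""
    return list(csv.DictReader(io.StringIO(csv_data), delimiter=delimiter))

def detect_csv_delimiter_from_text(sample):
    """Guess the delimiter from a text sample via csv.Sniffer (sketch)."""
    return csv.Sniffer().sniff(sample).delimiter

text = "name;role\nada;engineer\ngrace;admiral\n"
delim = detect_csv_delimiter_from_text(text)   # ";"
rows = csv_to_dict_list(text, delimiter=delim)
```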

4. [x] **Basic Validation** (`validation.py`) - 6 functions ✅
- Foundation for data integrity, supports other modules
- `validate_schema(data, schema)` - JSON Schema-style validation ✅
- `check_required_fields(data, required)` - Ensure required fields exist ✅
- `validate_data_types(data, type_map)` - Check field types match expectations ✅
- `validate_range(value, min_val=None, max_val=None)` - Numeric range validation ✅
- `aggregate_validation_errors(results)` - Combine multiple validation results ✅
- `create_validation_report(data, rules)` - Generate detailed validation report ✅
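A stand-alone sketch of two of the validators listed above. The return shapes (an `(ok, missing)` tuple; a list of error strings) are assumptions for illustration; the shipped `validation.py` may report results differently:

```python
def check_required_fields(data, required):
    """Return (ok, missing) for required top-level fields (sketch)."""
    missing = [field for field in required if field not in data]
    return len(missing) == 0, missing

def validate_data_types(data, type_map):
    """Collect human-readable errors for mistyped fields (sketch)."""
    errors = []
    for field, expected in type_map.items():
        if field in data and not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(data[field]).__name__}")
    return errors

record = {"name": "ada", "age": "36"}  # age is a string, not an int
ok, missing = check_required_fields(record, ["name", "age", "email"])
errors = validate_data_types(record, {"name": str, "age": int})
```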

### Phase 2: Object Serialization & Advanced Processing (Next Priority)
**Goal**: Extended serialization and processing capabilities
**Timeline**: 1-2 weeks, 4 functions
**Dependencies**: None (pure Python stdlib)

1. [ ] **Object Serialization** (`object_serialization.py`) - 4 functions
- Pickle in stdlib, security-aware implementation
- `serialize_object(obj, method="pickle")` - Object serialization (pickle/json)
- `deserialize_object(data, method="pickle")` - Safe object deserialization
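Since Phase 2's `serialize_object`/`deserialize_object` are not yet implemented, the following is only a hedged sketch of what the planned pickle/json dispatch might look like, with the security caveat the checklist calls for:

```python
import json
import pickle

def serialize_object(obj, method="pickle"):
    """Serialize via pickle (bytes) or JSON (str) — sketch of the planned API."""
    if method == "pickle":
        return pickle.dumps(obj)
    if method == "json":
        return json.dumps(obj, default=str)
    raise ValueError(f"unsupported method: {method}")

def deserialize_object(data, method="pickle"):
    """Inverse of serialize_object (sketch)."""
    # Pickle can execute arbitrary code: only deserialize trusted input.
    if method == "pickle":
        return pickle.loads(data)
    if method == "json":
        return json.loads(data)
    raise ValueError(f"unsupported method: {method}")

obj = {"task": "index", "attempts": 3}
round_tripped = deserialize_object(serialize_object(obj))
```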
40 changes: 40 additions & 0 deletions src/basic_open_agent_tools/data/__init__.py
@@ -2,8 +2,10 @@

This module provides data processing and manipulation tools organized into logical submodules:

- structures: Data structure manipulation and transformation
- json_tools: JSON serialization, compression, and validation
- csv_tools: CSV file processing, parsing, and cleaning
- validation: Data validation and schema checking
"""

from typing import List
@@ -25,9 +27,40 @@
safe_json_serialize,
validate_json_string,
)
from .structures import (
compare_data_structures,
extract_keys,
flatten_dict,
get_nested_value,
merge_dicts,
remove_empty_values,
rename_keys,
safe_get,
set_nested_value,
unflatten_dict,
)
from .validation import (
aggregate_validation_errors,
check_required_fields,
create_validation_report,
validate_data_types,
validate_range,
validate_schema,
)

# Re-export all functions at module level for convenience
__all__: List[str] = [
# Data structures
"flatten_dict",
"unflatten_dict",
"get_nested_value",
"set_nested_value",
"merge_dicts",
"compare_data_structures",
"safe_get",
"remove_empty_values",
"extract_keys",
"rename_keys",
# JSON processing
"safe_json_serialize",
"safe_json_deserialize",
@@ -42,4 +75,11 @@
"detect_csv_delimiter",
"validate_csv_structure",
"clean_csv_data",
# Validation
"validate_schema",
"check_required_fields",
"validate_data_types",
"validate_range",
"aggregate_validation_errors",
"create_validation_report",
]
2 changes: 1 addition & 1 deletion src/basic_open_agent_tools/data/csv_tools.py
@@ -346,7 +346,7 @@ def clean_csv_data(

for row in data:
if not isinstance(row, dict):
continue # Skip non-dictionary items
continue # type: ignore[unreachable]

cleaned_row = {}

2 changes: 1 addition & 1 deletion src/basic_open_agent_tools/data/json_tools.py
@@ -80,7 +80,7 @@ def validate_json_string(json_str: str) -> bool:
False
"""
if not isinstance(json_str, str):
return False
return False # type: ignore[unreachable]

try:
json.loads(json_str)