Note
Datapax is not production-ready and is currently in active development.
It achieves a 79.5% success rate on a small test set of 1,000 images.
AI-powered dataset patching and normalization pipeline for image data.
Intelligently normalize images of any size and aspect ratio into fixed resolutions
using AI-assisted outpainting rather than naive resizing or cropping.
The project currently uses Qwen Image Edit Plus, but the architecture is designed to be model-agnostic and replaceable.
- The original image of a Sukhoi Su-57 aircraft had a resolution of 1500×1000
- The target dataset resolution was 720×720
- Instead of cropping or stretching the image, Datapax:
  - Kept the full aircraft visible
  - Preserved scale, proportions, lighting, and perspective
  - Outpainted missing pixels to fill the square frame naturally
- The background was extended using AI, without introducing new objects or stylistic changes
This approach produces dataset-ready images while avoiding the common pitfalls of traditional resizing pipelines.
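The geometry of the example above can be worked out in a few lines. This is a minimal sketch, assuming the source image is first scaled uniformly to fit inside the target frame and then centered, so that the remaining border is what the outpainting model has to fill. Whether Datapax uses exactly this placement is an assumption, and `OutpaintPlan` and `plan_outpaint` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutpaintPlan:
    scaled_width: int   # width of the original after uniform scaling
    scaled_height: int  # height of the original after uniform scaling
    pad_left: int       # columns to outpaint on the left
    pad_top: int        # rows to outpaint above
    pad_right: int      # columns to outpaint on the right
    pad_bottom: int     # rows to outpaint below


def plan_outpaint(src_w: int, src_h: int, dst_w: int, dst_h: int) -> OutpaintPlan:
    """Fit the source inside the target without distortion and center it."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # One scale factor for both axes keeps the original proportions.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = min(dst_w, max(1, round(src_w * scale)))
    scaled_h = min(dst_h, max(1, round(src_h * scale)))
    # Whatever the scaled image does not cover must be outpainted.
    pad_x = dst_w - scaled_w
    pad_y = dst_h - scaled_h
    pad_left = pad_x // 2
    pad_top = pad_y // 2
    return OutpaintPlan(
        scaled_w, scaled_h, pad_left, pad_top, pad_x - pad_left, pad_y - pad_top
    )


plan = plan_outpaint(1500, 1000, 720, 720)
print(plan)
# OutpaintPlan(scaled_width=720, scaled_height=480, pad_left=0, pad_top=120,
#              pad_right=0, pad_bottom=120)

# Share of the final frame that has to be generated rather than copied.
generated = 1 - (plan.scaled_width * plan.scaled_height) / (720 * 720)
print(f"{generated:.1%}")  # 33.3%
```

Under these assumptions the aircraft image occupies a 720×480 band in the middle of the frame, and a 120-pixel strip above and below it (one third of the output) is left for the model to fill.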
What Problem Does Datapax Solve?
Traditional dataset preprocessing often relies on:
- Center crops
- Resizing with distortion
- Manual padding
- Loss of important visual context
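To make the cost of center cropping and distorting resizes concrete, the sketch below quantifies both for a 1500×1000 image forced into a 720×720 frame. The function names are illustrative only and are not part of Datapax.

```python
def center_crop_loss(src_w: int, src_h: int, dst_w: int, dst_h: int) -> float:
    """Fraction of source pixels discarded by a center crop to the target aspect ratio."""
    target_ratio = dst_w / dst_h
    src_ratio = src_w / src_h
    if src_ratio > target_ratio:
        # Source is too wide: full height is kept, the sides are cut off.
        kept = (src_h * target_ratio) / src_w
    else:
        # Source is too tall (or already matches): full width is kept.
        kept = (src_w / target_ratio) / src_h
    return 1.0 - kept


def stretch_distortion(src_w: int, src_h: int, dst_w: int, dst_h: int) -> float:
    """Horizontal scale divided by vertical scale; 1.0 means no distortion."""
    return (dst_w / src_w) / (dst_h / src_h)


loss = center_crop_loss(1500, 1000, 720, 720)
squeeze = stretch_distortion(1500, 1000, 720, 720)
print(f"center crop discards {loss:.1%} of the image")   # 33.3%
print(f"stretching scales width by {squeeze:.3f}x relative to height")  # 0.667
```

For this image, a center crop throws away a third of the pixels, while a plain resize squeezes the subject horizontally to two thirds of its true width.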
Datapax aims to:
- Preserve the entire subject
- Maintain original proportions
- Keep background, lighting, and perspective intact
- Use AI-assisted outpainting and editing to fill missing areas naturally
Use Cases:
- Vision model training
- Diffusion datasets
- Image-to-image and multimodal models
- Any workflow that needs clean, consistent image sizes without destroying content
- AI-based image normalization (e.g. random size → 512×512)
- Intelligent outpainting instead of cropping
- Subject-aware framing
- Preserves colors, lighting, and sharpness
- Designed for dataset-scale processing
- Model-agnostic pipeline (Qwen is just the first backend)
- Image Editing Model: Qwen Image Edit Plus
- Framework: PyTorch
The model choice is not hardcoded and will be swappable in future versions.
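One way a swappable backend could look is sketched below. This is not Datapax's actual interface, which has not been published; `OutpaintBackend`, `ConstantFillBackend`, `BACKENDS`, and `normalize` are hypothetical names, and the image is modeled as a plain grid of grayscale values where `None` marks a pixel that still has to be generated. The constant-fill class is only a stand-in for a real model such as Qwen Image Edit Plus.

```python
from typing import Dict, List, Optional, Protocol

# None marks a pixel that the backend still has to generate.
Canvas = List[List[Optional[int]]]


class OutpaintBackend(Protocol):
    """Anything with a compatible fill() method can act as a backend."""

    def fill(self, canvas: Canvas) -> List[List[int]]:
        ...


class ConstantFillBackend:
    """Placeholder backend: fills missing pixels with one fixed value.

    A real backend would call an image-editing model here instead.
    """

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def fill(self, canvas: Canvas) -> List[List[int]]:
        return [
            [self.value if px is None else px for px in row] for row in canvas
        ]


# The pipeline looks backends up by name, so swapping models is a config change.
BACKENDS: Dict[str, OutpaintBackend] = {
    "constant": ConstantFillBackend(value=128),
}


def normalize(canvas: Canvas, backend_name: str) -> List[List[int]]:
    """Run the selected backend over a canvas with missing regions."""
    backend = BACKENDS[backend_name]
    return backend.fill(canvas)


canvas: Canvas = [[None, None], [10, 20], [None, None]]
result = normalize(canvas, "constant")
print(result)  # [[128, 128], [10, 20], [128, 128]]
```

Using a structural `Protocol` rather than a base class means a new backend does not need to import or inherit anything from the pipeline; it only has to expose a matching `fill` method.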
Tested with:
- PyTorch: 2.10.0+cu128
- CUDA: 12.8
- OS: Windows & Linux
Datapax is currently in active prototyping.
Planned milestones:
1. Working end-to-end example
2. Reproducible dataset patching pipeline
3. Documentation & configuration cleanup
4. Open-source release
Once milestone #2 (the reproducible dataset patching pipeline) is reached, the repository will be made public immediately.
- Modular backend interface (multiple image-edit models)
- CLI interface for dataset processing
- Batch processing
- Metadata & annotation preservation
- Config-driven pipelines
- Open-source release
TBD
The license will be defined at the time of the open-source release.
This project is experimental by nature. APIs, behavior, and internal structure may change rapidly until a stable release is published.
Feedback and ideas are welcome once the repository opens.
Built with ❤️ for the AI community
Making dataset preparation accessible and intelligent

