
Experimental pipeline for AI-assisted image normalization and dataset preparation using generative outpainting.


SystemVll/Datapax


Note

Datapax is not production-ready and is currently in active development.
It achieves a 79.5% success rate on a small test set of 1,000 images.


Datapax

AI-powered dataset patching and normalization pipeline for image data.

Intelligently normalize images of any size and aspect ratio into fixed resolutions
using AI-assisted outpainting rather than naive resizing or cropping.




The project currently uses Qwen Image Edit Plus, but the architecture is designed to be model-agnostic and replaceable.

🎬 Example

What happened here?

  • The original image of a Sukhoi Su-57 aircraft had a resolution of 1500×1000
  • The target dataset resolution was 720×720
  • Instead of cropping or stretching the image, Datapax:
    • Kept the full aircraft visible
    • Preserved scale, proportions, lighting, and perspective
    • Outpainted missing pixels to fill the square frame naturally
  • The background was extended using AI, without introducing new objects or stylistic changes

This approach produces dataset-ready images while avoiding the common pitfalls of traditional resizing pipelines.
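The geometry behind this example can be sketched in a few lines. This is a minimal illustration, not Datapax's actual code: it assumes the source is scaled uniformly until it fits inside the target frame and is then centered, so the remaining borders are what the model has to outpaint. The names `OutpaintPlan` and `plan_outpaint` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutpaintPlan:
    """Hypothetical container: scaled image size plus the borders to outpaint."""
    scaled_width: int
    scaled_height: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def plan_outpaint(src_w: int, src_h: int, dst_w: int, dst_h: int) -> OutpaintPlan:
    """Fit the source inside the target without distortion and center it.

    Assumption: uniform scaling followed by centering. The real pipeline
    may place the subject differently (e.g. subject-aware framing).
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = min(dst_w, max(1, round(src_w * scale)))
    scaled_h = min(dst_h, max(1, round(src_h * scale)))

    # Whatever the scaled image does not cover must be generated.
    pad_x = dst_w - scaled_w
    pad_y = dst_h - scaled_h
    pad_left = pad_x // 2
    pad_top = pad_y // 2
    return OutpaintPlan(
        scaled_w, scaled_h, pad_left, pad_top, pad_x - pad_left, pad_y - pad_top
    )


if __name__ == "__main__":
    # The 1500×1000 source and 720×720 target from the example above.
    print(plan_outpaint(1500, 1000, 720, 720))
```

Under these assumptions the 1500×1000 image is scaled to 720×480, leaving a 120-pixel band at the top and another at the bottom for the model to fill.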


🎯 Why Datapax?

What Problem Does Datapax Solve?

Traditional dataset preprocessing often relies on:

  • Center crops
  • Resizing with distortion
  • Manual padding

All of these can lose important visual context.

Datapax aims to:

  • Preserve the entire subject
  • Maintain original proportions
  • Keep background, lighting, and perspective intact
  • Use AI-assisted outpainting and editing to fill missing areas naturally

Use Cases:

  • Vision model training
  • Diffusion datasets
  • Image-to-image and multimodal models
  • Any workflow that needs clean, consistent image sizes without destroying content

✨ Core Features

  • AI-based image normalization (e.g. random size → 512×512)
  • Intelligent outpainting instead of cropping
  • Subject-aware framing
  • Preserves colors, lighting, and sharpness
  • Designed for dataset-scale processing
  • Model-agnostic pipeline (Qwen is just the first backend)

🔧 Current Backend

  • Image Editing Model: Qwen Image Edit Plus
  • Framework: PyTorch

The model choice is not hardcoded and will be swappable in future versions.
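One way a swappable backend could look is sketched below. None of these names come from the Datapax codebase: `OutpaintBackend`, `EchoBackend`, `register_backend`, and `get_backend` are hypothetical, and the image is passed as raw bytes only to keep the sketch free of dependencies.

```python
from typing import Dict, Protocol


class OutpaintBackend(Protocol):
    """Hypothetical interface every image-edit backend would implement."""

    name: str

    def outpaint(self, image: bytes, width: int, height: int) -> bytes:
        """Return the image normalized to width x height."""
        ...


class EchoBackend:
    """Placeholder backend that returns its input unchanged (for wiring tests)."""

    name = "echo"

    def outpaint(self, image: bytes, width: int, height: int) -> bytes:
        return image


# Registry mapping a backend name to an instance.
_BACKENDS: Dict[str, OutpaintBackend] = {}


def register_backend(backend: OutpaintBackend) -> None:
    """Make a backend selectable by its name."""
    _BACKENDS[backend.name] = backend


def get_backend(name: str) -> OutpaintBackend:
    """Look up a registered backend, failing loudly on unknown names."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


if __name__ == "__main__":
    register_backend(EchoBackend())
    backend = get_backend("echo")
    print(backend.name, len(backend.outpaint(b"raw-image-bytes", 720, 720)))
```

With a structure like this, a Qwen Image Edit Plus wrapper would be one more class registered under its own name, and the rest of the pipeline would only depend on the `outpaint` method.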


💻 Environment

Tested with:

  • PyTorch: 2.10.0+cu128
  • CUDA: 12.8
  • OS: Windows & Linux

📊 Project Status

Datapax is currently in active prototyping.

Planned milestones:

  1. Working end-to-end example
  2. Reproducible dataset patching pipeline
  3. Documentation & configuration cleanup
  4. Open-source release

Once milestone #2 is reached, the repository will be made public immediately.


🗺️ Roadmap (Planned)

  • Modular backend interface (multiple image-edit models)
  • CLI interface for dataset processing
  • Batch processing
  • Metadata & annotation preservation
  • Config-driven pipelines
  • Open-source release

📜 License

TBD
The license will be defined at the time of the open-source release.


📝 Notes

This project is experimental by nature. APIs, behavior, and internal structure may change rapidly until a stable release is published.

Feedback and ideas are welcome once the repository opens.


Built with ❤️ for the AI community
Making dataset preparation accessible and intelligent
