
yumenana/Caelum




Caelum「澄空」

We share the same sky — I just want it to be clear.

Inspired by Porter Robinson — "Look At The Sky"




What is Caelum?

Anime illustrations shared across the internet typically go through multiple rounds of resizing and JPEG/WebP compression, ending up as a blurry, lossy, artifact-ridden version of the original.

Caelum's goal is to restore them as faithfully as possible.

This is a ×4 super-resolution reconstruction network specifically targeting real-world internet image degradation (anime illustrations distributed on platforms like Pixiv, X, Facebook, etc.). It is not a diffusion model and it is not "generating" — it is attempting to restore the image as close as possible to its original form.


🔍 How Does Caelum Compare?

Caelum focuses on a specific, underserved problem: restoring anime illustrations degraded by real-world multi-platform re-upload chains. Unlike general-purpose upscalers, it is trained exclusively on a simulated degradation pipeline that models how images deteriorate as they pass through Pixiv, Twitter/X, Facebook, Discord, and repeated screenshot-and-recompress cycles.

| | Caelum | waifu2x | Real-ESRGAN | RealCUGAN | Anime4K |
|---|---|---|---|---|---|
| Target content | Anime illustration | Anime / Art | General / Anime | Anime | Anime (video) |
| Degradation model | Multi-platform internet re-upload chain | Noise + blur | Real-world general | Noise + compression | |
| Architecture | PPBUNet (U-Net + HAT Attention + Mamba) | SwinUNet / CNN | RRDB | U-Net | GLSL shader |
| Output scale | ×4 | ×1/×2/×4 | ×2/×4 | ×2/×3/×4 | Variable |
| JPEG/WebP artifact removal | ✅ Multi-stage | | | | |
| Inference backend | DirectML (ONNX) | NCNN / Vulkan | NCNN / CUDA | NCNN | OpenCL / Vulkan |
| Windows GUI | ✅ Native | Partial | Partial | Partial | |
| Free | | | | | |

⚠️ Caelum is in active training. Quantitative comparisons (PSNR/SSIM/LPIPS) will be published with the first stable Release.


✨ Features

  • 🎯 Scenario-Focused — Specifically simulates the real degradation pipeline of modern social/image platforms (resize + JPEG/WebP compression), not generic degradation
  • 🏗️ PPBUNet — Palette-Painter-Brush U-Net for Anime Super-Resolution
  • ×4 Upscaling — Dedicated 4× super-resolution reconstruction
  • 🪟 Windows GUI — Ready-to-use .exe application — no Python or command line needed
  • 🔓 Free & Open Source — Free forever; source code available under AGPL-3.0 / CC BY-NC-SA 4.0

🖼️ Results

⚠️ Early checkpoint — The network is still in training; current results are from an early checkpoint and do not represent the final quality.


[Comparison images, left to right: Degraded Input · waifu2x SwinUNet noise2 ×4 · Caelum PPBUNet CAR2 ×4 · Ground Truth]

📝 Comparison images were generated by Google Gemini, featuring Reimu Hakurei from Touhou Project (ZUN). Touhou Project permits non-commercial fan works; this project is a non-commercial open-source project with no copyright concerns.


🚀 Quick Start

Download & Use

Releases are split into two separate packages — download both from Releases:

| Package | Contents |
|---|---|
| Caelum_vX.Y.Z.zip | GUI application |
| Models_YYYYMMDD.zip | Model weights (date-versioned; only changed models are included per release) |

After extracting both archives, place the Models/ folder into the application directory:

Caelum/        ← app directory
└── Models/    ← extracted Models folder goes here

Then run Caelum.exe.

System Requirements

  • OS: Windows 10 version 2004 (build 19041) or later / Windows 11
  • Architecture: x64 or ARM64
  • .NET Runtime: .NET 10 Desktop Runtime
  • GPU (optional): Any DirectX 12 compatible GPU (NVIDIA / AMD / Intel)
    • If no DX12 GPU is available, inference will automatically fall back to CPU (slower)

Adding a New Language

The app uses a JSON sidecar localization system. No recompilation is needed — just add a file.

Steps:

  1. Go to the Langs/ folder next to the executable.

  2. Create a new JSON file named with the BCP-47 culture code (e.g. fr-FR.json for French, ko-KR.json for Korean).

  3. Copy the contents of en-US.json as a template and translate all values in the "Strings" section. Set Metadata.DisplayName to the language's own native name (this appears in the Settings menu):

    {
      "Metadata": {
        "Culture": "fr-FR",
        "DisplayName": "Français",
        "Author": "Your Name"
      },
      "Strings": {
        "App.Title": "Caelum",
        "Menu.Settings": "Paramètres",
        "..."
      }
    }
  4. Restart the app. The new language will appear automatically under Settings → Language.

Note: Key names must not be changed. Only translate the values on the right side of each "Key": "Value" pair. Missing keys will fall back to displaying [KeyName] as a placeholder.
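Before restarting the app, a quick key comparison can catch renamed or missing keys. The Python sketch below is a hypothetical helper (it is not shipped with Caelum); it assumes only the file layout described above, with translations stored in a "Strings" object, and mimics the documented [KeyName] fallback.

```python
import json


def find_translation_issues(template: dict, translation: dict) -> dict:
    """Compare the "Strings" keys of a translation against the en-US template."""
    template_keys = set(template.get("Strings", {}))
    translated_keys = set(translation.get("Strings", {}))
    return {
        # Present in the template but not translated: the app would show [KeyName]
        "missing": sorted(template_keys - translated_keys),
        # Present only in the translation (e.g. a renamed key): never looked up
        "unknown": sorted(translated_keys - template_keys),
    }


def lookup(strings: dict, key: str) -> str:
    """Mimic the documented fallback: a missing key renders as [KeyName]."""
    return strings.get(key, f"[{key}]")


def check_files(template_path: str, translation_path: str) -> dict:
    """Load two language files, e.g. Langs/en-US.json and Langs/fr-FR.json, and compare them."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    with open(translation_path, encoding="utf-8") as f:
        translation = json.load(f)
    return find_translation_issues(template, translation)
```

For example, check_files("Langs/en-US.json", "Langs/fr-FR.json") returns the keys to fix; both lists should be empty before restarting the app.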


🏗️ Architecture

Current version: PPBUNet v1.2

A 2-level U-Net pyramid with a parallel geometry bypass, operating in three stages: Palette extracts global color prototypes → Painter reconstructs global structure → Brush performs geometry refinement and upsampling.

flowchart TD
    In(["Input Image"]) --> SE["Shallow Feature Extraction"]

    SE -->|"parallel bypass"| GB["Geometry Bypass<br/>Directional · Full-resolution"]
    SE --> L0

    subgraph Encoder["Encoder — 2-Level Pyramid"]
        L0["Level 0 · 64ch"] --> L1["Level 1 · 128ch"]
    end

    L1 --> FR

    subgraph BN["Bottleneck — Frequency Decoupled"]
        FR["Frequency Router<br/>DC / AC split"]
        FR -->|"DC"| PAL["Palette<br/>Color prototype extraction"]
        FR -->|"AC"| PAI["Painter<br/>Global structure reconstruction"]
    end

    L1 -.->|"skip"| SCR1["Skip Refinement L1<br/>Filtered · Aligned"]
    PAI --> SCR1
    SCR1 --> DL1["Decoder Level 1<br/>128ch · Attention"]
    DL1 --> R1["Residual L1"]

    L0 -.->|"skip"| SCR0["Skip Refinement L0<br/>Filtered · Aligned"]
    R1 --> SCR0
    SCR0 --> DL0["Decoder Level 0<br/>64ch · Attention"]
    DL0 --> R0["Residual L0"]

    subgraph Brush["Brush Stage"]
        GR["Geometry Refinement<br/>Deformable edge & curve correction"]
        GR --> CM["Color Modulation<br/>Palette-guided broadcast"]
    end

    GB -->|"geometry prior"| GR
    R0 --> GR
    PAL -->|"palette"| CM
    CM --> FUSE
    SE -->|"latent residual"| FUSE["Feature Fusion"]
    FUSE --> UP["Adaptive ×4 Upsampler<br/>Coordinate-aware · Phase-zero residual"]
    UP --> Out(["Output Image · 4H × 4W"])

classDef io fill:#4A90D9,stroke:#2C5F8A,color:#fff
classDef se fill:#5D6D7E,stroke:#2E4057,color:#fff
classDef bypass fill:#8E44AD,stroke:#5B2C6F,color:#fff
classDef enc fill:#2980B9,stroke:#1A4F7A,color:#fff
classDef btn fill:#E67E22,stroke:#9A5C12,color:#fff
classDef pal fill:#D4AC0D,stroke:#8B6E0A,color:#fff
classDef skip fill:#16A085,stroke:#0B6B5A,color:#fff
classDef dec fill:#2471A3,stroke:#154360,color:#fff
classDef brush fill:#C0392B,stroke:#7B241C,color:#fff
classDef up fill:#1ABC9C,stroke:#0E7560,color:#fff

class In,Out io
class SE se
class GB bypass
class L0,L1 enc
class FR,PAI btn
class PAL pal
class SCR0,SCR1 skip
class DL1,DL0,R1,R0 dec
class GR,CM brush
class FUSE,UP up

Architecture Stages

| Stage | Function |
|---|---|
| Shallow Feature Extraction | Converts the input image to a shared latent space; provides a residual bypass to the upsampler |
| Geometry Bypass | Parallel full-resolution branch capturing directional high-frequency features before any downsampling |
| Encoder | 2-level pyramid that progressively aggregates multi-scale spatial context |
| Frequency Router | Explicitly splits the latent into low-freq DC (color / flat regions) and high-freq AC (edges / lines) streams |
| Palette | Extracts global color prototypes from the DC stream; conditions the Brush stage on a stable color prior |
| Painter | Reconstructs global high-frequency topology from the AC stream; tracks long-range structural continuity |
| Skip Refinement | Filters and aligns encoder features before the decoder merge, suppressing propagation of compression artifacts |
| Decoder | 2-level attention-based decoder restoring spatial detail from bottleneck representations |
| Geometry Refinement | Deformable correction guided by the geometry bypass; recovers sharp edges, fine strokes, and curves |
| Color Modulation | Broadcasts palette prototypes across the feature map to enforce global color fidelity |
| Adaptive Upsampler | Coordinate-aware ×4 upsampling with phase-zero high-frequency residuals for alias-free reconstruction |
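The DC/AC split performed by the Frequency Router can be illustrated with a fixed low-pass filter: a local mean gives the low-frequency (DC) stream, and the residual gives the high-frequency (AC) stream. This NumPy sketch is hypothetical; the router in modules.py may be learned and its filter is not documented here, so the box filter and its radius are assumptions.

```python
import numpy as np


def box_blur(x: np.ndarray, radius: int) -> np.ndarray:
    """Local mean over a (2r+1) x (2r+1) window for features shaped (C, H, W)."""
    k = 2 * radius + 1
    padded = np.pad(x, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")
    # Summed-area table with a leading zero row/column: each window sum costs O(1)
    sat = np.pad(padded.cumsum(axis=1).cumsum(axis=2), ((0, 0), (1, 0), (1, 0)))
    h, w = x.shape[1], x.shape[2]
    window_sum = (sat[:, k:k + h, k:k + w] - sat[:, :h, k:k + w]
                  - sat[:, k:k + h, :w] + sat[:, :h, :w])
    return window_sum / (k * k)


def frequency_split(feat: np.ndarray, radius: int = 2):
    """Split features into a low-frequency DC stream and a high-frequency AC residual."""
    dc = box_blur(feat, radius)  # smooth color / flat-region content
    ac = feat - dc               # edges and line work left after removing the local mean
    return dc, ac
```

Because AC is defined as the residual, DC + AC reconstructs the input exactly, so the split itself loses no information; any specialization comes from what the Palette and Painter branches do with each stream.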

For full design rationale and module specifications, see Caelum/model/PPBUnet_v1/ARCHITECTURE.md.

Degradation Pipeline

When anime illustrations circulate online, they don't undergo a single compression — they go through a full multi-platform re-upload chain. The training data pipeline (dataset.py) simulates this chain online using 5 degradation modes, generating (LR, HR) pairs in real time each batch without pre-storing degraded images.

Degradation Modes

| Mode | Name | Scenario | Sample Rate |
|---|---|---|---|
| 0 | Pure Bicubic | Mathematical downsampling only (validation set) | n/a |
| 1 | Pre-blur + Bicubic | Simulates anti-aliased upload | 10% |
| 2 | Light Compression | Bicubic ↓4× + 1–3 random rounds of JPEG/WebP | 30% |
| 3 | Moderate Degradation | 3-stage high-order degradation lv2 (social platform re-upload) | 35% |
| 4 | Heavy Degradation | 3-stage high-order degradation lv3 (deep compression artifacts) | 25% |

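Training uses CaelumMixedDataset, which re-samples the degradation mode (and its random parameters) for every image each epoch. The same HR image therefore yields a different LR input over the course of training, which acts as strong data augmentation.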
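The per-epoch mode re-sampling can be sketched in plain Python using the sample rates from the table above. MixedDegradationSketch and its degrade callable are hypothetical stand-ins, not the real CaelumMixedDataset interface.

```python
import random

# Training sample rates from the table above; mode 0 (pure bicubic) is validation-only.
MODE_WEIGHTS = {1: 0.10, 2: 0.30, 3: 0.35, 4: 0.25}


def sample_mode(rng: random.Random) -> int:
    modes = list(MODE_WEIGHTS)
    return rng.choices(modes, weights=[MODE_WEIGHTS[m] for m in modes], k=1)[0]


class MixedDegradationSketch:
    """Hypothetical stand-in for the mode selection in CaelumMixedDataset."""

    def __init__(self, hr_images, degrade, seed: int = 0):
        self.hr_images = hr_images
        self.degrade = degrade  # assumed interface: degrade(hr, mode, rng) -> lr
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.hr_images)

    def __getitem__(self, idx: int):
        hr = self.hr_images[idx]
        # A new mode is drawn on every access, so each epoch pairs this HR image
        # with a freshly degraded LR input instead of a pre-stored one.
        mode = sample_mode(self.rng)
        return self.degrade(hr, mode, self.rng), hr, mode
```

With a multi-process data loader, each worker should get its own seed; otherwise every worker would copy the same RNG state and draw identical modes.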

3-Stage High-Order Degradation Chain (Mode 3/4)

HR Original
  │
  ▼ Stage 1 — Creator Upload
  ├─ 50% chance pre-blur (Gaussian r=2 / r=1+Box)
  ├─ Random rescale (50%→0.5× · 25%→1.0× · 25%→uniform sample)
  └─ JPEG/WebP compression (q = 75–95)  [JPEG 70% / WebP 30%]
  │
  ▼ Stage 2 — Platform Re-upload
  ├─ 30% chance Sinc ringing (Hamming window, lv2: ω∈[2π/3,π] / lv3: ω∈[π/3,2π/3])
  ├─ 50% chance secondary blur
  ├─ Bilinear/Bicubic random rescale → target LR size
  ├─ DCT grid shift 1–7px (breaks quantization grid alignment, produces realistic overlapping block artifacts)
  └─ JPEG/WebP compression (q = 50–80)
  │
  ▼ Stage 3 — End-user Retrieval
  ├─ lv2: 25% / lv3: 50% chance screenshot upscale + recompress
  ├─ DCT grid shift + final compression (lv2: q=40–75 / lv3: q=10–40)
  └─ Restore coordinate alignment
  │
  ▼ LR Output
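The Stage 2 sinc-ringing step can be illustrated with a textbook Hamming-windowed sinc low-pass filter. This NumPy sketch uses a separable 1-D kernel for simplicity; that is an assumption, and dataset.py may use a different 2-D kernel, kernel size, or its À Trous multi-scale variant. It shows that a cutoff well below π produces overshoot and undershoot next to a step edge.

```python
import numpy as np


def windowed_sinc_taps(omega_c: float, size: int = 21) -> np.ndarray:
    """1-D Hamming-windowed sinc low-pass taps; cutoff omega_c in rad/sample (0 < omega_c <= pi)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the filter has a centre tap")
    n = np.arange(size) - size // 2
    # Ideal low-pass response sin(omega_c * n) / (pi * n); np.sinc(x) = sin(pi x) / (pi x)
    taps = (omega_c / np.pi) * np.sinc(omega_c * n / np.pi)
    taps = taps * np.hamming(size)  # taper the truncated tails
    return taps / taps.sum()        # unit DC gain keeps flat regions unchanged


def apply_separable(img: np.ndarray, taps: np.ndarray) -> np.ndarray:
    """Filter a 2-D image along rows, then columns, with reflect padding."""
    pad = len(taps) // 2
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, taps, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, taps, mode="valid"), 0, rows)
```

A lv3 cutoff could, for example, be drawn with np.random.default_rng().uniform(np.pi / 3, 2 * np.pi / 3). In this separable form a cutoff of exactly π samples the sinc at its zeros, so the kernel collapses to a single tap and ringing only appears once the cutoff is below π.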

Key Technical Details

| Technique | Implementation | Purpose |
|---|---|---|
| JPEG/WebP Piecewise Linear Mapping | jpeg_quality_to_webp() | WebP is far more efficient than JPEG at low quality settings, so reaching equal perceptual degradation requires mapping each JPEG quality to a different WebP quality |
| DCT Grid Shift | break_dct_grid(): cyclic shift of 1–7 px | Offsets the codec's block grid between compression passes so their block boundaries do not coincide, producing realistic multi-compression block artifacts |
| Sinc Ringing | Hamming-windowed sinc + À Trous multi-scale | Simulates the ringing overshoot (Gibbs phenomenon) introduced by downsampling/resampling |
| Mixed Interpolation | 50% Bicubic / 50% Bilinear | Covers differences in scaling algorithms across platforms |
| Geometric Augmentation | Horizontal flip × Vertical flip × 90° rotation | 8 combinations, 8× effective data expansion |
| In-Memory Compression | io.BytesIO in-memory encode/decode | Full DCT encode/decode ensures authentic degradation with no filesystem overhead |
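Two of the techniques above are simple enough to sketch directly. The NumPy code below is hypothetical and is not the actual break_dct_grid() from dataset.py: it rolls the image by 1–7 px, runs an arbitrary compress callable supplied by the caller (for example an in-memory JPEG round trip), and rolls back so the content stays aligned with the HR target. It also enumerates the 8 flip/rotation combinations.

```python
from typing import Callable

import numpy as np


def break_dct_grid(img: np.ndarray, compress: Callable[[np.ndarray], np.ndarray],
                   rng: np.random.Generator) -> np.ndarray:
    """Compress `img` with its block grid offset by a random cyclic shift of 1-7 px."""
    dy, dx = (int(v) for v in rng.integers(1, 8, size=2))  # high bound is exclusive
    shifted = np.roll(img, shift=(dy, dx), axis=(0, 1))
    restored = compress(shifted)  # e.g. an in-memory JPEG/WebP round trip
    # Undo the shift so the content lines up with the HR target again; the block
    # boundaries from this pass now sit at an offset from those of earlier passes.
    return np.roll(restored, shift=(-dy, -dx), axis=(0, 1))


def flip_rot_augment(img: np.ndarray, hflip: bool, vflip: bool, rot90: bool) -> np.ndarray:
    """One of the 2 x 2 x 2 = 8 flip/rotation combinations for (H, W) or (H, W, C) arrays."""
    if hflip:
        img = img[:, ::-1]
    if vflip:
        img = img[::-1, :]
    if rot90:
        img = np.rot90(img)  # rotates in the (H, W) plane
    return img
```

When augmenting a training pair, the same three flags must be applied to both the LR and HR images so they stay aligned.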

Loss Function Design

CaelumLossV2 coordinates 13 sub-losses spanning pixel, color, frequency, spatial, perceptual, and adversarial dimensions, using a two-phase progressive activation strategy.

Two-Phase Progressive Strategy

Training Progress
0%────────────────────30%──────────────────────────────────────100%
│       Phase 1        │                 Phase 2                  │
│ Pixel + Color Anchor │ + High-freq + Perceptual + Adversarial   │
└──────────────────────┴──────────────────────────────────────────┘

Phase 1 lets the network converge to the correct color and pixel distribution; Phase 2 then introduces stronger constraints to refine edges, frequency content, and semantic detail, avoiding early-stage gradient oscillation.

Sub-Loss Overview

Phase 1 (active throughout)

| Loss | Weight | Purpose |
|---|---|---|
| L1 | 1.0 | Pixel-level absolute error baseline |
| AdaptiveDCAnchorLoss | 1.0 | Scharr gradient energy drives exponentially decaying soft weights that amplify L1 in flat regions and fade out in textured regions, avoiding the variance instability of a hard threshold |
| OklchColorLoss | 5.0 | OKLCH perceptual color space: joint chroma L1 + hue cosine constraint, computed without atan2 |
| StrictFlatTGVLoss | 1.0 | A morphological hard mask isolates flat regions; a Charbonnier penalty drives 1st and 2nd derivatives toward 0, eliminating ripple in flat areas |
| SmoothGradientHessianLoss | 1.5 | Structure-tensor-guided Hessian penalty on smooth-gradient regions; suppresses color banding and micro-ripple in gradation areas |
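The atan2-free hue term in OklchColorLoss follows from the identity cos(Δh) = (a₁a₂ + b₁b₂) / (C₁C₂), where C = √(a² + b²) is the Oklab chroma. The NumPy sketch below assumes the inputs are already converted to Oklab; the hue weight, the epsilon, and weighting the hue error by GT chroma are illustrative assumptions, not values taken from losses.py.

```python
import numpy as np


def oklch_color_loss(lab_pred: np.ndarray, lab_gt: np.ndarray,
                     hue_weight: float = 1.0, eps: float = 1e-8) -> float:
    """Chroma L1 + hue-cosine loss on Oklab arrays shaped (..., 3) = (L, a, b)."""
    a1, b1 = lab_pred[..., 1], lab_pred[..., 2]
    a2, b2 = lab_gt[..., 1], lab_gt[..., 2]
    c1 = np.sqrt(a1 ** 2 + b1 ** 2 + eps)  # chroma of prediction
    c2 = np.sqrt(a2 ** 2 + b2 ** 2 + eps)  # chroma of ground truth
    chroma_term = np.abs(c1 - c2).mean()
    # cos(delta h) from a dot product of the (a, b) vectors: no atan2 needed
    cos_dh = (a1 * a2 + b1 * b2) / (c1 * c2)
    # Hue is unstable for near-gray pixels, so weight the hue error by GT chroma
    hue_term = ((1.0 - cos_dh) * c2).sum() / (c2.sum() + eps)
    return float(chroma_term + hue_weight * hue_term)
```

Using the dot product avoids atan2's discontinuity at ±π, so two hues on either side of the wrap-around point still produce a small, smooth error.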

Phase 2 (added when training progress ≥ 30%)

| Loss | Weight | Purpose |
|---|---|---|
| ChromaGradientLoss | 1.5 | Sobel gradients of the Oklab a/b chroma channels are constrained to match GT, preventing color bleeding |
| CreviceColorLoss | 6.0 | Morphological closing detects line crevices; corrects the hue shift caused by JPEG 4:2:0 chroma subsampling |
| MaskedAsymmetricHistogramLoss | 1.5 | Soft histogram over edge-dilated regions; an asymmetric divergence heavily penalizes "hallucinated" colors (×5) and lightly penalizes "unrecovered" detail (×1) |
| GibbsRingingPenaltySWT | 4.0 | Haar SWT with three sub-bands (HL/LH/HH) × À Trous multi-scale (d=1,2,4); one-sided penalty on high-frequency overshoot that leaves normal sharpening untouched |
| AngularFluencyLoss | 5.0 | Farid 7×7 rotation-equivariant operator measures the angular distance between gradient directions, targeting super-resolution aliasing |
| MacroscopicTurningPointLoss | 0.5 | Dilated Scharr (d=2) + 11×11 Gaussian macroscopic integration; contrast-invariant corner response C = 4·det(S)/trace(S)² |
| LaplacianResonanceTopologyLoss | 1.5 | Dual-scale dilated Laplacian resonance + R^75 cosine topology + Charbonnier intensity; restores crevice topology |
| AnimePerceptualLossV2 | 0.5 | Danbooru ConvNeXt cosine manifold distance (stage0 + stage1); GT magnitude gating focuses on edge regions |
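The two-phase activation and the weights from the tables above combine into a single weighted sum. This plain-Python sketch is hypothetical: it takes precomputed sub-loss values and uses a hard switch at 30% progress, whereas CaelumLossV2 may ramp Phase 2 in more gradually.

```python
# Weights copied from the Phase 1 / Phase 2 tables above.
PHASE1_WEIGHTS = {
    "L1": 1.0,
    "AdaptiveDCAnchorLoss": 1.0,
    "OklchColorLoss": 5.0,
    "StrictFlatTGVLoss": 1.0,
    "SmoothGradientHessianLoss": 1.5,
}
PHASE2_WEIGHTS = {
    "ChromaGradientLoss": 1.5,
    "CreviceColorLoss": 6.0,
    "MaskedAsymmetricHistogramLoss": 1.5,
    "GibbsRingingPenaltySWT": 4.0,
    "AngularFluencyLoss": 5.0,
    "MacroscopicTurningPointLoss": 0.5,
    "LaplacianResonanceTopologyLoss": 1.5,
    "AnimePerceptualLossV2": 0.5,
}
PHASE2_START = 0.30  # fraction of total training progress


def active_weights(progress: float) -> dict:
    """Sub-loss weights active at `progress` in [0, 1] (hard switch, assumed)."""
    weights = dict(PHASE1_WEIGHTS)
    if progress >= PHASE2_START:
        weights.update(PHASE2_WEIGHTS)
    return weights


def total_loss(sub_losses: dict, progress: float) -> float:
    """Weighted sum of precomputed sub-loss values; inactive losses are skipped."""
    return sum(w * sub_losses[name] for name, w in active_weights(progress).items())
```

Phase 1 alone uses 5 sub-losses; from 30% onward all 13 are active.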

GAN Components (optional)

| Component | Description |
|---|---|
| DecoupledUNetDiscriminatorSN | A guided-filter front end decomposes the input into structure and texture streams; the structure branch is a full-power U-Net and the texture branch uses lightweight global statistics. This prevents D from pushing G to hallucinate unrecoverable textures |
| DecoupledGANLoss | Structure adversarial weight ×1.0, texture adversarial weight ×0.1 (texture_tolerance); spectral normalization stabilizes training |
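The structure/texture split in the discriminator front end can be illustrated with the classic self-guided filter (He et al.), where the image serves as its own guide: the filtered output keeps strong edges (structure) and the residual holds fine texture. Self-guidance, single-channel input, the radius, and the epsilon are assumptions; the actual front end in Caelum is not specified here.

```python
import numpy as np


def box_mean(x: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window of a 2-D array, with edge replication."""
    k = 2 * r + 1
    padded = np.pad(x, r, mode="edge")
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = x.shape
    return (sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]) / (k * k)


def guided_split(img: np.ndarray, radius: int = 4, eps: float = 1e-2):
    """Self-guided filter: returns (structure, texture) with structure + texture == img."""
    mean = box_mean(img, radius)
    var = box_mean(img * img, radius) - mean ** 2
    # Per-window linear model q = a * I + b; a -> 1 on strong edges, a -> 0 on flat areas
    a = var / (var + eps)
    b = mean - a * mean
    structure = box_mean(a, radius) * img + box_mean(b, radius)
    texture = img - structure  # fine detail the texture branch would see
    return structure, texture
```

Unlike a plain blur, the structure stream keeps high-contrast line art almost intact, so lines stay in the structure stream instead of leaking into the texture branch.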

MIM Auxiliary Loss

During training, the InfoNCE mutual-information loss on the skip connections is exposed as model.mi_loss; the recommended weight is λ=0.01:

loss = criterion(pred, hr) + 0.01 * model.mi_loss
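model.mi_loss is computed inside the model. As a generic illustration of what an InfoNCE term computes, the sketch below treats row i of two feature batches as a positive pair and every other row as a negative. Which skip features Caelum pairs, any projection head, and the temperature are not documented here, so they are assumptions. Minimizing this loss maximizes a lower bound on the mutual information between the two views.

```python
import numpy as np


def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE for (N, D) batches: (z_a[i], z_b[i]) are positives, other rows of z_b negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy with diagonal targets
```

Aligned pairs give a loss near 0, while mismatched pairs give a large loss, which is the signal that encourages skip features to stay informative about each other.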

BADI Gate Regularization (optional)

GateTolerancePenalty applies a hinge penalty to model.badi.last_gate over GT flat regions, discouraging a lazy gate that would let shallow-layer noise leak through. The penalty is cosine-annealed to zero by 70% of training progress; the recommended weight is λ≈0.001:

loss = loss + 0.001 * gate_penalty(model.badi.last_gate, gt)
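The schedule and hinge described above can be sketched as follows. The cosine curve shape, the tolerance value, and the convention that a gate value near 1 lets shallow features through are assumptions; the real GateTolerancePenalty may define them differently. flat_mask stands for a GT flat-region mask (for example, pixels with low gradient magnitude).

```python
import math

import numpy as np


def gate_penalty_weight(progress: float, base_weight: float = 0.001, end: float = 0.70) -> float:
    """Cosine-anneal from base_weight at progress 0 down to 0 at progress `end`."""
    if progress >= end:
        return 0.0
    return base_weight * 0.5 * (1.0 + math.cos(math.pi * progress / end))


def gate_hinge_penalty(gate: np.ndarray, flat_mask: np.ndarray, tolerance: float = 0.1) -> float:
    """Mean hinge on gate values above `tolerance` inside GT-flat pixels (flat_mask == 1)."""
    excess = np.maximum(gate - tolerance, 0.0) * flat_mask
    return float(excess.sum() / max(float(flat_mask.sum()), 1.0))
```

In a training step this would read loss = loss + gate_penalty_weight(progress) * gate_hinge_penalty(gate, flat_mask), so the regularizer fades out before the final 30% of training.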

📊 Experimental Results

This project is not aimed at academic publication and does not include controlled ablation studies.

The architectural choices are derived from theory rather than validated by ablation; they are intended to be state of the art for the specific scenario of restoring real-world internet image degradation.

If you process an image with it and the result looks good — that's enough.


📁 Project Structure

Caelum/                            ← Repository root
├── .github/
│   └── FUNDING.yml
├── Caelum/                        ← Python project directory
│   ├── assets/
│   │   ├── logo.png               # Project logo
│   │   ├── demo0.png              # GUI demo image
│   │   ├── demo1.png              # GUI demo image
│   │   └── compare/               # Comparison images
│   │       ├── degradation.png
│   │       ├── GT.png
│   │       ├── PPBUnet_CAR2_x4.png
│   │       └── waifu2x_SwinUNet_noise2_x4.png
│   └── model/
│       └── PPBUnet_v1/
│           ├── ARCHITECTURE.md       # Detailed architecture design doc
│           ├── PPBUNet_v1_x4.py      # Main network definition       [AGPL-3.0]
│           ├── modules.py            # Core module library           [AGPL-3.0]
│           ├── hat.py                # HAT Decoder                   [AGPL-3.0]
│           ├── ps_mamba.py           # PS-Mamba SSM module           [AGPL-3.0]
│           ├── dataset.py            # Online degradation pipeline   [CC BY-NC-SA 4.0]
│           ├── losses.py             # Custom loss function system   [CC BY-NC-SA 4.0]
│           └── train.py              # Training script               [AGPL-3.0]
├── README.md
├── README_zh.md
├── LICENSE
└── LICENSE-AGPL-3.0

Model weights and packaged applications (*.onnx, Caelum.exe) are distributed via Releases under CC BY-NC-SA 4.0.


🗺️ Roadmap

✅ Completed

  • PPBUNet v1.2 architecture design (ParallelOAM · FrequencyRouter · MIM · RMA · HAT · CornerAwareDCN · AnimeCommitteeRefiner · SATUpsampler)
  • Multi-platform degradation simulation pipeline (5 modes · 3-stage high-order chain · online real-time generation)
  • Custom loss function system (CaelumLossV2 · 13 sub-losses · two-phase progressive strategy · committee orthogonality + BADI gate regularizer)
  • Early checkpoint validation

🔄 In Progress

  • Complete model training → update final results showcase
  • Package exe + ONNX export → publish Release

🔭 Future Plans

  • Hair reconstruction — Recovery of hair tips and line-crevice detail is the current biggest weakness; planning hair-aware perceptual loss and targeted geometry refinement modules
  • Residual-free architecture exploration — Skip-add residuals have a fundamental limitation in artifact suppression (harmful input information is difficult to cut off); exploring purely attention/Mamba forward architectures free of skip-add residuals
  • New architecture exploration — Building on PPBUNet experience, continuing to explore more efficient and interesting anime SR architecture directions
  • Expand training dataset — Current dataset scale and diversity remain a bottleneck; planning to incorporate larger-scale anime illustration data (Danbooru · Pixiv etc.) while researching data cleaning and quality filtering pipelines

🌌 Origin Story

Years ago, I just wanted to upscale the anime illustrations I loved so they could be desktop wallpapers.

waifu2x was the first time I realized that neural networks could do this remarkably well — and at the time, the results were a complete paradigm shift over traditional interpolation upscaling. It sparked an intense curiosity in me — how does it do that?

To find out, I started learning deep learning and built my first super-resolution network, Entropia. Life got in the way and I set it aside for a long time.

Now, with the help of LLMs, I'm back — standing on the shoulders of my past self and everyone who came before.

Caelum is not a paper, not a research contribution. It's simply a continuation of a question — is there anything wrong with wanting the things I love to be a little clearer?

I've benefited too much from too many people's unpaid contributions along this road.


❤️ Support This Project

Caelum is and always will be free. If it helped you, you can buy me a coffee via Ko-fi — it directly translates into more GPU time.

ko-fi


📄 License

This project uses a dual license model:

| Scope | License | Key Constraint |
|---|---|---|
| Network architecture & training source code (PPBUNet_v1_x4.py, modules.py, hat.py, ps_mamba.py, train.py) | AGPL-3.0 | Derivatives must be open-sourced (including network services); commercial use permitted |
| Degradation/loss code + model & application release files (dataset.py, losses.py, *.onnx, Caelum.exe) | CC BY-NC-SA 4.0 | Commercial use prohibited; derivatives must use the same license |

For commercial use of the CC BY-NC-SA 4.0 components (degradation/loss code, model weights, or the packaged application), please contact the author for a separate commercial license.


📬 Acknowledgements

Thanks to the author of waifu2x.

Thanks to everyone willing to make the world a little better.


"Look at the sky — I'm still here."

About

Free 4×(2×) anime illustration super-resolution & artifact removal. Targets real-world internet compression (Pixiv/X/Facebook multi-upload chains).
