An API-first platform for ingesting, browsing, and analysing billion-scale AI image datasets. Built with ASP.NET Core minimal APIs and a Blazor WebAssembly client.
- API-Driven Lifecycle: Dataset creation, ingestion status, and item retrieval exposed via REST endpoints.
- Virtualized Viewing: Only render what the user sees while prefetching nearby items for buttery scrolling.
- Sliding-Window Infinite Scroll: Browse very large image datasets with a fixed-size in-memory window, loading pages ahead/behind as you scroll while evicting old items to avoid WebAssembly out-of-memory crashes.
- Streaming Ingestion (Roadmap): Designed for chunked uploads and background parsing to avoid memory spikes.
- Shared Contracts: Typed DTOs shared between client and server for end-to-end consistency.
- Modular Extensibility: Pluggable parsers, modalities, and viewers via dependency injection.
- Observability Ready: Hooks for telemetry, structured logging, and health endpoints.
- .NET 8.0 SDK or later
- Modern web browser (Chrome, Firefox, Safari, Edge)
- ~2GB RAM for development
- ~100MB disk space
```bash
git clone <your-repo-url>
cd HartsysDatasetEditor
dotnet restore
dotnet build
```

```bash
# Terminal 1 - Minimal API (serves dataset lifecycle routes)
dotnet run --project src/HartsysDatasetEditor.Api

# Terminal 2 - Blazor WebAssembly client
dotnet run --project src/HartsysDatasetEditor.Client
```

Both projects share contracts via `HartsysDatasetEditor.Contracts`. The API currently uses in-memory repositories for smoke testing.
Navigate to: https://localhost:5001 (client dev server). Ensure the API is running at https://localhost:7085 (default Kestrel HTTPS port) or update the client's appsettings.Development.json accordingly.
Support for uploading and ingesting datasets is being rebuilt for the API-first architecture. The previous client-only ingestion flow has been removed. Follow the roadmap below to help implement the new streaming ingestion pipeline. For now, smoke-test the API using the built-in in-memory dataset endpoints:
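For example, with the API running on its default port, the lifecycle can be exercised with `curl` (the JSON body for dataset creation is illustrative; check the Contracts project for the real DTO shape):

```shell
# Create a dataset stub (request body shape assumed)
curl -k -X POST https://localhost:7085/api/datasets \
  -H "Content-Type: application/json" \
  -d '{"name": "smoke-test"}'

# List datasets
curl -k https://localhost:7085/api/datasets

# Inspect one dataset and page its items (substitute a real id)
curl -k https://localhost:7085/api/datasets/<id>
curl -k "https://localhost:7085/api/datasets/<id>/items?pageSize=100"
```

The `-k` flag skips certificate validation, which is convenient before the dev certificate is trusted; drop it once `dotnet dev-certs https --trust` has been run.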
```
POST /api/datasets                            // create dataset stub
GET  /api/datasets                            // list datasets
GET  /api/datasets/{id}                       // inspect dataset detail
GET  /api/datasets/{id}/items?pageSize=100    // page items
```

```
HartsysDatasetEditor/
├── src/
│   ├── HartsysDatasetEditor.Api/            # ASP.NET Core minimal APIs for dataset lifecycle + items
│   │   ├── Extensions/                      # Service registration helpers
│   │   ├── Models/                          # Internal persistence models
│   │   └── Services/                        # In-memory repositories, ingestion stubs
│   ├── HartsysDatasetEditor.Client/         # Blazor WASM UI
│   │   ├── Components/                      # Viewer, Dataset, Filter, Common UI pieces
│   │   ├── Services/                        # State management, caching, API clients (roadmap)
│   │   └── wwwroot/                         # Static assets, CSS, JS
│   └── HartsysDatasetEditor.Contracts/      # Shared DTOs (pagination, datasets, filters)
│
├── tests/                                   # Unit tests
└── README.md
```
The editor follows a strictly API-first workflow so that every client action flows through the HTTP layer before touching storage. High-level components:
- Blazor WebAssembly Client: virtualized viewers, upload wizard, and caching services that call the API via typed `HttpClient` wrappers. Prefetch and IndexedDB caching are planned per docs/architecture.md.
- ASP.NET Core Minimal API: orchestrates dataset lifecycle, ingestion coordination, and cursor-based item paging. Background hosted services handle ingestion and stub persistence today.
- Backing Services: pluggable storage (blob), database (PostgreSQL/DynamoDB), and search index (Elasticsearch/OpenSearch) abstractions so we can swap implementations as we scale.

See docs/architecture.md for the detailed blueprint, data flows, and phased roadmap.
- Uses cursor-based paging from the API to request small, contiguous chunks of items.
- Keeps a fixed-size in-memory window (`DatasetState.Items`) instead of materializing all N items on the client.
- Slides the window forward and backward as you scroll, evicting old items from memory to avoid WebAssembly out-of-memory crashes.
- Rehydrates earlier or later regions of the dataset from IndexedDB (when enabled) or the API when you scroll back.
1. Start the API

   ```bash
   dotnet run --project src/HartsysDatasetEditor.Api
   ```

   By default this listens on https://localhost:7085. Trust the dev certificate the first time.

2. Start the Blazor WASM client

   ```bash
   dotnet run --project src/HartsysDatasetEditor.Client
   ```

   The dev server hosts the static client at https://localhost:5001.

3. Configure the client-to-API base address

   - The client reads `DatasetApi:BaseAddress` from `wwwroot/appsettings.Development.json`. Leave it at the default `https://localhost:7085`, or update it if the API port changes.

4. Browse the app

   - Navigate to https://localhost:5001. The client will call the API for dataset lists/items.
   - Verify CORS is enabled for the WASM origin once the API CORS policy is implemented (see roadmap).
When deploying as an ASP.NET Core hosted app, the API project can serve the WASM assets directly; until then, the two projects run side-by-side as above.
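For reference, a minimal `wwwroot/appsettings.Development.json` matching the key named above might look like this (the surrounding shape is a sketch; only the `DatasetApi:BaseAddress` key is confirmed by this README):

```json
{
  "DatasetApi": {
    "BaseAddress": "https://localhost:7085"
  }
}
```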
- ASP.NET Core 8.0: Minimal API hosting and background services
- Blazor WebAssembly: Client-side SPA targeting the API
- MudBlazor: Material Design component library
- CsvHelper: Planned streaming ingestion parsing
- IndexedDB / LocalStorage: Client-side caching strategy (roadmap)
- Virtualization: Blazor's built-in `<Virtualize>` component
- `Microsoft.AspNetCore.Components.WebAssembly`
- `MudBlazor` - Material Design UI components
- `Blazored.LocalStorage` - Browser storage
- `CsvHelper` - CSV/TSV parsing
- No external dependencies (lightweight by design)
- Client configuration lives in `wwwroot/appsettings*.json`. Update the `DatasetApi:BaseAddress` when the API host changes.
- API configuration is stored in `appsettings*.json` under the `src/HartsysDatasetEditor.Api` project. Adjust logging and CORS settings here.
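As a sketch, the API's `appsettings.Development.json` might carry logging levels plus an origins list for the planned CORS policy. The `Logging` shape is the standard ASP.NET Core template default; the `Cors` section name and `AllowedOrigins` key are assumptions for illustration only:

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "Cors": {
    "AllowedOrigins": [ "https://localhost:5001" ]
  }
}
```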
- Create a parser implementing `IDatasetParser` in the ingestion pipeline.
- Register it in DI through a parser registry service.
- Add the format to the `DatasetFormat` enum and expose it via the API capability endpoint.
```csharp
public class MyFormatParser : IDatasetParser
{
    public bool CanParse(string data) { /* ... */ }
    public IAsyncEnumerable<IDatasetItem> ParseAsync(string data) { /* ... */ }
}
```

- Create a provider implementing `IModalityProvider`
- Register it in `ModalityProviderRegistry`
- Add the modality to the `Modality` enum
- Create a viewer component
- Virtualized rendering via `<Virtualize>` keeps browser memory flat while streaming new pages.
- API pagination uses cursor tokens and configurable page sizes to keep server memory bounded.
- Future ingestion jobs will stream upload parsing to avoid buffering entire files.
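The cursor flow above can be sketched with `curl`. Note the `cursor` query parameter and the idea of a continuation token in the response are assumptions here; check the Contracts project for the actual parameter and field names:

```shell
# First page: request a chunk of 100 items
curl -k "https://localhost:7085/api/datasets/<id>/items?pageSize=100"

# Pass the continuation token from the previous response back to the API
# to fetch the next contiguous chunk (parameter name assumed)
curl -k "https://localhost:7085/api/datasets/<id>/items?pageSize=100&cursor=<token-from-previous-response>"
```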
```bash
dotnet publish -c Release
```

Output in: `src/HartsysDatasetEditor.Client/bin/Release/net8.0/publish/`
- Build for production
- Copy `wwwroot` contents to the `gh-pages` branch
- Enable GitHub Pages in repo settings
- Create Static Web App in Azure Portal
- Configure build:
  - App location: `src/HartsysDatasetEditor.Client`
  - Output location: `wwwroot`
- Deploy via GitHub Actions
- Ensure both API and client are running before testing. API defaults to HTTPS, so trust the development certificate when prompted.
- Use Swagger/OpenAPI (coming soon) or tools like `curl`/`httpie`/Postman to verify endpoint availability.
- When modifying contracts, update both server and client references to avoid serialization errors.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing documentation
- Review the MVP completion status document
The detailed architecture, phased roadmap, and task checklist live in docs/architecture.md. Highlights:
- Infrastructure: ✅ API and shared contracts scaffolded; configure hosted solution + README updates.
- API Skeleton: In progress; dataset CRUD endpoints implemented with in-memory storage, upload endpoint pending.
- Client Refactor: Pending; migrate viewer to API-backed pagination and caching services.
- Ingestion & Persistence: Pending; implement streaming ingestion worker and backing database.
- Advanced Features: Pending; CDN integration, SignalR notifications, plugin architecture.
Current Version: 0.2.0-alpha
Status: API-first migration in progress
Last Updated: 2025