Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -191,3 +191,5 @@ nbdist/
.github/**
.gitignore
/src/main/resources/webui/**
data/**
start.sh
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2021-2024 Aleksandr Serdiukov, Anton Zamyatin, Aleksandr Sinitsyn, Vitalii Dravgelis and Computer Technologies Laboratory ITMO University team.
Copyright (c) 2021-2026 Aleksandr Serdiukov, Anton Zamyatin, Aleksandr Sinitsyn, Vitalii Dravgelis and Computer Technologies Laboratory ITMO University team.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
319 changes: 288 additions & 31 deletions README.adoc
@@ -4,69 +4,326 @@ image:https://github.com/AxisAlexNT/HiCT_JVM/actions/workflows/autobuild-release

== Launching pre-built version

**NOTE: currently only Windows (tested on 10 and 11) and Linux (with `glibc`, common Debian/Ubuntu are OK, Alpine users are out of luck) are supported, native libraries for MacOS are not bundled in these builds. Only AMD64 platform is supported. On Windows you might need to install https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170[additional libraries]**.

1. Install Java 19 or newer (older versions won't be able to launch this code);
1. Make sure that `JAVA_HOME` variable points to the correct installation path (if you have multiple JREs or JDKs);
1. Download latest "fat" JAR from the https://github.com/ctlab/HiCT_JVM/releases[*Releases* page] in *Assets* section. Latest build will usually be on top, however the most stable implementation is in the build from `master` branch (called "Latest autogenerated build (branch master)"). You can rename it to `hict.jar` for convenience;
1. Open a terminal and change directory to where the downloaded `hict.jar` is located;
1. Issue `java -jar hict.jar` command and wait until message `Starting WebUI server on port 8080 ... WebUI Server started` appears;
1. Open your browser and navigate to the `http://localhost:8080` where HiCT WebUI should now be available.
== For users of `.jar` distribution

=== Startup options
This section is intended for bioinformatics users who download a ready-to-run fat JAR from GitHub Releases.
You need to install Java 21+ (this project is built for Java 21 bytecode).
Download the latest fat JAR from the https://github.com/ctlab/HiCT_JVM/releases[Releases page] (Assets section).
**NOTE:** prebuilt native bundles are currently provided for *Windows* (tested on 10/11) and *Linux with glibc* (common Debian/Ubuntu-like distributions). Alpine/musl is not supported by these bundled binaries. Current prebuilt artifacts are AMD64-only. On Windows you might need to install https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170[Microsoft Visual C++ Redistributable].

Currently, there are multiple environment variables that could be set prior to launching HiCT.
=== Quick start

* `DATA_DIR` -- should be a path to the directory containing `.hict.hdf5`, `.agp` and `fasta` files. These files could be anywhere in subtree of this directory, it is scanned recursively.
* `VXPORT` -- should be an integer between `1` and `65535` denoting port number which will be served by HiCT API. Note that listening on ports below `1024` usually requires some kind of administrative privileges. If not provided, the default value is `5000`. Startup might fail if the port is already occupied by another service. Be sure to set correct port in Connection -> API Gateway field in HiCT WebUI if changed.
* `WEBUI_PORT` -- should be an integer between `1` and `65535` denoting port number which will be served by HiCT WebUI. Note that listening on ports below `1024` usually requires some kind of administrative privileges. If not provided, the default value is `8080`. Startup might fail if the port is already occupied by another service.
* `SERVE_WEBUI` -- should either be `true` or `false` telling whether to start serving HiCT WebUI on the desired port or not. Might be useful during debugging or when WebUI is served by another process. Default is `true`. This option does not have any effect in case WebUI is not packed into the jar file.
* `TILE_SIZE` -- should be an integer greater than one. Defines the default tile size for visualization. Experimental setting, currently might break WebUI renderer. Default is `256`. The greater the tile size is, the less tiles are shown on screen and therefore less requests are sent to the server, but each request could potentially take longer to process.
1. Download the latest `-fat.jar` from the Releases page (Assets) and rename it to `hict-fat.jar`.
2. Place your `.hict.hdf5`, `.mcool`, `.cool`, `.agp`, and `.fasta` files under a single directory.
3. Run:
+
```bash
java -jar hict-fat.jar start-server
```
+
The data directory is set with the `DATA_DIR` environment variable; by default, HiCT scans the subtree of the directory from which `hict-fat.jar` is launched.
On Linux, you can set it as follows:
+
```bash
DATA_DIR=/path/to/data/ java -jar hict-fat.jar start-server
```
+
4. Open WebUI at `http://localhost:8080`.

=== CLI commands (summary)

```bash
# API + WebUI (default mode, includes converters in WebUI as described below)
java -jar hict-fat.jar start-server

# API only (no WebUI)
java -jar hict-fat.jar start-api-server

# Convert .mcool -> .hict.hdf5 (CLI mode)
java -jar hict-fat.jar convert mcool-to-hict \
--input /data/sample.mcool \
--output /data/sample.hict.hdf5

# Convert .hict.hdf5 -> .mcool (CLI mode)
java -jar hict-fat.jar convert hict-to-mcool \
--input /data/sample.hict.hdf5 \
--output /data/sample.mcool
```

Get full CLI help:

```bash
java -jar hict-fat.jar --help
java -jar hict-fat.jar start-server --help
java -jar hict-fat.jar start-api-server --help
java -jar hict-fat.jar convert --help
java -jar hict-fat.jar convert mcool-to-hict --help
java -jar hict-fat.jar convert hict-to-mcool --help
```

=== WebUI conversion (Experimental / W.I.P.)

WARNING: WebUI conversion is experimental and may be slower or less stable than the CLI.

1. Open the WebUI.
2. Use *File → Convert Coolers*.
3. Track progress in the conversion window.

=== API access (Experimental / W.I.P.)

WARNING: The API is still evolving. Endpoints, parameters, and response formats may change.

Example (Python) for fetching a submatrix tile as an image:

```python
import requests

host = "http://localhost:5000"
params = {
    "version": 0,
    "bpResolution": 10000,
    "format": "PNG_BY_PIXELS",
    "row": 0,
    "col": 0,
    "rows": 512,
    "cols": 512,
}

An example of launching HiCT with parameters:
r = requests.get(f"{host}/get_tile", params=params)
r.raise_for_status()
data = r.json()
png_data_url = data["image"]
print(png_data_url[:64])
```

To apply visualization/normalization settings before fetching tiles:

* POST `/set_visualization_options` with visualization parameters.
* POST `/set_normalization` with normalization settings.
* Then call `/get_tile` as shown above.
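
The `get_tile` example above only prints the first characters of the `image` field. The sketch below shows one way to turn that field into raw PNG bytes, assuming it is a base64 `data:` URL (the variable name `png_data_url` suggests this, but the exact format is not documented here). The helper names `decode_png_data_url` and `looks_like_png` are hypothetical and not part of HiCT.

```python
import base64

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_data_url(data_url: str) -> bytes:
    """Return the raw bytes carried by a base64-encoded `data:` URL.

    Assumption: the value looks like `data:image/png;base64,<payload>`.
    """
    header, separator, payload = data_url.partition(",")
    if not separator or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(payload)


def looks_like_png(raw: bytes) -> bool:
    """Cheap sanity check before writing the bytes to a .png file."""
    return raw.startswith(PNG_SIGNATURE)
```

With the response from the example above, `open("tile.png", "wb").write(decode_png_data_url(data["image"]))` would save the tile to disk if the assumption about the format holds.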

=== Supported platforms / JDK details

* *OS/CPU (prebuilt libs):* Linux (glibc) and Windows, AMD64.
* *Not bundled by default:* macOS variants and Linux ARM variants.
* *JDK:* Java 21 or newer is required for running/building this repository (the project is built for Java 21 bytecode).

== Startup options and CLI

The fat JAR is runnable and exposes a CLI with subcommands:

* `start-server` -- API + WebUI (default when no args are given)
* `start-api-server` -- API only (no WebUI)
* `convert` -- conversion tools
** `convert mcool-to-hict`
** `convert hict-to-mcool`

Help:

==== *Linux, bash:*
```bash
DATA_DIR=/home/${USER}/hict/data SERVE_WEBUI=false java -jar hict.jar
java -jar hict.jar --help
java -jar hict.jar convert --help
java -jar hict.jar convert mcool-to-hict --help
```

==== *Windows, cmd:*
Environment variables supported by the server startup:

* `DATA_DIR` -- directory that is scanned recursively for `.hict.hdf5`, `.agp`, `.fasta`, `.cool`, and `.mcool` files.
* `VXPORT` -- API gateway port, default `5000`.
* `WEBUI_PORT` -- WebUI port, default `8080`.
* `SERVE_WEBUI` -- `true`/`false`, default `true`.
* `TILE_SIZE` -- default visualization tile size, default `256`.
* `MIN_DS_POOL` / `MAX_DS_POOL` -- min/max pool sizes used when opening chunked datasets.
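
When HiCT is launched from a script, it can help to validate these variables before starting the JVM. The following is a minimal sketch: the function `hict_environment` and its parameters are hypothetical, and only the variable names, defaults, and the `1`..`65535` port range come from this README.

```python
import os


def hict_environment(data_dir, api_port=5000, webui_port=8080,
                     serve_webui=True, base_env=None):
    """Build an environment mapping for launching HiCT.

    Ports are checked against the valid TCP range before anything is started.
    `base_env` defaults to a copy of the current process environment.
    """
    for name, port in (("VXPORT", api_port), ("WEBUI_PORT", webui_port)):
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"{name} must be between 1 and 65535, got {port}")
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "DATA_DIR": str(data_dir),
        "VXPORT": str(int(api_port)),
        "WEBUI_PORT": str(int(webui_port)),
        "SERVE_WEBUI": "true" if serve_webui else "false",
    })
    return env
```

The result can be passed to `subprocess.run(["java", "-jar", "hict.jar", "start-server"], env=hict_environment("/path/to/data"))`.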

=== Launch examples (fat JAR)

==== Linux (bash)

```bash
DATA_DIR=/home/${USER}/hict/data java -jar hict.jar

# API only
DATA_DIR=/home/${USER}/hict/data java -jar hict.jar start-api-server

# Explicit server (API + WebUI)
DATA_DIR=/home/${USER}/hict/data java -jar hict.jar start-server
```

==== Windows (cmd)

```cmd
set DATA_DIR="D:\hict\data"
set WEBUI_PORT="8888"
java -jar hict.jar
java -jar hict.jar start-server
```

==== *Windows, PowerShell:*
==== Windows (PowerShell)

```powershell
$env:DATA_DIR = "D:\hict\data"
$env:WEBUI_PORT = "8888"
java -jar hict.jar
java -jar hict.jar start-server
```

==== Custom JVM options

```bash
DATA_DIR=/home/${USER}/hict/data java -ea -Xms512M -Xmx16G -jar hict.jar start-api-server
```

=== Launch examples (Gradle, from source)

```bash
# Default: runs HiCT CLI (equivalent to `java -jar ...`)
./gradlew clean run

# Explicit modes
./gradlew run --args="start-server"
./gradlew run --args="start-api-server"
```

==== Custom JVM Options
== Converter workflows (`.mcool` ↔ `.hict.hdf5`)

Of course, you can also pass JVM parameters like this:
=== CLI commands

Use the JVM CLI for both directions:

```bash
DATA_DIR=/home/${USER}/hict/data SERVE_WEBUI=false java -ea -Xms512M -Xmx16G -jar hict.jar
# mcool -> hict
java -jar hict.jar convert mcool-to-hict \
--input /data/sample.mcool \
--output /data/sample.hict.hdf5

# hict -> mcool
java -jar hict.jar convert hict-to-mcool \
--input /data/sample.hict.hdf5 \
--output /data/sample.roundtrip.mcool
```

=== Startup errors
=== Web conversion API flow

Since library naming conventions are different for different platform and libraries, there is currently a mechanism to try and load each library under a different name. This CAN produce errors on server startup, you can ignore them if `Starting WebUI server on port 8080 ... WebUI Server started` message appeared in console.
Typical asynchronous conversion sequence used by WebUI/integrations:

If, however, server works but maps are not displayed in WebUI and an error sign displays at the bottom right corner of WebUI, you should check console for error output.
1. *Upload*: `POST /api/convert/upload`
* Upload source file and target format metadata.
* Response returns a `jobId`.
2. *Status polling*: `GET /api/convert/status/{jobId}`
* Poll until state becomes `DONE` or `FAILED`.
3. *Download*: `GET /api/convert/download/{jobId}`
* Download converted artifact when status is `DONE`.
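
The polling step of this sequence can be sketched as a small loop. This is an illustration only: `wait_for_conversion` is a hypothetical helper, and `fetch_status` is a caller-supplied function that performs the status request and returns the state string, because the response schema is not specified here. The `DONE` and `FAILED` states come from the list above; any other value is treated as "still running".

```python
import time


def wait_for_conversion(fetch_status, job_id, poll_seconds=2.0,
                        timeout_seconds=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll a conversion job until it reaches a terminal state.

    Returns "DONE" or "FAILED"; raises TimeoutError if neither is seen
    before `timeout_seconds` have elapsed.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_status(job_id)
        if state in ("DONE", "FAILED"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"conversion job {job_id} did not finish in time")
        # Wait before the next status request to avoid hammering the server.
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; in normal use the defaults are sufficient.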

== Obtaining `.hict.hdf5` files
Recommended size limits:

Currently, it's necessary to use https://github.com/ctlab/HiCT_Utils[`HiCT_Utils` package] for the file format conversion, there are plans to simplify this process.
* Keep upload limits explicit at ingress/proxy and app gateway.
* For JVM safety, avoid unbounded request bodies in production; set max request size and timeouts.
* For very large matrices, prefer direct local file conversion (CLI) and then load resulting artifacts through `DATA_DIR`.

== Building `HiCT_JVM` from source
== Scaffolding API behavior notes

Scaffolding operations are served as POST endpoints and return updated assembly information:

* `/reverse_selection_range`
* `/move_selection_range`
* `/split_contig_at_bin`
* `/group_contigs_into_scaffold`
* `/ungroup_contigs_from_scaffold`
* `/move_selection_to_debris`

Important tile-version expectation:

To start building from source, you can run:
* Tile requests use `GET /get_tile?...&version=<n>`.
* If the requested version is *older* than server-side tile version, server returns HTTP `204` (no tile body) to force client invalidation.
* If the requested version is newer, server advances the internal version counter.
* Practical client rule: after each scaffolding mutation, increment your tile version and refresh visible tile requests.
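
The client rule above can be sketched as a small bookkeeping class. This is not code from HiCT or its WebUI: `TileVersionTracker` and its method names are hypothetical, and only the `version` query parameter and the HTTP `204` behavior are taken from the notes above.

```python
class TileVersionTracker:
    """Client-side counter for the `version` parameter of `/get_tile`."""

    def __init__(self):
        self.version = 0

    def after_mutation(self):
        """Call after each successful scaffolding POST; returns the new version."""
        self.version += 1
        return self.version

    def tile_params(self, **params):
        """Attach the current version to a set of tile query parameters."""
        return {**params, "version": self.version}

    @staticmethod
    def is_stale_response(status_code):
        """HTTP 204 means the requested version is older than the server's."""
        return status_code == 204
```

A client would build every tile request through `tile_params(...)`, call `after_mutation()` after each scaffolding operation, and drop cached tiles and re-request visible ones whenever `is_stale_response` returns `True`.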

== Startup errors and JHDF5 native library troubleshooting

During startup, you may see several native-library load attempts with warnings/errors. This can be expected because different platform-specific library names are tried.

If startup completes and API/WebUI are healthy, these warnings can be non-fatal.

When native loading actually fails:

1. Confirm architecture match (AMD64 JVM + AMD64 native bundle).
2. Confirm OS compatibility (Linux glibc; not Alpine/musl).
3. On Linux, ensure native/plugin paths are discoverable, for example:
+
```bash
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/hdf5/lib:/path/to/hdf5/lib/plugin"
export HDF5_PLUGIN_PATH="/path/to/hdf5/lib/plugin"
```
4. On Windows, install/update Visual C++ runtime redistributables.
5. Verify Java version (`java -version`) is 21+.
6. If tiles fail to render but server starts, inspect logs for `UnsatisfiedLinkError` and HDF5 plugin load failures.
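
The Java version check in this list can be automated. The sketch below parses the major version from `java -version` output; `java_major_version` is a hypothetical helper, and it assumes the common `version "..."` banner format printed by OpenJDK and Oracle JDK builds.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Handles both the modern scheme ("21.0.2" -> 21) and the legacy
    "1.x" scheme used up to Java 8 ("1.8.0_392" -> 8).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major
```

Note that `java -version` writes its banner to standard error, so a script would read it with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` and compare the result against the required minimum.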

== Production checklist (short)

Before deploying to production, verify:

* Logging: structured logs, retention, and centralized collection.
* Metrics/health: request latency/error metrics and liveness/readiness checks.
* Limits: request body size, timeouts, and JVM heap sizing are set explicitly.
* Graceful shutdown: stop accepting traffic, finish in-flight requests, then terminate.
* Backup/cleanup: regular backup strategy for source/converted files and periodic cleanup of temporary/intermediate artifacts.

== Building `HiCT_JVM` from source

To build from source:

```bash
./gradlew clean build
```

=== Dependency management workflow

This project uses Gradle dependency locking (`gradle.lockfile`) to keep transitive dependency resolution reproducible.

* Refresh lock state after dependency changes:
+
```bash
./gradlew dependencies --write-locks
```
* Inspect the resolved version for a specific dependency before/after updates:
+
```bash
./gradlew dependencyInsight --dependency org.slf4j:slf4j-api --configuration runtimeClasspath
./gradlew dependencyInsight --dependency ch.qos.logback:logback-classic --configuration runtimeClasspath
./gradlew dependencyInsight --dependency org.jetbrains:annotations --configuration compileClasspath
```

Commit both `build.gradle.kts` and `gradle.lockfile` together whenever lock state changes.

Current progress on modifying HDF5 and JHDF5 configuration resides in https://github.com/AxisAlexNT/jhdf5-with-plugins-configuration-snapshot[my personal repository]. Modified configuration is necessary to rebuild native libraries (HDF5, HDF5 plugins and JHDF5 should all be built as dynamic libraries). However, prebuilt native libraries for AMD64 Windows and Linux platforms are already present in `HiCT_JVM` repository. Missing platforms are Linux on `armv7` and `aarch64` and MacOS (both `amd64` and `aarch64` variants).

== Conversion tools (CLI + API)

A native converter module is now available in the JVM codebase with two services:

* `McoolToHictConverter` (`mcool-to-hict`)
* `HictToMcoolConverter` (`hict-to-mcool`)

CLI launcher:

```bash
./gradlew runConversionCli --args="convert hict-to-mcool --input=/data/sample.hict.hdf5 --output=/data/sample.mcool --resolutions=10000,50000 --compression=4 --chunk-size=8192"
./gradlew runConversionCli --args="convert mcool-to-hict --input=/data/sample.mcool --output=/data/sample.hict.hdf5 --resolutions=10000,50000 --parallelism=16"
```

Arguments:

* `--input=<path>` source file path
* `--output=<path>` destination file path
* `--resolutions=<comma-separated>` optional resolution filter
* `--compression=<0..9>` deflate level (`0` means chunked/no deflate)
* `--chunk-size=<N>` chunk size for streaming traversal
* `--agp=<file.agp> --apply-agp` apply AGP before `hict-to-mcool` export
* `--parallelism=<N>` max worker threads (default: available CPU cores)
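
For batch conversions driven from a script, the arguments above can be assembled programmatically. The helper below is a hypothetical sketch, not part of HiCT: it covers only a subset of the documented flags and uses the `--name=value` form shown in this list.

```python
def conversion_args(direction, input_path, output_path,
                    resolutions=None, parallelism=None):
    """Build the argument list for the `convert` subcommand.

    `direction` must be one of the two documented conversion directions.
    Optional flags are only emitted when a value is provided.
    """
    if direction not in ("mcool-to-hict", "hict-to-mcool"):
        raise ValueError(f"unknown conversion direction: {direction}")
    args = ["convert", direction,
            f"--input={input_path}", f"--output={output_path}"]
    if resolutions:
        args.append("--resolutions=" + ",".join(str(r) for r in resolutions))
    if parallelism is not None:
        args.append(f"--parallelism={parallelism}")
    return args
```

The returned list could be appended to `["java", "-jar", "hict.jar"]` and passed to `subprocess.run`, assuming the fat JAR accepts the same flags as the Gradle launcher shown above.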

Web API endpoints:

* `POST /convert/upload` (multipart + query params: `direction`, `resolutions`, `compression`, `chunkSize`, `applyAgp`, `agpPath`, `parallelism`)
* `GET /convert/jobs/:jobId`
* `GET /convert/download/:jobId`

Conversion jobs are asynchronous, include streaming logs and error details, enforce an upload size limit, and clean up temporary files after a TTL.