Atlas is a decentralized Infrastructure-as-a-Service (IaaS) platform for fine-tuning and serving AI models. The platform uses blockchain for coordination and reward distribution, with nodes contributing compute and storage resources similar to cryptocurrency mining systems.
- Decentralized Training: Fine-tune AI models using distributed compute nodes
- Federated Learning: Privacy-preserving distributed training with gradient aggregation
- LoRA Fine-Tuning: Efficient fine-tuning with Low-Rank Adaptation
- Model Serving: Serve models for inference via HTTP/gRPC API
- Inference Network: Distributed network for LLM, Vision, Speech-to-text, and Embedding services
- Blockchain Coordination: Cosmos SDK blockchain for job coordination and reward distribution
- IPFS Storage: Decentralized storage for datasets, models, and checkpoints
- P2P Communication: Direct peer-to-peer communication without API Gateway
- Auto Resource Detection: Automatic detection of CPU, GPU, RAM, storage, and network speed
- Fault Tolerance: Automatic task reassignment, checkpoint recovery, and graceful degradation
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ATLAS PLATFORM β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β β Client β β Node 1 β β Node 2 β β
β β SDK β β β β β β
β ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ β
β β β β β
β βββββββββββββββββββ΄ββββββββββββββββββ€ β
β β β β
β β IPFS Pub/Sub β β
β β (Real-time messaging) β β
β β β β
β βββββββββββββββββββββββββββββββββββββββ€ β
β β β β
β β IPFS Storage β β
β β (Files: datasets, models) β β
β β β β
β βββββββββββββββββββ¬ββββββββββββββββββββ β
β β β
β β gRPC/RPC β
β β β
β ββββββββΌβββββββ β
β β Cosmos β β
β β Chain β β
β β (Blockchain)β β
β βββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- Blockchain Layer (
chain/): Cosmos SDK with 10 custom modules - Compute Node (
node/): Distributed compute nodes for training & inference - Federated Learning (
federated-learning/): Privacy-preserving distributed training - LoRA (
lora/): Efficient fine-tuning with Low-Rank Adaptation - Storage Layer (
storage/): IPFS-based decentralized storage - Client SDK (
sdk/): Python SDK with CLI and daemon mode
- Go 1.21+
- Python 3.10+
- Docker (optional, for containerized deployment)
- IPFS node (or use Infura/IPFS gateway)
- Cosmos SDK (for blockchain)
# Clone repository
git clone https://github.com/iamkyr0/atlas.git
cd atlas
# Setup Go dependencies
cd chain
go mod download
# Setup Python SDK
cd ../sdk/python
pip install -r requirements.txt
# Start IPFS node (or use Infura)
ipfs daemon# Start all services (chain, IPFS, node)
docker-compose up -d
# Check status
docker-compose pscd sdk/python
pip install -r requirements.txtfrom atlas import AtlasClient
client = AtlasClient(
ipfs_api_url="/ip4/127.0.0.1/tcp/5001", # or "https://ipfs.infura.io:5001"
chain_grpc_url="localhost:9090", # Format: host:port
creator="your_wallet_address"
)import asyncio
async def train_model():
async with AtlasClient() as client:
# 1. Upload dataset
dataset_cid = await client.upload_dataset("dataset.zip")
print(f"Dataset CID: {dataset_cid}")
# 2. Submit training job
job_id = await client.submit_job(
model_id="model-123",
dataset_cid=dataset_cid,
config={
"epochs": 10,
"batch_size": 32,
"learning_rate": 0.001,
"lora_rank": 8
}
)
print(f"Job ID: {job_id}")
# 3. Monitor progress (real-time via IPFS Pub/Sub)
async for update in client.subscribe_to_job_updates(job_id):
status = update.get('status')
progress = update.get('progress', 0.0) * 100
print(f"Status: {status}, Progress: {progress:.1f}%")
if status == "completed":
break
# 4. Download trained model
job = await client.get_job(job_id)
await client.download_model(
model_cid=job.get('model_cid'),
output_path="./trained_model.pt"
)
print("Model downloaded!")
asyncio.run(train_model())from atlas.serving.predictor import Predictor
from atlas.serving.loader import ModelLoader
# Load model from IPFS (with caching)
loader = ModelLoader(ipfs_api_url="/ip4/127.0.0.1/tcp/5001")
model_path = loader.load(model_cid="QmXXX...")
# Create predictor
predictor = Predictor(model_path=model_path)
# Run inference
result = predictor.predict(
input_data="Hello, how are you?",
model_type="llm", # or "vision", "speech", "embedding", "auto"
options={
"max_length": 100,
"temperature": 0.7
}
)
print(f"Result: {result}")# Via CLI
atlas serve-model model-123 --port 8000
# Or via Python
from atlas.serving.server import ModelServer
server = ModelServer(chain_client=chain_client)
await server.serve(model_id="model-123", host="0.0.0.0", port=8000)# Upload dataset
atlas upload-dataset dataset.zip --encrypt
# Submit training job
atlas submit-job model-123 QmXXX... --config config.json
# Monitor job
atlas get-job job-123
atlas list-jobs
# Download model
atlas download-model model-123 --output ./models/
# Serve model
atlas serve-model model-123 --port 8000
# Start daemon (REST API)
atlas daemon --port 8080- Go 1.21+
- Docker
- IPFS node
- GPU (optional, for faster training)
- Minimum 8GB RAM
- Minimum 100GB storage
# Install IPFS
wget https://dist.ipfs.io/go-ipfs/v0.20.0/go-ipfs_v0.20.0_linux-amd64.tar.gz
tar -xvzf go-ipfs_v0.20.0_linux-amd64.tar.gz
cd go-ipfs
sudo ./install.sh
# Initialize & start
ipfs init
ipfs daemon# Build node binary
cd node
go mod download
go build -o atlas-node cmd/node/main.go
# Run node
./atlas-node start \
--chain-rpc localhost:9090 \
--ipfs-api /ip4/127.0.0.1/tcp/5001 \
--node-id node-001 \
--address your_wallet_address# Register via blockchain CLI
atlasd tx compute register-node \
--node-id "node-001" \
--address "your_wallet_address" \
--from mykey \
--chain-id atlas-chain# Check node status
atlas-node status
# Check rewards
atlasd query reward node-rewards node-001
# Check reputation
atlasd query compute node-reputation node-001# Build image
docker build -t atlas-node:latest -f docker/Dockerfile.node .
# Run container
docker run -d \
--name atlas-node \
--gpus all \
-v /tmp/atlas-work:/work \
-e CHAIN_RPC_URL=localhost:9090 \
-e IPFS_API_URL=/ip4/127.0.0.1/tcp/5001 \
-e NODE_ID=node-001 \
atlas-node:latestjob = await client.submit_job(
model_id="model-123",
dataset_cid=dataset_cid,
config={
"training_type": "federated",
"min_clients": 5,
"rounds": 10,
"lora_enabled": True
}
)# Upload custom training script
script_cid = await client.upload_dataset("custom_train.py")
# Submit job with custom script
job = await client.submit_job(
model_id="model-123",
dataset_cid=dataset_cid,
config={
"training_script_cid": script_cid,
"epochs": 10
}
)# Upload with encryption
dataset_cid = await client.upload_dataset(
"private_dataset.zip",
encrypt=True,
private_network=True
)# LLM Inference
result = predictor.predict(
input_data="Explain quantum computing",
model_type="llm",
options={"max_length": 200, "temperature": 0.8}
)
# Vision Model
result = predictor.predict(
input_data="./image.jpg",
model_type="vision"
)
# Speech-to-Text
result = predictor.predict(
input_data="./audio.wav",
model_type="speech"
)
# Embedding Service
result = predictor.predict(
input_data="text to embed",
model_type="embedding"
)# Subscribe to job updates via IPFS Pub/Sub
async for update in client.subscribe_to_job_updates(job_id):
print(f"Status: {update.get('status')}")
print(f"Progress: {update.get('progress', 0.0) * 100:.1f}%")
print(f"Tasks: {update.get('tasks', [])}")
if update.get("status") == "completed":
break# Get job status
job = await client.get_job(job_id)
print(f"Status: {job.get('status')}")
print(f"Progress: {job.get('progress', 0.0) * 100:.1f}%")
# List all jobs
jobs = await client.list_jobs()
for job in jobs:
print(f"{job.get('id')}: {job.get('status')} - {job.get('progress', 0.0) * 100:.1f}%")# Start IPFS
ipfs daemon
# Start chain (in separate terminal)
cd chain
atlasd start
# Start node (in separate terminal)
cd node
./atlas-node start# Start all services
docker-compose up -d
# Check logs
docker-compose logs -f
# Stop services
docker-compose down# Deploy chain
kubectl apply -f k8s/chain-deployment.yaml
# Deploy IPFS
kubectl apply -f k8s/ipfs-deployment.yaml
# Deploy node
kubectl apply -f k8s/node-deployment.yaml- Encryption: Private datasets are encrypted before upload to IPFS
- Private Networks: Support for private IPFS networks
- Proof of Computation: Cryptographic proofs for task completion
- Validation: Content hash validation for shards
- Reputation System: Node reputation for trust management
- Authentication: API key and JWT support
- Authorization: RBAC (Role-Based Access Control)
# Client
export ATLAS_IPFS_API="/ip4/127.0.0.1/tcp/5001"
export ATLAS_CHAIN_GRPC="localhost:9090"
export ATLAS_CREATOR="your_wallet_address"
# Node
export CHAIN_RPC_URL="localhost:9090"
export IPFS_API_URL="/ip4/127.0.0.1/tcp/5001"
export NODE_ID="node-001"
export WORK_DIR="/tmp/atlas-work"# config.yaml
node:
id: "node-001"
address: "your_wallet_address"
chain_rpc_url: "localhost:9090"
ipfs_api_url: "/ip4/127.0.0.1/tcp/5001"
resources:
cpu_cores: 4
gpu_enabled: true
memory_gb: 16
storage_gb: 500
heartbeat:
interval_seconds: 30
timeout_seconds: 90- Client Usage Guide - Complete guide for using Atlas as a client
- Node Usage Guide - Guide for running compute nodes
- Architecture - System architecture overview
- Developer Guide - Guide for developers
- Chain - Blockchain modules and implementation
- Node - Compute node implementation
- Federated Learning - FL implementation
- LoRA - LoRA fine-tuning
- Storage - Storage layer
- SDK - Client SDK documentation
# Run unit tests
cd chain
go test ./x/... -v
cd node
go test ./... -v
# Run integration tests
cd tests/integration
pytest -v- IPFS not connecting: Check
ipfs idand ensure IPFS daemon is running - Chain gRPC error: Verify chain gRPC URL and port (default: 9090)
- Network timeout: Check firewall rules and network connectivity
- Job stuck: Check node status and resource availability
- Model download failed: Verify model CID in IPFS network
- Training error: Check task logs in work directory
- Node not receiving tasks: Verify node registration and status "online"
- Resource exhaustion: Check CPU/GPU/Memory utilization
- Heartbeat timeout: Verify network connectivity to blockchain
Contributions are welcome! Please read our contributing guidelines and submit pull requests.
MIT License - see LICENSE file for details
- Repository: GitHub
- Documentation: See Documentation section above
- Issues: GitHub Issues
- Cosmos SDK for blockchain framework
- IPFS for decentralized storage
- PyTorch and TensorFlow for ML frameworks
Made with β€οΈ by the Atlas Team