File renamed without changes.
File renamed without changes.
165 changes: 165 additions & 0 deletions docs/defradb/Concepts/peer-to-peer.md
@@ -0,0 +1,165 @@
# Peer-to-peer networking

## Overview

Peer-to-peer (P2P) networking lets devices, or peers, communicate directly without going through a central server. All peers are equal: each one can both send and receive data.

DefraDB is a decentralized database built on this idea. Instead of the traditional client-server setup, DefraDB uses P2P networking so apps can sync data locally and share information without relying on a trusted middleman. This supports a decentralized, private, and user-focused approach.

:::tip[Key Points]

DefraDB leverages P2P networking via libp2p to synchronize data directly between distributed nodes, enabling **offline-first applications without a central server**.

**Key capabilities:**

- **Passive replication** – Automatic broadcasting of updates via PubSub (similar to UDP)
- **Active replication** – Direct, point-to-point synchronization between specific nodes (similar to TCP)
- **NAT traversal** – Circuit relays and hole punching to connect nodes behind firewalls
- **Resilient synchronization** – Updates queue offline and sync automatically when connectivity returns

DefraDB stores documents as update graphs (similar to Git) using IPLD blocks distributed across nodes.

:::

## Key concepts

### Libp2p networking framework

Libp2p is a modular, decentralized networking framework created by Protocol Labs for IPFS (InterPlanetary File System). It handles transport, security, peer routing, and content discovery.

DefraDB uses libp2p to let peers talk to each other directly, replicate documents, and manage updates—similar to how Git tracks and merges changes.

**Note**: See the [libp2p documentation](https://docs.libp2p.io/concepts/introduction/overview/#why-libp2p) for more information.

### Documents and collections

- **Document**: A single record with multiple fields, bound by a schema. Similar to a row in an SQL table.
- **Collection**: A group of documents that share the same schema. Similar to an SQL table.

### Why DefraDB needs P2P networking

DefraDB stores documents and [InterPlanetary Linked Data](https://ipld.io/) (IPLD) blocks across multiple nodes, sometimes spread across the globe. P2P networking keeps them in sync whether they're on the same device, on different devices owned by the same user, or shared with collaborators—all without depending on a central server.

### Replication modes

DefraDB supports two replication modes:

**Passive replication**: Automatically broadcasts updates over a global PubSub network (similar to UDP), great for quick sharing with minimal coordination.

**Active replication**: Creates a direct link to a chosen peer (similar to TCP), ensuring updates are delivered reliably to that node.

### How DefraDB implements P2P

- **PubSub**: Nodes can publish and subscribe to topics. Each document gets its own topic for passive replication.
- **Granularity**: Passive replication focuses on individual documents. Active replication can handle whole collections or just selected items.
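
The per-document topic model can be illustrated with a small in-memory sketch. This is not DefraDB or libp2p code: `TopicBus`, `subscribe`, and `publish` are hypothetical names, and the peer names and doc IDs are placeholders. It only shows that an update published on one document's topic reaches the subscribers of that topic and nobody else.

```python
from collections import defaultdict


class TopicBus:
    """Toy in-memory stand-in for a PubSub network (hypothetical, not libp2p)."""

    def __init__(self):
        # topic name -> set of subscribed peer names
        self._subscribers = defaultdict(set)
        # peer name -> list of (doc_id, update) pairs it has received
        self.inbox = defaultdict(list)

    def subscribe(self, peer, doc_id):
        # One topic per document, mirroring passive replication's granularity.
        self._subscribers[doc_id].add(peer)

    def publish(self, sender, doc_id, update):
        # Deliver to every subscriber of this document's topic except the sender.
        # Sorted so the returned recipient list is deterministic.
        recipients = sorted(p for p in self._subscribers[doc_id] if p != sender)
        for peer in recipients:
            self.inbox[peer].append((doc_id, update))
        return recipients


bus = TopicBus()
bus.subscribe("alice", "doc-1")
bus.subscribe("bob", "doc-1")
bus.subscribe("carol", "doc-2")

delivered = bus.publish("alice", "doc-1", {"title": "New title"})
print(delivered)
```

Here `carol` receives nothing because she is subscribed only to `doc-2`. Active replication, by contrast, would skip the topic lookup and push to one chosen peer.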

## Benefits of P2P in DefraDB

DefraDB's P2P architecture provides several key advantages:

- **Resilience**: Keeps working during network outages. Changes are queued and synced later.
- **Trustless operation**: Works without needing to trust a central server.
- **Global collaboration**: Lets developers collaborate across the globe without built-in restrictions.
- **Advanced networking**: Leverages libp2p's features for discovery and NAT traversal.

### NAT traversal solutions

Connecting to a server in a data center is straightforward—each server has its own IP address. However, home networks present a challenge: a single IP address for the modem and multiple devices protected by a NAT firewall make direct connections difficult. Libp2p offers two solutions:

**Circuit Relays**: A third-party node acts as an intermediary to resolve the NAT firewall issue. Both peers connect to this publicly accessible relay node, which serves as a conduit. While this requires trust in the relay node to properly forward information, connections operate over encrypted transport layers, preventing the relay from intercepting data. The relay must remain online and accessible for this to work.

**NAT Hole Punching**: A technique that allows nodes to connect directly to a device behind a NAT firewall, enabling direct peer connections without a trusted intermediary.

## Passive replication

Passive replication is your "set it and forget it" mode. Once it's on, it quietly keeps things in sync without extra effort.

### How passive replication works

- **Automatic activation**: Starts automatically when P2P is enabled.
- **Document-level topics**: Each document has its own PubSub topic.
- **Targeted updates**: Only peers subscribed to that topic receive the changes.
- **Self-organizing**: Nodes find and connect to the right peers on their own.

### When nodes miss updates

In passive replication mode, the most recent update is broadcast through the network using a Merkle DAG (directed acyclic graph). The broadcasting node doesn't verify that receiving nodes have all previous updates—that's the responsibility of the receiving node.

If a node misses updates and then receives a new one, it must synchronize all previous updates before considering the document current. This is necessary because DefraDB's internal data model is based on all changes over time, not just the most recent change.

The most recent update is sent over the PubSub network. However, if a node needs to retrieve previous updates by traversing back through the Merkle DAG, it uses the Distributed Hash Table (DHT) instead.
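
A minimal sketch of the catch-up step, assuming each update block records the IDs of its parent blocks. The function `missing_blocks` and the callback `fetch_parents` are hypothetical names; `fetch_parents` stands in for whatever lookup (such as the DHT) returns a block's parents, and the block IDs are placeholders.

```python
def missing_blocks(head, known, fetch_parents):
    """Return the block IDs reachable from `head` that are not in `known`.

    Walks parent links backwards from the newly received head and stops
    at any block the node already has. Blocks are returned in discovery
    order, which is newest-first for a linear history.
    """
    missing = []
    seen = set()
    stack = [head]
    while stack:
        cid = stack.pop()
        if cid in known or cid in seen:
            continue  # already stored locally, or already scheduled
        seen.add(cid)
        missing.append(cid)
        stack.extend(fetch_parents(cid))
    return missing


# Linear history u1 <- u2 <- u3 <- u4; the node has only seen u1 and u2.
parents = {"u4": ["u3"], "u3": ["u2"], "u2": ["u1"], "u1": []}


def fetch_parents(cid):
    # Hypothetical stand-in for a network lookup of a block's parent links.
    return parents[cid]


to_fetch = missing_blocks("u4", {"u1", "u2"}, fetch_parents)
print(to_fetch)
```

The node would then need to retrieve and apply `u3` and `u4` before treating the document as current.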

### Use cases for passive replication

Choose passive replication when you:

- Want automatic syncing without managing connections
- Need updates sent to anyone subscribed to a document
- Prefer a low-maintenance option for collaborative environments with many peers

## Active replication

Active replication is like having a dedicated delivery route between you and a specific peer, ensuring that every update reaches them directly.

### How active replication works

- **Direct peer selection**: Choose exactly who you want to sync with by picking a peer and setting up a direct connection.
- **Real-time updates**: Updates are pushed instantly without waiting for network-wide broadcasts.
- **Reliable delivery**: Ideal for important data, making it a great choice when syncing with archival nodes or trusted partners.
- **Flexible granularity**: Allows you to replicate an entire collection or only specific parts you want.

### Use cases for active replication

Choose active replication when you:

- Need a direct, reliable link to a specific peer
- Want real-time updates with no delays
- Need full control over which collections or documents are shared
- Are syncing with archival nodes or specific collaborators

## Peer IDs and addressing

### Peer ID

When DefraDB starts, it creates a Peer ID—a unique identifier based on a private key generated during the first startup. This Peer ID is essential for various parts of the P2P networking system.

### Multi-address format

A node automatically listens on multiple addresses or ports when the P2P module is instantiated. These are expressed as multi-addresses—strings that represent network addresses and include information about transport protocols and multiple network stack layers.

Format:

```bash
/ip4/<ip_address>/tcp/<port>/p2p/<peer_id>
```

Example:

```bash
/ip4/0.0.0.0/tcp/9171/p2p/12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4
```

By default, DefraDB listens on P2P port `9171`.
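
To make the format concrete, here is a simplified parser for the `/ip4/.../tcp/.../p2p/...` shape shown above. It is a sketch only: `parse_multiaddr` is a hypothetical helper, it handles just these three components, and real multi-addresses support many more protocols.

```python
def parse_multiaddr(addr):
    """Split a multi-address of the form /ip4/<ip>/tcp/<port>[/p2p/<peer_id>]."""
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected protocol/value pairs, got: {addr!r}")
    # Pair protocol names with their values: ["ip4", "1.2.3.4", ...] -> dict
    fields = dict(zip(parts[0::2], parts[1::2]))
    for required in ("ip4", "tcp"):
        if required not in fields:
            raise ValueError(f"missing /{required}/ component in {addr!r}")
    return {
        "ip": fields["ip4"],
        "port": int(fields["tcp"]),
        "peer_id": fields.get("p2p"),  # None when the address has no Peer ID
    }


example = "/ip4/0.0.0.0/tcp/9171/p2p/12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4"
info = parse_multiaddr(example)
print(info["ip"], info["port"], info["peer_id"])
```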

## Current limitations and future development

### Scalability considerations

**Document topic overhead**: Giving every document its own independent topic can create overhead when a node handles thousands or millions of documents. The team is exploring aggregate topics scoped to subnets (group-specific or application-specific).

**Multi-hop between subnets**: Currently, synchronizing between subnets means going through the global network, which takes multiple hops. The team is exploring multi-hop mechanisms to address this.

**Bitswap and DHT scalability**: Current limitations are being addressed through:

- **PubSub-based query system**: Allows queries and updates through the global PubSub network using query topics independent of document topics.
- **GraphSync**: A Protocol Labs protocol that may resolve Bitswap algorithm and DHT issues.

### Future improvements

**Head Exchange protocol**: A new protocol in development to address issues with syncing the Merkle DAG when updates have been missed or concurrent, diverged updates have been made. It aims to efficiently:

- Establish the most recent update seen by each node
- Determine if there are divergent updates
- Find the most efficient way to synchronize nodes with minimal communication

**Replicator persistence**: Currently, replicators don't persist through node updates or restarts—they must be re-added after each restart. This will be resolved in a future release.
5 changes: 5 additions & 0 deletions docs/defradb/How-to Guides/_category_.json
@@ -0,0 +1,5 @@
{
"label": "How-to Guides",
"position": 2
}

226 changes: 226 additions & 0 deletions docs/defradb/How-to Guides/peer-to-peer-how-to.md
@@ -0,0 +1,226 @@
---
sidebar_label: Peer-to-Peer How-to Guide
sidebar_position: 10
---

# Peer-to-peer how-to guides

This guide provides step-by-step instructions for configuring and managing peer-to-peer networking in DefraDB.

## Prerequisites

Before following these guides, ensure you have:

- DefraDB installed on your system
- Basic familiarity with command-line interfaces
- Understanding of [P2P networking concepts](/defradb/Concepts/peer-to-peer.md)

## Start and configure DefraDB

### Start DefraDB with P2P enabled (default)

P2P networking is enabled by default when you start DefraDB:

```bash
defradb start
```

You'll see output similar to:

```bash
Jan 2 10:15:49.124 INF cli Starting DefraDB
Jan 2 10:15:49.161 INF net Created LibP2P host PeerId=12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4 Address=[/ip4/127.0.0.1/tcp/9171]
Jan 2 10:15:49.162 INF net Starting internal broadcaster for pubsub network
```

### Start DefraDB without P2P

To disable P2P networking:

```bash
defradb start --no-p2p
```

### Change the P2P port

By default, DefraDB listens on port `9171`. To use a different port:

```bash
defradb start --p2paddr /ip4/<ip_address>/tcp/<port>
```

Example:

```bash
defradb start --p2paddr /ip4/0.0.0.0/tcp/9172
```

**Parameters**:

- Replace `<ip_address>` with your actual IP address (use `0.0.0.0` to listen on all interfaces)
- Replace `<port>` with your desired port number

## Manage Peer IDs

### Get your Peer ID

To retrieve your node's Peer ID using HTTP:

```bash
curl -H "Accept: application/json" http://localhost:9181/api/p2p/info
```

Review comment (Member): Replace this with `defradb client p2p info`.

The Peer ID is generated from a private key created during the first startup and remains consistent across restarts.

## Connect to peers

### Connect to a specific peer

To connect to a particular peer when starting DefraDB:

```bash
defradb start --peers /ip4/<ip_address>/tcp/<port>/p2p/<peer_id>
```

Example:

```bash
defradb start --peers /ip4/192.168.1.100/tcp/9171/p2p/12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4
```

**Parameters**:

- Replace `<ip_address>` with the peer's IP address
- Replace `<port>` with the peer's P2P port
- Replace `<peer_id>` with the peer's Peer ID
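
When the address is assembled by a script, the three parameters can be validated before DefraDB is started. The sketch below is illustrative only: `peer_multiaddr` and `start_command` are hypothetical helpers, and the only CLI usage assumed is the single-peer `defradb start --peers <multiaddr>` form shown above.

```python
import ipaddress


def peer_multiaddr(ip, port, peer_id):
    """Build /ip4/<ip>/tcp/<port>/p2p/<peer_id> after basic validation."""
    ipaddress.IPv4Address(ip)  # raises ValueError if `ip` is not valid IPv4
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if not peer_id:
        raise ValueError("peer_id must not be empty")
    return f"/ip4/{ip}/tcp/{port}/p2p/{peer_id}"


def start_command(peer_addr):
    """Return the argument list for starting DefraDB with one peer."""
    return ["defradb", "start", "--peers", peer_addr]


addr = peer_multiaddr(
    "192.168.1.100",
    9171,
    "12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4",
)
print(" ".join(start_command(addr)))
```

The argument list can be passed to `subprocess.run` if the script should launch the node itself.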

## Manage document subscriptions (passive replication)

Passive replication works at the document level. Subscribe to specific documents to receive updates automatically.

### Subscribe to document updates

```bash
defradb client p2p document add <docID>
```

Example:

```bash
defradb client p2p document add bafybeihz5k3c2jzx7m4x5v6p7q8r9s0t1u2v3w4x5y6z7a8b9c0d1e2f3
```

Review comment (Member): `bafybeihz5k3c2jzx7m4x5v6p7q8r9s0t1u2v3w4x5y6z7a8b9c0d1e2f3` is a CID-formatted string. You can replace it with `bae-619ea0d2-35ba-5e8c-ac4d-2b769937213b`.

### Unsubscribe from document updates

```bash
defradb client p2p document remove <docID>
```

### View all active document subscriptions

```bash
defradb client p2p document getall
```

## Manage collection subscriptions (active replication)

Active replication can work at the collection level, allowing you to replicate entire collections to specific peers.

### Subscribe to collection updates

```bash
defradb client p2p collection add <collectionID>
```

### Unsubscribe from collection updates

```bash
defradb client p2p collection remove <collectionID>
```

### View all active collection subscriptions

```bash
defradb client p2p collection getall
```

## Enable active replication

Active replication creates a direct, persistent connection to a specific peer for reliable data synchronization.

### Add a replicator using HTTP

```bash
curl -X POST http://localhost:9181/api/p2p/replicators \
-H "Content-Type: application/json" \
-d '{
"Info": {
"ID": "<peer_id>",
"Addrs": ["<peer_address>"]
},
"Collections": ["<collection_name>"]
}'
```

Example:

```bash
curl -X POST http://localhost:9181/api/p2p/replicators \
-H "Content-Type: application/json" \
-d '{
"Info": {
"ID": "12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4",
"Addrs": ["/ip4/192.168.1.100/tcp/9171"]
},
"Collections": ["Books"]
}'
```

Review comment (Member) on lines +153 to +177: We can omit these curl methods for now. In the future, each snippet can have both a curl and a CLI variation.

**Parameters**:

- `ID`: The Peer ID of the node you want to replicate to
- `Addrs`: Array of multi-addresses for the peer
- `Collections`: Array of collection names to replicate (e.g., `["Books"]`)
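
If the request body is generated by a script, the same structure can be built programmatically. This sketch assumes only the field names used in the curl example above (`Info`, `ID`, `Addrs`, `Collections`); `replicator_payload` is a hypothetical helper, and it builds the JSON without sending any request.

```python
import json


def replicator_payload(peer_id, addrs, collections):
    """Return the JSON body for a replicator request as a string."""
    if not peer_id:
        raise ValueError("peer_id is required")
    if not addrs:
        raise ValueError("at least one multi-address is required")
    body = {
        "Info": {"ID": peer_id, "Addrs": list(addrs)},
        "Collections": list(collections),
    }
    return json.dumps(body, indent=2)


payload = replicator_payload(
    "12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4",
    ["/ip4/192.168.1.100/tcp/9171"],
    ["Books"],
)
print(payload)
```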

### Add a replicator using CLI

```bash
defradb client p2p replicator set -c <collection_name> <peer_id>
```

Example:

```bash
defradb client p2p replicator set -c Books 12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4
```

**Note**: Currently, replicators don't persist through node restarts. You'll need to re-add them after each restart. This limitation will be addressed in a future release.

## Troubleshooting

### Verify P2P is running

Check the startup logs for confirmation that the LibP2P host was created and the P2P network is active:

```
INF net Created LibP2P host PeerId=... Address=[...]
INF net Starting internal broadcaster for pubsub network
```
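
This check can also be scripted. The sketch below scans captured log lines for the `Created LibP2P host` message and extracts the Peer ID and listen addresses. It assumes the log format matches the sample output in this guide and that multiple addresses inside the brackets are separated by spaces or commas; `find_p2p_host` is a hypothetical helper.

```python
import re

# Matches the "Created LibP2P host" line from the sample startup output.
HOST_LINE = re.compile(
    r"Created LibP2P host PeerId=(?P<peer_id>\S+) Address=\[(?P<addrs>[^\]]*)\]"
)


def find_p2p_host(log_lines):
    """Return {"peer_id": ..., "addresses": [...]} or None if P2P never started."""
    for line in log_lines:
        match = HOST_LINE.search(line)
        if match:
            # Assumption: addresses are separated by whitespace or commas.
            addrs = [a for a in re.split(r"[,\s]+", match.group("addrs")) if a]
            return {"peer_id": match.group("peer_id"), "addresses": addrs}
    return None


sample_log = [
    "Jan 2 10:15:49.124 INF cli Starting DefraDB",
    "Jan 2 10:15:49.161 INF net Created LibP2P host "
    "PeerId=12D3KooWEFCQ1iGMobsmNTPXb758kJkFc7XieQyGKpsuMxeDktz4 "
    "Address=[/ip4/127.0.0.1/tcp/9171]",
    "Jan 2 10:15:49.162 INF net Starting internal broadcaster for pubsub network",
]

host = find_p2p_host(sample_log)
print(host)
```

A `None` result means no host line was found, which would be expected if the node was started with `--no-p2p`.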

### Check current peer connections

Use the P2P info endpoint to see your current peer connections:

```bash
curl http://localhost:9181/api/p2p/info
```

Review comment (Member): The same note as above applies to this CLI example.

### Connection issues within home networks

If peers can't connect within the same home Wi-Fi network, this is typically due to NAT firewall restrictions. Consider:

1. Using circuit relays (a publicly accessible third-party node as an intermediary)
2. Configuring NAT hole punching
3. Connecting peers through the internet rather than the local network

See the [P2P Conceptual](/defradb/Concepts/peer-to-peer.md) page for more information on NAT traversal.