diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..efdf19c
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,282 @@
+# Architecture
+
+This document provides an overview of the Kubedoop Data Platform architecture, including its internal framework, built-in Operators, component dependencies, design principles, and data flow patterns.
+
+## Platform Architecture Overview
+
+Kubedoop is a Kubernetes-native DataOps platform that manages 15+ big data components
+through a unified Operator framework. The platform leverages the Operator Lifecycle Manager
+(OLM) for Operator installation and lifecycle management, running entirely on top of Kubernetes.
+
+```mermaid
+graph TB
+    subgraph Users["User Layer"]
+        UI[Web UI / CLI]
+        Apps[Data Applications]
+    end
+
+    subgraph Platform["Kubedoop Platform"]
+        OLM[Operator Lifecycle Manager]
+        subgraph Operators["Product Operators"]
+            OP1[Spark Operator]
+            OP2[Hive Operator]
+            OP3[Trino Operator]
+            OP4[Kafka Operator]
+            OP5[HDFS Operator]
+            OP6[... 8 more]
+        end
+        subgraph BuiltIn["Built-in Operators"]
+            CO[Commons Operator]
+            LO[Listener Operator]
+            SO[Secret Operator]
+        end
+    end
+
+    subgraph K8s["Kubernetes Cluster"]
+        API[Kubernetes API Server]
+        PV[Persistent Volumes]
+        NET[Network Policies]
+    end
+
+    Users --> Platform
+    OLM --> Operators
+    Operators --> BuiltIn
+    Operators --> K8s
+    BuiltIn --> K8s
+```
+
+## operator-go Framework
+
+All Kubedoop Operators are built on top of the **operator-go** framework, an in-house library that provides a unified abstraction for managing stateful data infrastructure on Kubernetes.
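+
+At the core of any Operator is a reconcile loop. It reads the desired state from the custom resource, observes the actual state in the cluster, and applies only the changes needed to converge. The following TypeScript sketch illustrates a single reconcile pass under simplifying assumptions: state is reduced to one replica count per workload, and the `planReconcile` function, the `ReconcileAction` type, and the workload names are hypothetical. It is not the operator-go API.
+
+```typescript
+// Hypothetical model: workload name (e.g. one StatefulSet per role group) -> replica count.
+type ReplicaState = { [workload: string]: number };
+
+type ReconcileAction =
+  | { kind: "create"; name: string; replicas: number }
+  | { kind: "scale"; name: string; from: number; to: number }
+  | { kind: "delete"; name: string };
+
+// Own-property check, so names like "constructor" are not matched via the prototype.
+function has(state: ReplicaState, name: string): boolean {
+  return Object.prototype.hasOwnProperty.call(state, name);
+}
+
+// One reconcile pass: compare desired with observed state and plan the changes.
+function planReconcile(desired: ReplicaState, observed: ReplicaState): ReconcileAction[] {
+  const actions: ReconcileAction[] = [];
+  for (const name of Object.keys(desired)) {
+    if (!has(observed, name)) {
+      actions.push({ kind: "create", name, replicas: desired[name] });
+    } else if (observed[name] !== desired[name]) {
+      actions.push({ kind: "scale", name, from: observed[name], to: desired[name] });
+    }
+  }
+  // Workloads that exist but are no longer declared are removed.
+  for (const name of Object.keys(observed)) {
+    if (!has(desired, name)) {
+      actions.push({ kind: "delete", name });
+    }
+  }
+  return actions;
+}
+
+const plan = planReconcile(
+  { "namenode-default": 2, "datanode-default": 3 },
+  { "datanode-default": 1, "journalnode-default": 3 }
+);
+console.log(JSON.stringify(plan));
+```
+
+When the same pass runs against an already converged state, it yields an empty plan. This idempotence is what lets a controller repeat reconciliation safely whenever something changes.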
+
+### Unified CRD Abstraction
+
+The operator-go framework introduces a consistent CRD model across all Operators:
+
+- **Cluster**: The top-level resource representing a full component deployment
+- **Roles**: Logical groupings of processes with the same responsibility (e.g., NameNode, DataNode)
+- **Role Groups**: Subsets of a role's replicas that share one configuration, allowing differentiated settings for high availability, resource isolation, or workload separation
+
+```yaml
+apiVersion: {group}.kubedoop.dev/v1alpha1
+kind: {ClusterKind}
+metadata:
+  name: my-cluster
+spec:
+  roleA:
+    config:                 # Role-level config, inherited by every role group
+      resources:
+        cpu: { min: "1" }
+    roleGroups:
+      group-1:              # Uses the role-level config unchanged
+        replicas: 3
+      group-2:              # Overrides the role-level CPU minimum
+        replicas: 2
+        config:
+          resources:
+            cpu: { min: "2" }
+```
+
+### Lifecycle Management
+
+The operator-go framework handles the full lifecycle of component deployments:
+
+| Phase | Description |
+|-------|-------------|
+| **Creation** | Deploys StatefulSets, Services, ConfigMaps, and Secrets based on CRD specs |
+| **Scaling** | Adjusts replica counts for role groups without disrupting existing pods |
+| **Upgrading** | Performs rolling upgrades across role groups with configurable maxUnavailable |
+| **Failure Recovery** | Automatically restarts failed pods and reconciles desired vs. actual state |
+| **Configuration Updates** | Applies config changes with graceful rolling restarts |
+
+> Source code: [operator-go on GitHub](https://github.com/zncdatadev/operator-go)
+
+## Built-in Operators
+
+Kubedoop includes three built-in Operators that provide cross-cutting functionality shared by all product Operators:
+
+```mermaid
+graph LR
+    subgraph ProductOps["Product Operators"]
+        PO1[Spark Operator]
+        PO2[Hive Operator]
+        PO3[Trino Operator]
+    end
+
+    subgraph BuiltInOps["Built-in Operators"]
+        CO["Commons Operator<br/>Environment variables<br/>JVM parameters<br/>Pod templates"]
+        LO["Listener Operator<br/>Service / Ingress<br/>TLS certificates<br/>Service discovery"]
+        SO["Secret Operator<br/>Password injection<br/>Certificate mounting<br/>Credential rotation"]
+    end
+
+    PO1 --> CO
+    PO1 --> LO
+    PO1 --> SO
+    PO2 --> CO
+    PO2 --> LO
+    PO2 --> SO
+    PO3 --> CO
+    PO3 --> LO
+    PO3 --> SO
+```
+
+### Commons Operator
+
+The Commons Operator manages shared configuration that applies across all product Operators:
+
+- **Environment variables**: Injects common environment variables into component pods
+- **JVM parameters**: Configures JVM heap size, GC settings, and other Java runtime options
+- **Pod templates**: Provides a base Pod template (annotations, labels, affinity) that product Operators extend
+
+### Listener Operator
+
+The Listener Operator provides automated service discovery and network configuration:
+
+- **Service / Ingress generation**: Automatically creates Kubernetes Services and Ingress resources based on listener definitions
+- **TLS certificate management**: Provisions and rotates TLS certificates for encrypted communication
+- **Service discovery**: Enables components to discover each other through DNS and built-in service resolution
+
+### Secret Operator
+
+The Secret Operator handles secure credential management:
+
+- **Password injection**: Automatically generates and injects passwords into component pods as environment variables or files
+- **Certificate mounting**: Mounts TLS certificates and keys into pods from centralized Secret resources
+- **Credential rotation**: Supports periodic rotation of credentials without manual intervention
+
+## Component Dependencies
+
+The following diagram shows the dependency relationships between Kubedoop product Operators. An arrow from A to B means that A depends on B:
+
+```mermaid
+graph TD
+    ZK["ZooKeeper Operator"]
+
+    HDFS["HDFS Operator"]
+    DB["Database<br/>(External)"]
+
+    Hive["Hive Operator"]
+    Trino["Trino Operator"]
+    Spark["Spark Operator"]
+    Kafka["Kafka Operator"]
+    Superset["Superset Operator"]
+    Doris["Doris Operator"]
+    HBase["HBase Operator"]
+    Kyuubi["Kyuubi Operator"]
+    NiFi["NiFi Operator"]
+    Airflow["Airflow Operator"]
+    DS["DolphinScheduler Operator"]
+
+    HDFS --> ZK
+    Hive --> ZK
+    Hive --> HDFS
+    Hive --> DB
+    Trino --> ZK
+    Trino --> HDFS
+    Trino --> Hive
+    Spark --> HDFS
+    Spark --> Hive
+    Kafka --> ZK
+    Superset --> DB
+    Doris --> ZK
+    HBase --> ZK
+    HBase --> HDFS
+    Kyuubi --> HDFS
+    Kyuubi --> Hive
+    NiFi --> ZK
+    NiFi --> HDFS
+    Airflow --> DB
+    DS --> ZK
+    DS --> DB
+```
+
+| Operator | Dependencies |
+|----------|-------------|
+| ZooKeeper | None (foundational service) |
+| HDFS | ZooKeeper |
+| Hive | ZooKeeper, HDFS, Database (external) |
+| Trino | ZooKeeper, HDFS, Hive |
+| Spark | HDFS, Hive |
+| Kafka | ZooKeeper |
+| Superset | Database (external) |
+| Doris | ZooKeeper |
+| HBase | ZooKeeper, HDFS |
+| Kyuubi | HDFS, Hive |
+| NiFi | ZooKeeper, HDFS |
+| Airflow | Database (external) |
+| DolphinScheduler | ZooKeeper, Database (external) |
+
+## Design Principles
+
+Kubedoop is built on the following core design principles:
+
+### Kubernetes Native
+
+All components are managed through Kubernetes Custom Resource Definitions (CRDs)
+and Operators. There is no custom orchestration layer: the platform relies entirely
+on the Kubernetes API for state management, scheduling, and self-healing.
+
+### Declarative Configuration
+
+Users describe the *desired state* of their data infrastructure through YAML manifests.
+The Operators continuously reconcile the actual state with the desired state,
+ensuring consistency without manual intervention.
+
+### Pluggable Storage
+
+Storage is abstracted through Kubernetes StorageClasses, allowing users to choose the
+underlying storage backend (SSD, HDD, NFS, cloud storage) without changing their
+component configuration. This enables flexible deployment across different environments.
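+
+To illustrate the StorageClass abstraction, the following TypeScript sketch builds a PersistentVolumeClaim-shaped object for one volume. The field names (`accessModes`, `storageClassName`, `resources.requests.storage`) follow the Kubernetes PersistentVolumeClaim API. The `buildPvc` helper, the `VolumeRequest` type, and the StorageClass names `local-ssd` and `cloud-block` are hypothetical, and the sketch does not describe how Kubedoop Operators generate their claims.
+
+```typescript
+// Local types mirroring a small subset of the Kubernetes PVC schema.
+interface PersistentVolumeClaim {
+  apiVersion: "v1";
+  kind: "PersistentVolumeClaim";
+  metadata: { name: string };
+  spec: {
+    accessModes: string[];
+    storageClassName: string;
+    resources: { requests: { storage: string } };
+  };
+}
+
+// Hypothetical component-facing request, similar to a `storage` entry in a role config.
+interface VolumeRequest {
+  name: string;
+  capacity: string; // Kubernetes quantity such as "100Gi"
+}
+
+// The request is identical in every environment; only the StorageClass changes.
+function buildPvc(volume: VolumeRequest, storageClassName: string): PersistentVolumeClaim {
+  return {
+    apiVersion: "v1",
+    kind: "PersistentVolumeClaim",
+    metadata: { name: volume.name },
+    spec: {
+      accessModes: ["ReadWriteOnce"],
+      storageClassName,
+      resources: { requests: { storage: volume.capacity } },
+    },
+  };
+}
+
+const data: VolumeRequest = { name: "data", capacity: "100Gi" };
+const onPrem = buildPvc(data, "local-ssd");
+const inCloud = buildPvc(data, "cloud-block");
+console.log(onPrem.spec.storageClassName, inCloud.spec.storageClassName);
+```
+
+In this sketch, moving between environments changes only `storageClassName`. The capacity request the component depends on stays the same.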
+
+### Unified Security Model
+
+All Operators share a consistent security model through the built-in Secret Operator
+and Listener Operator. TLS encryption, authentication, and credential management
+are handled uniformly across all components.
+
+### Observability
+
+Kubedoop provides built-in observability for all managed components:
+
+- **Logging**: Centralized log collection and management
+- **Metrics**: Exposed through Prometheus-compatible endpoints
+- **Alerting**: Integration with alerting systems for proactive monitoring
+
+## Data Flow Example
+
+The following sequence diagram illustrates the data flow when a user submits a SQL query through Trino to read a Hive table:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant Trino as Trino Coordinator
+    participant TrinoW as Trino Worker
+    participant Hive as Hive Metastore
+    participant HDFS as HDFS NameNode
+    participant HDFSd as HDFS DataNode
+
+    User->>Trino: Submit SQL query (SELECT * FROM hive_table)
+    Trino->>Hive: Fetch table metadata (schema, location, format)
+    Hive-->>Trino: Return table metadata
+
+    Trino->>HDFS: Request block locations for the table files
+    HDFS-->>Trino: Return block locations
+
+    Trino->>TrinoW: Split query into tasks and assign to workers
+
+    loop For each data block
+        TrinoW->>HDFSd: Read data blocks
+        HDFSd-->>TrinoW: Return data
+    end
+
+    TrinoW-->>Trino: Return processed results
+    Trino-->>User: Return query results
+```
+
+This flow shows how the components managed by Kubedoop Operators work together at query time:
+
+1. **Trino** receives the query and coordinates execution
+2. **Hive Metastore** provides table schema and data location metadata
+3. **HDFS NameNode** manages the file system namespace and block locations
+4. **HDFS DataNodes** serve the actual data blocks to Trino Workers
+5. **Trino Workers** process the data in parallel and return results
diff --git a/docs/operators/_template.md b/docs/operators/_template.md
new file mode 100644
index 0000000..5981457
--- /dev/null
+++ b/docs/operators/_template.md
@@ -0,0 +1,267 @@
+# {Operator Name}
+
+> {A one-line description of the component this Operator manages and its role in the data platform.}
+
+## Overview
+
+{Brief introduction: what this component is, what problems it solves, and typical use cases. 2-3 paragraphs.}
+
+{Paragraph 1: What is this component? Describe its core functionality and position in the data ecosystem.}
+
+{Paragraph 2: What problems does it solve? What pain points does it address for users?}
+
+{Paragraph 3: Typical use cases and scenarios where this component shines.}
+
+## Prerequisites
+
+- Kubernetes {version}+
+- kubectl {version}+
+- {Other dependencies; e.g., if this component depends on HDFS, list the HDFS Operator here}
+- {Operator Lifecycle Manager (OLM) installed; see [Quick Start](../quick-start/installation.md)}
+
+## Quick Start
+
+### Install the Operator
+
+Install the Operator via an OLM Subscription:
+
+```yaml
+apiVersion: operators.coreos.com/v1alpha1
+kind: Subscription
+metadata:
+  name: {operator-name}
+  namespace: operators
+spec:
+  channel: stable
+  name: {operator-name}
+  source: operatorhubio-catalog
+  sourceNamespace: olm
+```
+
+```bash
+kubectl apply -f - <
+```
+
+> For more details, see [Roles and Role Groups](../core-concepts/common-configuration-mechanisms/roles-and-role-groups.md).
+
+### Configurations
+
+{List configurable parameters with descriptions.}
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| {param-1} | {Description of the parameter} | {default-value} |
+| {param-2} | {Description of the parameter} | {default-value} |
+
+Configuration can be set at the role level or overridden at the role group level.
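+
+The override rule can be pictured as a recursive merge in which role group values take precedence over role-level values. The following TypeScript sketch is a hypothetical illustration of that precedence. `mergeConfig` is not part of any Kubedoop API, and an Operator's actual merge behavior (for example, for lists) may differ.
+
+```typescript
+// Hypothetical sketch: role group config overrides role-level config.
+// Nested objects merge key by key; any other value (string, number, array)
+// from the role group replaces the role-level value outright.
+type ConfigTree = { [key: string]: unknown };
+
+function isPlainObject(value: unknown): value is ConfigTree {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+function mergeConfig(roleConfig: ConfigTree, roleGroupConfig: ConfigTree): ConfigTree {
+  const result: ConfigTree = {};
+  for (const key of Object.keys(roleConfig)) {
+    result[key] = roleConfig[key];
+  }
+  for (const key of Object.keys(roleGroupConfig)) {
+    const base = result[key];
+    const override = roleGroupConfig[key];
+    result[key] =
+      isPlainObject(base) && isPlainObject(override) ? mergeConfig(base, override) : override;
+  }
+  return result;
+}
+
+// Example values: only the CPU minimum is overridden at the role group level.
+const roleLevel: ConfigTree = { resources: { cpu: { min: "1", max: "2" }, memory: { limit: "4Gi" } } };
+const groupLevel: ConfigTree = { resources: { cpu: { min: "2" } } };
+console.log(JSON.stringify(mergeConfig(roleLevel, groupLevel)));
+// {"resources":{"cpu":{"min":"2","max":"2"},"memory":{"limit":"4Gi"}}}
+```
+
+The role-level input is not modified. Settings that a role group does not mention, such as the CPU maximum and memory limit here, are inherited unchanged.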
+
+> For more details, see [Overrides](../core-concepts/common-configuration-mechanisms/overrides.md).
+
+### Listeners and Services
+
+{If this Operator integrates with the Listener Operator for service discovery, explain the configuration.}
+
+{Describe which listeners are available (e.g., internal, external) and how to configure them.}
+
+```yaml
+spec:
+  {roleName}:
+    config:
+      listeners:
+        {listener-name}:
+          type: {internal|external}
+          # {Additional listener configuration}
+```
+
+> For more details, see [Service Discovery](../core-concepts/connectivity/service-discovery.md).
+
+### Dependencies
+
+{List the dependencies this Operator has on other components and how to configure them.}
+
+| Dependency | Required | Description |
+|------------|----------|-------------|
+| {dep-1} | Yes | {Why this dependency is needed} |
+| {dep-2} | No | {Optional dependency description} |
+
+## Advanced
+
+### Resource Management
+
+{Describe how to configure CPU, memory, and storage resources for this Operator's roles.}
+
+```yaml
+spec:
+  {roleName}:
+    config:
+      resources:
+        cpu:
+          min: "1"
+          max: "2"
+        memory:
+          limit: "4Gi"
+        storage:
+          {volume-name}:
+            capacity: 100Gi
+```
+
+> For more details, see [Resource Management](../core-concepts/resources/resource-manage.md).
+
+### Pod Placement
+
+{Describe how to control pod scheduling using affinity, tolerations, and node selectors.}
+
+> For more details, see [Pod Placement](../core-concepts/operations/pod-placement.md).
+
+### Authentication and Security
+
+{Describe security-related configuration such as TLS, Kerberos, or internal authentication.}
+
+> For more details, see [Authentication](../core-concepts/security/authentication.md).
+
+### Logging
+
+{Describe how to configure and access logs for this component.}
+
+> For more details, see [Logging](../core-concepts/observability/logging.md).
+
+## Troubleshooting
+
+{List common issues and their resolutions specific to this Operator.}
+
+### Common Issues
+
+1. **{Issue title}**
+   - **Symptom**: {What the user sees}
+   - **Cause**: {Why this happens}
+   - **Resolution**: {Steps to fix}
+
+2. **{Issue title}**
+   - **Symptom**: {What the user sees}
+   - **Cause**: {Why this happens}
+   - **Resolution**: {Steps to fix}
+
+> For common issues across all Operators, see [Troubleshooting](../troubleshooting).
+
+## Clean Up
+
+Delete the {Component} cluster:
+
+```bash
+kubectl delete {clusterkind} {cluster-name} -n {operator-name}
+```
+
+Delete the namespace:
+
+```bash
+kubectl delete ns {operator-name}
+```
+
+Uninstall the Operator by deleting both its Subscription and its ClusterServiceVersion (CSV). Deleting only the Subscription stops OLM from upgrading the Operator but leaves the installed version running:
+
+```bash
+kubectl delete subscription {operator-name} -n operators
+kubectl get csv -n operators   # look up the CSV name of the installed Operator
+kubectl delete csv {csv-name} -n operators
+```
+
+## Related Links
+
+- [{Component} Official Documentation]({upstream-url})
+- [Kubedoop Operator for {Component} on GitHub]({github-url})
+- [{Component} on GitHub]({component-github-url})
diff --git a/sidebars.ts b/sidebars.ts
index a3b992c..cbc5078 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -21,6 +21,7 @@ const sidebars: SidebarsConfig = {
       label: 'Quick Start',
       items: [
         'introduction',
+        'architecture',
         'quick-start/installation',
       ],
     },