From 9e1f73735b4cfe8388e96c99c38e5cc06bd8a0e9 Mon Sep 17 00:00:00 2001
From: whg517
Date: Sat, 2 May 2026 13:47:12 +0800
Subject: [PATCH 1/2] docs: add operator documentation template and architecture overview

- DOC-001: Add docs/operators/_template.md with standardized sections (Overview, Prerequisites, Quick Start, Configuration, Advanced, Troubleshooting)
- DOC-002: Add docs/architecture.md with platform architecture, operator-go framework, built-in operators, component dependencies, and data flow examples
- Add architecture.md to Quick Start sidebar
---
 docs/architecture.md        | 272 ++++++++++++++++++++++++++++++++++++
 docs/operators/_template.md | 267 +++++++++++++++++++++++++++++++++++
 sidebars.ts                 |   1 +
 3 files changed, 540 insertions(+)
 create mode 100644 docs/architecture.md
 create mode 100644 docs/operators/_template.md

diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..ef167eb
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,272 @@
+# Architecture
+
+This document provides an overview of the Kubedoop Data Platform architecture, including its internal framework, built-in Operators, component dependencies, design principles, and data flow patterns.
+
+## Platform Architecture Overview
+
+Kubedoop is a Kubernetes-native DataOps platform that manages 15+ big data components through a unified Operator framework. The platform leverages the Operator Lifecycle Manager (OLM) for Operator installation and lifecycle management, running entirely on top of Kubernetes.
+
+```mermaid
+graph TB
+    subgraph Users["User Layer"]
+        UI[Web UI / CLI]
+        Apps[Data Applications]
+    end
+
+    subgraph Platform["Kubedoop Platform"]
+        OLM[Operator Lifecycle Manager]
+        subgraph Operators["Product Operators"]
+            OP1[Spark Operator]
+            OP2[Hive Operator]
+            OP3[Trino Operator]
+            OP4[Kafka Operator]
+            OP5[HDFS Operator]
+            OP6[... 8 more]
+        end
+        subgraph BuiltIn["Built-in Operators"]
+            CO[Commons Operator]
+            LO[Listener Operator]
+            SO[Secret Operator]
+        end
+    end
+
+    subgraph K8s["Kubernetes Cluster"]
+        API[Kubernetes API Server]
+        PV[Persistent Volumes]
+        NET[Network Policies]
+    end
+
+    Users --> Platform
+    OLM --> Operators
+    Operators --> BuiltIn
+    Operators --> K8s
+    BuiltIn --> K8s
+```
+
+## operator-go Framework
+
+All Kubedoop Operators are built on top of the **operator-go** framework, an in-house library that provides a unified abstraction for managing stateful data infrastructure on Kubernetes.
+
+### Unified CRD Abstraction
+
+The operator-go framework introduces a consistent CRD model across all Operators:
+
+- **Cluster**: The top-level resource representing a full component deployment
+- **Roles**: Logical groupings of processes with the same responsibility (e.g., NameNode, DataNode)
+- **Role Groups**: Multiple instances of a role, allowing differentiated configurations for high availability, resource isolation, or workload separation
+
+```yaml
+apiVersion: {group}.kubedoop.dev/v1alpha1
+kind: {ClusterKind}
+metadata:
+  name: my-cluster
+spec:
+  roleA:
+    config:                    # Role-level config
+      resources:
+        cpu: { min: "1" }
+    roleGroups:
+      group-1:                 # Role group with default config
+        replicas: 3
+      group-2:                 # Role group with overridden config
+        replicas: 2
+        config:
+          resources:
+            cpu: { min: "2" }
+```
+
+### Lifecycle Management
+
+The operator-go framework handles the full lifecycle of component deployments:
+
+| Phase | Description |
+|-------|-------------|
+| **Creation** | Deploys StatefulSets, Services, ConfigMaps, and Secrets based on CRD specs |
+| **Scaling** | Adjusts replica counts for role groups without disrupting existing pods |
+| **Upgrading** | Performs rolling upgrades across role groups with configurable maxUnavailable |
+| **Failure Recovery** | Automatically restarts failed pods and reconciles desired vs. actual state |
+| **Configuration Updates** | Applies config changes with graceful rolling restarts |
+
+> Source code: [operator-go on GitHub](https://github.com/zncdatadev/operator-go)
+
+## Built-in Operators
+
+Kubedoop includes three built-in Operators that provide cross-cutting functionality shared by all product Operators:
+
+```mermaid
+graph LR
+    subgraph ProductOps["Product Operators"]
+        PO1[Spark Operator]
+        PO2[Hive Operator]
+        PO3[Trino Operator]
+    end
+
+    subgraph BuiltInOps["Built-in Operators"]
+        CO["Commons Operator<br/>Environment variables<br/>JVM parameters<br/>Pod templates"]
+        LO["Listener Operator<br/>Service / Ingress<br/>TLS certificates<br/>Service discovery"]
+        SO["Secret Operator<br/>Password injection<br/>Certificate mounting<br/>Credential rotation"]
+    end
+
+    PO1 --> CO
+    PO1 --> LO
+    PO1 --> SO
+    PO2 --> CO
+    PO2 --> LO
+    PO2 --> SO
+    PO3 --> CO
+    PO3 --> LO
+    PO3 --> SO
+```
+
+### Commons Operator
+
+The Commons Operator manages shared configuration that applies across all product Operators:
+
+- **Environment variables**: Injects common environment variables into component pods
+- **JVM parameters**: Configures JVM heap size, GC settings, and other Java runtime options
+- **Pod templates**: Provides a base Pod template (annotations, labels, affinity) that product Operators extend
+
+### Listener Operator
+
+The Listener Operator provides automated service discovery and network configuration:
+
+- **Service / Ingress generation**: Automatically creates Kubernetes Services and Ingress resources based on listener definitions
+- **TLS certificate management**: Provisions and rotates TLS certificates for encrypted communication
+- **Service discovery**: Enables components to discover each other through DNS and built-in service resolution
+
+### Secret Operator
+
+The Secret Operator handles secure credential management:
+
+- **Password injection**: Automatically generates and injects passwords into component pods as environment variables or files
+- **Certificate mounting**: Mounts TLS certificates and keys into pods from centralized Secret resources
+- **Credential rotation**: Supports periodic rotation of credentials without manual intervention
+
+## Component Dependencies
+
+The following diagram shows the dependency relationships between Kubedoop product Operators:
+
+```mermaid
+graph TD
+    ZK["Zookeeper Operator"]
+
+    HDFS["HDFS Operator"]
+    DB["Database<br/>(External)"]
+
+    Hive["Hive Operator"]
+    Trino["Trino Operator"]
+    Spark["Spark Operator"]
+    Kafka["Kafka Operator"]
+    Superset["Superset Operator"]
+    Doris["Doris Operator"]
+    HBase["HBase Operator"]
+    Kyuubi["Kyuubi Operator"]
+    NiFi["NiFi Operator"]
+    Airflow["Airflow Operator"]
+    DS["DolphinScheduler Operator"]
+
+    HDFS --> ZK
+    Hive --> ZK
+    Hive --> HDFS
+    Hive --> DB
+    Trino --> ZK
+    Trino --> HDFS
+    Trino --> Hive
+    Spark --> HDFS
+    Spark --> Hive
+    Kafka --> ZK
+    Superset --> DB
+    Doris --> ZK
+    HBase --> ZK
+    HBase --> HDFS
+    Kyuubi --> HDFS
+    Kyuubi --> Hive
+    NiFi --> ZK
+    NiFi --> HDFS
+    Airflow --> DB
+    DS --> ZK
+    DS --> DB
+```
+
+| Operator | Dependencies |
+|----------|-------------|
+| Zookeeper | None (foundational service) |
+| HDFS | Zookeeper |
+| Hive | Zookeeper, HDFS, Database |
+| Trino | Zookeeper, HDFS, Hive |
+| Spark | HDFS, Hive |
+| Kafka | Zookeeper |
+| Superset | Database |
+| Doris | Zookeeper |
+| HBase | Zookeeper, HDFS |
+| Kyuubi | HDFS, Hive |
+| NiFi | Zookeeper, HDFS |
+| Airflow | Database |
+| DolphinScheduler | Zookeeper, Database |
+
+## Design Principles
+
+Kubedoop is built on the following core design principles:
+
+### Kubernetes Native
+
+All components are managed through Kubernetes Custom Resource Definitions (CRDs) and Operators. There are no custom orchestration layers — the platform relies entirely on the Kubernetes API for state management, scheduling, and self-healing.
+
+### Declarative Configuration
+
+Users describe the *desired state* of their data infrastructure through YAML manifests. The Operators continuously reconcile the actual state with the desired state, ensuring consistency without manual intervention.
+
+### Pluggable Storage
+
+Storage is abstracted through Kubernetes StorageClass, allowing users to choose the underlying storage backend (SSD, HDD, NFS, cloud storage) without changing their component configuration. This enables flexible deployment across different environments.
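Reviewer sketch for the Pluggable Storage paragraph: the snippet below builds a plain Kubernetes PersistentVolumeClaim manifest in Python to show that switching backends can come down to changing `storageClassName` alone. It uses only standard core/v1 PVC fields. The helper name `build_pvc`, the claim name `data`, and the class names `fast-ssd` and `nfs-shared` are hypothetical, and this patch does not show how Kubedoop Operators actually render volume claims.

```python
def build_pvc(name: str, capacity: str, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict.

    Only standard core/v1 PVC fields are used; nothing here is
    Kubedoop-specific.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # The only field that differs between storage backends.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": capacity}},
        },
    }


# Same claim rendered for two hypothetical storage classes.
ssd_claim = build_pvc("data", "100Gi", "fast-ssd")
nfs_claim = build_pvc("data", "100Gi", "nfs-shared")
```

The two manifests differ only in `spec.storageClassName`, which is the property the paragraph relies on when it says the backend can change without touching component configuration.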
+
+### Unified Security Model
+
+All Operators share a consistent security model through the built-in Secret Operator and Listener Operator. TLS encryption, authentication, and credential management are handled uniformly across all components.
+
+### Observability
+
+Kubedoop provides built-in observability for all managed components:
+
+- **Logging**: Centralized log collection and management
+- **Metrics**: Exposed through Prometheus-compatible endpoints
+- **Alerting**: Integration with alerting systems for proactive monitoring
+
+## Data Flow Example
+
+The following sequence diagram illustrates the data flow when a user submits a SQL query through Trino to read data from Hive:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant Trino as Trino Coordinator
+    participant TrinoW as Trino Worker
+    participant Hive as Hive Metastore
+    participant HDFS as HDFS NameNode
+    participant HDFSd as HDFS DataNode
+
+    User->>Trino: Submit SQL query (SELECT * FROM hive_table)
+    Trino->>Hive: Fetch table metadata (schema, location, format)
+    Hive-->>Trino: Return table metadata
+
+    Trino->>HDFS: Request file blocks from NameNode
+    HDFS-->>Trino: Return block locations
+
+    Trino->>TrinoW: Split query into tasks and assign to workers
+
+    loop For each data block
+        TrinoW->>HDFSd: Read data blocks
+        HDFSd-->>TrinoW: Return data
+    end
+
+    TrinoW->>Trino: Return processed results
+    Trino-->>User: Return query results
+```
+
+This flow demonstrates how Kubedoop's component Operators work together:
+
+1. **Trino** receives the query and coordinates execution
+2. **Hive Metastore** provides table schema and data location metadata
+3. **HDFS NameNode** manages the file system namespace and block locations
+4. **HDFS DataNodes** serve the actual data blocks to Trino Workers
+5. **Trino Workers** process the data in parallel and return results
diff --git a/docs/operators/_template.md b/docs/operators/_template.md
new file mode 100644
index 0000000..5981457
--- /dev/null
+++ b/docs/operators/_template.md
@@ -0,0 +1,267 @@
+# {Operator Name}
+
+> {A one-line description of the component this Operator manages and its role in the data platform.}
+
+## Overview
+
+{Brief introduction: what this component is, what problems it solves, and typical use cases. 2-3 paragraphs.}
+
+{Paragraph 1: What is this component? Describe its core functionality and position in the data ecosystem.}
+
+{Paragraph 2: What problems does it solve? What pain points does it address for users?}
+
+{Paragraph 3: Typical use cases and scenarios where this component shines.}
+
+## Prerequisites
+
+- Kubernetes {version}+
+- kubectl {version}+
+- {Other dependencies — e.g., if this component depends on HDFS, list HDFS Operator here}
+- {Operator Lifecycle Manager (OLM) installed — see [Quick Start](../quick-start/installation.md)}
+
+## Quick Start
+
+### Install the Operator
+
+Install the Operator via an OLM Subscription:
+
+```yaml
+apiVersion: operators.coreos.com/v1alpha1
+kind: Subscription
+metadata:
+  name: {operator-name}
+  namespace: operators
+spec:
+  channel: stable
+  name: {operator-name}
+  source: operatorhubio-catalog
+  sourceNamespace: olm
+```
+
+```bash
+kubectl apply -f - < For more details, see [Roles and Role Groups](../core-concepts/common-configuration-mechanisms/roles-and-role-groups.md).
+
+### Configurations
+
+{List configurable parameters with descriptions.}
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| {param-1} | {Description of the parameter} | {default-value} |
+| {param-2} | {Description of the parameter} | {default-value} |
+
+Configuration can be set at the role level or overridden at the role group level.
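Reviewer sketch for the sentence on role-level versus role-group configuration: the function below shows one plausible way an override could be resolved into an effective config. It assumes nested mappings are merged key by key and that the role-group value wins on conflict; the actual merge rules in operator-go are not shown in this patch and may differ. The name `merge_config` and the sample values are hypothetical, with the `cpu.min` values borrowed from the CRD example in docs/architecture.md.

```python
from copy import deepcopy


def merge_config(role_config: dict, group_override: dict) -> dict:
    """Recursively merge a role-group override into a role-level config.

    Assumption: nested mappings merge key by key and the role-group
    value wins on conflict. Neither input is modified.
    """
    merged = deepcopy(role_config)
    for key, value in group_override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalar, list, or new key: the override replaces the value.
            merged[key] = deepcopy(value)
    return merged


# Role-level defaults and a role-group override (illustrative values).
role_level = {
    "resources": {
        "cpu": {"min": "1", "max": "2"},
        "memory": {"limit": "4Gi"},
    }
}
group_level = {"resources": {"cpu": {"min": "2"}}}

effective = merge_config(role_level, group_level)
```

Under these assumptions the role group raises `cpu.min` to `"2"` while inheriting `cpu.max` and `memory.limit` from the role, and `role_level` itself is left untouched.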
+
+> For more details, see [Overrides](../core-concepts/common-configuration-mechanisms/overrides.md).
+
+### Listeners and Services
+
+{If this Operator integrates with the Listener Operator for service discovery, explain the configuration.}
+
+{Describe which listeners are available (e.g., internal, external) and how to configure them.}
+
+```yaml
+spec:
+  {roleName}:
+    config:
+      listeners:
+        {listener-name}:
+          type: {internal|external}
+          # {Additional listener configuration}
+```
+
+> For more details, see [Service Discovery](../core-concepts/connectivity/service-discovery.md).
+
+### Dependencies
+
+{List the dependencies this Operator has on other components and how to configure them.}
+
+| Dependency | Required | Description |
+|-----------|----------|-------------|
+| {dep-1} | Yes | {Why this dependency is needed} |
+| {dep-2} | No | {Optional dependency description} |
+
+## Advanced
+
+### Resource Management
+
+{Describe how to configure CPU, memory, and storage resources for this Operator's roles.}
+
+```yaml
+spec:
+  {roleName}:
+    config:
+      resources:
+        cpu:
+          min: "1"
+          max: "2"
+        memory:
+          limit: "4Gi"
+        storage:
+          {volume-name}:
+            capacity: 100Gi
+```
+
+> For more details, see [Resource Management](../core-concepts/resources/resource-manage.md).
+
+### Pod Placement
+
+{Describe how to control pod scheduling using affinity, tolerations, and node selectors.}
+
+> For more details, see [Pod Placement](../core-concepts/operations/pod-placement.md).
+
+### Authentication and Security
+
+{Describe security-related configuration such as TLS, Kerberos, or internal authentication.}
+
+> For more details, see [Authentication](../core-concepts/security/authentication.md).
+
+### Logging
+
+{Describe how to configure and access logs for this component.}
+
+> For more details, see [Logging](../core-concepts/observability/logging.md).
+
+## Troubleshooting
+
+{List common issues and their resolutions specific to this Operator.}
+
+### Common Issues
+
+1. **{Issue title}**
+   - **Symptom**: {What the user sees}
+   - **Cause**: {Why this happens}
+   - **Resolution**: {Steps to fix}
+
+2. **{Issue title}**
+   - **Symptom**: {What the user sees}
+   - **Cause**: {Why this happens}
+   - **Resolution**: {Steps to fix}
+
+> For common issues across all Operators, see [Troubleshooting](../troubleshooting).
+
+## Clean Up
+
+Delete the {Component} cluster:
+
+```bash
+kubectl delete {clusterkind} {cluster-name} -n {operator-name}
+```
+
+Delete the namespace:
+
+```bash
+kubectl delete ns {operator-name}
+```
+
+Uninstall the Operator by removing the Subscription:
+
+```bash
+kubectl delete subscription {operator-name} -n operators
+```
+
+## Related Links
+
+- [{Component} Official Documentation]({upstream-url})
+- [Kubedoop Operator for {Component} on GitHub]({github-url})
+- [{Component} on GitHub]({component-github-url})
diff --git a/sidebars.ts b/sidebars.ts
index a3b992c..cbc5078 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -21,6 +21,7 @@ const sidebars: SidebarsConfig = {
       label: 'Quick Start',
       items: [
         'introduction',
+        'architecture',
         'quick-start/installation',
       ],
     },
From 42768dc98cd251a84e8a4cdb494c44cfaecb32ee Mon Sep 17 00:00:00 2001
From: whg517
Date: Sat, 2 May 2026 14:27:17 +0800
Subject: [PATCH 2/2] fix: resolve markdown lint errors - break long lines

---
 docs/architecture.md | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index ef167eb..efdf19c 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -4,7 +4,9 @@ This document provides an overview of the Kubedoop Data Platform architecture, i
 
 ## Platform Architecture Overview
 
-Kubedoop is a Kubernetes-native DataOps platform that manages 15+ big data components through a unified Operator framework. The platform leverages the Operator Lifecycle Manager (OLM) for Operator installation and lifecycle management, running entirely on top of Kubernetes.
+Kubedoop is a Kubernetes-native DataOps platform that manages 15+ big data components
+through a unified Operator framework. The platform leverages the Operator Lifecycle Manager
+(OLM) for Operator installation and lifecycle management, running entirely on top of Kubernetes.
 
 ```mermaid
 graph TB
@@ -210,19 +212,27 @@ Kubedoop is built on the following core design principles:
 
 ### Kubernetes Native
 
-All components are managed through Kubernetes Custom Resource Definitions (CRDs) and Operators. There are no custom orchestration layers — the platform relies entirely on the Kubernetes API for state management, scheduling, and self-healing.
+All components are managed through Kubernetes Custom Resource Definitions (CRDs)
+and Operators. There are no custom orchestration layers — the platform relies entirely
+on the Kubernetes API for state management, scheduling, and self-healing.
 
 ### Declarative Configuration
 
-Users describe the *desired state* of their data infrastructure through YAML manifests. The Operators continuously reconcile the actual state with the desired state, ensuring consistency without manual intervention.
+Users describe the *desired state* of their data infrastructure through YAML manifests.
+The Operators continuously reconcile the actual state with the desired state,
+ensuring consistency without manual intervention.
 
 ### Pluggable Storage
 
-Storage is abstracted through Kubernetes StorageClass, allowing users to choose the underlying storage backend (SSD, HDD, NFS, cloud storage) without changing their component configuration. This enables flexible deployment across different environments.
+Storage is abstracted through Kubernetes StorageClass, allowing users to choose the
+underlying storage backend (SSD, HDD, NFS, cloud storage) without changing their
+component configuration. This enables flexible deployment across different environments.
 
 ### Unified Security Model
 
-All Operators share a consistent security model through the built-in Secret Operator and Listener Operator. TLS encryption, authentication, and credential management are handled uniformly across all components.
+All Operators share a consistent security model through the built-in Secret Operator
+and Listener Operator. TLS encryption, authentication, and credential management
+are handled uniformly across all components.
 
 ### Observability