Commit 3040490

committed
Update the README.md file
1 parent bd0e972 commit 3040490

1 file changed

Lines changed: 169 additions & 27 deletions

README.md

Original file line number | Diff line number | Diff line change
@@ -1,51 +1,193 @@
1-
<div align="center">
2-
<picture>
3-
<source media="(prefers-color-scheme: light)" srcset="logo/DuckDB_Logo-horizontal.svg">
4-
<source media="(prefers-color-scheme: dark)" srcset="logo/DuckDB_Logo-horizontal-dark-mode.svg">
5-
<img alt="DuckDB logo" src="logo/DuckDB_Logo-horizontal.svg" height="100">
6-
</picture>
7-
</div>
8-
<br>
1+
# AliSQL with DuckDB Engine
9 2

10-
<p align="center">
11-
<a href="https://github.com/duckdb/duckdb/actions"><img src="https://github.com/duckdb/duckdb/actions/workflows/Main.yml/badge.svg?branch=main" alt="Github Actions Badge"></a>
12-
<a href="https://discord.gg/tcvwpjfnZx"><img src="https://shields.io/discord/909674491309850675" alt="discord" /></a>
13-
<a href="https://github.com/duckdb/duckdb/releases/"><img src="https://img.shields.io/github/v/release/duckdb/duckdb?color=brightgreen&display_name=tag&logo=duckdb&logoColor=white" alt="Latest Release"></a>
14-
</p>
3+
## Overview
15 4

16-
## DuckDB
5+
This repository contains **AliSQL** (Alibaba's MySQL fork) integrated with **DuckDB** as an analytical engine. This integration combines the OLTP capabilities of MySQL with the powerful OLAP features of DuckDB, providing a hybrid database solution for both transactional and analytical workloads.
17 6

18-
DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and [several extensions designed to make SQL easier to use](https://duckdb.org/docs/stable/sql/dialect/friendly_sql.html).
7+
## Version Information
19 8

20-
DuckDB is available as a [standalone CLI application](https://duckdb.org/docs/stable/clients/cli/overview) and has clients for [Python](https://duckdb.org/docs/stable/clients/python/overview), [R](https://duckdb.org/docs/stable/clients/r), [Java](https://duckdb.org/docs/stable/clients/java), [Wasm](https://duckdb.org/docs/stable/clients/wasm/overview), etc., with deep integrations with packages such as [pandas](https://duckdb.org/docs/guides/python/sql_on_pandas) and [dplyr](https://duckdb.org/docs/stable/clients/r#duckplyr-dplyr-api).
9+
- **AliSQL Version**: Based on upstream MySQL 8.0.44
10+
- **DuckDB Engine**: Integrated as a storage/analytical engine within AliSQL
21 11

22-
For more information on using DuckDB, please refer to the [DuckDB documentation](https://duckdb.org/docs/stable/).
12+
## What is AliSQL?
23 13

24-
## Installation
14+
AliSQL is Alibaba's MySQL branch, forked from official MySQL and used extensively in Alibaba Group's production environment. It includes various performance optimizations, stability improvements, and features tailored for large-scale applications.
25 15

26-
If you want to install DuckDB, please see [our installation page](https://duckdb.org/docs/installation/) for instructions.
16+
## What is DuckDB?
27 17

28-
## Data Import
18+
DuckDB is an open-source embedded analytical database system (OLAP) designed for data analysis workloads. DuckDB is rapidly becoming a popular choice in data science, BI tools, and embedded analytics scenarios due to its key characteristics:
29 19

30-
For CSV files and Parquet files, data import is as simple as referencing the file in the FROM clause:
20+
- **Exceptional Query Performance**: Single-node DuckDB performance not only far exceeds InnoDB, but even surpasses ClickHouse and SelectDB
21+
- **Excellent Compression**: DuckDB uses columnar storage and automatically selects appropriate compression algorithms based on data types, achieving very high compression ratios
22+
- **Embedded Design**: DuckDB is an embedded database system, naturally suitable for integration into MySQL
23+
- **Plugin Architecture**: DuckDB uses a plugin-based design, making it very convenient for third-party development and feature extensions
24+
- **Friendly License**: DuckDB's license allows any form of use, including commercial purposes
25+
26+
## Why Integrate DuckDB with AliSQL?
27+
28+
MySQL has long lacked an analytical query engine. While InnoDB is naturally designed for OLTP and excels in TP scenarios, its query efficiency is very low for analytical workloads. This integration enables:
29+
30+
- **Hybrid Workloads**: Run both OLTP (MySQL/InnoDB) and OLAP (DuckDB) queries in a single database system
31+
- **High-Performance Analytics**: Analytical query performance improves by up to **200x** compared to InnoDB
32+
- **Storage Cost Reduction**: DuckDB read replicas typically use only **20%** of the main instance's storage space due to high compression
33+
- **100% MySQL Syntax Compatibility**: No learning curve - DuckDB is integrated as a storage engine, so users continue using MySQL syntax
34+
- **Zero Additional Management Cost**: DuckDB instances are managed, operated, and monitored exactly like regular RDS MySQL instances
35+
- **One-Click Deployment**: Create DuckDB read-only instances with automatic data conversion from InnoDB to DuckDB
36+
37+
## Architecture
38+
39+
### MySQL's Pluggable Storage Engine Architecture
40+
41+
MySQL's pluggable storage engine architecture allows it to extend its capabilities through different storage engines:
42+
43+
![MySQL Architecture](https://raw.githubusercontent.com/baotiao/bb/main/uPic/0f4ea5d6-b3ff-45b8-bdeb-60f03b56fe1e.png)
44+
45+
The architecture consists of four main layers:
46+
- **Runtime Layer**: Handles MySQL runtime tasks like communication, access control, system configuration, and monitoring
47+
- **Binlog Layer**: Manages binlog generation, replication, and application
48+
- **SQL Layer**: Handles SQL parsing, optimization, and execution
49+
- **Storage Engine Layer**: Manages data storage and access
50+
51+
### DuckDB Read-Only Instance Architecture
52+
53+
![DuckDB Architecture](https://raw.githubusercontent.com/baotiao/bb/main/uPic/a5005f18-fb41-46c5-8d11-328b4182766f.png)
54+
55+
DuckDB analytical read-only instances use a read-write separation architecture:
56+
- Analytical workloads are separated from the main instance, ensuring no mutual impact
57+
- Data is replicated from the main instance via the binlog mechanism (similar to regular read replicas)
58+
- InnoDB stores only metadata and system information (accounts, configurations)
59+
- All user data resides in the DuckDB engine
60+
61+
## Implementation Details
62+
63+
### Query Path
64+
65+
![Query Path](https://raw.githubusercontent.com/baotiao/bb/main/uPic/ccb31673-c5cc-429d-b8bc-e432e50a7737.png)
66+
67+
1. Users connect via a MySQL client
68+
2. MySQL parses the query and performs necessary processing
69+
3. The SQL is sent to the DuckDB engine for execution
70+
4. DuckDB returns results to the server layer
71+
5. The server layer converts results to MySQL format and returns them to the client
72+
73+
**Compatibility**:
74+
- Extended DuckDB's syntax parser to support MySQL-specific syntax
75+
- Rewrote numerous DuckDB functions and added many MySQL functions
76+
- An automated compatibility testing platform with ~170,000 SQL tests shows a **99% compatibility rate**
77+
78+
### Binlog Replication Path
79+
80+
![Binlog Replication](https://raw.githubusercontent.com/baotiao/bb/main/uPic/79d99d71-1e2b-419d-977a-94d10faea090.png)
81+
82+
Key features:
83+
84+
**Idempotent Replay**:
85+
- Since DuckDB doesn't support two-phase commit, custom transaction commit and binlog replay processes ensure data consistency after instance crashes
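
One common way to make replay idempotent is to commit the last applied binlog position together with the data and skip any event at or before that position on restart. The sketch below illustrates that idea only; it is an assumption for illustration, not AliSQL's actual commit or replay code, and the `Replica` class, `applied_pos` field, and event tuples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: rows plus the last applied binlog position (hypothetical model)."""
    rows: dict = field(default_factory=dict)
    applied_pos: int = 0

    def apply(self, pos, key, value):
        """Apply one event; return False if it was already applied."""
        # Events at or before the stored position were committed before the crash.
        if pos <= self.applied_pos:
            return False
        # Assumption: a real engine would persist both updates in one atomic commit.
        self.rows[key] = value
        self.applied_pos = pos
        return True


# (binlog position, key, value) events; made-up sample data.
events = [(1, "a", 10), (2, "b", 20), (3, "a", 30)]

replica = Replica()
for ev in events[:2]:
    replica.apply(*ev)

# Simulate a restart that replays the whole log from the beginning.
applied = [replica.apply(*ev) for ev in events]
```

Replaying the full log after the simulated restart applies only the third event, so the first two are not applied twice.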
86+
87+
**DML Replay Optimization**:
88+
- DuckDB favors large transactions; frequent small transactions cause severe replication lag
89+
- Implemented batch replay mechanism achieving **30K rows/s** replay capability
90+
- In Sysbench testing, replay achieves zero replication lag and outperforms InnoDB replay
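
The batching idea can be sketched as grouping many small transactions into fewer, larger commits. This is a minimal Python illustration under stated assumptions, not the AliSQL replay implementation: the `batch_events` function, the `max_rows` threshold, and the event tuples are hypothetical.

```python
def batch_events(events, max_rows):
    """Group consecutive row events into batches of at most max_rows events."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    # Flush the final partial batch, if any.
    if batch:
        yield batch


# Ten single-row transactions (made-up sample data).
small_transactions = [("INSERT", i) for i in range(10)]

# Each batch would be committed to the analytical engine as one transaction.
batches = list(batch_events(small_transactions, max_rows=4))
commit_count = len(batches)
```

With `max_rows=4`, the ten single-row transactions collapse into three commits while preserving event order.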
91+
92+
**Parallel Copy DDL**:
93+
- For DDL operations DuckDB doesn't natively support (e.g., column reordering), implemented Copy DDL mechanism
94+
- Natively supported DDL uses Inplace/Instant execution
95+
- Copy DDL creates a new table to replace the original using multi-threaded parallel execution
96+
- Execution time reduced by **7x**
97+
98+
![Copy DDL Performance](https://raw.githubusercontent.com/baotiao/bb/main/uPic/5ddc14f2-9b8a-4a00-a346-bace639009e5.png)
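
A rough sketch of the Copy DDL idea for a column reorder: split the original table into chunks, rewrite each chunk in a worker thread, and assemble the new table in the original row order. This only illustrates the general technique in Python; the `copy_chunk` and `parallel_copy` names, the chunk size, and the in-memory table are hypothetical and do not reflect AliSQL's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_chunk(rows, new_order):
    """Rewrite each row so its columns follow new_order (a tuple of old indexes)."""
    return [tuple(row[i] for i in new_order) for row in rows]


def parallel_copy(table, new_order, workers=4, chunk_size=2):
    """Build a new table with reordered columns, copying chunks in parallel."""
    chunks = [table[i:i + chunk_size] for i in range(0, len(table), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so row order is preserved.
        copied = list(pool.map(copy_chunk, chunks, [new_order] * len(chunks)))
    return [row for chunk in copied for row in chunk]


# Original layout: (id, name, value); new layout: (name, id, value).
old_table = [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)]
new_table = parallel_copy(old_table, new_order=(1, 0, 2))
```

Once the copy finishes, the new table would replace the original, which is the swap step the list above describes.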
99+
100+
## Performance Benchmarks
101+
102+
**Test Environment**:
103+
- ECS Instance: 32 CPU, 128GB Memory, ESSD PL1 Cloud Disk 500GB
104+
- Benchmark: TPC-H SF100
105+
106+
![Performance Comparison](https://raw.githubusercontent.com/baotiao/bb/main/uPic/f844ff93-34d5-4971-89f7-684bea81a001.png)
107+
108+
DuckDB demonstrates significant performance advantages over InnoDB in analytical query scenarios, with up to **200x improvement**.
109+
110+
## Getting Started
111+
112+
### Building AliSQL with DuckDB Engine
113+
114+
**Prerequisites**:
115+
- [CMake](https://cmake.org) 3.x or higher
116+
- Python3
117+
- C++11 compliant compiler (GCC 5.x+ or Clang 3.4+)
118+
119+
**Build Instructions**:
120+
121+
```bash
122+
# Clone the repository
123+
git clone https://github.com/your-repo/myduck.git
124+
cd myduck
125+
126+
# Build the project
127+
make
128+
129+
# For development/debugging
130+
make debug
131+
132+
# Run unit tests
133+
make unit
134+
make allunit
135+
136+
# Build with benchmarks (optional)
137+
BUILD_BENCHMARK=1 BUILD_TPCH=1 make
138+
```
139+
140+
### Using DuckDB Engine in MySQL
141+
142+
Once built, you can create tables using the DuckDB storage engine:
31 143

32 144
```sql
33-
SELECT * FROM 'myfile.csv';
34-
SELECT * FROM 'myfile.parquet';
145+
-- Create a table with DuckDB engine
146+
CREATE TABLE analytics_table (
147+
id INT,
148+
name VARCHAR(100),
149+
value DECIMAL(10,2)
150+
) ENGINE=DuckDB;
151+
152+
-- Import data from Parquet files
153+
LOAD DATA INFILE '/path/to/data.parquet' INTO TABLE analytics_table;
154+
155+
-- Run analytical queries
156+
SELECT name, SUM(value) as total
157+
FROM analytics_table
158+
GROUP BY name
159+
ORDER BY total DESC;
35 160
```
36 161

37-
Refer to our [Data Import](https://duckdb.org/docs/stable/data/overview) section for more information.
162+
### Configuration
163+
164+
Key MySQL parameters for DuckDB engine:
165+
- Configure DuckDB-specific settings through MySQL system variables
166+
- Refer to the documentation for tuning parameters based on your workload
167+
168+
## Try It on Alibaba Cloud
169+
170+
You can experience RDS MySQL with DuckDB engine on Alibaba Cloud:
171+
172+
https://help.aliyun.com/zh/rds/apsaradb-rds-for-mysql/duckdb-based-analytical-instance/
173+
174+
## Resources
175+
176+
- [DuckDB Official Documentation](https://duckdb.org/docs/stable/)
177+
- [DuckDB GitHub Repository](https://github.com/duckdb/duckdb)
178+
- [MySQL 8.0 Documentation](https://dev.mysql.com/doc/refman/8.0/en/)
179+
- [Detailed Article (Chinese)](https://mp.weixin.qq.com/s/_YmlV3vPc9CksumXvXWBEw)
38 180

39 181
## SQL Reference
40 182

41 183
The documentation contains a [SQL introduction and reference](https://duckdb.org/docs/stable/sql/introduction).
42 184

43 185
## Development
44 186

45-
For development, DuckDB requires [CMake](https://cmake.org), Python3 and a `C++11` compliant compiler. Run `make` in the root directory to compile the sources. For development, use `make debug` to build a non-optimized debug version. You should run `make unit` and `make allunit` to verify that your version works properly after making changes. To test performance, you can run `BUILD_BENCHMARK=1 BUILD_TPCH=1 make` and then perform several standard benchmarks from the root directory by executing `./build/release/benchmark/benchmark_runner`. The details of benchmarks are in our [Benchmark Guide](benchmark/README.md).
187+
Please refer to the [DuckDB Build Guide](https://duckdb.org/docs/stable/dev/building/overview) for detailed build instructions.
46 188

47-
Please also refer to our [Build Guide](https://duckdb.org/docs/stable/dev/building/overview) and [Contribution Guide](CONTRIBUTING.md).
189+
The benchmark details are available in the [Benchmark Guide](benchmark/README.md).
48 190

49 191
## Support
50 192

51-
See the [Support Options](https://duckdblabs.com/support/) page.
193+
For DuckDB-specific support, see the [Support Options](https://duckdblabs.com/support/) page.
