GBase Database (GBase 数据库) is a high-performance relational database widely used in data analysis and performance optimization scenarios.
1. Overview
GBase 8a MPP Cluster is a distributed parallel database cluster built on a Shared Nothing architecture. It offers high performance, high availability, and high scalability. Leveraging an MPP + Shared Nothing design, the cluster performs queries in a distributed parallel manner, decomposing each query into parallel tasks executed across nodes. This article provides an in-depth look at the distributed implementation principles of the frequently used `join` and `group by` operators within queries, along with the key parameters in GBase 8a that affect the distributed execution plan, including their mechanisms and influence on execution.
2. Principles of GBase 8a Distributed Execution Plan
2.1 Data Distribution in GBase 8a
Tables in GBase 8a appear as logical tables to users, similar to those in standalone databases. The actual storage of tables supports two types of data distribution:
- Distributed Table: Data within the table is distributed across the nodes of the cluster, with each node storing a data fragment, or physical table, that represents a portion of the logical table. GBase 8a supports two distribution strategies:
  - Hash Distribution: The column(s) specified by `distributed by` are used to calculate a hash value that determines the node where each row resides. Both single-column and multi-column hash distribution are supported.
  - Random Distribution: Data is distributed randomly across nodes.
- Replicated Table: Each node contains an identical copy of the complete data.
2.2 Hash Distribution Principles
In GBase 8a, hash distribution operates as follows, given a table definition:
```sql
create table emp (pname varchar(50), position varchar(50), ...) distributed by('pname');
```
Here, `pname` serves as the hash distribution column, meaning its values are used to calculate the hash value.

The hash function in GBase 8a is `f(x) = crc32(x) % 65536`: the `crc32` checksum of the distribution-column value is taken modulo 65536, so hash values fall in the range 0 to 65535. The mapping between hash values and data shards (physical tables) is stored in the system table `gbase.nodedatamap`, where the `hashkey` column enumerates the 65536 hash values and the `nodeid` column gives the physical table number (1 to N) that owns each value. Shard-to-node binding is established with the `gcadmin distribution` command and can be verified with `gcadmin showdistribution`.
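To see where a particular value lands, the mapping can be queried directly. This is only a sketch: it assumes GBase 8a exposes a MySQL-style `crc32()` function (which may not hold on every version), and the value `'Smith'` is purely illustrative.

```sql
-- Compute the hash bucket (0..65535) for a pname value, then look up the
-- physical table (shard) that owns that bucket in gbase.nodedatamap.
-- Assumes a MySQL-compatible crc32() function is available.
select crc32('Smith') % 65536 as hashkey;

select nodeid
from gbase.nodedatamap
where hashkey = crc32('Smith') % 65536;
```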
2.3 Distributed Join Execution Plan Principles
This section describes the implementation principles for distributed `join` operations, specifically for equality-based joins (excluding range joins such as `between` or inequality comparisons).
There are two basic models for distributed joins:
1) Distributed Table Join Distributed Table: Both tables have matching data distribution, meaning the equality condition includes the hash distribution column of each table.
2) Distributed Table Join Replicated Table: One table is a distributed table, and the other is a replicated table (typically with the right table being the replicated table in a left join).
In these models, the `join` can be executed in parallel across nodes. If neither model is met, data redistribution is performed to transform the data distribution so that the parallel join can still be computed according to these models.
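As an illustration of Model 1, consider two hypothetical tables that are both hash-distributed on the join key (the table names and columns below are not from the original text):

```sql
-- Both tables are hash-distributed on cust_id, and the join equates the two
-- distribution columns, so every node can join its local fragments independently.
create table orders    (cust_id int, amount decimal(10,2)) distributed by('cust_id');
create table customers (cust_id int, cname varchar(50))    distributed by('cust_id');

select c.cname, o.amount
from orders o
join customers c on o.cust_id = c.cust_id;  -- equality on both distribution columns
```

Model 2 looks the same on the SQL side; the difference is simply that one side is a replicated table, which is fully present on every node, so no data movement is needed.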
Redistribution-based join execution can be divided into the following three situations:
- Small Table Broadcast Join Execution Plan
- Single Table Redistribution Join Execution Plan
- Dual Table Redistribution Join Execution Plan
2.3.1 Small Table Broadcast Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- Table T1 is a large table, and T2 is a small table.
- The `join` condition does not include an equality condition on both tables' distribution columns.
Execution:
1) Table T2 (the small table) is broadcast to all nodes, creating a temporary replicated table `tmp2`.
2) The join can then follow Model 2 (distributed table join replicated table).
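A sketch of a query shape that can trigger this plan; the tables, columns, and relative sizes are hypothetical:

```sql
-- fact_sales is large and distributed by sale_id; dim_region is small and
-- distributed by region_id. The join column region_name is not the
-- distribution column on both sides, so the small table may be broadcast.
create table fact_sales (sale_id int, region_name varchar(50), amount decimal(10,2)) distributed by('sale_id');
create table dim_region (region_id int, region_name varchar(50)) distributed by('region_id');

select s.sale_id, r.region_id, s.amount
from fact_sales s
join dim_region r on s.region_name = r.region_name;
-- dim_region is broadcast to every node as a temporary replicated table
-- (tmp2 in the text), turning the query into Model 2.
```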
2.3.2 Single Table Redistribution Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- The `join` condition equates T1's distribution column with a non-distribution column of T2.
Execution:
1) Table T2 is redistributed on the column from the `join` equality condition, creating a distributed table `tmp2`.
2) The join then follows Model 1 (distributed table join distributed table).
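For illustration, using the `emp` table from section 2.2 (distributed by `pname`) together with a hypothetical `bonus` table distributed on a different column:

```sql
-- emp's distribution column (pname) is equated with a non-distribution
-- column of bonus, so only bonus needs to be moved.
create table bonus (bonus_id int, pname varchar(50), amount decimal(10,2)) distributed by('bonus_id');

select e.pname, b.amount
from emp e
join bonus b on e.pname = b.pname;
-- bonus is redistributed by pname into a temporary table (tmp2 in the text);
-- the join then runs in parallel as Model 1.
```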
2.3.3 Dual Table Redistribution Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- The `join` condition includes no distribution column from either table.
Execution:
1) Redistribute T1 on the equality-condition column to create a temporary table `tmp1`.
2) Redistribute T2 in the same way to create a temporary table `tmp2`.
3) The join is thereby transformed to Model 1 (distributed table join distributed table).
Note: If there are multiple equality conditions, the last condition in the list is used for redistribution.
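A sketch of a query that would take this plan; both tables are hypothetical, and neither distribution column appears in the join condition:

```sql
-- t1 is distributed by id1 and t2 by id2, but the join is on city, which is
-- a non-distribution column on both sides.
create table t1 (id1 int, city varchar(50)) distributed by('id1');
create table t2 (id2 int, city varchar(50)) distributed by('id2');

select a.id1, b.id2
from t1 a
join t2 b on a.city = b.city;
-- Both t1 and t2 are redistributed by city (tmp1 and tmp2 in the text),
-- after which the join runs node-locally as Model 1.
```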
2.4 Distributed Group By Execution Plan Principles
Like joins, distributed `group by` also has basic models that allow for straightforward parallel execution on the nodes. The two models are:
1) Model 1:
- The `group by` table is a hash-distributed table.
- The `group by` columns include the table's hash distribution column.
- Execution: Each node performs the `group by` in parallel, with results merged and sorted at the coordinator node.
2) Model 2:
- The `group by` table is a replicated table.
- Execution: A single node executes the `group by` operation and sends the result to the coordinator node.
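As an illustration of Model 1, using the `emp` table from section 2.2, which is hash-distributed on `pname`:

```sql
-- The group by list contains the distribution column pname, so each node
-- groups its own fragment in parallel and the coordinator merges and sorts.
select pname, count(*) as cnt
from emp
group by pname;
```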
If these conditions are not met, data redistribution is used to fit the data into the simplest models, with two derived execution plans:
- Redistribution Group By Execution Plan
- Two-Phase Group By Execution Plan
2.4.1 Redistribution Group By Execution Plan
Conditions:
- The `group by` table is a distributed table.
- The `group by` columns do not include the table's hash distribution column.
Execution:
1) Perform an initial aggregation on each node to reduce network transfer volume.
2) Redistribute the data by one of the `group by` columns, creating a temporary table `tmp1`.
3) Follow Model 1 for further parallel execution.
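A sketch of a query that would follow this plan, again using the `emp` table from section 2.2; `position` is not its distribution column:

```sql
-- Step 1: each node pre-aggregates its local fragment of emp by position.
-- Step 2: the partial results are redistributed by position into a temporary
--         table (tmp1 in the text), so equal keys land on the same node.
-- Step 3: the final group by runs in parallel, as in Model 1.
select position, count(*) as emp_cnt
from emp
group by position;
```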
2.4.2 Two-Phase Group By Execution Plan
Conditions:
- The `group by` table is a distributed table.
- The `group by` columns do not include the table's hash distribution column.
Execution:
1) Each node performs an initial aggregation on the `group by` columns.
2) The aggregated data is sent to a single node for further aggregation and sorting.
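The same query shape can instead run under the two-phase plan; which of the two plans is chosen depends on the parameter described in section 3.1, not on the SQL itself:

```sql
-- Two-phase execution of a group by on a non-distribution column:
--   Phase 1: each node aggregates its local fragment of emp by position.
--   Phase 2: the partial aggregates are sent to one node, which merges them
--            (summing the counts) and sorts the final result.
select position, count(*) as emp_cnt
from emp
group by position;
```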
3. Execution Plan Parameters
3.1 Key Parameters
The specific execution plan selected during SQL execution is influenced by certain parameters:
- `gcluster_hash_redistribute_join_optimize`: Controls dynamic redistribution in equality joins.
  - Values: 0 (small table broadcast), 1 (dynamic redistribution), 2 (auto-evaluate)
  - Default: 2
- `gcluster_hash_redist_threshold_row`: Row-count threshold used for automatic evaluation when `gcluster_hash_redistribute_join_optimize` is set to 2.
  - Values: 0 (no limit), or a positive integer used as the row-count threshold
  - Default: 0
- `gcluster_hash_redistribute_groupby_optimize`: Controls dynamic redistribution for `group by`.
  - Values: 0 (disabled), 1 (dynamic redistribution)
  - Default: 1
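These parameters can be inspected, and adjusted if needed, with MySQL-style variable statements on a coordinator node. Treat the following as a hedged sketch: whether each variable is settable at the global or session level may vary by GBase 8a version, and the threshold value is purely illustrative.

```sql
-- All three parameters share the gcluster_hash_redist prefix.
show variables like 'gcluster_hash_redist%';

-- Example: let the optimizer auto-evaluate join redistribution, using a
-- 1,000,000-row threshold when deciding whether the smaller table is small
-- enough to broadcast.
set global gcluster_hash_redistribute_join_optimize = 2;
set global gcluster_hash_redist_threshold_row = 1000000;
```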
3.2 Parameter Influence on Execution Plans
These parameters determine whether the optimizer chooses a redistribution-based plan for a given query, based on the characteristics of the tables and their join or group-by conditions.
| Join Scenario | gcluster_hash_redistribute_join_optimize (Parameter 1) | gcluster_hash_redist_threshold_row (Parameter 2) | Execution Plan |
|---|---|---|---|
| Both tables are distributed tables, and the join condition is on the hash distribution columns of both tables | = 1 or = 2 | No effect | Join is executed locally on each node, following Join Basic Model 1 |
| One table is a distributed table and the other is a replicated table | No effect | No effect | Join is executed locally on each node, following Join Basic Model 2 |
| Both tables are distributed tables, and the join condition is on the hash distribution column of only one table | = 2: plan depends on Parameter 2; = 1: follow Plan 1; = 0: follow Plan 2 | When Parameter 1 = 2: IF the row count of the smaller table < gcluster_hash_redist_threshold_row AND the row-count difference between the larger and smaller tables exceeds 20%, THEN follow Plan 2, ELSE follow Plan 1 | Plan 1: redistribute the table whose join column is not its distribution column, using the join column as the new distribution key (Single Table Redistribution Join). Plan 2: convert the smaller table to a replicated table (Small Table Broadcast Join) |
| Both tables are distributed tables, and the join condition is not on the hash distribution column of either table | = 2: plan depends on Parameter 2; = 1: follow Plan 1; = 0: follow Plan 2 | When Parameter 1 = 2: IF the row count of the smaller table < gcluster_hash_redist_threshold_row AND the row-count difference between the larger and smaller tables exceeds 20%, THEN follow Plan 2, ELSE follow Plan 1 | Plan 1: redistribute both tables by the join column (Dual Table Redistribution Join). Plan 2: convert the smaller table to a replicated table (Small Table Broadcast Join) |
| Group By Scenario | gcluster_hash_redistribute_groupby_optimize | Execution Plan |
|---|---|---|
| The group by list contains the hash distribution column, with no function or expression applied to that column | = 1 | Group by is performed locally on each node, following Group By Basic Model 1 |
| Any other case | = 1: follow Plan 1; = 0: follow Plan 2 | Plan 1: redistribute the table by the first column (or function/expression) in the group by list (Redistribution Group By). Plan 2: pre-aggregate on each node, then send the partial results to a single node for final aggregation (Two-Phase Group By) |
The above covers some practical tips with GBase Database (GBase 数据库). If you're interested in learning more about GBase Database, feel free to follow our related technical articles.