1. SQL Execution Parameters
1.1 Insert Value Data Distribution Parameter
gcluster_random_insert
This parameter controls how data is distributed across nodes when executing insert value
on a randomly distributed table. The default value is 0, and the recommended configuration is 1.
-
0: All
insert value
data is inserted into a single node (if the executing node is a composite node, data is inserted there; otherwise, it is inserted into a random node). -
1:
insert value
data is evenly distributed across all nodes randomly.
1.2 Parameter for Supporting insert into select from dual
t_gcluster_use_new_dual
This parameter controls whether GCluster uses the new implementation of the dual table, which supports insert into ... select ... from dual
.
- Range: 0, 1
- Default: 0
-
0: Uses the old implementation, does not support
insert into ... select ... from dual
. -
1: Uses the new implementation, supports
insert into ... select ... from dual
. - Scope: session, global
1.3 Data Redistribution Parameter for group by
t_gcluster_hash_redistribute_groupby_on_multiple_expression
This parameter enables hash redistribution for group by
operations across all columns. The default value is 0 (disabled). It can be optimized for SQL where the first group by
field is a constant or has few distinct values.
1.4 Join/Materialized Result Set Size Parameter
_gbase_result_threshold
This parameter limits the size of JOIN result sets and materialized result sets. It needs to be configured in both GCluster and GNode. The default value is large, at 137,438,953,472. It is recommended to set it to twice the number of rows in the largest table. This helps avoid Cartesian products by erroring out if the resulting row count exceeds this value.
1.5 Parallel Materialization Threshold for Result Sets
gbase_parallel_threshold
This parameter defines the threshold for parallel materialization of result sets. When the result set row count is greater than or equal to this value, multiple threads are used for parallel materialization; otherwise, it is done serially. The default value is 10,000. Adjusting this parameter can optimize performance for serial materialization stages that are time-consuming.
1.6 Allow Binary/Varbinary Column Creation
gcluster_support_binary
The default value is 1, allowing the creation of binary/varbinary columns. When set to 0, binary/varbinary columns are not allowed, and varchar type must be used. Adjust based on business needs.
1.7 Parameter for Table and Column Names with Chinese Characters
gcluster_extend_ident
This parameter controls whether table and column names can include Chinese characters and special characters. The default value is 0 (disabled). It is generally not recommended to enable this parameter unless necessary.
1.8 Optimization Parameter for group by
with Window Functions
t_gcluster_group_by_ext_optimization
When enabled, this parameter optimizes group by rollup/cube/grouping sets
by converting them into union all
executions. The default value is 0 (disabled), and the recommended value is 1. Note that this optimization does not work if the grouping column in the projection is a function.
Example SQL rewrite to bypass the current optimization limitation:
SELECT func(a), b, COUNT(*) FROM t GROUP BY rollup(a, b);
-- Change to
SELECT func(a), cnt FROM (SELECT a, b, COUNT(*) AS cnt FROM t GROUP BY rollup(a, b)) tmp;
1.9 One-pass Hash Group Optimization
_gbase_one_pass_hash_group
This optimization is suitable for cases where the source table has a large number of rows relative to the group buffer, and the distinct values in the group by
columns are numerous. There are three partitioning methods: RR, original hash, and one-pass hash, with the evaluation criteria as follows:
- DistinctRatio < 10: Uses RR partitioning (requires two aggregations).
- Otherwise: Uses hash partitioning. If the group buffer can hold 50% of the source data, original hash partitioning is used; otherwise, one-pass hash partitioning is used.
Note: The algorithm might not be optimal due to inaccurate sampling results or when only considering data volume without data characteristics.
1.10 Recursive Call Depth for Stored Procedures
max_sp_recursion_depth
This parameter specifies the maximum depth for recursive calls in stored procedures. The range is [0-255], with a default value of 0. Adjust this parameter based on the need for recursive calls in stored procedures. Increasing the value may also require increasing the thread_stack
parameter in GCluster.
1.11 CTE Support Parameter
t_gcluster_support_cte
This parameter controls whether the common table expression (CTE) syntax (with as
) is supported. It is a session-level parameter, with a default value of 0 (disabled) and 1 enabling CTE syntax support.
1.12 Recursive Query Parameter for connect by start with
_gbase_connect_by_support_table_with_deleted_records
This parameter controls whether connect by start with
recursive queries can be executed on tables after data deletion. The default value is 0 (OFF). Enabling this parameter allows such queries even after deletions.
1.13 Recursion Depth Limit for or
Operator in Correlated Subqueries
_gbase_or_recursion_depth
This parameter limits the maximum depth of nested conditions with the or
operator in correlated subqueries. The default value is 10, and exceeding this value results in an error. The parameter is session-level in GNode.
1.14 Distinct Row Count Limit for in
Subquery Results
_gbase_in_subquery_result_threshold
This parameter limits the distinct row count for in
subquery results. The range is [0-100 million], with a default value of 10 million. Adjust based on business scenarios.
2. dblink Parameters
2.1 Retaining Intermediate Temporary Results
gcluster_dblink_direct_data_exchange
This parameter is related to transparent gateways and controls how data is transferred between two GBase 8a clusters during insert select
operations. The default value is 1, using select into server
for cross-cluster data distribution. When set to 0, the select
results are converted into insert values
statements.
Note: For significantly different GBase 8a cluster versions, set gcluster_dblink_direct_data_exchange
to 0 for compatibility, despite the reduced performance.
2.2 Controlling Table Generation Method with dblink
t_gcluster_dblink_generate_interim_table_policy
This parameter controls how interim tables are generated during table pulls with dblink. It is a global and session-level parameter.
- Range: 0, 1
- Default: 1
- 0: Uses automatic evaluation based on the data type of the projection expression results.
-
1: Requests the gateway to use
create ... select ... limit 0
to determine the interim table structure, resulting in more accurate column data type evaluation.