GBase 8c allows you to locate issues and diagnose problems using slow logs. Here is how to configure and use slow logs in GBase 8c:
1. Slow Log Configuration
(1) Relevant GUC Parameters
The main configuration parameters for GBase 8c slow logs are:
-
enable_stmt_track
-
on
: Default value, enables capture of Full/Slow SQL. -
off
: Disables capture of Full/Slow SQL.
-
-
track_stmt_stat_level
- Composite parameter in the form of 'full sql stat level, slow sql stat level'.
- The first part is the full SQL tracking level, with values:
OFF
,L0
,L1
,L2
. - The second part is the slow SQL tracking level, with values:
OFF
,L0
,L1
,L2
. - Recommended value:
OFF,L0
.
-
log_min_duration_statement
- Time threshold: when a statement's execution time exceeds this value, its execution information is logged.
- Value range:
-1
to2147483647
milliseconds. Default value:30min
.
-
instr_unique_sql_count
- When the number of unique SQL entries generated in the system exceeds
instr_unique_sql_count
, automatic cleanup occurs or logging stops. - Recommended value:
200000
.
- When the number of unique SQL entries generated in the system exceeds
-
track_stmt_details_size
- Maximum size of a single statement's collected SQL, default value:
4096 byte
.
- Maximum size of a single statement's collected SQL, default value:
-
track_stmt_retention_time
- Composite parameter controlling the retention time for full/slow SQL records, read every 60 seconds.
- Format: 'full sql retention time, slow sql retention time'.
- Full SQL retention time range:
0
to86400
. - Slow SQL retention time range:
0
to604800
. - Default value:
3600,604800
.
(2) Method to Configure Slow Logs
First, log in as the database installation user:
su - gbase
Then execute:
gs_guc reload -Z coordinator -N all -I all -c "enable_stmt_track = ON"
gs_guc reload -Z coordinator -N all -I all -c "track_stmt_stat_level = 'OFF,L0'"
gs_guc reload -Z coordinator -N all -I all -c "log_min_duration_statement = 1000"
gs_guc reload -Z coordinator -N all -I all -c "instr_unique_sql_count = 200000"
gs_guc reload -Z coordinator -N all -I all -c "track_stmt_retention_time = '3600,10800'"
2. Querying Slow Logs
Log in to the database to query slow logs and full logs within the GBase 8c database. Note that queries need to be performed in the postgres
system library.
Slow Log Query Views:
dbe_perf.get_global_slow_sql_by_timestamp(start_timestamp timestamp with time zone, end_timestamp timestamp with time zone);
Slow Log Query:
select * from dbe_perf.get_global_slow_sql_by_timestamp(start_timestamp, end_timestamp);
Full Log Query Views:
dbe_perf.get_global_full_sql_by_timestamp(start_timestamp timestamp with time zone, end_timestamp timestamp with time zone);
Full Log Query:
select * from dbe_perf.get_global_full_sql_by_timestamp(start_timestamp, end_timestamp);
3. Common SQL Optimization Techniques in GBase 8c
GUC Parameter Optimization
Optimize execution plans by setting GUC parameters.
Table Definition Optimization
Choose appropriate table definitions based on business needs, including storage models, table distribution keys, partition tables, suitable data types, and indexes.
Table definition optimization includes:
- Storage Mode: GBase8c supports row storage, column storage, and in-memory storage. Choose the appropriate mode based on the business scenario.
- Distribution Key: In a distributed structure, data is split according to the specified distribution key (column). If data needed for a join or operation is on different nodes, redistribution may be required, reducing efficiency. Keeping related data on the same node reduces data interaction during SQL execution, improving efficiency.
- Field Types and Lengths: Design appropriate field types and lengths; create suitable indexes; use partition tables where necessary.
Example: Efficiency difference due to different distribution keys:
-- When join field is a distribution key, the execution is pushed down to DN nodes for higher efficiency.
postgres=# EXPLAIN SELECT * FROM td1,td2 WHERE td1.a=td2.c ORDER BY a;
QUERY PLAN
------------------------------------------------------------------------------
Remote Subquery Scan on all (dn1,dn2,dn3) (cost=2.04..2.05 rows=1 width=16)
-> Sort (cost=2.04..2.05 rows=1 width=16)
Sort Key: td1.a
-> Nested Loop (cost=0.00..2.03 rows=1 width=16)
Join Filter: (td1.a = td2.c)
-> Seq Scan on td1 (cost=0.00..1.01 rows=1 width=8)
-> Seq Scan on td2 (cost=0.00..1.01 rows=1 width=8)
(7 rows)
-- When join field is not a distribution key, data is pulled to CN node for execution, resulting in lower efficiency.
postgres=# EXPLAIN SELECT * FROM td1,td2 WHERE td1.b=td2.b ORDER BY a;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn1,dn2,dn3) (cost=2.04..2.05 rows=1 width=16)
-> Sort (cost=2.04..2.05 rows=1 width=16)
Sort Key: td1.a
-> Nested Loop (cost=0.00..2.03 rows=1 width=16)
Join Filter: (td1.b = td2.b)
-> Remote Subquery Scan on all (dn1,dn2,dn3) (cost=100.00..101.02 rows=1 width=8)
Distribute results by H: b
-> Seq Scan on td1 (cost=0.00..1.01 rows=1 width=8)
-> Materialize (cost=100.00..101.03 rows=1 width=8)
-> Remote Subquery Scan on all (dn1,dn2,dn3) (cost=100.00..101.02 rows=1 width=8)
Distribute results by H: b
-> Seq Scan on td2 (cost=0.00..1.01 rows=1 width=8)
(12 rows)
Statistical Information Optimization
GBase 8c generates optimal execution plans based on cost estimation using analyze
collected statistics. Statistics are crucial for optimizer estimations and cost calculations. Analyze execution plans to find inefficient operators and optimize or force plan changes. Typically, GBase8c chooses the best plan during execution, with manual adjustments needed only if they improve efficiency.
Operator-Level Optimization
Queries go through multiple operator steps to produce the final result. Performance issues often arise from bottleneck operators. Use EXPLAIN ANALYZE
or PERFORMANCE
commands to identify bottlenecks and optimize accordingly.
SQL Query Rewriting
Adjust SQL queries following specific rules to improve execution efficiency while ensuring correct results.
Example: Using exists
instead of in
:
-- Inefficient query using IN
select st.id,st.name from stu st where st.age=5 and st.id not in (
select stu_id from stu_info where code in ('a','b','c','d')
);
test=# explain select st.id,st.name from stu st where st.age=5 and st.id not in (
test(# select stu_id from stu_info where code in ('a','b','c','d')
test(# );
QUERY PLAN
--------------------------------------------------------------------------------------------------
LightProxy (cost=0.00..0.00 rows=1000 width=222)
Node/s: All datanodes
-> Nested Loop Anti Join (cost=0.00..2.22 rows=1 width=10)
Join Filter: ((st.id = stu_info.stu_id) OR (st.id IS NULL) OR (stu_info.stu_id IS NULL))
-> Seq Scan on stu st (cost=0.00..1.06 rows=1 width=10)
Filter: (age = 5)
-> Materialize (cost=0.00..1.09 rows=4 width=4)
-> Seq Scan on stu_info (cost=0.00..1.07 rows=4 width=4)
Filter: ((code)::text = ANY ('{a,b,c,d}'::text[]))
(9 rows)
-- Efficient query using EXISTS
select st.id,st.name from stu st where st.age=5 and not exists (
select 1 from stu_info si where si.stu_id=st.id and si.code in ('a','b','c','d')
);
test=# explain select st.id,st.name from stu st where st.age=5 and not exists (
test(# select 1 from stu_info si where si.stu_id=st.id and si.code in ('a','b','c','d')
test(# );
QUERY PLAN
-------------------------------------------------------------------------
LightProxy (cost=0.00..0.00 rows=1000 width=222)
Node/s: All datanodes
-> Hash Right Anti Join (cost=1.07..2.15 rows=1 width=10)
Hash Cond: (si.stu_id = st.id)
-> Seq Scan on stu_info si (cost=0.00..1.07 rows=4 width=4)
Filter: ((code)::text = ANY ('{a,b,c,d}'::text[]))
-> Hash (cost=1.06..1.06 rows=1 width=10)
-> Seq Scan on stu st (cost=0.00..1.06 rows=1 width=10)
Filter: (age = 5)
(9 rows)
Other optimizations include avoiding SELECT *
, using LIMIT
to restrict rows returned, avoiding functions in WHERE
conditions, and combining multiple inserts into batch inserts.