This article provides a detailed introduction to the data compression feature in GBase databases, covering the benefits, methods, commands, and limitations of data compression. GBase 8s data compression is used in scenarios where storage space is limited, there's a need to reduce data storage footprint, and data processing efficiency must be improved. It is recommended to compress tables and shards that are not frequently updated, offering an effective solution for saving storage space and enhancing I/O efficiency.
1. Benefits of Data Compression
1) Saving Storage Space and Associated Costs
2) Improving I/O Efficiency
- The database reads compressed data, reducing I/O load.
- The time saved on I/O operations outweighs the time required to decompress data in memory.
3) Enhancing Read Operation Performance
4) Slight Decrease in Write Operation Performance
2. Data Compression Methods
1) Automatic Compression for New Tables
a. Automatic Compression:
CREATE TABLE cust5 (...) COMPRESSED;
- Automatically generates a data dictionary.
- Automatically compresses data.
b. Light Append Mode:
Light append mode bypasses the buffer pool for faster data loading. It is automatically enabled under the following conditions:
- Row table.
- No index.
2) Disabling Automatic Compression:
TASK/ADMIN(TABLE UNCOMPRESS);
3) Re-enabling Compression:
TASK/ADMIN(TABLE COMPRESS);
4) Controlling Fragment Decompression:
TASK/ADMIN(FRAGMENT COMPRESS/UNCOMPRESS);
2. Compression for Existing Tables
1) Enabling Data Compression:
EXECUTE FUNCTION TASK('enable compression');
2) Generating Data Dictionary Compression:
EXECUTE FUNCTION TASK('table create_dictionary', 'stdtab', 'test', 'gbasedbt');
3) Consolidating Data Space: Merges unused space into the tail of the table or fragment.
EXECUTE FUNCTION TASK('table compress repack', 'stdtab', 'test', 'gbasedbt');
4) Returning Unused Space to the Database Space:
EXECUTE FUNCTION TASK('table shrink', 'stdtab', 'test', 'gbasedbt');
EXECUTE FUNCTION TASK('fragment shrink', 'partn');
3. Validating Data Compression Commands
1) Viewing the Data Dictionary with onstat -g ppd
after Data Compression:
Example command:
EXECUTE FUNCTION TASK('table create_dictionary', 'table', 'db', 'gbasedbt');
Example output:
onstat -g ppd
partnum ColOffset DbsNum CrTS CrLogID CrLogPos DrTS DrLogID DrLogPos
0x1001d5 -1 1 1393371661 4 16339024 0 0 0
0x1001d5 4 1 1393371661 4 16355408 0 0 0
-
partnum
: Partition number to which the compression dictionary is applied. -
ColOffset
: Byte offset of the BLOB column in a compressed partition;-1
if only rows are compressed. -
DbsNum
: Number of database spaces residing in the dictionary. -
CrTS
: Timestamp when the data dictionary was created. -
CrLogID
: Unique ID of the logical log created when the data dictionary was generated. -
CrLogPos
: Position of the logical log when creating the data dictionary. -
DrTS
: Timestamp when the data dictionary was cleared. -
DrLogID
: Unique ID of the logical log when the data dictionary was cleared. -
DrLogPos
: Position of the logical log when the data dictionary was cleared.
2) Estimating Compression Ratio:
EXECUTE FUNCTION TASK("table estimate_compression", "tablename", "dbname", "owner_name");
Example output:
est curr change partnum coloff table
----- ----- ------ ---------- ------- --------------------
75.7% 75.3% +0.4 0x00200003 -1 insurance:bwilson.auto
75.7% 0.0% +75.7 0x00300002 -1 insurance:pchang.home
-
est
: Existing compression ratio. -
curr
: Current compression ratio. -
change
: Change in compression from the current state. -
partnum
: Partition number. -
coloff
: Indicates if the data in the space is row data or blobs data. -
table
: Table name, formatted asdatabase:owner.tablename
.
3) Displaying Ongoing Compression Operations:
onstat –g dsk
Processed Remaining Duration
Partnum OP Pass Rows Blobs Rows Time(s) Table Name
400002 Repack 1 6325 1752 1497 00:00:00 db:sl:t1
-
partnum
: Partition number. -
OP
: Compression operation (e.g.,compress
,repack
, orshrink
). -
Pass
: Relevant for repack operations,1
indicates the first pass,2
indicates the second pass. -
Processed Rows
: Number of rows processed by the compression operation. -
Blobs
: Number of blobs processed. -
Remaining Rows
: Number of rows not yet compressed. -
Duration Time(s)
: Duration of the operation in seconds. -
Table Name
: Name of the table.
4. Data Compression Limitations
Compression cannot be applied in the following cases:
- Data not stored in rows (e.g., blobs).
- Indexes.
- Temporary tables.
- Catalog tables.
- Tables in the system library.
GBase 8s data compression technology is a crucial tool for database administrators and developers to optimize storage and enhance performance. This detailed introduction aims to help you effectively utilize GBase 8s compression features. Thank you for reading!