Performance Optimization of GBase 8c Based on TPCC Testing

Cong Li - Aug 27 - Dev Community

To achieve optimal performance for GBase 8c in TPC-C standard testing, we need to address the following aspects:

  1. Understand Hardware Resources: Effectively utilize hardware resources.
  2. Operating System Configuration: Ensure no restrictions on hardware resource usage.
  3. Network Utilization: Optimize network usage to maximize its advantages.
  4. Database Parameter Adjustment: Tailor the database parameters to better suit the TPC-C business model.

Performance testing aims to maximize hardware resource utilization: the network and disk should not be bottlenecks, and the CPU should be fully used. CPU time should be spent predominantly in user space (us), with minimal system (sys) time, keeping overall CPU utilization above 95%. When monitoring with perf top, no single function should account for more than 5% of CPU time; in other words, there should be no hotspot functions.
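
Standard Linux tools are enough to keep an eye on this during a run; for example (assuming the sysstat package provides mpstat):

mpstat -P ALL 1    # per-CPU user (us) vs. system (sys) time, sampled every second
perf top           # live view of the hottest functions; none should exceed ~5% of CPU time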

1. Understanding Hardware Resources

Log in to the machine where the database is deployed to understand the CPU, memory, disk type, and network status.

(1) Understanding the CPU:
Find out how fast the server's "brain" runs. Check the CPU with the lscpu command, which reports details such as the following (an extraction example appears after the list):
- Architecture: Displays architecture type, e.g., aarch64, x86_64.
- CPU MHz: Indicates the CPU clock speed, which determines how fast a single core operates.
- NUMA nodeX CPU(s): Shows the relationship between CPU cores and NUMA nodes.
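
As a quick way to pull just these fields (assuming a typical lscpu output format):

lscpu | grep -E 'Architecture|CPU MHz|NUMA node'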

(2) Understanding Memory:
Determine the server's "short-term memory" (RAM) capacity using free -g.

To see the CPU-to-memory relationship across NUMA nodes, use numactl -H.

During testing, try to ensure each CPU uses memory from its own NUMA node to avoid cross-memory access issues.
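
For example, numactl can pin a process and its memory allocations to a single node; here <command> is a placeholder for whatever you want to run:

# Run a command with both its CPUs and its memory allocations restricted to NUMA node 0
numactl --cpunodebind=0 --membind=0 <command>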

(3) Understanding Disk:
Check the size and type (and thus the rough speed class) of the storage devices with lsblk; a quick throughput check example follows the list below:

lsblk -d -o NAME,SIZE,ROTA
  • The ROTA column indicates disk type: 1 for HDD (mechanical hard drive), 0 for SSD (solid-state drive).
  • Performance tests should prioritize NVMe SSDs, then SSDs, and finally HDDs.
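
If you want a rough read-throughput sanity check on top of lsblk, something like the following works (assuming hdparm is installed; /dev/nvme0n1 is a placeholder device name):

hdparm -t /dev/nvme0n1    # timed buffered disk reads; repeat a few times for a stable number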

(4) Understanding Network:
Determine the communication speed: use ip a to find the NIC that carries the 10 Gigabit network IP, then check its negotiated speed (shown as xxxxMb/s) with ethtool [NIC_NAME].
Prefer a 10 Gigabit network for performance testing. If one is unavailable, run the test client on the database host itself to avoid a network bottleneck.
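
Putting the two commands together (enp131s0f0 is the interface used later in this article; substitute your own):

ip a                                  # find the interface that carries the 10 Gigabit IP
ethtool enp131s0f0 | grep Speed       # confirm the negotiated speed, e.g. 10000Mb/s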

2. Operating System Optimizations

(1) Verify Performance Mode:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

If the output shows performance, the CPU is already in performance mode. If it shows something like powersave, switch it to performance mode, as shown below.
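
One way to switch all cores to the performance governor (assuming cpufreq is exposed under sysfs; run as root):

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done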

(2) Disable irqbalance:
irqbalance is a Linux service that automatically distributes interrupt load across CPU cores. On a general-purpose system, disabling it can cause interrupt imbalance and performance degradation, but in this test the NIC interrupts are bound to specific cores manually (see Section 3), so irqbalance is stopped to keep it from overriding those bindings.

Steps to disable irqbalance:

service irqbalance stop                     # stop the irqbalance service
echo 0 > /proc/sys/kernel/numa_balancing    # also disable automatic NUMA page balancing

(3) Disable Transparent Huge Pages:
Transparent Huge Pages (THP) can improve memory access performance but may lead to fragmentation and potential performance issues. To disable:

echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
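
You can verify that THP is off by re-reading the same file; the active setting is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/enabled    # expect: always madvise [never]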

(4) Disable Firewall:

service firewalld stop
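
If the server uses systemd, you can also keep the firewall from coming back after a reboot:

systemctl disable firewalld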

(5) Adjust Resource Limits (limits.conf):
nofile specifies the maximum number of file descriptors that a user can open simultaneously. File descriptors include not only files but also network connections, pipes, and other resources that can be treated as files.

nproc specifies the maximum number of processes a user can create simultaneously, including all types of processes.

On high-load servers (such as large-scale web servers or database servers), handling a large number of concurrent connections and processes is necessary. Appropriately increasing the limits of nofile and nproc can ensure that the server operates normally under high load.

To make changes, edit the /etc/security/limits.conf file. Log out and reconnect for the changes to take effect.

vim /etc/security/limits.conf
# Configure the following parameters:
* hard nofile 1024000
* hard nproc 1024000
* soft nofile 1024000
* soft nproc 1024000
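
After logging back in, a quick check confirms the new limits apply to your session:

ulimit -n    # maximum open file descriptors (nofile)
ulimit -u    # maximum user processes (nproc)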

3. Network Interrupt Optimization

In traditional large-scale symmetric multiprocessing (SMP) systems, writing and maintaining multithreaded programs is relatively easy because all processors can access the same resources. However, as systems scale up, issues such as memory bus bottlenecks become performance challenges. Non-Uniform Memory Access (NUMA) systems address these issues by partitioning memory across multiple nodes.

In a NUMA architecture, the system's processors and memory are divided into multiple nodes, each containing a number of processors and an amount of local memory, and the nodes are connected by an interconnection network used for data transfer and communication between them. Each node accesses its own local memory faster than memory on remote nodes, which is what makes memory access non-uniform. The key characteristics are:

  • Node: Each node in the system contains a set of processors and corresponding local memory. Each node is connected to other nodes, enabling communication through the interconnection network.
  • Non-Uniform Memory Access (NUMA): Memory access latency can vary across different nodes. In a NUMA architecture, a processor can access its local node's memory faster than remote node memory, resulting in performance differences in memory access.
  • Memory Locality: Programs perform better when accessing local node memory. Designing programs to maximize memory locality is an optimization goal in NUMA architectures.
  • Affinity: In NUMA systems, processes or threads are often bound to specific nodes to minimize the need to access remote memory. This is known as affinity settings.
  • NUMA-Aware Scheduling: Operating systems and schedulers may employ NUMA-aware scheduling strategies, ensuring tasks are scheduled on nodes where their data resides to enhance performance.

NUMA architecture is typically used in building large-scale SMP systems, especially those requiring substantial memory, such as databases and scientific computing. The test server used in this document features a Kunpeng 920 processor with a NUMA architecture, containing 128 CPU cores and 4 NUMA nodes. During testing, 16 CPU cores (4 cores per NUMA) are manually bound to a 10GbE network card dedicated to handling network interrupt logic.

# ./bind_net_irq.sh [number of CPU cores] [network interface name]
./bind_net_irq.sh 16 enp131s0f0

After binding, you can use the following command to check the network card binding information:

# ethtool -l [network interface name]
ethtool -l enp131s0f0
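
To confirm the interrupts actually land on the intended cores, you can also inspect /proc/interrupts for the NIC (substitute your interface name):

cat /proc/interrupts | grep enp131s0f0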

4. Database Parameter Optimization

Optimizing database parameters is crucial for enhancing overall performance. Key parameters for TPCC scenarios include:

  • work_mem: Adjusts memory for internal sorting and hash tables before writing to temporary files, reducing disk I/O.
  • maintenance_work_mem: Sets maximum memory for maintenance operations (e.g., VACUUM, CREATE INDEX), affecting efficiency.
  • max_process_memory: Maximum physical memory for the database node, ideally around 80% of server memory.
  • shared_buffers: Size of shared memory used by the database, recommended to be about 60% of max_process_memory.
  • enable_thread_pool: Enables thread pooling for better thread management, set to on or off.
  • thread_pool_attr: Defines thread pool attributes when enable_thread_pool is on. Includes:
    • thread_num: Number of threads, ideally 3-5 times the CPU core count.
    • group_num: Number of thread groups, ideally matching NUMA nodes.
    • cpubind_info: CPU binding options:
      1) nobind: No binding
      2) allbind: Bind to all CPU cores
      3) nodebind:1,2: Bind to NUMA nodes 1 and 2
      4) cpubind:0-30: Bind to CPU cores 0-30
      5) numabind:0-30: Bind to CPU cores 0-30 within the NUMA nodes

Example:

thread_pool_attr='812,4,(cpubind:0-27,32-59,64-91,96-123)'

This configuration reserves 4 CPU cores per NUMA node (16 in total) for network interrupts and binds the remaining 112 cores to the database threads. During TPCC stress testing, monitor system resources and use perf top to make sure no single function consumes more than 5% of CPU time. An example of applying these settings follows.
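
As a sketch of how these parameters could be applied, assuming the openGauss-style gs_guc tool shipped with GBase 8c, a hypothetical data directory /gbase/data/dn1, and purely illustrative sizes (follow the 80%/60% guidance above for your actual memory); restart the node afterwards for the settings to take effect:

gs_guc set -D /gbase/data/dn1 -c "max_process_memory=400GB"    # ~80% of a 512 GB server (illustrative)
gs_guc set -D /gbase/data/dn1 -c "shared_buffers=240GB"        # ~60% of max_process_memory (illustrative)
gs_guc set -D /gbase/data/dn1 -c "enable_thread_pool=on"
gs_guc set -D /gbase/data/dn1 -c "thread_pool_attr='812,4,(cpubind:0-27,32-59,64-91,96-123)'"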

By optimizing hardware usage, the operating system, network interrupts, and database parameters, we can achieve optimal TPCC performance for GBase 8c.
