Emergency Handling for GBase Database Issues (4) - Other Exceptions

Cong Li - Jul 15 - Dev Community

1.1 Data Inconsistency Error

Description

Cluster nodes report data inconsistency alarms.

Analysis

Data inconsistency can occur when a node experiences a brief network interruption. The cluster normally resynchronizes the affected data automatically once the network is restored. If the inconsistency persists for an extended period, however, manual synchronization is required.

Emergency Procedure

When the network recovers, the inconsistent nodes should synchronize automatically. If the data is still unsynchronized one hour after the network recovers, consider manual synchronization.

  1. Notify the operations department, the open platform, and the GBase vendor, and request their assistance.
  2. For temporary-table alarms, wait 10 minutes. If the cluster synchronizes automatically, the issue is resolved. Otherwise, GBase support may need to decide whether to stop the cluster service and the running tasks and then perform steps 3-6 (typically 1-4 hours, depending on task size).
  3. Stop the GBase database service (20 minutes).
  4. The GBase vendor analyzes the inconsistent tables and synchronizes them manually (2-8 hours, depending on table size and the number of inconsistencies).
  5. Start the GBase database service and verify data consistency (30 minutes).
  6. Notify the operations department that the system is restored and resume running tasks.
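The timing rules in this procedure can be written down as a small decision helper, which is useful when wiring the alarm into an on-call runbook. This is a minimal sketch, not part of any GBase tool: next_action, outage_window, and the constant names are hypothetical, and the thresholds are simply the 10-minute and 1-hour waits and the step durations stated above.

```python
from datetime import timedelta

# Waits taken from the procedure above (constant names are hypothetical).
TEMP_TABLE_WAIT = timedelta(minutes=10)  # step 2: temporary-table alarms
AUTO_SYNC_WAIT = timedelta(hours=1)      # general limit after network recovery

# (shortest, longest) durations stated for steps 3-5.
STEP_DURATIONS = {
    "stop_service": (timedelta(minutes=20), timedelta(minutes=20)),
    "manual_sync": (timedelta(hours=2), timedelta(hours=8)),
    "start_and_verify": (timedelta(minutes=30), timedelta(minutes=30)),
}


def next_action(elapsed: timedelta, is_temp_table: bool, still_inconsistent: bool) -> str:
    """Decide what to do with an inconsistency alarm, given time since network recovery."""
    if not still_inconsistent:
        return "resolved"
    limit = TEMP_TABLE_WAIT if is_temp_table else AUTO_SYNC_WAIT
    if elapsed < limit:
        return "wait_for_auto_sync"
    # Automatic synchronization did not finish in time: escalate to GBase support.
    return "escalate_manual_sync"


def outage_window() -> tuple:
    """Sum the shortest and longest stated durations of steps 3-5."""
    low = sum((lo for lo, _ in STEP_DURATIONS.values()), timedelta())
    high = sum((hi for _, hi in STEP_DURATIONS.values()), timedelta())
    return low, high


if __name__ == "__main__":
    print(next_action(timedelta(minutes=15), is_temp_table=True, still_inconsistent=True))
    low, high = outage_window()
    print(low, high)
```

With the stated durations, outage_window() returns 2:50:00 and 8:50:00. These figures cover only stopping the service, manual synchronization, and restarting with verification.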

1.2 Data Error

Description

An SQL statement returns an incorrect result set.

Analysis

This issue is caused by a bug in the execution plan that the GBase database generates, which makes the SQL statement return an incorrect result set.

Emergency Procedure

If this issue is found, work with the application team to assess its impact and plan the follow-up fixes.

  1. Notify the operations department, the open platform, and the GBase vendor, and request their assistance.
  2. The GBase vendor analyzes and identifies the issue and provides a detailed explanation, a fix, and a workaround.
  3. The application department evaluates the scope of the impact based on the vendor's explanation.
  4. The operations department and the GBase vendor correct the erroneous data and modify programs to avoid the issue.
  5. The GBase vendor provides a patched version that fixes the issue.
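To scope the impact (steps 3 and 4), one option is to run the affected statement and the vendor's workaround against the same data and compare the two result sets. This sketch assumes the workaround's output is correct and that both result sets have already been fetched as lists of tuples, for example through a MySQL-compatible client. diff_result_sets is a hypothetical helper; it treats rows as a multiset, so both missing rows and duplicated rows are detected regardless of row order.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[object, ...]  # one result-set row as a tuple of column values


def diff_result_sets(suspect: Iterable[Row], reference: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Compare two result sets as multisets (row order ignored).

    Returns (missing, extra):
      missing - rows in the reference (workaround) result absent from the suspect result
      extra   - rows in the suspect result that the reference does not contain
    """
    s, r = Counter(suspect), Counter(reference)
    missing = r - s  # Counter subtraction keeps only positive counts
    extra = s - r
    return list(missing.elements()), list(extra.elements())


if __name__ == "__main__":
    suspect = [(1, "a"), (2, "b"), (2, "b")]    # output of the buggy plan
    reference = [(1, "a"), (2, "b"), (3, "c")]  # output of the vendor workaround
    print(diff_result_sets(suspect, reference))
```

Here the suspect result is missing the row (3, "c") and contains an extra copy of (2, "b"); rows like these indicate which downstream data may need correcting.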

1.3 Execution Error

Description

Executing an SQL statement returns an error.

Analysis

The error is caused by a bug in the GBase database.

Emergency Procedure

If this issue is found, ask the vendor to analyze the bug and provide a timeframe for the fix.

  1. Notify the operations department, the open platform, and the GBase vendor, and request their assistance.
  2. The GBase vendor analyzes the issue and provides a workaround.
  3. The application department implements the workaround according to the vendor's instructions.
  4. The GBase vendor provides a patched version that fixes the issue.
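For step 3, the application can wrap the affected statement so that the vendor's workaround runs only when the known error occurs; the wrapper is then easy to remove once the patched version is installed. This is a hedged sketch: execute_with_workaround is a hypothetical helper, execute stands for whatever function your application uses to run SQL, and it assumes the driver puts the numeric error code in the first exception argument (PyMySQL does this; check your own driver).

```python
from typing import Any, Callable, Iterable


def execute_with_workaround(
    execute: Callable[[str], Any],
    sql: str,
    workaround_sql: str,
    known_error_codes: Iterable[int],
) -> Any:
    """Run sql; if it fails with a known bug's error code, run the workaround instead."""
    codes = set(known_error_codes)
    try:
        return execute(sql)
    except Exception as exc:
        # Assumption: the error code is exc.args[0], as with PyMySQL.
        code = exc.args[0] if exc.args else None
        if code not in codes:
            raise  # unrelated errors are not masked
        return execute(workaround_sql)
```

Re-raising every other error keeps the workaround narrowly scoped to the bug the vendor identified.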

1.4 High Concurrency Leading to High Database Node Load

High concurrency can be identified by the following indicators:

  1. Average system CPU usage exceeds 90%.
  2. Disk I/O is nearly saturated.
  3. SHOW PROCESSLIST shows high concurrency, including 1-3 long-running tasks (running close to or longer than 1 hour).
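The three indicators can be checked together in a monitoring script. This is a minimal sketch under stated assumptions: CPU and disk utilization come from your monitoring system, processlist rows are dictionaries with MySQL-style Command and Time (seconds) columns (assuming GBase returns the same columns as MySQL for SHOW PROCESSLIST), and the 95% disk-utilization and 3,000-second thresholds are illustrative choices, not GBase recommendations. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping

CPU_AVG_LIMIT = 90.0      # from the text: average CPU over 90%
DISK_UTIL_LIMIT = 95.0    # assumption: what "nearly saturated" means
LONG_TASK_SECONDS = 3000  # assumption: "approaching 1 hour" taken as 50 minutes


@dataclass
class NodeSample:
    cpu_avg_pct: float    # average CPU usage over the sampling window
    disk_util_pct: float  # disk utilization, e.g. as reported by iostat


def long_running(processlist: Iterable[Mapping]) -> List[Mapping]:
    """Non-idle SHOW PROCESSLIST rows whose Time is near or over an hour."""
    return [
        row for row in processlist
        if row.get("Command") != "Sleep" and (row.get("Time") or 0) >= LONG_TASK_SECONDS
    ]


def overload_indicators(sample: NodeSample, processlist: Iterable[Mapping]) -> dict:
    """Evaluate the three indicators; all True suggests a high-concurrency overload."""
    return {
        "cpu": sample.cpu_avg_pct > CPU_AVG_LIMIT,
        "disk_io": sample.disk_util_pct >= DISK_UTIL_LIMIT,
        # The text reports 1-3 such tasks; any long-running task sets the flag.
        "long_tasks": len(long_running(processlist)) >= 1,
    }
```

Idle sessions (Command = Sleep) are excluded because their Time measures idle time rather than query runtime.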

Solution

  1. Reduce the concurrency level in the scheduling system.
  2. Reorder jobs so that long and short jobs are interleaved, which balances execution and prevents long-running jobs from being concentrated in the same time window.
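Both adjustments can be sketched for a simple in-house scheduler: a hard cap on concurrent jobs and an ordering that alternates long and short jobs. This assumes each job has an estimated runtime (for example, from past run history); the job tuples and the helpers interleave_long_short and run_jobs are hypothetical, and real scheduling systems expose their concurrency limits through their own configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Job = Tuple[str, float]  # (job name, estimated runtime in minutes)


def interleave_long_short(jobs: List[Job]) -> List[Job]:
    """Order jobs as longest, shortest, 2nd longest, 2nd shortest, and so on."""
    ordered = sorted(jobs, key=lambda j: j[1], reverse=True)
    result: List[Job] = []
    lo, hi = 0, len(ordered) - 1
    take_long = True
    while lo <= hi:
        if take_long:
            result.append(ordered[lo])  # next longest remaining job
            lo += 1
        else:
            result.append(ordered[hi])  # next shortest remaining job
            hi -= 1
        take_long = not take_long
    return result


def run_jobs(jobs: List[Job], run: Callable[[Job], object], max_concurrency: int) -> list:
    """Submit jobs in interleaved order, never running more than max_concurrency at once."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(run, interleave_long_short(jobs)))


if __name__ == "__main__":
    jobs = [("a", 60), ("b", 5), ("c", 45), ("d", 10), ("e", 30)]
    print([name for name, _ in interleave_long_short(jobs)])
```

Interleaving is a simple heuristic, not an optimal schedule: it only spreads the longest jobs across the submission order, so actual overlap still depends on the concurrency cap and on real runtimes.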