In database queries, filtering a single table using multiple fields is a common operation. However, without appropriate index support, these queries can become slow. This article explores the impact of indexes on query performance with different combinations of filtering fields by creating a test environment, simulating data generation, and executing a series of test SQL statements.
1. Create Test Table
create table t_user(
c_id serial primary key, --ID
c_name varchar(50), --NAME
c_age int,c_sex char(6), --GENDER
c_cardno char(20), --ID NUMBER
c_birthday char(10), --BIRTH DATE
c_phone char(11), --PHONE NUMBER
c_address1 varchar(255), --ADDRESS 1
c_address2 lvarchar(255) --ADDRESS 2
);
2. Generate Test Data
sh gendata.sh 100000>t_user.unl
#!/bin/sh
#gendata.sh
for i in `seq $1`
do
surnames=("Smith" "Johnson" "Williams" "Brown" "Jones" "Garcia" "Miller" "Davis" "Rodriguez" "Martinez" "Hernandez" "Lopez" "Gonzalez" "Wilson" "Anderson" "Thomas" "Taylor" "Moore" "Jackson" "Martin" "Lee" "Perez" "Thompson" "White" "Harris" "Sanchez" "Clark" "Ramirez" "Lewis" "Robinson" "Walker" "Young" "Allen" "King" "Wright" "Scott" "Torres" "Nguyen" "Hill" "Flores" "Green" "Adams" "Nelson" "Baker" "Hall" "Rivera" "Campbell" "Mitchell" "Carter" "Roberts" "Gomez" "Phillips" "Evans" "Turner" "Diaz" "Parker" "Cruz" "Edwards" "Collins" "Reyes" "Stewart" "Morris" "Morales" "Peterson")
num=$((RANDOM % ${#surnames}))
given_names=("James" "Mary" "John" "Patricia" "Robert" "Jennifer" "Michael" "Linda" "William" "Elizabeth" "David" "Barbara" "Richard" "Susan" "Joseph" "Jessica" "Thomas" "Sarah" "Charles" "Karen" "Christopher" "Nancy" "Daniel" "Margaret" "Matthew" "Lisa" "Anthony" "Betty" "Donald" "Dorothy" "Mark" "Sandra" "Paul" "Ashley" "Steven" "Kimberly" "Andrew" "Donna" "Kenneth" "Emily" "George" "Michelle" "Joshua" "Carol" "Kevin" "Amanda" "Brian" "Melissa" "Edward" "Deborah" "Ronald" "Stephanie" "Timothy" "Rebecca" "Jason" "Sharon" "Jeffrey" "Laura" "Ryan" "Cynthia" "Jacob" "Kathleen" "Gary" "Amy")
num1=$((RANDOM % ${#given_names}))
num2=$((RANDOM % ${#given_names}))
age=$(( $RANDOM % (99)))
genders=("M" "F")
gender=$((RANDOM % ${#genders}))
random_day=$((RANDOM % (36500)))
target_timestamp=$((random_day * 86400))
random_date=$(date -d @$target_timestamp "+%Y-%m-%d")
addr=`openssl rand -base64 100`
echo "0|${surname} ${given_name1} ${given_name2}|${age}|${gender}|${id}|${random_date}|${phone}|${addr}|${addr}|"
done
Or use Python and the Faker library to generate test data:
#!/usr/bin/env python3
import sys
import datetime
from faker import Faker
# Get parameters
num = 0
if len(sys.argv) == 2:
num = str(sys.argv[1])
# Get the current year
curyear = datetime.datetime.now().year
# Initialize Faker for English data
fdata = Faker("en_US")
# Print random data 'num' times
for i in range(int(num)):
# Generate a random SSN, get birth date and gender
ssn = fdata.ssn()
year = ssn[0:3]
month = ssn[4:6]
day = ssn[7:9]
sex = int(ssn[-1]) % 2
birth = f'{year}-{month}-{day}'
print("%d|%s|%d|%s|%s|%s|%s|%s|%s|" % (
i + 1,
fdata.name_male() if sex == 1 else fdata.name_female(),
curyear - int(year),
"Male" if sex == 1 else "Female",
ssn,
birth,
fdata.phone_number(),
fdata.address(),
fdata.address()))
3. Import Data
echo "load from t_user.unl insert into t_user;" |dbaccess testdb
4. Test SQL and Results
No. | SQL | Number of Rows | Filter Fields | Index Fields | Execution Time |
---|---|---|---|---|---|
1 | select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; | 100000 | c_name and c_sex and c_cardno | None | 0.022 sec |
2 | select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; | 100000 | c_name and c_sex and c_cardno | c_name | 0.003 sec |
3 | select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; | 100000 | c_name and c_sex and c_cardno | c_sex | 0.041 sec |
4 | select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; | 100000 | c_name and c_sex and c_cardno | c_cardno | 0.002 sec |
5 | select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; | 100000 | c_name or c_sex or c_cardno | None | 0.027 sec |
6 | select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; | 100000 | c_name or c_sex or c_cardno | c_name | 0.028 sec |
7 | select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; | 100000 | c_name or c_sex or c_cardno | c_sex | 0.027 sec |
8 | select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; | 100000 | c_name or c_sex or c_cardno | c_cardno | 0.028 sec |
9 | select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; | 100000 | c_name or c_sex or c_cardno | c_cardno, c_name, c_sex | 0.028 sec |
10 | select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; | 100000 | c_name or c_cardno | None | 0.027 sec |
11 | select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; | 100000 | c_name or c_cardno | c_name | 0.029 sec |
12 | select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; | 100000 | c_name or c_cardno | c_cardno | 0.027 sec |
13 | select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; | 100000 | c_name or c_cardno | idx1(c_cardno, c_name) | 0.042 sec |
14 | select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; | 100000 | c_name or c_cardno | idx1(c_cardno), idx2(c_name) | 0.005 sec |
- When multiple filtering fields are combined with AND, create an index on the field with high selectivity.
- When multiple filtering fields are combined with OR and none of the fields have low selectivity, create separate indexes for each field.
Through the testing and analysis of single-table multi-field filtering queries, we gain a deeper understanding of the role of indexes in optimizing database queries. Proper index design can significantly improve query performance and reduce system resource consumption. We hope that the test results and optimization strategies discussed in this article provide valuable insights for database administrators and developers in index design. Thank you for reading, and we hope this article offers a practical perspective on optimizing database query performance.