Since version V9.5.2.X, GBase 8a MPP Cluster can load data from general-purpose data servers over protocols such as ftp, http, hdfs and sftp, and it can also use a Kafka cluster as a data source. This article uses Red Hat Enterprise Linux 6.2 as an example to show how to configure an FTP file server that GBase 8a can load data from.
1. FTP Server Configuration
We will use vsftpd to set up the FTP server.
1) Check if vsftpd is installed
# rpm -qa vsftpd
vsftpd-2.2.2-6.el6_0.1.x86_64
2) If vsftpd is not installed, install it from the RPM package
# rpm -ivh vsftpd-2.2.2-6.el6_0.1.x86_64.rpm
3) Modify the default configuration of the FTP server
# vim /etc/vsftpd/vsftpd.conf
- Allow anonymous user login (default is YES):
anonymous_enable=YES
- Allow local users to log in (default is YES):
local_enable=YES
- Grant write permission to local users (YES in the stock configuration file; if the server is only used as a source for data loading, you can set this to NO):
write_enable=NO
- Set the file creation mask for local users (vsftpd's built-in default is 077; 022 makes newly created files readable by group and others):
local_umask=022
- Allow anonymous FTP users to upload files (default is NO):
#anon_upload_enable=YES
- Allow anonymous FTP users to create directories (default is NO):
#anon_mkdir_write_enable=YES
- Make active-mode (PORT) data connections originate from port 20, the ftp-data port (YES in the stock configuration file):
connect_from_port_20=YES
- Name of the PAM service used for authentication; its configuration file is /etc/pam.d/vsftpd:
pam_service_name=vsftpd
- Use the /etc/vsftpd/user_list file to control which users may log in:
userlist_enable=YES
- Deny access to specific files or directories:
#deny_file={*.mp3,*.mov,.private}
- Hide specific files or directories:
#hide_file={*.mp3,.hidden,hide*,h?}
- Set the port range for passive mode (default is 0, meaning any available port):
pasv_min_port=20001
pasv_max_port=21000
- Set the maximum number of client connections (default is 2000):
max_clients=2000
- Set the maximum number of client connections per IP (default is 50):
max_per_ip=50
- Timeout for a client to establish the data connection in passive mode (default is 60 seconds):
accept_timeout=60
- Timeout for a client to respond to the server's data connection in active mode (default is 60 seconds):
connect_timeout=60
- Timeout for a data transfer that makes no progress (default is 300 seconds):
data_connection_timeout=300
- Idle session timeout (default is 300 seconds):
idle_session_timeout=300
- Whether to use the sendfile() system call to speed up transfers (default is YES; set it to NO if the shared files reside on NFS or another network file system):
use_sendfile=YES
- Set the directory that local (non-anonymous) users are placed in after login:
#local_root=/var/ftp/pub
For more configuration options, refer to the vsftpd.conf documentation:
# man vsftpd.conf
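The vsftpd.conf man page states that each directive has the form option=value and that putting any space between the option, `=` and the value is an error, so a quick syntax check of the edited file can save a failed restart. The following is a minimal Python sketch under that reading of the format; `parse_vsftpd_conf` is a hypothetical helper, not part of vsftpd.

```python
def parse_vsftpd_conf(text):
    """Parse vsftpd.conf-style text into a dict of option -> value (strings).

    Blank lines and lines starting with '#' are skipped. Spaces around '='
    are reported as errors, following the rule in the vsftpd.conf man page.
    """
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # lenient: surrounding whitespace is ignored here
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected option=value: {raw!r}")
        key, value = line.split("=", 1)
        if key != key.rstrip() or value != value.lstrip():
            raise ValueError(f"line {lineno}: spaces around '=' are not allowed: {raw!r}")
        options[key] = value  # in this sketch, a later duplicate wins
    return options


sample = """\
anonymous_enable=YES
# write_enable=YES
write_enable=NO
pasv_min_port=20001
pasv_max_port=21000
"""
print(parse_vsftpd_conf(sample))
```

Reading the real file is then `parse_vsftpd_conf(open("/etc/vsftpd/vsftpd.conf").read())`; values stay strings, so numeric options such as `pasv_min_port` need `int()` before comparison.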
If at most N load tasks run concurrently in the cluster and each load task uses at most M nodes (max_data_processors), the minimum and recommended values of the following parameters are:
| Parameter Name | Default Value | Minimum Value | Recommended Value |
|---|---|---|---|
| max_clients | 2000 | M*N | M*N*2 |
| max_per_ip | 50 | N | N*2 |
| Passive port range (pasv_min_port to pasv_max_port) | not set (any available port) | M*N ports | M*N*2 ports |
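The sizing rules in the table are simple products of M and N. The sketch below computes them and flags settings that fall under the minimum; `sizing` and `below_minimum` are hypothetical helpers, and counting the passive range as `pasv_max_port - pasv_min_port + 1` ports is an assumption about how that row is meant.

```python
def sizing(m, n):
    """Return {setting: (minimum, recommended)} for M nodes per task, N tasks."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be >= 1")
    return {
        "max_clients": (m * n, 2 * m * n),
        "max_per_ip": (n, 2 * n),
        "passive_ports": (m * n, 2 * m * n),  # size of the passive port range
    }


def below_minimum(max_clients, max_per_ip, pasv_min_port, pasv_max_port, m, n):
    """List the settings that are smaller than the minimum for M and N."""
    actual = {
        "max_clients": max_clients,
        "max_per_ip": max_per_ip,
        # number of ports vsftpd can hand out in passive mode (inclusive range)
        "passive_ports": pasv_max_port - pasv_min_port + 1,
    }
    return sorted(name for name, (minimum, _) in sizing(m, n).items()
                  if actual[name] < minimum)


print(sizing(8, 10))
# Values from the example configuration, checked against M=16, N=64:
print(below_minimum(2000, 50, 20001, 21000, m=16, n=64))  # -> ['max_per_ip', 'passive_ports']
```

With the example configuration (max_clients=2000, max_per_ip=50, ports 20001 to 21000), the limits are comfortable for small clusters but max_per_ip and the 1000-port passive range become the bottleneck once N exceeds 50 or M*N exceeds 1000.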
4) Configure the user list that allows or denies access to the FTP server (optional)
# vim /etc/vsftpd/user_list
When /etc/vsftpd/vsftpd.conf contains the following (userlist_deny defaults to YES), the users listed in /etc/vsftpd/user_list are denied access to the FTP server:
userlist_enable=YES
userlist_deny=YES
When configured as below, only the users listed in /etc/vsftpd/user_list are allowed to access the FTP server:
userlist_enable=YES
userlist_deny=NO
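5) Configure the users who are always denied access to the FTP server (optional). With the stock /etc/pam.d/vsftpd, users listed in /etc/vsftpd/ftpusers are rejected through PAM regardless of the user_list settings: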
# vim /etc/vsftpd/ftpusers
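The interaction between `userlist_enable`, `userlist_deny` and the two list files can be summarized in a few lines of Python. This is a simplified model, not vsftpd's code: it assumes the stock RHEL 6 `/etc/pam.d/vsftpd`, which rejects users in `/etc/vsftpd/ftpusers` through `pam_listfile`, and it ignores anonymous logins. `read_user_file` and `login_permitted` are hypothetical names.

```python
def read_user_file(text):
    """Parse user_list/ftpusers content: one user name per line, '#' comments."""
    users = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            users.add(line)
    return users


def login_permitted(user, ftpusers, user_list, userlist_enable=True, userlist_deny=True):
    """Approximate whether a local user may log in (simplified model)."""
    if userlist_enable:
        listed = user in user_list
        if userlist_deny and listed:
            return False  # userlist_deny=YES: listed users are refused
        if not userlist_deny and not listed:
            return False  # userlist_deny=NO: only listed users may log in
    # ftpusers is checked separately by PAM (pam_listfile, sense=deny)
    return user not in ftpusers


ftpusers = read_user_file("# Users NOT allowed to log in\nroot\nbin\n")
user_list = read_user_file("root\ngbase\n")
print(login_permitted("gbase", ftpusers, user_list))                       # -> False
print(login_permitted("gbase", ftpusers, user_list, userlist_deny=False))  # -> True
```

The last case shows why ftpusers acts as a hard deny list: even with `userlist_deny=NO`, a user such as root who appears in both files is still refused.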
6) Disable SELinux or adjust its configuration (choose one of the two methods)
Disable SELinux
# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
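Reboot for the change to take effect. To stop enforcement immediately without rebooting, switch SELinux to permissive mode (this lasts until the next reboot):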
# setenforce 0
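Adjust the SELinux configuration
Allow vsftpd to read and write files in users' home directories; the -P option makes the setting persist across reboots:
# setsebool -P ftp_home_dir 1
Note: If logging in to the FTP server (for example, from a browser) fails with "500 OOPS: cannot change directory:/home/...", this SELinux boolean is the likely cause.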
7) Disable or configure the firewall
Disable firewall
Stop the firewall service:
# service iptables stop
Check if the firewall starts automatically at boot:
# chkconfig --list iptables
Prevent the firewall from starting automatically at boot:
# chkconfig iptables off
Configure firewall
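Add the ACCEPT rules first and append the default DROP rule last; if DROP is added first over a remote session, it can cut off your own SSH connection. Allow replies to connections the server opens (including active-mode data connections) and keep SSH (port 22) open:
# iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# iptables -I INPUT -p tcp --dport 22 -j ACCEPT
Open FTP ports (21 for control connections, 20001:21000 for the passive-mode range set by pasv_min_port and pasv_max_port):
# iptables -I INPUT -p tcp --dport 21 -j ACCEPT
# iptables -I OUTPUT -p tcp --sport 21 -j ACCEPT
# iptables -I INPUT -p tcp --dport 20001:21000 -j ACCEPT
# iptables -I OUTPUT -p tcp --sport 20001:21000 -j ACCEPT
Set the default rule:
# iptables -A INPUT -j DROP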
Save firewall settings:
# iptables-save > /etc/sysconfig/iptables
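The ordering rule for these commands (ACCEPT rules inserted with `-I`, the catch-all DROP appended with `-A` last) can be captured in a small generator that derives the passive range from the vsftpd settings. This Python sketch is illustrative only: `ftp_firewall_rules` is a hypothetical helper, it only builds command strings and changes nothing on the system, and keeping SSH port 22 open is an assumption about how the server is administered.

```python
def ftp_firewall_rules(pasv_min_port, pasv_max_port, keep_open=(22,)):
    """Return iptables commands for an FTP load server, in execution order.

    Every ACCEPT rule is added with -I (inserted at the top of the chain)
    before the catch-all DROP is appended with -A, so the DROP ends up last.
    """
    if not 1 <= pasv_min_port <= pasv_max_port <= 65535:
        raise ValueError("invalid passive port range")
    pasv = f"{pasv_min_port}:{pasv_max_port}"
    rules = [
        # replies to connections this host opens, e.g. active-mode data
        "iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
    ]
    # ports that must stay reachable, e.g. SSH for remote administration
    rules += [f"iptables -I INPUT -p tcp --dport {p} -j ACCEPT" for p in keep_open]
    for port in ("21", pasv):  # FTP control port, then the passive data range
        rules.append(f"iptables -I INPUT -p tcp --dport {port} -j ACCEPT")
        rules.append(f"iptables -I OUTPUT -p tcp --sport {port} -j ACCEPT")
    rules.append("iptables -A INPUT -j DROP")  # default rule goes last
    return rules


for cmd in ftp_firewall_rules(20001, 21000):
    print(cmd)
```

Generating the port range from the same numbers as pasv_min_port and pasv_max_port keeps the firewall and vsftpd.conf from drifting apart when the range is resized.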
8) Start the vsftpd service and set it to start on boot
# service vsftpd start
# chkconfig vsftpd on
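Once the service is running, one way to confirm that the passive port range is in effect is to log in and send a `PASV` command: the server answers `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)`, where the data port is `p1*256 + p2` (RFC 959). The Python sketch below uses the standard `ftplib` module; the host, user and password are placeholders you would supply, and `parse_pasv_reply` and `check_passive_port` are hypothetical helpers.

```python
import ftplib
import re

PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply):
    """Return (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = PASV_RE.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"unexpected PASV reply: {reply!r}")
    nums = [int(x) for x in match.groups()]
    host = ".".join(str(x) for x in nums[:4])
    port = nums[4] * 256 + nums[5]  # port = p1 * 256 + p2
    return host, port


def check_passive_port(host, user, password, pasv_min_port, pasv_max_port, timeout=10):
    """Log in, request a passive port, and report whether it is in range."""
    with ftplib.FTP(host, timeout=timeout) as ftp:
        ftp.login(user, password)
        _, port = parse_pasv_reply(ftp.sendcmd("PASV"))
        return pasv_min_port <= port <= pasv_max_port


print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,78,33)"))  # -> ('192.168.1.10', 20001)
```

Running `check_passive_port("<ftp-server>", "<user>", "<password>", 20001, 21000)` from one of the cluster nodes also exercises the firewall rules, since the control connection on port 21 must get through for the check to succeed.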
9) Copy files to the FTP directory
- If local_root=/var/ftp/pub is not set, copy the files to /home/xxxx (the login user's home directory).
- If local_root=/var/ftp/pub is set, copy the files to /var/ftp/pub.
- If anonymous_enable=YES is set and the files are loaded through an anonymous login, copy them to /var/ftp or /var/ftp/pub (/var/ftp is the home directory for anonymous logins).
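The directory rules above can be restated as a small function, which may help when scripting the copy step. This is a sketch covering only the settings discussed here (`local_root` for local users, and vsftpd's `anon_root`, which when unset leaves anonymous users in the ftp user's home, /var/ftp on RHEL); `login_directory` is a hypothetical helper, and `/home/<user>` assumes the default home directory layout.

```python
import posixpath


def login_directory(user, local_root=None, anonymous=False, anon_root=None):
    """Return the directory an FTP session starts in (simplified model)."""
    if anonymous:
        # anonymous sessions use anon_root if set, else the ftp user's home
        return anon_root or "/var/ftp"
    if local_root:
        return local_root  # local_root applies only to local (non-anonymous) users
    return posixpath.join("/home", user)  # assumes the default /home/<user> layout


print(login_directory("gbase"))                             # -> /home/gbase
print(login_directory("gbase", local_root="/var/ftp/pub"))  # -> /var/ftp/pub
print(login_directory("ftp", anonymous=True))               # -> /var/ftp
```

Note that local_root has no effect on anonymous sessions, which is why the anonymous case is checked first.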