We recently got a new Windows Server machine at work and I wanted to install an Ubuntu VM within it. After a few tries, I managed to install everything the way I wanted. One thing which was particularly difficult to set up was the SSH connection between MobaXTerm (running on a Windows 7 machine on our company's subnetwork, part of a nearby university's network) and the Ubuntu VM running on the Windows Server machine (which is actually behind a firewall on the university's network, and which forwards a port to the Windows Server machine to allow SSH connections through). It was a mess. But it works now!
If you want to know how I manage all sorts of JVM languages, deal with multiple Python versions, easily install Hadoop and Spark, and keep my $PATH sensible, read on!
Setting up a Linux VM on Windows
I installed Ubuntu 18.04.1 LTS on VirtualBox, choosing a "minimal installation" and allowing the installer to "download updates while installing Ubuntu". In spite of this...
Sure.
Again? Fine.
I know I'll need to do this later to SSH in, so while it's rebooting, I set up a network adapter in VirtualBox. I click on "Global Tools" at the top-right of the VirtualBox screen:
Hit "Create" at the top-left:
Hit "Yes":
And click "Properties" in the menubar to open the Properties tabs at the bottom of the screen:
I make a note of the IPv4 address (it should be something like 192.168.___.1) and check "Enable" under "DHCP Server" in the right-most column. Now that that's all set up, I head back inside the VM. (Note that sometimes, in these tutorials, the VM -- in my case, Ubuntu -- is called the "guest" OS, while the OS within which it's run -- in my case, Windows Server -- is called the "host" OS.)
I open the terminal within Ubuntu and install some basics:
$ sudo apt-get update && sudo apt-get upgrade -y
$ sudo apt-get install net-tools tree git openssh-server ifupdown ssh curl yum -y
Note that the -y flag means you don't have to explicitly type "Y" when Ubuntu asks...
After this operation, ___ of additional disk space will be used.
Do you want to continue? [Y/n]
Finally, I change the password on the root account so I can su as root later:
$ sudo passwd root
Enter new UNIX password: <type>
Retype new UNIX Password: <type>
passwd: password updated successfully
$ su -
Password: <type>
root$ exit
$ # back to normal command prompt
Install JVM Things and Haskell
I use SDKMAN! to install most of my JVM-based things (like Java, Scala, etc.). It's really easy:
$ curl -s "https://get.sdkman.io" | bash
$ source $HOME/.sdkman/bin/sdkman-init.sh
$ sdk version # to check that it was installed
You can see all of the software which can be installed through SDKMAN! with:
$ sdk list
And if there are multiple versions which can be installed, those can be listed with:
$ sdk list <software>
For instance, sdk list java returns
12.ea.15-open
11.0.1-zulu
11.0.1-open
10.0.2-zulu
10.0.2-open
...
...and so on. I'm going to install a stable legacy Java version (Java 8) and the newest LTS version (Java 11):
$ sdk install java 11.0.1-open
$ sdk install java 8.0.191-oracle
View the current version of a particular piece of software with
$ sdk list <software> # OR
$ sdk current <software>
Set the default version with
$ sdk default <software> <version>
Or change the current version (only valid for current shell) with
$ sdk use <software> <version>
Try switching back and forth between Java versions and verify that the version has changed by running java -version. Also, the Java shell, jshell, didn't exist before Java 9, so if you switch to Java 8 and try the command jshell, you'll get an error (but you won't get that error with Java 11).
Next, I install a bunch of other JVM/Java-related things:
$ sdk install groovy # JVM language
$ sdk install kotlin # JVM language
$ sdk install maven # Java build tool
$ sdk install sbt # Scala build tool
$ sdk install scala # JVM language
$ sdk install spark # cluster-computing framework (includes spark-shell)
All of this software installs into $SDKMAN_DIR/candidates/, which is, by default, $HOME/.sdkman/candidates/. You'll need to choose default versions for each piece of software. See which versions you're currently using for everything with:
$ sdk current
Using:
java: 11.0.1-open
I only have Java set up so far. Let me pick default versions for all this other stuff:
$ sdk default groovy 2.5.3
$ sdk default kotlin 1.3.0
$ sdk default maven 3.5.4
$ sdk default sbt 1.2.6
$ sdk default scala 2.12.7
$ sdk default spark 2.3.1
Finally -- something that's left out of the instructions on the SDKMAN! website -- you need to "source" the sdkman-init script again. After you do that, you should see all of your new software in sdk current:
$ source $HOME/.sdkman/bin/sdkman-init.sh
$ sdk current
Using:
groovy: 2.5.3
java: 11.0.1-open
kotlin: 1.3.0
maven: 3.5.4
sbt: 1.2.6
scala: 2.12.7
spark: 2.3.1
Verify that these have installed correctly by calling them with the appropriate version flags or command-line arguments:
$ groovy --version
...
Groovy Version: 2.5.3 ...
$ java -version
openjdk version "11.0.1" 2018-10-16
...
$ kotlin -version
Kotlin version 1.3.0-release-212 ...
$ mvn --version
Apache Maven 3.5.4 ...
...
$ sbt sbtVersion
...
[info] 1.2.6
$ scala -version
Scala code runner version 2.12.7 ...
$ spark-submit --version
Welcome to ... version 2.3.0 ...
...and that's it! You can also check that spark-shell, groovysh, etc. work. (Note that the spark-shell will probably crash unless you're using Java 8.)
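Because spark-shell is picky about the Java version, a quick check before launching it can save a confusing stack trace. The snippet below is only a sketch: java_major is a made-up helper name, and the parsing assumes the usual version strings ("1.8.0_191" for Java 8, "11.0.1" for Java 11).

```shell
# java_major is a hypothetical helper: it reduces a Java version string
# to its major version number.
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;  # legacy scheme: 1.8.0_191 -> 8
    *)   echo "${1%%[.-]*}" ;;           # current scheme: 11.0.1 -> 11
  esac
}

# Only inspect the active JDK if a java binary is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  current=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  if [ "$(java_major "$current")" != "8" ]; then
    echo "Active Java is '$current'; 'sdk use java 8.0.191-oracle' may be needed before spark-shell"
  fi
fi
```

The function only looks at the string it is given, so it can be tried on any of the versions listed by sdk list java without switching to them first.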
Install Haskell and Cabal
Haskell is really easy to install:
$ sudo apt-get install haskell-platform -y
$ ghci
This also installs the Haskell package manager, cabal:
$ cabal --version
cabal-install version 1.24.0.2
...
Install and Configure Hadoop
Next, I find the most recent stable release of Hadoop, make a note of its URL, and download it (roughly following this guide):
$ wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
...and untar to /usr/local/hadoop, redirecting the output to /dev/null:
$ sudo mkdir /usr/local/hadoop
$ sudo tar -xzvf hadoop-2.8.5.tar.gz -C /usr/local/hadoop >/dev/null
Note that when we change the Java version with SDKMAN!, it changes the $JAVA_HOME environment variable:
$ echo $JAVA_HOME
/home/andrew/.sdkman/candidates/java/11.0.1-open
$ sdk use java 8.0.191-oracle
$ echo $JAVA_HOME
/home/andrew/.sdkman/candidates/java/8.0.191-oracle
Hadoop requires access to the Java libraries. If we want Hadoop to use the default version of Java, we can use $JAVA_HOME in /usr/local/hadoop/hadoop-2.8.5/etc/hadoop/hadoop-env.sh. If we want it to stick to a specific Java version (say Java 8), we can use a static value like $SDKMAN_DIR/candidates/java/8.0.191-oracle. I'm going to leave this alone for now. (See the previous link for more information.)
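For reference, pinning Hadoop to Java 8 might look like the line below in hadoop-env.sh. This is a sketch rather than something I've applied here, and the fallback to $HOME/.sdkman is an assumption for shells in which SDKMAN_DIR happens to be unset.

```shell
# Candidate line for etc/hadoop/hadoop-env.sh (illustrative only).
# Pins Hadoop to the SDKMAN!-installed Java 8, whatever "sdk use" selects elsewhere.
# Assumption: SDKMAN! lives in $HOME/.sdkman when SDKMAN_DIR is not set.
export JAVA_HOME="${SDKMAN_DIR:-$HOME/.sdkman}/candidates/java/8.0.191-oracle"
```

With a static value like this, switching Java versions in the shell no longer affects which JVM Hadoop starts.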
I set a HADOOP_HOME variable for ease of use:
$ export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.5
...and run one of Hadoop's MapReduce examples to ensure it's working:
$ mkdir ~/mrtest
$ cp $HADOOP_HOME/etc/hadoop/*.xml ~/mrtest
$ $HADOOP_HOME/bin/hadoop jar \
  $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar \
  grep ~/mrtest ~/grep_example 'principal[.]*'
If it's run successfully, the output ends with something like
File Input Format Counters
Bytes Read=151
File Output Format Counters
Bytes Written=37
The result can be seen by typing:
$ cat ~/grep_example/*
6 principal
1 principal.
Hadoop works!
Manage Multiple Python Versions with pyenv
Next, I install pyenv to manage multiple Python versions. Install the prerequisites with:
$ sudo apt-get install make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget libncurses5-dev libncursesw5-dev \
llvm xz-utils tk-dev libffi-dev liblzma-dev -y
Then install pyenv with:
$ git clone https://github.com/pyenv/pyenv.git ~/.pyenv
$ export PYENV_HOME=$HOME/.pyenv
$ $PYENV_HOME/bin/pyenv init - # pyenv isn't on the PATH yet, so call it by its full path
Verify that it works with:
$ $PYENV_HOME/bin/pyenv versions
* system (set by /home/andrew/.pyenv/version)
Install other versions with:
$ $PYENV_HOME/bin/pyenv install 2.7.15
$ $PYENV_HOME/bin/pyenv install 3.7.1
See the available versions again:
$ $PYENV_HOME/bin/pyenv versions
* system (set by /home/andrew/.pyenv/version)
2.7.15
3.7.1
Switch default versions and verify that you've switched with:
$ $PYENV_HOME/bin/pyenv global 2.7.15
$ $PYENV_HOME/shims/python --version
Python 2.7.15
$ $PYENV_HOME/bin/pyenv global 3.7.1
$ $PYENV_HOME/shims/python --version
Python 3.7.1
Customising ~/.bashrc
I don't add anything to my $PATH until I'm sure I understand what it's doing there. It's easy to have a huge $PATH that just includes every directory and not have any clue where an actual executable is being sourced from. I also try to have _HOME environment variables for each piece of software that I install (this is usually the directory which contains the bin/ directory), so I can find them more easily later. So in my .bashrc, I'll now add the following:
##----------------------------------------------------------------------------
## handled by SDKMAN:
##----------------------------------------------------------------------------
export SDKMAN_HOME=$SDKMAN_DIR
# GROOVY_HOME
# JAVA_HOME
# KOTLIN_HOME
# MAVEN_HOME
# SBT_HOME
# SCALA_HOME
# SPARK_HOME
##----------------------------------------------------------------------------
## other important directories:
##----------------------------------------------------------------------------
export PYENV_HOME=$HOME/.pyenv
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.5
export HADOOP_CLASSPATH=$HADOOP_HOME/share/hadoop/common
##----------------------------------------------------------------------------
## JAR files
##----------------------------------------------------------------------------
export JAVA_JARS=$SDKMAN_HOME/candidates/java/jars
export SCALA_JARS=$SDKMAN_HOME/candidates/scala/jars
##----------------------------------------------------------------------------
## JAR lists
##----------------------------------------------------------------------------
export JAVA_JARS_LIST=".:$JAVA_JARS/*"
export SCALA_JARS_LIST=".:$JAVA_JARS/*:$HADOOP_CLASSPATH/*:$SCALA_JARS/*"
##----------------------------------------------------------------------------
## update PATH
##----------------------------------------------------------------------------
export PATH=$PATH:/bin:/sbin
export PATH=$PATH:/usr/bin:/usr/sbin
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$PYENV_HOME/bin
Source the file to load these variables into the shell, and then make the directories for JAVA_JARS and SCALA_JARS:
$ source ~/.bashrc
$ mkdir $JAVA_JARS
$ mkdir $SCALA_JARS
I try to keep all of my *.jar files in one place.
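One side effect of export lines of the form PATH=$PATH:... in a .bashrc is that every time the file is sourced again in the same shell, the same directories get appended once more. A possible way around that, sketched below, is a small helper that only appends a directory when it isn't already an entry. path_append is a hypothetical name, not something from my actual .bashrc.

```shell
# path_append (hypothetical helper): add $1 to the end of PATH
# only if it is not already one of its entries.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already present: leave PATH untouched
    *) PATH="$PATH:$1" ;;       # otherwise append it
  esac
}

path_append /usr/local/bin
path_append "$HOME/.pyenv/bin"
export PATH
```

Wrapping PATH in colons before matching means a directory is only treated as present when it matches a whole entry, not just part of a longer one.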
Now that we have all of these bin/ directories on the PATH, we can just write:
$ pyenv versions
$ hadoop version
...instead of the lengthier:
$ $PYENV_HOME/bin/pyenv versions
$ $HADOOP_HOME/bin/hadoop version
In general, you should always know which binary you're calling when you run something on the command line. Doing it this way (verifying that the software works before we go editing the PATH) helps you to understand where the software you're running actually "lives" on your system.
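A quick way to check where a command is coming from is command -v, which prints the path the shell would use. The wrapper below is a minimal sketch (where_is is a name I made up for illustration) that also says so when a command isn't on the PATH at all.

```shell
# where_is (hypothetical helper): report what a command name resolves to.
where_is() {
  if path=$(command -v "$1" 2>/dev/null); then
    echo "$1 -> $path"
  else
    echo "$1 -> not found on PATH"
    return 1
  fi
}

where_is sh
where_is pyenv || true    # "|| true" keeps going if the tool isn't installed
where_is hadoop || true
```

In bash, type -a <command> goes one step further and lists every match in PATH order, which is handy when two installs of the same tool are fighting each other.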
As Jess mentioned in the article "Essential quality of life terminal improvements" (jess unrein, Oct 26 '18), ~/.bash_profile is sourced for login shells only, while ~/.bashrc is sourced for interactive non-login shells. This means that ~/.bash_profile is only sourced when you log into the machine via ssh or something. Most of the time this probably isn't what you want. I put all of my custom shell scripts, aliases, etc. in ~/.bashrc.
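Since this walkthrough ends with SSHing into the VM, which starts a login shell, it can be worth making sure login shells pick up ~/.bashrc too. Ubuntu's default ~/.profile already does this for bash, but if a ~/.bash_profile exists, bash reads that file instead of ~/.profile. A minimal sketch of what could go in ~/.bash_profile follows; load_bashrc is a hypothetical helper name, not something taken from Jess's article.

```shell
# load_bashrc (hypothetical helper): source a shell file only if it exists.
# With no argument it targets ~/.bashrc.
load_bashrc() {
  rc="${1:-$HOME/.bashrc}"
  if [ -f "$rc" ]; then
    . "$rc"
  fi
}
```

Calling load_bashrc with no arguments at the end of ~/.bash_profile would then hand every login shell over to the same ~/.bashrc that interactive shells use.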
I also have lots of custom shell scripts that I source on startup, but that's for another article, I think! At this point, you should be able to run all major JVM languages and manage their versions, manage Java and Scala projects with Maven and sbt, analyse and store "Big Data" with Apache Hadoop and Spark, code in Haskell and the ghci shell, and easily run multiple versions of Python with pyenv.
In my article on my shell scripts, I'll also talk about how I customise colours, fonts, etc. in my shell for maximum prettiness.
SSHing into a Remote Linux VM from Windows
The last thing to set up for now is the SSH connection between my local Windows PC and the remote Windows Server which hosts my Ubuntu VM. Note that these instructions are particular to my setup and may not work for you, but it's worth a try if this is similar to what you're trying to accomplish. The first thing I do is run ifconfig within the virtual machine:
$ ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
...
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
...
You should see two connections similar to the ones above -- enp0s3 and lo. We need to add a third one, so let's power down the VM and go back to VirtualBox. Go back to "Machine Tools", select your VM from the list on the left, and click the "Settings" button:
Click on "Network" from the menu on the left-hand side and click on "Adapter 2". Check "Enable Network Adapter" and next to "Attached to:", select "Host-only Adapter":
Whatever name VirtualBox fills in here is fine. Hit "OK" and restart your VM. Now, when you run ifconfig on the Ubuntu VM, you should see three connections:
$ ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
...
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
...
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
...
We need to edit the /etc/network/interfaces file next. Open it with sudo and add the following lines:
auto enp0s8
iface enp0s8 inet static
address 192.168.___.10
netmask 255.255.255.0
...where the ___ should be the same as in the network adapter setup steps at the beginning of this walkthrough. Note that the last part of this IP address is .10, while above it was .1. The most important step is to then set up a port forward within VirtualBox. This is what allows you to ssh into the host machine (Windows Server, in my case) and have it forward that connection to the VM (Ubuntu, in my case).
To set up a port forward in VirtualBox, go to the "Machine Tools" page, make sure your VM is selected, and click "Settings", just as we did previously. Click on "Network" in the left-hand side menu, stay on "Adapter 1", and open the "Advanced" section. Click on the "Port Forwarding" button:
Add a new port forwarding rule by clicking the green "+" sign at the top-right:
This part is a bit complex in my setup. We have a server sitting behind a firewall, so the IP address of the server itself is different from the IP address I actually ssh into. Here, I use the IP address of the server, which you can find by running ipconfig in the Windows cmd prompt on the remote host (Windows Server):
In my case, the server is on a private LAN behind the firewall and the server's IP on that LAN is 192.168.100.100. We also have port forwarding set up on the firewall so that the port I ssh into is different from the port the server sees us trying to access. So the port the server sees us accessing is 22. In general, the "Host IP" and "Host Port" are the IP address and port you're trying to access on your remote Windows machine (Windows Server, for my setup).
The "Guest IP" and "Guest Port" are the IP address and port of your VM as seen by your server. The guest IP is the 192.168.___.10 one we set up above, and the port can be any number. I try to use high-value prime numbers, but anything is fine really. In this case, I'll use 33331, because why not. This is only the port that the remote host uses to talk to the VM, so it doesn't really matter what you use here:
The port forwarding rule can have any name. Hit "OK" and "OK" again and go back to your virtual machine. The last thing we need to do is edit /etc/ssh/sshd_config. Open it with sudo and change the line:
#Port 22
to
Port 33331
or whatever port number you picked. This ensures that the ssh server on your virtual machine is listening for ssh connections on that port. Now, run ifconfig again and take a look at the inet address associated with enp0s8. In my case, it's 192.168.100.111. But we declared enp0s8 to be static and to have the IP address 192.168.___.10 by editing /etc/network/interfaces. Let's turn this network adapter off and on again with the commands:
$ sudo ifdown enp0s8
$ sudo ifup enp0s8
Now when you run ifconfig on the Ubuntu VM, you should see the IP address you defined as the inet address. You should be able to ping the IP of enp0s8 from the Windows cmd prompt now, on the remote host (Windows Server, for me):
And we can now SSH into the VM remotely from MobaXTerm on the local Windows machine:
ssh -p <port> <username>@<IP>
Above, <port> is the port number you use to access the remote host. In my case, this is the port that was opened on the firewall, which forwards to port 22 on the server. <username> is your username within the VM (although in my case, I have the same name on the remote host and the VM, andrew). Finally, <IP> is the IP address of the remote host (or, again, in my case, the IP address of the firewall behind which the remote host sits).
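If the connection is refused, one thing worth checking inside the VM is which port sshd is actually configured for. The helper below is a rough sketch: sshd_port is a hypothetical name, it only reads the first uncommented Port line, and it falls back to 22 (sshd's default) when no such line exists or the file can't be read.

```shell
# sshd_port (hypothetical helper): print the port configured in an sshd_config file.
# $1 is the path to the file and defaults to the system-wide config.
sshd_port() {
  conf="${1:-/etc/ssh/sshd_config}"
  port=""
  if [ -r "$conf" ]; then
    # Keywords are case-insensitive; a commented line like "#Port 22" is skipped.
    port=$(awk 'tolower($1) == "port" { print $2; exit }' "$conf")
  fi
  echo "${port:-22}"   # 22 is the default when no Port line is set
}

sshd_port
```

Keep in mind that sshd only picks up an edited sshd_config once it restarts, for example with sudo systemctl restart ssh on Ubuntu or by rebooting the VM.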
Even after you do all this, you might get an error that says something like:
WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!
This happens because we're forwarding the port on the remote host to the port of the VM, but we already have a key within %MOBAXTERM_HOME%\home\.ssh\known_hosts which relates to this IP address. Simply open that file and remove any lines which begin with that IP. I did that and then (IP address and port number below are for illustrative purposes only):
$ ssh -p 11111 andrew@156.77.221.23
Permanently added `[156.77.221.23]:11111' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-38-generic x86_64)
...
andrew@ubuntuvm:~$
...it works! I know this setup is extremely specific and may not help most people, but it took me a bit of time to figure out how to get everything up and running and if I can help just one person, then it's worth it. (Also I had to write this up so my coworkers would know how to do this in the future!) Let me know what you think below, and thanks for reading (if you've made it this far)!
In a future post, I'll discuss how I customise my shell with convenience functions, aliases, and fonts and colors! Stay tuned!