Install Docker CE on Linux CentOS 7.x

This is just a short post paraphrasing the very good (and verbose!) instructions on the Docker site here:

Basically, to install Docker CE on a fresh CentOS 7.x server, you have to:

  • Install the YUM config manager (part of yum-utils).
  • Install device-mapper-persistent-data and LVM2 (for the storage driver).
  • Use the YUM config manager to add the stable Docker YUM repository.
  • Install docker.
  • Start docker.
  • Test that it worked.

This script does all of that and basically just saves you from skimming through the linked page repeatedly to find the few commands you need.

sudo yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli
sudo systemctl start docker
sudo docker run hello-world

Assuming it works, you should see “Hello from Docker!” among various other output on your screen.
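If you would rather have the script check the result than read the output by eye, a tiny helper can look for that greeting. This is only a sketch: the function name saw_docker_greeting is made up for this post and is not part of Docker.

```shell
# Succeeds (exit status 0) if the text on stdin contains the hello-world greeting.
# The function name is invented for this post; it is not a Docker command.
saw_docker_greeting() {
  grep -q 'Hello from Docker!'
}

# Usage, once Docker is installed and started:
#   sudo docker run hello-world | saw_docker_greeting && echo "Docker works"
```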

Running Terraform on CentOS 7/RHEL 7 With Docker

Install Docker

Here is a lean version of the Docker site content that I tested on CentOS 7.5.  It uses yum to install some prerequisites, adds the stable Docker Community Edition repository to yum, and then installs and starts Docker.

sudo yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
sudo yum-config-manager \
  --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce
sudo systemctl start docker

Now Docker is started – but only the root user can really use it.  So, let’s create the docker group and add our current user to it.  That way we can use docker with our current user and avoid having to use sudo on every command.

These instructions come from here:

sudo groupadd docker
sudo usermod -aG docker $USER

After this, please re-log in (e.g. exit out of SSH and jump back into your server) so that your group memberships apply.

Now Docker is running and we can use it as ourselves.
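To confirm that the new group membership really applied after logging back in, you can look for docker in your group list. A minimal sketch: list_has_group is a hypothetical helper name, and it only inspects the text it is given.

```shell
# list_has_group "GROUP LIST" NAME: succeeds if NAME is one of the
# space-separated groups. Hypothetical helper, not part of Docker.
list_has_group() {
  for g in $1; do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Usage after re-logging in (id -nG prints the current user's groups):
#   list_has_group "$(id -nG)" docker && echo "no sudo needed for docker"
```

If the check fails, the usual cause is that the old login session is still in use.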

Get Terraform Working in Docker

We will run Terraform as a single command inside of a Docker image.  So, let’s start by getting the latest Terraform image from HashiCorp:

docker pull hashicorp/terraform

Create a directory for your Terraform work and give ownership to your user. Also create a sub-directory to act as the Docker volume in which we will put your Terraform plans.

sudo mkdir /opt/terraform && sudo chown $USER:$USER /opt/terraform
cd /opt/terraform
mkdir tf-vol

Now let’s create a file in /opt/terraform/tf-vol/ (Terraform only reads files whose names end in .tf) with a sample Terraform plan (just a debug one).

output "test" {
  value = "Hello World!"
}

After this, we can run Terraform and tell Docker to use that tf-vol directory as a volume. Terraform will use it as the working directory, will find our plan, and will display “Hello World!”.

$ docker run -i -t -v /opt/terraform/tf-vol:/tf-vol/ -w /tf-vol/ hashicorp/terraform:light apply

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.


test = Hello World!

So, we now have Docker installed, and Terraform running with it using an external volume to store our plans.
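Typing that long docker run line for every Terraform command gets old, so it can be wrapped in a shell function. This is a sketch under a few assumptions: the names tf, TF_VOL and TF_DRY_RUN are invented here, and the image tag and paths are simply copied from the command above.

```shell
# Directory on the host that holds the Terraform files (same path as above).
TF_VOL="${TF_VOL:-/opt/terraform/tf-vol}"

# tf ARGS...: run terraform inside the container with ARGS.
# Set TF_DRY_RUN=1 to print the docker command instead of running it.
# (tf, TF_VOL and TF_DRY_RUN are names invented for this sketch.)
tf() {
  ${TF_DRY_RUN:+echo} docker run -i -t \
    -v "$TF_VOL:/tf-vol/" -w /tf-vol/ \
    hashicorp/terraform:light "$@"
}

# Usage:
#   tf apply
#   TF_DRY_RUN=1 tf plan   # only prints the command
```

The dry-run switch is there so you can see exactly what would be executed before letting it touch anything.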

Docker + Windows 10 – Volume Mount Shows No Files // Firewall

I wasted roughly an hour on this two separate times now.  Basically, my docker volume mount would stop showing files.

I dug through endless GitHub pages and error reports, tried making the Docker NAT private and everything… but the problem ended up being that I went home from work and was using my VPN!

So, before spending too much time on the complicated solutions you find online, just start by disabling your VPN if you have one running and see if that helps first.

Docker + Windows “Error starting userland proxy”

Docker Start Error

I ran into a new docker issue today.  Basically, I restarted my PC, and when I tried to bring up a container with a Postgres instance I use for testing, I received this confusing error:

Error response from daemon: driver failed programming external connectivity on endpoint postgres (15b348b1f5bf8d2bfd17c1c41b340d1c66f63ace7cab39ea69aeca3f69ed7442): Error starting userland proxy: mkdir /port/tcp: input/output error
Error: failed to start containers: postgres

What Does it Mean?

It turns out this is a big headache which is still unresolved, and which has one of the longer GitHub issue threads I’ve ever seen right here.

Here’s a summary of it:

  • Windows 10 has a “Fast Start Up Mode”, and Docker doesn’t play well with it (or vice versa).
  • So, after a restart, you may find that you see this issue.
  • Theoretically, restarting the Docker daemon fixes this (which is a little annoying but fine).  You should be able to do that in Services.
  • This personally didn’t help me on the first try.  So, I went and disabled Fast Start mode (which is also annoying).  To do that:
    • Go to start and type “Power and Sleep”, click it when it pops up.
    • Click “Additional power settings” on the right.
    • Click “Choose what the power buttons do”.
    • Click “Change settings that are currently unavailable” and log in if you can’t already toggle the “Turn on fast startup (recommended)” checkbox.
    • Turn off that checkbox.

Note that once you reboot you have to wait a bit for docker to come up (it can take a few minutes).  For example, the first 4 or 5 times I ran “docker version”, the daemon showed as down even though I could see the service running.  But a minute later it was up and working fine.
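Rather than re-running the command by hand until the daemon answers, a small retry loop can do the waiting. This is a sketch that assumes a bash-like shell on Windows (Git Bash or WSL, say) and assumes that “docker version” exits non-zero while the daemon is unreachable. The name wait_for is made up.

```shell
# wait_for TRIES DELAY CMD...: rerun CMD until it succeeds or TRIES run out.
# DELAY is the pause in seconds between attempts. Hypothetical helper name.
wait_for() {
  tries=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: try every 10 seconds for up to 5 minutes.
#   wait_for 30 10 docker version && echo "daemon is up"
```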

Docker Run Postgres, expose to Local Host

I needed to spin up a Postgres database for testing a new application, so I figured I’d do it via Docker to keep my system clean.

So, the plan is to develop an application on my PC / localhost (e.g. in PyCharm), but connect to the Postgres instance within the Docker container.

Getting & Running Postgres

This is actually quite trivial:

docker pull postgres
docker run --name postgres -e POSTGRES_PASSWORD=password -d -p 5432:5432 postgres

The first command pulls the image from Docker Hub, and the second one runs the container and exposes the Postgres port externally (to the same numbered port) so that we can communicate with it from our local host.
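An application on the host then needs a connection string pointing at that published port. Here is a minimal sketch that builds one in the standard postgresql:// URL form. The helper name pg_url is made up, and the defaults are taken from the run command above.

```shell
# pg_url [USER] [PASSWORD] [HOST] [PORT] [DB]: print a libpq-style connection URL.
# Defaults match the docker run command above; the official image creates a
# "postgres" user and database when POSTGRES_USER/POSTGRES_DB are not set.
pg_url() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "${1:-postgres}" "${2:-password}" "${3:-localhost}" "${4:-5432}" "${5:-postgres}"
}

# Usage (needs the psql client installed on the host):
#   psql "$(pg_url)"
```

Passwords containing characters such as @ or / would need percent-encoding first, which this sketch does not do.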

Setting Up pgAdmin

The pgAdmin utility is a wonderful UI for working with Postgres.  You can download it here:

In my case, I actually want to verify that connecting to Postgres works from outside of the container environment.  So, I chose to install the Windows version locally to help verify this.  If you are so inclined, they have Docker images as well, so you can run pgAdmin in a container to keep your system clean.

Once you’ve installed pgAdmin, you can open it, go to the browser on the left panel, right click on “Servers”, and add a new one targeting “localhost” and port “5432”.

Once you do that and open it, you should hopefully be able to see monitoring statistics on it.  Then you can create a new database and work with it at will!

Persistent Data

Remember, if you delete your Docker container, the data for the database will go away.  You can stop it and start it as much as you like though.  If you need to delete it and still access the data for some reason, look into using a volume in Docker (this is what they’re for).
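As a rough sketch of the volume approach, the same run command can be given a named volume. Treat the details as assumptions: “pgdata” is a made-up volume name, and /var/lib/postgresql/data is the data directory the official image has traditionally used, so check the image documentation for your version. PG_DRY_RUN is also invented, to allow printing the command without running it.

```shell
# pg_run: start Postgres with a named volume so the data outlives the container.
# "pgdata" is a hypothetical volume name; the container path is the data
# directory traditionally used by the official image (verify for your version).
# Set PG_DRY_RUN=1 to print the command instead of running it.
pg_run() {
  ${PG_DRY_RUN:+echo} docker run --name postgres \
    -e POSTGRES_PASSWORD=password -d -p 5432:5432 \
    -v pgdata:/var/lib/postgresql/data \
    postgres
}

# Usage:
#   PG_DRY_RUN=1 pg_run   # show the command
#   pg_run                # start the container
```

If a container named postgres already exists, it has to be removed (docker rm postgres) before this will start a new one.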




Install Airflow on Windows + Docker + CentOS

Continuing on my journey, setting up Apache Airflow on Windows directly was a disaster for various reasons.

Setting it up in the WSL (Windows Subsystem for Linux) copy of Ubuntu worked great.  But unfortunately, you can’t run services/etc properly in that, and I’d like to run it in a state reasonably similar to how we’ll eventually deploy it.

So, my fallback plan is Docker on Windows, which is working great (no surprise there).  It was also much less painful to set up in the end than the other options.  I’m also switching from Ubuntu to CentOS (non-enterprise version of RHEL) as I found out that Airflow has service files tested with it here:

Assuming you have docker for Windows set up properly, just do the following to set up Airflow in a new CentOS container.

Get and Run CentOS With Python 3.6 in Docker

docker pull centos/python-36-centos7
docker container run --name airflow-centos -it centos/python-36-centos7:latest /bin/bash

Install Airflow with Pip

pip install --upgrade pip
pip install apache-airflow

Set up Airflow

First install VIM.  Yes… this is docker so the images are hyper-stripped down to contain only the essentials.  You have to install anything else.

I think I had to connect to the container as root to do this, using this command:

docker exec -it -u root airflow-centos /bin/bash

Then you can just install with yum fine. I’m not 100% sure this was needed, so feel free to try it as the normal user first.

yum install vim

I jumped back into the normal user after that (by removing the -u root from the command above).

Then set up Airflow’s home directory and database.

  • Set the Airflow home directory (permanently for the user).
    • vi ~/.bashrc and add this to the bottom of the file.
      • export AIRFLOW_HOME=~/airflow
    • Then re-source the file so you can use it immediately:
      • source ~/.bashrc
  • Initialize the Airflow database (we just did defaults, so it will use a local SQLite DB).
    • airflow initdb
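
If you would rather script the .bashrc step than edit the file in vi, the line can be appended only when it is missing, so re-running the setup does not pile up duplicates. A minimal sketch; ensure_line is a hypothetical helper name.

```shell
# ensure_line FILE LINE: append LINE to FILE unless that exact line is already there.
# Hypothetical helper; creates FILE if it does not exist yet.
ensure_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage inside the container:
#   ensure_line ~/.bashrc 'export AIRFLOW_HOME=~/airflow'
#   source ~/.bashrc
```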

Then verify the install worked by checking its version:

root@03bae42c5cdb:/# airflow version
[2018-11-07 20:26:44,372] {} INFO - Using executor SequentialExecutor
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/

Run Airflow Services

The actual Airflow hello world page just says to run Airflow like this:

  • airflow webserver -p 8080
  • airflow scheduler

You probably want to run these in the background and tell the logs to go to a file, etc.
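
One plain way to do that, without any service manager, is nohup plus a log file. The sketch below assumes nothing beyond a POSIX shell; the helper name run_bg and the log file locations are made up.

```shell
# run_bg LOGFILE CMD...: start CMD in the background, append its output to
# LOGFILE, and print the PID. Helper name and log paths are made up here.
run_bg() {
  log=$1; shift
  nohup "$@" >>"$log" 2>&1 &
  echo $!
}

# Usage inside the container (needs Airflow installed):
#   run_bg ~/airflow/webserver.log airflow webserver -p 8080
#   run_bg ~/airflow/scheduler.log airflow scheduler
```

The printed PID is what you would pass to kill later to stop the process.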

It’s more professional just to run it as a service (on CentOS/RHEL which is why I switched to CentOS from Ubuntu).  But it turns out that running it as a service in Docker is tricky.

Even if you get everything set up properly, Docker by default enables/disables some features for security that make systemctl not work (so you can’t start the service).  It sounds like this is a whole rework to get this working (read here).

Also, I realize my idea may have been flawed in the first place (running it as a service in a container).  Containers are really intended to hold micro-services.  So, it would probably make more sense to launch the web server and the scheduler as their own containers and allow them to communicate with each other (I’m still figuring this out).  This thread nudged me into realizing that:

It says:

Normally when you run a container you aren’t running an init system. systemctl is a process that communicates with systemd over dbus. If you aren’t running dbus or systemd, I would expect systemctl to fail.

What is the pid1 of your docker container? It should reflect the entrypoint and command that were used to launch the container.

For example, if I do the following, my pid1 would be bash:

$ docker run --rm -it centos:7 bash
[root@180c9f6866f1 /]# ps faux
root         1  0.7  0.1  11756  2856 ?        Ss   03:01   0:00 bash
root        15  0.0  0.1  47424  3300 ?        R+   03:02   0:00 ps faux

Since only bash and ps faux are running in the container, there would be nothing for systemctl to communicate with.
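
The same check can be run without reading the ps output by eye. This is a rough sketch, assuming a Linux container where /proc is available (with ps as a fallback). The helper names are made up, and PID 1 being systemd is treated as the only condition, which is a simplification.

```shell
# pid1_name: print the name of the process running as PID 1.
pid1_name() {
  cat /proc/1/comm 2>/dev/null || ps -p 1 -o comm=
}

# systemctl_can_work: succeeds only when PID 1 is systemd.
# (Both helper names are invented for this post.)
systemctl_can_work() {
  [ "$(pid1_name)" = "systemd" ]
}

# Usage inside the container:
#   systemctl_can_work || echo "PID 1 is $(pid1_name), so systemctl has nothing to talk to"
```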

So, the below steps probably get it working if you set the container up right in the first place (as a privileged container), but it isn’t working for me for now.  So feel free to stop reading here and use Airflow, but it won’t be running as a service.

I might come back and update this post and/or make a future one on how to run Airflow in multiple containers.  I’m also aware that there is an awesome image here that gets everything off the ground instantly, but I was really trying to get it working myself to understand it better.

—- service setup (not complete yet)

I found information on the Airflow website stating:

Airflow can integrate with systemd based systems. This makes watching your daemons easy as systemd can take care of restarting a daemon on failure. In the scripts/systemd directory you can find unit files that have been tested on Redhat based systems. You can copy those to /usr/lib/systemd/system. It is assumed that Airflow will run under airflow:airflow. If not (or if you are running on a non Redhat based system) you probably need to adjust the unit files.

Environment configuration is picked up from /etc/sysconfig/airflow. An example file is supplied. Make sure to specify the SCHEDULER_RUNS variable in this file when you run the scheduler. You can also define here, for example, AIRFLOW_HOME or AIRFLOW_CONFIG.

I didn’t see much in the installation, so I found the scripts on GitHub for the 1.10 version that we are running (based on our earlier version prompt):

Based on this, I: