Terraform on Docker – Run Using Current Directory as Volume

Quick Tip

You can use the following command to run a terraform apply using the current directory as the volume. This is great if you, say, do a git checkout of your repository and want to just run the terraform files from the checkout folder.

docker run -it -v "$(pwd)":/workspace -w /workspace hashicorp/terraform:light apply
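
On a fresh checkout, terraform apply normally needs a terraform init first. Because the .terraform folder is written into the mounted directory, it survives between container runs. Here is a minimal sketch of a wrapper for that workflow. The function name tf and the DRY_RUN switch are made up for illustration, and the /workspace mount point is arbitrary.

```shell
# Hypothetical helper: run any terraform subcommand in the official image,
# with the current directory mounted as the container's working directory.
# Set DRY_RUN=1 to print the docker command instead of running it.
tf() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker run -it -v $PWD:/workspace -w /workspace hashicorp/terraform:light $*"
  else
    docker run -it -v "$PWD":/workspace -w /workspace hashicorp/terraform:light "$@"
  fi
}
```

With that defined, the usual sequence from the checkout folder would be tf init, then tf plan, then tf apply.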

 

Shut Down All Docker Containers Based on Internal Analysis – JupyterHub Example

I manage a few decent-sized JupyterHub environments based on the Docker spawner.  Each frequently has more than 50 users, sometimes many more, and recently one of the servers ran out of memory.

I have some read-only notebooks inside the user containers, so I figured that if a user only had those read-only notebooks, I could shut down their Docker container.  They weren’t doing any work that could be lost.

So, I wrote this script to:

  1. List all docker containers.
  2. Get their names.
  3. Exec a bash command in them.
  4. Shut them down based on the result.

I hope it helps you with a similar docker-related issue! 🙂

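# Ask docker for the container names directly instead of guessing a column number.
CONTAINERS=$(docker container ls --format '{{.Names}}')
for NAME in ${CONTAINERS}
do
  #echo $NAME
  # Count notebooks in the container's working directory, ignoring checkpoint folders.
  COUNT=$(docker exec "${NAME}" ls -a | grep '\.ipynb' | grep -v checkpoints | wc -l)
  if [[ $COUNT -eq 1 ]]
  then
    echo "Stopping $NAME with COUNT = $COUNT."
    docker container stop "$NAME"
  fi
done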
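
Before stopping anything on a busy server, it may be worth doing a dry run. The sketch below pulls the counting step into its own function so it can be checked against a sample listing without touching any containers, and adds a report-only loop. The function names are made up, it uses docker's --format flag to get the container names, and it keeps the same COUNT = 1 threshold as the script above.

```shell
# Hypothetical helper: count notebooks in a directory listing read from stdin.
# Only names ending in .ipynb match, so .ipynb_checkpoints is not counted.
count_notebooks() {
  grep -c '\.ipynb$' || true   # grep -c still prints 0 when nothing matches
}

# Hypothetical dry run: report which containers would be stopped,
# without stopping any of them.
report_stoppable() {
  for NAME in $(docker container ls --format '{{.Names}}'); do
    COUNT=$(docker exec "$NAME" ls -a | count_notebooks)
    if [ "$COUNT" -eq 1 ]; then
      echo "Would stop $NAME (COUNT = $COUNT)"
    fi
  done
}
```

Running report_stoppable prints the candidates. Once the list looks right, the echo can be swapped for docker container stop.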

AWS + Terraform + Auto Scale Group + User Data Bash Script on Startup to Customize Image

User Data  – On Startup

If you want to customize your VM image on its first start-up, you may want to use “user data”.  You can think of this as a script that runs right after boot-up the very first time.  Apparently you can also make it run on every reboot (with extra config).

Why would you need this?  Well, in my case, I was spawning up a Presto cluster.  I generally do this in a special HA way… but even if you did it the simple way, you would have 1 coordinator and N workers, and the N workers would have to point at your 1 coordinator.

So, there are 2 interesting things here:

  1. The coordinator and workers are identical barring some slightly different configuration in one file.
  2. The workers need to know about the coordinator in order to use it.

So, for both of these cases, we’d like to run a script on start-up!

The Terraform Code

When you want to create an auto-scale-group, you have to start by creating a launch template: https://www.terraform.io/docs/providers/aws/r/launch_template.html.

Once that template is done, you can use it to spawn multiple auto-scale groups.  The launch template itself holds the user data though, so you are best off making your user data script generic enough to work for all your cases.  It can be a bash file and can use variables, so this isn’t too hard.

If you do need multiple separate user data scripts you’ll have to use separate launch templates, which is not the end of the world either.

The launch template in the link above is very complete, so all I’m going to show you is how to pass a bash script that takes parameters to the user data.

Basically replace:

user_data = "${base64encode(...)}"

In their example with something like this:

user_data = base64encode(templatefile("${path.module}/worker-script.sh", {
  coordinator_lb  = "${aws_lb.coordinator.dns_name}",
  hive_thrift_csv = "${var.hive_thrift_csv}"
}))

Assuming your worker-script has content like this:

#!/bin/bash
echo "Hello World" > /tmp/test-output.txt

and you have the hive_thrift_csv variable defined in your variables file like this:

variable "hive_thrift_csv" {
  type    = string
  default = "thrift://ip-addr-1:9083,thrift://ip-addr-2:9083"
}

you should be good.  Note that the first variable definition, coordinator_lb = "${aws_lb.coordinator.dns_name}", is a reference to the DNS name of a load balancer created in another part of my terraform config.  I left it in as it’s a good example of a more complex variable.
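
The Hello World script above does not actually use the two variables, so here is a rough sketch of a worker-script.sh that does. Inside a templatefile() template, Terraform replaces the ${coordinator_lb} and ${hive_thrift_csv} placeholders before the script ever reaches the instance. One gotcha: any ${VAR} that is meant for the shell has to be written as $${VAR} in the template, or simply as $VAR without braces. The helper name, the output path, and port 8080 are assumptions for illustration. discovery.uri and hive.metastore.uri are the Presto property names I would expect here, but where they need to be written depends on your image.

```shell
#!/bin/bash
# worker-script.sh: rendered by templatefile() before it reaches the instance.
# Terraform fills in the coordinator_lb and hive_thrift_csv placeholders below.
# Shell variables are written without braces so Terraform leaves them alone.

# Hypothetical helper: write the settings a worker needs to find the coordinator.
# Arguments: coordinator host, metastore URI list, output file.
write_worker_settings() {
  OUT="$3"
  {
    echo "discovery.uri=http://$1:8080"   # port 8080 is an assumption
    echo "hive.metastore.uri=$2"
  } > "$OUT"
}

# The output path is a placeholder, in the spirit of /tmp/test-output.txt above.
write_worker_settings "${coordinator_lb}" "${hive_thrift_csv}" /tmp/worker-settings.properties
```

Keeping the placeholders on one line at the bottom makes it easy to see which values Terraform injects, and everything above that line stays plain bash.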

Building Presto Admin

Presto Admin – Is it Worth Using?

I generally deploy Presto clusters using Packer and Terraform.  Packer builds an image for me with the target Presto distribution, some utility scripts, HAProxy (to get multiple coordinators acting in HA), etc.

I kept noticing this presto-admin project though: https://github.com/prestosql/presto-admin.  It allows you to quickly/easily deploy clusters from a central node, and it will handle the coordinator, workers, catalogs, and everything.  That sounds pretty cool.

Advance disclaimer – after I built this, I decided not to use it.  This was because it seems to just deploy a single coordinator and worker set.  For an HA setup, I need multiple coordinators, a load balancer, and workers/users pointing at the load balancer.  So, it’s just not the right fit for me.

Presto Admin – Build

In any case, I did go through the motions of building this, because I could not find a pre-built release.  Fortunately, it’s pretty easy on CentOS 7.x (basically RHEL 7.x):

# Download and unzip.
wget https://github.com/prestosql/presto-admin/archive/2.7.tar.gz
tar -xvzf 2.7.tar.gz
# GitHub tag archives unpack into <repo>-<tag>.
cd presto-admin-2.7

# Install pip/etc.
sudo yum install epel-release
sudo yum install python-pip
sudo yum install python-wheel

# Run make file and build web installer.
make dist

After this, just go into the dist folder and find prestoadmin-2.7-online.tar.gz.
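
If you script the build, a small check that the installer was actually produced saves some guessing. This is only a sketch: the function name find_online_installer is made up, and it assumes nothing beyond the file naming shown above.

```shell
# Hypothetical helper: print the path of the online installer in a dist folder,
# or fail with a message if the build did not produce one.
find_online_installer() {
  dist_dir="${1:-dist}"
  for f in "$dist_dir"/prestoadmin-*-online.tar.gz; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "No online installer found in $dist_dir" >&2
  return 1
}
```

Called with no argument from the source folder, it looks in ./dist.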

I hope this saves you some time; I wasted around 20 minutes trying to find the dist online for download (which I never did).

Using Ansible in Jenkins – Ansible Plugin

Jenkins + Ansible Options

I have been doing a lot of Jenkins, Ansible, Terraform, and similar automations lately and I have seen multiple ways of running Ansible in Jenkins.  These include:

  1. Install Ansible on your Jenkins node and call it with “sh” in a pipeline.
  2. Install Ansible on your Jenkins node and call it with the ansible plugin for Jenkins (not a default plugin).
  3. Run a docker image with Jenkins and configure ansible there.

I think the 3rd option is the most powerful as you can separately configure and version Ansible for your various Jenkins jobs.  But the second option is pretty sleek as well and is what we’ll talk about here.

Jenkins Ansible Plugin

If you go to the Jenkins plugins management page, you can install the ansible plugin pretty easily.  There are some good documents on it right here too.  After installing that plugin, I also suggest you install the AnsiColor plugin.

With both of those in place, you can get ansible running your playbooks with its private key credentials and it will print beautiful colored output to your logs.

Here is an example of how to call it from a pipeline.  Note that this lets you specify your target hosts as a CSV, assuming your playbook uses the “target_hosts” group.  It also disables host key checking (which took me a while to work out, as the docs are wrong).  That setting is not ideal, but for some reason SSH wouldn’t work with this plugin without it.

pipeline {
    agent any

    parameters {
        string(name: 'GIT_BRANCH', defaultValue: 'master', description: 'git branch to work from.')
        string(name: 'TARGET_HOSTS_CSV', defaultValue: 'none', description: 'target deployment hosts.')
    }

    stages {
        stage('Deploy application.') {
            steps {
                sh 'echo [target_hosts] > /tmp/inventory.ini'
                sh 'echo "${TARGET_HOSTS_CSV}" | tr "," "\n" >> /tmp/inventory.ini'

                ansiColor('xterm') {
                    ansiblePlaybook(
                        playbook: './ansible/playbook.yml',
                        inventory: '/tmp/inventory.ini',
                        credentialsId: 'your-jenkins-pk-credential',
                        disableHostKeyChecking: true,
                        colorized: true)
                }
            }

        }
    }
}
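
The two sh steps in the pipeline build a one-group inventory file from the CSV parameter. As a rough sketch, here is the same idea as a standalone shell function that you can try outside Jenkins. The name csv_to_inventory is made up, and trimming spaces and skipping empty entries are additions beyond what the pipeline above does.

```shell
# Hypothetical helper: turn "host1, host2,host3" into an Ansible INI inventory
# with a single [target_hosts] group, printed to stdout.
csv_to_inventory() {
  echo "[target_hosts]"
  # Split on commas, trim surrounding whitespace, drop empty entries.
  echo "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

For example, csv_to_inventory "host1, host2,host3" > /tmp/inventory.ini writes the group header followed by one host per line.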