Ansible Vault Secrets – Encrypt Single Strings in Config Files (Not Whole File)

Why Use Ansible Vault?

Ansible Vault is a very useful tool in the world of DevOps.  It gives you an easy way to check your entire configuration into version control (like Git) without actually revealing your passwords.

This means that when you apply config changes, they get recorded, and you can roll back or upgrade just like you do with normal code.  Nothing is lost, and nothing is insecure.

Encrypting a Single Property

Ansible can be used in countless ways; it is a very flexible tool.  Take variables, for example: you can put them in your inventory file, in separate variable files, at the top of your playbook, on the CLI command, and probably in even more places.

In this case, let’s say you have a nice existing variables file that just happens to not be encrypted.  You don’t want to split up your file, and you don’t want to encrypt the whole file either, because then it is hard for others to review (e.g., why encrypt the DNS name of a service you’re hitting?).

Example vars-dev.yml:

mysql_password: password123
dns_name: some-service@company.com

So, let’s encrypt just the password in that file and leave everything else as plain text. We’ll use this command to encrypt the password (password123).

$> ansible-vault encrypt_string --vault-id vault_password 'password123' --name 'mysql_password'
mysql_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          31336265626432653930373332313963666565336333303731653133613732616234383634613134
          6463393739616232613837363065376336346265646432360a383464326231333035633762663738
          65396562386166613537346165646539303334383039396235636331343830633761303239643434
          6335343035646665640a666338346130636161663561376662666566316565646133653162663065
          3534
Encryption successful

Note that “vault_password” in this line is actually a text file in the current directory that just has the password to my vault in it. Keeping your vault password in a file lets you use it in automation like Jenkins without it showing up in log entries/prompts.

$> cat vault_password
some_good_password
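
If you are creating that file for the first time, something like this works (just a sketch; locking the permissions down so only you can read it is a good idea):

$> echo 'some_good_password' > vault_password
$> chmod 600 vault_password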

Everything from 'mysql_password' onward in the command output (prior to "Encryption successful") is the block you should now use for the password in your variables file. So, the modified variables file looks like this:

Updated Example vars-dev.yml:

mysql_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          31336265626432653930373332313963666565336333303731653133613732616234383634613134
          6463393739616232613837363065376336346265646432360a383464326231333035633762663738
          65396562386166613537346165646539303334383039396235636331343830633761303239643434
          6335343035646665640a666338346130636161663561376662666566316565646133653162663065
          3534

dns_name: some-service@company.com
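
Before wiring this into a playbook, you can sanity-check that the value decrypts correctly. One quick way (a sketch, assuming Ansible is installed locally) is to ask the debug module to print the variable back out:

$> ansible localhost -m debug -a "var=mysql_password" -e "@vars-dev.yml" --vault-password-file vault_password

If everything is set up correctly, this prints the decrypted value (password123) rather than the ciphertext.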

Testing It Out

At this point, let’s test that we can use this partially encrypted variable file for something real.

Let’s start by creating some new files, all of which we’ll be lazy with and just keep in our current directory:

We’ll create a config.properties template file that uses our variables.

mysql.password={{mysql_password}}
dns.name={{dns_name}}

We’ll create a playbook.yml file that is our actual playbook to run:

---
- name: Ansible Vault Example
  hosts: my_hosts
  tasks:
    - name: Copy config file to servers with decrypted password.
      template: src="config.properties" dest="/tmp/config.properties"

And we’ll create inventory.ini (plain INI format) to tell ourselves which hosts to target. We’ll just target localhost, but whatever. Make sure localhost can SSH to itself for this :) (or see the local-connection alternative below).

[my_hosts]
localhost
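
If you would rather not set up SSH from localhost to itself, one alternative (a sketch of the same inventory) is to tell Ansible to use a local connection for that host:

[my_hosts]
localhost ansible_connection=local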

Now we can run our test! Use this command to run our playbook with our inventory file, our variables file, and our vault password file. Based on the playbook, it should copy config.properties from our current directory to /tmp/config.properties with the variables replaced and decrypted where relevant.

ansible-playbook -i inventory.ini --vault-password-file vault_password --extra-vars @vars-dev.yml playbook.yml

Now we can cat /tmp/config.properties and see that it worked!

$> cat /tmp/config.properties
mysql.password=password123
dns.name=some-service@company.com

I hope this helps you! I’ll be doing more blog posts on Ansible, Vault, and related topics in the near future, so come back for more!

Jenkins Pipeline – Maven Build + Deploy to Nexus/Artifactory in Docker

Overview

If you want to use a Jenkins pipeline to build your Maven project in a clean Docker container, you can do so as shown below.

You can run any Maven commands you like.  In this particular case I am:

  • Setting the version in the POM to the one provided by the person running the Jenkins job.
  • Building and deploying the maven project to Artifactory.

This is the “build once” part of “build once, deploy everywhere”.

To test this, though, you can just swap my commands out for mvn --version or something simple like that.

Configuration

Create a new pipeline job and:

  1. Add a string parameter called VERSION_TO_TAG.
  2. Set pipeline definition to “Pipeline script from SCM”.
  3. Give your repo checkout URL – e.g. ssh://git@url.mycompany.com:7999/PROJECT/repo-name.git.
  4. Specify your git credentials to use.
  5. Specify your Jenkinsfile path (it is probably literally just “Jenkinsfile”, which is the default, assuming the Jenkinsfile is in the root of your repo).

Make sure your project has a “Jenkinsfile” at its root with a definition like this:

pipeline {
    agent {
        docker { image 'maven:3-alpine' }
    }

    stages {
        stage('Set version, build, and deploy to artifactory.') {
            steps {
                sh 'mvn versions:set -DnewVersion=$VERSION_TO_TAG'
                sh 'mvn deploy'
            }
        }
    }
}

Now, when you build your project, you should see Jenkins pull the maven:3-alpine image, start the container, run our Maven commands, and upload the artifacts to the Artifactory repository you have set up in your Maven POM (in your distributionManagement section, in case you haven’t done that yet).
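
If you haven’t set up distributionManagement yet, it is just a small block in your pom.xml telling Maven where mvn deploy should upload artifacts. A minimal sketch (the repository ids must match server entries with credentials in your Maven settings.xml, and the URLs here are placeholders):

<distributionManagement>
    <repository>
        <id>artifactory-releases</id>
        <url>https://artifactory.mycompany.com/artifactory/libs-release-local</url>
    </repository>
    <snapshotRepository>
        <id>artifactory-snapshots</id>
        <url>https://artifactory.mycompany.com/artifactory/libs-snapshot-local</url>
    </snapshotRepository>
</distributionManagement>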

Azure LB Dropping Traffic Mysteriously – HAProxy / NGINX / Apache / etc.

Failure Overview

I lost a good portion of last week fighting dropped traffic / intermittent connection issues on a basic-tier Azure load balancer.  The project this was happening on had been up and running for 6 months without configuration changes and had not been restarted in 100 days.  Restarting it did not help, so clearly something had changed about the environment.  It also started happening in multiple deployments in different Azure subscriptions, implying that it was not an isolated issue related to one server or deployment.

Solution

After running a crazy number of tests and eventually escalating to Azure support, who reviewed the problem for over 12 hours, they pointed me to this:

https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview#types

“Do not translate or proxy a health probe through the instance that receives the health probe to another instance in your VNet as this configuration can lead to cascading failures in your scenario. Consider the following scenario: a set of third-party appliances is deployed in the backend pool of a Load Balancer resource to provide scale and redundancy for the appliances and the health probe is configured to probe a port that the third-party appliance proxies or translates to other virtual machines behind the appliance. If you probe the same port you are using to translate or proxy requests to the other virtual machines behind the appliance, any probe response from a single virtual machine behind the appliance will mark the appliance itself dead. This configuration can lead to a cascading failure of the entire application scenario as a result of a single backend instance behind the appliance. The trigger can be an intermittent probe failure that will cause Load Balancer to mark down the original destination (the appliance instance) and in turn can disable your entire application scenario. Probe the health of the appliance itself instead.”

I was using a load balancer over a scale set, and the load balancer pointed at HAProxy, which was designed to route traffic to the “primary” server.  So, I wanted Azure’s load balancer to consider every server up as long as it could route to the “primary” server, even if other things on that specific server were down.

But having the health probe check HAProxy meant that the health probe was routed to the “primary” server and triggered this error.

This seems like an Azure quirk to me… but they have it documented.  Once I switched the health probe to target something not routed by HAProxy, the LB stabilized and everything was OK.
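
For reference, one way to do that with HAProxy (a sketch, not my exact config) is to use its built-in monitor-uri, which makes HAProxy answer the health check itself instead of proxying it to a backend, and then point the Azure health probe at that port and path:

# Dedicated health-check frontend: HAProxy answers /healthz itself
# and never forwards the probe to the "primary" server behind it.
frontend health
    bind *:8080
    mode http
    monitor-uri /healthz

The Azure probe then reflects the health of the HAProxy instance itself rather than whatever it happens to route to.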

 

Spring – Time Out REST HTTP Calls With RestTemplate

No Timeouts By Default!

Spring’s RestTemplate is an extremely convenient way to make REST calls to web services.  But many people don’t initially realize that these calls have no timeout by default.  This means no connection timeout and no read timeout.  So a call that should take 1 second can potentially freeze your app for a very long time if the back end is behaving badly.

Setting a Timeout

There are a lot of ways of doing this, but the best one I’ve seen recently (from a Stack Overflow post) is to create the RestTemplate in an @Configuration class and then inject it into your services.  That way you know the RestTemplate you are using everywhere was configured with your desired timeouts.

Here is a full example.

package com.company.cloudops.config;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;
import java.time.Duration;

@Configuration
public class AppConfig {

    @Value("${rest.template.timeout}") private int restTemplateTimeoutMs;

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder
                .setConnectTimeout(Duration.ofMillis(restTemplateTimeoutMs))
                .setReadTimeout(Duration.ofMillis(restTemplateTimeoutMs))
                .build();
    }
}
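
Note that this assumes a matching entry exists in your application.properties; the property name comes from the @Value annotation above, and the value here is just an example:

rest.template.timeout=5000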

To use this RestTemplate in another Spring bean class, just pull it in with:

@Autowired private RestTemplate template;
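
For example, a minimal service using it might look like this (the package, class, and URL are placeholders, not from the original post):

package com.company.cloudops.service;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class WidgetClient {

    //This RestTemplate already carries the configured connect/read timeouts.
    @Autowired private RestTemplate template;

    public String fetchWidget(String id) {
        //A call that exceeds the configured timeouts now fails fast with a
        //ResourceAccessException instead of hanging indefinitely.
        return template.getForObject("https://widgets.mycompany.com/widgets/" + id, String.class);
    }
}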

Connection Pooling With Spring Boot 2.0 + Hikari – Verify Idle Timeouts Are Working

Use Case

I’ve been working on an odd API project where each user needs their own connection to various back-end databases/data sources.  This is a break from the norm because, in general, you set up a connection pool of, say, 10 connections, everyone shares it, and you’re golden.

If you have 500 users throughout the day, though, and each one gets some connections, that would be a disaster.  So, in my case, making sure the pool is of limited size and making sure the idle timeout works is pretty vital, and I started playing around to see how I could verify that old connections really are being removed.

My Configuration

I had started with an Apache Commons DBCP BasicDataSource (old habits die hard).  But when I enabled debug logging, I didn’t see connections being dropped, or any info on them being logged at all for that matter.  Before bothering with trace logging, I started reading about Hikari, which is a connection pool I see Spring using a lot… and it looked pretty awesome! The HikariCP project page has some good performance and usage info.

Anyway! I switched to Hikari quickly, which was easy since it’s already the default in Spring Boot 2.x (which I habitually use for everything these days).

Here’s my Spring config class/code. It is set up to allow a minimum of 0 idle connections, to time out idle connections after 60 seconds, and to cap the pool at 4 connections. Connections are tested with “select 1”, which is pretty safe on most databases.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

import javax.sql.DataSource;

@Configuration
public class Config {

    //Configuration for our general audit data source.
    private @Value("${audit.ds.url}") String auditDsUrl;
    private @Value("${audit.ds.user}") String auditDsUser;
    private @Value("${audit.ds.password}") String auditDsPassword;

    @Bean
    public DataSource auditDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(auditDsUrl);
        config.setUsername(auditDsUser);
        config.setPassword(auditDsPassword);
        config.setMaximumPoolSize(4);               //Never hold more than 4 connections.
        config.setMinimumIdle(0);                   //Let the pool drain completely when idle.
        config.setIdleTimeout(60000);               //Close connections idle for 60 seconds.
        config.setConnectionTestQuery("select 1");  //Validation query; safe on most databases.
        config.setPoolName("Audit Pool");           //Pool name shows up in the Hikari log lines.
        config.setValidationTimeout(10000);
        return new HikariDataSource(config);
    }

    @Bean
    public NamedParameterJdbcTemplate auditJdbcTemplate() {
        return new NamedParameterJdbcTemplate(auditDataSource());
    }
}
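
For example, a query through this pool from any Spring bean might look like this (the table and column names are made up for illustration):

package com.company.cloudops.audit;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.stereotype.Service;

import java.util.Collections;

@Service
public class AuditService {

    @Autowired private NamedParameterJdbcTemplate auditJdbcTemplate;

    public Integer countAuditEvents(String user) {
        //Borrows a connection from the "Audit Pool", runs the query, and returns the connection to the pool.
        return auditJdbcTemplate.queryForObject(
                "select count(*) from audit_event where username = :user",
                Collections.singletonMap("user", user),
                Integer.class);
    }
}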

Verifying it Works

After sending a query to my API, which uses the JDBC template above to execute it, I see the following in the logs (note that I removed the date/time/class/etc. prefixes for brevity).

Audit Pool - Before cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - After cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - Closing connection PG...: (connection has passed idleTimeout)

So, we can see that the pool went from 0 total connections to 1. The connection shows up as idle almost immediately because the query was short and finished before the next periodic stats log line. Then, after a minute, the connection gets closed and the total goes back to 0.

So, we’re correctly timing out idle connections using our settings. Also, we’re getting our pool name (Audit Pool) in the logs, which is awesome too!