Setting up HTTPS on an AWS Load Balancer

Context

This was my first time setting up HTTPS with a real certificate rather than a self-signed one.  With a real cert you get the nice lock icon + https in the browser without users seeing the obnoxious “do you trust this site, continue unsafely” warnings.

Steps

Generate a Key Pair

Note that for the Common Name you should use the DNS name you will be using for the website (or for the load balancer in front of it).  This is important.  E.g. *.appname.yourcompany.com, which is the wildcard form and covers anything prefixed in front of the app name (like anything.appname.yourcompany.com) as well.

openssl req -new -newkey rsa:2048 -nodes -keyout appname.key -out appname.csr
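If you would rather skip the interactive prompts, openssl can take the subject on the command line, and you can sanity-check the CSR before submitting it.  (The country/organization values below are just placeholder examples.)

openssl req -new -newkey rsa:2048 -nodes -keyout appname.key -out appname.csr -subj "/C=US/O=Your Company/CN=*.appname.yourcompany.com"

openssl req -text -noout -verify -in appname.csr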

Do not lose this key pair! Store it somewhere safe and backed up.

Request a Certificate from DigiCert/etc

Go to DigiCert or whatever service your company uses.  Request a standard SSL certificate using the CSR (.csr file) you generated above; the private key never leaves your machine.

If you are requesting a wildcard cert, you will want to add Subject Alternative Names (SANs) for both appname.yourcompany.com and *.appname.yourcompany.com.
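Most CA order forms let you type the SANs in directly, but if you want them baked into the CSR itself, an openssl config along these lines should do it.  This is just a sketch; the file name san.cnf is arbitrary.

[req]
prompt = no
distinguished_name = dn
req_extensions = v3_req

[dn]
CN = *.appname.yourcompany.com

[v3_req]
subjectAltName = DNS:appname.yourcompany.com, DNS:*.appname.yourcompany.com

Then generate the CSR against that config:

openssl req -new -newkey rsa:2048 -nodes -keyout appname.key -out appname.csr -config san.cnf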

If you get odd failure messages, these sites usually have a tool that validates your CSR.  In my case, I think it made me regenerate the CSR without a password before it would accept it.  That was only obvious from the validator; the initial error messages were just confusing.
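For reference, if you already have a key with a passphrase on it, you can strip the passphrase instead of starting over (openssl prompts you for the existing passphrase; the output file name here is just an example):

openssl rsa -in appname.key -out appname-nopass.key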

Note: The server name and location fields don’t seem to matter much.  I just put AWS for the location and my app name with an environment suffix for the server name (nothing actually exists under that name).

Add Your Certificate to the AWS Certificate Manager

Once your certificate comes back from DigiCert, you can upload it to the Certificate Manager (ACM) in AWS.  The certificate body is the certificate returned to you by DigiCert/etc.  The private key is the one you generated above.  The certificate chain just needed the “intermediate certificate” that DigiCert returned along with the new certificate (and only that; don’t put your own certificate in there as well).
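The same upload can be done from the AWS CLI if you prefer; the file names here are just whatever you saved the DigiCert response and your key as:

aws acm import-certificate \
    --certificate fileb://appname.crt \
    --private-key fileb://appname.key \
    --certificate-chain fileb://digicert-intermediate.crt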

Create Your Load Balancer

Go to EC2 and create a new Network Load Balancer.  Add a TLS (Secure TCP) listener, and on the security settings page pick “Choose a certificate from ACM (recommended)”.  Then you can select your certificate.
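If you are scripting this rather than clicking through the console, the listener step looks roughly like this with the AWS CLI (the ARNs are placeholders for your own load balancer, ACM certificate, and target group):

aws elbv2 create-listener \
    --load-balancer-arn <your-nlb-arn> \
    --protocol TLS --port 443 \
    --certificates CertificateArn=<your-acm-cert-arn> \
    --default-actions Type=forward,TargetGroupArn=<your-target-group-arn>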

Create Your DNS Name

Contact your system admins to create a DNS name for you with the name you used in your certificate request (e.g. appname.yourcompany.com from above).  Point it at the IP of the load balancer.  You will have to resolve the load balancer name to an IP for this (I’m not sure that is best practice, so you may want to read up more before following this last step).
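To get the IP, you can just resolve the load balancer’s DNS name (the hostname below is a made-up example; yours is shown on the load balancer’s description tab):

dig +short appname-nlb-0123456789abcdef.elb.us-east-1.amazonaws.com

If your DNS admins can create a CNAME pointing at the load balancer’s DNS name instead of an A record, that sidesteps resolving an IP at all.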

All Done!

Assuming your load balancer points to an application now, you can use https to talk to the DNS name of the load balancer, and the load balancer will accept the connection, terminate the TLS, and forward the traffic to your application.  So, you’re good!
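A quick way to confirm the whole chain works (certificate validates, TLS terminates, traffic reaches the app):

curl -v https://appname.yourcompany.com/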

 

Azure LB Dropping Traffic Mysteriously – HAProxy / NGINX / Apache / etc.

Failure Overview

I lost a good portion of last week fighting dropped traffic / intermittent connection issues on a basic-tier Azure load balancer.  The affected project had been up and running for 6 months without configuration changes and had not been restarted in 100 days.  Restarting it did not help, so clearly something had changed about the environment.  It also started happening in multiple deployments in different Azure subscriptions, implying that it was not an isolated issue or specific to one server.

Solution

After running a crazy number of tests and eventually escalating to Azure support, who reviewed the problem for over 12 hours, they pointed me at this:

https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview#types

“Do not translate or proxy a health probe through the instance that receives the health probe to another instance in your VNet as this configuration can lead to cascading failures in your scenario. Consider the following scenario: a set of third-party appliances is deployed in the backend pool of a Load Balancer resource to provide scale and redundancy for the appliances and the health probe is configured to probe a port that the third-party appliance proxies or translates to other virtual machines behind the appliance. If you probe the same port you are using to translate or proxy requests to the other virtual machines behind the appliance, any probe response from a single virtual machine behind the appliance will mark the appliance itself dead. This configuration can lead to a cascading failure of the entire application scenario as a result of a single backend instance behind the appliance. The trigger can be an intermittent probe failure that will cause Load Balancer to mark down the original destination (the appliance instance) and in turn can disable your entire application scenario. Probe the health of the appliance itself instead.”

I was using a load balancer over a scale set, and the load balancer pointed at HAProxy, which was designed to route traffic to the “primary” server.  So, I wanted Azure’s load balancer to consider every server up as long as it could route to the “primary” server, even if other services on that particular server were down.

But having the health probe check HAProxy meant the probe itself was proxied to the “primary” server, which is exactly the scenario described above.

This seems like an Azure quirk to me… but they have it documented.  Once I switched the health probe to target something not routed through HAProxy, the LB stabilized and everything was OK.
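For reference, here is one way to give the probe a target that HAProxy answers locally instead of proxying onward.  This is a rough sketch, not necessarily my exact setup; the port, path, and backend address are placeholders.

# Health endpoint that HAProxy answers by itself; nothing is forwarded
frontend healthz
    bind *:8081
    mode http
    monitor-uri /healthz

# Actual application traffic is still proxied to the "primary" server
frontend app
    bind *:443
    mode tcp
    default_backend primary

backend primary
    mode tcp
    server primary1 10.0.0.4:443 check

The Azure LB health probe then targets port 8081, so it only checks whether HAProxy itself is alive, which is what the Microsoft doc means by “probe the health of the appliance itself instead.”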