AWS CentOS Extend Root Volume with No Downtime

In AWS, you can generally extend the root (or other) volume of any of your EC2 instances without downtime.  The steps vary slightly by OS, file system type, etc., though.

On a rather default-configured AWS instance running the main marketplace CentOS 7 image, I had to run the following commands.

  1. Find the volume on the “Volumes” page under the EC2 service in the AWS console and modify its size.
  2. Wait for it to get into the “Optimizing” state (visible in the volume listing).
  3. Run: sudo file -s /dev/xvd*
    • If you’re in my situation, this will output a couple lines like this.
      • /dev/xvda: x86 boot sector; partition 1: ID=0x83, active, starthead 32, startsector 2048, 134215647 sectors, code offset 0x63
      • /dev/xvda1: SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)
    • The important part is the XFS; that is the file system type.
  4. Run: lsblk
    • Again, in my situation the output looked like this:
      • NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
      • xvda 202:0 0 64G 0 disk
      • └─xvda1 202:1 0 64G 0 part /
    • This basically says that the data is in one partition under xvda.  Note: mine said 32G to start; I increased it to 64G and am just going back through the process to document it.
  5. Run: sudo growpart /dev/xvda 1
    • This grows partition #1 of /dev/xvda to take up remaining space.
  6. Run: sudo xfs_growfs -d /
    • This tells the root volume to take up the available space in the partition.
  7. After this, you can just do a “df -h” to see the increased partition size.
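
For reference, here are the OS-side steps from above in one place (this assumes an XFS root file system on partition 1 of /dev/xvda, as in my case):

sudo file -s /dev/xvd*         # confirm the file system type (XFS here)
lsblk                          # confirm the partition layout
sudo growpart /dev/xvda 1      # grow partition 1 into the newly added space
sudo xfs_growfs -d /           # grow the XFS file system to fill the partition
df -h                          # verify the new size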

Note, your volume may take hours to get out of the “Optimizing” state, but it can still be used immediately.

You can view the raw AWS instructions here in case any of this doesn’t line up for you when you go to modify your instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html.

 

AWS Packer Centos 7 Example – Get AMI ID

I was very surprised to see how incredibly hard it is to determine an AMI ID in AWS for use with Packer.

I generally use CentOS 7 marketplace images for my servers; e.g. CentOS 7 (x86_64) – with Updates HVM.  There is nowhere in the AWS UI or the linked CentOS product page to actually find what the AMI ID is in a given region (and it does change per region).

I came across a stack-overflow post which was a life-saver though.  Basically, for us-east-1 as an example, you can run this command using the AWS CLI (yeah, you actually have to use the CLI – that’s how wrong this is).

aws ec2 describe-images \
      --owners aws-marketplace \
      --filters "Name=product-code,Values=aw0evgkw8e5c1q413zgy5pjce" "Name=name,Values=CentOS Linux 7*" \
      --query 'Images[*].[CreationDate,Name,ImageId]' \
      --region us-east-1 \
      --output table \
  | sort -r

And you get output like this:

|  2019-01-30T23:40:58.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1901_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-05713873c6794f575.4  |  ami-02eac2c0129f6376b  |
|  2018-06-13T15:53:24.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1805_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-77ec9308.4           |  ami-9887c6e7           |
|  2018-05-17T08:59:21.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1804_2-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-55a2322a.4            |  ami-d5bf2caa           |
|  2018-04-04T00:06:30.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1803_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-8274d6ff.4           |  ami-b81dbfc5           |
|  2017-12-05T14:46:53.000Z|  CentOS Linux 7 x86_64 HVM EBS 1708_11.01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-95096eef.4            |  ami-02e98f78           |

The top one will be the newest and probably the one you want (at least it was in my case).
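
If you only want the newest AMI ID by itself (e.g. to feed into a Packer variable), a JMESPath query along these lines should also work; consider it a sketch based on the same describe-images call rather than something I verified for this post:

aws ec2 describe-images \
      --owners aws-marketplace \
      --filters "Name=product-code,Values=aw0evgkw8e5c1q413zgy5pjce" "Name=name,Values=CentOS Linux 7*" \
      --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
      --region us-east-1 \
      --output text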

I hope that saves you some precious googling time; it took me a while to find it since AWS’s less than admirable documentation on the subject shows up first.
 

AWS + Terraform + Auto Scale Group + User Data Bash Script on Startup to Customize Image

User Data  – On Startup

If you want to customize your VM image on its first start-up, you may want to use “user data”.  You can basically think of this as a script that runs right after the very first boot.  Apparently you can also make it run on every reboot (with extra config).

Why would you need this?  Well, in my case, I was spinning up a Presto cluster.  I generally do this in a special HA way… but even if you did it the simple way, you would have 1 coordinator and N workers, and the N workers would have to point at your 1 coordinator.

So, there are 2 interesting things here:

  1. The coordinator and workers are identical barring some slightly different configuration in one file.
  2. The workers need to know about the coordinator in order to use it.

So, for both of these cases, we’d like to run a script on start-up!

The Terraform Code

When you want to create an auto-scale-group, you have to start by creating a launch template: https://www.terraform.io/docs/providers/aws/r/launch_template.html.

You can use that template to spin up multiple auto-scale groups when it is done.  The launch template itself holds the user data though.  So, you are best off trying to make your user data script generic enough that it can work for all your cases.  It can be a bash file and can use variables, so this isn’t too hard.

If you do need multiple separate user data scripts you’ll have to use separate launch templates, which is not the end of the world either.

The launch template in the link above is very complete, so all I’m going to show you is how to pass a bash script that takes parameters to the user data.

Basically replace:

user_data = "${base64encode(...)}"

In their example with something like this:

user_data = base64encode(templatefile("${path.module}/worker-script.sh", {
  coordinator_lb  = aws_lb.coordinator.dns_name
  hive_thrift_csv = var.hive_thrift_csv
}))

Assuming your worker-script has content like this:

#!/bin/bash
# coordinator_lb and hive_thrift_csv are filled in by templatefile() before the instance sees this file
echo "coordinator: ${coordinator_lb}" > /tmp/test-output.txt
echo "hive thrift uris: ${hive_thrift_csv}" >> /tmp/test-output.txt

and you have the hive_thrift_csv variable defined in your variables file like this:

variable "hive_thrift_csv" {
  type    = string
  default = "thrift://ip-addr-1:9083,thrift://ip-addr-2:9083"
}

you should be good. Note, the first variable definition, coordinator_lb = aws_lb.coordinator.dns_name, is a reference to the DNS name of a load balancer created in another part of my Terraform config. I left it in as it’s a good example of a more complex, separate variable.
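
If you want to confirm the script actually ran on a freshly launched instance, cloud-init (which runs user data on the CentOS 7 image) leaves a trail you can inspect; exact paths can vary a bit by image:

sudo cat /var/log/cloud-init-output.log              # stdout/stderr from your user data script
curl -s http://169.254.169.254/latest/user-data      # the user data the instance actually received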

Setting up HTTPS on an AWS Load Balancer

Context

This was my first time setting up HTTPS using a real certificate rather than a self-signed one.  This way you get that nice lock symbol + https in your browser without the user seeing the obnoxious “do you trust this site, continue unsafely” warnings.

Steps

Generate a Key Pair

Note that for the common name you should use the DNS name you will be using for the website (or the load balancer in front of the website).  This is important.  E.g. *.appname.yourcompany.com.  That form is a wildcard cert, which also covers anything prefixed in front of the app name.

openssl req -new -newkey rsa:2048 -nodes -keyout appname.key -out appname.csr

Do not lose this private key! Store it somewhere safe and backed up.
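
Before you submit the CSR anywhere, it doesn’t hurt to sanity-check its contents (the common name and key size should match what you intended):

openssl req -in appname.csr -noout -text -verify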

Request a Certificate from DigiCert/etc

Go to DigiCert or whatever service your company uses.  Request a standard SSL certificate using the files you generated above.

For the case of a wild-card cert, you will want to add Subject Alternative Names (SANs) for appname.yourcompany.com and *.appname.yourcompany.com.

If you get odd failure messages, these sites tend to have a tool that validates your CSR.  In my case, I think it made me go remake the CSR without a password before it would accept it.  This was only obvious from the validator; the initial error messages were just confusing.

Note: The server name and location fields don’t seem to matter much.  I just put AWS for location and my app name with some environment suffix for the server name (nothing exists with this name).

Add Your Certificate to the AWS Certificate Manager

Once your certificate comes back from Digicert, you can upload it to the Certificate Manager in AWS.  The body is the certificate returned to you from Digicert/etc.  The private key is the one you generated above.  The certificate chain just needed the “intermediate certificate” that Digicert returned to me along with the new certificate (and only that, don’t put your certificate in there as well).
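
If you prefer the CLI over the console, the same upload can be done with something like this (the file names are just placeholders for the certificate, your private key, and the intermediate chain):

aws acm import-certificate \
    --certificate fileb://appname.crt \
    --private-key fileb://appname.key \
    --certificate-chain fileb://intermediate.crt \
    --region us-east-1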

Create Your Load Balancer

Go to EC2 and create a new network load balancer.  Add a TLS (Secure TCP) listener and on the security settings page, pick “Choose a certificate from ACM (recommended)”.  Then you can select your certificate.

Create Your DNS Name

Contact your system admins to create a DNS name for you with the name you used in your certificate request (e.g. appname.yourcompany.com from above).  Point it at the IP of the load balancer.  You will have to resolve the load balancer name to an IP for this (I’m not sure that is best practice, so you may want to read up more before following this last step).
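
To get an IP to hand to your admins, you can just resolve the load balancer’s DNS name (the hostname below is made up; use the one shown on your load balancer’s detail page):

dig +short my-nlb-1234567890.elb.us-east-1.amazonaws.com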

All Done!

Assuming your load balancer points to an application now, you can use https to talk to the DNS name of the load balancer, and the load balancer will take the traffic, terminate the TLS, and forward it to your application.  So, you’re good!

 

Ansible – Install AWS CLI and Log in To Amazon ECR Docker Registry via Ansible

There is probably a much cleaner way of doing this using off-the-shelf automations.  But I was just following along with the AWS installation instructions and got this working.

    - name: Download AWS CLI bundle.
      shell: "cd /tmp && rm -rf /tmp/awscli* && curl 'https://s3.amazonaws.com/aws-cli/awscli-bundle.zip' -o 'awscli-bundle.zip'"

    - name: Update repositories cache and install "unzip" package
      apt:
        name: unzip
        update_cache: yes

    - name: Unzip AWS CLI bundle.
      shell: "cd /tmp && unzip awscli-bundle.zip"

    - name: Run AWS CLI installer.
      shell: "/tmp/awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws"

    - name: Log into aws ecr docker registry
      when: jupyterhub__notebook_registry != ''
      shell: "$(/usr/local/bin/aws ecr get-login --no-include-email --region us-east-1)"

In order to do the actual login, you need to ensure your EC2 instance has an IAM role assigned to it that has ECR read permissions. Then you should be all good!
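
If the login task fails, the IAM role is the usual suspect; you can check which role (if any) is attached right from the instance using the standard metadata endpoint:

curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/     # prints the attached role name; an empty/404 response means no role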

S3 Eventual Consistency

Consistent Distributed File Systems

Historically, I’ve used standard HDFS, MapR’s version of HDFS (MapR-FS), and ADLS (Azure’s data lake service).  All of these behave very much like you would expect a local file system to.  If you write files and another process lists files, it will immediately see them and be able to use them without issue.

Amazon s3 File System Issues

I was surprised when I started learning about Amazon s3 after using all of these prior file systems.  I understand that s3 is an object store… similar to Azure Blob Storage.  I also understand that it is the main data lake solution though.

Maybe it’s just because I’m new and am missing something… but there doesn’t seem to be any AWS version of ADLS.

The s3 storage service is eventually consistent.  This means that if you run Spark or similar tools on it, they will likely produce incorrect results or fail.  This is because multiple tasks will write files in parallel and then list them, and they won’t necessarily get a fully up-to-date view of the storage.  So, they may write 10 files, list them, and only see 5 files, etc.

I came across a very good article describing this in detail here: https://www.opendoor.com/w/blog/why-s3guard-with-s3-as-a-filesystem-spark.

The TLDR is that you have to use a consistency layer between your big data frameworks and s3 to ensure they function well.  You can confirm this by reading the short hadoop documentation site here -> https://wiki.apache.org/hadoop/AmazonS3.

Note that the first article recommends S3Guard which works based on DynamoDB, but there may be other options (e.g. EMR will have a way of dealing with this).
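
For what it’s worth, S3Guard is wired up through standard S3A properties; a rough sketch of what that looks like for a Spark job is below (the DynamoDB table name is made up, and you should check the S3Guard documentation for your Hadoop version before relying on this):

spark-submit \
  --conf spark.hadoop.fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.region=us-east-1 \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table=my-s3guard-table \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table.create=true \
  my_job.py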

Determine Compatibility of hadoop-aws and aws-java-sdk-bundle JARs

When you’re integrating hadoop and other big-data frameworks into AWS s3, you will quickly run into a situation where you need to include the hadoop-aws and aws-java-sdk-bundle JARs into your class path.

Unfortunately, these JARs are separately versioned and it is hard to figure out compatibility.  The hadoop-aws JAR has to match your hadoop version exactly, so that one is fine.

Determining the Right Version

  1. Check your hadoop version.
  2. Get the hadoop-aws.jar with the same exact version.
  3. Go to the maven central page for the correct version of the hadoop-aws.jar and look at its compile dependencies.  E.g. at https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.9 you can see the SDK dependency is com.amazonaws » aws-java-sdk-bundle 1.11.199.
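
As a concrete sketch of that process (the versions below are only examples; use whatever your own hadoop version and its Maven page actually tell you):

hadoop version     # e.g. "Hadoop 2.9.2"
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.9.2/hadoop-aws-2.9.2.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.199/aws-java-sdk-bundle-1.11.199.jar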