AWS EC2 Varying Generation and Linear Cost Example

Due to increased query sizes on our Presto clusters (causing aggregation failures), I’m in the middle of evaluating a move from 16-core/64GB-RAM general purpose EC2 machines (m4.4xlarge) to 64-core/256GB-RAM general purpose machines (a 4x increase in CPU and RAM).

Here are the m4 and m5 models with 16-core/64GB and 64-core/256GB specs.  Below, we’ll see how they compare to each other and what the best option is.

Type        | vCPU | ECU  | Memory  | Instance Storage | Linux/UNIX Usage
m4.4xlarge  | 16   | 53.5 | 64 GiB  | EBS Only         | $0.80 per Hour
m4.16xlarge | 64   | 188  | 256 GiB | EBS Only         | $3.20 per Hour
m5.4xlarge  | 16   | 70   | 64 GiB  | EBS Only         | $0.768 per Hour
m5.16xlarge | 64   | 256  | 256 GiB | EBS Only         | $3.072 per Hour

For reference:

EC2 uses the term EC2 Compute Unit (ECU) to describe the CPU resources of each instance size, where one ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or Xeon processor.

There are a few good things to notice here:

  • Going from m4.4xlarge to m4.16xlarge, we get 4x the resources for exactly 4x the cost ($0.80 x 4 = $3.20).  The one exception is that we get less than 4x the ECUs (so technically less than 4x the processing power).  So, cost roughly scales linearly with resources within a generation.
  • Pretty much the same holds true for the m5 models; going from 4xlarge to 16xlarge is exactly a 4x increase in cost and resources, except for ECUs, which again increase by a little less than 4x.
  • The m5 models have more ECUs than their m4 counterparts and also cost less, so they are a better deal both performance-wise and cost-wise.

So, we’ll go with m5.16xlarge instances, which cost $3.072 an hour.  This comes out to roughly $2,211 a month (at 720 hours).
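
For what it’s worth, that monthly figure is just the hourly rate times a 30-day (720-hour) month; a trivial one-liner (a quick sketch, assuming 720 billable hours per month) reproduces it:

# 3.072 dollars/hour * 24 hours * 30 days
echo "3.072 * 24 * 30" | bc    # prints 2211.840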

AWS Not Authorized to Use Launch Template (in Terraform or in Console)

This is just a quick note for anyone facing this issue.

A few of us lost about a day debugging what we originally thought was a Terraform issue.  While creating an auto scaling group (ASG), we kept getting “Invalid details specified: You are not authorized to use launch template…”.

It turned out that the same error was presented in the AWS console when we tried to create the ASG there.

After some substantial debugging, it turned out that Terraform had been allowed to create a launch template referencing an AMI (Amazon Machine Image) that did not exist.  We had used the AMI ID from our non-prod account in our prod account, but an AMI must exist in each account under its own unique ID – so the template couldn’t launch anything.

It took us a while to get to this point in our debugging because, frankly, we were astounded that the error message was so misleading.  We spent a very long time trying to figure out everything that could trigger a permissions error on the template itself, not realizing that a missing resource referenced within the template would make the whole template report that error.
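
If you hit the same error, a quick way to check whether the AMI your launch template references actually exists in the account/region you’re deploying to is a describe-images call.  This is a minimal sketch – the AMI ID below is a placeholder, so substitute your own ID and region:

# Prints the image details if the AMI exists in this account/region;
# otherwise it fails (typically with an InvalidAMIID.NotFound error).
aws ec2 describe-images \
      --image-ids ami-0123456789abcdef0 \
      --region us-east-1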

AWS CentOS Extend Root Volume with No Downtime

In AWS, you can generally extend the root (or other) volume of any of your EC2 instances without downtime.  The steps vary slightly by OS, file system type, etc., though.

On a fairly default-configured AWS instance running the main marketplace CentOS 7 image, I had to run the following commands.

  1. Find and modify the volume on the “Volumes” page under the EC2 service in the AWS console.
  2. Wait for it to get into the “Optimizing” state (visible in the volume listing).
  3. Run: sudo file -s /dev/xvd*
    • If you’re in my situation, this will output a couple lines like this.
      • /dev/xvda: x86 boot sector; partition 1: ID=0x83, active, starthead 32, startsector 2048, 134215647 sectors, code offset 0x63
      • /dev/xvda1: SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)
    • The important part is the XFS; that is the file system type.
  4. Run: lsblk
    • Again, in my situation the output looked like this:
      • NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
      • xvda 202:0 0 64G 0 disk
      • └─xvda1 202:1 0 64G 0 part /
    • This basically says that the data is in one partition under xvda.  Note: mine said 32G to start; I increased it to 64G and am just going back through the process to document it.
  5. Run: sudo growpart /dev/xvda 1
    • This grows partition #1 of /dev/xvda to take up remaining space.
  6. Run: sudo xfs_growfs -d /
    • This tells the root XFS file system to grow into the available space in the partition.
  7. After this, you can just run “df -h” to see the increased file system size (the full command sequence is collected below for convenience).
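
For convenience, here are steps 3 through 7 collected in one place.  This assumes the same layout as above (XFS on partition 1 of /dev/xvda); adjust the device and partition number for your own instance:

sudo file -s /dev/xvd*      # confirm the file system type (XFS in my case)
lsblk                       # confirm the disk/partition layout
sudo growpart /dev/xvda 1   # grow partition #1 of /dev/xvda into the new space
sudo xfs_growfs -d /        # grow the XFS file system mounted at /
df -h                       # verify the new size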

Note: your volume may take hours to get out of the “optimizing” stage, but it can still be used immediately.

You can view the raw AWS instructions here in case any of this doesn’t line up for you when you go to modify your instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html.

 

AWS Packer CentOS 7 Example – Get AMI ID

I was very surprised to see how incredibly hard it is to determine an AMI ID in AWS for use with Packer.

I generally use CentOS 7 marketplace images for my servers; e.g., CentOS 7 (x86_64) – with Updates HVM.  There is nowhere in the AWS UI or the linked CentOS product page to actually find what the AMI ID is in a given region (and it does differ per region).

I came across this Stack Overflow post, which was a life-saver though.  Basically, using us-east-1 as an example, you can run this command with the AWS CLI (yeah, you actually have to use the CLI – that’s how wrong this is).

aws ec2 describe-images \
      --owners aws-marketplace \
      --filters "Name=product-code,Values=aw0evgkw8e5c1q413zgy5pjce" "Name=name,Values=CentOS Linux 7*" \
      --query 'Images[*].[CreationDate,Name,ImageId]' \
      --region us-east-1 \
      --output table \
  | sort -r

And you get output like this:

|  2019-01-30T23:40:58.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1901_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-05713873c6794f575.4  |  ami-02eac2c0129f6376b  |
|  2018-06-13T15:53:24.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1805_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-77ec9308.4           |  ami-9887c6e7           |
|  2018-05-17T08:59:21.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1804_2-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-55a2322a.4            |  ami-d5bf2caa           |
|  2018-04-04T00:06:30.000Z|  CentOS Linux 7 x86_64 HVM EBS ENA 1803_01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-8274d6ff.4           |  ami-b81dbfc5           |
|  2017-12-05T14:46:53.000Z|  CentOS Linux 7 x86_64 HVM EBS 1708_11.01-b7ee8a69-ee97-4a49-9e68-afaee216db2e-ami-95096eef.4            |  ami-02e98f78           |

The top row will be the newest and is probably the one you want (at least it was in my case).
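
If you just want the newest AMI ID by itself (e.g., to drop straight into a Packer variable), a small variation on the query does it.  This is the same product code and region as above; adjust both for your case:

# Print only the most recent CentOS 7 marketplace AMI ID in us-east-1.
aws ec2 describe-images \
      --owners aws-marketplace \
      --filters "Name=product-code,Values=aw0evgkw8e5c1q413zgy5pjce" \
      --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
      --region us-east-1 \
      --output text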

I hope that saves you some precious googling time; it took me a while to find this since AWS’s less-than-admirable documentation on the subject shows up first.
 

AWS + Terraform + Auto Scale Group + User Data Bash Script on Startup to Customize Image

User Data – On Startup

If you want to customize your VM image on its first start-up, you may want to use “user data”.  You can basically think of this as a script that will be run right after boot-up the very first time.  You can also make it run on every reboot apparently (with extra config).

Why would you need this?  Well, in my case, I was spinning up a Presto cluster.  I generally do this in a special HA way… but even if you did it the simple way, you would have 1 coordinator and N workers, and the N workers would have to point at your 1 coordinator.

So, there are 2 interesting things here:

  1. The coordinator and workers are identical barring some slightly different configuration in one file.
  2. The workers need to know about the coordinator in order to use it.

So, for both of these cases, we’d like to run a script on start-up!

The Terraform Code

When you want to create an auto-scale-group, you have to start by creating a launch template: https://www.terraform.io/docs/providers/aws/r/launch_template.html.

You can use that template to spin up multiple auto scaling groups once it is done.  The launch template itself holds the user data though.  So, you are best off making your user data script generic enough that it can work for all your cases.  It can be a bash file and can take variables, so this isn’t too hard.

If you do need multiple separate user data scripts, you’ll have to use separate launch templates, which is not the end of the world either.

The launch template in the link above is very complete, so all I’m going to show you is how to pass a bash script that takes parameters to the user data.

Basically replace:

user_data = "${base64encode(...)}"

In their example with something like this:

user_data = base64encode(templatefile("${path.module}/worker-script.sh", {coordinator_lb = "${aws_lb.coordinator.dns_name}", hive_thrift_csv = "${var.hive_thrift_csv}"}))

Assuming your worker-script has content like this:

#!/bin/bash
# coordinator_lb and hive_thrift_csv are filled in by Terraform's templatefile().
echo "coordinator=${coordinator_lb}" > /tmp/test-output.txt
echo "hive_thrift_csv=${hive_thrift_csv}" >> /tmp/test-output.txt

and you have the hive_thrift_csv variable defined in your variables file like this:

variable "hive_thrift_csv" {
type = "string"
default = "thrift://ip-addr-1:9083,thrift://ip-addr-2:9083"
}

you should be good.  Note: the first variable definition, coordinator_lb = “${aws_lb.coordinator.dns_name}”, is a reference to the DNS name of a load balancer created in another part of my Terraform config.  I left it in as it’s a good example of a more complex, separately defined variable.
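
Once an instance from the auto scaling group comes up, you can sanity-check that the rendered script actually ran.  On a cloud-init based image (like the CentOS 7 marketplace one), something along these lines works – the log path assumes cloud-init, and the output file matches the example script above:

cat /tmp/test-output.txt                            # output written by worker-script.sh
sudo cat /var/log/cloud-init-output.log             # stdout/stderr from the user data run
curl -s http://169.254.169.254/latest/user-data     # the raw user data the instance received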