User Data – On Startup
If you want to customize your VM on its first start-up, you can use “user data”. Think of it as a script that runs right after boot, the very first time the instance starts. Apparently it can also be made to run on every reboot, with some extra config.
Why would you need this? Well, in my case, I was spinning up a Presto cluster. I generally do this in a special HA way… but even if you did it the simple way, you would have 1 coordinator and N workers, and the N workers would have to point at your 1 coordinator.
So, there are 2 interesting things here:
- The coordinator and workers are identical barring some slightly different configuration in one file.
- The workers need to know about the coordinator in order to use it.
So, for both of these cases, we’d like to run a script on start-up!
The Terraform Code
When you want to create an auto scaling group, you start by creating a launch template: https://www.terraform.io/docs/providers/aws/r/launch_template.html.
Once it is done, you can use that template to spin up multiple auto scaling groups. The launch template itself holds the user data though. So, you are best off making your user data script generic enough that it works for all your cases. It can be a bash file and can use variables, so this isn’t too hard.
If you do need multiple separate user data scripts, you’ll have to use separate launch templates, which is not the end of the world either.
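To make the “generic script” idea a bit more concrete, here is a minimal sketch of how one script could serve both roles. The function name and the example URL are made up for illustration. The property names are the ones Presto reads from its config.properties file, but treat the exact set of lines as an assumption about a simple, non-HA setup.

```shell
#!/bin/bash
# Sketch only: print the config.properties lines that differ by node role.
# presto_role_config is a hypothetical helper, not part of Presto or Terraform.
presto_role_config() {
  local role="$1"             # "coordinator" or anything else (treated as worker)
  local coordinator_url="$2"  # where the discovery service can be reached

  if [ "$role" = "coordinator" ]; then
    echo "coordinator=true"
    echo "discovery-server.enabled=true"
  else
    echo "coordinator=false"
  fi

  # Every node, coordinator included, needs to know where discovery lives.
  echo "discovery.uri=$coordinator_url"
}

# Example call with made-up literal values.
presto_role_config "worker" "http://coordinator.example.internal:8080"
```

In a real template you would swap the two literal arguments on the last line for template variables and append the output to wherever your Presto install keeps config.properties. Note that the shell variables inside the function are written without curly braces on purpose, since Terraform treats a dollar sign followed by an opening brace as one of its own placeholders.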
The launch template in the link above is very complete, so all I’m going to show you is how to pass a bash script that takes parameters to the user data.
Basically replace:
user_data = "${base64encode(...)}"
in their example with something like this:
user_data = base64encode(templatefile("${path.module}/worker-script.sh", { coordinator_lb = aws_lb.coordinator.dns_name, hive_thrift_csv = var.hive_thrift_csv }))
Assuming your worker-script has content like this:
#!/bin/bash
echo "Hello World" > /tmp/test-output.txt
and you have the hive_thrift_csv variable defined in your variables file like this:
variable "hive_thrift_csv" {
  type    = string
  default = "thrift://ip-addr-1:9083,thrift://ip-addr-2:9083"
}
you should be good. Note that the first variable, coordinator_lb, is set to the DNS name of a load balancer (aws_lb.coordinator) created in another part of my Terraform config. I left it in as it’s a good example of a more complex value that doesn’t come from a plain variable.
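The hello-world script above never actually touches the two values being passed in, so here is a hedged sketch of a worker-script.sh that does. It assumes the variable names from the templatefile call (coordinator_lb and hive_thrift_csv). The write_worker_facts helper and the output format are invented for illustration.

```shell
#!/bin/bash
# By the time the instance boots, Terraform has already replaced these two
# placeholders with literal values, so bash only ever sees plain strings.
coordinator_lb="${coordinator_lb}"
hive_thrift_csv="${hive_thrift_csv}"

# Hypothetical helper: record what this worker was told about the cluster.
# Shell-only variables are written without braces so that Terraform's
# template syntax leaves them alone.
write_worker_facts() {
  local out_file="$1"
  echo "coordinator_lb=$coordinator_lb" > "$out_file"
  # Split the comma-separated thrift list into one line per metastore.
  echo "$hive_thrift_csv" | tr ',' '\n' | sed 's/^/hive_thrift=/' >> "$out_file"
}

write_worker_facts /tmp/test-output.txt
```

After the first boot, /tmp/test-output.txt on the instance should show the load balancer DNS name and each thrift address on its own line, which is a quick way to confirm the values made it through before wiring them into the real Presto config.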