Azure + Packer – Create Image With Only Access to Resource Group (Not Subscription)

What Was the Problem?

I recently had to create a VM image for an Azure scale set using Packer. Overall, the experience was great, but getting off the ground took me about an hour. Most tutorials and examples assume you have Contributor access to the whole subscription, but I wanted to do it with a service principal that only had access to a specific resource group.
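For context, a service principal limited to one resource group can be created with the Azure CLI. The following is a minimal sketch, not the exact commands I ran: the function names `rg_scope` and `create_rg_principal` are made up for illustration, and it assumes the Azure CLI is installed and you are logged in with an account that is allowed to assign roles on the resource group.

```shell
#!/bin/sh
# Build the ARM scope string for a single resource group.
# $1 = subscription ID, $2 = resource group name
rg_scope() {
  printf '/subscriptions/%s/resourceGroups/%s\n' "$1" "$2"
}

# Create a service principal with the Contributor role on that
# resource group only (hypothetical helper; requires the Azure CLI).
# $1 = display name, $2 = subscription ID, $3 = resource group name
create_rg_principal() {
  az ad sp create-for-rbac \
    --name "$1" \
    --role Contributor \
    --scopes "$(rg_scope "$2" "$3")"
}

# Usage:
#   create_rg_principal packer-builder your-subscription your-existing-rg
```

The command prints the client ID, client secret, and tenant ID that a Packer config needs.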

Working Configuration

Basically, you just need the right combination (or lack thereof) of fields.

The tricky ones to get right were the combination of build_resource_group_name, managed_image_resource_group_name, and managed_image_name while leaving out location.

There was a GitHub issue thread on this (https://github.com/hashicorp/packer/issues/5873) that went on for a very long time before someone finally worked out that you have to leave out location when you want to do this without subscription-level Contributor access.

Here is a reference config file that works if you populate your details:

{
  "builders": [
    {
      "type": "azure-arm",
      "client_id": "your-client-id",
      "client_secret": "your-client-secret",
      "tenant_id": "your-tenant-id",
      "subscription_id": "your-subscription",
      "build_resource_group_name": "your-existing-rg",
      "managed_image_resource_group_name": "your-existing-rg",
      "managed_image_name": "your-result-output-image-name",
      "os_type": "Linux",
      "image_publisher": "OpenLogic",
      "image_offer": "CentOS",
      "image_sku": "7.5",
      "azure_tags": {
        "ApplicationName": "Some Sample App"
      },
      "vm_size": "Standard_D2s_v3"
    }
  ],
  "provisioners": [
    {
      "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
      "inline": [
        "yum -y install haproxy-1.5.18-8.el7",
        "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
      ],
      "inline_shebang": "/bin/sh -x",
      "type": "shell"
    }
  ]
}
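
Since the whole trick is leaving out location while build_resource_group_name is set, a small pre-flight check can catch the mistake before a build starts. This is a rough sketch rather than part of my actual workflow: `check_template` is a hypothetical helper, the file name image.json is an assumption, and the check is a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if a Packer template sets both
# build_resource_group_name and location (simple text match, not a JSON parser).
# $1 = path to the template file
check_template() {
  template="$1"
  if grep -q '"build_resource_group_name"' "$template" &&
     grep -q '"location"' "$template"; then
    echo "error: $template sets both build_resource_group_name and location" >&2
    return 1
  fi
  echo "ok: $template"
}

# Usage (assumes the template is saved as image.json and Packer is on PATH):
#   check_template image.json && packer validate image.json && packer build image.json
```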

[Reblogged] Copy Managed Images Between Subscriptions via Powershell

I recently had to promote a VM image for a scale set between subscriptions.  It turns out this was very complex; but this blog was a life saver.  So, I highly recommend reading it if you need to do this.

This is re-blogged, so just click the link below to see the full original post (which I highly recommend).

Michael S. Collier's Blog

Introduction

Azure Managed Disks were made generally available (GA) in February 2017. Managed Disks greatly simplify working with Azure Virtual Machines (VM) and Virtual Machine Scale Sets (VMSS). They effectively eliminate the need for you to have to worry about Azure Storage accounts and related VHD constraints/limits. When using managed disks for VMs or VMSS, you select the type of disk storage (SSD or HDD) and the size of disk needed. The Azure platform takes care of the rest. Besides the simplified management aspect, managed disks bring several additional benefits, but I’ll not reiterate those here, as there is a lot of good info already available (here, here and here).

While managed disks simplify management of Azure VMs, they also simplify working with VM images. Prior to managed disks, an image would need to be copied to the Storage account where the derived VM would be created…

View original post 1,255 more words

Azure Custom Script Extension – Text File Busy – Centos7.5 – VM Stuck on Creating

What’s Wrong

I’ve been building a scale set on Azure and have repeatedly observed around 40% of my VMs getting stuck on “Creating” in the Azure portal. The scale set uses a custom script VM extension and runs on CentOS 7.5.

Debugging

After looking around online a lot, I came across numerous GitHub issues against the custom script extension or the Azure Linux agent. They are for varying OSes, but they often involve the VM getting stuck in “Creating”. For example, here is one against Ubuntu:

If you go to this file “/var/log/azure/custom-script/handler.log”, you can see details about what the custom script extension is doing.  Also note that “/var/log/waagent.log” can be useful as well.

$> vi /var/log/azure/custom-script/handler.log
+ /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/bin/custom-script-extension install
/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/bin/custom-script-shim: line 77: /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/bin/custom-script-extension: Text file busy

In my case, it failed with “Text file busy” for some reason. Again, there are numerous GitHub entries for this, but no solutions:

Somewhere else online I saw reports that the Agent was failing while downloading files.  Note that if your plugin download works, you should see the script and more info in this location -> /var/lib/waagent/custom-script/download/1/script-name.sh (in my case, it is not there).
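
The two checks above (the handler log and the download directory) can be rolled into one quick diagnostic to run on a stuck VM. This is only a sketch: `diagnose_cse` is a made-up helper name, and the default paths are the ones I saw on my VMs, so they may differ with other extension versions.

```shell
#!/bin/sh
# Hypothetical diagnostic for a VM stuck on "Creating".
# $1 = handler log path, $2 = download directory (both optional)
diagnose_cse() {
  log="${1:-/var/log/azure/custom-script/handler.log}"
  downloads="${2:-/var/lib/waagent/custom-script/download}"

  # Did the extension install hit the "Text file busy" error?
  if grep -q 'Text file busy' "$log" 2>/dev/null; then
    echo "handler: Text file busy"
  else
    echo "handler: no Text file busy error found"
  fi

  # Did the extension get as far as downloading anything?
  if [ -d "$downloads" ] && [ -n "$(ls -A "$downloads" 2>/dev/null)" ]; then
    echo "download: present"
  else
    echo "download: missing"
  fi
}

# Usage on the VM (reading these paths may require root):
#   sudo sh -c '. ./diagnose.sh; diagnose_cse'
```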

My custom script extension takes a script out of Azure Blob storage… so I’m going to try to bundle that script into the image and just issue the run command from the custom script extension to see if that makes it go away.

Result – Failure

Taking the script out of blob storage, putting it into the VM image itself, and just calling it with the custom script extension’s command-to-execute appeared to mitigate the issue at first. This is unfortunate, as internalizing the script means every tweak requires a new image, but the scale set seemed to work properly and be stable.

Avoiding downloading files made the issue less likely to occur, but it did come back. It is just rarer.

I tried downgrading the Azure Linux Agent (waagent) to a version noted in one of those GitHub issues. It did not help. I also tried reverting to CentOS 7.3, which didn’t help either. I can’t find any way to make this work reliably.

Workaround

My workaround will be:

  • Take all customizations I was doing with the agent.
  • Move them into a Packer build (from HashiCorp).
  • Packer will build the image I need for each environment, fully configured and working.
  • This way, I just run the image and don’t worry about modifying its config with the custom script extension.

This is painful and frustrating, so I will also raise the bug with Microsoft while doing the workaround.

 

Running Terraform on Centos7/RHEL7 With Docker

Install Docker

Here is a lean version of the Docker site content that I tested on CentOS 7.5. It uses yum to install some prerequisites, adds the stable Docker Community Edition repository to yum, and then installs and starts Docker.

sudo yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
sudo yum-config-manager \
  --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce
sudo systemctl start docker

Now Docker is started – but only the root user can really use it.  So, let’s create the docker group and add our current user to it.  That way we can use docker with our current user and avoid having to use sudo on every command.

These instructions come from here: https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user.

sudo groupadd docker
sudo usermod -aG docker $USER

After this, please re-log in (e.g. exit out of SSH and jump back into your server) so that your group memberships apply.
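
If you want to confirm the new group membership took effect after logging back in, something like the following works. It is a small sketch and `in_group` is just a helper name I made up; it assumes group names without spaces.

```shell
#!/bin/sh
# Succeeds if the current login session belongs to the named group.
# $1 = group name
in_group() {
  for g in $(id -nG); do
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

# Usage after logging back in:
#   in_group docker && docker run --rm hello-world
```

If `in_group docker` fails, the session is probably still using the old group list, so log out and back in again.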

Now Docker is running and we can use it as ourselves.

Get Terraform Working in Docker

We will run Terraform as a single command inside of a Docker container. So, let’s start by getting the latest Terraform image from HashiCorp:

docker pull hashicorp/terraform

Create a directory for your Terraform work and give ownership to your user. Also create a sub-directory to act as the Docker volume in which we will put your Terraform plans.

sudo mkdir /opt/terraform && sudo chown $USER:$USER /opt/terraform
cd /opt/terraform
mkdir tf-vol

Now let’s create a file at /opt/terraform/tf-vol/plan.tf with a sample Terraform plan (just a debug one).

output "test" {
  value = "Hello World!"
}

After this, we can run Terraform and tell Docker to use that tf-vol directory as a volume. Terraform will use it as the working directory, will find our plan, and will display “Hello World!”.

$ docker run -i -t -v /opt/terraform/tf-vol:/tf-vol/ -w /tf-vol/ hashicorp/terraform:light apply

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Hello World!

So, we now have Docker installed, and Terraform running with it using an external volume to store our plans.
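
Typing the full docker run command every time gets old quickly, so it can be wrapped in a shell function. This is just a convenience sketch: the function name `tf` and the `TF_DIR` and `DOCKER` variables are my own inventions (not part of Terraform or Docker), and the image tag is the same one used in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: run Terraform in Docker against a plan directory.
# TF_DIR overrides the directory to mount (defaults to the one created above).
# DOCKER overrides the docker binary, which is handy for dry runs.
tf() {
  ${DOCKER:-docker} run -i -t \
    -v "${TF_DIR:-/opt/terraform/tf-vol}:/tf-vol/" \
    -w /tf-vol/ \
    hashicorp/terraform:light "$@"
}

# Usage:
#   tf apply
#   TF_DIR=/some/other/plans tf plan
#   DOCKER=echo tf apply    # print the docker arguments instead of running them
```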

Ansible – Refer to Host in Group by Index

Occasionally it is very useful to refer to a host in a group by an index.  For example, if you are setting up Apache or HAProxy, you may need to push a configuration file out to each host that can redirect to all other hosts.

It is actually quite easy to refer to the hosts in a group by index, but unfortunately it’s not necessarily easy to find with a Google search. Here is the syntax for the first 3 hosts in a group:

{{groups['coordinators'][0]}}
{{groups['coordinators'][1]}}
{{groups['coordinators'][2]}}