Azure Scale Set vs Availability Set

Why Was I Worried?

I have been habitually using scale sets whenever my requirements only involved running multiple copies of a VM image safely. Then I started to worry about the difference between a scale set and an availability set… were my scale set VMs not safe?

TL;DR: I actually read the Azure and Azure CLI documentation and put together a simple but handy command (below) that put my mind at ease about scale sets, so feel free to skip ahead to it if you like.

Research

There is a good Stack Overflow question right here which I added an answer to just now.  It has quite a few good answers about availability sets vs. scale sets, including some info about a scale set having 5 fault domains by default.  So, I recommend starting there if you’re interested in digging in.

A good summary of what I found is that:

  • Availability sets by default spread your resources over fault domains to ensure that an outage of one (due to a power or network issue, etc.) does not affect another.
  • Availability sets also allow mixing of resources; e.g. 2 VMs with different configurations.
  • Scale sets only allow you to deploy copies of an identical image, and they provide the ability to scale it out linearly.
  • Scale sets implicitly have one “placement group”.  If you want to go over 100 VMs, you have to remove that restriction (see the example after this list).
  • A placement group has 5 fault domains and is similar (or maybe the same as) an availability set.
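
As a side note on the placement-group restriction mentioned above, here is a rough sketch of creating a scale set that is allowed to span more than one placement group. The resource names are placeholders and the image alias can differ by CLI version, so treat this as an illustration rather than a copy-paste recipe.

# Placeholder names; --single-placement-group false lifts the ~100 VM,
# single-placement-group limit so the scale set can span placement groups.
az vmss create --subscription "your-subscription-id" \
--resource-group "your-rg" --name "your-scale-set-name" \
--image UbuntuLTS --instance-count 10 \
--single-placement-group false --generate-ssh-keys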

Validation

As I’m responsible for highly available infrastructure, I wasn’t keen on just accepting this.  So, I fiddled around with the Azure CLI for scale sets and made this simple command, which shows that my 10-instance scale set is indeed spread across multiple fault domains – I hope you find it useful too.

az vmss get-instance-view --subscription "your-subscription-id" \
--resource-group "your-rg" --name "your-scale-set-name" \
--instance-id "*" | grep platformFaultDomain

    "platformFaultDomain": 0,
    "platformFaultDomain": 1,
    "platformFaultDomain": 2,
    "platformFaultDomain": 4,
    "platformFaultDomain": 0,
    "platformFaultDomain": 1,
    "platformFaultDomain": 3,
    "platformFaultDomain": 4,
    "platformFaultDomain": 2,
    "platformFaultDomain": 3


MRemoteNG – SSH – Connect to Azure VM

What is MRemoteNG?

MRemoteNG is a nice Windows tool for managing multiple SSH sessions (and session configurations) in one window – so you can log onto 10 servers and hop around trivially.  It is built on top of PuTTY.

How Do You Use It With Azure VMs?

  • When you create a VM in Azure, you give it a public key (assuming you didn’t use password authentication, which you should generally avoid).
  • You can generate a key pair with PuTTYGen if you don’t have one (though presumably you already have one if you created the VM with key authentication).
  • Take the private key corresponding to that public key and save it into a file (it may already be in an “id_rsa” file in your .ssh directory in your user directory; e.g. C:\users\your-name\.ssh\id_rsa).
  • Open PuTTYgen (it should come with MRemoteNG or PuTTY; otherwise you can download it yourself).  If you prefer the command line, there is a sketch after this list.
  • Load the private key file.
  • Click “Save private key” with Type = RSA selected (2048 bits is fine).  It will save as a “PPK” file.
  • Save it to your .ssh folder for consistency, or anywhere else – it really doesn’t matter much.
  • Open MRemoteNG -> Tools -> Options -> Advanced -> Launch PuTTY -> expand “SSH” -> click Auth (don’t expand it) -> put your PPK file path in “Private key file for authentication”.
  • Click Session in PuTTY, give the session a name in the “Saved Sessions” text box, and then click Save.  It should appear in the box below that.
  • Now you have a saved session that can use this private key via a PPK file.
  • Close PuTTY, make a new connection in MRemoteNG, and set “Putty Session” to the new session you saved.  It should be listed as an option.
  • Celebrate!
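
As an aside for the key-conversion step: if you happen to have the command-line puttygen tool (it ships with the PuTTY packages on Linux), the same OpenSSH-to-PPK conversion can be done non-interactively. This is just an alternative sketch; the paths are placeholders.

# Convert an existing OpenSSH private key into PuTTY's PPK format.
puttygen ~/.ssh/id_rsa -o azure-vm.ppk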

Ansible – Refer to Host in Group by Index

Occasionally it is very useful to refer to a host in a group by an index.  For example, if you are setting up Apache or HAProxy, you may need to push a configuration file out to each host that can redirect to all other hosts.

It is actually quite easy to refer to the hosts in a group by index, but unfortunately it’s not necessarily easy to Google.  Here is the syntax for the first 3 hosts in a group (assuming a group named “coordinators”):

{{groups['coordinators'][0]}}
{{groups['coordinators'][1]}}
{{groups['coordinators'][2]}}
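
If you need every host in the group rather than a few fixed indexes (e.g. to list all backends in a proxy configuration), a loop in the template is usually cleaner. This is only a sketch; the group name, port, and the use of ansible_host are assumptions you would adapt to your own inventory:

# Hypothetical fragment of an HAProxy backend template (e.g. haproxy.cfg.j2)
backend coordinators
{% for host in groups['coordinators'] %}
    server {{ host }} {{ hostvars[host]['ansible_host'] | default(host) }}:8080 check
{% endfor %}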

Azure – Linux VM Image Creation – Powershell – With Service Principal/Account

Overview

I was working on creating generalized VM images for use with scale sets and auto-scaling and I found it rather painful to get the complete set of examples for:

  1. De-provision user/etc from VM.
  2. Use Azure Powershell with a Service principal.
  3. Generalize the VM and create an image.

So, here’s a short mostly-code post on how to do that.

Specific Steps

Fair warning… as far as I know, you can’t use the VM after doing this… but you can create a new copy of it from the image, so that doesn’t matter much.

Before getting to PowerShell, run this in your VM to de-provision the most recently set up user account (e.g. I install everything as the “john” user created with the Azure VM).  This will remove that user.

sudo waagent -deprovision+user

Now, just run the commands below after setting your own values for the 5 variables up top.  This will log in to Azure Resource Manager with the credentials you provide in the pop-up, then stop and generalize the VM, and then create an image from it and store that image in the same resource group as the VM.

$vmName = "YOUR_VM_NAME"
$rgName = "YOUR_RG_NAME"
$location = "YOUR_REGION"
$imageName = "YOUR_IMAGE_NAME"
$tenant = "YOUR_TENANT_ID"

$c = Get-Credential # Input your service principal client-id/secret.
Connect-AzureRmAccount -Credential $c -ServicePrincipal -Tenant $tenant

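# Stop and mark the VM as generalized, then capture it as a managed image in the same resource group.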
Stop-AzureRmVM -ResourceGroupName $rgName -Name $vmName -Force
Set-AzureRmVm -ResourceGroupName $rgName -Name $vmName -Generalized
$vm = Get-AzureRmVM -Name $vmName -ResourceGroupName $rgName
$image = New-AzureRmImageConfig -Location $location -SourceVirtualMachineId $vm.Id
New-AzureRmImage -Image $image -ImageName $imageName -ResourceGroupName $rgName

Configuration Trouble?

  • If you’re not sure what a service account / principal is or how to create one, the process is quite involved and I highly recommend following one of the many Microsoft-provided tutorials (there is also a rough CLI sketch after this list).
  • You can find your tenant ID by clicking the directory + subscription button at the top of the portal OR by hovering over your name/info at the top right corner.
  • The region strings can be tricky; check the Microsoft documentation if you’re not sure.  A US East 2 example is “EastUS2”.
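
For reference, creating a service principal with the Azure CLI looks roughly like the following. The name, role, and scope here are placeholder assumptions; the Microsoft tutorials mentioned above cover the details and the portal/PowerShell equivalents.

# Prints an appId (client id), password (client secret), and tenant:
# the values you feed into Get-Credential and -Tenant above.
az ad sp create-for-rbac --name "image-builder-sp" \
--role Contributor --scopes "/subscriptions/your-subscription-id"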

What’s Next?

Your VM image can now be found in that resource group – go to the portal and see.  You can go into the image in the portal and create a new VM from it, or you can use it to boot up a scale set, etc.
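
If you prefer to check from the command line instead of the portal, listing the images in the resource group should show it (assuming you are logged in with the Azure CLI):

az image list --resource-group "YOUR_RG_NAME" --output table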

Centos7 and RHEL7 Increasing Open File Descriptors & Process Limits (AND SystemD / SystemCTL!)

What’s the Problem?

When deploying on RHEL7 or Centos7, it is fairly common to see a warning like the following one (which I just got while installing Presto from Facebook):

WARNING: Current OS file descriptor limit is 4096. Presto recommends at least 8192.

There are a variety of these issues… but the basic problem is that your OS sets limits on things like open files and processes, and sometimes we need to raise those limits depending on what we’re running (especially when we’re running large apps on large servers).

The ulimit settings being referred to here always end up being extra hard to edit, because you have to change them in multiple places and most blogs/posts don’t cover all of them for some reason (having suffered through it multiple times now, I know that).

How Do We View the Limits?

In this warning, we see that the “OS file descriptor” limit is currently 4096.  So, let’s look at the current settings with the “ulimit -a” command:

$> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257564
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

We can see in here that “open files” is 1024 and “max user processes” is 4096 (the exact numbers will vary by machine).  The file descriptor limit from the warning corresponds to “open files” (nofile), while “max user processes” (nproc) is a separate limit that large applications also tend to hit.

So, let’s increase both of them.

Increasing the Limits

Edit /etc/sysctl.conf and add:

fs.file-max = 65536
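
That raises the kernel’s system-wide cap on open files. To apply the sysctl change without a reboot, reload the file:

sudo sysctl -p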

Edit /etc/security/limits.conf  and add:

* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535

On RHEL7/Centos7, the nproc limit is also overridden by a drop-in file located roughly at this path (the number prefix can vary) – so please edit /etc/security/limits.d/20-nproc.conf and make its contents the following:

* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
root soft nproc unlimited

That last one is the one that most places miss.

Verifying the New Limits

Here’s the last tricky part… if you run “ulimit -a” again now, it won’t look any better.  These limits are applied at login (via PAM), so log out and back in to your shell/server and then run it again, and you’ll see the settings are now updated (yay!).

$> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257564
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65535
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

But What About SystemD and SystemCTL?

I felt victorious at this point, but alas, when I ran Presto and HAProxy they both spat out warnings and/or errors again for the same reason.  What is this!?

It turns out I was running both under SystemD, and SystemD has its own way of managing these limits.  So, in that case, the final step is to go to your unit file in /etc/systemd/system/your-app.service and add the following inside the [Service] section (the “…” just means there may be other content above or below it; add those two properties to the existing section).

[Service]
...
LimitNPROC=65535
LimitNOFILE=65535
...

After adding that, run “sudo systemctl daemon-reload” and then “sudo systemctl restart your-app” to apply the settings.
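
To double-check that the unit actually picked up the new limits, you can ask systemd directly (“your-app” is the same placeholder service name as above):

systemctl show your-app -p LimitNOFILE -p LimitNPROC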

And finally, everything is right with the world!

Docker + Windows 10 – Volume Mount Shows No Files // Firewall

I have wasted roughly an hour on this on two separate occasions now.  Basically, my Docker volume mount would stop showing files.

I dug through endless GitHub issues and error reports, tried making the Docker NAT private, and everything else… but the problem ended up being that I had gone home from work and was using my VPN!

So, before spending too much time on the complicated solutions you find online, just start by disabling your VPN (if you have one running) and see if that helps first.
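
For what it’s worth, a quick way to sanity-check a mount (the host path here is just a placeholder) is to list the mounted directory from a throwaway container:

docker run --rm -v C:\Users\your-name\some-folder:/data alpine ls /data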