Azure VM Unresponsive, Can’t SSH

My VM Was Non-Responsive

Today I had an Azure virtual machine go down very unexpectedly.

I received error reports from users and tried to go to the related service endpoint myself… and sure enough, it didn’t come up. Then I tried to SSH into the VM and couldn’t.

I hopped into the Azure portal, went to the VM, and things actually looked alright… it wasn’t stopped, or de-allocated, or anything.

Why?

After several minutes of digging around the Azure portal for more information, a new entry suddenly appeared in the “Activity Log”. This was rather disconcerting, as the issue had been reported over half an hour earlier and I had already been in the portal for several minutes.

The activity log said I had a “health event” which was “updated”.  Upon expanding it, I could see more events that had been “in progress”.  When you click the “in progress” event, you can get JSON for it and look into the details.  In my case, the bottom of the details said this:

    "properties": {
        "title": "We're sorry, your virtual machine isn't available because an unexpected failure on the host server",
        "details": null,
        "currentHealthStatus": "Unavailable",
        "previousHealthStatus": "Unknown",
        "type": "Downtime",
        "cause": "PlatformInitiated"
    }

So, the physical host that was running my VM in Azure died. Azure automatically noticed this and moved the VM to a new physical host, though much more slowly than I would have liked.
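If you want to check for this kind of event in a script rather than by eye, here is a minimal Python sketch. It assumes you have saved the event JSON as a complete object with "properties" at the top level (the snippet above is only a fragment), and the function name is my own invention, not anything from Azure.

```python
import json


def is_platform_downtime(event: dict) -> bool:
    """Return True if a health event looks like host-initiated downtime.

    Only inspects the two fields shown in the activity log snippet above.
    """
    props = event.get("properties") or {}
    return (
        props.get("type") == "Downtime"
        and props.get("cause") == "PlatformInitiated"
    )


# The fragment from the activity log, wrapped in braces to make it valid JSON.
sample = json.loads('''
{
    "properties": {
        "title": "We're sorry, your virtual machine isn't available because an unexpected failure on the host server",
        "details": null,
        "currentHealthStatus": "Unavailable",
        "previousHealthStatus": "Unknown",
        "type": "Downtime",
        "cause": "PlatformInitiated"
    }
}
''')

print(is_platform_downtime(sample))  # prints True
```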

The VM came up after a few more minutes and all was right with the world. So… the moral of the story is that if your VM is unresponsive, it may be because the host died, and you may have to wait quite a while to see information about that in the activity log. It does apparently resolve itself automatically, though, which is nice.

Azure CLI Get Scale Set Private IP Addresses

Getting Scale Set Private IPs is Hard

I have found that it is surprisingly difficult to get the private IP addresses of Azure scale set instances in almost every tool.

For example, if you go and create a scale set in Terraform, even Terraform will not provide you the addresses or a way to look them up to act upon them in future steps.  Similarly, you cannot easily list the addresses in Ansible.

You can make dynamic inventories in Ansible based on scripts though. So, in order to make an Ansible playbook target the nodes in a recently created scale set dynamically, I decided to use a dynamic inventory created by the Azure CLI.

Azure CLI Command

Here is an Azure CLI command (version 2.0.58) which directly lists the private IP addresses of scale set nodes. I hope it helps you as it has helped me. It took a while to build it out from the docs, but it’s pretty simple now that it’s done.

az vmss nic list --resource-group YourRgName \
--vmss-name YourVmssName \
--query "[].ipConfigurations[].privateIpAddress"

The output will look similar to this, though I changed the IP addresses to fake ones here as an example.

[
"123.123.123.123",
"123.123.123.124"
]
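To connect this back to the Ansible dynamic inventory idea, here is a rough Python sketch of how the JSON list above could be turned into the structure Ansible expects from an inventory script. The group name "scaleset" and all of the function names are placeholders I made up, and a real inventory script would also need to handle the --list argument that Ansible passes in, so treat this as a starting point rather than a finished script.

```python
import json
import subprocess
from typing import List


def az_private_ip_command(resource_group: str, vmss_name: str) -> List[str]:
    """Build the same az command as above, as an argument list."""
    return [
        "az", "vmss", "nic", "list",
        "--resource-group", resource_group,
        "--vmss-name", vmss_name,
        "--query", "[].ipConfigurations[].privateIpAddress",
        "--output", "json",
    ]


def build_inventory(az_output: str, group: str = "scaleset") -> dict:
    """Convert the JSON list of IPs into an Ansible inventory dictionary."""
    ips = json.loads(az_output)
    # "_meta" with empty hostvars tells Ansible there are no per-host variables.
    return {group: {"hosts": ips}, "_meta": {"hostvars": {}}}


def main() -> None:
    """Run az and print the inventory (requires a logged-in Azure CLI)."""
    cmd = az_private_ip_command("YourRgName", "YourVmssName")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    print(json.dumps(build_inventory(result.stdout), indent=2))
```

Calling main() from an executable script and pointing ansible-playbook -i at that script should then make the scale set nodes available as the "scaleset" group.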

Using SystemD Timers (Like Cron) – CentOS

The more I use it, the more I like using SystemD to manage services. It is sometimes verbose, but in general, it is very easy to use and very powerful. It can restart failed services, start services automatically, handle logging and log rotation for simple services, and much more.

SystemD Instead of Cron

Recently, I had to take a SystemD service and make it run on a regular timer.  So, of course, my first thought was using cron… but I’m not a particularly big fan of cron for reasons I won’t go into here.

I found out that SystemD can do this as well!

Creating a SystemD Timer

Let’s say you have a (very) simple service defined like this in SystemD (in /etc/systemd/system/your-service.service):

[Unit]
Description=Do something useful.

[Service]
Type=simple
ExecStart=/opt/your-service/do-something-useful.sh
User=someuser
Group=someuser

Then you can create a timer for it by creating another SystemD file in the same location but ending with “.timer” instead of “.service”. It can handle basically any interval, can run at boot, and can even schedule tasks relative to the boot time.

[Unit]
Description=Run service once daily.

[Timer]
OnCalendar=*-*-* 00:30:00
Unit=your-service.service

[Install]
WantedBy=multi-user.target

Once the timer file is made, you can enable it like so:

sudo systemctl daemon-reload
sudo systemctl enable your-service.timer
sudo systemctl start your-service.timer

After this, you can verify the timer is working and check its next (and later on, previous) execution time with this command:

$> systemctl list-timers --all
NEXT                         LEFT          LAST                         PASSED  UNIT                         ACTIVATES
Tue 2019-03-05 20:00:01 UTC  14min left    Mon 2019-03-04 20:00:01 UTC  23h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Wed 2019-03-06 00:30:00 UTC  4h 44min left n/a                          n/a     your-service.timer           your-service.service
n/a                          n/a           n/a                          n/a     systemd-readahead-done.timer systemd-readahead-done.service
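As a sanity check on the NEXT and LEFT columns for your-service.timer, here is a small Python sketch of what a daily OnCalendar=*-*-* 00:30:00 schedule works out to. The function name is made up, the "now" value of 19:46 is only my estimate from the listing above, and this ignores time zones and everything else SystemD's calendar syntax supports.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(0, 30)) -> datetime:
    """Return the next occurrence of a fixed daily time after 'now'."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Roughly when the listing above was captured (an estimate, in UTC).
now = datetime(2019, 3, 5, 19, 46)
nxt = next_daily_run(now)
print(nxt, nxt - now)  # prints 2019-03-06 00:30:00 4:44:00
```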

SystemD/SystemCTL Tail or View Service Log (CentOS 7)

By default, systemd services log their output to the systemd journal, which you can view with journalctl commands. On CentOS 7, these messages typically also end up in /var/log/messages.

To tail the logs for a specific service you are running, you can simply do the following.

journalctl -u service-name -f

Just remove the -f if you want to view the log in general.

You can also just lazily use -f without the -u and service name if you want to tail recent system logs, which is often useful.
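If you end up calling these from a script, the three variants above boil down to two optional flags. Here is a tiny Python sketch that builds the argument list; the function name is hypothetical, and it only covers the -u and -f options mentioned in this post.

```python
from typing import List, Optional


def journalctl_args(unit: Optional[str] = None, follow: bool = False) -> List[str]:
    """Build a journalctl argument list for viewing or tailing logs."""
    args = ["journalctl"]
    if unit:
        # Limit output to a single service.
        args += ["-u", unit]
    if follow:
        # Keep the log open and print new entries as they arrive.
        args.append("-f")
    return args


print(journalctl_args("service-name", follow=True))
```

The resulting list can be passed to subprocess.run as is.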

Snowflake SQL compilation error: Object does not exist – Schema / Case Sensitive

Recently I was having strange issues while trying to grant a role access to a database schema in Snowflake.

The schema was manually created after a migration from another database, and its name was in lower-case, e.g. MYDATABASE."dbo", with "dbo" being the schema name.

Auto Upper Case + Schema Case Sensitivity

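What I realized after a short while was that all unquoted SQL identifiers you place into Snowflake SQL are automatically converted to upper case. Quoted identifiers keep their exact case, and Snowflake matches names case-sensitively, so a schema created as "dbo" is not the same as DBO.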

So, unless you’ve been going around and adding double-quotes around all your database/schema/table names while creating them, almost everything you have will be in upper case.

When you do create things in lower-case manually with quoting, you have to add quotes to them in every query to ensure they are actually passed to the database in lower-case. For example, SELECT * FROM mydatabase.dbo.mytable will implicitly become SELECT * FROM MYDATABASE.DBO.MYTABLE. So, if the real schema name is "dbo" and not "DBO", you actually need to write SELECT * FROM MYDATABASE."dbo".MYTABLE instead.

Note, this assumes MYDATABASE and MYTABLE were created in upper-case or without quoting.
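To make the rule concrete, here is a simplified Python model of how a dotted name gets resolved: unquoted parts are upper-cased and quoted parts are kept exactly as written. This is my own approximation for illustration, not Snowflake's actual parser, and the function name is made up.

```python
import re
from typing import List

# Matches either a double-quoted part (with "" as an escaped quote)
# or a bare, unquoted part between dots.
_PART = re.compile(r'"((?:[^"]|"")*)"|([^."]+)')


def resolve_identifier(name: str) -> List[str]:
    """Split a dotted name and apply the quoted/unquoted case rules."""
    parts = []
    for quoted, bare in _PART.findall(name):
        if bare:
            # Unquoted identifiers are folded to upper case.
            parts.append(bare.upper())
        else:
            # Quoted identifiers keep their exact case.
            parts.append(quoted.replace('""', '"'))
    return parts


print(resolve_identifier('mydatabase.dbo.mytable'))
# prints ['MYDATABASE', 'DBO', 'MYTABLE']
print(resolve_identifier('MYDATABASE."dbo".MYTABLE'))
# prints ['MYDATABASE', 'dbo', 'MYTABLE']
```

Only the second form produces the lower-case "dbo" schema name, which is why the unquoted query cannot find the object.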

Final Thoughts

I personally feel that you should avoid quoting and let everything be upper case.  If you did have to create things in lower-case, then I suggest always using quoting everywhere.  Anything in between the two will get confusing.