Azure – Linux VM Image Creation – Powershell – With Service Principal/Account

Overview

I was working on creating generalized VM images for use with scale sets and auto-scaling and I found it rather painful to get the complete set of examples for:

  1. De-provision user/etc from VM.
  2. Use Azure Powershell with a Service principal.
  3. Generalize the VM and create an image.

So, here’s a short mostly-code post on how to do that.

Specific Steps

Fair warning… as far as I know, you can’t use the VM after doing this… but you can create a new copy of it from the image, so that doesn’t matter much.

Before getting to Powershell, run this inside your VM to de-provision it and remove the most recently provisioned user account (e.g. I install everything as the user “john” that was created with the Azure VM).  This deletes that user and its associated data.

sudo waagent -deprovision+user

Now, run the commands below after setting your own values for the 5 variables at the top.  This logs in to Azure Resource Manager with the service principal credentials you enter in the pop-up, stops and generalizes the VM, and then creates an image from it and stores that image in the same resource group as the VM.

$vmName = "YOUR_VM_NAME"
$rgName = "YOUR_RG_NAME"
$location = "YOUR_REGION"
$imageName = "YOUR_IMAGE_NAME"
$tenant = "YOUR_TENANT_ID"

$c = Get-Credential # Input your service principal client-id/secret.
Connect-AzureRmAccount -Credential $c -ServicePrincipal -Tenant $tenant

Stop-AzureRmVM -ResourceGroupName $rgName -Name $vmName -Force
Set-AzureRmVm -ResourceGroupName $rgName -Name $vmName -Generalized
$vm = Get-AzureRmVM -Name $vmName -ResourceGroupName $rgName
$image = New-AzureRmImageConfig -Location $location -SourceVirtualMachineId $vm.Id
New-AzureRmImage -Image $image -ImageName $imageName -ResourceGroupName $rgName

Configuration Trouble?

  • If you’re not sure what a service account / principal is or how to create one, the process is quite involved and I highly recommend following one of the many Microsoft-provided tutorials.
  • You can find your tenant ID by clicking the directory + subscription button at the top of the portal OR by hovering over your name/info at the top right corner.
  • The region strings can be tricky; if you’re not sure, look them up in Microsoft’s documentation.  For example, East US 2 is “EastUS2”.

What’s Next?

Your VM image can now be found in that resource group – go to the portal and see.  You can go into the image in the portal and create a new VM from it, or you can use it to boot up a scale set, etc.

Centos7 and RHEL7 Increasing Open File Descriptors & Process Limits (AND SystemD / SystemCTL!)

What’s the Problem?

When deploying on RHEL7 or Centos7, it is fairly common to see a warning like the following one (which I just got while installing Presto from Facebook):

WARNING: Current OS file descriptor limit is 4096. Presto recommends at least 8192.

There are a variety of these issues… but the basic problem is that your OS has set limits for things and sometimes we need to raise those limits depending on what we’re running (especially when we’re running large apps on large servers).

The ulimit being referred to here always ends up being extra hard to edit as you have to do it in multiple places and most blogs/posts don’t cover them all for some reason (having suffered through it multiple times now, I know that).

How Do We View the Limits?

In this warning, we see that the “OS File Descriptor” limit is 4096 currently.  So, let’s look at the current settings with the “ulimit -a” command:

$> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257564
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

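We can see in here that “max user processes” is 4096.  We can also see that another option, “open files”, is 1024.

So, let’s increase both of those.  Only “open files” is the file descriptor limit the warning is about; the 4096 in the warning is presumably the hard limit (check it with “ulimit -Hn”), since “ulimit -a” only prints the soft limits.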
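To see both the soft and hard values from code, here is a minimal sketch using Python's standard resource module (Unix only). The helper name limit_pair is just something I made up for illustration.

```python
import resource


def limit_pair(which):
    """Return (soft, hard) for one resource limit, showing infinity as 'unlimited'."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(which)
    return fmt(soft), fmt(hard)


# RLIMIT_NOFILE is "open files" (-n); RLIMIT_NPROC is "max user processes" (-u).
print("open files         (soft, hard):", limit_pair(resource.RLIMIT_NOFILE))
print("max user processes (soft, hard):", limit_pair(resource.RLIMIT_NPROC))
```

The values reflect the process running the script, so run it from the same shell or service account you care about.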

Increasing the Limits

Edit /etc/sysctl.conf and add:

fs.file-max = 65536

Edit /etc/security/limits.conf  and add:

* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535

The proc limit is also defined in a separate file under /etc/security/limits.d (the number in the file name can vary), and entries in that directory are read after /etc/security/limits.conf and override it.  So, please edit /etc/security/limits.d/20-nproc.conf and make its contents the following:

* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
root soft nproc unlimited

That last one is the one that most places miss.
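To illustrate why the limits.d file matters, here is a rough Python sketch that merges limits.conf-style text in the order the files are read, with later entries winning. It is a simplification (it only looks at one domain and ignores the user/group precedence rules), the function name is hypothetical, and the nproc_conf text is what I assume the stock CentOS 7 file contains before editing.

```python
def effective_limits(*file_texts, domain="*"):
    """Merge limits.conf-style texts in order; later entries override earlier ones."""
    limits = {}
    for text in file_texts:
        for line in text.splitlines():
            # Drop comments, then expect: <domain> <type> <item> <value>
            parts = line.split("#", 1)[0].split()
            if len(parts) != 4:
                continue
            entry_domain, limit_type, item, value = parts
            if entry_domain == domain:
                limits[(limit_type, item)] = value
    return limits


limits_conf = """
* soft nproc 65535
* hard nproc 65535
"""

# Assumed stock contents of /etc/security/limits.d/20-nproc.conf.
nproc_conf = """
* soft nproc 4096
root soft nproc unlimited
"""

print(effective_limits(limits_conf, nproc_conf))
# {('soft', 'nproc'): '4096', ('hard', 'nproc'): '65535'}
```

The soft nproc value from limits.conf is silently replaced by the 4096 from the later file, which is why editing only limits.conf is not enough.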

Verifying the New Limits

Here’s the last tricky part… if you run “ulimit -a” again now, it won’t really look much better.  So, re-log-in to your shell/server and then run it, and you’ll see the settings are now updated (yay!).

$> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257564
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65535
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

But What About SystemD and SystemCTL?

I felt victorious at this point, but alas, when I ran presto and haproxy they both spit out warnings and/or errors again for the same reason.  What is this!?

It turns out I was running both under SystemD, and SystemD has its own way of managing these things: services it starts do not pick up the limits from /etc/security/limits.conf.  So, in that case, the final step is to go to your unit file in /etc/systemd/system/your-app.service and add the following inside the [Service] section.  The … just implies there may be content above or below it; add those two properties to the existing section.

[Service]
...
LimitNPROC=65535
LimitNOFILE=65535
...

After adding that you should do a “sudo systemctl daemon-reload” and “sudo systemctl restart your-app” to apply the settings.
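To confirm what a running service actually got, you can read /proc/<pid>/limits for its process. Below is a small Python sketch that parses that file; the function names are my own, and the sample text is a shortened, made-up example of the file's layout.

```python
import re
from pathlib import Path


def parse_proc_limits(text):
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with runs of spaces; names only contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def limits_for_pid(pid):
    """Read and parse the limits of a running process (Linux only)."""
    return parse_proc_limits(Path(f"/proc/{pid}/limits").read_text())


sample = (
    "Limit                     Soft Limit           Hard Limit           Units     \n"
    "Max open files            65535                65535                files     \n"
    "Max processes             65535                65535                processes \n"
)
print(parse_proc_limits(sample)["Max open files"])  # ('65535', '65535')
```

You can get the PID to pass to limits_for_pid from “systemctl show -p MainPID your-app”.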

And finally, everything is right with the world!

Docker + Windows 10 – Volume Mount Shows No Files // Firewall

I wasted roughly an hour on this two separate times now.  Basically, my docker volume mount would stop showing files.

I dug through endless GitHub pages and error reports, tried making the docker NAT private and everything… but the problem ended up being that I went home from work and was using my VPN!

So, before spending too much time on the complicated solutions you find online, just start by disabling your VPN if you have one running and see if that helps first.

Centos7 / RHEL7 Services with SystemD + Systemctl For Dummies – Presto Example

History – SystemV & Init.d

Historically in Centos and RHEL, you would use system-v to run a service.  Basically an application (e.g. Spring Boot) would provide an init-d script and you would either place it in /etc/init.d or place a symbolic link from there to your script.

The scripts would have functions for start/stop/restart/status and they would follow some general conventions.  Then you could use “chkconfig” to turn the services on so they would start with the system when it rebooted.

SystemD and SystemCTL

Things have moved on a bit and now you can use SystemD instead.  It is a very nice alternative.  Basically, you put a “unit” file in /etc/systemd/system/.service.  This unit file has basic information on what type of application you are trying to run and how it works.  You can specify the working directory, etc as well.

Here is an example UNIT file for Facebook’s Presto application.  We would place this at /etc/systemd/system/presto.service.

[Unit]
Description=Presto
After=syslog.target network.target

[Service]
User=your-user-here
Type=forking
ExecStart=/opt/presto/current/bin/launcher start
ExecStop=/opt/presto/current/bin/launcher stop
WorkingDirectory=/opt/presto/current/bin/
Restart=always

[Install]
WantedBy=multi-user.target

Here are the important things to note about this:

  1. You specify the user the service will run as – it should have access to the actual program location.
  2. Type can be “forking” or “simple” (among others).  Forking means the start command launches the application in the background and then exits, as the Presto launcher does here, so SystemD treats the service as started once that command returns.  Simple means the command you give is itself the long-running process, such as a bash script or a Java JAR that runs forever (so SystemD will just start it with the command you give and restart it if it fails).
  3. Restart=always will make sure that, as long as you had it started in the first place, it starts again whenever it dies.  Try it; just kill -9 your application and it will come back.
  4. The install section is critical if you want the application to start up when the computer reboots.  You cannot enable it for start-on-boot without this.
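As a quick sanity check of the points above, here is a rough Python sketch that reads a unit file with the standard configparser module and flags a missing ExecStart or [Install] section. This is only an approximation: real unit files may repeat keys, which configparser collapses to the last value, and check_unit is a hypothetical helper, not part of SystemD.

```python
import configparser

UNIT_TEXT = """\
[Unit]
Description=Presto
After=syslog.target network.target

[Service]
User=your-user-here
Type=forking
ExecStart=/opt/presto/current/bin/launcher start
ExecStop=/opt/presto/current/bin/launcher stop
WorkingDirectory=/opt/presto/current/bin/
Restart=always

[Install]
WantedBy=multi-user.target
"""


def check_unit(text):
    """Return a list of problems found in a unit file's text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)

    problems = []
    if not parser.has_option("Service", "ExecStart"):
        problems.append("missing ExecStart in [Service]")
    if not parser.has_option("Install", "WantedBy"):
        problems.append("missing WantedBy in [Install]; cannot be enabled at boot")
    return problems


print(check_unit(UNIT_TEXT))  # []
```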

Useful Commands

  • sudo systemctl status presto (or your app name) -> current status.
  • sudo systemctl stop presto
  • sudo systemctl start presto
  • sudo systemctl restart presto
  • sudo systemctl enable presto -> enable for starting on reboot of server.
  • sudo systemctl disable presto -> don’t start on reboot of server.
  • sudo systemctl is-enabled presto; echo $? -> show if it is currently enabled for start-on-boot.

Azure Key Vault Usage

If you want to store passwords or certificates securely and have them separated from your application code, then Azure Key Vaults are a wonderful option.

You can even set up key vaults so that you can access them without providing a client ID, etc. which makes them ultra secure as you don’t have to provide your credentials in your code or config files.

Creating a Key Vault

To set up a key vault, you just:

  • Go to All Services in the portal.
  • Search for Key Vault.
  • Click create and then provide a name, resource group, and region.
    • Remember, all of your resources in Azure have to go into a resource group so they are logically identified and manageable.

Assigning Users

When you’re programmatically accessing resources in Azure, you always need a service principal.  You can get this by creating an Azure App Registration.  This is involved, and if you’re doing this you probably already have one.  If not though, you can refer to Microsoft’s tutorial for creating a service principal.

Assuming you have the principal ready, go into your vault in the portal and click “Access Policies”.  In here, you can pick which things you need to manage from a template, then give your service principal name and create.

Remember, after you do this and it shows the created one on the summary page, you STILL have to click “Save” at the top.  If you don’t, it’s not really there.  When you’re done refresh the web page with F5 to make sure it’s really there.

Adding Secrets

Adding secrets/passwords is simple.  Just click “Secrets” and then the (+) sign and type in your name/value.

Querying Secrets From an Application

This is very language dependent.  Microsoft has great tutorials for most languages though, including Python and Java.

Managed Service Identity

Now, we still have one problem here.  The key vault holds all of our passwords which is great… but we need a service principal (with a password) to access the vault.  So, if we leave that in our code or config files, we’re no better off in reality.

The final step is to read up on Managed Service Identities which let you configure a machine to securely talk to a key vault without providing the principal information.  This way your code and deployment config is 100% free of any passwords/etc.
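As a rough idea of what that looks like in Python, here is a minimal sketch assuming the azure-identity and azure-keyvault-secrets packages (newer than the tooling used elsewhere in this post) and a machine that has a managed identity with access to the vault. The vault and secret names are placeholders, and the URL format assumes the public Azure cloud.

```python
def vault_url(vault_name):
    """Build the vault URL as used in the public Azure cloud."""
    return f"https://{vault_name}.vault.azure.net"


def read_secret(vault_name, secret_name):
    """Fetch one secret value without any credentials in code or config."""
    # Imported here so this module still loads where the Azure SDK is not installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential tries several sources, including the machine's managed identity.
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url(vault_name), credential=credential)
    return client.get_secret(secret_name).value


# Example call (placeholder names):
# password = read_secret("my-vault", "db-password")
```

Notice there is no client ID or secret anywhere; the identity comes from the machine itself.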

Connecting to Hive from Python

I was using Hive via Presto for a project, but then I ran into an issue where Presto cannot support Hive views.  So, to be kind to the user, I wanted to present the view definition so they could see how to query the underlying tables.

Unfortunately, you can’t get view definitions from Presto either! So, I had to directly query hive from a Python project.

Two Options

There are two options that I found for achieving this, and surprisingly neither one was great.  You would think this was easy right!?

  1. Use PyHive – This is the standard connector you would have expected to find, except it does not install and/or work on Windows.  So, if you develop on Windows and deploy to Linux, it is painful.  Also, you need some other things on the system for it to work which can be painful to find.
  2. Use JayDeBeApi – This uses the Java JAR to connect to Hive, which means it needs Java installed on your machine.  DO NOT USE THIS – I quickly ran into a critical bug that happens on both Windows and Linux: if you open one connection, do work, and close it, you cannot open another connection.  There is a GitHub issue for it, and the person had to resort to putting it in another script and calling it as a sub-process for each command, which is ridiculous.

So, as I’m deploying on Linux (even though I develop on Windows), PyHive wins.

More on PyHive

So, to install PyHive, you would do the following (but it probably won’t work yet, at least not on Centos7 where I tried it).

pip install pyhive[hive]

Additional Dependencies

In order to get “pyhive[hive]” to install on a server (I tested with Centos7), you have to ensure some other dependencies are available as well.

I was working from Python 3.6 in a virtual environment, and the following worked properly:

sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
pip install pyhive[hive]

Windows Development

Note that if you do the install without the extra [hive] you will not get all the dependencies.  The reason they’re broken out is this technically supports both Hive and Presto, and that means you get to pick which dependencies you need.

This is a mixed blessing; you can install the package on Windows and develop without the extra [hive] but if you try to execute the code it will fail.  To run it on Linux you need the full set of dependencies.

I recommend guarding the pyhive import and any related code in your project with if os.name != "nt": in order to ensure you can run through on Windows without getting errors.  Hopefully your project is like mine where this is a side case and I can test plenty without the final calls.
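Here is one way that guard might look. The try/except around the import and the helper names (hive_available, get_connection) are my own additions for illustration, so treat this as a sketch.

```python
import os

hive = None  # stays None wherever PyHive cannot be used

if os.name != "nt":
    try:
        from pyhive import hive
    except ImportError:
        # Not on Windows, but PyHive (or its dependencies) is not installed.
        hive = None


def hive_available():
    """True when the PyHive Hive module was imported successfully."""
    return hive is not None


def get_connection(host, port=10000):
    """Open a Hive connection, or fail clearly where PyHive is unavailable."""
    if not hive_available():
        raise RuntimeError("PyHive is not available on this platform")
    return hive.Connection(host=host, port=port)
```

The rest of the project can then check hive_available() and skip the Hive-only paths on Windows.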

Query Code

The following is a short example of how to do a query from PyHive assuming you have it all set up properly as we talked about above.

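import logging

import pandas as pd
from pyhive import hive

logger = logging.getLogger(__name__)

conn = None
cursor = None

try:
    # Put your database and view names inside the backticks.
    query = "describe extended ``.``"
    # HiveServer2 listens on port 10000 by default.
    conn = hive.Connection(host="host-name", port=10000)

    cursor = conn.cursor()
    cursor.execute(query)
    query_results = cursor.fetchall()
    # The first element of each description entry is the column name.
    column_names = [part[0] for part in cursor.description]
    df = pd.DataFrame(query_results, columns=column_names)

except Exception:
    # logger.exception records the message together with the traceback.
    logger.exception("Error while pulling view details.")
    raise

finally:
    # Always release the cursor and connection, even on failure.
    if cursor is not None:
        cursor.close()
    if conn is not None:
        conn.close()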
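The same cleanup can also be written with contextlib.closing, which calls close() for you. The sketch below uses sqlite3 as a stand-in so it runs anywhere; it follows the same DB-API shape (cursor, execute, fetchall, description), and fetch_rows is a name I made up.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query):
    """Run a query and return (column_names, rows), closing cursor and connection."""
    with closing(connect()) as conn, closing(conn.cursor()) as cursor:
        cursor.execute(query)
        rows = cursor.fetchall()
        column_names = [part[0] for part in cursor.description]
    return column_names, rows


# sqlite3 stands in for Hive here so the example runs without a cluster.
columns, rows = fetch_rows(lambda: sqlite3.connect(":memory:"), "select 1 as answer")
print(columns, rows)  # ['answer'] [(1,)]
```

With PyHive you would pass something like lambda: hive.Connection(host="host-name", port=10000) as the connect argument instead.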

My VI Cheat Sheet

For years, I’ve been somewhat avoiding learning any advanced features of VIM. I have always predominantly relied on desktop editors for anything complex and just use VI to do basic text modification.

Anyway, I’m finally trying to change that. So, I’ll start forcing myself to do things in VIM and will record the keys here over time. I’m just starting with one command though; so it’ll be a while before this is useful! 🙂

My Cheat Sheet

Remember, generally you want to press “esc” before doing these.

  • Search Forward & Backwards
    • Forward = /search-term
    • Backward = ?search-term
  • Show or Hide Line Numbers
    • :set number
    • :set nonumber
  • Edit Multiple Lines (e.g. Block Comment Lines 10-20 With #)
    • :10,20s/^/#/
  • Clear Highlight After Search
    • There are some fancy ways, but just search for something that won’t exist and it will clear.  For example:
      • /blahfwoeaf