CentOS 7 / RHEL 7 Services with SystemD + Systemctl For Dummies – Presto Example

History – SystemV & Init.d

Historically in CentOS and RHEL, you would use System V init to run a service.  Basically, an application (e.g. Spring Boot) would provide an init.d script, and you would either place it in /etc/init.d or place a symbolic link there pointing to your script.

The scripts would have functions for start/stop/restart/status, and they would follow some general conventions.  Then you could use “chkconfig” to turn the services on so they would start with the system when it rebooted.

SystemD and SystemCTL

Things have moved on a bit, and now you can use SystemD instead.  It is a very nice alternative.  Basically, you put a “unit” file at /etc/systemd/system/your-service-name.service.  This unit file has basic information on what type of application you are trying to run and how it works.  You can specify the working directory, etc. as well.

Here is an example UNIT file for Facebook’s Presto application.  We would place this at /etc/systemd/system/presto.service.

[Unit]
Description=Presto
After=syslog.target network.target

[Service]
User=your-user-here
Type=forking
ExecStart=/opt/presto/current/bin/launcher start
ExecStop=/opt/presto/current/bin/launcher stop
WorkingDirectory=/opt/presto/current/bin/
Restart=always

[Install]
WantedBy=multi-user.target

Here are the important things to note about this:

  1. You specify the user the service will run as – it should have access to the actual program location.
  2. Type can be “forking” or “simple”.  Forking implies that you have specific start and stop commands to manage the service (i.e. it kind of manages itself).  Simple implies that you’re just running something like a bash script or a Java JAR that runs forever (so SystemD will just make sure to start it with the command you give and restart it if it fails).
  3. Restart=always will make sure that, as long as you started it in the first place, it restarts whenever it dies.  Try it; just kill -9 your application and it will come back.
  4. The install section is critical if you want the application to start up when the computer reboots.  You cannot enable it for start-on-boot without this.
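For contrast with the forking Presto unit above, here is a sketch of what a Type=simple unit might look like for an always-running JAR.  The paths and names here are made up for illustration:

```ini
[Unit]
Description=Example simple service (hypothetical)
After=syslog.target network.target

[Service]
User=your-user-here
# With Type=simple, SystemD launches this command directly and tracks
# the process itself; there is no separate stop command.
Type=simple
ExecStart=/usr/bin/java -jar /opt/example/example-app.jar
Restart=always

[Install]
WantedBy=multi-user.target
```

Remember to run sudo systemctl daemon-reload after adding or editing unit files so SystemD picks up the change.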

Useful Commands

  • sudo systemctl status presto (or your app name) -> current status.
  • sudo systemctl stop presto
  • sudo systemctl start presto
  • sudo systemctl restart presto
  • sudo systemctl enable presto -> enable for starting on reboot of server.
  • sudo systemctl disable presto -> don’t start on reboot of server.
  • sudo systemctl is-enabled presto; echo $? -> show whether it is currently enabled for start-on-boot.

Azure Key Vault Usage

If you want to store passwords or certificates securely and have them separated from your application code, then Azure Key Vaults are a wonderful option.

You can even set up key vaults so that you can access them without providing a client ID and secret, which makes them extra secure as you don’t have to keep any credentials in your code or config files.

Creating a Key Vault

To set up a key vault, you just:

  • Go to All Services in the portal.
  • Search for Key Vault.
  • Click create and then provide a name, resource group, and region.
    • Remember, all of your resources in Azure have to go into a resource group so they are logically identified and manageable.

Assigning Users

When you’re programmatically accessing resources in Azure, you always need a service principal.  You can get one by creating an Azure App Registration.  This is involved, and if you’re doing this you probably already have one.  If not, you can refer to Microsoft’s tutorial on creating a service principal.

Assuming you have the principal ready, go into your vault in the portal and click “Access Policies”.  In here, you can pick which things you need to manage from a template, then give your service principal name and create.

Remember, after you do this and it shows the created one on the summary page, you STILL have to click “Save” at the top.  If you don’t, it’s not really there.  When you’re done, refresh the web page with F5 to make sure it really saved.

Adding Secrets

Adding secrets/passwords is simple.  Just click “Secrets” and then the (+) sign and type in your name/value.

Querying Secrets From an Application

This is very language dependent, but Microsoft has great tutorials for every language – see, for example, their key vault quick-start guides for Python and Java.

Managed Service Identity

Now, we still have one problem here.  The key vault holds all of our passwords which is great… but we need a service principal (with a password) to access the vault.  So, if we leave that in our code or config files, we’re no better off in reality.

The final step is to read up on Managed Service Identities which let you configure a machine to securely talk to a key vault without providing the principal information.  This way your code and deployment config is 100% free of any passwords/etc.

Connecting to Hive from Python

I was using Hive via Presto for a project, but then I ran into an issue where Presto cannot support Hive views.  So, to be kind to the user, I wanted to present the view definition so they could see how to query the underlying tables.

Unfortunately, you can’t get view definitions from Presto either! So, I had to query Hive directly from a Python project.

Two Options

There are two options that I found for achieving this, and surprisingly neither one was great.  You would think this was easy right!?

  1. Use PyHive – This is the standard connector you would have expected to find, except it does not install and/or work on Windows.  So, if you develop on Windows and deploy to Linux, it is painful.  Also, you need some other things on the system for it to work which can be painful to find.
  2. Use JayDeBeApi – This uses the Hive JDBC JAR to connect, which means it needs Java installed on your machine.  DO NOT USE THIS – I quickly ran into a critical bug that happens on both Windows and Linux: if you open one connection, do work, and close it, you cannot open another connection.  There is a GitHub issue for it, and the reporter had to resort to putting the work in another script and calling it as a sub-process for each command, which is ridiculous.

So, as I’m deploying on Linux (even though I develop on Windows), PyHive wins.

More on PyHive

So, to install PyHive, you would do the following (but it probably won’t work yet, at least not on CentOS 7 where I tried it):

pip install pyhive[hive]

Additional Dependencies

In order to get “pyhive[hive]” to install on a server (I tested with CentOS 7), you have to ensure some other dependencies are available as well.

I was working from Python 3.6 in a virtual environment, and the following worked properly:

sudo yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
pip install pyhive[hive]

Windows Development

Note that if you do the install without the extra [hive] you will not get all the dependencies.  The reason they’re broken out is this technically supports both Hive and Presto, and that means you get to pick which dependencies you need.

This is a mixed blessing; you can install the package on Windows and develop without the extra [hive] but if you try to execute the code it will fail.  To run it on Linux you need the full set of dependencies.

I recommend guarding the pyhive import and any related code in your project with if os.name != "nt": in order to ensure you can run through on Windows without getting errors.  Hopefully your project is like mine, where this is a side case and you can test plenty without the final calls.
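As a sketch of that guard (the hive_available helper is my own naming, not part of PyHive):

```python
import os

# Guard the PyHive import so modules still load on Windows, where the
# [hive] extras typically fail to install.  Also tolerate a dev box
# that simply doesn't have the package yet.
hive = None
if os.name != "nt":
    try:
        from pyhive import hive
    except ImportError:
        hive = None

def hive_available():
    """True only when PyHive actually imported (i.e. a properly set-up Linux box)."""
    return hive is not None
```

Any code that actually opens a connection can then check hive_available() first and raise a clear error instead of crashing at import time on Windows.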

Query Code

The following is a short example of how to do a query from PyHive assuming you have it all set up properly as we talked about above.

import logging

import pandas as pd
from pyhive import hive

logger = logging.getLogger(__name__)

conn = None
cursor = None

try:
    # Fill in your own schema and table names between the backticks.
    query = "describe extended ``.``"
    conn = hive.Connection(host="host-name", port=10000)

    cursor = conn.cursor()
    cursor.execute(query)
    query_results = cursor.fetchall()
    column_names = [part[0] for part in cursor.description]
    df = pd.DataFrame(query_results, columns=column_names)

except Exception:
    logger.exception("Error while pulling view details.")
    raise

finally:
    if cursor is not None:
        cursor.close()
    if conn is not None:
        conn.close()
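Since the whole point was showing the user the view definition, here is a hedged sketch of pulling it out of the “describe extended” rows.  The viewOriginalText marker matched what I saw in Hive’s output, but treat the exact format as an assumption; extract_view_sql and the sample rows below are made up for illustration:

```python
import re

def extract_view_sql(describe_rows):
    """
    Scan rows from a 'describe extended' result for the view definition.
    Assumes Hive embeds it as 'viewOriginalText:<sql>' inside the
    'Detailed Table Information' row (true in my testing; formats vary).
    """
    for row in describe_rows:
        joined = " ".join(str(col) for col in row if col is not None)
        match = re.search(r"viewOriginalText:(.+?)(?:, viewExpandedText:|$)", joined)
        if match:
            return match.group(1).strip()
    return None

# Synthetic rows shaped like Hive's output, for illustration only.
rows = [
    ("col_a", "string", ""),
    ("Detailed Table Information",
     "Table(... viewOriginalText:SELECT col_a FROM base_table, viewExpandedText:...)",
     ""),
]
print(extract_view_sql(rows))  # SELECT col_a FROM base_table
```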

My VI Cheat Sheet

For years, I’ve been somewhat avoiding learning any advanced features of VIM. I have always predominantly relied on desktop editors for anything complex and just use VI to do basic text modification.

Anyway, I’m finally trying to change that. So, I’ll start forcing myself to do things in VIM and will record the keys here over time. I’m just starting with one command though; so it’ll be a while before this is useful! 🙂

My Cheat Sheet

Remember, generally you want to press “esc” before doing these.

  • Search Forward & Backwards
    • Forward = /search-term
    • Backward = ?search-term
  • Show or Hide Line Numbers
    • :set number
    • :set nonumber
  • Edit Multiple Lines (e.g. Block Comment Lines 10-20 With #)
    • :10,20s/^/#/
  • Clear Highlight After Search
    • There are some fancy ways, but just search for something that won’t exist and it will clear.  For example:
      • /blahfwoeaf


Logging in Python 3 (Like Java Log4J/Logback)

What is Proper Logging?

Having a proper logger is essential for any production application.  In the Java world, almost every framework automatically pulls in Logback or Log4J, and libraries tend to use SLF4J in order to be logger agnostic and wire up to these loggers.  So, I set out to see how to do similar logging in Python.

While it can get fancier, I think the following things are essential when setting up a logger; so they were what I was looking for:

  1. It should be externally configured from a file that your operations team can change.
  2. It should write to a file automatically, not just console.
  3. It should roll the file it writes to at a regular size (date or time rolling on top of that can be beneficial too; but the size restriction ensures you won’t fill up your disk with a ton of logs and break your applications).
  4. It should keep a history of a few previous rolled files to aid debugging.
  5. It should use a format that specifies both the time of the logs and the class that logged them.

On top of these, obviously we must be able to log at different levels and filter out which logs go to the file easily.  This way, when we have issues, operations can jack up the logging level and figure out what is going wrong as needed.

How Do We Do it in Python 3?

It turns out that Python actually has a strong logging library built into its core distribution.  The only extra library I had to add to use it was PyYAML, and even that could have been avoided (Python supports JSON out of the box and that could be used instead, but people seem to prefer YAML configuration in the community).

In the place where your app starts up, write the following code.  Note that you have to install the PyYAML module yourself.  Also, this expects the “logging.yaml” to be in the same directory as the startup code (change that if you like though).  We’ll show the “logging.yaml” content below.

import logging
import logging.config
import yaml

# Initialize the logger once as the application starts up.
with open("logging.yaml", 'rt') as f:
    config = yaml.safe_load(f.read())
logging.config.dictConfig(config)

# Get an instance of the logger and use it to write a log!
# Note: Do this AFTER the config is loaded above or it won't use the config.
logger = logging.getLogger(__name__)
logger.info("Configured the logger!")

Then, when you want to use the logger in other modules, simply do this:

import logging
logger = logging.getLogger(__name__)
logger.info("Using the logger from another module.")

Of course, you just have to import logging and get the logger once at the top of each module, not every time you write a log.

This code uses “logging.yaml” which contains the following settings. Note that:

  • It defines a formatter with the time, module name, level name, and the logging message.
  • It defines a rotating file handler which writes to operations-audit-query-service.log and rolls the file at 10MB, keeping 5 historic copies.  The handler is set up to use our simple format from above.
  • The “root” logger writes to our handler and allows only INFO messages through.
  • The handler is set to DEBUG, so if the root logger is increased to DEBUG during an investigation, it will let the messages through to the log file.

Here is the “logging.yaml” example file:

---
version: 1
disable_existing_loggers: False

# Define format of output logs (named 'simple').
formatters:
    simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:

    # Create rotating file handler using 'simple' format.
    file_handler:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: simple
        filename: operations-audit-query-service.log
        maxBytes: 10485760 # 10MB
        backupCount: 5
        encoding: utf8

root:

    level: INFO
    handlers: [file_handler]
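The interplay between the logger level and the handler level can be sanity-checked with a quick sketch (the “level_demo” name and in-memory stream are just for demonstration):

```python
import io
import logging

# A throwaway logger: the handler accepts DEBUG, but the logger filters at INFO.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)

demo = logging.getLogger("level_demo")
demo.addHandler(handler)
demo.propagate = False
demo.setLevel(logging.INFO)

demo.debug("hidden")          # filtered out by the logger's INFO level
demo.setLevel(logging.DEBUG)  # "jack up" the level during an investigation
demo.debug("now visible")     # passes the logger AND the DEBUG handler

print(stream.getvalue())  # only the second message made it through
```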

References

The code and YAML for this was adapted from this very good blog which I recommend reading: https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/.