What is Jupyter Hub?

First Things First… What is Jupyter?

Lately, I’ve been moving into the Python world, where I quickly encountered Jupyter notebooks.  They seem like a fairly dominant technology that lets you write and run Python code block-by-block and render the results inline.  You can also render data as charts, build user-interface widgets, and do most anything else.

What is the Problem With Jupyter?

But Jupyter really just runs on a single machine.  What about when you want to share this environment to, say, teach a class, or work with a team of data scientists?

So… We Have Jupyter Hub!

Jupyter Hub is a multi-user version of Jupyter… so it fixes our problems! Here I’ll paraphrase content and use images from a wonderful video I watched on YouTube – you can watch it at the bottom of this post if you like.

Basically, Jupyter Hub just provides a higher-level service on top of the standard Jupyter notebooks.  It contains:

  1. A proxy server to route requests.
  2. A “hub” which handles authentication, user details, and spawning new notebooks.  Authentication is flexible and can most likely tie into your corporate authentication system.
  3. Any number of spawned Jupyter processes to run notebooks for the given users.  A variety of spawning techniques exist (e.g. spawning to Docker).

You can see this architecture below.

[Diagram: Jupyter Hub architecture (proxy, hub, and spawned notebook servers)]
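To make those pieces concrete, here is a minimal jupyterhub_config.py sketch of my own (not from the video); the authenticator shown is just the default, and the Docker spawner requires the separate dockerspawner package:

# jupyterhub_config.py -- a minimal sketch, not from the video.
c = get_config()  # Provided by JupyterHub when it loads this file.

# The hub handles authentication; PAM (local system accounts) is the
# default, and other authenticators can tie into corporate systems.
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

# Spawn each user's notebook server in its own Docker container.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

# The proxy listens here and routes each request to the hub or to the
# right user's spawned notebook server.
c.JupyterHub.bind_url = 'http://0.0.0.0:8000'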

So, if you need multi-user Jupyter, I suggest you look into installing and trying Jupyter Hub, and I highly recommend the video below as a starting point!

Jupyter Auto-Run Cells on Load

Why Do We Need This?

If you are making a Jupyter notebook that heavily uses widgets and conceals the code used to make them, you’ll quickly run into an issue. Another person coming to this notebook would basically just see this message for all of your widgets:

“A Jupyter widget could not be displayed because the widget state could not be found. This could happen if the kernel storing the widget is no longer available, or if the widget state was not saved in the notebook. You may be able to create the widget by running the appropriate cells.”

You can simulate this for yourself by pressing the “restart the kernel (with dialog)” button and then force-refreshing your browser (Ctrl + Shift + R in Chrome).

How Do We Do It?

I came across this Stack Overflow post, which gives a good solution (especially if you are already hiding code in other areas to make things look neater, like I noted in this blog).

Just paste this in its own cell at the top of your notebook:

%%html
<script>
    // AUTORUN ALL CELLS ON NOTEBOOK-LOAD!
    require(
        ['base/js/namespace', 'jquery'], 
        function(jupyter, $) {
            $(jupyter.events).on("kernel_ready.Kernel", function () {
                console.log("Auto-running all cells-below...");
                jupyter.actions.call('jupyter-notebook:run-all-cells-below');
                jupyter.actions.call('jupyter-notebook:save-notebook');
            });
        }
    );
</script>

 
Then all your cells will run on load and all of your widgets will show up nice and neat the first time around.

Read-only / Protected Jupyter Notebooks

Jupyter notebooks are fantastic, but they’re really geared toward developers.  I had to lock one down so it could be used by non-developers (without them damaging it).  It took quite a lot of googling!

I figure that a lot of people must need this.  If I were a university instructor, I’d like to send students to a server, let them play, but prevent them from breaking my example.

Locking Things Down

Here are the different things I did to mitigate damage:

  1. You can actually make the entire notebook read-only by setting the file permissions on the command line or in the file properties (works for both Windows and Linux).  Jupyter will detect this (after you reload the page) and show that you can’t save anything.
  2. You can make individual cells un-deletable and un-editable (so users can’t mess up the top cells that the cells lower down depend on):
    • Run a cell.
    • Click “Edit Metadata” in its banner.
    • Add:
      • "deletable": false,
      • "editable": false,
  3. You can actually hide the code for a range of cells, as in the snippet below, which hides the first four (very useful if you’re using IPython UI widgets and just want to show the widgets and not how they were made) – disclaimer – I got this off Stack Overflow but am having trouble finding the post to reference it currently:
from IPython.display import HTML
HTML('''
<script>
    code_show = true;
    function code_toggle_and_hide_move_buttons() {
        if (code_show) {
            $('div.input').slice(0, 4).hide();
            $('#move_up_down').hide();
        } else {
            $('div.input').slice(0, 4).show();
            $('#move_up_down').show();
        }
        code_show = !code_show;
    }
    $(document).ready(code_toggle_and_hide_move_buttons);
</script>
<a href="javascript:code_toggle_and_hide_move_buttons()">Toggle code display</a>
''')

Further Recommendations

I would also suggest making sure you disable the terminal in your Jupyter config file, and that you set a known location for your notebooks to be loaded from so that you can add the read-only attribute ahead of time (see the sketch below).
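For example, the relevant lines in jupyter_notebook_config.py might look like this (a minimal sketch for the classic notebook server; the notebook directory path is just an illustrative example):

# jupyter_notebook_config.py -- minimal sketch; the path below is illustrative.
c = get_config()

# Disable the in-browser terminal so users can't get a shell on the server.
c.NotebookApp.terminals_enabled = False

# Serve notebooks from a known directory so you can set the read-only
# attribute on the files there ahead of time.
c.NotebookApp.notebook_dir = '/srv/notebooks'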

Also, you can disable various hotkeys in the UI, and you can use a CSS selector (similar to the one in my “hide code” example above) to hide the move-up/move-down cell buttons, to help prevent errors creeping in that way.

 

Python PIP Install Local Module While Developing

I’m definitely still in the early stages of learning module/package building and deployment for python.  So, take this with a grain of salt…

But I ran into a case where I wanted to develop/manipulate a package locally in PyCharm while I was actively using it in another project I was developing (actually, in a Jupyter notebook).  It turns out there’s a pretty cool way to do this.

Module Preparation

The first thing I had to do was prepare the package so that it was deployable using the standard python distribution style.

In my case, I just made a directory for my package (lower-case name, underscore separators).  Inside the directory, I created my module code, an __init__.py, and a setup.py describing the package.

Here’s an example.  Ignore everything that I didn’t mention; all of that is auto-generated by PyCharm and not relevant.  In fact, it probably would have been better to create a sub-directory in this project for the package; but I just considered the top level directory the package directory for now.

[Screenshot: example package layout in PyCharm]
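For reference, a minimal setup.py along these lines might look like the sketch below; the name and version match the pip output further down, but everything else is illustrative:

# setup.py -- minimal sketch for making the package pip-installable.
from setuptools import setup, find_packages

setup(
    name='postgres-query-runner',  # Distribution name (illustrative).
    version='1.1.0',
    packages=find_packages(),  # Automatically discover package directories.
)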

Module Installation

Once you have your module set up like this, you can jump into your command line (assuming you have pip installed) and run this command, tailored to your package directory location:

λ pip install -e C:\dev\python\jupyter_audit_query_tools
Obtaining file:///C:/dev/python/jupyter_audit_query_tools
Installing collected packages: PostgresQueryRunner
Running setup.py develop for PostgresQueryRunner
Successfully installed PostgresQueryRunner

You’ll also be able to see the package mapped to that directory when you list the packages in PIP:

λ pip list | grep postgres
postgres-query-runner 1.1.0 c:\dev\python\jupyter_audit_query_tools

Module Usage

After this, you should be able to import and use the package/modules in your interpreter or notebook.  You can change the code in the package, and it will update in the places you’re using it, assuming you re-import the package.  In Jupyter, this means clicking the restart-kernel/re-run button (or reloading the module, as shown below).
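Alternatively, if you would rather not restart the kernel, Python’s importlib can reload an already-imported module in place; here is a sketch (the import name is hypothetical):

import importlib
import postgres_query_runner  # Hypothetical import name for the example package.

# Re-execute the module's code so local edits take effect without
# restarting the interpreter or kernel.
importlib.reload(postgres_query_runner)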

Install Airflow on Windows + Docker + CentOS

Continuing on my journey: setting up Apache Airflow directly on Windows was a disaster, for various reasons.

Setting it up in the WSL (Windows Subsystem for Linux) copy of Ubuntu worked great.  But unfortunately, you can’t properly run services in that environment, and I’d like to run Airflow in a state reasonably similar to how we’ll eventually deploy it.

So, my fallback plan is Docker on Windows, which is working great (no surprise there).  It was also much less painful to set up in the end than the other options.  I’m also switching from Ubuntu to CentOS (the non-enterprise version of RHEL), as I found out that Airflow has systemd service files tested on Red Hat based systems here: https://airflow.readthedocs.io/en/stable/howto/run-with-systemd.html.

Assuming you have Docker for Windows set up properly, just do the following to set up Airflow in a new CentOS container.

Get and Run CentOS With Python 3.6 in Docker

docker pull centos/python-36-centos7
docker container run --name airflow-centos -it centos/python-36-centos7:latest /bin/bash

Install Airflow with Pip

pip install --upgrade pip
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow

Set up Airflow

First, install Vim.  Yes… this is Docker, so the images are hyper-stripped-down to contain only the essentials; you have to install anything else yourself.  I think I had to connect to the container as root to do this, using this command:

docker exec -it -u root airflow-centos /bin/bash

Then you can install it with yum just fine.  I’m not 100% sure root was needed, so feel free to try it as the normal user first.

yum install vim

I jumped back into the normal user after that (by removing the -u root from the command above).

Then set up Airflow’s home directory and database.

  • Set the Airflow home directory (permanently for the user).
    • vi ~/.bashrc and add this to the bottom of the file.
      • export AIRFLOW_HOME=~/airflow
    • Then re-source the file so you can use it immediately:
      • source ~/.bashrc
  • Initialize the Airflow database (we just did defaults, so it will use a local SQLite DB).
    • airflow initdb

Then verify the install worked by checking its version:

root@03bae42c5cdb:/# airflow version
[2018-11-07 20:26:44,372] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   v1.10.0

Run Airflow Services

The actual Airflow hello world page here: https://airflow.apache.org/start.html just says to run Airflow like this:

  • airflow webserver -p 8080
  • airflow scheduler

You probably want to run these in the background, send the logs to a file, and so on.

It’s more professional to just run it as a service (on CentOS/RHEL, which is why I switched to CentOS from Ubuntu).  But it turns out that running it as a service in Docker is tricky.

Even if you get everything set up properly, Docker by default enables/disables some features for security that make systemctl not work (so you can’t start the service).  It sounds like getting around this requires a whole rework: https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted.

Also, I realize my idea may have been flawed in the first place (running it as a service in a container).  Containers are really intended to hold micro-services, so it would probably make more sense to launch the web server and the scheduler as their own containers and let them communicate with each other (I’m still figuring this out).  This thread nudged me into realizing that: https://forums.docker.com/t/systemctl-status-is-not-working-in-my-docker-container/9075.

It says:

Normally when you run a container you aren’t running an init system. systemctl is a process that communicates with systemd over dbus. If you aren’t running dbus or systemd, I would expect systemctl to fail.

What is the pid1 of your docker container? It should reflect the entrypoint and command that were used to launch the container.

For example, if I do the following, my pid1 would be bash:

$ docker run --rm -it centos:7 bash
[root@180c9f6866f1 /]# ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.7  0.1  11756  2856 ?        Ss   03:01   0:00 bash
root        15  0.0  0.1  47424  3300 ?        R+   03:02   0:00 ps faux

Since only bash and ps faux are running in the container, there would be nothing for systemctl to communicate with.


So, the below steps probably get it working if you set the container up right in the first place (as a privileged container), but it isn’t working for me for now.  So feel free to stop reading here and use Airflow, but it won’t be running as a service.

I might come back and update this post and/or make future one on how to run airflow in multiple containers.  I’m also aware that there is an awesome image here that gets everything off the ground instantly; but I was really trying to get it working myself to understand it better: https://hub.docker.com/r/puckel/docker-airflow/.

Service Setup (Not Complete Yet)

I found information on the Airflow website here: https://airflow.readthedocs.io/en/stable/howto/run-with-systemd.html stating:

Airflow can integrate with systemd based systems. This makes watching your daemons easy as systemd can take care of restarting a daemon on failure. In the scripts/systemd directory you can find unit files that have been tested on Redhat based systems. You can copy those to /usr/lib/systemd/system. It is assumed that Airflow will run under airflow:airflow. If not (or if you are running on a non Redhat based system) you probably need to adjust the unit files.

Environment configuration is picked up from /etc/sysconfig/airflow. An example file is supplied. Make sure to specify the SCHEDULER_RUNS variable in this file when you run the scheduler. You can also define here, for example, AIRFLOW_HOME or AIRFLOW_CONFIG.

I didn’t see much in the installation itself, so I found the scripts on GitHub for the 1.10 version that we are running (based on the version output earlier):

https://github.com/apache/incubator-airflow/tree/v1-10-stable/scripts/systemd

Based on this, I started adapting those unit files; but as noted above, I haven’t gotten the service working inside Docker yet.

 

Python List Comprehension

List comprehensions in Python are a short-hand way to create lists from other sequences, with optional transformation and filtering logic.

For example, let’s generate all the even numbers from 0 to 15 in one line:

[x for x in range(15) if x % 2 == 0]
#Output: [0, 2, 4, 6, 8, 10, 12, 14]

Note that range() returns a sequence.  You could just as easily have given it a literal list like [0, 1, 2, 3, …] or a list from earlier in your code.
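If comprehensions are new to you, it may help to see the plain-loop equivalent of the example above:

# The comprehension above is just short-hand for this loop:
evens = []
for x in range(15):
    if x % 2 == 0:
        evens.append(x)
#Output: evens == [0, 2, 4, 6, 8, 10, 12, 14]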

You can have multiple for clauses. Each one can run over a different source list. You’ll end up with the Cartesian product though. For example:

[(x,y) for x in range(3) for y in range(2)]
#Output: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

In these results, you’ll notice that x runs up to (but not including) 3 and y up to (but not including) 2, so we end up with 3 × 2 = 6 results.  If x and y ranged over large numbers, the result would be very big, and since we can have any number of for clauses, the result list can get large fast.
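A later for clause can also iterate over the variable bound by an earlier one, which is handy for flattening nested lists; here is a quick sketch:

# Flatten a nested list: the second for runs over each row from the first.
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [n for row in matrix for n in row]
#Output: [1, 2, 3, 4, 5, 6]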

Also notice that we didn’t need any ifs for filtering in those examples.  Like fors, ifs can appear any number of times.  So, this is also valid:

[x for x in range(15) if x % 2 == 0 if x not in [2, 4, 6]]
#Output: [0, 8, 10, 12, 14]

In this case, we filtered down to the even numbers in the first if, and then filtered out some predefined numbers in the second using the not in operator (which is very cool in itself for a language to have).

Python Loop Index Variable Scope

While crash-studying Python for a new job, I found out that this code is actually not an error!

for i in [1, 2, 3]:
    pass  # Do nothing.
print(i)

It blew my mind that this code actually prints 3.  For some crazy reason, Python keeps the index variable around after the loop exits; it is not scoped to the loop.
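One related wrinkle (my own observation, not from the linked post): the name is only bound if the loop body actually runs, so looping over an empty iterable leaves it undefined:

for i in []:
    pass  # The body never executes, so i is never bound.
print(i)  # NameError: name 'i' is not defined (assuming no earlier i).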

I found this in the Python documentation, but it is described much better in this blog post: https://eli.thegreenplace.net/2015/the-scope-of-index-variables-in-pythons-for-loops/.

I heavily recommend reading that link as it has lots of good info (thanks to Eli Bendersky).  But in case you’re lazy, here’s a historical anecdote quoted from it that I particularly liked:

“Why this is so

I actually asked Guido van Rossum about this behavior and he was gracious enough to reply with some historical background (thanks Guido!). The motivation is keeping Python’s simple approach to names and scopes without resorting to hacks (such as deleting all the values defined in the loop after it’s done – think about the complications with exceptions, etc.) or more complex scoping rules.

In Python, the scoping rules are fairly simple and elegant: a block is either a module, a function body or a class body. Within a function body, names are visible from the point of their definition to the end of the block (including nested blocks such as nested functions). That’s for local names, of course; global names (and other nonlocal names) have slightly different rules, but that’s not pertinent to our discussion.

The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block body. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).

So if you just go about implementing Python, this behavior is what you’ll likely end up with. Here’s another enlightening snippet:

for i in range(4):
    d = i * 2
print(d)

Would it surprise you to find out that d is visible and accessible after the for loop is finished? No, this is just the way Python works. So why would the index variable be treated any differently?

By the way, the index variables of list comprehensions are also leaked to the enclosing scope. Or, to be precise, were leaked, before Python 3 came along.”

And for those like me who didn’t know, Guido van Rossum is the author of the Python programming language.

Oh, and by the way, you can avoid this variable leak by using map with a lambda, according to the Python documentation here: https://docs.python.org/3.6/tutorial/datastructures.html.

For example:

squares = list(map(lambda x: x**2, range(10)))
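And as the quote above notes, in Python 3 a plain list comprehension no longer leaks its variable either, so this is equally side-effect-free:

squares = [x**2 for x in range(10)]
# In Python 3, this x is scoped to the comprehension itself; referencing
# x afterwards raises a NameError (unless it was defined elsewhere).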