My VI Cheat Sheet

For years, I’ve been somewhat avoiding learning any advanced features of VIM. I have always predominantly relied on desktop editors for anything complex and just use VI to do basic text modification.

Anyway, I’m finally trying to change that. So, I’ll start forcing myself to do things in VIM and will record the keys here over time. I’m just starting with one command though; so it’ll be a while before this is useful! :)

My Cheat Sheet

Remember, generally you want to press “esc” before doing these.

  • Search Forward & Backwards
    • Forward = /search-term
    • Backward = ?search-term
  • Show or Hide Line Numbers
    • :set number
    • :set nonumber
  • Edit Multiple Lines (e.g. Block Comment Lines 10-20 With #)
    • :10,20s/^/#/
  • Clear Highlight After Search
    • There are some fancy ways, but just search for something that won’t exist and it will clear. For example:
      • /blahfwoeaf

 

Logging in Python 3 (Like Java Log4J/Logback)

What is Proper Logging?

Having a proper logger is essential for any production application.  In the Java world, almost every framework automatically pulls in Logback or Log4J, and libraries tend to use SLF4J in order to be logger agnostic and to wire up to these loggers.  So, I had to set out to see how to do similar logging in python.

While it can get fancier, I think the following things are essential when setting up a logger; so they were what I was looking for:

  1. It should be externally configured from a file that your operations team can change.
  2. It should write to a file automatically, not just console.
  3. It should roll the file it writes to at a regular size (date or time rolling on top of that can be beneficial too; but the size restriction ensures you won’t fill up your disk with a ton of logs and break your applications).
  4. It should keep a history of a few previous rolled files to aid debugging.
  5. It should use a format that specifies both the time of the logs and the class that logged them.

On top of these, obviously we must be able to log at different levels and filter out which logs go to the file easily.  This way, when we have issues, operations can jack up the logging level and figure out what is going wrong as needed.

How Do We Do it in Python 3?

It turns out that Python actually has a strong logging library built into its core distribution.  The only extra library I had to add to use it was PyYAML, and even that could have been avoided (Python supports JSON out of the box and that could be used instead, but people seem to prefer YAML configuration in the community).

In the place where your app starts up, write the following code. Note that you have to install the PyYAML module yourself. Also, this expects the “logging.yaml” to be in the same directory as the startup code (change that if you like though). We’ll show the “logging.yaml” content lower.

import logging
import logging.config
import yaml

# Initialize the logger once as the application starts up.
with open("logging.yaml", 'rt') as f:
    config = yaml.safe_load(f.read())
logging.config.dictConfig(config)

# Get an instance of the logger and use it to write a log!
# Note: Do this AFTER the config is loaded above or it won't use the config.
logger = logging.getLogger(__name__)
logger.info("Configured the logger!")

Then, when you want to use the logger in other modules, simply do this:

import logging

logger = logging.getLogger(__name__)
logger.info("Using the logger from another module.")

Of course, you just have to import logging and get the logger once at the top of each module, not every time you write a log.

This code uses “logging.yaml” which contains the following settings. Note that:

  • It defines a formatter with the time, module name, level name, and the logging message.
  • It defines a rotating file handler which writes to operations-audit-query-service.log and rolls the file at 10MB, keeping 5 historic copies. The handler is set up to use our simple format from above.
  • The “root” logger writes to our handler and only allows messages at INFO level or above through.
  • The handler is set to DEBUG, so if the root logger is increased to DEBUG during an investigation, it will let the messages through to the log file.

Here is the “logging.yaml” example file:

---
version: 1
disable_existing_loggers: False

# Define format of output logs (named 'simple').
formatters:
    simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:

    # Create rotating file handler using 'simple' format.
    file_handler:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: simple
        filename: operations-audit-query-service.log
        maxBytes: 10485760 # 10MB
        backupCount: 5
        encoding: utf8

root:

    level: INFO
    handlers: [file_handler]
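
If you would rather skip the PyYAML dependency, the same kind of configuration can be written as JSON and loaded with the standard library. The sketch below is only an illustration under a few assumptions: the config is embedded as a string (in practice it would live in a file such as a hypothetical "logging.json"), and it uses a console handler instead of the rotating file handler so that it runs without creating files. It also shows the root logger being switched to DEBUG, which is what operations would do during an investigation.

```python
import json
import logging
import logging.config

# Assumption: this text would normally be read from a file like "logging.json".
CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
  },
  "handlers": {
    "console": {
      "class": "logging.StreamHandler",
      "level": "DEBUG",
      "formatter": "simple"
    }
  },
  "root": {"level": "INFO", "handlers": ["console"]}
}
"""

config = json.loads(CONFIG_JSON)
logging.config.dictConfig(config)

logger = logging.getLogger(__name__)
logger.info("Configured the logger from JSON.")
logger.debug("Filtered out: the root logger is still at INFO.")

# During an investigation, switch the root logger to DEBUG.
# The handler already accepts DEBUG, so these messages now get through.
root = logging.getLogger()
root.setLevel(logging.DEBUG)
logger.debug("Now visible because the root logger is at DEBUG.")
```

The handler level and the logger level are checked separately, which is why the handler is left at DEBUG and only the root logger needs to change.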

References

The code and YAML for this was adapted from this very good blog which I recommend reading: https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/.

Python Dictionary Comprehension For Multi Level Caching

What’s the Use Case?

I was coding a multi-level cache in Python and came across dictionary comprehensions. It turns out they are very useful for this! So, it’s a nice example to teach the feature.

Let’s say our cache is like a database schema layout:

  1. database_1
    1. table_1
      1. column_1 – type 1
      2. column_2 – type 2
    2. table_2
      1. column_1 – type 1
      2. column_2 – type 2
  2. database_2
    1. table_1
      1. column_1 – type 1

What’s a Dictionary Comprehension?

A dictionary comprehension basically lets you create a dictionary out of an expression. So, you can essentially say “for each value in this list create a key for my dictionary where the value is X if some condition is true”.

A couple things to note:

  • You can technically provide any number of input iterables. So, if you want to form a dictionary from multiple sources, it can work; but I’m not going to get into that; google elsewhere!
  • You can provide any number of “if” clauses to prune the results down. You could achieve this with one, but using one for each condition is neater to write.

A quick example:

>>> some_list = [1,3,8,12]
>>> {key: key * key for key in some_list if key * key % 2 == 0}
{8: 64, 12: 144}

So, here we can see that we took in a list of 4 numbers and made a dictionary out of the number and its square, but only kept the results where the square was even.

The first part “key: key * key” is really just a key : value pair. The key is on the left and the value (the key * key) could be anything you wanted on the right. You can call “key” anything you like in the “for key” section. The “in some_list” is the source collection where our input comes from – again, you can have multiple of these. Finally, the “if key * key % 2 == 0” is a filter condition which, again, you can have multiple of.
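
To make the “you can have multiple of these” remarks a bit more concrete, here is a small sketch with two input iterables and two “if” clauses. The lists and the key format are made up purely for illustration.

```python
letters = ["a", "b"]
numbers = [1, 2, 3, 4]

# Two "for" clauses walk every (letter, number) combination.
# Two "if" clauses must both be true for a pair to be kept.
pairs = {
    f"{letter}{number}": number * 10
    for letter in letters
    for number in numbers
    if number % 2 == 0
    if letter != "b" or number > 2
}

print(pairs)  # {'a2': 20, 'a4': 40, 'b4': 40}
```

The clauses are evaluated left to right, like nested loops with nested conditions, so “b2” is dropped by the second “if” even though 2 is even.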

Why is it Useful For Multi-Level Caching?

In our database example, we must first query the list of databases, then query the list of tables for any database we care about, then query the list of columns for any table we care about.

We don’t want to cache things we don’t need.

So, first off, it would be nice to prime the cache with just the database names and empty table values like so. If the cache is already populated, we just return its top level keys which are the database names:

if cache is None:
    cache = {database: None for database in get_database()}
return list(cache.keys())

Now, what about when the user goes to list the tables in a database?

if cache[database_name] is None:
    cache[database_name] = {table: None for table in get_tables(database_name)}
return list(cache[database_name].keys())

Finally, what about when the user goes to list the columns in a table?

if cache[database_name][table_name] is None:
    cache[database_name][table_name] = get_columns(database_name, table_name)
return cache[database_name][table_name]

So, we can see here that it was trivial to use dictionary comprehensions to turn a list into a dictionary with empty values as a utility while building the multi level cache out – which is very cool.

This might not have been the best way to build a cache – but it seems pretty simple and efficient to me. Building classes around things is usually a better approach though admittedly :).
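
On the point about classes, here is a rough sketch of how the three snippets above might be wrapped into one. Everything in it is an assumption for illustration: the SchemaCache name, the three loader callables passed to the constructor, and the fake in-memory schema that stands in for real database queries.

```python
class SchemaCache:
    """Lazily caches databases, then tables, then columns (illustrative sketch)."""

    def __init__(self, get_databases, get_tables, get_columns):
        # Hypothetical loader callables standing in for real queries.
        self._get_databases = get_databases
        self._get_tables = get_tables
        self._get_columns = get_columns
        self._cache = None

    def databases(self):
        if self._cache is None:
            self._cache = {database: None for database in self._get_databases()}
        return list(self._cache.keys())

    def tables(self, database_name):
        self.databases()  # prime the top level if it is still empty
        if self._cache[database_name] is None:
            self._cache[database_name] = {
                table: None for table in self._get_tables(database_name)
            }
        return list(self._cache[database_name].keys())

    def columns(self, database_name, table_name):
        self.tables(database_name)  # prime the table level if needed
        if self._cache[database_name][table_name] is None:
            self._cache[database_name][table_name] = self._get_columns(
                database_name, table_name
            )
        return self._cache[database_name][table_name]


# Fake schema and a call log so we can see which lookups actually ran.
FAKE_SCHEMA = {
    "database_1": {
        "table_1": {"column_1": "type 1", "column_2": "type 2"},
        "table_2": {"column_1": "type 1", "column_2": "type 2"},
    },
    "database_2": {"table_1": {"column_1": "type 1"}},
}
calls = []


def fake_get_databases():
    calls.append("databases")
    return list(FAKE_SCHEMA)


def fake_get_tables(database_name):
    calls.append(f"tables:{database_name}")
    return list(FAKE_SCHEMA[database_name])


def fake_get_columns(database_name, table_name):
    calls.append(f"columns:{database_name}.{table_name}")
    return FAKE_SCHEMA[database_name][table_name]


cache = SchemaCache(fake_get_databases, fake_get_tables, fake_get_columns)
first = cache.columns("database_1", "table_2")
second = cache.columns("database_1", "table_2")  # served from the cache
print(calls)
```

Only the branches that were actually requested get loaded: database_2 and table_1 stay as empty placeholders until someone asks for them.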
 

The Python yield keyword explained

I don’t usually re-blog posts, but this person’s post is a wonderful explanation of what yield does in python, and I definitely recommend reading through it.

Python Tips

Hi there folks. Again welcome to yet another useful tutorial. This is again a stackoverflow answer. This one is related to the Python yield keyword. It explains you what yield, generators and iterables are. So without wasting any time lets continue with the answer.

To understand what yield does,

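
As a tiny taste of what the linked post covers, here is a minimal generator sketch. The countdown function is made up for illustration and is not taken from the original answer.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, producing one value per request."""
    while start > 0:
        yield start  # pause here and hand the value to the caller
        start -= 1


gen = countdown(3)   # nothing in the function body has run yet
first = next(gen)    # runs until the first yield
rest = list(gen)     # resumes after the yield and drains the remaining values

print(first, rest)  # 3 [2, 1]
```

Calling the function only creates the generator; the body runs piece by piece as values are requested.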

Postgres Schema Creation

Historically, I have not worked with Postgres much. So, when I had to start using it, one of my first questions was how to create a schema, and how to use it for my new tables and such.

Creating a schema is exactly what you expect:

create schema myschema;

But using it is not quite what I expected. Of course, you can do the standard thing when you’re managing your objects and qualify names as schema.object like this:

create table myschema.mytable (x int);

But what if you just want:

create table mytable (x int);

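to go into myschema by default? To do this in Postgres, you have to add the schema to your search path. By default your search path is "$user", public, which means unqualified names go to a schema named after the current user if one exists, and to the public schema otherwise; you can view it like this: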

SHOW search_path;

You can actually set it to one or more schemas. The first schema in the path that contains the named table is the one your query will use. The first schema in the list will also be the default one for when you create new objects. So, if you did this:

SET search_path TO myschema;
create table mytable (x int);

Then your table would in fact be created in the “myschema” schema properly.

Simple Flask Web Service With Debugging

I needed to make a quick Python web-service and quickly found that of the two most common frameworks, Flask is intended to be minimalist and extensible while Django is intended to be large out of the box and somewhat opinionated (it has ORM tools, etc).

Since I wanted to get something running fast and basically just needed a REST API to return a couple of result sets from a database, I chose to go with Flask.

Simple Example

The Flask documentation itself has a wonderful example here: http://flask.pocoo.org/docs/0.12/quickstart/#a-minimal-application which I followed along with… but here are the CliffsNotes:

First, set up a new virtual environment (PyCharm is a good IDE and gives this as an option when creating a new project).

Create a new file at the top-level of your project and enter this code:

from flask import Flask, request

app = Flask(__name__)

@app.route('/query/run', methods=['GET'])
def run_query():
    query = request.args.get('query')
    limit = request.args.get('limit')
    return f"You ran {query} with {limit} records."


if __name__ == "__main__":
    app.run(debug=True)

Run the application with PyCharm, or just with your python interpreter.

At this point, you should have a running flask application, and if you go to:

localhost:5000/query/run?query="select * from abc"&limit=5

in your browser, you should see the response properly. Passing debug=True also turns on the interactive debugger (which can be a security issue, so only do this locally) along with the automatic reloader. That means if you make changes to your file, it will reload the app and use them immediately. So, if you change the response text in run_query, save, and refresh the browser, you’ll see the changes.

Note that this is just available locally at the minute as we’re focused on developing the application, not on deploying it yet. So, if you want to show it to someone else from your desktop, you will have to tell Flask to expose the service to all interfaces (0.0.0.0).
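
For reference, exposing the service on all interfaces might look like the sketch below. It repeats the same app as above; the serve_on_all_interfaces function name is made up here, and turning debug off is my own assumption, made because the interactive debugger should not be reachable from other machines.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route('/query/run', methods=['GET'])
def run_query():
    query = request.args.get('query')
    limit = request.args.get('limit')
    return f"You ran {query} with {limit} records."


def serve_on_all_interfaces():
    # host="0.0.0.0" makes Flask listen on every network interface,
    # so other machines can reach it at <your-ip>:5000.
    # debug is off because the debugger allows arbitrary code execution.
    app.run(host="0.0.0.0", port=5000, debug=False)
```

Calling serve_on_all_interfaces() starts the development server; it is still not meant as a production deployment.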

No Private Variables / Methods in Python (Jupyter & iPython)!?

Having coded for a long time and in a relatively large number of languages, I was a little panicky to realize that Python doesn’t have private variables and/or methods.

Some Context

When I came across this fact, I was trying to write an iPython notebook for use by others. You really can’t secure things in iPython as users have access to execute arbitrary code, but I thought that you could at least make it relatively hard to break by storing things in a module and loading them. But even this doesn’t work because nothing is private in Python – a user could just interrogate my classes and get right to the connection information variable (even if they had little to no knowledge of programming).

In Java, you can technically get around private variables by cracking open classes with reflection… but non-technical people wouldn’t know that and most programmers wouldn’t bother. The entry bar in Python is a lot lower unfortunately.

What Does Python Do Instead?

This stack overflow post says the following which helps shed some light on the situation.

It’s cultural. In Python, you don’t write to other classes’ instance or class variables. In Java, nothing prevents you from doing the same if you really want to – after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.

If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they’re not easily visible to code outside the class that contains them (although you can get around it if you’re determined enough, just like you can get around Java’s protections if you work at it).

By the same convention, the _ prefix means stay away even if you’re not technically prevented from doing so. You don’t play around with another class’s variables that look like __foo or _bar.

So… basically, python’s style guide, PEP-8, suggests using “_xyz” for internal identifiers and “__xyz” (with 2 underscores) for private identifiers. Private identifiers are name-mangled with the class name (so __xyz inside a class Foo is actually stored as _Foo__xyz), which means people won’t stumble onto them by accident… but they’re not really private. People can still probe around, find them, and use them if they are determined.

Again, in Java you could go use reflection to crack open a private class… so while I’m a little annoyed at this in Python, it is true that it’s not really terribly different from a real security standpoint.
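
To show how thin this protection is, here is a small sketch. The Connection class and its values are invented for the example and have nothing to do with my actual notebook.

```python
class Connection:
    def __init__(self):
        # Single underscore: "internal" by convention only.
        self._host = "db.example.internal"
        # Double underscore: stored under the mangled name _Connection__password.
        self.__password = "s3cret"


conn = Connection()

# The single-underscore attribute is directly readable.
host = conn._host

# The double-underscore name does not exist as written outside the class...
try:
    conn.__password
    plain_access_worked = True
except AttributeError:
    plain_access_worked = False

# ...but the mangled name is easy to discover and use.
attribute_names = sorted(vars(conn))
password = conn._Connection__password

print(attribute_names)  # ['_Connection__password', '_host']
```

So a curious notebook user only needs vars() or dir() to find the “private” value.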

Final Thoughts

So… it seems that if you want to use real secrets (like database connection details), you have to put them in a separate application on a separate server behind an API. That way the user (especially in an iPython context) is completely decoupled from the information as a whole.