Simple Flask Web Service With Debugging

I needed to make a quick Python web service and soon found that, of the two most common frameworks, Flask is intended to be minimalist and extensible while Django is intended to be batteries-included out of the box and somewhat opinionated (it ships with an ORM, etc.).

Since I wanted to get something running fast and basically just needed a REST API to return a couple of result sets from a database, I chose to go with Flask.

Simple Example

The Flask documentation itself has a wonderful example here: http://flask.pocoo.org/docs/0.12/quickstart/#a-minimal-application which I followed along with… but here are the Cliff's Notes:

First, set up a new virtual environment (PyCharm is a good IDE and offers this as an option when creating a new project).

Create a new file at the top-level of your project and enter this code:

from flask import Flask, request

app = Flask(__name__)

# Respond to GET requests at /query/run by echoing back the query-string parameters.
@app.route('/query/run', methods=['GET'])
def run_query():
    query = request.args.get('query')
    limit = request.args.get('limit')
    return f"You ran {query} with {limit} records."


if __name__ == "__main__":
    # debug=True turns on the interactive debugger and auto-reload (local use only).
    app.run(debug=True)

Run the application with PyCharm, or just with your Python interpreter.

At this point, you should have a running flask application, and if you go to:

localhost:5000/query/run?query="select * from abc"&limit=5

in your browser, you should see the response. You’ve also turned on the interactive debugger and auto-reloader (debug mode can be a security issue, so only do this locally). The reloader means that if you make changes to your file, Flask will reload the app and use them immediately. So, if you change “abc” to “abcdef”, save, and refresh the browser, you’ll see the changes.

Note that this is only available locally for now, as we’re focused on developing the application, not on deploying it yet. So, if you want to show it to someone else from your desktop, you will have to tell Flask to expose the service on all interfaces (0.0.0.0), as sketched below.
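Here’s a minimal sketch of that change (keeping debug=True here is an assumption for a quick local demo; drop it anywhere untrusted):

if __name__ == "__main__":
    # Bind to all interfaces so other machines on the network can reach the service.
    # Never combine host='0.0.0.0' with debug=True outside a trusted network.
    app.run(host='0.0.0.0', port=5000, debug=True)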

No Private Variables / Methods in Python (Jupyter & iPython)!?

Having coded for a long time and in a relatively large number of languages, I was a little panicked to realize that Python doesn’t have private variables or methods.

Some Context

When I came across this fact, I was trying to write an IPython notebook for use by others.  You really can’t secure things in IPython as users can execute arbitrary code, but I thought that you could at least make it relatively hard to break by storing things in a module and loading them.  But even this doesn’t work because nothing is private in Python – a user could just interrogate my classes and get right to the connection-information variable (even with little to no knowledge of programming).

In Java, you can technically get around private variables by cracking open classes with reflection… but non-technical people wouldn’t know that and most programmers wouldn’t bother.  Unfortunately, the bar is a lot lower in Python.

What Does Python Do Instead?

This Stack Overflow post says the following, which helps shed some light on the situation:

It’s cultural. In Python, you don’t write to other classes’ instance or class variables. In Java, nothing prevents you from doing the same if you really want to – after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.

If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they’re not easily visible to code outside the class that contains them (although you can get around it if you’re determined enough, just like you can get around Java’s protections if you work at it).

By the same convention, the _ prefix means stay away even if you’re not technically prevented from doing so. You don’t play around with another class’s variables that look like __foo or _bar.

So… basically, Python’s style guide, PEP-8, suggests using “_xyz” for internal identifiers and “__xyz” (with 2 underscores) for private identifiers.  Double-underscore identifiers get name-mangled (e.g., __foo in class MyClass becomes _MyClass__foo), so code outside the class won’t stumble into them by accident… but they’re not really private.  People can still probe around, find them, and use them if they are determined, as the sketch below shows.
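Here’s a minimal sketch of the mangling in practice (the class and attribute names are made up for illustration):

class Database:
    def __init__(self):
        self._internal = "stay away by convention"  # single underscore: convention only
        self.__secret = "mangled, not private"      # double underscore: name-mangled

db = Database()
print(db._internal)          # works fine; nothing actually stops you
# print(db.__secret)         # AttributeError: the plain name doesn't exist outside the class
print(db._Database__secret)  # the mangled name is still reachable if you know it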

Again, in Java you could go use reflection to crack open a private class… so while I’m a little annoyed at this in Python, it’s not terribly different from a real security standpoint.

Final Thoughts

So… it seems that if you want to use real secrets (like database connection details), you have to put them in a separate application on a separate server behind an API.  That way the user (especially in an IPython context) never has access to the secrets at all.
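For example, a notebook could call something like the Flask service from the first section instead of holding credentials itself (a rough sketch; the host name is made up):

import requests

# The notebook only knows the API's URL, never the database credentials,
# which live on the server behind the /query/run endpoint.
resp = requests.get(
    "http://internal-api:5000/query/run",
    params={"query": "select * from abc", "limit": 5},
)
print(resp.text)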

Docker + Windows “Error starting userland proxy”

Docker Start Error

I ran into a new Docker issue today.  Basically, I restarted my PC, and when I tried to bring up a container with a Postgres instance I use for testing, I received this confusing error:

Error response from daemon: driver failed programming external connectivity on endpoint postgres (15b348b1f5bf8d2bfd17c1c41b340d1c66f63ace7cab39ea69aeca3f69ed7442): Error starting userland proxy: mkdir /port/tcp:0.0.0.0:5432:tcp:172.17.0.2:5432: input/output error
Error: failed to start containers: postgres

What Does it Mean?

It turns out this is a big headache which is still unresolved, and which has one of the longer GitHub issue threads I’ve ever seen right here.

Here’s a summary of it:

  • Windows 10 has a “Fast Start Up Mode”, and Docker doesn’t play well with it (or vice versa).
  • So, after a restart, you may find that you see this issue.
  • Theoretically, restarting the Docker daemon fixes this (which is a little annoying, but fine).  You should be able to do that in Services.
  • This didn’t help me on the first try, personally.  So, I went and disabled Fast Start mode (which is also annoying) by:
    • Go to start and type “Power and Sleep”, click it when it pops up.
    • Click “Additional power settings” on the right.
    • Click “Choose what the power buttons do”.
    • Click “Change settings that are currently unavailable” and log in if you can’t already toggle the “Turn on fast startup (recommended)” checkbox.
    • Turn off that checkbox.

Note that once you reboot, you have to wait a bit for Docker to come up (it can take a few minutes).  For example, the first 4 or 5 times I ran “docker version”, the daemon showed as down even though I could see the service running.  But a minute later it was up and working fine.

Database Star Schemas and Snowflake Schemas

Schema Confusion

A lot of people work with databases very regularly (even high-end ones), but get thrown by terms like star schema, snowflake schema, etc. due to a lack of formal training or exposure to data warehousing technologies.

These same people will often be perfectly comfortable with indexing, query optimization, foreign keys, concepts of de-normalization and normal forms, etc.

I personally started working with the actual “Snowflake” database recently https://www.snowflake.com/about/ and had to review what a snowflake schema was when I started looking at it.

Useful Articles

I found an interesting article on star schemas vs. snowflake schemas pretty quickly, and backtracked it to precursor articles digging into the star and snowflake schemas respectively.  I’m just going to paraphrase them below to give people a quick overview and/or refresher.

Star Schema

A star schema just means that your main (“fact”) table has a primary key made out of multiple columns, each of which is a foreign key to a “dimension” table.  Then you have one or more “fact” columns (measures) in addition to the primary key.

The dimension columns will be all the relevant attributes you may want to aggregate and/or query the main table on.  For example, you might have a table for the date which breaks out the year, month, day, and day-of-week so they can be directly used.  You may then have another dimension table for the geographical region with columns for the continent, country, and city, for example, so you can aggregate on those.

Each dimension table is NOT normalized though.  So, if you have “New York City” as the city for 1 million rows, you are literally repeating that a million times.  This makes queries easy to write but has a penalty in terms of data storage (which can be bad if you’re, say, in the cloud and paying more for storage over time).  See the sketch after this paragraph.
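As a concrete sketch (the table and column names here are made up for illustration), a star schema for sales data might look like this in SQLite:

import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: one row per date / region, with flattened attributes.
conn.executescript("""
CREATE TABLE dim_date (
    date_id     INTEGER PRIMARY KEY,
    year        INTEGER, month INTEGER, day INTEGER, day_of_week TEXT
);
CREATE TABLE dim_region (
    region_id   INTEGER PRIMARY KEY,
    continent   TEXT, country TEXT, city TEXT  -- repeated for every region row
);

-- Fact table: foreign keys to each dimension plus the measure columns.
CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    region_id   INTEGER REFERENCES dim_region(region_id),
    units_sold  INTEGER,
    revenue     REAL,
    PRIMARY KEY (date_id, region_id)
);
""")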

Snowflake Schema

Plain and simple: a snowflake schema is a star schema where the dimension tables are normalized.  This means that, for example, the geographical region dimension table itself would actually be turned into 4 tables (kind of its own star schema).  You would have one table for the continent, one for the country, one for the city, and one main table with the combination of the 3 as its primary key.

This makes queries more complex and possibly a little slower, but it means we have complete normalization and are not wasting any data storage.  Also, if, say, a city changed its name, we would have exactly one database cell to update, whereas in a star schema we would have to update potentially millions of rows with copies of that name.  See the sketch below.
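Continuing the hypothetical sketch from the star-schema section (same sqlite3 connection), snowflaking the region dimension might look like this:

# Replace the flat dim_region with one table per attribute plus a linking table,
# per the description above.
conn.executescript("""
CREATE TABLE dim_continent (continent_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_country   (country_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_city      (city_id      INTEGER PRIMARY KEY, name TEXT);

-- The region dimension now just combines the three by key; renaming a city
-- touches exactly one row in dim_city.
CREATE TABLE dim_region_normalized (
    continent_id INTEGER REFERENCES dim_continent(continent_id),
    country_id   INTEGER REFERENCES dim_country(country_id),
    city_id      INTEGER REFERENCES dim_city(city_id),
    PRIMARY KEY (continent_id, country_id, city_id)
);
""")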

Why the Names?

If you think of a “Star Schema”, picture a main table with, say, 5 extra dimension tables around it like the 5 points of a star.  Makes sense, right?

Now, for a snowflake, picture each point being 5 tables by itself… so each point is its own star.  This starts to branch out like a snowflake.  Just think of fractals if you don’t believe me :).

Jupyter Contributor Extensions – Auto Run + Hide Code

My Problem

I’ve been messing around with Jupyter quite a bit, trying to make a nice notebook for people who are not necessarily coders.  So, it would be nice to give them a mostly graphical notebook with widgets and just let them play with the results at the end.

Unfortunately, out of the box you cannot properly auto-run cells in Jupyter notebooks, and the hacks are brittle.  You also cannot easily hide code, etc.

The Solution?

Thankfully, while these features are not built into Jupyter for some reason, it turns out there are a ton of contributor extensions to Jupyter!  For example, if you need to reliably auto-run cells on start-up, you can install the init_cell plugin, and if you need to hide code, the hide_input plugin.

The installation is actually very easy and can be done in a few bash commands as shown below, and there are a ton of other plugins around that you can use as well.  You can even manage the plugins from within the UI after you set them up.

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --system
jupyter nbextension enable init_cell/main
jupyter nbextension enable hide_input/main

To use these, just read the links (or the documentation inside the UI at Edit > NB Extensions Config).  You’ll find that there is just a cell metadata attribute you can add for each cell you want to affect, as sketched below.  You can enable and disable the plugins in the UI as you like too.
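For instance, here’s a minimal sketch that tags a cell programmatically with nbformat (the notebook file name is made up; the metadata keys are the ones these two plugins look for):

import nbformat

# Load an existing notebook (hypothetical file name).
nb = nbformat.read("dashboard.ipynb", as_version=4)

# Mark the first cell to auto-run on load and hide its source.
nb.cells[0].metadata["init_cell"] = True   # read by the init_cell plugin
nb.cells[0].metadata["hide_input"] = True  # read by the hide_input plugin

nbformat.write(nb, "dashboard.ipynb")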