JupyterHub – NGINX TLS Termination – Ubuntu

Due to corporate security requirements, I just had to ensure I had TLS both between clients and my load balancer as well as TLS from the load balancer to the back-end application (JupyterHub).

This was a little problematic because I was using a real certificate, so I had intentionally terminated TLS at the load balancer for cost reasons.  So, I used a self-signed certificate between the load balancer and the back-end just now.

If you use this GitHub gist right here for your nginx config, and you modify the certificate paths to point to files you generate from this digital-ocean tutorial, it works out just fine.  Then you just have to point your load balancer to point 443 on your JuptyerHub host(s) and everything works out great.

Here’s an excerpt of the relevant parts of the digital-ocean tutorial. Once you make the files, you can just update the gist yourselves to use them. The Diffie-Hellman group line is not in the gist; so add that yourself based on the digital-ocean one if you are so inclined.

mkdir -p /etc/nginx/ssl && cd /etc/nginx/ssl;
sudo chmod 700 /etc/nginx/ssl
sudo openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout nginx-selfsigned.key -out nginx-selfsigned.crt

Generally fill out all the details for the cert normally, but pay extra attention to common name. This should match your DNS name (e.g. env.yoursite.com). If you deploy to multiple environments and this is an internal app/etc, you may consider *.yoursite.com to avoid needing one per environment).

Once you’re done that, also run the following to create a “strong Diffie-Helman group”. Refer to digital-ocean’s link for this one; I honestly didn’t have the time to look into why this is needed yet.

sudo openssl dhparam -out dhparam.pem 2048

JupyterHub or JupyterLab – Back Up All User Docker Container Files

If, like me, you deployed the JupyerHub docker spawner without persistent volumes and then ended up with tons of users having tons of content to lose, this may help you.

This short bash script will list all containers and then copy out their contents and zip it up to save space.

#!/bin/bash

mkdir -p ~/notebook-backup
cd ~/notebook-backup

CONTAINERS=`docker container ls | grep -v PORTS | awk '{print $14}'`
for NAME in ${CONTAINERS}
do
  echo "Copying out files for ${NAME}; this may take a minute."
  docker cp ${NAME}:/home/notebook ./${NAME}

  echo "Zipping files for ${NAME}."
  tar -zcvf ${NAME}.tar.gz ${NAME}

  echo "Removing source files for ${NAME}."
  rm -rf ./${NAME}
done

 

Shut Down All Docker Containers Based on Internal Analysis – JupyterHub Example

I manage a few decent sized Jupyter Hub environments based on the docker spawner.  Each frequently has more than 50 users, sometimes much more…. and recently, one of the servers ran out of memory.

I have some read-only notebooks inside the user containers… so I figured that if a user only had those read only notebooks, I could shut down their docker containers.  They weren’t doing any work that could be lost.

So, I wrote this script to:

  1. List all docker containers.
  2. Get their names.
  3. Exec a bash command in them.
  4. Shut them down based on the result.

I hope it helps you with a similar docker-related issue! 🙂

CONTAINERS=`docker container ls | awk '{print $14}'`
for NAME in ${CONTAINERS}
do
  #echo $name
  COUNT=`docker exec ${NAME} ls -a | grep .ipynb | grep -v checkpoints | wc -l`
  if [[ $COUNT = 1 ]];
  then
    echo "Stopping $NAME with COUNT = $COUNT."
    docker container stop $NAME;
  fi
done

No Private Variables / Methods in Python (Jupyter & iPython)!?

Having coded for a long time and in a relatively large number of languages, I was a little panicky to realize that Python doesn’t have private variables and/or methods.

Some Context

When I came across this fact, I was trying to write an iPython notebook for use by others.  You really can’t secure things in iPython as users have access to execute arbitrary code, but I thought that you could at least make it relatively hard to break by storing things in a module and loading them.  But even this doesn’t work because nothing is private in Python – a user could just interrogate my classes and get right to the connection information variable (even if they had little to no knowledge of programming).

In Java, you can technically get around private variables by cracking open classes with reflection… but non-technical people wouldn’t know that and most programmers wouldn’t bother.  The entry bar in Python is a lot lower unfortunately.

What Does Python Do Instead?

This stack overflow post says the following which helps shed some light on the situation.

It’s cultural. In Python, you don’t write to other classes’ instance or class variables. In Java, nothing prevents you from doing the same if you really want to – after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.

If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they’re not easily visible to code outside the class that contains them (although you can get around it if you’re determined enough, just like you can get around Java’s protections if you work at it).

By the same convention, the _ prefix means stay away even if you’re not technically prevented from doing so. You don’t play around with another class’s variables that look like __foo or _bar.

So… basically, python’s style guide, PEP-8, suggests using “_xyz” for internal identifiers and “__xyz ” (with 2 underscores) for private identifiers.  Private identifiers will be name-mangled outside the module, so people won’t know what they’re using… but they’re not really private.  People can still probe around, find them, and use them if they are determined.

Again, in Java you could go use reflection to crack open a private class… so while I’m a little annoyed at this in Python, it is true that it’s not really terribly different from a real security standpoint.

Final Thoughts

So… it seems that if you want to use real secrets (like database connection details), you have to put them in a separate application on a separate server behind an API.  That way the user (especially in an iPython context) is completely decoupled from the information as a whole.

Jupyter Contributor Extensions – Auto Run + Hide Code

My Problem

I’ve been messing around with Jupyter quite a bit trying to make a nice notebook for people that are not necessarily coders.  So, it would be nice to give them a mostly graphical notebook with widgets and just let they play with the results at the end.

Unfortunately, you cannot auto-run things in Jupyter notebooks properly, and the hacks are brittle.  You also cannot hide code easily, etc.

The Solution?

Thankfully, while these features are not built into Jupyter for some reason, there are a ton of contributor extensions to Jupyter it turns out!  For example, if you need to reliably auto-run cells on start-up, you can install the init_cell plugin and the hide_input plugin.

The installation is actually very easy and can me done in a few bash commands as shown below, and there are a ton of other plugins around that you can use as well.  You can even manage the plugins from within the UI after you set them up.

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --system
jupyter nbextension enable init_cell/main
jupyter nbextension enable hide_input/main

To use these, just read the links (or the documentation inside the UI at Edit > NB Extensions Config).  You’ll find that there is just a cell metadata attribute you can add for each cell you want to affect.  You can enable and disable the plugins in the UI as you like too.