Read-only / Protected Jupyter Notebooks

Jupyter notebooks are fantastic, but they’re really geared toward developers.  I had to lock one down so it could be used by non-developers (without them damaging it).  It took quite a lot of googling!

I figure that a lot of people must need this.  If I were a university instructor, I’d like to send students to a server, let them play, but prevent them from breaking my example.

Locking Things Down

Here are the different things I did to mitigate damage:

  1. You can make the entire notebook read-only by setting the file permissions on the command line or in the file’s properties (this works for both Windows and Linux; see the example command just after this list).  Jupyter will detect this (after you reload the page) and show that you can’t save anything.
  2. You can make individual cells un-deletable and un-editable (so users don’t mess up the top cells that the cells lower down depend on):
    • Run a cell.
    • Click “Edit Metadata” in its banner.
    • Add:
      • "deletable": false,
      • "editable": false,
  3. You can hide the code for a range of cells, as in the code below, which hides the first four (very useful if you’re using IPython UI widgets and just want to show the widgets and not how they were made).  Disclaimer: I got this off Stack Overflow but am having trouble finding the original answer to reference at the moment:
from IPython.display import HTML
HTML('''
<script>
    // Hide the first four code cells (and the move buttons) when the notebook loads.
    code_show = true;
    function code_toggle_and_hide_move_buttons() {
        if (code_show) {
            $('div.input').slice(0, 4).hide();
            $('#move_up_down').hide();
        }
        else {
            $('div.input').slice(0, 4).show();
            $('#move_up_down').show();
        }
        code_show = !code_show;
    }
    $(document).ready(code_toggle_and_hide_move_buttons);
</script>
''')
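
For item 1, this is the kind of command I mean on Linux (the notebook path is just a placeholder); on Windows you can tick “Read-only” in the file’s Properties dialog or use “attrib +R” instead:

# Remove write permission for everyone so Jupyter treats the notebook as read-only.
chmod a-w /srv/notebooks/example.ipynb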

Further Recommendations

I would also suggest disabling the terminal in your Jupyter config file, and setting a known location for your notebooks to be loaded from so that you can add the read-only attribute to them (see the sketch below).
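
As a rough sketch, assuming the classic notebook server (these option names live in jupyter_notebook_config.py and may differ for JupyterLab or newer releases):

# jupyter_notebook_config.py (generate one with: jupyter notebook --generate-config)
c = get_config()

# Disable the built-in terminal so users can't get a shell on the server.
c.NotebookApp.terminals_enabled = False

# Serve notebooks from a fixed, known directory so the files can be marked read-only.
# (The path is just a placeholder.)
c.NotebookApp.notebook_dir = '/srv/notebooks'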

Also, you can disable various hotkeys in the UI, and you can use a CSS selector similar to the one in my “hide code” example above to hide the move-up/move-down cell buttons, to help prevent errors creeping in that way.
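
For example, here is a minimal sketch using the same IPython.display.HTML trick to remove a couple of destructive command-mode shortcuts and hide some toolbar buttons.  The shortcut names and CSS selectors are assumptions based on the classic Notebook UI, so inspect your own Jupyter version (e.g. with the browser dev tools) before relying on them:

from IPython.display import HTML
HTML('''
<script>
    // Remove the "delete cell" (d,d) and "cut cell" (x) command-mode shortcuts.
    Jupyter.keyboard_manager.command_shortcuts.remove_shortcut('d,d');
    Jupyter.keyboard_manager.command_shortcuts.remove_shortcut('x');
</script>
<style>
    /* Hide the move-up/move-down toolbar buttons (verify these selectors in your UI). */
    button[title="move selected cells up"],
    button[title="move selected cells down"] { display: none; }
</style>
''')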


Set Up Google Search Console (Webmaster Tools) on WordPress

You can now integrate your WordPress site into Google Search Console (Webmaster Tools) very easily.

Just do this:

  1. Go to Google Search Console: https://search.google.com/search-console.
  2. If you haven’t set it up yet, it will have you add a property (website) to your account.
  3. To add a property, you just provide the URL of your site (the one users end up at, not the one you use to manage WordPress in the background).
  4. After this, you will be presented with multiple options for verifying the site.  One of them will be adding a meta tag.  Click that one and copy the “content” string from the tag (the part between the quotes after content= in the <meta name="google-site-verification" content="..." /> tag).
  5. Go to your WordPress site manager (the place you change things).  On the left bar, you can find “WP Admin”.  You can also probably just go to https://<your-site-name>.wordpress.com/wp-admin/ directly.
  6. Click “Tools” then scroll down to “Google Webmaster Tools”, and add the content string there.
  7. Save.
  8. Go back to Google Search Console, refresh, and you should be able to see your website verified.  This might take a little while, but it was pretty instant for me.

Note that it can take a few days for Google to scan your website and build the necessary data to fully populate the UI and make it useful.

Python PIP Install Local Module While Developing

I’m definitely still in the early stages of learning module/package building and deployment for Python.  So, take this with a grain of salt…

But I ran into a case where I wanted to develop/manipulate a package locally in PyCharm while I was actively using it in another project I was developing (actually, in a Jupyter notebook).  It turns out there’s a pretty cool way to do this.

Module Preparation

The first thing I had to do was prepare the package so that it was deployable using the standard python distribution style.

In my case, I just made a directory for my package (lower-case name, underscore separators).  Inside the directory, I created the standard packaging files (a setup.py plus the package source itself).

Here’s an example.  Ignore everything that I didn’t mention; all of that is auto-generated by PyCharm and not relevant.  In fact, it probably would have been better to create a sub-directory in this project for the package, but I just treated the top-level directory as the package directory for now.

(Screenshot: package directory layout in PyCharm.)
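
For reference, here is roughly what a minimal setup.py for this could look like.  The name and version match the pip output shown below, but the rest is a generic sketch rather than my exact file:

# setup.py -- minimal sketch for an editable (pip install -e) package
from setuptools import setup, find_packages

setup(
    name='postgres-query-runner',
    version='1.1.0',
    packages=find_packages(),
    install_requires=[],  # list real dependencies here (e.g. a Postgres driver)
)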

Module Installation

Once you have your module set up like this, you can jump into your command line, assuming you have PIP installed, and you can run this command (tailored for your package directory location):

λ pip install -e C:\dev\python\jupyter_audit_query_tools
Obtaining file:///C:/dev/python/jupyter_audit_query_tools
Installing collected packages: PostgresQueryRunner
Running setup.py develop for PostgresQueryRunner
Successfully installed PostgresQueryRunner

You’ll also be able to see the package mapped to that directory when you list the packages in PIP:

λ pip list | grep postgres
postgres-query-runner 1.1.0 c:\dev\python\jupyter_audit_query_tools

Module Usage

After this, you should be able to import and use the package / modules in your interpreter or notebook.  You can change the code in the package and it will update in the places you’re using it assuming you re-import the package.  So, in Jupyter, this would mean clicking the restart-kernel/re-run button.
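
If restarting the kernel gets tedious, another option is IPython’s autoreload extension, which re-imports changed modules automatically on each cell execution.  This is a general IPython feature rather than something specific to my setup, and the import name below is just a placeholder:

# Run these in a notebook cell before importing the package under development.
%load_ext autoreload
%autoreload 2

import postgres_query_runner  # placeholder import name; edits to the -e installed package get picked up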

Docker Run Postgres, Expose to Localhost

I needed to spin up a Postgres database for testing a new application, so I figured I’d do it via Docker to keep my system clean.

So, the plan is to develop an application on my PC / localhost (e.g. in PyCharm), but connect to the Postgres instance within the Docker container.

Getting & Running Postgres

This is actually quite trivial:

docker pull postgres
docker run --name postgres -e POSTGRES_PASSWORD=password -d -p 5432:5432 postgres

The first command pulls the image from Docker Hub, and the second one runs the container and exposes the Postgres port externally (to the same port number) so that we can communicate with it from our local host.

Setting Up pgAdmin

The pgAdmin utility is a wonderful UI for working with Postgres.  You can download it here: https://www.pgadmin.org/download/.

In my case, I actually want to verify that connecting to Postgres works from outside of the container environment.  So, I chose to install the Windows version locally to help verify this.  If you are so inclined, they have Docker images as well, so you can install pgAdmin in a container and keep your system clean.

Once you’ve installed pgAdmin, you can open it, go to the browser on the left panel, right click on “Servers”, and add a new one targeting “localhost” and port “5432”.

Once you do that and open it, you should hopefully be able to see monitoring statistics on it.  Then you can create a new database and work with it at will!
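
To sanity check the same thing from code (e.g. from your application in PyCharm), a minimal connection test might look like the following.  I’m assuming the psycopg2 driver here, plus the default postgres user and database and the password from the docker run command above:

# Quick connectivity test against the containerized Postgres from the host.
import psycopg2

conn = psycopg2.connect(
    host='localhost',
    port=5432,
    user='postgres',
    password='password',  # value of POSTGRES_PASSWORD in the docker run command
    dbname='postgres',     # default database created by the official image
)

with conn.cursor() as cur:
    cur.execute('SELECT version();')
    print(cur.fetchone())

conn.close()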

Persistent Data

Remember, if you delete your docker container, the data for the database will go away.  You can stop it and start it as much as you like, though.  If you need to be able to delete the container and still keep the data, look into using a volume in Docker (this is what they’re for).
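
As a rough sketch of what that looks like (the volume name is arbitrary; /var/lib/postgresql/data is where the official postgres image stores its data):

# Create a named volume and mount it over the image's data directory.
# (Remove the old container first if you reuse the --name.)
docker volume create pgdata
docker run --name postgres -e POSTGRES_PASSWORD=password -d -p 5432:5432 -v pgdata:/var/lib/postgresql/data postgres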


Minikube – Setup Kubernetes UI

If you have not yet installed Minikube, refer to this blog first: https://coding-stream-of-consciousness.com/2018/11/08/installing-minikube-on-windows-10/.

Assuming you have Minikube running, open another command prompt (I used an administrator one).  Then run:

kubectl proxy

Once that is up, you can go to this link: http://localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/#!/node?namespace=default and the kubernetes UI should be available.

The Kubernetes documentation may recommend a different link that does not work. I believe this is probably because Minikube is out of date compared to the current Kubernetes releases; but maybe it is just a documentation error.

For the record in case it changes in the future, the link from the Kubernetes documentation is: http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/.

Installing Minikube on Windows 10

Installing Minikube on Windows 10 was not as straightforward as I had expected.  I tried it at home and gave up originally (instead using https://labs.play-with-k8s.com/).

I wanted to get it set up properly for a new job though, so I worked through the issues.  Please note, in case your issues are different from mine, that these links are extremely enlightening:

Much of the online documentation I found was outdated.  The actual GitHub repository was fine though, so I recommend using the instructions there: https://github.com/kubernetes/minikube.  Mine are based on those and the previously noted links.

Before We Start – If You’re Already in a Bad State

If you’ve already started installing Minikube and it failed, or if you have been hacking at this with my instructions and are stuck, you may end up needing to delete the C:\users\<user-name>\.minikube and C:\users\<user-name>\.kube directories to clean things up.  I had to do this as I messed up the virtual switch set-up originally.  Don’t do this unless you actually get stuck though.

You can’t do this folder clean-up if the Hyper-V minikube VM is still running (you can see that in the Hyper-V manager).  If you try to delete the directories while it’s running, the delete will only partially remove files, and then you’re in a state where you can’t even stop the VM on purpose!  To fix that, you need to kill the processes related to the VM and set it to not automatically start up in the Hyper-V manager.  Then you can remove the VM and try again.

Some detailed information on how to kill a hung Hyper-V VM is here: http://woshub.com/how-to-stop-a-hung-virtual-machine-on-hyper-v-2016/. This was painful to fix.

The Minikube start command will hang if you’re in these bad states, so you really do need to delete those directories to get it set up cleanly.

Warning: Shut Down Issues

I found after I had this all working that (1) my PC was having trouble restarting (it would never stop), and (2) if I came back and tried to do “minikube stop”, it would hang forever.  It turns out the current version of minikube has issues noted here: https://github.com/kubernetes/minikube/issues/2914.  So, to shut it down you need to do “minikube ssh” and then “sudo poweroff”.  Even trying to manage it from Hyper-V will not work properly.  They recommend downgrading to v0.27; the version I saw the issue with is v0.30.  I haven’t tried the downgrade yet.  So, these instructions will get it working with v0.30, but you will potentially have this shut-down issue.

For now, I’m personally just going to stay with v0.30, but I told Hyper-V NOT to automatically start the minikube VM when my PC starts up.  If I end up using this often and they don’t fix the issue, I may downgrade at a later date.

Installation Steps

Here is the full set of steps I had to use to get Minikube correctly running on my Windows 10 desktop:

  • Enable the “Hyper-V” Windows feature (I also enabled Containers and Windows Hypervisor Platform; they probably aren’t needed though).
  • Go into Hyper-V manager (search your start menu) and:
    • Click Virtual Switch Manager
    • Create a new virtual switch of type “External”.
    • Name it “External”.
    • Set it to use the “External network” and pick the NIC you commonly use.
    • Press OK.
  • Get an administrative command prompt open.
  • Install https://chocolatey.org/ if you don’t already use it; it’s like yum or apt-get in Linux.
  • choco install minikube
  • choco install kubernetes-cli
  • minikube start --vm-driver "hyperv" --hyperv-virtual-switch "External" --v=7
    • Note that we’re telling it to use Hyper-V and the switch we created.
    • We’re also setting a very verbose log level.

If you have issues after this, you may want to clean up and try again with the information I provided before the steps.

Validation

Let’s try to launch something to make sure it’s working (with instructions copied from the GitHub link).

  • Launch the “hello-minikube” sample image as a pod.
    • kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080
  • Expose it as a service.
    • kubectl expose deployment hello-minikube --type=NodePort
  • Verify the pod is up and running (wait if it is still starting).
    • kubectl get pod
  • Ask Kubernetes for the service URL:
    • minikube service hello-minikube --url
  • Hit that URL in your browser or with curl (you should get back a bunch of text with CLIENT VALUES at the top of it).
  • Hopefully all that is working.  So, let’s remove the service and pod we deployed:
    • kubectl delete service hello-minikube
    • kubectl delete deployment hello-minikube
  • Now you have Minikube running, and it’s a clean copy.

I hope that helps you get started!

User Interface

Refer to this blog for the quick user-interface setup instructions: https://coding-stream-of-consciousness.com/2018/11/08/minikube-set-up-kubernetes-ui/.

Doing Upgrades

You will want to update minikube and the Kubernetes CLI after a while.  You can see their versions with the status command, and then you can update them easily with Chocolatey:

  • minikube status
  • choco upgrade kubernetes-cli
  • choco upgrade minikube
  • minikube update-context


Install Airflow on Windows + Docker + CentOS

Continuing on my journey: setting up Apache Airflow on Windows directly was a disaster for various reasons.

Setting it up in the WSL (Windows Subsystem for Linux) copy of Ubuntu worked great.  But unfortunately, you can’t run services/etc properly in that, and I’d like to run it in a state reasonably similar to how we’ll eventually deploy it.

So, my fallback plan is Docker on Windows, which is working great (no surprise there).  It was also much less painful to set up in the end than the other options.  I’m also switching from Ubuntu to CentOS (the non-enterprise version of RHEL), as I found out that Airflow ships systemd service files tested on Red Hat based systems, documented here: https://airflow.readthedocs.io/en/stable/howto/run-with-systemd.html.

Assuming you have docker for Windows set up properly, just do the following to set up Airflow in a new CentOS container.

Get and Run CentOS With Python 3.6 in Docker

docker pull centos/python-36-centos7
docker container run --name airflow-centos -it centos/python-36-centos7:latest /bin/bash

Install Airflow with Pip

pip install --upgrade pip
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow

Set up Airflow

First, install vim so you can edit files.  Yes… this is Docker, so the images are hyper-stripped down to contain only the essentials; you have to install anything extra yourself.  I think I had to connect to the container as root to do this, using this command:

docker exec -it -u root airflow-centos /bin/bash

Then you can just install with yum fine. I’m not 100% sure this was needed, so feel free to try it as the normal user first.

yum install vim

I jumped back into the normal user after that (by removing the -u root from the command above).

Then set up Airflow’s home directory and database.

  • Set the Airflow home directory (permanently for the user).
    • vi ~/.bashrc and add this to the bottom of the file.
      • export AIRFLOW_HOME=~/airflow
    • Then re-source the file so you can use it immediately:
      • source ~/.bashrc
  • Initialize the Airflow database (we just did defaults, so it will use a local SQLite DB).
    • airflow initdb

Then verify the install worked by checking its version:

root@03bae42c5cdb:/# airflow version
[2018-11-07 20:26:44,372] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
v1.10.0

Run Airflow Services

The actual Airflow hello world page here: https://airflow.apache.org/start.html just says to run Airflow like this:

  • airflow webserver -p 8080
  • airflow scheduler

You probably want to run these in the background and tell the logs to go to a file, etc.
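
For example, a quick-and-dirty way to do that from the same shell (nothing Airflow-specific here, just standard nohup and output redirection; the log paths are arbitrary):

# Run the web server and scheduler in the background, logging to files under $AIRFLOW_HOME.
nohup airflow webserver -p 8080 > ~/airflow/webserver.log 2>&1 &
nohup airflow scheduler > ~/airflow/scheduler.log 2>&1 &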

It’s more professional just to run it as a service (on CentOS/RHEL which is why I switched to CentOS from Ubuntu).  But it turns out that running it as a service in Docker is tricky.

Even if you get everything set up properly, Docker by default enables/disables some features for security that make systemctl not work (so you can’t start the service).  It sounds like getting around this requires a fair amount of rework (read here: https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted).

Also, I realize my idea may have been flawed in the first place (running it as a service in a container).  Containers are really intended to hold micro-services.  So, it would probably make more sense to launch the web server and the scheduler as their own containers and allow them to communicate with each other (I’m still figuring this out).  This thread nudged me into realizing that: https://forums.docker.com/t/systemctl-status-is-not-working-in-my-docker-container/9075.

It says:

Normally when you run a container you aren’t running an init system. systemctl is a process that communicates with systemd over dbus. If you aren’t running dbus or systemd, I would expect systemctl to fail.

What is the pid1 of your docker container? It should reflect the entrypoint and command that were used to launch the container.

For example, if I do the following, my pid1 would be bash:

$ docker run --rm -it centos:7 bash
[root@180c9f6866f1 /]# ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.7  0.1  11756  2856 ?        Ss   03:01   0:00 bash
root        15  0.0  0.1  47424  3300 ?        R+   03:02   0:00 ps faux

Since only bash and ps faux are running in the container, there would be nothing for systemctl to communicate with.


So, the below steps probably get it working if you set the container up right in the first place (as a privileged container), but it isn’t working for me for now.  So feel free to stop reading here and use Airflow, but it won’t be running as a service.

I might come back and update this post and/or make a future one on how to run Airflow in multiple containers.  I’m also aware that there is an awesome image that gets everything off the ground instantly, but I was really trying to get it working myself to understand it better: https://hub.docker.com/r/puckel/docker-airflow/.

---- service setup (not complete yet)

I found information on the Airflow website here: https://airflow.readthedocs.io/en/stable/howto/run-with-systemd.html stating:

Airflow can integrate with systemd based systems. This makes watching your daemons easy as systemd can take care of restarting a daemon on failure. In the scripts/systemd directory you can find unit files that have been tested on Redhat based systems. You can copy those to /usr/lib/systemd/system. It is assumed that Airflow will run under airflow:airflow. If not (or if you are running on a non Redhat based system) you probably need to adjust the unit files.

Environment configuration is picked up from /etc/sysconfig/airflow. An example file is supplied. Make sure to specify the SCHEDULER_RUNS variable in this file when you run the scheduler. You can also define here, for example, AIRFLOW_HOME or AIRFLOW_CONFIG.

I didn’t see much in the installation itself, so I found the scripts on GitHub for the 1.10 version we are running (based on the version output earlier):

https://github.com/apache/incubator-airflow/tree/v1-10-stable/scripts/systemd

Based on this, I: