Jupyter/Hub – Export Data to User (Not Local) PC

While building a mostly widget-based notebook for other people, I needed to let them export data from a pandas DataFrame to CSV.  This seemed trivial, but it was not.

What’s the Problem!?

I was building a notebook that was intended to run on Jupyter Hub… so, not on the same PC as the person using it. So, when I just saved the file, it was on the notebook server and the user could not access it.

Solutions?

My first thought was to have the notebook servers automatically set up a file server and just to save the files there.  Then the notebook could give users the URL to the file via the file server. I’m sure this would work, but it requires extra components and would require some clean-up of old files now and then.

While searching online, I found this solution which is much more elegant (though it will take a little extra memory). 

It base64-encodes the content and puts it in the link itself as a data URI, so the link actually contains all the data.  You can find the original article on Medium by clicking here; it has some other options as well.  I changed the display line, added an extra import, and altered some CSV generation arguments; aside from that, it is theirs.


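from IPython.display import HTML, display
import base64

def create_download_link(df, title="Download CSV file", filename="data.csv"):
    # Render the frame as CSV text with Windows line endings.
    # pandas 1.5+ spells this argument "lineterminator"; older versions used "line_terminator".
    csv = df.to_csv(index=False, lineterminator='\r\n')
    # Base64-encode the CSV so it can be embedded directly in the link.
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)

# Call the function with your data frame and display the resulting link.
display(create_download_link(your_data_frame))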
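
To get a feel for the extra memory mentioned above, here is a minimal sketch that builds the same kind of data URI from a plain string and compares sizes. The helper name `csv_to_data_uri` and the sample CSV are made up for illustration. Base64 turns every 3 bytes into 4 characters, so the link ends up roughly a third larger than the raw CSV.

```python
import base64

PREFIX = "data:text/csv;base64,"

def csv_to_data_uri(csv_text):
    """Hypothetical helper: embed CSV text in a data URI, like the link above does."""
    payload = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return PREFIX + payload

# Made-up sample data standing in for df.to_csv(...) output.
sample_csv = "a,b\r\n1,2\r\n3,4\r\n"
uri = csv_to_data_uri(sample_csv)

# Compare the raw CSV size with the size of the base64 payload.
raw_size = len(sample_csv.encode("utf-8"))
encoded_size = len(uri) - len(PREFIX)
print(raw_size, encoded_size)

# Decoding the payload gives back exactly the original CSV text.
decoded = base64.b64decode(uri[len(PREFIX):]).decode("utf-8")
```

For small exports this overhead is negligible, but for very large data frames the whole payload sits in the notebook page, which is where a file server would start to look more attractive.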

I hope it helps you!

What is Jupyter Hub?

First Things First… What is Jupyter?

Lately, I’ve been moving into the Python world, where I quickly encountered Jupyter notebooks.  They seem like a pretty dominant technology that lets you script Python block-by-block and render the results.  You can also render data into charts, manage user-interface widgets, and do almost anything else.

What is the Problem With Jupyter?

But Jupyter really just runs on a single machine.  What about when you want to share notebooks to, say, teach a class or work with a team of data scientists?

So… We Have Jupyter Hub!

Jupyter Hub is a multi-user version of Jupyter… so it fixes our problems! Here I’ll paraphrase content and use images from a wonderful video I watched on YouTube – you can watch it at the bottom of this post if you like.

Basically, Jupyter Hub just provides a higher level service to the standard Jupyter notebooks.  It contains:

  1. A proxy server to route requests.
  2. A “hub” which handles authentication, user details, and spawning new notebooks.  Authentication is flexible and can most likely tie into your corporate authentication system.
  3. Any number of spawned Jupyter processes to run notebooks for the given users.  A variety of spawning techniques exist (e.g. spawning to Docker).

You can see this architecture below.

[Image: Jupyter Hub architecture diagram]
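
As a rough mental model of the three pieces listed above, here is a toy sketch in plain Python. It is not Jupyter Hub code: the `ToyHub` class, the `route` function, and the addresses are all made up for illustration. The only behavior it borrows is that requests under `/user/<name>/` go to that user's own notebook server, and everything else goes to the hub.

```python
from dataclasses import dataclass, field

@dataclass
class ToyHub:
    """Stands in for the hub: knows the users and spawns their servers."""
    known_users: set
    servers: dict = field(default_factory=dict)  # username -> server address

    def spawn(self, username):
        # "Authentication": only known users get a notebook server.
        if username not in self.known_users:
            raise PermissionError(f"unknown user: {username}")
        # Hypothetical address; a real spawner would start a process or container.
        port = 9000 + len(self.servers)
        return self.servers.setdefault(username, f"http://127.0.0.1:{port}")

def route(hub, path):
    """Stands in for the proxy: pick a target for an incoming request path."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "user" and parts[1] in hub.servers:
        return hub.servers[parts[1]]
    # No running server for this path, so the hub handles it (login, spawning).
    return "hub"

hub = ToyHub(known_users={"alice", "bob"})
alice_server = hub.spawn("alice")
print(route(hub, "/user/alice/notebooks/demo.ipynb"))
print(route(hub, "/user/bob/tree"))
print(route(hub, "/hub/login"))
```

Here only alice has a spawned server, so her requests reach it directly, while bob's request falls back to the hub until a server is spawned for him.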

So, if you need multi-user Jupyter, I suggest you look into installing and trying Jupyter Hub, and I highly recommend the video below as a starting point!

Escape HTML in WordPress

This sounds a little silly, but even as a developer I briefly got confused while trying to render (or rather, not render) HTML in WordPress.

I typically dump code into WordPress using their markdown syntax, which works great for almost everything.  But if you need to actually put HTML in a code block, that fails because it will literally get rendered into the page!

The solution is easy though.  Just google “HTML Entity Encoder” or something similar and you’ll get to a site like this https://mothereff.in/html-entities where you can enter your HTML and have it encoded so that it will display properly.

In case that doesn’t make sense to you, let’s use a div opening tag as an example.  It would change to &lt;div&gt; where the “lt” is less-than, or “<”, and the “gt” is greater-than, or “>”.  Since it’s encoded, it will be displayed properly, but the div tag will not be interpreted as part of the page.
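
If you would rather not paste your code into a website, Python's standard library can do the same encoding. This is just a small sketch using `html.escape`; the sample tag is made up.

```python
import html

# A made-up tag that WordPress would otherwise render instead of showing.
snippet = '<div class="note">'

# Replace <, >, & and quotes with their HTML entities.
encoded = html.escape(snippet)
print(encoded)

# html.unescape reverses the encoding.
decoded = html.unescape(encoded)
```

The encoded string is what you paste into the post; the browser then shows the original tag as text.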

 

Jupyter Auto-Run Cells on Load

Why Do We Need This?

If you are making a Jupyter notebook that heavily uses widgets and conceals the code used to make them, you’ll quickly run into an issue. Another person coming to this notebook would basically just see this message for all of your widgets:

“A Jupyter widget could not be displayed because the widget state could not be found. This could happen if the kernel storing the widget is no longer available, or if the widget state was not saved in the notebook. You may be able to create the widget by running the appropriate cells.”

You can simulate this for yourself by pressing the “restart the kernel (with dialog)” button and then force-refreshing your browser (Ctrl + Shift + R in Chrome).

How Do We Do It?

I came across this stack-overflow post which gives a good solution (especially if you are already hiding code in other areas to make it look neater, like I noted in this blog).

Just paste this in its own cell at the top of your notebook:

%%html
<script>
    // AUTORUN ALL CELLS ON NOTEBOOK-LOAD!
    require(
        ['base/js/namespace', 'jquery'], 
        function(jupyter, $) {
            $(jupyter.events).on("kernel_ready.Kernel", function () {
                console.log("Auto-running all cells-below...");
                jupyter.actions.call('jupyter-notebook:run-all-cells-below');
                jupyter.actions.call('jupyter-notebook:save-notebook');
            });
        }
    );
</script>

 
Then all your cells will run on load and all of your widgets will show up nice and neat the first time around.

Read-only / Protected Jupyter Notebooks

Jupyter notebooks are fantastic, but they’re really geared toward developers.  I had to lock one down so it could be used by non-developers (without damaging it).  It took quite a lot of googling!

I figure that a lot of people must need this.  If I were a university instructor, I’d like to send students to a server, let them play, but prevent them from breaking my example.

Locking Things Down

Here are the different things I did to mitigate damage:

  1. You can actually make the entire notebook read-only by setting the file permissions on command line or in properties (works for Windows and Linux).  Jupyter will detect this (after you reload the page) and show that you can’t save anything.
  2. You can make individual cells un-deletable and un-editable, so users can’t break the top cells that the cells further down depend on:
    • Run a cell.
    • Click “Edit Metadata” in its banner.
    • Add:
      • "deletable": false,
      • "editable": false,
  3. You can hide the code for a range of cells.  The code below hides the first four, which is very useful if you’re using IPython UI widgets and just want to show the widgets and not how they were made.  Disclaimer: I got this off Stack Overflow but am having trouble finding it to reference it currently:
from IPython.display import HTML
HTML('''
<script>
    code_show=true;
    function code_toggle_and_hide_move_buttons() {
        if (code_show){
            // Hide the input areas of the first four cells and the move-cell buttons.
            $('div.input').slice(0,4).hide();
            $('#move_up_down').hide()
        }
        else {
            $('div.input').slice(0,4).show();
            $('#move_up_down').show()
        }
        code_show = !code_show
    }
    $(document).ready(code_toggle_and_hide_move_buttons);
</script>
''')
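
Clicking through “Edit Metadata” for every cell gets tedious, so the metadata from item 2 can also be set in bulk. A notebook file is plain JSON with a `cells` list, and each cell has a `metadata` dictionary. The sketch below is an assumption about how one might script it: the function name `lock_cells` and the tiny in-memory notebook are made up for illustration.

```python
import json

def lock_cells(notebook, count=None):
    """Mark the first `count` cells (or all cells) as un-deletable and un-editable."""
    cells = notebook["cells"] if count is None else notebook["cells"][:count]
    for cell in cells:
        cell.setdefault("metadata", {}).update({"deletable": False, "editable": False})
    return notebook

# A made-up, minimal notebook structure standing in for a real .ipynb file.
demo_notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["import pandas as pd"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('play here')"]},
    ]
}

# Lock only the first cell and leave the second one free to edit.
lock_cells(demo_notebook, count=1)
print(json.dumps(demo_notebook["cells"][0]["metadata"]))
```

For a real file, you would load it with `json.load`, call `lock_cells`, and write it back with `json.dump` while the notebook is closed in Jupyter.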

Further Recommendations

I would also suggest making sure you disable the terminal in your Jupyter config file and that you set a known location for your notebooks to be loaded from so that you can add the read-only attribute.
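
As a sketch of the read-only part, the write bits can be cleared from Python instead of the command line. The helper name `make_read_only` and the temporary file are made up for illustration. The config lines in the trailing comment are the classic Notebook option names as I understand them, and the path is hypothetical, so check them against your own Jupyter version.

```python
import os
import stat
import tempfile

def make_read_only(path):
    """Clear every write bit on the file, like `chmod a-w` on Linux."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

# Demonstrate on a throwaway file standing in for a notebook.
with tempfile.NamedTemporaryFile(suffix=".ipynb", delete=False) as handle:
    demo_path = handle.name

make_read_only(demo_path)
is_writable_by_owner = bool(os.stat(demo_path).st_mode & stat.S_IWUSR)
print(is_writable_by_owner)

# Clean up: restore the write bit so the temporary file can be deleted everywhere.
os.chmod(demo_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(demo_path)

# Assumed lines for jupyter_notebook_config.py (classic Notebook):
#   c.NotebookApp.terminals_enabled = False
#   c.NotebookApp.notebook_dir = "/srv/notebooks"   # hypothetical path
```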

Also, you can disable various hotkeys in the UI, and you can use a CSS selector similar to the one in my “hide code” example above to hide the move-up/move-down cell buttons, which helps prevent errors from creeping in that way.
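
For the CSS route, one option is to emit a small `<style>` block from a cell. This is only a sketch: the helper name `hide_elements_css` is made up, and `#move_up_down` is the selector from the “hide code” example above, which may differ in other Jupyter versions.

```python
def hide_elements_css(selectors):
    """Hypothetical helper: build a <style> block that hides the given CSS selectors."""
    rules = "\n".join(f"{selector} {{ display: none; }}" for selector in selectors)
    return f"<style>\n{rules}\n</style>"

style_block = hide_elements_css(["#move_up_down"])
print(style_block)

# In a notebook cell, the block would then be displayed, for example:
#   from IPython.display import HTML
#   HTML(style_block)
```

Unlike the script above, this hides the buttons without any toggle, so it suits notebooks where users should never move cells.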