Python Dependency Management and Virtual Environments (vs Maven or NPM).

Historically, I’ve mostly used Python for automating minimal tasks that would otherwise have been bash scripts. So, terms like the following were alien to me, and I didn’t really know how to manage dependencies properly in Python.

  • pip
  • freezing
  • virtual environment

The main languages I’ve used in recent memory are Java and JavaScript. Both have a dependency manager, so I expected Python to have one too. In Java, people generally use Maven. In JavaScript, they generally use NPM (or Yarn). Either way, you make a file, note down the modules you require and their versions (if you’re smart), and then run “mvn install” or “npm install” to fetch everything you need.

Maven is also a build system, so it’s more like NPM + WebPack in JavaScript; nonetheless, they work similarly from a dependency management perspective.

Moving on to Python, I’ve learned the following:

Python’s version of Maven or NPM, and why it’s different:

  • pip is Python’s version of NPM or Maven.
  • However, by default it installs things globally for that Python installation, not on a per-project basis.
  • So, if I had 2 projects with conflicting dependencies, I could have issues because… well… everything is global.
  • In a lot of cases, people install python modules as they need them by just randomly adding “pip install” to their release notes or running it when they’re hacking a server and need a new library.
  • This is clearly not a “production-ready” solution though.
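To see the “everything is global” point for yourself, here is a minimal sketch that reports which interpreter is running and where its packages live. The function names are my own for illustration; the calls themselves are standard library. Run it with your system Python and then from inside a virtual environment, and the paths should differ.

```python
import sys
import sysconfig


def site_packages_dir():
    """Return the directory holding pure-Python packages for this interpreter."""
    return sysconfig.get_paths()["purelib"]


def describe_install_target():
    """Summarise which interpreter is running and where its packages live."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "site_packages": site_packages_dir(),
    }


if __name__ == "__main__":
    for key, value in describe_install_target().items():
        print(f"{key}: {value}")
```

If two projects share the same site_packages path, they share the same installed versions, which is where conflicts come from.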

Working on a per-project level:

  • Virtual environments are a bolt-on that allows you to properly run python in isolated environments.
  • You can install the module for working with virtual environments globally by running “pip install virtualenv”.
  • After this, for each project, you can create your own virtual environment with “virtualenv <env-name>”.  You can also specify the target python version you want to use if you have multiple, etc.
  • You activate a virtual environment by sourcing the “activate” script (Linux) in its bin folder, or running the “activate.bat” script (Windows) in its Scripts folder.  The prior command will have created a folder with the environment name containing several sub-folders, one of which is that bin (or Scripts) folder.
  • Once the environment is activated, your shell prompt will change to show you’re within it.  Now if you run “pip list”, you’ll notice that you only have 3 basic dependencies; you are shielded from all of your global system ones.
  • You can run pip installs and python code here until your project works great (but only while you’re in the virtual environment).
  • Note that you should not necessarily keep your python code in your virtual environment.  This is probably similar to how you should not keep your Java code inside your maven directory or your JavaScript code inside your NPM directory.  I haven’t had experience either way with this, but I’ve seen it generally recommended in nearly all documentation I’ve come across.
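The steps above can also be sketched from Python itself. Note the assumption: this uses the standard library’s venv module rather than the virtualenv package described above (they do a similar job), and it skips bootstrapping pip so that it runs offline. The function names are hypothetical.

```python
import os
import sys
import tempfile
import venv


def create_env(env_dir):
    """Create a bare virtual environment (pip is not bootstrapped, to stay offline)."""
    venv.create(env_dir, with_pip=False)
    return env_dir


def scripts_dir(env_dir):
    """Folder holding the activate script: 'Scripts' on Windows, 'bin' elsewhere."""
    return os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv (and newer virtualenv)
    # make sys.prefix differ from sys.base_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def demo():
    """Create a throwaway environment and report whether its script folder exists."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(os.path.join(tmp, "demo-env"))
        return os.path.isdir(scripts_dir(env_dir))
```

Calling in_virtual_environment() before and after activating an environment is a quick way to confirm that the activation actually took effect.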

Freezing your dependencies:

  • When you’re happy with it, you can do “pip freeze -l > requirements.txt” in order to generate a file that locks down your dependencies (the -l means just local ones, not global – and you should do it from your virtual environment).
  • Then you can install these in other places (e.g. on a prod server with automation) by doing “pip install -r requirements.txt”.  This makes it quite similar to installing a JavaScript application with npm install (which would get dependencies from the package.json file).
  • Again, if you were running multiple projects on the server, you might want to do this in a virtual environment to keep things isolated/clean.
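A frozen requirements.txt is just lines of “name==version”. As a rough sketch, assuming that simple format, the helper below separates exact pins from anything looser, which is handy for spotting entries that are not really locked down. The function name and the sample package list are made up for illustration; this is not how pip itself parses the file.

```python
def parse_pinned(requirements_text):
    """Split pip-freeze style text into ({name: version}, [lines that are not exact pins])."""
    pinned, unpinned = {}, []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        line = line.split(" #", 1)[0].strip()  # drop trailing inline comments
        name, sep, version = line.partition("==")
        if sep and name.strip() and version.strip():
            pinned[name.strip().lower()] = version.strip()
        else:
            unpinned.append(line)
    return pinned, unpinned


# Hypothetical file contents, in the shape "pip freeze -l" produces.
SAMPLE = """\
# example requirements
Flask==1.0.2
requests==2.20.0
somepackage>=1.0
"""

if __name__ == "__main__":
    pins, loose = parse_pinned(SAMPLE)
    print("pinned:", pins)
    print("not pinned:", loose)
```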

I probably still have a lot to learn, and I’m sure this gets more complex; I’ve used enough languages to know that it takes time to fully learn these things.  But I feel more comfortable with the idea of Python in production now that I can see how you can isolate projects and install specific dependencies from a target file.

Apache Airflow Windows 10 Install (Ubuntu)

After my failed attempt at installing Airflow into Python on Windows the normal way, I heard that it is better to run it in the Ubuntu sub-system available in the Windows 10 store.  So, I’m changing to this route.

You can find and install “Ubuntu” on the Windows 10 store, and it will give you a full-fledged Ubuntu Linux shell.  Here’s what the installation looks like:

[Screenshot: Ubuntu Installation]

It installs quite quickly, then you just press “Launch”.  The shell opens, and in my case, I was presented with this:

Installing, this may take a few minutes…
WslRegisterDistribution failed with error: 0x8007019e
The Windows Subsystem for Linux optional component is not enabled. Please enable it and try again.
See https://aka.ms/wslinstall for details.
Press any key to continue…

Go to your start menu and type “features” and click “Turn Windows features on or off”, then check the “Windows Subsystem for Linux” box and press “OK”.

It will install some things and take a few minutes.  For me, it took about 2 minutes on “Searching for required files” even though I’m on a very fast corporate internet connection.  So, don’t be discouraged if that happens.

Unfortunately, you’ll have to reboot once this finishes!  Such is windows :(.

After the reboot, open the “Ubuntu” shell from the Windows start menu search.  It will take a minute to install and will then ask you to create a username and password (note that “admin” will not work as a username, so don’t bother trying that).

Installing, this may take a few minutes…
Please create a default UNIX user account. The username does not need to match your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Installation successful!
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

If you check, you’ll find Python is already installed.  For me it is version 3.6.5, which is good: in a previous post, where I tried installing on Windows, I found that pip-installing Airflow does not (yet) work on Python 3.7, because 3.7 made “async” a reserved keyword and that broke some things.

$ python3 --version
Python 3.6.5
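If you want to check the same thing from inside Python, here is a small sketch. The 3.7 cut-off is simply my observation from the failed install above, not an official compatibility statement, and the function names are my own.

```python
import keyword
import sys


def predates_async_keyword(version_info=sys.version_info):
    """True for interpreters older than 3.7, where 'async' is still a legal identifier."""
    return tuple(version_info[:2]) < (3, 7)


def async_is_reserved():
    """True when the running interpreter treats 'async' as a reserved keyword."""
    return keyword.iskeyword("async")


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("'async' is reserved here:", async_is_reserved())
    print("old enough for code that uses 'async' as a name:", predates_async_keyword())
```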

Now, we should just have to install Airflow.  But we need pip first, and installing it the way the shell suggests (the hint you get when you try to run pip before it exists) doesn’t work out of the box.  So, I found this: https://askubuntu.com/questions/672808/sudo-apt-get-install-python-pip-is-failing which recommends:

sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update

After you run those commands, you can run the last one:

sudo apt-get install python-pip

This is actually the one the Ubuntu terminal recommends if you just blindly try to use pip in the first place, but it wouldn’t have worked without the other 3 first.  This took around 5 minutes to install for me, and it will require you to say “y” for yes once to kick it off.  Note that on Ubuntu, python-pip is the Python 2 flavour of pip; the Python 3 package is named python3-pip.

After this, we can FINALLY install Airflow properly.  This is a pretty big victory if you realize that I started on my other blog post trying to make it work in Windows first, and that was a rabbit hole in itself!

export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow

If you’re wondering why that first export line is there, just skip it and read the terminal error message which recommends it.  I ran into the same thing in the pure Windows install which failed in the other blog post.
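If you are scripting the install rather than typing it, the same two lines can be sketched in Python. The helper names are hypothetical; the environment variable and the package name are the ones from the commands above. Nothing here runs pip until you pass the results to subprocess yourself.

```python
import os
import sys


def airflow_install_env(base_env=None):
    """Return a copy of the environment with the slugify flag that Airflow's installer asks for."""
    env = dict(os.environ if base_env is None else base_env)
    env["SLUGIFY_USES_TEXT_UNIDECODE"] = "yes"
    return env


def airflow_install_command():
    """Build the pip command, using the pip that belongs to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", "apache-airflow"]
```

To actually run it, something like subprocess.run(airflow_install_command(), env=airflow_install_env(), check=True) should do. Copying the environment instead of editing os.environ keeps the flag from leaking into the rest of the script.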

This installation took around 3 minutes for me.  The Airflow documentation (https://airflow.apache.org/installation.html) recommends initializing its database (SQLite by default) when you’re done, as other things won’t work without it.

Surprisingly, I found I had to open a new terminal before I could use the airflow command.  I’m not sure if this is a quirk of running it on Windows, or if I should have just sourced my profile again, as I didn’t play around with it.
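My guess (and it is only a guess) is that pip put the airflow script in a folder such as ~/.local/bin that was not on PATH until the profile was re-read.  A quick sketch for checking that sort of thing, with hypothetical helper names:

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """True when the given directory appears as an entry of PATH (or of path_value)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = (entry for entry in path_value.split(os.pathsep) if entry)
    return any(os.path.normpath(os.path.expanduser(entry)) == target for entry in entries)


def find_command(name="airflow"):
    """Return the full path of a command if the shell could find it, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("airflow found at:", find_command("airflow"))
    print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
```

If find_command returns None while the script exists in a folder that dir_on_path reports as missing, opening a new terminal (or sourcing your profile) is the likely fix.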

In any case, initialize the DB and then check the version, and hopefully you’re as happy as I am to be done with that.

airflow initdb

hujo8003@USLJ96YRQ2:~$ airflow version
[2018-11-06 11:36:38,930] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
  v1.10.0