Minikube ImagePullBackOff – Local Docker Image

Background Context

Earlier today I began porting our Presto cluster to Kubernetes. The first thing I did was containerize Presto and try to run it as a Kubernetes deployment in minikube (locally).

I’m fairly new to minikube. So far, I’m running it with --vm-driver=none so that it uses the local Docker daemon rather than a VirtualBox (or similar) VM.

The Error

So, I got my Docker image building well and tested it within Docker itself. That all worked great. Then I wrote my Kubernetes deployment and ran it using the image… but unfortunately, the pod came up saying Error: ImagePullBackOff.

I went down a rabbit hole for a while after this, because many posts explain how to give minikube access to your local Docker images. But when you’re running with --vm-driver=none, you are literally running on your local Docker, so it should already have access to everything.

The actual error is:

Failed to pull image "qaas-presto:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for qaas-presto, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

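So, the issue is that it’s trying to pull the image and can’t find it in any registry… but it shouldn’t need to pull at all, because the image was built locally and already exists there. The reason it pulls anyway is that Kubernetes defaults imagePullPolicy to Always when the image tag is :latest (or no tag is given).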


I found the workaround in this GitHub entry. Basically, in your deployment/pod spec/whatever, you just set this on the container:

imagePullPolicy: Never

This stops Kubernetes from trying to pull the image, so the pull can never fail. It just assumes the image is present (which it is), uses it, and moves on. You probably don’t want to deploy this setting to production, but you can always template it out with Helm or something similar, so it’s a viable workaround.
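As a sketch of that templating idea, here is a minimal Python script that builds the Deployment as a plain dict and switches the pull policy per environment. The deployment/container name presto, the app: presto label, the local flag, and the choice of IfNotPresent for non-local environments are all my own assumptions for illustration; with Helm, a values entry would play the role of the local flag.

```python
import json


def presto_deployment(image: str, local: bool) -> dict:
    """Build a minimal Deployment manifest for the given image.

    When local is True (e.g. minikube with --vm-driver=none), imagePullPolicy
    is "Never", so the kubelet uses the locally built image instead of trying
    to pull it. Names and labels here are hypothetical.
    """
    pull_policy = "Never" if local else "IfNotPresent"
    labels = {"app": "presto"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "presto"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "presto",
                            "image": image,
                            "imagePullPolicy": pull_policy,
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. (hypothetical file names):
    #   python make_manifest.py > presto.json && kubectl apply -f presto.json
    print(json.dumps(presto_deployment("qaas-presto:latest", local=True), indent=2))
```

Keeping the switch in one place means the production manifest differs from the local one only in that single field.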



12 Factor App Best Practices – Quick Notes

Why Review This?

The 12 factor app “best practices” are a good summary of most things you should be doing in a modern application. They cover many architectural and DevOps practices which people tend to learn in a more piecemeal and painful way over time.

Sources / References

While reviewing this recently, I read and watched a few sources. I found that this video on YouTube gave a very practical summary, especially for those who, like me, work in Java a lot. I also found this other one from AWS, which covers some areas better than the first, especially practical use with containerization and logging. I recommend taking the time to watch both.

The Twelve Factors

Listed with my own interpretation/commentary, of course, based on the various sources I looked at.

  1. Code Base – Should be checked into version control (e.g. Git). Should have one application per repository (or sub-module). E.g. if you have an API and a website, separate them so you don’t end up having to combine releases, and so you aren’t inclined to couple them and muddy the waters.
  2. Dependencies – Should be declared explicitly. Should also be external to your code base (e.g. pull from a Maven repo; don’t check JARs into your Git repo). Should avoid relying on anything from outside the project (e.g. if you’re in Node, don’t globally install anything; make sure everything is local to the project). This also implies that you should not rely on an app server (e.g. Tomcat) external to your app to host it. Your app should be more like a Spring Boot application, which is deployed with its own web server bundled in.
  3. Configuration – Passwords/tokens/connections/etc. should be separate from your code. E.g. using profiles and ansible-secrets/etc. within your code base to handle multiple environments can be bad, because you have to modify your code base to support new environments. These configurations should be moved out to environment variables or some other mechanism. Many frameworks/technologies recommend environment variables these days, so I think it’s good to use them when possible. Also note that CNAMEs and other flexible configuration mechanisms are recommended to abstract where things are running.
  4. Backing Services – Databases/queues/caches/etc should be decoupled and easily changeable in your configuration.  Again, prefer CNAMEs/etc for abstraction.
  5. Build/Release/Run – A deployment = building code, combining it with config to form a release, and executing that release in the target environment. This one sounds a little… flexible to me? E.g. Helm lets you take a template, apply values to it, and deploy the result to Kubernetes. So, in that case, I guess the Helm template/values part is the “release” formation, and running Helm is the execution? We don’t need separate Docker images built for each environment in this case. If, on the other hand, we’re taking each environment’s config and generating a new Docker image per environment, then we’d have one release per environment (code + config combined), and it would be nicely decoupled from the “run”, in which case this makes more direct sense. Note that the AWS video calls this latter case an anti-pattern and recommends deploying the config within the environment and using one standard Docker image (which sounds better). This could be achieved with ConfigMaps, Secrets, etc. in Kubernetes, for example.
  6. Processes – Should be stateless (sticky sessions = bad).
  7. Port Binding – Back to the end of #2/dependencies – apps should be self-contained.  They should provide their own web server and should just expose a port themselves as-is and should not depend on anything else to do so.
  8. Concurrency – Build apps to scale out, not just up. Of course you can have more threads/pools/etc., but if you can break up your application well, you can have separate apps for each purpose, and each can scale at its own pace and use resources effectively. E.g. if I have an app that reads files and dumps records to Kafka, then reads the Kafka records back out and sends them to an external service, it can be 2 separate apps. Maybe the first part, which reads files, needs 4 copies with large memory to keep pace, and the second needs 10 copies with high CPU and low memory. They can be dealt with individually and easily once designed well (and this sort of thing is a cakewalk when using something like Kubernetes).
  9. Disposability – “Servers are cattle, not pets”. You don’t want a long-lived server (like an app server) where people know its name and IP and how to debug it. You want small, ephemeral, easily created and destroyed servers (or rather, containers). In the cloud, you may use auto scaling groups / scale sets (AWS/Azure) to easily scale copies of a server (machine image) up and down. But even this isn’t great; it’s heavyweight and slow. If you have a container orchestrator like Kubernetes (or Docker Swarm, etc.) in the cloud, or even on-prem, you can scale Docker/etc. containers up and down in seconds at large scale. Either way, you should basically never need to look at a running machine image or container unless you’re debugging something. You can create new ones trivially and you don’t care about them. They are disposable.
  10. Dev/Prod Parity – All your environments should be identical apart from their configuration. Use the same databases, same caches, and same web server versions (bundling the web server into the app, as Spring Boot does, helps with the last one).
  11. Logs – View logs as a continuous stream of events and store them in an external place like ELK or Splunk.  Preferably have an agent separate from your app be responsible for uploading the log stream; just let your app write to standard output.
  12. Admin Processes – Things like database pruning, backups, etc. should be treated like any other application you run, as distinct applications separate from the application(s) they support. I.e. those processes should themselves follow these twelve factors.
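A few of these factors are easy to show in code. Here is a minimal Python sketch of factors 3, 4, and 11: settings and backing-service locations come from environment variables, and logs go to standard output for an external agent to collect. The variable names DB_URL, CACHE_HOST, and PORT, and their defaults, are hypothetical.

```python
import logging
import os
import sys
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    db_url: str      # backing service located purely via config (factor 4)
    cache_host: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from environment variables (factor 3).

    Swapping a database or cache becomes a config change, not a code change.
    """
    env = os.environ if env is None else env
    try:
        db_url = env["DB_URL"]
    except KeyError:
        raise RuntimeError("DB_URL must be set") from None
    return Config(
        db_url=db_url,
        cache_host=env.get("CACHE_HOST", "localhost"),
        port=int(env.get("PORT", "8080")),
    )


def configure_logging() -> None:
    """Write logs to stdout as an event stream (factor 11).

    Shipping them to ELK/Splunk is left to an agent outside the app.
    """
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
```

At startup the app would call configure_logging() and then load_config(); in Kubernetes, those environment variables could be populated from the ConfigMaps/Secrets mentioned under factor 5, so the same image runs unchanged everywhere.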



Check Point Linux SSL Network Extender VPN Auto-Closes After Connecting

Not much of a post here, but FYI for everyone: I’m running on Ubuntu Linux. There is no Check Point VPN client for Linux, so I have to go through a website and use their “SSL Network Extender”.

A fairly large portion of the time, this seems to hang/break under load (e.g. reading lots of database results). It also randomly stops working periodically. Honestly, I haven’t been able to figure out why.

Eventually, if I keep reconnecting to it, I get into a situation where it auto-closes right after it connects.  I couldn’t fix this until I restarted my laptop.

Anyway, I just figured out that disabling my wireless card and re-enabling it also fixes that issue, and that is much, much faster. So, until we figure out the root cause of the other issues, I hope this helps you too!
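If you end up doing this often, the toggle can be scripted. A minimal sketch, assuming NetworkManager manages your wireless card (the Ubuntu desktop default), so that nmcli radio wifi off / on is equivalent to flipping it in the UI; the function name and the two-second pause are my own choices. Presumably you’d then reconnect through the SSL Network Extender page as usual.

```python
import subprocess
import time


def toggle_wifi(run=subprocess.run, pause_seconds: float = 2.0) -> None:
    """Turn the Wi-Fi radio off and back on via NetworkManager's nmcli.

    `run` defaults to subprocess.run; it is a parameter only so the
    sequence can be exercised without touching real hardware.
    """
    run(["nmcli", "radio", "wifi", "off"], check=True)
    time.sleep(pause_seconds)  # give the driver a moment before re-enabling
    run(["nmcli", "radio", "wifi", "on"], check=True)
```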

Presto + Hive View Storage/Implementation

I’ve been learning about how Presto handles views lately, because we rely heavily on Presto and recently ran into multiple cases where our Hive metastore had views that wouldn’t work in Presto.

What are Presto Views Exactly?

Presto has its own view implementation, which is distinct from Hive’s view implementation. Presto will not use a Hive view, and if you try to query one, you will immediately get a clear error.

A Presto view is based on a Presto SQL query string. A Hive view is based on a Hive query string, written in HQL (Hive Query Language), and Presto simply does not know that SQL dialect.

How Are Views Stored?

Presto actually stores its views in the exact same location as Hive does. The Hive metastore database has a TBLS table which holds every Hive table and view. Views have two columns populated that tables leave empty: view_original_text and view_expanded_text. Hive views will have plain SQL in the view_original_text column, whereas Presto views will have an encoded representation prefixed with “/* Presto View…”. If Presto queries a view and does not find its “/* Presto View” prefix, it will consider it a Hive view and say that it is not supported.
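To see which views in a metastore fall on which side of that check, a small sketch like the following could help. It assumes read access to the metastore’s backing database and the standard schema (TBLS joined to DBS, with TBL_TYPE = 'VIRTUAL_VIEW' marking views); on a Postgres-backed metastore the upper-case table and column names may need double quotes. The classification mirrors the prefix check described above, not Presto’s actual source code, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Prefix Presto puts at the start of VIEW_ORIGINAL_TEXT for its own views.
PRESTO_VIEW_PREFIX = "/* Presto View"

# Run this against the metastore database with whatever DB driver you use;
# each row is (database name, view name, view_original_text).
LIST_VIEWS_SQL = """
SELECT d.NAME, t.TBL_NAME, t.VIEW_ORIGINAL_TEXT
FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'VIRTUAL_VIEW'
"""


def view_kind(view_original_text: str) -> str:
    """Return 'presto' if the text carries Presto's prefix, else 'hive'."""
    return "presto" if view_original_text.startswith(PRESTO_VIEW_PREFIX) else "hive"


def split_views(rows: Iterable[Tuple[str, str, str]]) -> Tuple[List[str], List[str]]:
    """Split metastore rows into (presto_views, hive_views) as 'db.view' names."""
    presto: List[str] = []
    hive: List[str] = []
    for db, name, text in rows:
        target = presto if view_kind(text) == "presto" else hive
        target.append(f"{db}.{name}")
    return presto, hive
```

The hive list is the set of views Presto would currently refuse to query.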

Making Presto Handle Hive Views

I’ve been working for some time on making presto-sql support Hive views. I’m using the pull request noted in this issue as a template. It is fairly old, though, and was made against presto-db rather than presto-sql, so the exercise has turned out to be non-trivial.

I’m still chugging along and will post more when done. One thing to note, though, is that this PR does not really make Presto support Hive views. It actually allows Presto to attempt to run Hive views as they are. Many Hive views will be valid Presto SQL, e.g. where you’re just selecting from a table with some basic joins and/or WHERE clause filters.

So, this PR basically prevents Presto from failing outright when it sees a view that does not start with “/* Presto View”. It then reads the Hive query text, possibly modifies it lightly, and attempts to run it as though it were a Presto view’s query.

I plan on making a number of corrections to the SQL as well, e.g. turning double-quoted string literals into single-quoted ones, converting back-ticks to double quotes, and replacing obvious function names like NVL with COALESCE. Eventually I may try to fix more by parsing the Hive text with ANTLR or something similar, so that as many Hive views as possible run by default. But it will never be a complete solution. A complete solution would be very hard, as it would require a full understanding and conversion of the Hive language into the Presto language (which is probably not even possible given some of their differences).
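A rough sketch of those simple corrections, written in Python for illustration rather than as part of the actual PR: it walks the HQL once, copies single-quoted literals untouched, turns double-quoted string literals into single-quoted ones, turns back-ticked identifiers into double-quoted ones, and rewrites NVL( to COALESCE( only outside of quotes. It deliberately ignores Hive’s backslash escapes and everything else a real parser would handle, and the function name is hypothetical.

```python
import re

_NVL = re.compile(r"\bNVL\s*\(", re.IGNORECASE)
_QUOTES = "'\"`"


def translate_hive_view_sql(hql: str) -> str:
    """Apply a few naive HQL -> Presto SQL rewrites (no escape handling)."""
    out = []
    i, n = 0, len(hql)
    while i < n:
        ch = hql[i]
        if ch in _QUOTES:
            end = hql.find(ch, i + 1)  # matching closing quote
            if end == -1:
                raise ValueError(f"unterminated {ch} quote at position {i}")
            body = hql[i + 1:end]
            if ch == "'":
                out.append("'" + body + "'")  # already a valid Presto string
            elif ch == '"':
                # Hive string literal -> Presto single-quoted string
                out.append("'" + body.replace("'", "''") + "'")
            else:
                # Hive back-ticked identifier -> Presto double-quoted identifier
                out.append('"' + body.replace('"', '""') + '"')
            i = end + 1
        else:
            # Plain SQL up to the next quote character: rewrite functions here only.
            j = i
            while j < n and hql[j] not in _QUOTES:
                j += 1
            out.append(_NVL.sub("COALESCE(", hql[i:j]))
            i = j
    return "".join(out)
```

For example, translate_hive_view_sql('SELECT NVL(a, 0) FROM `db`.`t` WHERE b = "x"') returns SELECT COALESCE(a, 0) FROM "db"."t" WHERE b = 'x'. Each further rewrite (another function mapping, say) would slot into the plain-SQL branch, which is also exactly where this approach runs out and a real parser becomes necessary.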

AWS Not Authorized to Use Launch Template (in Terraform or in Console)

This is just a quick note for anyone facing this issue.

A few of us lost about a day debugging what we originally thought was a Terraform issue. While creating an auto scaling group (ASG), we were getting “Invalid details specified: You are not authorized to use launch template…”.

It turned out that the same error was presented in the AWS console when we tried to create the ASG there.

After some substantial debugging, it turned out that Terraform had been allowed to create a launch template with an AMI (Amazon Machine Image) that did not exist. We had used the AMI ID from our non-prod account in our prod account, but that AMI did not exist in (and was not shared with) the prod account, so the ID didn’t resolve to anything there.

It took us a while to get to this point in our debugging because, frankly, we were astounded that the error message was so misleading. We spent a very long time trying to figure out everything that could trigger a permissions error on the template itself, not realizing that a missing resource referenced within the template would make the whole template present that error.
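In hindsight, a quick pre-flight check on the AMI would have surfaced the real problem much sooner. A minimal sketch, assuming a boto3 EC2 client configured for the target (prod) account and region; the function name is hypothetical. It treats any InvalidAMIID.* error code returned by describe_images (e.g. InvalidAMIID.NotFound) as “not available” and re-raises everything else, such as genuine permission errors.

```python
def ami_exists(ec2, ami_id: str) -> bool:
    """Return True if ami_id is visible to the account/region behind `ec2`.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    try:
        resp = ec2.describe_images(ImageIds=[ami_id])
    except Exception as err:  # botocore.exceptions.ClientError when using boto3
        code = getattr(err, "response", {}).get("Error", {}).get("Code", "")
        if code.startswith("InvalidAMIID"):
            return False  # e.g. InvalidAMIID.NotFound or InvalidAMIID.Malformed
        raise
    return bool(resp.get("Images"))
```

Running something like ami_exists(boto3.client("ec2"), ami_id) against each account before terraform apply turns the misleading “not authorized to use launch template” error into an explicit “this AMI isn’t here”.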