Checkpoint Linux SSL Network Extender VPN Auto Closes after Connecting.

Not much of a post here… but FYI for everyone – I’m running on Ubuntu Linux.  There is no checkpoint VPN client for Linux, so I have to go through a website and use their “SSL Network Extender”.

A fairly large portion of the time, this seems to hang/break under load (e.g. reading lots of database results).  It also randomly stops working periodically.  I haven’t been able to figure out why honestly.

Eventually, if I keep reconnecting to it, I get into a situation where it auto-closes right after it connects.  I couldn’t fix this until I restarted my laptop.

Anyway, I just figured out that disabling my wireless card and re-enabling it also fixes that issue, and that is much, much faster.  So,  until we figure out the root cause for the other issues, I hope this helps you too!

Presto + Hive View Storage/Implementation

I’ve been learning about how presto handles views lately.  This is because we are heavily reliant on presto and we recently ran into multiple use cases where our hive metastore had views which wouldn’t work within presto.

What are Presto Views Exactly?

Presto has its own view implementation which is distinct from a hive’s view implementation.  Presto will not use a hive view, and if you try to  query one, you will get a clear error immediately.

A presto view is based on a Presto SQL query string.  A hive view is based on a a hive query string.  A hive query string is written in HQL (Hive Query  Language), and presto simply does not know that SQL dialect.

How Are Views Stored?

Presto actually stores its views in the exact same location as hive does.  The hive metastore database has a TBLS table which holds every hive table and view.  Views have two columns populated that tables ignore – view_original_text and view_expanded_text.  Hive views will have plain SQL in the view_original_text column whereas presto views will have some encoded representation prefixed with “/* Pesto View…”.   If presto queries a view and does not find it’s “/* Pesto View” prefix, it will consider it a hive view and say that it is not supported.

Making Presto Handle Hive Views

I’ve been doing work for some time to try to make presto-sql support hive views.  I’m using the pull request noted in this issue https://github.com/prestodb/presto/issues/7338 as a template.  It is fairly old though and was made against presto-db rather than presto-sql, so the exercise has turned out to be non-trivial.

I’m still chugging along and will post more when done.  But one thing to note is that this PR does not really make presto support hive views.  It actually allows presto to attempt to run hive views as they are.  Many hive views will be valid presto SQL – e.g. where you’re just selecting from a table with some basic joins and/or where clause filters.

So, this PR basically prevents presto from outright failing when it sees a view that does not start with “/* Presto View”.  It then helps it read the hive query text, possibly lightly modify it, and attempt to run it as if the same had been done for a Presto query.

I plan on doing a number of corrections to the SQL as well; e.g. replacing double quotes, converting back-ticks, replacing obvious function names like NVL with COALESCE, etc.  Eventually I may try to fix more by parsing the hive text with ANTLR or something similar to make as many hive views run by default as possible.  But it will never be a complete solution.  A complete solution would be very hard as it would require a full understanding and conversion of hive language to presto language (which is probably not even possible given some of their differences).

AWS Not Authorized to Use Launch Template (in Terraform or in Console)

This is just a quick note for anyone facing this issue.

A few of us lost about a day debugging what we thought was a terraform issue originally.  While we were creating an auto scaling group (ASG), we were getting “Invalid details specified: You are not authorized to use launch template…”.

It turned out that the same error was presented in the AWS console when we tried to create the ASG there.

After some substantial debugging, it turned out that terraform was allowed to create a launch template with an AMI (Amazon Machine Image) that did not exist.  We had used the AMI ID from our non-prod account in our prod account, but AMIs must exist in each account with unique IDs – so it wasn’t working.

It took us a while to get to this point in our debugging because, frankly, we were very astounded that the error message was so miss-leading.  We spent a very long time trying to figure out everything that could trigger a permissions error on the template itself, not realizing that a missing resource used within the template would make the whole template present that error.

Hive Metastore DB (HMS DB) – Get All Tables/Columns including Partition Keys

How Hive Stores Schemas

I recently had to sync the schemas from a hive database to a normal MySQL database for reasons I won’t bother going into.  The exercise required me to get all columns from all tables in hive for each DB though, and I found that this was not amazingly straight-forward.

The hive metastore DB is a normal MySQL/etc database with a hive schema in it.  The hive schema holds the hive tables though.  So, the information schema is irrelevant to hive; to get the hive table details, you have to interrogate the TBLS table, for example.  To get columns, you need to interrogate COLUMNS_V2, and to get the databases themselves, you look toward the DBS table.

Missing Columns

Other posts I’ve seen would leave you here – but  I found when I joined these three tables together (plus some others you have to route through), I was still missing some columns in certain tables for some reason.  It turns out that partition columns are implicit in hive.  So, in a file system of hive data (like HDFS), a partition column in a table is literally represented by just having the directory named with the partition value; there are no columns with the value in the data.

What this means is that partition columns don’t show up in these normal tables.  You have to look to a separate partition keys table to find them with a separate query.

The Working Query

The query below finds all columns of any kind and sorts them in the order they’ll appear when you select from a table in hive/presto/etc.  I hope it helps you!

select db_name, table_name, column_name from (
SELECT d.NAME as db_name, t.TBL_NAME as table_name, c.COLUMN_NAME as column_name, c.INTEGER_IDX as idx
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
JOIN SDS s on t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c on c.CD_ID = s.CD_ID
WHERE d.NAME = :dbName
union
SELECT d.NAME as db_name, t.TBL_NAME as table_name, k.PKEY_NAME as column_name, 50000 + k.INTEGER_IDX
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
join PARTITION_KEYS k on t.TBL_ID = k.TBL_ID
where d.NAME = :dbName
) x
order by db_name, table_name, idx

Ubuntu Desktop 18 + Vi – Add Color Schemes (Molokai)

This is just a simple reminder for myself.  To get VIM working with custom color schemes on Ubuntu Desktop, you need to install a couple extra packages.

Here are the steps to install the molokai theme from GitHub properly:

sudo apt-get update
sudo apt-get install vim-gui-common -y
sudo apt-get install vim-runtime -y
mkdir -p ~/.vim/colors
cd ~/.vim/colors
wget https://raw.githubusercontent.com/tomasr/molokai/master/colors/molokai.vim

And to enable it as default, you have to edit your “~/.vimrc” file to say the following color scheme line. I also left in a couple lines that help arrow keys and backspace to work normally for the heck of it.

set nocompatible
set backspace=2
colorscheme molokai

Fn / Function Key Lock on Dell Laptops (Running Ubuntu which is Irrelevant)

This isn’t much of a post, but I suspect some people will find it useful =).

If your laptop is acting as if your function key is always held down; e.g. the F1-F12 keys do the special action (change volume, or whatever) instead of their normal purpose, then try Hold Fn + Esc and, depending on your model, it may disable the “Function Key Lock”.

On my laptop this actually shows a picture of a lock with the letters Fn within it to indicate it locks the function key as on.  But I didn’t see that in advance and had to google more than I would have liked to figure it out.

Linux Neatly Manage SSH Sessions Like MRemoteNG on Windows

SSH technically works the same anywhere, but on Windows without special tooling, using SSH can be cumbersome as the CLI is not very… elegant?  Generally, on Windows, people use tools like putty or MRemoteNG (also putty based) to make things neater.

If you’re on Linux, you can use SSH out of the box very cleanly by leveraging the config file to create named hosts.

For example, this verbose command:

ssh -i ~/.ssh/prod-key.pem centos@10.20.30.40

Can be replaced by the following command – and the nice “pgadmin-prod” name even auto-completes.  So, if you have 100 servers for different things, you can still find them easily.

ssh pgadmin-prod

This works if you add the following block into your ~/.ssh/config file.

Host pgadmin-prod
User centos
HostName 10.20.30.40
IdentityFile ~/.ssh/prod-key.pem

Where:

  • Host = the user-friendly name I want to use for this connection.
  • User = the user I normally have to log in as.
  • HostName = the IP address of the target host.
  • IdentityFile = the path to the private key used for the connection.

You can have as many named hosts as you like and, again, their names will auto complete after the ssh command – so they’re very easy to use.

This gets even more effective if you combine it with a screen/session management app like tmux or screen.  So, I recommend you look into those too if you don’t use them already (tmux is a little more modern if you’re a new to both).

I hope this helps you be a little more efficient, and thanks for reading.