Presto + Hive View Storage/Implementation

I’ve been learning about how presto handles views lately.  This is because we are heavily reliant on presto and we recently ran into multiple use cases where our hive metastore had views which wouldn’t work within presto.

What are Presto Views Exactly?

Presto has its own view implementation which is distinct from hive’s view implementation.  Presto will not use a hive view, and if you try to query one, you will get a clear error immediately.

A presto view is based on a Presto SQL query string.  A hive view is based on a hive query string.  A hive query string is written in HQL (Hive Query Language), and presto simply does not know that SQL dialect.

How Are Views Stored?

Presto actually stores its views in the exact same location as hive does.  The hive metastore database has a TBLS table which holds every hive table and view.  Views have two columns populated that tables ignore – view_original_text and view_expanded_text.  Hive views will have plain SQL in the view_original_text column whereas presto views will have some encoded representation prefixed with “/* Presto View…”.   If presto queries a view and does not find its “/* Presto View” prefix, it will consider it a hive view and say that it is not supported.
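Here is a minimal sketch of that check in Python.  It assumes the stored text is the prefix “/* Presto View: ”, then base64-encoded JSON, then a closing “ */”.  That is my understanding of the encoding, but verify it against the Presto version you run.  The function names are made up for illustration.

```python
import base64
import json

# Assumed markers; check them against the Presto version you run.
VIEW_PREFIX = "/* Presto View: "
VIEW_SUFFIX = " */"


def is_presto_view(view_original_text):
    """Return True if the stored view text carries the Presto marker."""
    return bool(view_original_text) and view_original_text.startswith(VIEW_PREFIX)


def decode_presto_view(view_original_text):
    """Decode a Presto view payload, assuming it is base64-encoded JSON."""
    if not is_presto_view(view_original_text):
        raise ValueError("no Presto marker; Presto would treat this as a hive view")
    payload = view_original_text[len(VIEW_PREFIX):]
    if payload.endswith(VIEW_SUFFIX):
        payload = payload[:-len(VIEW_SUFFIX)]
    return json.loads(base64.b64decode(payload))
```

Running is_presto_view over the view_original_text values in TBLS is a quick way to see which views presto will refuse.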

Making Presto Handle Hive Views

I’ve been doing work for some time to try to make presto-sql support hive views.  I’m using the pull request noted in this issue as a template.  It is fairly old though and was made against presto-db rather than presto-sql, so the exercise has turned out to be non-trivial.

I’m still chugging along and will post more when done.  But one thing to note is that this PR does not really make presto support hive views.  It actually allows presto to attempt to run hive views as they are.  Many hive views will be valid presto SQL – e.g. where you’re just selecting from a table with some basic joins and/or where clause filters.

So, this PR basically prevents presto from outright failing when it sees a view that does not start with “/* Presto View”.  It then helps it read the hive query text, possibly lightly modify it, and attempt to run it as if the same had been done for a Presto query.

I plan on doing a number of corrections to the SQL as well; e.g. replacing double quotes, converting back-ticks, replacing obvious function names like NVL with COALESCE, etc.  Eventually I may try to fix more by parsing the hive text with ANTLR or something similar to make as many hive views run by default as possible.  But it will never be a complete solution.  A complete solution would be very hard as it would require a full understanding and conversion of hive language to presto language (which is probably not even possible given some of their differences).
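To give a feel for those light corrections, here is a rough Python sketch (not the code from the PR).  It assumes hive double quotes delimit string literals and back-ticks delimit identifiers.  It is naive regex rewriting: it does not understand comments, escaped quotes, or quote characters nested inside other literals.  The function name is hypothetical.

```python
import re


def hive_to_presto_sql(hive_sql):
    """Apply a few naive text rewrites to make simple hive SQL presto-friendly."""

    def requote(match):
        # Presto string literals use single quotes; double any embedded ones.
        return "'" + match.group(1).replace("'", "''") + "'"

    # 1. Double-quoted hive string literals become single-quoted literals.
    sql = re.sub(r'"([^"]*)"', requote, hive_sql)
    # 2. Back-tick identifiers become double-quoted identifiers.
    sql = re.sub(r"`([^`]*)`", r'"\1"', sql)
    # 3. NVL(a, b) becomes COALESCE(a, b).
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    return sql
```

The order matters: string literals are converted before back-ticks, otherwise the freshly double-quoted identifiers would be mistaken for strings.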

Hive Metastore DB (HMS DB) – Get All Tables/Columns including Partition Keys

How Hive Stores Schemas

I recently had to sync the schemas from a hive database to a normal MySQL database for reasons I won’t bother going into.  The exercise required me to get all columns from all tables in hive for each DB though, and I found that this was not amazingly straightforward.

The hive metastore DB is a normal MySQL/etc database with a hive schema in it, and that schema holds the metadata describing the hive tables.  So, the database’s own information schema tells you nothing about hive tables; to get the hive table details, you have to interrogate the TBLS table, for example.  To get columns, you need to interrogate COLUMNS_V2, and to get the databases themselves, you look toward the DBS table.

Missing Columns

Other posts I’ve seen would leave you here, but I found when I joined these three tables together (plus some others you have to route through), I was still missing some columns in certain tables for some reason.  It turns out that partition columns are implicit in hive.  So, in a file system of hive data (like HDFS), a partition column in a table is literally represented by just having the directory named with the partition key and value (for example, dt=2019-07-12); there are no columns with the value in the data.

What this means is that partition columns don’t show up in these normal tables.  You have to look to a separate partition keys table to find them with a separate query.
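To make the directory layout concrete, here is a small Python sketch that recovers partition values from a data file path, assuming hive’s usual key=value directory naming.  The paths and the function name are made up for illustration.

```python
def partition_values(table_root, file_path):
    """Read partition key/value pairs out of the directories under a table root."""
    if not file_path.startswith(table_root):
        raise ValueError("file is not under the table root")
    relative = file_path[len(table_root):].strip("/")
    values = {}
    # Every path segment except the last (the data file itself) is a directory.
    for segment in relative.split("/")[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values
```

For a file like /warehouse/sales/dt=2019-07-12/country=us/part-00000, the values for dt and country come purely from the directory names; nothing in the file stores them.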

The Working Query

The query below finds all columns of any kind and sorts them in the order they’ll appear when you select from a table in hive/presto/etc.  I hope it helps you!

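select db_name, table_name, column_name from (
-- Regular columns, in their declared order.
SELECT d.NAME as db_name, t.TBL_NAME as table_name, c.COLUMN_NAME as column_name, c.INTEGER_IDX as idx
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
JOIN SDS s on t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c on c.CD_ID = s.CD_ID
WHERE d.NAME = :dbName
UNION ALL
-- Partition keys; the 50000 offset sorts them after the regular columns.
SELECT d.NAME as db_name, t.TBL_NAME as table_name, k.PKEY_NAME as column_name, 50000 + k.INTEGER_IDX as idx
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
JOIN PARTITION_KEYS k on k.TBL_ID = t.TBL_ID
where d.NAME = :dbName
) x
order by db_name, table_name, idx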
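If you want to try the idea without touching a real metastore, here is a sketch that runs it against an in-memory SQLite database.  The tables are cut down to just the columns the query needs, the FROM/JOIN/UNION clauses are written out as I understand the metastore schema, and the sample database, table, and column names are made up.

```python
import sqlite3

QUERY = """
select db_name, table_name, column_name from (
  SELECT d.NAME as db_name, t.TBL_NAME as table_name,
         c.COLUMN_NAME as column_name, c.INTEGER_IDX as idx
  FROM DBS d
  JOIN TBLS t on t.DB_ID = d.DB_ID
  JOIN SDS s on t.SD_ID = s.SD_ID
  JOIN COLUMNS_V2 c on c.CD_ID = s.CD_ID
  WHERE d.NAME = :dbName
  UNION ALL
  SELECT d.NAME, t.TBL_NAME, k.PKEY_NAME, 50000 + k.INTEGER_IDX
  FROM DBS d
  JOIN TBLS t on t.DB_ID = d.DB_ID
  JOIN PARTITION_KEYS k on k.TBL_ID = t.TBL_ID
  WHERE d.NAME = :dbName
) x
order by db_name, table_name, idx
"""


def build_demo_metastore():
    """Create a tiny stand-in for the metastore tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER);
        CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, INTEGER_IDX INTEGER);
        CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, INTEGER_IDX INTEGER);
        INSERT INTO DBS VALUES (1, 'sales');
        INSERT INTO TBLS VALUES (10, 1, 100, 'orders');
        INSERT INTO SDS VALUES (100, 1000);
        INSERT INTO COLUMNS_V2 VALUES (1000, 'order_id', 0), (1000, 'amount', 1);
        INSERT INTO PARTITION_KEYS VALUES (10, 'dt', 0);
    """)
    return conn


def list_columns(conn, db_name):
    """Return (db, table, column) rows, partition keys last within each table."""
    return conn.execute(QUERY, {"dbName": db_name}).fetchall()
```

With the sample rows, list_columns(build_demo_metastore(), "sales") lists order_id and amount first and the dt partition key last, which is the order a select * would show them.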

Creating Java Key Stores (JKS), Setting Validity Periods, and Analyzing JKS for Expiry Dates

What is a JKS File?

Taken from:

A Java KeyStore (JKS) is a repository of security certificates – either authorization certificates or public key certificates – plus corresponding private keys, used for instance in SSL encryption.

Basically, applications like Presto use JKS files to enable them to do Transport Layer Security (TLS).

How Do You Create One?

As an example, Presto, a common big-data query tool, uses JKS for both secure internal and external communication.  At this link they show you how to create one.

Here is an excerpt.  Note that you must make sure you replace * with the domain you will host the service using the JKS file on or it will not work.  Certificates are domain specific.

keytool -genkeypair -alias presto -keyalg RSA \
    -keystore keystore.jks
Enter keystore password:
Re-enter new password:
What is your first and last name?
  [Unknown]:  *
What is the name of your organizational unit?
What is the name of your organization?
What is the name of your City or Locality?
What is the name of your State or Province?
What is the two-letter country code for this unit?
Is CN=*, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
  [no]:  yes

Enter key password for <presto>
        (RETURN if same as keystore password):

Pitfall! – 90 Day Expiry Date – Change It?

Now… here’s a fun pitfall.  Certificates are made with an expiry date and have to be reissued periodically (for security reasons).  The default validity period for a certificate generated with keytool is 90 days.

This certificate will be valid for 90 days, the default validity period if you don’t specify a -validity option.

This is fine on a big managed service with lots of attention.  But if you’re just TLS securing an internal app not many people see, you will probably forget to rotate it or neglect to set up appropriate automation.  Then, when it expires, things will break.

Now… for security reasons you generally shouldn’t set too high a value for certificate expiry time.  But for example purposes, here is how you would set it to 10 years.

keytool -genkeypair -alias presto -keyalg RSA \
    -keystore keystore.jks -validity 3650

Determine Expiry Date

If you have a JKS file that you are using for your application and you are not sure when it expires, here’s a command that you can use:

keytool -list -v -keystore keystore.jks

This will output different things based on where your key store came from.  E.g. you will probably see more interesting output from a real SSL cert than you will from a self-signed one created like we did above.

In any case, you will clearly see a line in the large output from this command that says something like this:

Valid from: Fri Jul 12 01:27:21 UTC 2019 until: Thu Oct 10 01:27:21 UTC 2019

Note that in this case, it is just valid for 3 months.  Be careful when looking at this output because you may find multiple expiry dates in the output for different components of the JKS file.  You need to make sure you read the right one.  Though, chances are that the one on your domain will be the one that expires earliest anyway.
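If you would rather not eyeball the output, here is a small Python sketch that pulls every expiry date out of the keytool -list -v text and pairs it with the alias it appeared under.  It assumes the “Alias name:” and “Valid from: … until: …” line formats shown in this post and only handles timestamps printed in UTC.  The function names are made up.

```python
import re
from datetime import datetime, timezone

ALIAS_LINE = re.compile(r"^Alias name:\s*(.+)$")
VALID_LINE = re.compile(r"^Valid from:\s*(.+?)\s+until:\s*(.+)$")
# Matches e.g. "Thu Oct 10 01:27:21 UTC 2019"; other time zones are not handled.
DATE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_keytool_date(text):
    """Parse one keytool timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT).replace(tzinfo=timezone.utc)


def certificate_expiries(keytool_output):
    """Return (alias, expiry) pairs in the order keytool printed them."""
    alias = None
    found = []
    for raw_line in keytool_output.splitlines():
        line = raw_line.strip()
        alias_match = ALIAS_LINE.match(line)
        if alias_match:
            alias = alias_match.group(1)
            continue
        valid_match = VALID_LINE.match(line)
        if valid_match:
            found.append((alias, parse_keytool_date(valid_match.group(2))))
    return found


def earliest_expiry(keytool_output):
    """Return the (alias, expiry) pair that expires first."""
    return min(certificate_expiries(keytool_output), key=lambda pair: pair[1])
```

Feed it the saved output of the keytool command above.  earliest_expiry raises ValueError if no “Valid from” line is found, which is a useful hint that the output format differs from what this sketch expects.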


Building Presto Admin

Presto Admin – Is it Worth Using?

I generally deploy Presto clusters using packer and terraform.  Packer builds an image for me with the target presto distribution, some utility scripts, HAProxy (to get multiple coordinators acting in HA), etc.

I kept noticing the presto-admin project though.  It allows you to quickly/easily deploy clusters from a central node, and it will handle the coordinator, workers, catalogs, and everything.  That sounds pretty cool.

Advance disclaimer – after I built this, I decided not to use it.  This was because it seems to just deploy a single coordinator and worker set.  For an HA setup, I need multiple coordinators, a load balancer, and workers/users pointing at the load balancer.  So, it’s just not the right fit for me.

Presto Admin – Build

In any case, I did go through the motions of building this, because I could not find a pre-built release.  Fortunately, it’s pretty easy on CentOS 7.x (basically RHEL 7.x):

# Download and unzip.
tar -xvzf 2.7.tar.gz

# Install pip/etc.
sudo yum install epel-release
sudo yum install python-pip
sudo yum install python-wheel

# Run make file and build web installer.
make dist

After this, just go into the dist folder and find prestoadmin-2.7-online.tar.gz.

I hope this saves you some time; I wasted around 20 minutes trying to find the dist online for download (which I never did).

Presto Custom Password Authentication Plugin (Internal)

Presto Authentication (Out of the Box)

Out of the box, presto will not make you authenticate to run any queries.  For example, you can just connect with JDBC from Java or DBeaver/etc and run whatever queries you want with any user name and no password.

When you want to enable a password, it has a few options out of the box, LDAP among them.

So, unfortunately, if you just want to create some users/passwords and hand them out, or get users from an existing system or database, there really isn’t a way to do it here.

Custom Password Authenticator

Most things in Presto are implemented as plugins, including many of its own out-of-the-box features.  This is true for the LDAP authenticator above.  It actually is a “Password Authenticator” plugin implementation.

So, if we copy the LDAP plugin and modify it, we can actually make our own password plugin that lets us use any user-name and password/etc we want!  Note that we’ll just make another “internal” plugin which means we have to recompile presto.  I’ll try to make this an external plugin in another article.

Let’s Do It!

Note: we will be modifying presto code, and presto code only builds on Linux or macOS.  I use windows, so I do my work on an Ubuntu desktop in the cloud; but you can do whatever you like.  If you have a Linux desktop, it should build very easily out of IntelliJ or externally with Maven.

  1. Clone the presto source code.
  2. Open it in IntelliJ or your editor of choice.
  3. Find the “presto-password-authenticators” sub-project and navigate to com.facebook.presto.password.
  4. Copy LdapAuthenticator to MyAuthenticator.
  5. Copy LdapConfig to MyAuthenticatorConfig.
  6. Copy LdapAuthenticatorFactory to MyAuthenticatorFactory.
  7. For MyAuthenticatorConfig, make a password and a user private variable, both strings.  Then make getters and setters for them similar to the LDAP URL ones; though you can take out the patterns/etc.  You can remove everything else; our user name and password will be our only config.
  8. For MyAuthenticatorFactory, change getName() to return “my-authenticator”.  Also edit it so that it uses your new config class and so that it returns your new authenticator in config.
  9. For MyAuthenticator, just make the private Principal authenticate(String user, String password) {} method throw AccessDeniedException if the user and password don’t match the ones from your config.  You can get the ones from your config in the constructor.
  10. Add MyAuthenticatorFactory to the file under the LdapAuthenticatorFactory instance.

These are all the code changes you need for your new authenticator (assuming it works).  But how do we use/test it?

  • The code runs from the presto-main project.
  • Go there and edit; add these properties:
  • Then add a in the same directory with these properties:
  • Note that the is equal to the string you used in the factory class in step #8.
  • The first config file tells it to use a password plugin, the second one tells it which one to use based on that name and what extra properties to give to it based on the config (in our case, just a user and a password).
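As a rough sketch, the two files might look something like this.  The http-server.* keys and password-authenticator.name are the ones Presto documents for HTTPS and password authentication.  Everything else is an assumption: the file locations, the port, the key store path, the passwords, and the my-authenticator.* key names (those depend on the @Config annotations you put on your own config class).

```properties
# etc/config.properties (assumed location)
http-server.authentication.type=PASSWORD
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/path/to/keystore.jks
http-server.https.keystore.key=keystore_password

# etc/password-authenticator.properties (assumed location)
password-authenticator.name=my-authenticator
# Hypothetical keys; use whatever names your config class declares.
my-authenticator.user=example_user
my-authenticator.password=example_password
```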

The Hard Part

Okay, so all of this was pretty easy to follow.  Unfortunately, if you go try to run this, it won’t work (Note that you can run the presto in IntelliJ by following these instructions).

Anyway, your plugin won’t work because password plugins don’t do anything unless you have HTTPS enabled on the coordinator.   This is because presto developers don’t want you sending clear-text passwords; so the password plugin type just makes it flat out not work!

We put properties in our first config file for HTTPS already.  Now you need to follow the JKS instructions to generate a key store and put it at the path from our config file.

Note that if you’re using AWS/etc, the “first and last name” should be the DNS name of your server.  If you put the IP, it won’t work.

Once you’ve got that set up, you can start presto and connect via JDBC from the local host by filling out these parameters:

SSL: Use HTTPS for connections.
SSLKeyStorePath: The location of the Java KeyStore file that contains the certificate and private key to use for authentication.
SSLKeyStorePassword: The password for the KeyStore.

SSL = true, key store path is the one from the config file earlier, and the password is also the one from the config file.

Now, if you connect via JDBC and use the user name and password from your config file, everything should work!  Note that you probably want to get a real certificate in the future or else you’ll have to copy your key store to each computer/node you use so JDBC works on each (which isn’t great).

UPDATE – 2019/08/04 – YOU should make clients use a trust store with the private key removed for numerous reasons.  See this newer post for how to take this JKS file and modify it for client use properly: