Presto + Hive View Storage/Implementation

I’ve been learning about how presto handles views lately.  This is because we are heavily reliant on presto and we recently ran into multiple use cases where our hive metastore had views which wouldn’t work within presto.

What are Presto Views Exactly?

Presto has its own view implementation which is distinct from a hive’s view implementation.  Presto will not use a hive view, and if you try to  query one, you will get a clear error immediately.

A presto view is based on a Presto SQL query string.  A hive view is based on a a hive query string.  A hive query string is written in HQL (Hive Query  Language), and presto simply does not know that SQL dialect.

How Are Views Stored?

Presto actually stores its views in the exact same location as hive does.  The hive metastore database has a TBLS table which holds every hive table and view.  Views have two columns populated that tables ignore – view_original_text and view_expanded_text.  Hive views will have plain SQL in the view_original_text column whereas presto views will have some encoded representation prefixed with “/* Pesto View…”.   If presto queries a view and does not find it’s “/* Pesto View” prefix, it will consider it a hive view and say that it is not supported.

Making Presto Handle Hive Views

I’ve been doing work for some time to try to make presto-sql support hive views.  I’m using the pull request noted in this issue https://github.com/prestodb/presto/issues/7338 as a template.  It is fairly old though and was made against presto-db rather than presto-sql, so the exercise has turned out to be non-trivial.

I’m still chugging along and will post more when done.  But one thing to note is that this PR does not really make presto support hive views.  It actually allows presto to attempt to run hive views as they are.  Many hive views will be valid presto SQL – e.g. where you’re just selecting from a table with some basic joins and/or where clause filters.

So, this PR basically prevents presto from outright failing when it sees a view that does not start with “/* Presto View”.  It then helps it read the hive query text, possibly lightly modify it, and attempt to run it as if the same had been done for a Presto query.

I plan on doing a number of corrections to the SQL as well; e.g. replacing double quotes, converting back-ticks, replacing obvious function names like NVL with COALESCE, etc.  Eventually I may try to fix more by parsing the hive text with ANTLR or something similar to make as many hive views run by default as possible.  But it will never be a complete solution.  A complete solution would be very hard as it would require a full understanding and conversion of hive language to presto language (which is probably not even possible given some of their differences).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s