Presto Doesn’t Work with Apache Ranger (Yet)

Google Group Discovery

After a fairly long fight at building ranger and getting it ready to install, I came across this google group item randomly which made me sad:

https://groups.google.com/forum/m/#!topic/presto-users/gp5tRn9J7kk

It has the following question:

I have setup Presto, Hive, Hue and also setup Ranger for controlling column level access to LDAP users.
Able to see the restrictions getting applied on Hive queries by LDAP users, but however these restrictions are not getting applied on Presto queries.
I understand Presto also uses the same Hive Metastore and Can someone help me why the restricted column access are obeyed in Hive and not Presto when logged in as LDAP user?
And this response:

I am afraid Presto is not integrated with Apache Ranger today. Instead Presto only obeys table-level permissions defined in Hive Metastore.

It’s definitely a roadmap item, we have heard similar requests for integration with Apache Sentry. No specific target date for either at this point.

The Verdict

So, unfortunately, it looks like even if I do finish installing Ranger, I will not be able to get the column level security I’m looking for in Presto.  So, I’m going to move on to analyzing other non-Ranger options.  I’ll also had somewhat ruled out Sentry even before reading this due to a stack-overflow post I read: https://stackoverflow.com/a/56247090/857994 which states:

Just quick update with Cloudera+Hortonworks merge last year. These companies have decided to standardize on Ranger. CDH5 and CDH6 will still use Sentry until CDH product line retires in ~2-3 years. Ranger will be used for Cloudera+Hortonworks’ combined “Unity” platform.

Cloudera were saying to us that Ranger is a more “mature” product. Since Unity hasn’t released yet (as of May 2019), something may come up in the future, but that’s the current direction. If you’re a former Cloudera customer / or CDH user, you would still have to use Apache Sentry. There is a significant overlap between Sentry and Ranger, but if you start fresh, definitely look at Ranger.

I had also already seen numerous other things online agreeing with this and saying that Sentry is weak and Ranger is far more advanced; so this is not surprising.

Eventual Implementation

I found this page https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin which tells you how to use a ranger-presto plugin.  It was literally made and last edited on May 19th 2019 and refers to version 1.2 of Ranger (the current release).

As I’m writing this on June 9th and 1.2 was released in September 2018 (based on its release note creation date at this site https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+1.2.0+-+Release+Notes), this is clearly not released yet.

I double checked on git hub and sure enough, this was just committed 20 days ago.

I wrote one of the committers to get their view on this problem and potential release schedules/etc just for future reference.

Other Options

Apparently Starburst, a Presto vendor company that works on top of various clouds (Azure and AWS), has integrated Sentry and Ranger into their Presto distribution.  You can see that here: https://www.starburstdata.com/technical-blog/presto-security-apache-ranger/.

AWS is also working on Cloud Formation (still in Preview) which supports column level authorization with its Athena (Presto) engine.

 

Hive + Presto + Ranger Version Hell

My Use Case

I was trying to test out Apache Ranger in order to give Presto column-level security over hive data.  Presto itself doesn’t seem to support Ranger yet, though some github entries suggest it will soon.  Ranger can integrate with hive though so that when presto queries hive, the security can work fine (apparently).

Conflicting Versions

I started off by deploying a version of Hive I’ve worked with before; 2.3.5, the latest 2.x version (I avoided 3.x).  After that, I deployed Presto .220, also the latest version.

This was all working great, so I moved on to Ranger.  This is when I found out that the Ranger docs specifically say that it only works with Hive version 1.2.0:

Apache Ranger version 0.5.x is compatible with only the component versions mentioned below

HIVE 1.2.0 https://hive.apache.org/downloads.html

That came from this link: https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation.

Alternative Options

I have a fairly stringent need for the security Ranger provides.  So, I was willing to use a 1.x version of hive, depending on what the feature loss was.  After all, quite a few big providers seem to use 1.x.

Unfortunately, the next thing I noticed was that Presto says: “The Hive connector supports Apache Hadoop 2.x and derivative distributions including Cloudera CDH 5 and Hortonworks Data Platform (HDP).”

That is coming from its latest documentation: https://prestodb.github.io/docs/current/connector/hive.html.

I’m not particularly excited to start digging through old versions of Presto as well.

Next Steps

I’m going to try to stick with Hive 2.x for now and a modern version of Presto.  So, my options are:

  1. Research Ranger more and see if it can actually work with Hive 2.x.  Various vendors seem to use Ranger and Hive/Presto together; so I’m curious to see how.  Maybe the documentation on Ranger is just out of date (I know, being hopeful).
  2. Look at Ranger alternatives like Apache Sentry and see if they support Hive 2.x.  Apparently Ranger is beating out Sentry in features, usage, and future support… so I’m not excited about using Sentry.  But if it works, I can always migrate back to Ranger once its support grows for either Hive or Presto.

Update

I starting digging in from JIRA and mailing lists and found that Ranger appears to have had work done on it as early as 2017 for supporting hive 2.3.2.  Here’s a link.  https://issues.apache.org/jira/browse/RANGER-1927.

So, I’m going to give installing ranger a shot on 2.3.5 and see if it works.  If not, I’ll try with 2.3.2 and/or seek community help.  Hopefully I’ll come back and update this afterward with some good news :).