Custom HiveAuthorizationProvider Example to Block Column Access

Hive has a good set of hooks that you can use to customize all kinds of things.  It also has other “pluggable” areas which are effectively hooks, even though they aren’t called that.

Here is a great article to get you started on Hive Hooks -> http://dharmeshkakadia.github.io/hive-hook/.

Creating a HiveAuthorizationProvider

In this case we aren’t implementing a hook specifically, but we follow exactly the same flow to create our own HiveAuthorizationProvider.  As a deliberately silly example, we’ll simply block access to any column named “description”.

package com.john.humphreys.hive;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.ql.metadata.AuthorizationException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase;
import org.apache.hadoop.hive.ql.security.authorization.Privilege;

import java.util.List;

public class MyHiveAuthorizationProvider
        extends HiveAuthorizationProviderBase {
    @Override
    public void init(Configuration conf) throws HiveException {

    }

    @Override
    public void authorize(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException {

    }

    @Override
    public void authorize(Database db, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException {

    }

    @Override
    public void authorize(Table table, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException {

    }

    @Override
    public void authorize(Partition part, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException {

    }

    @Override
    public void authorize(Table table, Partition part, List<String> columns, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException {
        if (columns.contains("description")) {
            throw new AuthorizationException("Not allowed to select description column!");
        }
    }
}

The only Maven dependency this class requires is:

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.3.5</version>
</dependency>

You don’t need any special build plugins. Build a project with just this dependency (Java 1.8), put the resulting JAR file in your hive/lib folder, and you’re almost ready.

The last step is to modify your hive-site.xml and add these two properties:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>com.john.humphreys.hive.MyHiveAuthorizationProvider</value>
</property>

After that, restart your hiveserver2. Any attempt to select the “description” column from any table through it will then be rejected.

Example In Practice

If I have a table called sample_data and I have a description column in it, and I run this query:

select * from (
    select * from (
        select description from sample_data
        ) x
    ) y;

I get this result:

Query execution failed

Reason:
SQL Error [403] [42000]: Error while compiling statement: Not allowed to select description column!

So, we can see it worked properly.
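
The column check inside authorize is just a list lookup, so it can be pulled out and exercised without a Hive cluster. Below is a minimal sketch, assuming a hypothetical helper class named ColumnBlocklist (not part of Hive or of the provider above). It adds two things the one-line check does not have: a null guard and case-insensitive matching. Whether Hive always lower-cases column names before they reach the provider is not something this post establishes, so treat the normalization as a defensive assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not a Hive class): checks a requested column list against blocked names. */
public final class ColumnBlocklist {
    private final Set<String> blocked = new HashSet<>();

    public ColumnBlocklist(String... blockedColumns) {
        for (String name : blockedColumns) {
            blocked.add(normalize(name));
        }
    }

    /** Returns the first blocked column found in the request, or empty if none is blocked. */
    public Optional<String> firstBlocked(List<String> columns) {
        if (columns == null) {
            // Assumption: a missing column list means there is nothing to check.
            return Optional.empty();
        }
        for (String column : columns) {
            if (column != null && blocked.contains(normalize(column))) {
                return Optional.of(column);
            }
        }
        return Optional.empty();
    }

    // Trim and lower-case so "Description" and " description " match the same entry.
    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ColumnBlocklist blocklist = new ColumnBlocklist("description");
        // prints "true": the mixed-case name still matches
        System.out.println(blocklist.firstBlocked(Arrays.asList("id", "Description")).isPresent());
        // prints "false": no blocked column requested
        System.out.println(blocklist.firstBlocked(Arrays.asList("id", "name")).isPresent());
    }
}
```

Inside the provider, the existing columns.contains("description") test could delegate to firstBlocked(columns) and throw the AuthorizationException when a value is present.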

Limitations

Unfortunately, while this guards Hive, it surprisingly doesn’t guard Presto when it accesses data via the Hive metastore. Since I need to guard both Hive and Presto, I need to understand why and see if there is some other option.

Java – Get Parent Class Private Variable in Sub Class

I’m working on making a derived version of the Presto Hive plugin.  Unfortunately, the base class I need to inherit from uses its own private class loader, and the function I need to override (which is override-able!) for some reason requires that class loader as a parameter.

Anyway, long story short, I need to get the parent object’s private field to use it in the sub class I’m creating.  Reflection to the rescue!

Note: This is not generally good programming practice. Understand what the code does and why it does it before doing this.

Solution

// Class A file.

public class ClassA {
    private String name;
    public ClassA() {
        this.name = "Hello World!";
    }
}

// Class B file.

import java.lang.reflect.Field;

public class ClassB extends ClassA {
    public ClassB() {
        super();
    }

    public void printSuperPrivateMember() throws Exception {
        Field nameField = ClassA.class.getDeclaredField("name");
        nameField.setAccessible(true);
        System.out.println((String) nameField.get(this));
    }

    public static void main(String[] args) throws Exception {
        ClassB b = new ClassB();
        b.printSuperPrivateMember();
    }
}
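
One caveat with the code above: getDeclaredField only looks at the class you call it on, so you have to know exactly which ancestor declares the field. Here is a minimal sketch of a more general variant that walks up the superclass chain until it finds the field. The class name InheritedFields and the nested Parent/Child classes are hypothetical stand-ins for illustration. On Java 9 and later, setAccessible can still be refused for classes in modules that are not open to your code, so this is not guaranteed to work against JDK internals.

```java
import java.lang.reflect.Field;

/** Hypothetical helper: reads a private field declared anywhere up the superclass chain. */
public final class InheritedFields {
    private InheritedFields() {}

    public static Object read(Object target, String fieldName) throws ReflectiveOperationException {
        // Start at the runtime class and climb until the field is found or we run out of parents.
        for (Class<?> type = target.getClass(); type != null; type = type.getSuperclass()) {
            try {
                Field field = type.getDeclaredField(fieldName);
                field.setAccessible(true); // bypass the private modifier
                return field.get(target);
            } catch (NoSuchFieldException notDeclaredHere) {
                // Not declared on this class; try the next ancestor.
            }
        }
        throw new NoSuchFieldException(fieldName);
    }

    /** Stand-in for the base class that owns the private field. */
    public static class Parent {
        private String name;

        public Parent() {
            this.name = "Hello World!";
        }
    }

    /** Stand-in for the subclass; it declares no fields of its own. */
    public static class Child extends Parent {}

    public static void main(String[] args) throws Exception {
        // prints "Hello World!"
        System.out.println(read(new Child(), "name"));
    }
}
```

The same warning applies as before: this reaches past the access rules the base class author chose, so it can break whenever the parent class renames or removes the field.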

Presto Doesn’t Work with Apache Ranger (Yet)

Google Group Discovery

After a fairly long fight building Ranger and getting it ready to install, I randomly came across this Google Group post, which made me sad:

https://groups.google.com/forum/m/#!topic/presto-users/gp5tRn9J7kk

It has the following question:

I have setup Presto, Hive, Hue and also setup Ranger for controlling column level access to LDAP users.
Able to see the restrictions getting applied on Hive queries by LDAP users, but however these restrictions are not getting applied on Presto queries.
I understand Presto also uses the same Hive Metastore and Can someone help me why the restricted column access are obeyed in Hive and not Presto when logged in as LDAP user?

And this response:

I am afraid Presto is not integrated with Apache Ranger today. Instead Presto only obeys table-level permissions defined in Hive Metastore.

It’s definitely a roadmap item, we have heard similar requests for integration with Apache Sentry. No specific target date for either at this point.

The Verdict

So, unfortunately, it looks like even if I do finish installing Ranger, I will not be able to get the column-level security I’m looking for in Presto.  So, I’m going to move on to analyzing other non-Ranger options.  I had also somewhat ruled out Sentry even before reading this, due to a Stack Overflow post I read: https://stackoverflow.com/a/56247090/857994 which states:

Just quick update with Cloudera+Hortonworks merge last year. These companies have decided to standardize on Ranger. CDH5 and CDH6 will still use Sentry until CDH product line retires in ~2-3 years. Ranger will be used for Cloudera+Hortonworks’ combined “Unity” platform.

Cloudera were saying to us that Ranger is a more “mature” product. Since Unity hasn’t released yet (as of May 2019), something may come up in the future, but that’s the current direction. If you’re a former Cloudera customer / or CDH user, you would still have to use Apache Sentry. There is a significant overlap between Sentry and Ranger, but if you start fresh, definitely look at Ranger.

I had also already seen numerous other things online agreeing with this and saying that Sentry is weak and Ranger is far more advanced; so this is not surprising.

Eventual Implementation

I found this page https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin which tells you how to use a ranger-presto plugin.  It was created and last edited on May 19th 2019 and refers to version 1.2 of Ranger (the current release).

As I’m writing this on June 9th and 1.2 was released in September 2018 (based on its release note creation date at this site https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+1.2.0+-+Release+Notes), the plugin is clearly not part of a release yet.

I double-checked on GitHub and, sure enough, it was committed just 20 days ago.

I wrote to one of the committers to get their view on this problem and on potential release schedules, just for future reference.

Other Options

Apparently Starburst, a Presto vendor company that works on top of various clouds (Azure and AWS), has integrated Sentry and Ranger into their Presto distribution.  You can see that here: https://www.starburstdata.com/technical-blog/presto-security-apache-ranger/.

AWS is also working on Lake Formation (still in preview), which supports column-level authorization with its Athena (Presto-based) engine.


Building Apache Ranger

I was not particularly thrilled to see that I have to build ranger myself to get the various binaries needed for it.

Anyway, the first thing I did was download a “release”.  There is surprisingly little information on what a “release” is or how to use it.  But, given that all the installation documentation seems to ask for artifacts named like “ranger-%version-number%-admin.tar.gz” and I didn’t see any gz files, I assumed it was more like a bundled source code release that had to be built.

Note: I’m referring to the documentation here: https://ranger.apache.org/quick_start_guide.html and the release I used is here: http://mirrors.sonic.net/apache/ranger/1.2.0/apache-ranger-1.2.0.tar.gz.

Docker Build Script

My initial thought was to do the build using the convenient-sounding “build_ranger_using_docker.sh” script, which is in the root directory.  So, I installed Docker quickly, did a docker login, and ran it.  It failed (on CentOS 7, for the record).

It tried to download a version of maven which doesn’t exist on the maven site currently.  If you switch to a slightly newer one the script breaks due to the maven release artifacts being a little different too.  So, I reverted to 3.3.9 which required changes to multiple lines.

After that, it went through to the end and failed on the last step with “gosu: not found”.  There had been some scary red text higher up about “no ultimately trusted keys found” related to installing gosu.

I tried various ways of fixing this and they all failed (on Centos 7.x)… but to be honest, I didn’t invest my own time in reading up on gosu or why the various proposed solutions were failing.

Build with Maven

I gave up on Docker and tried building directly with Maven. That failed on both CentOS and my Windows 10 box with similar Python errors about halfway through, despite one machine being on Python 2 and the other on Python 3.  So, building straight from source wasn’t great either.

Success

I decided to go back to the Docker build.  This time, I removed some of the Maven validations and used a newer version of Maven (which I’m confident doesn’t matter much).  I also removed the gosu install and usage from the final build commands.

This finally worked.  Note that my copy is hacky and doesn’t bother using the “builder” account to do the build.  But it worked at least and built the artifacts.  So, I’m happy enough for my own purposes.  If it was a long-running web app or something, I’d go work out the bugs in the docker container/gosu/etc – but that’s not required for a build utility.

After this, you see a nice listing of tar.gz files in the ./target folder like so:

antrun ranger-1.2.0-kafka-plugin.zip ranger-1.2.0-sqoop-plugin.zip
archive-tmp ranger-1.2.0-kms.tar.gz ranger-1.2.0-src.tar.gz
maven-shared-archive-resources ranger-1.2.0-kms.zip ranger-1.2.0-src.zip
ranger-1.2.0-admin.tar.gz ranger-1.2.0-knox-plugin.tar.gz ranger-1.2.0-storm-plugin.tar.gz
ranger-1.2.0-admin.zip ranger-1.2.0-knox-plugin.zip ranger-1.2.0-storm-plugin.zip
ranger-1.2.0-atlas-plugin.tar.gz ranger-1.2.0-kylin-plugin.tar.gz ranger-1.2.0-tagsync.tar.gz
ranger-1.2.0-atlas-plugin.zip ranger-1.2.0-kylin-plugin.zip ranger-1.2.0-tagsync.zip
ranger-1.2.0-hbase-plugin.tar.gz ranger-1.2.0-migration-util.tar.gz ranger-1.2.0-usersync.tar.gz
ranger-1.2.0-hbase-plugin.zip ranger-1.2.0-migration-util.zip ranger-1.2.0-usersync.zip
ranger-1.2.0-hdfs-plugin.tar.gz ranger-1.2.0-ranger-tools.tar.gz ranger-1.2.0-yarn-plugin.tar.gz
ranger-1.2.0-hdfs-plugin.zip ranger-1.2.0-ranger-tools.zip ranger-1.2.0-yarn-plugin.zip
ranger-1.2.0-hive-plugin.tar.gz ranger-1.2.0-solr-plugin.tar.gz rat.txt
ranger-1.2.0-hive-plugin.zip ranger-1.2.0-solr-plugin.zip version
ranger-1.2.0-kafka-plugin.tar.gz ranger-1.2.0-sqoop-plugin.tar.gz

Here is my final Docker build script.  Note that you should read up on gosu/etc before using it and I take no responsibility for any security issues; you should use the official one, if you can :).

default_command="mvn -DskipTests=true clean compile package install assembly:assembly"
build_image=0
if [ "$1" = "-build_image" ]; then
    build_image=1
    shift
fi

params=$*
if [ $# -eq 0 ]; then
    params=$default_command
fi

image_name="ranger_dev"
remote_home=
container_name="--name ranger_build"

if [ ! -d security-admin ]; then
    echo "ERROR: Run the script from root folder of source. e.g. $HOME/git/ranger"
    exit 1
fi

images=`docker images | cut -f 1 -d " "`
[[ $images =~ $image_name ]] && found_image=1 || build_image=1

if [ $build_image -eq 1 ]; then
    echo "Creating image $image_name ..."
    docker rmi -f $image_name

docker build -t $image_name - < /scripts/mvn.sh RUN echo 'set -x; if [ "\$1" = "mvn" ]; then usermod -u \$(stat -c "%u" pom.xml) bash -c '"'"'ln -sf /.m2 \$HOME'"'"'; exec "\$@"; fi; exec "\$@" ' >> /scripts/mvn.sh

RUN chmod -R 777 /scripts
RUN chmod -R 777 /tools

ENTRYPOINT ["/scripts/mvn.sh"]
Dockerfile

fi

src_folder=`pwd`

LOCAL_M2="$HOME/.m2"
mkdir -p $LOCAL_M2
set -x
docker run --rm -v "${src_folder}:/ranger" -w "/ranger" -v "${LOCAL_M2}:${remote_home}/.m2" $container_name $image_name $params

 

Install Docker CE on Linux CentOS 7.x

This is just a short post paraphrasing the very good (and verbose!) instructions on the Docker site here: https://docs.docker.com/install/linux/docker-ce/centos/.

Basically, to install Docker CE on a fresh CentOS 7.x server, you have to:

  • Install the YUM config manager.
  • Install device-mapper-persistent-data and LVM (for the storage driver).
  • Use the YUM config manager to add the stable Docker YUM repository.
  • Install docker.
  • Start docker.
  • Test that it worked.

This script does all of that and basically just saves you from skimming through the linked page repeatedly to find the few commands you need.

sudo yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce docker-ce-cli containerd.io
sudo systemctl start docker
sudo docker run hello-world

Assuming it works, you should see “Hello from Docker!” among various other output on your screen.