Hive has a good set of hooks that you can use to customize many parts of its behavior. It also has other “pluggable” areas that work like hooks, even though they aren’t called hooks.
Here is a great article to get you started on Hive Hooks -> http://dharmeshkakadia.github.io/hive-hook/.
Creating a HiveAuthorizationProvider
In this case we aren’t implementing a hook specifically, but we follow exactly the same flow to create our own HiveAuthorizationProvider. As a deliberately silly example, we’ll block access to any column named “description”.
package com.john.humphreys.hive;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.ql.metadata.AuthorizationException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase;
import org.apache.hadoop.hive.ql.security.authorization.Privilege;

import java.util.List;

public class MyHiveAuthorizationProvider extends HiveAuthorizationProviderBase {

    @Override
    public void init(Configuration conf) throws HiveException {
        // No setup needed for this example.
    }

    // The global, database, table, and partition checks are left empty,
    // so everything is allowed at those levels.

    @Override
    public void authorize(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
            throws HiveException, AuthorizationException {
    }

    @Override
    public void authorize(Database db, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
            throws HiveException, AuthorizationException {
    }

    @Override
    public void authorize(Table table, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
            throws HiveException, AuthorizationException {
    }

    @Override
    public void authorize(Partition part, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
            throws HiveException, AuthorizationException {
    }

    // Column-level check: reject any request that touches a "description" column.
    @Override
    public void authorize(Table table, Partition part, List<String> columns,
                          Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)
            throws HiveException, AuthorizationException {
        // Guard against a null column list before checking it.
        if (columns != null && columns.contains("description")) {
            throw new AuthorizationException("Not allowed to select description column!");
        }
    }
}
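The column check is the only real logic in the provider, so it can be pulled out into a small helper and unit tested without a Hive installation. The following is a minimal sketch of that idea, not part of Hive: the class name `ColumnBlocklist` and the method `firstBlocked` are hypothetical names made up for illustration. It also compares names case-insensitively, which is an added assumption rather than something the provider above does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper (not a Hive class): holds the blocked column names
// and reports whether a requested column list touches any of them.
final class ColumnBlocklist {

    private final Set<String> blocked;

    ColumnBlocklist(Iterable<String> blockedNames) {
        Set<String> names = new HashSet<>();
        for (String name : blockedNames) {
            // Normalize to lower case so "Description" and "description" compare equal.
            names.add(name.toLowerCase(Locale.ROOT));
        }
        this.blocked = Collections.unmodifiableSet(names);
    }

    // Returns the first requested column that is blocked, or empty if none is.
    // A null column list is treated as "no columns requested".
    Optional<String> firstBlocked(List<String> columns) {
        if (columns == null) {
            return Optional.empty();
        }
        for (String column : columns) {
            if (column != null && blocked.contains(column.toLowerCase(Locale.ROOT))) {
                return Optional.of(column);
            }
        }
        return Optional.empty();
    }
}
```

With something like this in place, the column-level authorize method would only need to call firstBlocked(columns) and throw an AuthorizationException when a value is present.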
The only dependency required by this in maven is:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.5</version>
</dependency>
You don’t need any special build plugins. Build a project with just this dependency (Java 1.8), copy the resulting JAR file into your hive/lib folder, and you’re almost ready.
The last step is to modify your hive-site.xml and add these two properties:
<property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
</property>
<property>
    <name>hive.security.authorization.manager</name>
    <value>com.john.humphreys.hive.MyHiveAuthorizationProvider</value>
</property>
After that, restart your HiveServer2. From then on, any query that tries to select the “description” column from any table will be rejected.
Example In Practice
If I have a table called sample_data and I have a description column in it, and I run this query:
select * from ( select * from ( select description from sample_data ) x ) y;
I get this result:
Query execution failed
Reason: SQL Error [403] [42000]: Error while compiling statement: Not allowed to select description column!
So we can see it worked properly, even though the column was selected through nested subqueries.
Limitations
Unfortunately, while this guards Hive, it surprisingly doesn’t guard Presto when it accesses data via the Hive metastore. Since I need to guard both Hive and Presto, I need to understand why and see if there is some other option.