Hive Metastore DB (HMS DB) – Get All Tables/Columns including Partition Keys

How Hive Stores Schemas

I recently had to sync the schemas from a hive database to a normal MySQL database for reasons I won’t bother going into.  The exercise required me to get all columns from all tables in hive for each DB though, and I found that this was not amazingly straight-forward.

The hive metastore DB is a normal MySQL/etc database with a hive schema in it.  The hive schema holds the hive tables though.  So, the information schema is irrelevant to hive; to get the hive table details, you have to interrogate the TBLS table, for example.  To get columns, you need to interrogate COLUMNS_V2, and to get the databases themselves, you look toward the DBS table.

Missing Columns

Other posts I’ve seen would leave you here – but  I found when I joined these three tables together (plus some others you have to route through), I was still missing some columns in certain tables for some reason.  It turns out that partition columns are implicit in hive.  So, in a file system of hive data (like HDFS), a partition column in a table is literally represented by just having the directory named with the partition value; there are no columns with the value in the data.

What this means is that partition columns don’t show up in these normal tables.  You have to look to a separate partition keys table to find them with a separate query.

The Working Query

The query below finds all columns of any kind and sorts them in the order they’ll appear when you select from a table in hive/presto/etc.  I hope it helps you!

select db_name, table_name, column_name from (
SELECT d.NAME as db_name, t.TBL_NAME as table_name, c.COLUMN_NAME as column_name, c.INTEGER_IDX as idx
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
JOIN SDS s on t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c on c.CD_ID = s.CD_ID
WHERE d.NAME = :dbName
union
SELECT d.NAME as db_name, t.TBL_NAME as table_name, k.PKEY_NAME as column_name, 50000 + k.INTEGER_IDX
FROM DBS d
JOIN TBLS t on t.DB_ID = d.DB_ID
join PARTITION_KEYS k on t.TBL_ID = k.TBL_ID
where d.NAME = :dbName
) x
order by db_name, table_name, idx

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s