Commit

feat: Hive partitioned tables support (#375)
* feat: add support for partitioned tables

* feat: import schema class

* fix: update docs
nehanene15 authored and ngdav committed Mar 16, 2022
1 parent 65e7188 commit 368c99b
Showing 3 changed files with 43 additions and 1 deletion.
7 changes: 7 additions & 0 deletions docs/connections.md
@@ -265,6 +265,13 @@ Then `pip install pyodbc`.
```

## Hive
Please note that for Group By validations, the following property must be set in Hive:

`set hive:hive.groupby.orderby.position.alias=true`

If you are running Hive on Dataproc, you will also need to run
`pip install ibis-framework[impala]`

```
{
# Hive is based off Impala connector
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -73,7 +73,7 @@ After installing the CLI tool using the instructions below, you will be ready to

## Deploy Data Validation CLI on your machine

- The Data Validation tooling requires Python 3.6+.
+ The Data Validation tooling requires Python 3.7+.

```
sudo apt-get install python3
35 changes: 35 additions & 0 deletions third_party/ibis/ibis_impala/api.py
@@ -14,7 +14,9 @@

from ibis.backends.impala import connect
from ibis.backends.impala import udf
from ibis.backends.impala.client import ImpalaClient
import ibis.expr.datatypes as dt
import ibis.expr.schema as sch

_impala_to_ibis_type = udf._impala_to_ibis_type

@@ -61,4 +63,37 @@ def parse_type(t):
raise Exception(t)


def get_schema(self, table_name, database=None):
    """
    Return a Schema object for the indicated table and database
    Parameters
    ----------
    table_name : string
      May be fully qualified
    database : string, default None
    Returns
    -------
    schema : ibis Schema
    """
    qualified_name = self._fully_qualified_name(table_name, database)
    query = "DESCRIBE {}".format(qualified_name)

    # only pull out the first two columns which are names and types
    # pairs = [row[:2] for row in self.con.fetchall(query)]
    pairs = []
    for row in self.con.fetchall(query):
        if row[0] == "":
            break
        pairs.append(row[:2])

    names, types = zip(*pairs)
    ibis_types = [parse_type(type.lower()) for type in types]
    names = [name.lower() for name in names]

    return sch.Schema(names, ibis_types)


udf.parse_type = parse_type
ImpalaClient.get_schema = get_schema
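
The loop in `get_schema` stops at the first row whose column name is empty, which appears to be how it skips the partition metadata that `DESCRIBE` appends for partitioned tables. The following is a minimal standalone sketch of that truncation, not the committed code: `column_pairs` is a hypothetical helper, and `sample_rows` is an assumed shape for Hive's `DESCRIBE` output rather than rows captured from a real cluster.

```python
def column_pairs(describe_rows):
    """Collect (name, type) pairs up to the first row with an empty name."""
    pairs = []
    for row in describe_rows:
        if row[0] == "":
            # Assumed separator row before the partition information block
            break
        pairs.append(row[:2])
    return pairs


# Hypothetical DESCRIBE rows for a table partitioned by `dt` (assumed layout)
sample_rows = [
    ("id", "int", ""),
    ("name", "string", ""),
    ("dt", "string", ""),
    ("", None, None),
    ("# Partition Information", None, None),
    ("# col_name", "data_type", "comment"),
    ("", None, None),
    ("dt", "string", ""),
]

pairs = column_pairs(sample_rows)
names, types = zip(*pairs)
print(names)  # ('id', 'name', 'dt')
print(types)  # ('int', 'string', 'string')
```

Without the early `break`, the rows after the separator would be parsed as columns too, so the partition column would be listed twice and the `#` header rows would be passed to `parse_type`.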
