feat: Hive partitioned tables support #375

Merged
merged 3 commits into from
Mar 1, 2022
7 changes: 7 additions & 0 deletions docs/connections.md
@@ -264,6 +264,13 @@ Then `pip install pyodbc`.
```

## Hive
Please note that for Group By validations, the following property must be set in Hive:

`set hive:hive.groupby.orderby.position.alias=true`

If you are running Hive on Dataproc, you will also need to run
`pip install ibis-framework[impala]`

```
{
# Hive is based off Impala connector
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -73,7 +73,7 @@ After installing the CLI tool using the instructions below, you will be ready to

## Deploy Data Validation CLI on your machine

-The Data Validation tooling requires Python 3.6+.
+The Data Validation tooling requires Python 3.7+.

```
sudo apt-get install python3
35 changes: 35 additions & 0 deletions third_party/ibis/ibis_impala/api.py
@@ -14,7 +14,9 @@

from ibis.backends.impala import connect
from ibis.backends.impala import udf
from ibis.backends.impala.client import ImpalaClient
import ibis.expr.datatypes as dt
import ibis.expr.schema as sch

_impala_to_ibis_type = udf._impala_to_ibis_type

@@ -61,4 +63,37 @@ def parse_type(t):
raise Exception(t)


def get_schema(self, table_name, database=None):
    """
    Return a Schema object for the indicated table and database

    Parameters
    ----------
    table_name : string
        May be fully qualified
    database : string, default None

    Returns
    -------
    schema : ibis Schema
    """
    qualified_name = self._fully_qualified_name(table_name, database)
    query = "DESCRIBE {}".format(qualified_name)

    # Keep only the first two columns of each row (name and type). Stop at
    # the first row with an empty name so that anything after the column
    # list (e.g. partition information) is not parsed as a column.
    pairs = []
    for row in self.con.fetchall(query):
        if row[0] == "":
            break
        pairs.append(row[:2])

    names, types = zip(*pairs)
    ibis_types = [parse_type(type_name.lower()) for type_name in types]
    names = [name.lower() for name in names]

    return sch.Schema(names, ibis_types)


udf.parse_type = parse_type
ImpalaClient.get_schema = get_schema
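
The row-filtering loop in `get_schema` can be tried in isolation. The sketch below is not part of the PR: `column_pairs` is a hypothetical helper, and the sample rows are made up, assuming that Hive's `DESCRIBE` output for a partitioned table lists the columns, then a blank row, then a partition-information section.

```python
def column_pairs(describe_rows):
    """Return lowercased (name, type) pairs, stopping at the first blank row."""
    pairs = []
    for row in describe_rows:
        if row[0] == "":
            # Everything after the blank row is assumed to be partition metadata.
            break
        pairs.append((row[0].lower(), row[1].lower()))
    return pairs


# Hypothetical rows shaped like DESCRIBE output for a table partitioned by `ds`.
sample_rows = [
    ("id", "INT", ""),
    ("name", "STRING", ""),
    ("ds", "STRING", ""),
    ("", None, None),
    ("# Partition Information", None, None),
    ("# col_name", "data_type", "comment"),
    ("ds", "STRING", ""),
]

pairs = column_pairs(sample_rows)
print(pairs)
```

Under that assumption, stopping at the blank row keeps the partition column `ds` from being listed twice and keeps the `#`-prefixed header rows out of the schema.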