Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate-table-partitions fails for Spanner with Unknown dialect spanner #1059

Closed
sundar-mudupalli-work opened this issue Nov 28, 2023 · 2 comments · Fixed by #1066
Closed
Assignees
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@sundar-mudupalli-work
Copy link
Contributor

Hi

A typical generate-table-partitions command fails because we are not able to convert the IBIS where clause (i.e. an IBIS table expression) into SQL text. This works for 8 other databases BigQuery, Hive, MySQL, Oracle, Postgres, Snowflake, SQL Server and Teradata (we have test cases for all those databases).

The command is

data-validation generate-table-partitions --secret-manager-type gcp --secret-manager-project-id span-cloud-ck-testing-external --source-conn my_mysql -tc my_spanner --tables-list person.person1=test-db.person1 --hash 'first_name2' --primary-keys ID -cdir gs://dvt-ck-cloud-run-testing/mudupalli/MySp8GPart16 -pn 16

Replacing my_spanner (which points to a Spanner database) with my_mysql - i.e. validate against same database succeeds.

The specific error is below:

  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/data_validation/partition_builder.py", line 73, in partition_configs
    partition_filters = self._get_partition_key_filters()
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/data_validation/partition_builder.py", line 229, in _get_partition_key_filters
    self._extract_where(target_table.filter(filter_target_clause))
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/data_validation/partition_builder.py", line 85, in _extract_where
    return re.sub(r"\s\s+", " ", ibis.to_sql(x).sql.split("WHERE")[1]).replace(
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/ibis/expr/sql.py", line 381, in to_sql
    (pretty,) = sg.transpile(sql, read=read, write=write, pretty=True)
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 187, in transpile
    for expression in parse(sql, read, error_level=error_level)
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 74, in parse
    dialect = Dialect.get_or_raise(read)()
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 136, in get_or_raise
    raise ValueError(f"Unknown dialect '{dialect}'")
ValueError: Unknown dialect 'spanner'
@helensilva14 helensilva14 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Nov 30, 2023
@sundar-mudupalli-work sundar-mudupalli-work self-assigned this Dec 3, 2023
@sundar-mudupalli-work
Copy link
Contributor Author

Hi,

The problem occurs because the Spanner backend behaves differently from the BigQuery backend - perhaps the Spanner backend is not fully implemented. I can work around this issue - however it might be a very simple fix in the Spanner backend. Here is a Python interpreter session demonstrating the issue. BigQuery works fine with the to_sql function returning the sql string. When Spanner is presented with a table expression that execute() correctly, it still fails to render with the to_sql function.

(env) mudupalli@dvt-mudupalli-test001:~/gcsdir$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibis
>>> from data_validation.clients import get_data_client
>>> bq_c = get_data_client({"source_type": "BigQuery", "project_id": "span-cloud-ck-testing-external"})
>>> result_tbl = bq_c.table('span-cloud-ck-testing-external.pso_data_validator.results')
>>> ibis.to_sql(result_tbl)
SQLString(sql='SELECT\n  t0.*\nFROM `span-cloud-ck-testing-external.pso_data_validator.results` AS t0')
>>> span_c = get_data_client({"source_type": "Spanner", "instance_id": "df-test", "database_id": "test-db"})
>>> person_tbl = span_c.table('person1')
>>> person_100 = person_tbl.filter(person_tbl.ID == 100)
>>> person_100.execute()
                                         first_name1  ...   ID
0  dshlfiuadhsfkudashgfkuadshgkaudgshkuagykfuydag...  ...  100

[1 rows x 7 columns]
>>> ibis.to_sql(person_100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/ibis/expr/sql.py", line 381, in to_sql
    (pretty,) = sg.transpile(sql, read=read, write=write, pretty=True)
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 187, in transpile
    for expression in parse(sql, read, error_level=error_level)
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 74, in parse
    dialect = Dialect.get_or_raise(read)()
  File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 136, in get_or_raise
    raise ValueError(f"Unknown dialect '{dialect}'")
ValueError: Unknown dialect 'spanner'

The issue seems to be that the dialect spanner is not registered with sqlglot. Is there a way to fix this in the backend?

Thanks.

Sundar Mudupalli

@nehanene15
Copy link
Collaborator

The easiest way to do this is to pass in the BQ dialect for Spanner clients in the to_sql() call we make in partition_builder. BQ and Spanner have the same SQL syntax so this should be interchangeable.

This is a separate issue, but Spanner doesn't support ROW_NUMBER() as shown in this error when Spanner is the source system:
Command:
data-validation -v generate-table-partitions -sc spanner -tc spanner -tbls pso_data_validator.dvt_core_types -pk id -hash '*' -pn 2 -cdir /home/user/professional-services-data-validator/cdir

Error:
google.api_core.exceptions.MethodNotImplemented: 501 Unsupported built-in function: row_number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
3 participants