Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation throws pandas merge error when source data is empty #1006

Closed
SwatiT17 opened this issue Sep 27, 2023 · 17 comments · Fixed by #1100
Closed

Validation throws pandas merge error when source data is empty #1006

SwatiT17 opened this issue Sep 27, 2023 · 17 comments · Fixed by #1100
Assignees
Labels
priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@SwatiT17
Copy link
Contributor

While executing row validation using custom query, validation is failing with an unexpected error message

Error You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat occurred while running config file dvt_configs/test.yaml. Skipping it for now.

Command

data-validation -ll ERROR validate custom-query row -sc td_conn -tc bq_conn -sqf dvt_configs/src.sql -tqf dvt_configs/tgt.sql --primary-keys pk -comp-fields=col1,col2,col3,col4,col5 -bqrh test.dvt.bq_validation -c dvt_configs/test.yaml
src.sql
select col1,col2,col3,col4,col5 from src_tbl where pk in(123)
tgt.sql
select col1,col2,col3,col4,col5 from tgt_tbl where pk in(123)

@nehanene15
Copy link
Collaborator

@SwatiT17 Were we able to confirm this is due to lack of data in the dataframes?

@helensilva14 helensilva14 added the type: question Request for information or clarification. Not an issue. label Sep 28, 2023
@nehanene15 nehanene15 changed the title Row Validation using custom query failing with unexpected Error Message Validation throws pandas merge error when source data is empty Sep 29, 2023
@nehanene15 nehanene15 added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. and removed type: question Request for information or clarification. Not an issue. labels Sep 29, 2023
@nehanene15
Copy link
Collaborator

More context: This is due to the source data frame being empty. DVT should add better error handling for the following use cases:

  • Source DF is empty and Target DF has data: Validation failure
  • Target DF is empty and Source DF has data: Validation failure
  • Both Source and Target DFs are empty: Raise ValueError ("Both source and target have no data to validate")

We can add the checks once we connect to the data frames in Ibis here or in generate_report() in combiner.py

@helensilva14
Copy link
Contributor

Hi @SwatiT17! Could you please provide updates here about the progress for this issue? Thanks a lot!

@SwatiT17
Copy link
Contributor Author

SwatiT17 commented Oct 4, 2023

@helensilva14 Is the above fix for DVT error handling ready to use for me to validate?

Currently, as mentioned above the issue arises when there is no data in the source i.e. Empty Source dataframe

@helensilva14
Copy link
Contributor

@SwatiT17, not yet, I just wanted to check if your team had other updates about it. I think @Raniksingh is taking a look at it, otherwise I can also jump on it on Friday.

@dipintimanandhar
Copy link
Member

I tested with the empty source table and I didn't get the error as mentioned above.
Analysis done:
Source connection -> Terradata
Target Connection -> BigQuery
The terradata table is empty whereas the bigquery table has some records.
Both the tables has same schema (Screenshot attached),
Screenshot 2023-10-12 at 12 31 09 PM
I have tried running with yaml file as well -> yaml file is attached below.
configterradata.yaml.zip

For further analysis, I tested with same connection for source and target with Bigquery connection. I didn't get the error as well.
I have attached the verbose output herewith.
Screenshot 2023-10-12 at 12 25 06 PM
Screenshot 2023-10-12 at 12 26 23 PM

@dipintimanandhar
Copy link
Member

@SwatiT17 could you provide me the schema of the tables for which the issue occurred. Will use them for testing.

@sunyanyong
Copy link

@SwatiT17 could you provide me the schema of the tables for which the issue occurred. Will use them for testing.

This error may occur when using the "generate-table-partitions" command

@paolocanaletti
Copy link

I have the same issue when comparing Teradata and Bigquery with custom query and both are empty.
using DVT v4.2

@helensilva14
Copy link
Contributor

More context: This is due to the source data frame being empty. DVT should add better error handling for the following use cases:

  • Source DF is empty and Target DF has data: Validation failure
  • Target DF is empty and Source DF has data: Validation failure
  • Both Source and Target DFs are empty: Raise ValueError ("Both source and target have no data to validate")

We can add the checks once we connect to the data frames in Ibis here or in generate_report() in combiner.py

Thanks a lot @sunyanyong and @paolocanaletti for letting us this problem is still happening.

Quoting @nehanene15's comment to recap this issue with our developer team.

@nehanene15
Copy link
Collaborator

I have the same issue when comparing Teradata and Bigquery with custom query and both are empty. using DVT v4.2

I'm unable to reproduce this issue similar to @dipintimanandhar.

I tried with BQ to BQ custom query validation with an empty source table, empty target table, and both empty tables. This error occurring when both tables exist but have no data in them, correct? Can you provide an example of the custom query you are running and the schema of the table?

@paolocanaletti
Copy link

I have the same issue when comparing Teradata and Bigquery with custom query and both are empty. using DVT v4.2

I'm unable to reproduce this issue similar to @dipintimanandhar.

I tried with BQ to BQ custom query validation with an empty source table, empty target table, and both empty tables. This error occurring when both tables exist but have no data in them, correct? Can you provide an example of the custom query you are running and the schema of the table?

The issue I have is between TD and BQ, not BQ to BQ and using a custom query.
It is not related to a specific table structure or data type, it happens for all of the empty tables.
in my test they are both empty, source and target.
Could be good to have at least a warning in the logs that tell me both source and target are empty so that I can understand that it has been processed and not skipped.
Maybe also something in the result table could be good, to avoid checking two different places:

  • validation result into table
  • empty tables in logs

@nehanene15
Copy link
Collaborator

Can you provide the full stack trace? I suspect it has to do with table schemas since the error reported is trying to merge a 'float64' column with 'object' in pandas. I wasn't able to reproduce with TD to BQ custom query.

I tried with TD to BQ with the following TD table definition (NO data added):
create set table udf.empty_table (id VARCHAR(30), name VARCHAR(30), num INTEGER);

DVT command: data-validation validate custom-query row -sc teradata -tc bq -sqf source.sql -tqf target.sql -pk num -comp-fields name

source.sql
SELECT id, name, num FROM udf.empty_table WHERE num IN (1,2,3)

target.sql
SELECT id, name, num FROM proj.dataset.empty_table WHERE num IN (1,2,3)

Result, Latest DVT version:
Empty validation result, no errors.

@paolocanaletti
Copy link

if you suspect the issue is with float I am not getting why you only use varchar and integer in your test.
By the way, I don't have any float in my tables, but the issue is the same but happening with Timestamp/datetime

these are the datatype in the table
BQ
a STRING
b DATE
c DATETIME
d INTEGER
e NUMERIC

TD
a varchar
b date
c timestamp
d integer
e decimal(10,2)

conversion from Timestamp to datetime is mandatory because in this case the source timestamp is not UTC but contains the value with my timezone.

and the error is
Custom Query validation for dvt run is getting failed for the entity - xxxxxx with error -ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat .`

Log

  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 96, in execute
    result_df = self._execute_validation(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 331, in _execute_validation
    raise e
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 314, in _execute_validation
    result_df = combiner.generate_report(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/data_validation/combiner.py", line 90, in generate_report
    result_df = client.execute(documented)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/__init__.py", line 307, in execute
    return execute_and_reset(node, params=params, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 486, in execute_and_reset
    result = execute(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 432, in main_execute
    return execute_with_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 219, in execute_with_scope
    result = execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 323, in execute_until_in_scope
    scopes = [
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 324, in <listcomp>
    execute_until_in_scope(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/core.py", line 351, in execute_until_in_scope
    result = execute_node(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/trace.py", line 136, in traced_func
    return func(*args, **kwargs)
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/ibis/backends/pandas/execution/join.py", line 110, in execute_join
    df = pd.merge(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 110, in merge
    op = _MergeOperation(
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 707, in __init__
    self._maybe_coerce_merge_keys()
  File "/opt/rh/rh-python39/Python-3.9.18/venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1346, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat`

@paolocanaletti
Copy link

paolocanaletti commented Feb 9, 2024

When instead I compare table that are not empty empty but the custom query return an empty result in both the table, although the tables contains only string/varchar then I am not able to get the run_id from the resulting dataframe using

df.columns.get_loc('run_id')

is it normal?

@nehanene15
Copy link
Collaborator

I did test with float, just didn't write out all my test cases in the comment.
I created empty tables with the same schema you provided and was unable to reproduce on DVT v4.2 with the following config:
Source (TD) SQL:
SELECT a,b,c,d,e FROM udf.empty_table

Target (BQ) SQL:
SELECT a,b,c,d,e FROM proj.dataset.empty_table

DVT command:
data-validation validate custom-query row -sc teradata -tc bq -sqf source.sql -tqf target.sql -pk a -comp-fields b,c,d,e

Can you provide the custom queries themselves and the DVT command? And can you confirm the tables are completely empty with 0 rows? We may need to get on a call to debug this.

For your second comment, yes this is normal. If no data matches the custom query criteria, you will get an empty dataframe with no run_id.

@nehanene15
Copy link
Collaborator

I'm able to reproduce this by adding a composite PK with datetime column as a PK:
data-validation validate custom-query row -sc teradata -tc bq -sqf source.sql -tqf target.sql -pk a,c,e -comp-fields 'b,d'

The issue here is a mismatch in data types inferred by pandas for the primary key column(s). In this case, BQ inferred an 'datetime[64]' data type whereas TD inferred an 'object' data type. I was able to fix this issue for date columns in the TD backend.

@paolocanaletti
I know this issue occurs for float64 values as well - if you could provide the data types of the primary keys for TD and BQ when you get the float64 error, it will help me find a fix faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
8 participants
@paolocanaletti @nehanene15 @helensilva14 @dipintimanandhar @Raniksingh @SwatiT17 @sunyanyong and others