-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom-Query failure for boolean datatype columns #905
Comments
Hello Team, Command to run the file: ERROR Message: |
Thank you for reaching out and good to know that the customer wants to use DVT for the comparison. Can you provide the following information to help us further Thank you for filing an issue, it allows us to collect all the information in one place, prioritize it and use it in the future for training and tracking. Here is additional information that would be helpful.
These can all be attached to the issue. Taking a quick look - my question is - do you really need boolean as one of the primary keys ? As explained in Primary Keys, you only need that column if it is required to uniquely identify the row in the custom query. Thanks. Sundar Mudupalli |
Regarding the usage of Primary-key of a boolean type, this was only way to define the composite primary keys to define the uniqueness of the data , but also I have seen that without the column being a primary key if column is boolean it has failed for some other table during row validation using hash_all method. |
Please find the attached config file generated |
Can you provide the verbose output generated via By default, DVT casts columns to string so we can then use it in a hash or concatenate the columns. |
Please find the attached verbose output. |
I do not see any difference in the output file generated before and after using verbose. Please find the command used below.
|
Since you're saving to a config file, you will need to apply |
Please find the output of the above asked command. |
Seems like we're casting to TEXT with Redshift and there's a Redshift issue when casting a BOOL column to TEXT. @abhilash-JET Could you test on Redshift and confirm? |
Issue 2 Target data Source data |
why are we sharing email id over github? Can't this be an issue of GDPR? |
@sreeti-JET can you please add the example by scraping the data ? I have deleted the data |
Abhilash-JET, If you are not able to cast BOOLEAN to TEXT/VARCHAR, then it is a problem. DVT out of the box will not work for BOOLEAN type. This may be a problem specific to AWS RedShift. I am able to convert Boolean to TEXT with PostgreSQL (CloudSQL). There is an alternative - and that is to convert the BOOLEAN type to TEXT/VARCHAR yourself in your query. Since the BOOLEAN type has only 3 possible values, this is fortunately not that difficult. The first is the SQL that you provided. The second is the modified SQL where the BOOLEAN type is converted to TEXT/VARCHAR. Can you try this and see if it works
You can specify is_top_placement_text as one of the primary keys. Let us know how that works out. Sundar Mudupalli |
Please find error log in the attached. |
Hi @sreeti-JET @abhilash-JET! Could you please try to use the CASE block for both source (Redshift) and target (BQ) queries? Otherwise we will still find the
And make sure to change the primary keys list to |
Hi, As Helen mentioned you need to make the change on both ends. I am attaching files that created the tables with the same schema, populated them with 4 rows and using your queries modified with the changes suggested. These run correctly in our environment with the 3.20 release. This may not work if you just cloned the development branch (see issue #909). The sequence of commands we used are shown below. You will need to edit the files, change their names to .sql and update the commands to conform to your file and table names, redshift and bq locations: red_rm.txt
Please try it out and let us know what you find out. |
This is working as expected.Thanks team for your help.Could you please let us know when the below 2 issues will be fixed?
|
@sreeti-JET Thanks for the update. In that case, we will close this issue. As for your other issues,
Thanks. |
Hi Team,
I have a failure using the DVT tool for random row validation using custom - query method. I have generated the configuration file using below command where the column is_top_placement is boolean datatype.
when I trigger the command to run the validation of the above generated YAML file , I run into issues as shown below.
data-validation configs run -c ./cq_rm.yaml
Error: My validation runs fine when I exclude the boolean datatype column from the configuration file.
Please do let me know if more details are required to debug further.
The text was updated successfully, but these errors were encountered: