Changelog

Untagged

5.1.1 (2024-06-12)

Documentation

5.1.0 (2024-06-11)

Features

  • Add a workaround for a Snowflake IN list limitation (#1152) (16b979e)
  • Support --trim-string-pks flag for padded string semantics (#1166) (a81f396)
  • Support GCS custom query files (#1155) (e3fe3d1)

Bug Fixes

  • Fixes bug in get_max_in_list_size (#1158) (973e6b6)
  • Removing t0 alias from column name, while getting schema from query. Adding Integration test for Hive Custom-Query (#1164) (74a14af)
  • Support PKs with different casing for generate-partitions (#1142) (021ce75)
  • Update to support up to 10K partitions (#1139) (210c352)

5.0.0 (2024-05-21)

⚠ BREAKING CHANGES

  • Support for GCS config paths decoupled from environment variables (#1129)
  • Filters not working correctly in Snowflake (#1126)

Features

  • Add support for random row sampling on binary id columns (#1135) (c3d2155)
  • Control Teradata decimal format when cast to string (#1138) (e68e2a6)
  • Support for GCS config paths decoupled from environment variables (#1129) (72e41b7)

Bug Fixes

  • Filters not working correctly in Snowflake (#1126) (9845643)
  • Fix casting from binary to string on Snowflake & BigQuery (#1113) (4f5ae81)
  • Issue 1127 configs dir fails with more than 40 files (#1130) (15c81cf)
  • Teradata's ValueError after large timestamp epoch second handling (#1121) (ee8d6da)

Documentation

  • Add custom-query code snippet for Cloud Run sample documentation (#1124) (93bb64f)
  • Distributed DVT Cloud Run Jobs sample (#1133) (f51f327)

4.5.0 (2024-03-18)

Features

  • Support GCS files in configs list command (#1108) (b49e1c3)

Bug Fixes

  • Add table names to report results when source and/or target dataframes are empty (#1104) (812ed62)
  • Fixes issue casting Snowflake decimal with scale>0 to string (#1110) (34446a4)
  • force cast for aggregates (#1114) (44b60cf)
  • Teradata large timestamp handling (#1117) (842d8b7)

4.4.0 (2024-02-22)

Features

  • Add --url to Oracle connections add options (#1083) (2f078c2)
  • Add PostgreSQL OID support (#1076) (58f8fcb)
  • Add support to generate a JSON config file only for applications purposes (#1089) (d463038)
  • set default oracle sql alchemy arraysize to 500 (#1088) (1672ac5)
  • Support for Kubernetes (#1058) (fdbdbe0)

Bug Fixes

  • Add support for cx_Oracle's DB_TYPE_LONG_RAW (#1095) (90547ef)
  • Better casts to string for binary floats/doubles (#1078) (15bfc4c)
  • case-insensitive comparison field support (#1103) (d28786f)
  • Fix merge issue for Teradata empty dataframes (#1100) (cc91fa2)
  • increase upper limit on recursion columns (#1090) (c599ebf)
  • Remove DDL automatically issued by Ibis for Postgres connections (#1067) (c2b660b)
  • Row validation primary key columns >64bit int/float are cast to string (#1080) (9e70e9e)
  • Spanner generate-partition to use BQ dialect (#1066) (f3cc565)
  • spanner hash function to return string instead of bytes (#1062) (722dff9)

Documentation

  • Add Airflow Kubernetes pod operator samples (#1087) (7d5ea91)
  • Updates on nested column limitations, contributing guide examples and incorrect example (#1082) (cc0f60a)

4.3.0 (2023-11-28)

Features

  • Adding Exclude columns flag for aggregations in column validations (#961) (faa32dc)
  • support query parameter for MSSQL connection (#1026) (48b0355)

Bug Fixes

  • --dry-run for SQLAlchemy clients with valid raw SQL (#1047) (c1e0e34)
  • Add Spanner RawSQL operation to enable filtering (#1054) (3a01503)
  • Adding credentials as parameter for Spanner (#1031) (367658e)
  • Adjust find-tables to properly get Oracle and Postgres schemas (#1034) (45fb40a)
  • Cast should treat nullable and non-nullables as the same (#1037) (5e5c5eb)
  • Fix --grouped-columns issue for Oracle validation (#1050) (3473a27)
  • Fix decimal separator to "." (dot) on Oracle (#1042) (14cc7ef)
  • Teradata SSLMODE issue fix (#1014) (e7aab6b)

Documentation

  • Add CLOB to Oracle BLOB validation document (#1029) (8c76c1b)
  • Update connections.md to add supported version of DB2 (#1030) (44b4be7)

4.2.0 (2023-09-28)

Features

  • Add more mappings to the allowlist configuration files for Oracle schema validations (#953) (0fed588)
  • Include date columns for min/max/sum validations (#984) (6de9921)
  • Include date columns in scope of wildcard_include_timestamp option (#989) (a4cf773)
  • Support BQ decimal precision and scale for schema validation (#960) (b1d4942)
  • Support standard deviation for column agg (#964) (bb81701)

Bug Fixes

  • Add exception handling for invalid value to cast a comparison field (#957) (703ca75)
  • Add missing SnowflakeDialect mapping for BINARY data type (#959) (9ad529a)
  • Add not-null string to accepted date types in append_pre_agg_calc_field() (#980) (76fcfc6)
  • Adjust setup of the random row batch size default value; the default remains 10,000 (#986) (a20ccab)
  • custom query row validation failing when SQL contains upper cased columns (#994) (a9fed41)
  • Fix warning and precision detection when target precision higher than source (#965) (5f00ce1)
  • generate-table-partitions: fixes Issue 945 and Issue 950 (#962) (c53f2fc)
  • Prevent failure of column validation config generation if source column other than allow-list not present in target table. (#974) (40a073e)
  • Prevent Oracle blob throwing exceptions during column validation (#1005) (8df1cfa)
  • support for case insensitive PKs and Snowflake random row (#998) (1a157ae)
  • support for null columns, support for access locks (#976) (f54bb4d)
  • yaml validation files in gcs (#977) (bf0fa0a)

Documentation

  • Add a new sample code for row hash validation of Oracle BLOB (#997) (0bd48a2)

4.1.0 (2023-08-18)

Features

  • support timestamp aggregation for Oracle and TD (#941) (911bae8)

Bug Fixes

  • Issues with validate column for time zoned timestamps (#930) (ee7ae9a)
  • Schema validations ignore not null on Teradata and BigQuery (#935) (936744b)
  • Support casting TD PKs to VARCHAR (#946) (2171532)

4.0.0 (2023-08-02)

⚠ BREAKING CHANGES

  • Ibis Upgrade to 5.1.0 (#894)
  • Partition based on non-numeric and multiple keys (#889)

Features

  • Adding Random-Row support for Custom Query (#891) (fc42c61)
  • Adding RawSQL function for Redshift (#903) (c25d690)
  • Enhance validate schema to support time zoned timestamp columns (#919) (aed1505)
  • generate-table-partitions: Works on all 7 platforms - BigQuery, Hive, MySQL, Oracle, Postgres, SQL Server and Teradata. (#922) (aa84d7a)
  • Ibis Upgrade to 5.1.0 (#894) (b5db4c0)
  • Partition based on non-numeric and multiple keys (#889) (7b6a530)
  • Snowflake support (#921) (e1d590b)
  • Support allow list decimals having a range for precision and scale. Also add --allow-list-file. (#888) (7783beb)

Bug Fixes

  • Adding date and timestamp formatting for Hive (#876) (65a090a)
  • Adding enhancements to allow-list in schema validation (#881) (c83df2b)
  • Adding UTF encoding for Oracle hash generation (#878) (2e24eae)
  • No column filtering for csv/json text output. Reverts part of change for issue 753 (#890) (ba641e0)
  • redshift bug for custom query (#911) (f1018b5)
  • teradata NUMBER with no precision/scale, small doc fix after Ibis upgrade (#914) (f9db68f)
  • validate column sum/min/max issue for decimals with precision beyond int64/float64 (#918) (5a8d691)

Documentation

  • Add sample shell script and documentation to execute validations at a BigQuery dataset level (#910) (a84da45)

3.2.0 (2023-05-31)

Features

  • Add --dry-run option to validate. (#778) (8989350)
  • Add Impala flags for http_transport and http_path (#829) (d966b9e)
  • Add support for SQL Server's IMAGE, BINARY, VARBINARY, NCHAR, NTEXT, NVARCHAR data types (#859) (6ebece3)
  • Add support for SQL Server's MONEY data type (#837) (0749c9e)
  • Move source credentials to secret manager (#824) (1dd5fea)
  • Redshift integration for Normal row and Custom-Query Validation. (#817) (92ab215)

Bug Fixes

  • Add missing operations for SQL Server - ExtractEpochSeconds, ExtractDayOfYear, ExtractWeekOfYear (#870) (709dd4c)
  • Adding datetime and timestamp format logic (#840) (eb095c9)
  • dry-run bug when running configs, added CODEOWNERS, and docs (#865) (1779772)
  • handle numeric datatype mapping in teradata schema and fix int mapping as per teradata doc (#874) (333eadb)
  • split connection names from second last period instead of first from front (#864) (1462deb)
  • Support for sum/min/max included for oracle number greater than int64 (#809) (73bda66)

Documentation

  • Fix typos on README (#801) (14ddcc5)
  • update installation guide about Python 3.11 (#815) (88cd281)
  • Update our documentation about find-tables command and the score-cutoff parameter (#846) (54403e3)

3.1.0 (2023-04-21)

Features

  • add db2 hash and concat support (#800) (c16e2f7)
  • add Impala connection optional parameters (#743) (#790) (414d7f8)
  • added source_type in output while listing connections list (#803) (056275b)
  • Adding Custom-Query support for DB2. (#807) (a8085d3)
  • Option for simpler report output grid (#802) (b92eb91)

Bug Fixes

  • Mysql fix to support row hash validations, random row validation, and filter (#812) (ae07fa4)
  • schema validation fixes for Oracle/SQL Server float64 and SQL Server datetimeoffset (#796) (ad0e64f)

Documentation

  • add README for Airflow DAG sample, update code formatting in other docs (#722) (f4c3241)
  • score-cutoff changed to 1 (#779) (d3aabca)

3.0.0 (2023-03-28)

⚠ BREAKING CHANGES

  • issue673 optimize CLI tools arg parser (#701)

Features

  • ✨ Add support for source/target inline sql queries for validate custom-query command (#734) (c5e7a37)
  • GCP Secret Manager support for DVT (#704) (d6c40f1)
  • ibis_bigquery strftime support for DATETIME columns (#737) (b1141de)

Bug Fixes

  • Add support for numeric and precision with length and precision in Postgres Custom Query (#723) (742b77e)
  • Adding Decimal datatype support for MSSQL custom query validation (#771) (0d5c5eb)
  • Better detection of Oracle client (#736) (efce0b8)
  • Cater for query driven comparisons in date format override code (#733) (0a22643)
  • issue 740 teradata strftime function (#747) (9fd102a)
  • issue673 optimize CLI tools arg parser (#701) (26bb8e9)
  • Protect column and row validation calculated column names from Oracle 30 character identifier limit (#749) (89413c1)
  • remove secret manager warnings (#781) (7e72bfd)

Documentation

2.9.0 (2023-02-16)

Features

  • Added Partition support to generate multiple YAML config files (#653) (Issue #619,#662) (f79c308)
  • added run_id to output (#708) (17720f2)
  • Divert cast of PostgreSQL decimal with scale>0 to to_char (#721) (3542851)
  • Use centralized date/time format in order to compare row data across engines (#720) (0de823b)

Bug Fixes

  • Error handling for batch processing of config files (#663) (21a26af)
  • Protect non-date columns from astype(str) date workaround (#726) (489ee27)
  • schema validation fix for different base names of source and destination data types (#710) (d7b44b0)

Documentation

  • updated Oracle parameter from user_name to user and changed underscores to hyphens across the document (#689) (8777e00)

2.8.0 (2023-01-19)

Features

  • Logic to add allow-list to support datatype matching with a provided list in case of mismatched datatypes between source and target (#643) (269f8dc)

Bug Fixes

  • making logmech as optional for TD connection (#665) (500caa3)

2.7.0 (2023-01-06)

Features

  • Add AlloyDB support (#645) (cfedc22)
  • Add Integration test for Oracle (#651) (de3bbcc)
  • Added custom query support for Oracle (#646) (3f8771a)
  • Added custom query support for PostgreSQL (#644) (88dcfd3)
  • extend TO_CHAR to cover date, time and timestamp types (#641) (e0c184f)
  • SQL Server custom query support (#640) (98ab010)
  • Support config directory for running validations and add multithreading for DB queries (#654) (c67b51a)
  • Support custom calculated fields (#637) (14b506b)

2.6.0 (2022-11-28)

Features

Bug Fixes

  • bare data-validation command throws exception (#627) (7595c50)
  • column validation casing to allow for case-insensitive match (#626) (c694357)

2.5.0 (2022-10-18)

Features

  • adding scaffold for concatenate as a cli operation (#566) (ec4ef33)

Bug Fixes

  • Custom query validation throwing error with sql files ending with semicolon(;) (#591) (16a89ac)
  • Row validation optimization to avoid select all columns (#599) (de3758e)
  • update function to return non-unicode string (#615) (e334c65)

2.4.0 (2022-10-06)

⚠ BREAKING CHANGES

  • Add Python 3.10 support (#564)

Features

Miscellaneous Chores

2.3.0 (2022-09-15)

Features

  • Addition of log level as an argument for DVT logging and replac… (#577) (dbd9bc3)
  • Oracle row level validation support (#583) (489654c)

Bug Fixes

  • Add RawSQL support for Postgres and SQL Server (#576) (0693782)
  • fixing String to varchar for teradata (a979931)
  • random rows with filter option (#582) (da4faaf)
  • support NUMBER with no precision/scale (#572) (03219ba)
  • Teradata limit on column name, bug when casting to VARCHAR (#580) (c8700be)

Documentation

  • remove snowflake, add row supported DBs (#587) (1d923f5)

2.2.0 (2022-08-29)

⚠ BREAKING CHANGES

  • Added teradata custom query support (#547)

Features

  • Added teradata custom query support (#547) (97c3203)
  • Improve schema validation debugging, Support DATE for Hive validations (#558) (e67de5b)
  • Support for MSSQL row validation (#570) (61dabe0)

Bug Fixes

Miscellaneous Chores

2.1.0 (2022-07-14)

Features

  • new flag to exclude columns from schema validation (#507) (53ac41a)
  • Remove dependency on tables list for custom query (#541) (7dca5bd)

Bug Fixes

Documentation

2.0.1 (2022-06-10)

Bug Fixes

  • Schema validation to make case-insensitive column name comparison (#500) (ee8c542)

2.0.0 (2022-05-26)

⚠ BREAKING CHANGES

  • Add 'primary_keys' and 'num_random_rows' fields to result handler (#372)

Features

  • Add 'primary_keys' and 'num_random_rows' fields to result handler (#372) (b123279)
  • add a new DAG example to run DVT (#485) (e3dd7ed)
  • adding impala random function (#483) (93d2072)
  • Enable sum/avg/bit_xor for BigQuery datetime type (#488) (083de07)

Documentation

1.7.2 (2022-05-12)

⚠ BREAKING CHANGES

  • Adds custom query row level hash validation feature. (#440)

Features

  • Add example of BigQuery cast to NUMERIC, update chore release version (#476) (50fac28)
  • Adds custom query row level hash validation feature. (#440) (f057fe8)
  • Issue356 db2 test (#383) (70fb7bc)
  • Support cast to BIGINT before aggregation (#461) (ca598a0)
  • support float and decimal types in Hive (#470) (5936f60)

Bug Fixes

  • add get_ibis_table_schema (#410) (#411) (4093625)
  • only replaces datatypes and not column names (#453) (6143794)
  • supports NULL datetime/timestamps, fixes bug with validation_status in PR 455 (#460) (57896f4)
  • Updated schema validation logic to column as 'validation_status' (#455) (e30c337)
  • updating teradata docs for sha256 UDF and swapping string_join for concat (#457) (23dbf56)

1.7.1 (2022-04-14)

⚠ BREAKING CHANGES

  • Changed result schema 'status' column to 'validation_status' (#420)

Features

Bug Fixes

  • bug introduced with new pr (#429) (a6cf3f0)
  • Hash all bug, noxfile updates (#413) (fc73e21)
  • Hive boolean nan to None, Unsupported ibis data types in structs and arrays (#444) (e94a1da)
  • ibis default sql option limits query results at 10k rows (#418) (7539efe)
  • Impala strings/objects now return None instead of NaN (#406) (9d3c5ec)
  • issue 265 add cloud spanner functionality (#394) (783cdf8)
  • support labels for schema validation (#260) (#381) (f787701)
  • Treat both source and target values being NULL as a success (#437) (c4da5ca)

Miscellaneous Chores

1.7.0 (2022-03-23)

Features

Bug Fixes

  • add to_hex for bigquery hash (#400) (e5c7ded)
  • Comparison fields Key Error fix (#396) (a597b56)
  • ensure all statuses are success or fail, particularly after _join_pivots (#329) (#370) (310747d)
  • make status values consistent across validation types (#377) (#378) (5c08463)
  • Multiple updates (#359) (6b2614d)
  • revert change from #345 that causes filters, threshold and labels to be ignored for column validations (#376) (#379) (8b295cf)
  • Status when source and target agg values are 0 (#393) (6a41f68)
  • support schema validation for more clients (#355) (#380) (ed46295)
  • supporting non default schemas for mssql (#365) (100b3ea)
  • test for nan when calculating fail/success in combiner (#341) (#366) (a9720c2)
  • use an appropriate column filter list for schema validation (#350) (#371) (806151a)

Documentation

  • Add Hive as a supported data source to docs (#354) (be2a49d)

1.6.0 (2021-12-01)

Features

Bug Fixes

Documentation

1.5.0 (2021-10-19)

Features

  • added kerberos service name flag for Impala connections, fixed bug in row validation with YAML (#320) (351994c)
  • Track DVT GCS connections (#326) (b384b1f)

Bug Fixes

Documentation

1.4.0 (2021-09-30)

Features

  • add state manager client (#311) (e893ea5)
  • Allow user to specify a format for stdout (#242) (#293) (f0a9fa1)
  • Allow user to specify a format for stdout T2 (#242) (#296) (ec1af22)
  • cast aggregates (#306) (e3da4c3)
  • Issue262 impala connect (#281) (eaa052f)
  • logic to deploy dvt on Cloud Run (#280) (9076286)
  • promote 3.9 to main version (as it is in Cloudtops now for local testing) and add a small unit test for personal use (#292) (eb0f21a)
  • Refactor CLI to fit Command Pattern (#303) (f6d2b9d)
  • Updated Cloud Functions sample (#297) (923413d)

Bug Fixes

  • updated code so that BQ target schema would not set to None for FileSystem to BQ validations (#309) (5016d65)

1.3.2 (2021-06-29)

Documentation

1.3.1 (2021-06-28)

Documentation

1.3.0 (2021-06-28)

Features

  • add table matching score as a param in case adjustment is needed (#267) (b02aed5)
  • CI/CD Release to PyPi via Cloud Build (#258) (0870fc7)

Bug Fixes

  • correct issues blocking impala and hive (#266) (5110d1f)

1.2.0 (2021-05-27)

Features

Bug Fixes

1.1.8

  • Adding and documenting find-tables CLI feature with schema filter
  • Correct filter errors caused by SQL Alchemy errors
  • Adding beta calculated fields logic

1.1.7

  • Adding tests to validate BIGNUMERIC BQ type behavior

1.1.6

  • Minor fix for Teradata client after breaking Ibis changes

1.1.5

  • Add support for running raw queries against a connection
  • Upgraded Ibis to v1.4 with large client organizational and design changes
  • Added support for "use_no_lock_tables" Teradata config to optionally avoid table locking

1.1.4

  • Added an option to add key:value labels to validation runs
  • Oracle and SQL Alchemy now support RawSql filters
  • Add support for Cloud Functions in samples
  • Added schema information to result set

1.1.3

  • Release find-tables logic to help build table lists
  • Teradata client improvements
  • Move rarely used dependencies into extras

1.1.2

  • Teradata numeric column and general bug fixes
  • Fix Ibis query compilation order causing cross join

1.1.1

  • Bug fixes to support case insensitivity
  • Allow null values to be handled in grouped columns
  • Oracle client improvements

1.1.0

  • Added Row validations for cell level validation with primary keys
  • Client support for Oracle, SQL Server, Postgres, and GCS files

1.0

  • Support for Column and GroupedColumn validations
  • Allow custom filter via YAML config
  • BigQuery result handlers supported
  • Client support for BigQuery, MySQL, and Teradata

0.1.1 (release date TBD)

Bug Fixes

  • update BigQuery dependencies to fix group-by results handler #64

Documentation

  • remove references to unsupported validations from README #63
  • includes wheel file installation steps in README #57
  • add filters and data sources to README #56

Internal / Testing Changes

  • move ibis addons to third-party directory #61

0.1.0 (2020-07-16)

Initial alpha release.

Features

  • Add data-validation CLI, which can run from CLI arguments, store a configuration YAML file, or run from a run-config YAML file.
  • Add support for querying Teradata.
  • Add support for querying BigQuery.
  • Write report output to BigQuery.

Dependencies

  • To use Teradata support, you must manually install the teradatasql PIP package.

Documentation

  • See the README.md file for getting started instructions.