feat: Issue619 determine the method for generating batches efficiently #653
Commits on Dec 15, 2022
- 7d99ebf
- 80c0f8b
Commits on Dec 16, 2022
- 032f883
- bf95ea4
Commits on Dec 18, 2022
- 2703427
Commits on Dec 21, 2022
- feat: Added Partition support to generate multiple yaml config files (1dedce3)
  New:
  Arguments - file: cli_tools.py
    1. New command 'get-partitions' added to generate partitions for the following validation types:
       1. row
       2. custom-query (TODO)
    2. --partition-type: Specify the type of partition logic:
       1. primary_key
       2. primary_key_mod (TODO)
       3. hash_mod (TODO)
    3. --partition-num: Number of partitions/config files to create. Range=[1,1000]. If the specified value is greater than count(*), the value is coalesced to count(*)
    4. --config-dir: Directory path to store YAML config files
    5. Added required arguments group to distinguish from optional arguments
    6. Added mutually exclusive arguments group for --hash and --concat
  Constants - file: consts.py
    1. Added DEFAULT_PARTITION_TYPE
    2. Added PARTITION_TYPES
  Partition methods - file: __main__.py
    1. _get_arg_partition_type(args): Extract and return partition logic
    2. partition_and_store_config_files(args): Build/split config managers and store yaml files
    3. partition_configs(args, config_managers): Create a list of lists of config managers using partition filters
    4. _get_primary_key_partition_filters(args, config_manager): Get filters for primary_key partition logic
    5. _add_partition_filters_to_config(config_managers, partition_filters): Split ConfigManager objects and add partition filters
    6. get_dataframe(config_manager): Build source and target pandas dataframes from input ConfigManager object
    7. build_primary_key_agg_config_managers_from_args(args): Build a list of ConfigManager objects for finding count, min and max of primary_key
  Partition methods - file: data_validation.py
    1. get_pandas_df(): Build source and target queries, return source and target dataframes
  Type Hints and Doc string:
    1. Added type hints to the above methods
    2. Added doc string with desc, args and return type for the above methods
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (7863f67)
- fix: Added Partition support to generate multiple yaml config files (09e2132)
  New:
  Partition methods - file: __main__.py
    1. _add_partition_filters_and_store(config_managers, partition_filters, config_dir, args): Split ConfigManager objects, add partition filters and store in target dir
    2. _get_arg_config_dir(args): Return the yaml config folder path (string) from args
  Partition methods - file: cli_tools.py
    1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory
  Partition methods - file: state_manager.py
    1. create_partition_config_directory(config_dir: str, target_folder_name: str)
  Type Hints and Doc string:
    1. Added type hints to the above methods
    2. Added doc string with desc, args and return type for the above methods
- 524db2d
- 259c548
- fix: Added Partition support to generate multiple yaml config files (96cf91d)
  New:
  Partition methods - file: __main__.py
    1. _add_partition_filters_and_store(config_managers, partition_filters, config_dir, args): Split ConfigManager objects, add partition filters and store in target dir
    2. _get_arg_config_dir(args): Return the yaml config folder path (string) from args
  Partition methods - file: cli_tools.py
    1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory
  Partition methods - file: state_manager.py
    1. create_partition_config_directory(config_dir: str, target_folder_name: str)
  Type Hints and Doc string:
    1. Added type hints to the above methods
    2. Added doc string with desc, args and return type for the above methods
- feat: Added Partition support to generate multiple yaml config files (b6255bc)
  New:
  Arguments - file: cli_tools.py
    1. New command 'get-partitions' added to generate partitions for the following validation types:
       1. row
       2. custom-query (TODO)
    2. --partition-type: Specify the type of partition logic:
       1. primary_key
       2. primary_key_mod (TODO)
       3. hash_mod (TODO)
    3. --partition-num: Number of partitions/config files to create. Range=[1,1000]. If the specified value is greater than count(*), the value is coalesced to count(*)
    4. --config-dir: Directory path to store YAML config files
    5. Added required arguments group to distinguish from optional arguments
    6. Added mutually exclusive arguments group for --hash and --concat
  Example:
    data-validation get-partitions row \
      -sc BQ_CONN \
      -tc BQ_CONN \
      -tbls bigquery-public-data.new_york_citibike.citibike_stations,mohammedturky-sql.dvt.citibike_stations \
      --primary-keys station_id,region_id \
      --hash '*' \
      --filter-status fail \
      --filters 'station_id>3000:station_id>3000' \
      --config-dir partitions_dir \
      --partition-type primary_key \
      --partition-num 20
  Constants - file: consts.py
    1. Added DEFAULT_PARTITION_TYPE
    2. Added PARTITION_TYPES
  Partition methods - file: __main__.py
    1. _get_arg_partition_type(args): Extract and return partition logic
    2. partition_and_store_config_files(args): Build/split config managers and store yaml files
    3. partition_configs(args, config_managers): Create a list of lists of config managers using partition filters
    4. _get_primary_key_partition_filters(args, config_manager): Get filters for primary_key partition logic
    5. _add_partition_filters_and_store(config_managers, partition_filters, config_dir, args): Split ConfigManager objects, add partition filters and store in target dir
    6. get_dataframe(config_manager): Build source and target pandas dataframes from input ConfigManager object
    7. build_primary_key_agg_config_managers_from_args(args): Build a list of ConfigManager objects for finding count, min and max of primary_key
    8. _get_arg_config_dir(args): Return the yaml config folder path (string) from args
  Partition methods - file: data_validation.py
    1. get_pandas_df(): Build source and target queries, return source and target dataframes
  Partition methods - file: cli_tools.py
    1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory
  Partition methods - file: state_manager.py
    1. create_partition_config_directory(config_dir: str, target_folder_name: str)
  Type Hints and Doc string:
    1. Added type hints to the above methods
    2. Added doc string with desc, args and return type for the above methods
  New command added - 'get-partitions'
- 4bf5f4f
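The argument layout described in the Dec 21 commit messages (a required-arguments group, a range-checked `--partition-num`, and a mutually exclusive `--hash`/`--concat` pair) could be sketched with `argparse` roughly as follows. This is only an illustration under stated assumptions, not the code in `cli_tools.py`: the function names `partition_num`, `build_parser` and `effective_partition_count` are hypothetical, the value of `PARTITION_TYPES` is assumed, and connection and table flags are left out.

```python
import argparse

# Assumed values; the commit only says these constants were added to consts.py.
PARTITION_TYPES = ["primary_key"]
DEFAULT_PARTITION_TYPE = "primary_key"


def partition_num(value: str) -> int:
    """Parse --partition-num and enforce the range [1, 1000] from the commit message."""
    number = int(value)
    if not 1 <= number <= 1000:
        raise argparse.ArgumentTypeError("--partition-num must be in [1, 1000]")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a 'get-partitions' command."""
    parser = argparse.ArgumentParser(prog="data-validation")
    subparsers = parser.add_subparsers(dest="command", required=True)
    partition_parser = subparsers.add_parser("get-partitions")

    # A named group only changes how --help lists the flags;
    # required=True on each argument is what enforces presence.
    required = partition_parser.add_argument_group("required arguments")
    required.add_argument("--primary-keys", required=True)
    required.add_argument("--config-dir", required=True)
    required.add_argument("--partition-num", required=True, type=partition_num)

    partition_parser.add_argument(
        "--partition-type",
        choices=PARTITION_TYPES,
        default=DEFAULT_PARTITION_TYPE,
    )

    # Exactly one of --hash / --concat may be given.
    comparison = partition_parser.add_mutually_exclusive_group(required=True)
    comparison.add_argument("--hash")
    comparison.add_argument("--concat")
    return parser


def effective_partition_count(requested: int, row_count: int) -> int:
    """Coalesce the requested partition count to count(*) when it is larger."""
    return min(requested, row_count)
```

Putting the range check in a `type=` callable means an out-of-range value fails at parse time with a normal usage error, before any query is issued.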
Commits on Dec 22, 2022
- 58a7a00
- 07333d8
- Validation type: custom-query; Partition type: primary_key_mod & hash_mod (32fd216)
Commits on Dec 28, 2022
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (b5145b8)
- fix: Updated arguments and argument parser (da69e8b)
  1. Changed command from get_partitions to generate partitions
  2. Renamed validate_parser to partition_parser
  3. Renamed validate_subparser to partition_subparser
  4. Removed --grouped-columns argument for row validation
  5. Removed custom_query partition parser since it is not supported in this PR
Commits on Jan 4, 2023
- feat: Added Partition support to generate multiple yaml config files (99bb79c)
  Reconfigured and shifted Partition methods to data_validation/query_builder/partition_row_builder.py and data_validation/partition_builder.py
  file: partition_row_builder.py
    1. Build PartitionRowBuilder object
    2. Get Ibis queries for Min, Max and Count for primary key column
  file: partition_builder.py
    1. Build PartitionBuilder object
    2. Create ConfigManager objects for input tables
    3. Generate Partition filters
    4. Save partitions in input directory as multiple YAMLs
- 1e28a4a
- fix: Added Warning logs and Documentation (2b2e76a)
  1. Warning logs in case of Source/Target count, min or max mismatch.
  2. Raise TypeError when primary key is not numeric
- 1af3f35
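The Jan 4 commits describe three steps: query min, max and count for the primary key, warn when source and target disagree, and turn the range into partition filters, raising `TypeError` for a non-numeric key. A minimal sketch of that idea is below. It is not the logic in `partition_builder.py`: the function names and the filter string format are hypothetical, the statistics are passed in as plain dicts instead of coming from Ibis queries, and the sketch narrows "numeric" to integer keys to keep the range arithmetic simple.

```python
import logging
import math
from numbers import Integral
from typing import Dict, List

logger = logging.getLogger(__name__)


def mismatched_stats(source: Dict[str, int], target: Dict[str, int]) -> List[str]:
    """Log a warning for each of count/min/max that differs; return their names."""
    mismatches = [name for name in ("count", "min", "max") if source[name] != target[name]]
    for name in mismatches:
        logger.warning(
            "Source and target %s differ: %s vs %s", name, source[name], target[name]
        )
    return mismatches


def primary_key_filters(
    column: str, lower: int, upper: int, row_count: int, partition_num: int
) -> List[str]:
    """Split [lower, upper] into at most partition_num contiguous range filters."""
    for bound in (lower, upper):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(bound, bool) or not isinstance(bound, Integral):
            raise TypeError(f"Primary key '{column}' must be numeric, got {bound!r}")

    # Never create more partitions than there are rows.
    partitions = max(1, min(partition_num, row_count))
    step = math.ceil((upper - lower + 1) / partitions)

    filters = []
    start = lower
    while start <= upper:
        end = min(start + step - 1, upper)
        filters.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return filters
```

Because the step is rounded up, the function can return fewer filters than requested when the key range does not divide evenly; each filter would then become the extra `--filters` condition of one generated YAML config.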
Commits on Jan 6, 2023
- 45e51b9
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (57549b3)
Commits on Jan 11, 2023
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (c34bd8a)
Commits on Jan 16, 2023
- refactor: Removed ROW command and added desc to README.md (d208a18)
  1. Removed `row` command, default partition logic is row
  2. Added `--primary-key`
  3. Added to Examples.md
  4. Added to readme.md
Commits on Jan 18, 2023
- feat: Added support to save partition configs to GCS bucket (06bb8e3)
  1. Configs can be saved to GCS by specifying a full gs:// path
  2. Updated README.MD
  3. Removed Random Row and Threshold arguments for generate-table-partitions
  4. Removed unused consts
  5. Modified partition_builder.py
  6. Updated state_manager.py to support GCS path independent of PSO_DV_CONFIG_HOME
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (654061f)
- e031ca5
- fix: re-wrote logs which are used in tests/unit/test_cli_tools.py::test_create_and_list_connections & tests/unit/test_cli_tools.py::test_create_and_list_and_get_validations (fda93a4)
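The Jan 18 commit says partition configs can be written to GCS when the config directory is a full `gs://` path. One way such a path might be told apart from a local directory is sketched below. This is an assumption-laden illustration, not the code in `state_manager.py`: `split_config_dir` is a hypothetical helper, and it only parses the path without calling any GCS client.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_config_dir(config_dir: str) -> Tuple[Optional[str], str]:
    """Return (bucket, prefix) for a gs:// path, or (None, path) for a local one."""
    parsed = urlparse(config_dir)
    if parsed.scheme == "gs":
        if not parsed.netloc:
            raise ValueError(f"Missing bucket name in GCS path: {config_dir!r}")
        # netloc is the bucket; the rest is the object prefix without slashes at the ends.
        return parsed.netloc, parsed.path.strip("/")
    return None, config_dir
```

With this split, a caller could route `("my-bucket", "dvt/partitions")` to a GCS writer and `(None, "partitions_dir")` to the local filesystem, without consulting `PSO_DV_CONFIG_HOME`.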
Commits on Jan 19, 2023
- test: Added test cases for partition_builder.py (ffba13e)
  1. Covered testing PartitionBuilder object
  2. Generate Partitions
  3. Generate YAML from Configs with new partition filters
- 20d9556
- 040d801
Commits on Jan 20, 2023
- 4c2e330
- Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently (56e4354)
- 35d99bc
Commits on Jan 24, 2023
- 936421f
- bb82253
- 4fc5a55
- e531d8e