feat: Issue619 determine the method for generating batches efficiently #653

Commits on Dec 15, 2022

  1. init

    mohdt786 committed Dec 15, 2022
    7d99ebf
  2. init

    mohdt786 committed Dec 15, 2022
    80c0f8b

Commits on Dec 16, 2022

  1. init

    mohdt786 committed Dec 16, 2022
    032f883
  2. init

    mohdt786 committed Dec 16, 2022
    bf95ea4

Commits on Dec 18, 2022

  1. 2703427

Commits on Dec 21, 2022

  1. feat: Added Partition support to generate multiple yaml config files

    New:
        Arguments - file: cli_tools.py
        1. New command 'get-partitions' added to generate partitions for the following Validation types:
            1. row
            2. custom-query(TODO)
        2. --partition-type: Specify the type of partition logic:
            1. primary_key
            2. primary_key_mod(TODO)
            3. hash_mod(TODO)
        3. --partition-num: Number of partitions/config files to create.
            Range=[1,1000]
            If the specified value is greater than count(*), the value is coalesced to count(*)
        4. --config-dir: Directory Path to store YAML Config Files
        5. Added required arguments group to distinguish from optional arguments
        6. Added mutually exclusive arguments group for --hash and --concat
    
        Constants - file: consts.py
        1. Added DEFAULT_PARTITION_TYPE
        2. Added PARTITION_TYPES
    
        Partition methods - file: __main__.py
        1. _get_arg_partition_type(args): extract and return partition logic
        2. partition_and_store_config_files(args): Build/split config managers and store yaml files
        3. partition_configs(args, config_managers): Create a list of lists of config managers using partition filters
        4. _get_primary_key_partition_filters(args, config_manager): Get filters for primary_key partition logic
        5. _add_partition_filters_to_config(config_managers, partition_filters): Split ConfigManager objects and Add partition Filters
        6. get_dataframe(config_manager): Build source and target pandas dataframes from input ConfigManager object
        7. build_primary_key_agg_config_managers_from_args(args): Build a list of ConfigManager object for finding count, min and max of primary_key
    
        Partition methods - file: data_validation.py
        1. get_pandas_df(): Build source and target queries, return source and target dataframes
    
        Type Hints and Doc string:
        1. Added Type Hints to the above methods
        2. Added Doc string with desc, args and return type for above methods
    mohdt786 committed Dec 21, 2022
    1dedce3
  2. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Dec 21, 2022
    7863f67
  3. fix: Added Partition support to generate multiple yaml config files

    New:
    
        Partition methods - file: __main__.py
        1. _add_partition_filters_and_store(config_managers, partition_filters,config_dir,args): Split ConfigManager objects, Add partition Filters and store in target dir
        2. _get_arg_config_dir(args): Return the YAML config folder path from args as a string.
    
        Partition methods - file: cli_tools.py
        1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory
    
        Partition methods - file: state_manager.py
        1. create_partition_config_directory(config_dir: str,target_folder_name: str)
    
        Type Hints and Doc string:
        1. Added Type Hints to the above methods
        2. Added Doc string with desc, args and return type for above methods
    mohdt786 committed Dec 21, 2022
    09e2132
  4. testing and linting

    mohdt786 committed Dec 21, 2022
    524db2d
  5. fix: linting and test

    mohdt786 committed Dec 21, 2022
    259c548
  6. fix: Added Partition support to generate multiple yaml config files

    New:
    
        Partition methods - file: __main__.py
        1. _add_partition_filters_and_store(config_managers, partition_filters,config_dir,args): Split ConfigManager objects, Add partition Filters and store in target dir
        2. _get_arg_config_dir(args): Return the YAML config folder path from args as a string.
    
        Partition methods - file: cli_tools.py
        1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory
    
        Partition methods - file: state_manager.py
        1. create_partition_config_directory(config_dir: str,target_folder_name: str)
    
        Type Hints and Doc string:
        1. Added Type Hints to the above methods
        2. Added Doc string with desc, args and return type for above methods
    mohdt786 committed Dec 21, 2022
    96cf91d
  7. feat: Added Partition support to generate multiple yaml config files

    New:
        Arguments - file: cli_tools.py
        1. New command 'get-partitions' added to generate partitions for the following Validation types:
            1. row
            2. custom-query(TODO)
        2. --partition-type: Specify the type of partition logic:
            1. primary_key
            2. primary_key_mod(TODO)
            3. hash_mod(TODO)
        3. --partition-num: Number of partitions/config files to create.
            Range=[1,1000]
            If the specified value is greater than count(*), the value is coalesced to count(*)
        4. --config-dir: Directory Path to store YAML Config Files
        5. Added required arguments group to distinguish from optional arguments
        6. Added mutually exclusive arguments group for --hash and --concat

        Example:
            data-validation get-partitions row \
                -sc BQ_CONN \
                -tc BQ_CONN \
                -tbls bigquery-public-data.new_york_citibike.citibike_stations,mohammedturky-sql.dvt.citibike_stations \
                --primary-keys station_id,region_id \
                --hash '*' \
                --filter-status fail \
                --filters 'station_id>3000:station_id>3000' \
                --config-dir partitions_dir \
                --partition-type primary_key \
                --partition-num 20

        Constants - file: consts.py
        1. Added DEFAULT_PARTITION_TYPE
        2. Added PARTITION_TYPES

        Partition methods - file: __main__.py
        1. _get_arg_partition_type(args): extract and return partition logic
        2. partition_and_store_config_files(args): Build/split config managers and store yaml files
        3. partition_configs(args, config_managers): Create a list of lists of config managers using partition filters
        4. _get_primary_key_partition_filters(args, config_manager): Get filters for primary_key partition logic
        5. _add_partition_filters_and_store(config_managers, partition_filters,config_dir,args): Split ConfigManager objects, Add partition Filters and store in target dir
        6. get_dataframe(config_manager): Build source and target pandas dataframes from input ConfigManager object
        7. build_primary_key_agg_config_managers_from_args(args): Build a list of ConfigManager object for finding count, min and max of primary_key
        8. _get_arg_config_dir(args): Return the YAML config folder path from args as a string.

        Partition methods - file: data_validation.py
        1. get_pandas_df(): Build source and target queries, return source and target dataframes

        Partition methods - file: cli_tools.py
        1. get_target_table_folder_path(config_dir, target_folder_name): Create and return target directory

        Partition methods - file: state_manager.py
        1. create_partition_config_directory(config_dir: str,target_folder_name: str)

        Type Hints and Doc string:
        1. Added Type Hints to the above methods
        2. Added Doc string with desc, args and return type for above methods
    
    New Command added - 'get-partitions'
    mohdt786 committed Dec 21, 2022
    b6255bc
  8. 4bf5f4f
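
The Dec 21 commits describe a `primary_key` partition type where `--partition-num` is limited to [1, 1000] and coalesced to count(*). The following is a minimal sketch of how such range filters could be derived from the min, max and count of a numeric primary key. The function name `primary_key_partition_filters`, its parameters, and the exact filter string format are assumptions made for illustration; they are not taken from the PR's code.

```python
from typing import List


def primary_key_partition_filters(
    primary_key: str, pk_min: int, pk_max: int, row_count: int, partition_num: int
) -> List[str]:
    """Return one SQL-style range filter per partition (hypothetical helper)."""
    if not 1 <= partition_num <= 1000:
        raise ValueError("partition_num must be in the range [1, 1000]")
    if row_count < 1:
        return []

    # Coalesce the requested number of partitions to the row count.
    partition_num = min(partition_num, row_count)

    # Width of each key range, rounded up so the ranges cover [pk_min, pk_max].
    span = pk_max - pk_min + 1
    step = -(-span // partition_num)

    filters = []
    for i in range(partition_num):
        lower = pk_min + i * step
        if lower > pk_max:
            # Rounding up can leave trailing partitions empty; skip them.
            break
        upper = lower + step
        if upper > pk_max:
            # Last range is closed on the right so pk_max is included.
            filters.append(f"{primary_key} >= {lower} and {primary_key} <= {pk_max}")
        else:
            filters.append(f"{primary_key} >= {lower} and {primary_key} < {upper}")
    return filters
```

Each returned string could then be attached as a filter to one generated YAML config. Equal-width key ranges are only one possible choice; they produce uneven partitions when the key values are sparse or skewed.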

Commits on Dec 22, 2022

  1. linting and fixes

    mohdt786 committed Dec 22, 2022
    58a7a00
  2. type hints and lint

    mohdt786 committed Dec 22, 2022
    07333d8
  3. fix: Raise value error for:

    Validation type: custom-query
    Partition type: primary_key_mod & hash_mod
    mohdt786 committed Dec 22, 2022
    32fd216
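
Commit 32fd216 raises a ValueError for the options that were still marked TODO. A minimal sketch of such a guard is shown here; the function name `check_partition_support` and the two tuples are hypothetical, and only the option names (`row`, `custom-query`, `primary_key`, `primary_key_mod`, `hash_mod`) come from the commit messages.

```python
# Options that the commit messages list as implemented (assumed from the TODO markers).
SUPPORTED_VALIDATION_TYPES = ("row",)
SUPPORTED_PARTITION_TYPES = ("primary_key",)


def check_partition_support(validation_type: str, partition_type: str) -> None:
    """Raise ValueError for validation or partition types that are not supported yet."""
    if validation_type not in SUPPORTED_VALIDATION_TYPES:
        raise ValueError(f"Unsupported validation type: {validation_type!r}")
    if partition_type not in SUPPORTED_PARTITION_TYPES:
        raise ValueError(f"Unsupported partition type: {partition_type!r}")
```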

Commits on Dec 28, 2022

  1. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Dec 28, 2022
    b5145b8
  2. fix: Updated arguments and argument parser

    1. Changed command from get_partitions to generate partitions
    2. Renamed validate_parser to partition_parser
    3. Renamed validate_subparser to partition_subparser
    4. Removed --grouped-columns argument for row validation
    5. Removed custom_query partition parser since it is not supported in this PR
    mohdt786 committed Dec 28, 2022
    da69e8b
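
The parser changes in this PR (a dedicated `partition_parser`, a required arguments group, a mutually exclusive group for `--hash` and `--concat`, and a bounded `--partition-num`) can be sketched with the standard `argparse` module as follows. This is a simplified illustration, not the PR's parser: the command name `generate-table-partitions` is the one mentioned in a later commit, only a subset of the flags is shown, and `build_parser` and `partition_num_type` are hypothetical names.

```python
import argparse


def partition_num_type(value: str) -> int:
    """Parse --partition-num and enforce the documented range [1, 1000]."""
    number = int(value)
    if not 1 <= number <= 1000:
        raise argparse.ArgumentTypeError("--partition-num must be in [1, 1000]")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="data-validation")
    subparsers = parser.add_subparsers(dest="command")

    partition_parser = subparsers.add_parser(
        "generate-table-partitions", help="Generate partitioned YAML configs"
    )

    # Titled group so required flags are listed separately in --help.
    required = partition_parser.add_argument_group("required arguments")
    required.add_argument("--primary-keys", required=True)
    required.add_argument("--config-dir", required=True)
    required.add_argument("--partition-num", required=True, type=partition_num_type)

    partition_parser.add_argument(
        "--partition-type", choices=["primary_key"], default="primary_key"
    )

    # --hash and --concat cannot be passed together.
    exclusive = partition_parser.add_mutually_exclusive_group()
    exclusive.add_argument("--hash")
    exclusive.add_argument("--concat")
    return parser
```

Passing both `--hash` and `--concat`, or a `--partition-num` outside the range, makes `argparse` print a usage error and exit.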

Commits on Jan 4, 2023

  1. feat: Added Partition support to generate multiple yaml config files

    Reconfigured and shifted Partition methods to data_validation/query_builder/partition_row_builder.py and data_validation/partition_builder.py
    
    file: partition_row_builder.py
    1. Build PartitionRowBuilder object
    2. Get Ibis queries for Min, Max and Count for primary key column
    
    file: partition_builder.py
    1. Build PartitionBuilder object
    2. Create ConfigManager objects for input tables
    3. Generate Partition filters
    4. Save partitions in input directory as multiple YAMLs
    mohdt786 committed Jan 4, 2023
    99bb79c
  2. Linting and TypeHint added

    mohdt786 committed Jan 4, 2023
    1e28a4a
  3. fix: Added Warning logs and Documentation

    1. Warning logs in case of Source/Target count, min or max mismatch.
    2. Raise TypeError when primary key is not numeric
    mohdt786 committed Jan 4, 2023
    2b2e76a
  4. fix: removed unwanted Docs

    mohdt786 committed Jan 4, 2023
    1af3f35
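
The Jan 4 commits mention warning logs when the source and target count, min or max differ, and a TypeError when the primary key is not numeric. A minimal sketch of that check follows, assuming the three aggregates are already available as plain dictionaries; the function name `compare_primary_key_stats` and the dictionary layout are hypothetical.

```python
import logging
from numbers import Number
from typing import Dict, List

logger = logging.getLogger(__name__)


def compare_primary_key_stats(
    source: Dict[str, object], target: Dict[str, object]
) -> List[str]:
    """Warn about source/target differences and return the mismatched metric names."""
    for side, stats in (("source", source), ("target", target)):
        for metric in ("min", "max"):
            value = stats[metric]
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, Number):
                raise TypeError(
                    f"{side} primary key {metric} must be numeric, "
                    f"got {type(value).__name__}"
                )

    mismatched = [m for m in ("count", "min", "max") if source[m] != target[m]]
    for metric in mismatched:
        logger.warning(
            "Primary key %s mismatch: source=%s, target=%s",
            metric, source[metric], target[metric],
        )
    return mismatched
```

A mismatch is only logged here rather than treated as an error, which matches the commit's wording ("Warning logs").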

Commits on Jan 6, 2023

  1. 45e51b9
  2. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Jan 6, 2023
    57549b3

Commits on Jan 11, 2023

  1. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Jan 11, 2023
    c34bd8a

Commits on Jan 16, 2023

  1. refactor: Removed ROW command and added desc to README.md

    1. Removed `row` command; default partition logic is row
    2. Added `--primary-key`
    3. Added to Examples.md
    4. Added to readme.md
    mohdt786 committed Jan 16, 2023
    d208a18

Commits on Jan 18, 2023

  1. feat: Added support to save partition configs to GCS bucket

    1. Configs can be saved to GCS specifying a full gs:// path
    2. Updated README.MD
    3. Removed Random Row and Threshold arguments for generate-table-partitions
    4. Removed unused consts
    5. Modified partition_builder.py
    6. Updated state_manager.py to support GCS path independent of PSO_DV_CONFIG_HOME
    mohdt786 committed Jan 18, 2023
    06bb8e3
  2. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Jan 18, 2023
    654061f
  3. fix: fixed path conflicts

    mohdt786 committed Jan 18, 2023
    e031ca5
  4. fix: re-wrote logs which are used in tests/unit/test_cli_tools.py::test_create_and_list_connections & tests/unit/test_cli_tools.py::test_create_and_list_and_get_validations

    mohdt786 committed Jan 18, 2023
    fda93a4
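
Commit 06bb8e3 lets partition configs be saved either to a local directory or to a full gs:// path. A minimal sketch of how a config directory could be resolved into a destination for one table's folder follows. `ConfigTarget` and `resolve_config_target` are hypothetical names; the sketch only computes the destination and does not call any Google Cloud Storage client.

```python
import os
import posixpath
from typing import NamedTuple, Optional

GCS_PREFIX = "gs://"


class ConfigTarget(NamedTuple):
    bucket: Optional[str]  # None means the local filesystem
    path: str              # object prefix in the bucket, or local directory


def resolve_config_target(config_dir: str, target_folder_name: str) -> ConfigTarget:
    """Split a config dir into (bucket, path) for the folder of one target table."""
    if config_dir.startswith(GCS_PREFIX):
        # "gs://bucket/some/prefix" -> bucket="bucket", prefix="some/prefix"
        bucket, _, prefix = config_dir[len(GCS_PREFIX):].partition("/")
        if not bucket:
            raise ValueError(f"Missing bucket name in GCS path: {config_dir!r}")
        return ConfigTarget(bucket, posixpath.join(prefix, target_folder_name))
    return ConfigTarget(None, os.path.join(config_dir, target_folder_name))
```

For example, `resolve_config_target("gs://my-bucket/configs", "my_table")` yields the bucket `my-bucket` and the object prefix `configs/my_table`, while a plain directory name is joined with the local path separator.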

Commits on Jan 19, 2023

  1. test: Added test cases for partition_builder.py

    1. Covered testing PartitionBuilder object
    2. Generate Partitions
    3. Generate YAML from Configs with new partition filters
    mohdt786 committed Jan 19, 2023
    ffba13e
  2. 20d9556
  3. 040d801

Commits on Jan 20, 2023

  1. 4c2e330
  2. Merge branch 'develop' into issue619-determine-the-method-for-generating-batches-efficiently

    mohdt786 committed Jan 20, 2023
    56e4354
  3. 35d99bc

Commits on Jan 24, 2023

  1. 936421f
  2. bb82253
  3. fix: Reduced test input size

    mohdt786 committed Jan 24, 2023
    4fc5a55
  4. e531d8e