Releases: data-dot-all/dataall

v2.6.0

16 Jul 15:48
5e421fe

What's Changed

New features 🆕

Refactoring 💻

Enhancements 🥇

Tests 🧪

  • Automate bootstrapping of integration tests by @petrkalos in #1289
  • CodeBuild integration tests read cognito-test-users param from environment account by @petrkalos in #1295
  • Add environment tests by @petrkalos in #1371, #1334 and Update gql apis + update_environment tests by @petrkalos in #1348
  • Add group/consumption_role invite/remove tests by @petrkalos in #1387
  • Add Dataset integration tests - Dataset CRUD + actions outside of data.all by @dlpzx in #1379
  • Add Worksheet integration tests - all except run sql query by @dlpzx in #1393
  • Add Notebook integration tests by @noah-paige in #1400

Fixes 🪲

Dependencies 📦

  • Safety checks - Ignore disputed issue on pip by @dlpzx in #1271
  • Bump certifi from 2023.7.22 to 2024.7.4 in /deploy/custom_resources/custom_authorizer by @dependabot in #1390
  • Upgrade ejs to 3.1.10 in yarn npm by @dlpzx in #1265
  • Bump requests from 2.31.0 to 2.32.0 in /backend by @dependabot in #1291
  • Bump requests from 2.31.0 to 2.32.0 in /backend/dataall/base/cdkproxy by @dependabot in #1293
  • Bump requests from 2.31.0 to 2.32.2 in /deploy/custom_resources/custom_authorizer by @dependabot in #1309
  • Upgrade flask packages to satisfy safety check by @petrkalos in #1313
  • Fix npm audit findings by @noah-paige in #1341
  • Bump urllib3 from 1.26.18 to 1.26.19 in /deploy/custom_resources/custom_authorizer by @dependabot in #1339
  • Update version auth at edge to use node v20 by @noah-paige in #1327

New Contributors

Full Changelog: v2.5.0...v2.6.0

v2.5.0

13 May 12:02
93ff772

What's Changed

New features 🆕

  • Make visibility of auto-approval toggle configurable based on confidentiality by @anushka-singh in #1223

Refactoring 💻

Enhancements 🥇

  • Enable encryption for lambda environment variables by @mourya-33 in #1225
  • Add integration tests on a real API client and integrate the tests in CICD by @dlpzx in #1219
  • Update lambda_api.py to add encryption for lambda env vars by @mourya-33 in #1255

Fixes 🪲

Dependencies 📦

  • Bump werkzeug from 3.0.1 to 3.0.3 in /tests_new/integration_tests by @dependabot in #1254
  • Bump werkzeug from 3.0.1 to 3.0.3 in /backend/dataall/base/cdkproxy by @dependabot in #1252
  • Bump werkzeug from 3.0.1 to 3.0.3 in /tests by @dependabot in #1253

Full Changelog: v2.4.0...2.5.0

v2.4.0

25 Apr 12:52
5df6100

What's Changed

⚠️ ⚠️ Important: Review the warnings in #1064 if you want to use environments in multiple regions.

New features 🆕

Big Refactoring 💻

Enhancements 🥇

  • Remove allowAll bucket policy statement by @dlpzx in #1106
  • Adding check to remove any spaces in confidentiality names by @TejasRGitHub in #1126
  • Worksheet UI improvements - fix Team and list Environments of Team by @dlpzx in #1111
  • WAF rule parameters in cdk.json + Documentation by @SofiaSazonova in #1140
  • Update cdkExecPolicy.yaml to cleanup overly excessive permissions by @mourya-33 in #1085
  • Add grants to pivot role in verify tables functions by @dlpzx in #1149
  • Implement guardrails and mechanisms to deal with deleted IAM roles in share requests by @SofiaSazonova in #1161
  • Implement least privilege principle for cloudfront, lambda and db migration stacks by @mourya-33 in #1134
  • Implement less restrictive trust policy for local development pivot roles by @dlpzx in #1176

Fixes 🪲

  • Fix EnvUri to check GET_ENV permission for worksheet by @noah-paige in #1125
  • Grant IAM permissions to read data to environment team IAM roles independently from CREATE_DATASET permissions by @SofiaSazonova in #1137
  • Allow ListEnv to get associated organization information by @noah-paige in #1139
  • Redirect the user to correct URL after login by @TejasRGitHub in #1094
  • Fixes for email notifications not sending share link in the body by @TejasRGitHub in #1143
  • Fix folder pagination missing page by @dlpzx in #1158
  • Add "/" to prefix in crawlers if it is not specified in input by @dlpzx in #1156
  • Add Athena List permissions to use AWS SDK for Pandas in SageMaker by @dlpzx in #1155
  • Add new data.all permissions REMOVE_ORGANIZATION_GROUP, INVITE_ORGANIZATION_GROUP to teams invited to an Organization by @SofiaSazonova in #1162
  • Fix missing GET_FOLDER permissions by @dlpzx in #1163
  • Fix input parameters for get credentials get environment group by @dlpzx in #1198
  • Update CDK exec role Policy name with region in template by @dlpzx in #1197
  • Remove creation of log-groups in Lambdas by @dlpzx in #1192
  • Fix missing session in resolve_environment by @dlpzx in #1199
  • Fix missing $ in CDK custom policy by @dlpzx in #1204
  • Fix unnecessary permission check in resolve_stack functions (failure in list datasets when there are shared datasets) by @dlpzx in #1205
  • Fix reference to locationUri by @dlpzx in #1209
  • Fix sagemaker tagging permissions by @dlpzx in #1211

Documentation 📚

  • Documentation in GitHub pages for release 2.4.0 by @dlpzx in #1191
  • Documentation in Userguide for release 2.4 by @dlpzx in #1218

Dependencies 📦

  • Upgrade follow-redirects and webpack-dev-middleware dependencies in frontend by @dlpzx in #1121
  • Upgrade express in frontend by @dlpzx in #1152
  • Bump idna from 3.4 to 3.7 in /deploy/custom_resources/custom_authorizer by @dependabot in #1166

v2.3.0

13 Mar 08:11
e10a043

What's Changed

⚠️ ⚠️ Important: After upgrading to v2.3.0, environment stacks need to be updated before executing data sharing requests. If the environment stack is not updated, data sharing will fail. To update the environment stacks there are 3 options:

  1. Using cdk.json parameter enable_update_dataall_stacks_in_cicd_pipeline --> automatically updates the environments and dataset stacks in the CICD pipeline
  2. Waiting for the overnight update stack task --> same as the above, but it runs on a daily schedule.
  3. Updating environments in Environment > Stack tab > click on Update button --> manual update
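
As a rough illustration of option 1, the sketch below sets the flag for one deployment environment in the parsed content of cdk.json. It assumes the environments are listed under context.DeploymentEnvironments and identified by an envname key; check this against your own cdk.json before relying on it. The function names are hypothetical helpers, not part of data.all.

```python
import json
from pathlib import Path

FLAG = "enable_update_dataall_stacks_in_cicd_pipeline"


def enable_stack_updates(cdk_config: dict, envname: str) -> dict:
    """Return a copy of the cdk.json content with FLAG set to True for one environment."""
    updated = json.loads(json.dumps(cdk_config))  # deep copy via a JSON round-trip
    # Assumption: environments live under context.DeploymentEnvironments.
    environments = updated["context"]["DeploymentEnvironments"]
    matches = [env for env in environments if env.get("envname") == envname]
    if not matches:
        raise KeyError(f"No deployment environment named {envname!r}")
    for env in matches:
        env[FLAG] = True
    return updated


def update_cdk_json(path: str, envname: str) -> None:
    """Hypothetical helper: rewrite the cdk.json file at `path` in place."""
    file = Path(path)
    config = json.loads(file.read_text())
    file.write_text(json.dumps(enable_stack_updates(config, envname), indent=4) + "\n")
```

Calling update_cdk_json("cdk.json", "prod") and committing the result would then let the CICD pipeline pick up the change on its next run, if the layout assumption holds.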

New features 🆕

  • Introduce dataset lock for data sharing, increasing robustness of parallel data sharing by @anushka-singh in #1072
  • Add verification of data sharing and reapplying if "unhealthy" by @noah-paige in #1062
  • Enable Central Catalog Glue databases import by @TejasRGitHub in #1021 and list them in worksheets in #1079
  • Replace IAM inline policies by configurable Managed Policies for folder and bucket sharing by @SofiaSazonova and @dlpzx in #1068
  • Simplify LakeFormation Glue database shares - single shared_db and single resource link table by @dlpzx in #1016 and add sharing guardrails drop permissions in #1055 and update Worksheet database names in UI in #1063
  • Add data sharing auto-approval option for datasets by @SofiaSazonova in #988
  • Introduce feature flags for topics and confidentiality and custom confidentiality list by @TejasRGitHub in #1049

Enhancements 🥇

Fixes 🪲

Refactoring 💻

Documentation 📚

Dependencies 📦

  • Upgrade Aurora PostgreSQL engine 11 --> 13 by @noah-paige in #963
  • Upgrade axios package to resolve follow-redirect vulnerability by @noah-paige in #952
  • Remove unused packages: jinja2, deprecated by @dlpzx in #969
  • Upgrade npm packages: axios, css-tools by @dlpzx in #1052
  • Upgrade postcss and add yarn resolutions by @dlpzx in #1059
  • Apply boto3==1.34.35 in DeployFrontend action by @anandsumit2000 in #1054
  • Upgrade starlette version and dependencies to avoid ReDoS by @dlpzx in #1038
  • Upgrade ip package in frontend for yarn and npm by @dlpzx in #1070

New Contributors 👨‍💻 👩‍💻

Full Changelog: v2.2.0...v2.3.0

v2.2.0

14 Dec 07:32
823d642

What's Changed

This time there are no warnings.

New features 🆕

Enhancements 🥇

Fixes 🪲

  • Add the cloudformation:ContinueUpdateRollback permission to the pivotRole, for administration of linked environment accounts by @rbernotas in #850
  • Fix Module Enabled Pipelines by @noah-paige in #874
  • Add Athena:UpdateWorkGroup permissions to CDK Exec Policy by @noah-paige in #892
  • Add Pagination to Return Full List Cognito Groups by @noah-paige in #891
  • Remove unnecessary MANAGE_ORGANIZATIONS check by @dlpzx in #887
  • Fix S3DatasetClient upload data by @noah-paige in #909
  • Fix Migration Script for New Deployment by @noah-paige in #908
  • Create frontend config role regardless of custom auth or not in backend by @noah-paige in #913
  • Fix permissions on share workflows by @dlpzx in #914

Documentation 📚

Dependencies

  • Upgrade Athena engine version to v3 by @dlpzx in #886
  • Bump axios from 0.26.1 to 1.6.0 in /frontend by @dependabot in #867
  • Bump certifi from 2022.12.7 to 2023.7.22 in /deploy/custom_resources/custom_authorizer by @dependabot in #910
  • Bump urllib3 from 1.26.15 to 1.26.18 in /deploy/custom_resources/custom_authorizer by @dependabot in #911
  • Bump requests from 2.29.0 to 2.31.0 in /deploy/custom_resources/custom_authorizer by @dependabot in #912

New Contributors 👨‍💻 👩‍💻

Full Changelog: v2.1.0...v2.2.0

v2.1.0

08 Nov 08:00
f917a7a

What's Changed

⚠️ Important: After upgrading to v2.1.0, environment stacks need to be updated before creating or editing datasets. If the environment stack is not updated, dataset creation and other functionalities will fail. To update the environment stacks there are 3 options:

  1. Using cdk.json parameter enable_update_dataall_stacks_in_cicd_pipeline --> automatically updates the environments and dataset stacks in the CICD pipeline
  2. Waiting for the overnight update stack task --> same as the above, but it runs on a daily schedule.
  3. Updating environments in Environment > Stack tab > click on Update button --> manual update

Governance 🏛️

New features 🆕

Enhancements 🥇

  • Fix shell=true semgrep issues by @dlpzx in #760
  • Add global flag to replace and avoid scanning issues on incomplete-sanitization by @dlpzx in #762
  • Allow to submit a share when you are both an approver and a requester by @zsaltys in #793
  • Redirect upon creating a share request by @zsaltys in #799
  • Add frontend and backend feature flags by @zsaltys in #817
  • Make hosted_zone_id optional by @lorchda in #812
  • Add configurable session timeout to Cognito by @manjulaK in #786
  • Modularization of notifications, refactor from core to modules by @dlpzx in #822
  • Add Additional Error Messages for KMS Key lookup on imported dataset by @noah-paige in #748
  • Handle Environment Import of IAM service roles by @noah-paige in #749
  • Add condition when there are no public subnets by @lorchda in #794
  • Check other share exists before clean up by @noah-paige in #769
  • Configure Pytests on Feature Flags by @noah-paige in #764

Fixes 🪲

Dependencies

  • Add resolutions for yarn.lock pinned packages by @dlpzx in #757
  • Upgrade babel to non-vulnerable version 7.23.2 by @dlpzx in #816
  • Bump werkzeug from 2.2.3 to 3.0.1 in /tests by @dependabot in #831
  • Bump werkzeug from 2.3.3 to 3.0.1 in /backend/dataall/base/cdkproxy by @dependabot in #832
  • Bump react-devtools-core from 4.28.0 to 4.28.4 in /frontend by @dependabot in #824

Documentation 📚

New Contributors 👨‍💻 👩‍💻

Special thanks to the new contributors!

Full Changelog: v2.0.0...v2.1.0

v2.0.0

13 Sep 14:51
13c1baf

What's Changed

Major version upgrade ☀️

Data.all v2 is a modular version of data.all that allows customers to easily configure and customize data.all to their needs. In a single config file, the different modules can be configured, enabled or disabled. New features and customizations to the modules can now be added to the source code, as well as completely new modules.

In this release we have carried out a deep refactoring of the backend and frontend packages and the resulting code shows significant differences with the v1.6.2 structure. Refer to the following PRs and issues for more details on the design changes.

⚠️ Breaking changes?
Upgrading from v1.6.2 to v2 does NOT include any breaking changes. Despite the magnitude of the code changes, there are no changes to the architecture diagram or to existing resources. Pre-existing datasets, environments, shares or any other resources are not affected by the upgrade.

Enhancements and fixes 🪲

Documentation 📚

Contributors

Full Changelog: v1.6.2...v2.0.0

v1.6.2

08 Aug 15:14
f235c19

What's Changed

⚠️ This is a patch for V1.6.1. If you are upgrading from a previous version of data.all, please have a look at the "Manual actions required" section. Fresh deployments are unaffected.

  • Add missing KMS keys for canaries by @dlpzx in #619
  • Allow restricted nacls backend VPC by @noah-paige in #626
  • Fix cloudfront stack in case custom domain is given by @dbalintx in #607
  • Resolve unnecessary dependency in git_release role by @dlpzx in #623
  • Get prefix list IDs for dbmigration for infra region by @dlpzx in #624
  • Handle External ID SSM v1.6.1 by @noah-paige in #630

Upgrading from <v1.6.0 to v1.6.2

The externalID used to secure the pivotRole(s) in linked environments will be moved from AWS Secrets Manager to AWS Systems Manager Parameter Store as part of this upgrade.

⚠️ NOTE: If you have deployed data.all with enable_pivot_role_auto_create set to true in your cdk.json, you do not have to perform the manual steps listed below and can simply upgrade to v1.6.2. Otherwise, continue with the manual steps below.

To retain the same externalID and avoid updating the pivotRole(s) of each linked environment, follow the steps below:

  1. In your data.all deployment account, navigate to AWS Secrets Manager and retrieve the secret value of the external ID (named dataall-externalId-{envname}) --> keep this value somewhere for later reference

  2. Upgrade code from existing version to v1.6.2 and commit latest code changes to deploy via CodePipeline

  3. Once the CodePipeline execution is complete, navigate to SSM Parameter Store in the deployment account and find the externalID parameter (named /dataall/{envname}/pivotRole/externalId) --> replace the existing value with the one retained from Step 1
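
Steps 1 and 3 can also be scripted instead of done in the console. The following is a minimal sketch, not an official data.all tool: it assumes the secret's SecretString holds the raw external ID (if it is a JSON document, extract the right field first) and that the clients are boto3 Secrets Manager and SSM clients for the deployment account. The function names are hypothetical.

```python
def secret_name(envname: str) -> str:
    # Secret name as given in step 1.
    return f"dataall-externalId-{envname}"


def parameter_name(envname: str) -> str:
    # Parameter name as given in step 3.
    return f"/dataall/{envname}/pivotRole/externalId"


def read_external_id(secrets_client, envname: str) -> str:
    """Step 1: read the current external ID BEFORE upgrading."""
    response = secrets_client.get_secret_value(SecretId=secret_name(envname))
    # Assumption: the secret stores the external ID as a plain string.
    return response["SecretString"]


def restore_external_id(ssm_client, envname: str, external_id: str) -> None:
    """Step 3: AFTER the pipeline has run, overwrite the new parameter value."""
    name = parameter_name(envname)
    # Reuse the type of the parameter created by the upgrade rather than guessing it.
    # If the type is SecureString, also check which KMS key should be used.
    current = ssm_client.get_parameter(Name=name)["Parameter"]
    ssm_client.put_parameter(
        Name=name,
        Value=external_id,
        Type=current["Type"],
        Overwrite=True,
    )
```

With boto3 installed, the clients would be created with boto3.client("secretsmanager") and boto3.client("ssm"). Run read_external_id before step 2 and store the value safely, then run restore_external_id once the CodePipeline execution has finished.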

Full Changelog: v1.6.1...v1.6.2

v2.0.0-beta1

03 Aug 20:15
9220140
Pre-release

Beta pre-release of version 2.0.0, focused on the refactor to modularize data.all. This version includes a modularized backend but not yet a modularized front-end, which will be published with the final release.

⚠️ We recommend installing this release from scratch instead of upgrading an existing system, since this is a pre-production release.

⚠️ WARNING If upgrading, do so from version 1.6.2.

Known issues affecting deployment

In the deployment guide, run step 8 before step 5, then continue from step 5. This is needed because data.all uses the CDK lookup roles during cdk synth, which requires bootstrapping the accounts before running cdk synth locally. Documentation will be updated for the final release.

Known issues

  • #556 Request for share is being sent for invalid environment (CREATE_FAILED)
  • #540 OpenSearch stack failed during backend deploy due to length of policy name
  • #534 Catalog Search along with filters
  • #533 Profille Job run fails
  • #428 Prefix crawling is crawling complete bucket instead of specific folder
  • #374 Error in Monitoring tab in Admin Settings
  • #338 Import of Dashboard / Dataset - Environment selection drop-down list is limited to 5 environments
  • #288 Can't Paginate to view all Folders
  • #625 CDK execution role (custom template) throws S3 access denied error for pivotRole auto-created nested stack
  • Denied share requests show the wrong message to the requester: approved instead of denied (no effect on actual sharing)
  • Logging of approvals for sharing shows AWSResourceNotFound for some approvals
  • After creating a dataset, a user may temporarily be unable to upload data using the UPLOAD button because of a CORS error, which disappears after some time

What's Changed

New Contributors

Full Changelog: v1.6.1...v2.0.0-beta1

v1.6.1

25 Jul 10:15
f3baf14

What's Changed

⚠️ We strongly recommend upgrading directly to V1.6.2 and skipping this release. V1.6.2 includes a better implementation of the V1.6.1 fixes ⚠️

  • Fix wrong update of externalId for pivotRole by @dlpzx in #591

Manual actions required

ONLY if you are upgrading!
On the first run, the CodePipeline will fail in the CDK Synth stage unless additional changes are made:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::111111111111:assumed-role/SOME ROLE/... is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::222222222222:role/cdk-hnb659fds-lookup-role-22222222222-eu-west-1

CodeBuild needs additional permissions to assume the IAM role in the CDK Synth stage. Since we cannot update this CodeBuild stage without running it, the permissions need to be added manually.

Upgrading from V1.6.0 to v1.6.1

The role that needs to be updated is named <PREFIX>-<GITBRANCH>-codebuild-baseline-role. The role name appears in the error message in the CodeBuild logs.

  1. Go to the IAM role (<PREFIX>-<GITBRANCH>-codebuild-baseline-role) and click on Add permissions > Create inline policy.
  2. In the policy editor, switch to the JSON view and paste the policy below.

The policy of the CodeBuild execution role needs to include the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/cdk-hnb659fds-lookup-role*"
        }
    ]
}

  3. After the pipeline has successfully run, go back to the IAM role and remove the manually added policy. The policy is now added as part of infrastructure as code.
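
The manual steps above could also be scripted. This is a hedged sketch that takes a boto3 IAM client and uses its put_role_policy and delete_role_policy calls; the policy name and helper functions are hypothetical, and the policy document is the one shown above.

```python
import json

# Same policy document as in the manual steps.
LOOKUP_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/cdk-hnb659fds-lookup-role*",
        }
    ],
}

# Hypothetical name for the temporary inline policy.
POLICY_NAME = "temporary-cdk-lookup-assume-role"


def codebuild_role_name(prefix: str, git_branch: str) -> str:
    # Follows the <PREFIX>-<GITBRANCH>-codebuild-baseline-role pattern.
    return f"{prefix}-{git_branch}-codebuild-baseline-role"


def add_temporary_policy(iam_client, role_name: str) -> None:
    """Attach the inline policy before re-running the pipeline."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(LOOKUP_ROLE_POLICY),
    )


def remove_temporary_policy(iam_client, role_name: str) -> None:
    """Remove the inline policy once the pipeline has succeeded."""
    iam_client.delete_role_policy(RoleName=role_name, PolicyName=POLICY_NAME)
```

With boto3 installed, iam_client would be boto3.client("iam") using credentials for the deployment account. Keeping the add and remove steps as separate functions mirrors the manual flow, where the cleanup only happens after a successful pipeline run.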

Upgrading from <V1.6.0 to v1.6.1

The error points at a different role: one created by CDK, which looks like the following in the CodeBuild logs:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts:::111111111111:assumed-role/dataall-sbx8-cicd-stack-dataallsbx8cdkpipelinePipe-HMXY7D9OX4FM/AWSCodeBuild-30c50765-4529-4d20-99ce-88f82139a82c is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::22222222222:role/cdk-hnb659fds-lookup-role-22222222222-eu-west-1

Find the role and update it as explained in the "Upgrading from V1.6.0 to v1.6.1" section.

Once that is done, retry the CodeBuild Synth stage. In this case you do NOT need to clean up the manually added policies, as this role will be deleted.

Full Changelog: v1.6.0...v1.6.1