TERRA-68 ⁃ Add support for database lookup via database name #144

Open
muresan opened this issue Jun 30, 2022 · 4 comments

muresan commented Jun 30, 2022

Affected Data source

  • data.astra_database

Expected Behavior

Allow lookup of a database by name. I currently look up databases via data.astra_databases, but this does not fail when the requested database is missing as long as at least one database exists in the organization, and it leads to code like:

locals {
  database_id_tmp = join("", [for db in data.astra_databases.list.results : db.id if db.name == var.database_name])
  database_id = local.database_id_tmp != "" ? local.database_id_tmp : "00000000-0000-0000-0000-000000000000"
}

which has the issue that:

  • when the database does not exist, database_id_tmp is empty, so everything that uses it (other astra resources, for example) fails at the validation step because database_id is expected to be a UUID.
  • I "fixed" this by falling back to "00000000-0000-0000-0000-000000000000", which created even more fun failure scenarios in the provider.

A data source would fail immediately if the DB does not exist and would propagate the error to the rest of the Terraform run. Without one, there is no clear error, and in some cases you end up with an API call using database_id = "00000000-0000-0000-0000-000000000000", which fails with yet another cryptic error.
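Until such a data source exists, a minimal sketch of failing fast without a sentinel UUID could use a postcondition on the existing lookup. It assumes Terraform 1.2 or later (for `lifecycle` postconditions on data sources) and that `data.astra_databases` can be declared without required arguments; `var.database_name` is the variable from the snippet above.

```hcl
variable "database_name" {
  type = string
}

data "astra_databases" "list" {
  lifecycle {
    # Stop the plan at the lookup itself instead of passing an empty or
    # placeholder ID on to other astra resources.
    postcondition {
      condition     = length([for db in self.results : db.id if db.name == var.database_name]) == 1
      error_message = "Expected exactly one Astra database with the requested name."
    }
  }
}

locals {
  # Indexing is safe here: the postcondition guarantees exactly one match.
  database_id = [for db in data.astra_databases.list.results : db.id if db.name == var.database_name][0]
}
```

This still enumerates every database in the organization, so it does not address the efficiency concern; it only turns a silent failure into an explicit one.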

Important Factoids

The API only supports lookup by database ID, and database names may not be enforced to be unique, so this may not be possible. Implementing it by enumerating all databases and selecting the one whose name matches might be too inefficient.

References

https://docs.datastax.com/en/astra/docs/_attachments/devopsv2.html#tag/Database-Operations/operation/getDatabase

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: TERRA-68
┆priority: Major

sync-by-unito (bot) changed the title from "Add support for database lookup via database name" to "TERRA-68 ⁃ Add support for database lookup via database name" on Jun 30, 2022

emerkle826 (Contributor) commented

The API only supports lookup using the DB id and database name may not be enforced to be unique so this may not be possible.

That is currently the case. You can create any number of databases with the exact same name as they are given a unique UUID at creation. This UUID is the only way to identify them in the DevOps API, so the provider can't really guarantee the names will be unique.
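Because names are not unique, a configuration that relies on them can at least detect collisions. A sketch using the grouping mode (`...`) of Terraform `for` expressions, assuming the `results` entries expose `name` and `id` as in the snippet in the issue description:

```hcl
data "astra_databases" "list" {}

locals {
  # Group database IDs by name; the "..." suffix collects duplicate keys into a list.
  ids_by_name = {
    for db in data.astra_databases.list.results : db.name => db.id...
  }

  # Names that map to more than one database ID.
  duplicate_names = [for name, ids in local.ids_by_name : name if length(ids) > 1]
}

output "duplicate_database_names" {
  value = local.duplicate_names
}
```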

In your example, you have var.database_name defined somewhere. Would it not be possible to just use the database UUID? If you are creating databases with terraform, you have access to the TF resource and can get the name from the resource if you need, but you will also have the ID (which is guaranteed to be unique). If you are managing the databases outside of your terraform files, you somehow know the name of the DB you want. Can you alter your workflow to figure out the DB ID instead of or in addition to the DB name?
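For illustration, the ID-sharing workflow suggested here might look like the following across two separate root modules. The resource and output names, the database arguments, and the GCS bucket/prefix are all hypothetical values chosen for the sketch:

```hcl
# --- Producer root module (backing services) ---
resource "astra_database" "app" {
  name           = "db-name"      # hypothetical
  keyspace       = "app"          # hypothetical
  cloud_provider = "gcp"          # hypothetical
  regions        = ["us-east1"]   # hypothetical
}

# Publish the unique ID rather than relying on the name.
output "database_id" {
  value = astra_database.app.id
}

# --- Consumer root module (networking) ---
data "terraform_remote_state" "backing_services" {
  backend = "gcs"
  config = {
    bucket = "example-tf-state"   # hypothetical
    prefix = "backing-services"   # hypothetical
  }
}

locals {
  database_id = data.terraform_remote_state.backing_services.outputs.database_id
}
```

The trade-off is that the consumer pipeline now reads the producer's state, which is the cross-pipeline dependency discussed in the replies.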

muresan (Author) commented Jul 1, 2022

In your example, you have var.database_name defined somewhere. Would it not be possible to just use the database UUID? If you are creating databases with terraform, you have access to the TF resource and can get the name from the resource if you need, but you will also have the ID (which is guaranteed to be unique). If you are managing the databases outside of your terraform files, you somehow know the name of the DB you want. Can you alter your workflow to figure out the DB ID instead of or in addition to the DB name?

The use case you describe works fine when a single root Terraform module controls everything, but it gets complicated in a company with several root modules owned by separate teams for cloud, networking, and backing services/applications. Each team manages its own part of the infrastructure, deployed from separate CI/CD pipelines:

  • networking team manages the PSC, astra_link, astra_private_link, but only knows database_name
  • backing services manages astra_database, creates using database_name and has database_id but does not share the id with anyone (ideally we would discover it using data.astra_databases)
  • cloud/another team would manage the DataStax organization (roles, auth tokens, etc.); this is currently hard to implement because there's no way to figure out the organization_id if all you have is a token (AWS has data.aws_caller_identity, GCP has data.google_client_config)

The issue I'm trying to solve is finding a database by its known name, because the DB name is the information all the teams have. The database_id is known only to the backing services team, but the networking team also needs to discover it (currently via astra_databases).

Tags on resources would also help identify them without resorting to namespaces in the database name; otherwise, how would I differentiate between two identically named resources? Even now, if someone creates a second database with exactly the same name (say "db-name"), my lookup would break, because join() would concatenate both IDs into one string that is not a valid UUID. If you go the way of:

  database_id_tmp = [for db in data.astra_databases.list.results : db.id if db.name == var.database_name][0]

and take the first element, the expression fails when there is no match, but with duplicate names it silently picks one of them. If we return the whole list instead, the rest of the code still cannot decide which entry is correct based on the name alone.
Even supporting tags on databases and roles (and returning those tags in astra_databases and astra_roles) would help in our case, because we could then select the correct role/database by walking the list and checking both the name and the tag values.
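A sketch of two alternatives to the `[0]` lookup; use one or the other, since Terraform evaluates every local. The first uses the built-in `one()` function (Terraform 0.15+), which returns `null` for an empty list and fails the plan for more than one element, so duplicates are rejected rather than guessed. The second is hypothetical: it assumes each result exposes a `tags` map (the feature requested here) and a `team` tag convention, neither of which exists today.

```hcl
locals {
  # Option 1 (works today): null when nothing matches, plan error when ambiguous.
  database_id = one([
    for db in data.astra_databases.list.results : db.id if db.name == var.database_name
  ])
}

# Option 2 (HYPOTHETICAL): requires a `tags` attribute on astra_databases results.
# locals {
#   database_id = one([
#     for db in data.astra_databases.list.results : db.id
#     if db.name == var.database_name && lookup(db.tags, "team", "") == "backing-services"
#   ])
# }
```

Either way, `one()` only refuses to pick among duplicates; it cannot tell which one is correct, which is why extra metadata such as tags is being requested.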

I know there are solutions like storing the database_id somewhere alongside the name and looking it up that way, but then it becomes a copy of the information rather than the authoritative source, which can lead to problems.

emerkle826 (Contributor) commented

@muresan Apologies for letting this fall off my radar.

  • backing services manages astra_database, creates using database_name and has database_id but does not share the id with anyone (ideally we would discover it using data.astra_databases)

I understand separating responsibilities, but not sharing the piece of information that uniquely identifies a DB is wrong in my opinion.

As you mention, there are things that can be done to try to figure out the ID from the name, but there are no guarantees you will get the correct ID if there are duplicates. Tags/namespaces would also just be extra pieces added to work around not sharing the already existing unique ID.

  • cloud/another theam would manage the datastax organization (roles, auth tokens, etc) currently hard to implement because there's no way to figure out the organization_id if all you have is a token (aws has data.aws_caller_identity, gcp has data.google_client_config)

We could implement a data source (e.g. data.caller_org_id) to provide the organization ID. But within a given Terraform state, the token used to manage things would always have to belong to that org. Otherwise, any references to the org ID would be detected as changes within that context, and all managed resources would show up as changed. However, I don't see this being difficult to adhere to (ensuring that the token used remains part of the same org for the life of that Terraform context).
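For comparison, the AWS provider exposes the calling account through the real `aws_caller_identity` data source. The Astra block below is purely hypothetical (data source and attribute names invented to mirror the `data.caller_org_id` example) and only sketches how such an org ID might be consumed:

```hcl
# Existing AWS data source: identity of the credentials in use.
data "aws_caller_identity" "current" {}

output "aws_account_id" {
  value = data.aws_caller_identity.current.account_id
}

# HYPOTHETICAL Astra equivalent; this data source does not exist.
data "astra_caller_org_id" "current" {}

output "astra_organization_id" {
  # Hypothetical attribute name.
  value = data.astra_caller_org_id.current.organization_id
}
```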

muresan (Author) commented Sep 20, 2022

@muresan Apologies for letting this fall off my radar.

  • backing services manages astra_database, creates using database_name and has database_id but does not share the id with anyone (ideally we would discover it using data.astra_databases)

I understand separating responsibilities, but not sharing the piece of information that uniquely identifies a DB is wrong in my opinion.

That is my point: I want to share information that uniquely identifies a DB, namely a name plus tags, not a UUID. By analogy, in AWS you would not share instance IDs or VPC IDs; you would share the tags on them, which uniquely identify those resources.

As you mention, there are things that can be done to try to figure out the ID from the name, but there are no guarantees you will get the correct ID if there are duplicates. Tags/namespaces would also just be extra pieces added to work around not sharing the already existing unique ID.

Most clouds that manage their own IDs have solved this, either by enforcing unique resource names or by letting the resource creator attach metadata that uniquely identifies the resource after creation. I want to avoid creating dependencies between pipelines by forcing one to consume the other's output, because those pipelines run in separate environments and potentially in completely separate CI/CD instances. I know there are many ways to share that information, but each introduces another service dependency (GCP Secret Manager, for example) that needs to be managed and that may hold outdated information.

  • cloud/another theam would manage the datastax organization (roles, auth tokens, etc) currently hard to implement because there's no way to figure out the organization_id if all you have is a token (aws has data.aws_caller_identity, gcp has data.google_client_config)

We could implement a data source (ex. data.caller_org_id) to provide the organization ID. But in the context of a given terraform state, the token used to maintain things would have to always be associated with the org. Otherwise any references to the org id would be detected as resource changes within that context, and managed resources would all be detected as changed. However, I don't see this being difficult to adhere to (ensuring that the token used remains a part of the same org for the life of that terraform context)

I don't think the token is stored in the state, so changing tokens should not affect existing resources. The org ID outlives any database, and most tokens should be ephemeral; nothing should prevent me from using a different token (for security) on every single Terraform run.
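A sketch of supplying a fresh token on every run, assuming the Astra provider is published as `datastax/astra` and accepts a `token` argument; the variable name is hypothetical. Terraform does not persist provider configuration in the state file, so rotating the token between runs should not by itself produce resource diffs.

```hcl
terraform {
  required_providers {
    astra = {
      source = "datastax/astra"
    }
  }
}

# Hypothetical variable; set per run, e.g. via the TF_VAR_astra_token environment variable.
variable "astra_token" {
  type      = string
  sensitive = true
}

provider "astra" {
  token = var.astra_token
}
```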

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants