feat: add data source for Cloud Spanner #206

Merged 12 commits on Apr 23, 2021


@tswast tswast commented Mar 9, 2021

Closes #60

pd_schema = source_df.dtypes[
[i for i, v in source_df.dtypes.iteritems() if v not in [numpy.dtype("O")]]
]
# Loop over index keys() instead of iteritems() because pandas is

Collaborator Author

I tried to reproduce this issue:

import datetime
import pandas

print(pandas.__version__)

df = pandas.DataFrame({
    "int_col": [1],
    "datetime_col": pandas.Series(
        ["2021-03-12T12:50:25"], dtype="datetime64[ns]"
    ),
})

df["datetime_col"] = df["datetime_col"].dt.tz_localize(datetime.timezone.utc)

for i, v in df.dtypes.iteritems():
    print(f"{i}: {v}")

It ran successfully in my environment, so I'm not sure what's going wrong in the Python 3.9 session.
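
The truncated code comment in the diff mentions looping over the index `keys()` instead of `iteritems()`. A minimal sketch of that kind of filter follows, assuming the goal is only to drop `object`-dtype columns from the schema. The helper name `non_object_dtypes` is hypothetical and is not taken from the PR.

```python
import numpy
import pandas


def non_object_dtypes(df):
    """Return df.dtypes restricted to columns whose dtype is not object.

    Hypothetical helper: it iterates over the column labels (keys) and
    looks up each dtype by label instead of calling iteritems().
    """
    dtypes = df.dtypes
    keep = [name for name in dtypes.keys() if dtypes[name] != numpy.dtype("O")]
    return dtypes[keep]


df = pandas.DataFrame({
    "int_col": [1],
    "str_col": pandas.Series(["a"], dtype=object),
    "datetime_col": pandas.Series(
        ["2021-03-12T12:50:25"], dtype="datetime64[ns]"
    ),
})
df["datetime_col"] = df["datetime_col"].dt.tz_localize("UTC")

for name, dtype in non_object_dtypes(df).items():
    print(f"{name}: {dtype}")
```

This only shows the iteration pattern; it does not claim to reproduce or explain the Python 3.9 failure.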

@@ -45,23 +45,4 @@ def to_pandas(snapshot, sql, query_parameters):
# Creating pandas dataframe from data and columns_list
df = DataFrame(data, columns=column_list)

# Dictionary to map spanner datatype to a pandas compatible datatype

Collaborator Author

Removed because the dtype translation was messing up NULL handling.
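
The removed mapping is not shown in this thread. As a general illustration (an assumption, not the removed code), casting a column that contains NULLs to a NumPy integer dtype fails, while the pandas nullable `Int64` extension dtype keeps the missing value. The helper name `try_casts` is made up for this sketch.

```python
import pandas


def try_casts(values):
    """Cast object values to int64 and to nullable Int64; report what happens."""
    column = pandas.Series(values, dtype=object)
    try:
        column.astype("int64")
        strict_ok = True
    except (TypeError, ValueError):
        # NumPy integers have no representation for a missing value.
        strict_ok = False
    # The nullable extension dtype stores missing values as pandas.NA.
    nullable = column.astype("Int64")
    return strict_ok, nullable


strict_ok, nullable = try_casts([1, None])
print(strict_ok, nullable.isna().tolist())
```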

assert row["source_agg_value"] == row["target_agg_value"]


def test_cli_find_tables(spanner_connection_args, database_id):

Collaborator

This test might be less error-prone using a fake fs. Plus, you won't need to clean up the connection file at the end.

Collaborator Author

I tried, but I get an error when the client authenticates. At first it couldn't find my key file, so I updated the fixture to add the key file as a real path:


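import os
import pytest

@pytest.fixture
def fs_with_creds(fs):
    # Expose the real key file inside the fake filesystem, if one is configured.
    if "GOOGLE_APPLICATION_CREDENTIALS" in os.environ:
        fs.add_real_file(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])

    yield fs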

But even with this, I get an error:

ERROR    grpc._plugin_wrapping:_plugin_wrapping.py:82 AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7fb3b8b9ae10>" raised exception!
Traceback (most recent call last):
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/grpc/_plugin_wrapping.py", line 78, in __call__
    context, _AuthMetadataPluginCallback(callback_state, callback))
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 86, in __call__
    callback(self._get_authorization_headers(context), None)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 73, in _get_authorization_headers
    self._request, context.method_name, context.service_url, headers
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/auth/credentials.py", line 133, in before_request
    self.refresh(request)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/oauth2/service_account.py", line 361, in refresh
    access_token, expiry, _ = _client.jwt_grant(request, self._token_uri, assertion)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/oauth2/_client.py", line 153, in jwt_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/oauth2/_client.py", line 105, in _token_endpoint_request
    response = request(method="POST", url=token_uri, headers=headers, body=body)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/google/auth/transport/requests.py", line 183, in __call__
    method, url, data=body, headers=headers, timeout=timeout, **kwargs
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/requests/adapters.py", line 416, in send
    self.cert_verify(conn, request.url, verify, cert)
  File "/Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/requests/adapters.py", line 228, in cert_verify
    "invalid path: {}".format(cert_loc))
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /Users/swast/miniconda3/envs/pso-data-validator-ibis-1.4/lib/python3.7/site-packages/certifi/cacert.pem

I believe the fake filesystem hides the certifi `cacert.pem` bundle, so `requests` can't read the root certificates it needs to validate the secure connection.

Collaborator Author

I could try adding some more "real" files / directories, but I think it's becoming more error-prone, not less :-(
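
For reference, adding more real files could look like the sketch below. It assumes `fs` is the pyfakefs fixture and that the CA bundle from the traceback is the one returned by `certifi.where()`. The names `CA_BUNDLE`, `real_files_for_auth`, and `fs_with_creds_and_certs` are hypothetical, and exposing these two files may still not be enough for the client to authenticate.

```python
import os

import certifi
import pytest

# Resolve the bundle path at import time, before any fake filesystem is active.
CA_BUNDLE = certifi.where()


def real_files_for_auth(environ):
    """List real files to expose: the CA bundle and, if set, the key file."""
    paths = [CA_BUNDLE]
    key_file = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        paths.append(key_file)
    return paths


@pytest.fixture
def fs_with_creds_and_certs(fs):
    # `fs` is assumed to be the pyfakefs fixture; add_real_file maps a real
    # file into the fake filesystem (read-only by default).
    for path in real_files_for_auth(os.environ):
        fs.add_real_file(path)

    yield fs
```

Each extra real path is one more coupling between the test and the local environment, which is the trade-off raised in the comment above.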

@dhercher
Collaborator

This looks good to go, apart from the one nit.

Collaborator

@dhercher dhercher left a comment

:shipit:

@tswast tswast merged commit c63f68e into develop Apr 23, 2021
@tswast tswast deleted the data-source-cloud-spanner branch April 23, 2021 16:14
Successfully merging this pull request may close these issues.

Add support for Spanner
2 participants