Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

row validation: Inconsistency between nullable and non-nullable columns #1036

Closed
nj1973 opened this issue Nov 8, 2023 · 1 comment · Fixed by #1037
Closed

row validation: Inconsistency between nullable and non-nullable columns #1036

nj1973 opened this issue Nov 8, 2023 · 1 comment · Fixed by #1037
Assignees
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nj1973
Copy link
Contributor

nj1973 commented Nov 8, 2023

Inconsistent behaviour for nullable vs non-null columns, this is caused by this expression in ibis_addon/api.py:

    if op.to == self.type() and not op.to.is_timestamp():
         # noop case if passed type is the same
         return self

self.type() is an Ibis type expression that internally holds a value prefixed with ! for not-null columns, e.g. !string for String columns.
op.to is always the nullable variant.
Because of this only nullable columns drop into the no-op branch and bypass a cast to string.

We could fix this with a simple workaround (aka hack) like this:

     """
+    def same_type(from_type, to_type) -> bool:
+        # The data type for Non-nullable columns if prefixed with "!", this is causing deviations
+        # between nullable and non-nullable columns. The second comparison below is catering for this.
+        return bool(
+            from_type == to_type
+            or str(from_type).lstrip("!") == str(to_type).lstrip("!")
+        )
+
     # validate
     op = ops.Cast(self, to=target_type)

-    if op.to == self.type() and not op.to.is_timestamp():
+    if same_type(op.to, self.type()) and not op.to.is_timestamp():
         # noop case if passed type is the same
@nj1973 nj1973 self-assigned this Nov 8, 2023
@nj1973 nj1973 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Nov 8, 2023
@nj1973
Copy link
Contributor Author

nj1973 commented Nov 8, 2023

Captured comment from Neha:

I think op.to would be equivalent to the target_type provided in the function parameters:
def cast(self, target_type: dt.DataType) -> Value

So I think the issue lies within our cast to 'string' when we do row hash validation here rather than to '!string ' if the original column is required. We assume that the cast will account for checking the same type regardless of nullability.

Due to this, I think your approach makes the most sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
1 participant