Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All numeric encoded hashids being treated as natural IDs #66

Closed
afshelley opened this issue Oct 8, 2021 · 7 comments
Closed

All numeric encoded hashids being treated as natural IDs #66

afshelley opened this issue Oct 8, 2021 · 7 comments
Assignees

Comments

@afshelley
Copy link

afshelley commented Oct 8, 2021

Hi, I have run into an issue whereby an object has a fully-numeric coded hashid. Despite a comment in the library asserting this is not possible I have found that it is. Here are the steps to recreate:

import hashids
a = hashids.Hashids("salt", min_length=7, alphabet="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
a.encode(8308830)
>>> '5666668'

I realise now that this may have been avoided if we used a prefix, but unfortunately we did not.

I think the issue is that the _is_uint() method used by the Hashid class assumes if a value can be cast to an unsigned integer, that it needs to be encoded rather than decoded. This means when we are deserialising a numeric hashid to try and resolve a foreign key relation (we are using DRF) the numeric hashid is being treated as a natural ID, and the wrong object is queried.

I'm not sure what the solution for this is. Would it be possible to add a setting e.g. HASHID_FIELD_RESPECT_TYPE or something (defaults to false) and then write the _is_unit() method like this:

from .conf import settings

def _is_uint(number):
    """Returns whether a value is an unsigned integer."""
    if settings.HASHID_FIELD_RESPECT_TYPE:
        return isinstance(number, int) and int(number) >= 0
    try:
        return number == int(number) and number >= 0
    except ValueError:
        return False
@afshelley
Copy link
Author

afshelley commented Oct 8, 2021

I've been doing some testing with this and I think the problem cannot be solved with my suggestion above.

I am able to bypass the assumption by updating the Hashid class in the following way, i.e.

# Check if `value` is an integer first as it is much faster than checking if a string is a valid hashid
if _is_uint(value):
    self._id = value
    self._hashid = self.encode(value)
try:
    if not settings.HASHID_FIELD_RESPECT_TYPE:
        value = int(value, base=10)
    else:
        raise TypeError
except (TypeError, ValueError):
    ...

Essentially not making the assumption that a string that is numeric isn't a hashid (when that particular setting is set).

@nshafer nshafer self-assigned this Oct 8, 2021
@nshafer nshafer closed this as completed in e0c9bee Oct 8, 2021
@nshafer
Copy link
Owner

nshafer commented Oct 8, 2021

Thank you for the detailed report. Assuming that all hashids would have non-numeric characters in it was definitely a falsehood.

Since this is actually a bug introduced in 3.2.0 (and I'm surprised nobody else has hit it), I've fixed it straightaway without a feature flag and released 3.3.2.

A lot of code had been changed in 3.2.0 when prefix support was added, and this was due to an optimization related to it. However, I did perf tests, and the fix I just published in 3.3.2 has no significant performance implication.

I also added tests for these digit-only hashids to hopefully prevent another regression.

Thanks!
Nate

@afshelley
Copy link
Author

Thanks for the speedy fix Nate, it is much appreciated!

@nshafer
Copy link
Owner

nshafer commented Oct 11, 2021

No problem... btw, I edited your report to remove your salt, which I definitely greatly appreciate as it I was able to reproduce the problem very quickly. Even though this is not a security product (only obfuscation), a devoted person might be able to still obtain that and use it to walk your primary keys. Anyways, thanks again!

@afshelley
Copy link
Author

I edited your report to remove your salt

Much appreciated. It was just a random salt not something we used in production though so no harm done.

We've implemented the latest package version and it covered some of the issues we found, specifically the occasions where we were trying to do a direct ID lookup on a numeric hashid.

However, I've just been doing some testing and noticed something else. We're using DRF and if you pass in an object with a reference to a related field that's using HashidSerializerCharField, it seems to be getting caught up on the following logic:

    def to_internal_value(self, data):
        if not self.allow_int_lookup and _is_int_representation(data):
            self.fail('invalid_hashid', value=data)
        return super().to_internal_value(data)

_is_int_representation(data) returns True for an all-numeric hashid. I believe I could circumvent this by enabling allow_int_lookup for this field/model, and the underlying Hashid instantiation would handle the hashid correctly as per your fix, but this feels quite hacky!

@nshafer nshafer reopened this Oct 27, 2021
@nshafer
Copy link
Owner

nshafer commented Oct 27, 2021

You are correct, the HashidSerializerCharField class was using the naive prevention for int lookups. I copied over the more robust solution and wrote some tests to catch it in the future. Thanks again for the bug report. 3.3.3 is released with the fix.

@afshelley
Copy link
Author

🙏 thank you again for your speedy fixup, it is greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants