Add custom URI type #692
@mathemancer This looks good to me generally, but it isn't recognizing URLs without schemes (e.g., centerofci.org). Could we detect those as well and default to the HTTP scheme if none is detected?
domains.csv should serve as a good inference test.
There is no such thing as a URL without a scheme 😉. The problem is that
@mathemancer I think it's fine if we don't detect
I think it would actually be simpler in the long run to check the suffix against known TLDs: as we fix bugs and extend the type, it's easier to grow a list of accepted TLDs than an increasingly arcane set of rules for deciding whether a string is a valid URL under the relevant RFCs. Domains (or, for URIs, authorities) of the form foo.bar are a pretty limited case, and I'm not convinced they're worth handling with special code. For example, in addition to foo.bar, we'd want to be able to find:
Also, the false positive list would be pretty bad once number-ish things are in the mix. We'd want to exclude:
@mathemancer Okay, I'm convinced. :)
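For illustration only, the TLD-suffix check suggested in this thread could be sketched roughly as below. It assumes, as the reviewer requested, that scheme-less input should default to `http://`. `KNOWN_TLDS` is a tiny hypothetical stand-in for a real, maintained TLD list, and `infer_uri` is an illustrative name, not part of Mathesar or this PR.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny TLD list; a real implementation would
# load a maintained list instead of hard-coding a handful of entries.
KNOWN_TLDS = {"com", "org", "net", "edu", "gov", "io"}

# Matches a leading RFC 3986 scheme such as "https:" or "mailto:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def infer_uri(value: str) -> Optional[str]:
    """Return `value` as a URI, or None if it doesn't look like one.

    Strings that already start with a scheme are returned unchanged.
    Scheme-less strings are accepted only if their host ends in a known
    TLD, in which case "http://" is prepended.
    """
    if _SCHEME.match(value):
        return value
    # Treat everything before the first "/", "?" or "#" as the host.
    host = re.split(r"[/?#]", value, maxsplit=1)[0]
    labels = host.split(".")
    # Require at least two non-empty labels, e.g. "centerofci" and "org".
    if len(labels) < 2 or not all(labels):
        return None
    if labels[-1].lower() in KNOWN_TLDS:
        return "http://" + value
    return None


if __name__ == "__main__":
    for sample in ["centerofci.org", "https://centerofci.org", "3.14", "foo.bar"]:
        print(sample, "->", infer_uri(sample))
```

Extending support here means growing `KNOWN_TLDS` rather than adding parsing rules, which is the trade-off the comment argues for. Number-ish strings such as `3.14` are rejected because `14` is not in the list, and `foo.bar` is rejected for the same reason.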
Looks good!
Fixes #412
This adds a custom URI type to Mathesar. It also adds functions at the DB layer for getting the parts of a URI.
Technical details
The DB layer functions are based on the regular expression given in RFC 3986 (https://datatracker.ietf.org/doc/html/rfc3986), which defines URIs. As with the email type, the functions to get the parts of a URI are not yet hooked up to the API. The parts that can be extracted are:
These are all of the pieces defined in the RFC. The function to get part `X` is called `mathesar_types.uri_<X>`; it takes a `TEXT` (or `VARCHAR` or `CHAR`) value and returns the same.
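To experiment with the same parsing outside PostgreSQL, here is a minimal Python sketch. It uses the regular expression from RFC 3986, Appendix B, whose capture groups 2, 4, 5, 7, and 9 hold the scheme, authority, path, query, and fragment. The helpers `uri_part` and `uri_parts` are hypothetical and only mirror the `mathesar_types.uri_<X>` naming idea. They are not the PR's SQL functions.

```python
import re
from typing import Dict, Optional

# Regular expression from RFC 3986, Appendix B. Every group is optional,
# so it matches any string: it splits a URI reference into components
# without validating it.
URI_REGEX = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Capture-group number for each component, as listed in Appendix B.
_GROUPS = {
    "scheme": 2,
    "authority": 4,
    "path": 5,
    "query": 7,
    "fragment": 9,
}


def uri_part(uri: str, part: str) -> Optional[str]:
    """Return one component of `uri`, or None if that component is absent.

    Hypothetical Python analogue of mathesar_types.uri_<part>.
    """
    match = URI_REGEX.match(uri)  # always succeeds for this regex
    return match.group(_GROUPS[part])


def uri_parts(uri: str) -> Dict[str, Optional[str]]:
    """Return all five RFC 3986 components of `uri` as a dict."""
    return {part: uri_part(uri, part) for part in _GROUPS}


if __name__ == "__main__":
    print(uri_parts("https://centerofci.org/about?x=1#team"))
    # Without a scheme, the whole string is parsed as a path.
    print(uri_parts("centerofci.org"))
```

The second call shows why a bare domain such as centerofci.org is not detected as a URL: under this regex it has no scheme and no authority, only a path.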