Introduce function API #1022

Merged
merged 119 commits into from
Feb 7, 2022
Changes from 98 commits
f714909
Intro dataclass hierarchy for modeling filtering
dmos62 Nov 22, 2021
6bdb935
Rename primitive to leaf and operator to branch
dmos62 Nov 23, 2021
90710c3
Rename subject to branch
dmos62 Nov 23, 2021
b51cf6b
Introduce BranchType and ParameterType
dmos62 Nov 23, 2021
1819c0b
Refactor to not use mixins
dmos62 Nov 23, 2021
639c14b
Implement remaining predicates
dmos62 Nov 23, 2021
5025f87
Revert to using mixins
dmos62 Nov 23, 2021
906cbd1
Ammend fauxStatic comment
dmos62 Nov 23, 2021
efd3dc2
Rename fauxStatic to static
dmos62 Nov 23, 2021
aa9239e
Rename parameterType to Count; other minor changes
dmos62 Nov 24, 2021
828814a
Implement exposing filtering options through REST API
dmos62 Nov 24, 2021
58a9432
Dead code
dmos62 Nov 25, 2021
7705bae
switch to snake_case
dmos62 Nov 25, 2021
7379d8c
Make name more specific
dmos62 Nov 25, 2021
ee1f01a
Impl. parsing our custom filter spec to Predicate
dmos62 Nov 26, 2021
c18e0fe
Rename Leaf.field to Leaf.column
dmos62 Nov 26, 2021
74c6626
Move serialization and deserialization routines
dmos62 Nov 26, 2021
78bf19b
Rename function to something more specific
dmos62 Nov 26, 2021
311456e
Add basic serialization test
dmos62 Nov 26, 2021
201837f
Implement Predicate parameter constraints
dmos62 Nov 26, 2021
54e143a
Change filter api nomenclature
dmos62 Nov 28, 2021
086d1d3
Fix SA spec serialization
dmos62 Nov 28, 2021
6424c1f
Impl. testing deserialization
dmos62 Nov 28, 2021
b1910b7
Make filters aware of MA type comparability
dmos62 Nov 28, 2021
30ea5c4
Reorder MA filter spec object keys
dmos62 Nov 28, 2021
db73bda
Switch to using None as default filter
dmos62 Nov 28, 2021
8d16a0f
Use snake_case for vars and funcs
dmos62 Nov 28, 2021
79172bc
Remove duplicate_only (get_duplicates) filter
dmos62 Dec 1, 2021
28eb1b3
Finish changing casing
dmos62 Dec 1, 2021
485184e
Remove get_duplicates tests
dmos62 Dec 1, 2021
02eb143
Include negated versions of predicates
dmos62 Dec 1, 2021
e39c3a9
Adapt record filter tests
dmos62 Dec 1, 2021
748bc4b
Intro. db.filters.operations.apply
dmos62 Dec 2, 2021
0250369
Satisfy flake8 linter
dmos62 Dec 2, 2021
be0e144
Intro. `filters` database endpoint
dmos62 Dec 3, 2021
75ad723
Add name field
dmos62 Dec 3, 2021
7868bc2
Bug fixes
dmos62 Dec 3, 2021
fc9cf88
Remove dead import
dmos62 Dec 3, 2021
57d23fd
Minor refactor; fix test
dmos62 Dec 3, 2021
7908e0d
Fix some tests
dmos62 Dec 3, 2021
c2aca6d
Merge branch 'master' into backend-filtering-numbers
dmos62 Dec 3, 2021
0d5a4c8
Add comments; add assertion descriptions
dmos62 Dec 3, 2021
3128800
Intro. duplicate_only records query parameter
dmos62 Dec 9, 2021
f5f8edc
Intro. rudimentary test that duplicate_only param is being routed pro…
dmos62 Dec 9, 2021
7033be3
Factor out sqlalchemy-filters
dmos62 Dec 18, 2021
f8ac768
Backport error catching from newer PRs
dmos62 Dec 18, 2021
f84dbab
Improve exception message
dmos62 Dec 18, 2021
ed7998d
Implement referenced column existance check
dmos62 Dec 18, 2021
405a6ea
Remove redundant Predicate mixins
dmos62 Dec 20, 2021
80055ab
Quick clean up
dmos62 Dec 20, 2021
8f05a9f
Merge remote-tracking branch 'origin/range_grouping' into backend-fil…
dmos62 Dec 20, 2021
b2e78e8
Fix circular dependency caused by dead import
dmos62 Dec 20, 2021
2f0b63c
Dead imports
dmos62 Dec 20, 2021
5001bc9
Fix sort/filter, grouping, counting interaction
dmos62 Dec 22, 2021
2e0918b
Fix some leftover filter defaults
dmos62 Dec 22, 2021
793adf8
Fix linter warnings
dmos62 Dec 22, 2021
190c09d
Refactor get_query
dmos62 Dec 22, 2021
541a6cf
Fix column selection in get_query
dmos62 Dec 23, 2021
a2d2e63
Dead import
dmos62 Dec 23, 2021
45e691d
Re-refactor
dmos62 Jan 10, 2022
b1298c4
Suggestions implemented as dataclasses
dmos62 Jan 10, 2022
0dbcfb5
Refactor suggestions to be frozen dicts
dmos62 Jan 10, 2022
f2c04d3
Reimplement deserialization
dmos62 Jan 10, 2022
6cbc5e6
Reimplement referenced columns collector
dmos62 Jan 10, 2022
0885923
Account for nested expressions
dmos62 Jan 10, 2022
cf85466
Reimplement MA -> SA expression conversion
dmos62 Jan 10, 2022
1a48f03
Minor reformat
dmos62 Jan 10, 2022
cd37250
Remove outdated doc string
dmos62 Jan 10, 2022
cf212b7
Use Sequence for type annotations instead of List
dmos62 Jan 11, 2022
cf3313f
Fix import
dmos62 Jan 11, 2022
84148b9
Fix static property mutability
dmos62 Jan 11, 2022
cf1b9bb
Switch expression class implementation
dmos62 Jan 12, 2022
5bd62c5
Move namespaces; number of minor refactors
dmos62 Jan 12, 2022
169c8f4
Move functions to the mathesar.database namespace
dmos62 Jan 12, 2022
b7c72c2
Improve module doc
dmos62 Jan 13, 2022
1cb877c
Switch to single-quote
dmos62 Jan 13, 2022
689fa15
Rename Function to DbFunction; collect DbFunction subclasses automati…
dmos62 Jan 14, 2022
05b3cdc
Remove or move last of db.filters
dmos62 Jan 14, 2022
5325098
Intro. endpoint for DbFunctions
dmos62 Jan 14, 2022
2e56359
Move DbFunctions from mathesar to db namespace
dmos62 Jan 14, 2022
e6891d8
Revert to absolute imports
dmos62 Jan 17, 2022
fe3b7f5
Force nested hints into lists
dmos62 Jan 17, 2022
446fdf7
Make make_hint private
dmos62 Jan 17, 2022
7b737fd
Linter fixes
dmos62 Jan 17, 2022
8b64508
Introduce class Literal(DbFunction)
dmos62 Jan 17, 2022
ce7cc7b
Add "literal" hint
dmos62 Jan 17, 2022
4d2e5ff
Intro. endpoint for hinted db types
dmos62 Jan 20, 2022
89ddf5c
Make sure literal params are always wrapped in Literal(...)
dmos62 Jan 20, 2022
cbf9968
Let ColumnReference take literal strings.
dmos62 Jan 21, 2022
b7b6fce
Redo some of the tests
dmos62 Jan 25, 2022
27c246e
Fix whitespace
dmos62 Jan 25, 2022
5fec9a0
Restore modified files from master
dmos62 Jan 25, 2022
31c5755
Merge branch 'master' into function-api-denuked
dmos62 Jan 25, 2022
fe799ac
Satisfy flake8
dmos62 Jan 25, 2022
336943a
Remove obsolete test files
dmos62 Jan 25, 2022
fdb9129
Add tests for filtering with db functions API
dmos62 Jan 26, 2022
2e504d9
Minor cleanup
dmos62 Jan 26, 2022
18992c1
Merge branch 'master' into function-api-denuked
dmos62 Jan 26, 2022
d547e4f
Rename Db to DB
dmos62 Jan 27, 2022
6e96593
Parametrize functions endpoint with database
dmos62 Jan 31, 2022
a0eb951
Improve code comment
dmos62 Feb 1, 2022
ff8fd06
Drop MetaData as parameter
dmos62 Feb 1, 2022
40b7dd5
Parametrize db_types endpoint with database
dmos62 Feb 1, 2022
a234632
Move some serialization logic into a serializer
dmos62 Feb 1, 2022
fdbcaf7
Add rudimentary tests for new endpoints
dmos62 Feb 1, 2022
8127ccf
Linter fixes and basic changes
dmos62 Feb 1, 2022
1c7b439
Make to_sa_expression parameters more descriptive
dmos62 Feb 1, 2022
08e5dfc
Move ExtractURIAuthority to URI type's NS
dmos62 Feb 1, 2022
3173b22
Improve docstring
dmos62 Feb 1, 2022
306e1d7
Consolidate logic for listing/retrieving DBFunction subclasses
dmos62 Feb 1, 2022
06cea56
Fix table API returning db types uppercased
dmos62 Feb 1, 2022
9b552c8
Linter fixes
dmos62 Feb 1, 2022
fd474b6
Fix import
dmos62 Feb 1, 2022
ea3f200
Revert "Fix table API returning db types uppercased"
dmos62 Feb 1, 2022
2564c0b
Move new endpoints to `databases` resource
dmos62 Feb 3, 2022
0053051
Remove commented tests
dmos62 Feb 3, 2022
fc0f8f6
Fix whitespace
dmos62 Feb 3, 2022
7de8eee
Fix new endpoint tests
dmos62 Feb 3, 2022
70ebae2
Merge branch 'master' into function-api-denuked
dmos62 Feb 7, 2022
Empty file added db/functions/__init__.py
Empty file.
261 changes: 261 additions & 0 deletions db/functions/base.py
@@ -0,0 +1,261 @@
"""
This namespace defines the DbFunction abstract class and its subclasses. These subclasses
represent functions that have identifiers, display names and hints, and their instances
hold parameters. Each DbFunction subclass defines how its instance can be converted into an
SQLAlchemy expression.

Hints hold information about what kind of input the function might expect and what output
can be expected from it. This is used to provide interface information without constraining its
user.

These classes might be used, for example, to define a filter for an SQL query, or to
access hints on what composition of functions and parameters should be valid.
"""

from abc import ABC, abstractmethod

from sqlalchemy import column, not_, and_, or_, func, literal
from db.types.uri import URIFunction

from db.functions import hints

import importlib
import inspect


class DbFunction(ABC):
Contributor

Please change this to DBFunction, we agreed to use uppercase for acronyms in the backend code here.

    id = None
    name = None
    hints = None

    def __init__(self, parameters):
        if self.id is None:
            raise ValueError('DbFunction subclasses must define an ID.')
        if self.name is None:
            raise ValueError('DbFunction subclasses must define a name.')
        self.parameters = parameters

    @property
    def referenced_columns(self):
        """Walks the expression tree, collecting referenced columns.
        Useful when checking if all referenced columns are present in the queried relation."""
        columns = set([])
        for parameter in self.parameters:
            if isinstance(parameter, ColumnReference):
                columns.add(parameter.column)
            elif isinstance(parameter, DbFunction):
                columns.update(parameter.referenced_columns)
        return columns

    @staticmethod
    @abstractmethod
    def to_sa_expression():
        return None


class Literal(DbFunction):
    id = 'literal'
    name = 'Literal'
    hints = tuple([
        hints.parameter_count(1),
        hints.parameter(1, hints.literal),
    ])

    @staticmethod
    def to_sa_expression(p):
Contributor

Could you use a more verbose variable than p?

Contributor

Also applies to other instances.

        return literal(p)


class ColumnReference(DbFunction):
    id = 'column_reference'
    name = 'Column Reference'
    hints = tuple([
        hints.parameter_count(1),
        hints.parameter(1, hints.column),
    ])

    @property
    def column(self):
        return self.parameters[0]

    @staticmethod
    def to_sa_expression(p):
        return column(p)


class List(DbFunction):
    id = 'list'
    name = 'List'

    @staticmethod
    def to_sa_expression(*ps):
        return list(ps)


class Empty(DbFunction):
    id = 'empty'
    name = 'Empty'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(1),
    ])

    @staticmethod
    def to_sa_expression(p):
        return p.is_(None)


class Not(DbFunction):
    id = 'not'
    name = 'Not'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(1),
    ])

    @staticmethod
    def to_sa_expression(*p):
        return not_(*p)


class Equal(DbFunction):
    id = 'equal'
    name = 'Equal'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        return p1 == p2


class Greater(DbFunction):
    id = 'greater'
    name = 'Greater'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
        hints.all_parameters(hints.comparable),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        return p1 > p2


class Lesser(DbFunction):
    id = 'lesser'
    name = 'Lesser'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
        hints.all_parameters(hints.comparable),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        return p1 < p2


class In(DbFunction):
    id = 'in'
    name = 'In'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
        hints.parameter(2, hints.array),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        return p1.in_(p2)


class And(DbFunction):
    id = 'and'
    name = 'And'
    hints = tuple([
        hints.returns(hints.boolean),
    ])

    @staticmethod
    def to_sa_expression(*ps):
        return and_(*ps)


class Or(DbFunction):
    id = 'or'
    name = 'Or'
    hints = tuple([
        hints.returns(hints.boolean),
    ])

    @staticmethod
    def to_sa_expression(*ps):
        return or_(*ps)


class StartsWith(DbFunction):
    id = 'starts_with'
    name = 'Starts With'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
        hints.all_parameters(hints.string_like),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        return p1.like(f'{p2}%')


class ToLowercase(DbFunction):
    id = 'to_lowercase'
    name = 'To Lowercase'
    hints = tuple([
        hints.parameter_count(1),
        hints.all_parameters(hints.string_like),
    ])

    @staticmethod
    def to_sa_expression(p1):
        return func.lower(p1)
class ExtractURIAuthority(DbFunction):
Contributor

Since this depends on our custom URI type, I would not put this in the general functions namespace. I'd store it with the URI type instead.

Keep in mind that our custom types may not be installed on all Mathesar deployments.

Contributor

This should apply to any functions that depend on our custom types.

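    id = 'extract_uri_authority'
    name = 'Extract URI Authority'
    hints = tuple([
        hints.parameter_count(1),
        hints.parameter(1, hints.uri),
    ])

    @staticmethod
    def to_sa_expression(p1):
        # Look the SQL function up by name on `func`; `func.getattr(...)` would
        # instead emit a SQL function literally called "getattr".
        # Assumes URIFunction is an Enum whose value is the SQL function name.
        return getattr(func, URIFunction.AUTHORITY.value)(p1)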


def _get_defining_module_members_that_satisfy(predicate):
    # NOTE: the value returned by globals() (when it's called within a function) is set when the
Contributor

Could you explain in a docstring what this function does?

    # function is defined and does not change depending on where the function is called from.
    # See https://docs.python.org/3/library/functions.html#globals
    # If we wanted to move this function into another namespace, we would have to additionally
    # pass it this namespace's globals().
    defining_module_name = globals()['__name__']
    defining_module = importlib.import_module(defining_module_name)
    all_members_in_defining_module = inspect.getmembers(defining_module)
    return tuple(
        member
        for _, member in all_members_in_defining_module
        if predicate(member)
    )


def _is_concrete_db_function_subclass(member):
    return inspect.isclass(member) and member != DbFunction and issubclass(member, DbFunction)


# Enumeration of supported DbFunction subclasses; needed when parsing.
supported_db_functions = _get_defining_module_members_that_satisfy(_is_concrete_db_function_subclass)
Contributor

Wacky idea:

If you use the type function to define all these subclasses, I think it would clean things up and avoid the need for all this inspection. Example:

ToLowercase = type(
    'ToLowercase',
    (DbFunction,),
    dict(
        id='to_lowercase',
        name='To Lowercase',
        hints=tuple([hints.parameter_count(1), hints.all_parameters(hints.string_like)]),
        to_sa_expression=staticmethod(lambda p1: func.lower(p1))
    )
)

You'd set up a factory function (maybe a factory method of the parent class, actually; matter of style) that takes the right arguments, and loop over a list of dicts (or whatever) of args to create the subclasses.

Advantages:

  • This would let you define the subclasses in a nice coherent list, make it easy to extend, etc.
  • You'd know that all required attributes were set at import rather than instantiation.
  • You'd also be able to recover the list of subclasses without needing inspection, and indeed lists of (for example) all their names, ids, etc.
  • If you make a dict of functions (by id, for example), then elsewhere in the code you could look up the subclass by its id attribute instead of needing to inspect it. That aspect is a bit less exciting, but still.
  • Setting up all the subclasses via a defining dict would make everything visually immediate and comparable.
  • Creating new DBFunction subclasses in the REPL (or elsewhere) would be easy and obvious (easier to see a function signature than inspect the abstract class code for what needs to be implemented).

Disadvantages:

  • It'd be a bit wonky as far as Python style, but I think the simplicity makes it worth it.
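
A minimal sketch of the factory idea above, using only the standard library so that it runs on its own. The names `make_db_function`, `SPECS` and `db_functions_by_id` are hypothetical, hints are omitted, and `to_sa_expression` returns plain strings here rather than SQLAlchemy expressions.

```python
from abc import ABC


class DBFunction(ABC):
    """Stand-in for the PR's base class; hints are omitted."""
    id = None
    name = None

    def __init__(self, parameters):
        self.parameters = parameters


def make_db_function(class_name, id, name, to_sa_expression):
    """Hypothetical factory: build a DBFunction subclass with type()."""
    if id is None or name is None:
        # Checked when the class is created (import time), not at instantiation.
        raise ValueError('DBFunction subclasses must define an ID and a name.')
    attributes = dict(
        id=id,
        name=name,
        to_sa_expression=staticmethod(to_sa_expression),
    )
    return type(class_name, (DBFunction,), attributes)


# Illustrative specs; strings stand in for SQLAlchemy expressions.
SPECS = [
    dict(class_name='ToLowercase', id='to_lowercase', name='To Lowercase',
         to_sa_expression=lambda p1: f'lower({p1})'),
    dict(class_name='Equal', id='equal', name='Equal',
         to_sa_expression=lambda p1, p2: f'{p1} = {p2}'),
]

# Lookup by id, with no module inspection needed.
db_functions_by_id = {
    spec['id']: make_db_function(**spec) for spec in SPECS
}
```

With this shape, a missing id or name fails as soon as the module is imported, and the list of supported functions is simply `db_functions_by_id.values()`.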

Contributor Author

@dmos62 dmos62 Jan 28, 2022

I like this. If generating a dict or list of DBFunction subclasses, that would mean they're not top-level type objects and thus can't be imported like from db.functions.base import ToLowercase. Might not be a problem.

This makes me wonder how I'll get DBFunctions that are defined in different namespaces collected into a single list. The functions endpoint has to be parametrized on the database because not all types and not all database functions might be installed on all databases.

Contributor

@mathemancer mathemancer Jan 28, 2022

You could generate them in a dict, then (in the same loop) add them to globals (I've used this in other projects). I suggest not doing that, though, since most of the places you want them, you need to look them up anyway, so you may as well grab them from a dict or list or whatever. You could import the various dicts from other namespaces into the base module (this also nicely avoids circular imports). Then, if you do want the top-level type objects, you're in the right place to add them all into one module's globals. You should be able to restrict which dictionaries get passed through the DBFunction factory based on what's installed, I think.
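
A rough sketch of that arrangement, under stated assumptions: `BASE_SPECS`, `URI_SPECS`, `build_registry` and the `uri_type_installed` flag are all hypothetical, and the base class is a simplified stand-in. It shows spec dicts from several namespaces being merged into one registry, restricted by what is installed, with the optional step of publishing the classes as module-level names.

```python
class DBFunction:
    """Simplified stand-in for the PR's abstract base class."""
    id = None
    name = None


def build_registry(spec_groups):
    """Create one subclass per spec and index the results by id."""
    registry = {}
    for specs in spec_groups:
        for class_name, attributes in specs.items():
            subclass = type(class_name, (DBFunction,), dict(attributes))
            registry[subclass.id] = subclass
    return registry


# Hypothetical spec dicts; in the layout described above, URI_SPECS would be
# imported from the URI type's namespace.
BASE_SPECS = {
    'Equal': dict(id='equal', name='Equal'),
    'Not': dict(id='not', name='Not'),
}
URI_SPECS = {
    'ExtractURIAuthority': dict(
        id='extract_uri_authority',
        name='Extract URI Authority',
    ),
}

# Assumption: some check reports whether the custom URI type is installed.
uri_type_installed = False
spec_groups = [BASE_SPECS] + ([URI_SPECS] if uri_type_installed else [])
registry = build_registry(spec_groups)

# Optional: also expose the classes as top-level names of this module.
globals().update({cls.__name__: cls for cls in registry.values()})
```

Most callers would only need `registry[some_id]`; the `globals()` step matters only if importing the classes by name is still wanted.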

Contributor Author

Thanks for the tips. Useful.

10 changes: 10 additions & 0 deletions db/functions/exceptions.py
@@ -0,0 +1,10 @@
class BadDbFunctionFormat(Exception):
    pass


class UnknownDbFunctionId(BadDbFunctionFormat):
    pass


class ReferencedColumnsDontExist(BadDbFunctionFormat):
    pass
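
Because both specific exceptions derive from `BadDbFunctionFormat`, a caller can presumably handle every malformed-spec error with a single `except` clause. The sketch below illustrates that; `describe_failure` and `raise_unknown_id` are hypothetical helpers, not part of the PR.

```python
class BadDbFunctionFormat(Exception):
    pass


class UnknownDbFunctionId(BadDbFunctionFormat):
    pass


class ReferencedColumnsDontExist(BadDbFunctionFormat):
    pass


def describe_failure(action):
    """Hypothetical helper: run action and report any format error."""
    try:
        action()
    except BadDbFunctionFormat as error:
        # Catches both subclasses as well as the base class itself.
        return f'{type(error).__name__}: {error}'
    return 'ok'


def raise_unknown_id():
    raise UnknownDbFunctionId('no function with id "frobnicate"')


print(describe_failure(raise_unknown_id))
```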
42 changes: 42 additions & 0 deletions db/functions/hints.py
@@ -0,0 +1,42 @@
from frozendict import frozendict


def _make_hint(id, **rest):
    return frozendict({"id": id, **rest})


def parameter_count(count):
    return _make_hint("parameter_count", count=count)


def parameter(index, *hints):
    return _make_hint("parameter", index=index, hints=hints)


def all_parameters(*hints):
    return _make_hint("all_parameters", hints=hints)


def returns(*hints):
    return _make_hint("returns", hints=hints)


boolean = _make_hint("boolean")


comparable = _make_hint("comparable")


column = _make_hint("column")


array = _make_hint("array")


string_like = _make_hint("string_like")


uri = _make_hint("uri")


literal = _make_hint("literal")
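
To make the shape of a hint concrete, here is a small standard-library sketch. It swaps `frozendict` for `types.MappingProxyType` (a read-only view, used here only so the snippet has no third-party dependency) and rebuilds the hint set that `Greater` declares in `db/functions/base.py`. `find_parameter_count` is a hypothetical consumer, not something the PR defines.

```python
from types import MappingProxyType


def _make_hint(id, **rest):
    # MappingProxyType is a read-only stand-in for frozendict here.
    return MappingProxyType({'id': id, **rest})


def parameter_count(count):
    return _make_hint('parameter_count', count=count)


def all_parameters(*hints):
    return _make_hint('all_parameters', hints=hints)


def returns(*hints):
    return _make_hint('returns', hints=hints)


boolean = _make_hint('boolean')
comparable = _make_hint('comparable')

# The hint set declared by Greater in db/functions/base.py.
greater_hints = (
    returns(boolean),
    parameter_count(2),
    all_parameters(comparable),
)


def find_parameter_count(hints):
    """Hypothetical helper: return the declared parameter count, if any."""
    for hint in hints:
        if hint['id'] == 'parameter_count':
            return hint['count']
    return None
```

Each hint is just a small immutable mapping with an `id` plus optional details, so nested hints such as `returns(boolean)` are mappings containing tuples of mappings.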
48 changes: 48 additions & 0 deletions db/functions/operations/apply.py
@@ -0,0 +1,48 @@
from db.functions.base import DbFunction
from db.functions.exceptions import ReferencedColumnsDontExist
from db.functions.operations.deserialize import get_db_function_from_ma_function_spec


def apply_ma_function_spec_as_filter(relation, ma_function_spec):
    db_function = get_db_function_from_ma_function_spec(ma_function_spec)
    return apply_db_function_as_filter(relation, db_function)


def apply_db_function_as_filter(relation, db_function):
    _assert_that_all_referenced_columns_exist(relation, db_function)
    sa_expression = _db_function_to_sa_expression(db_function)
    relation = relation.filter(sa_expression)
    return relation


def _assert_that_all_referenced_columns_exist(relation, db_function):
    columns_that_exist = _get_columns_that_exist(relation)
    referenced_columns = db_function.referenced_columns
    referenced_columns_that_dont_exist = \
        set.difference(referenced_columns, columns_that_exist)
    if len(referenced_columns_that_dont_exist) > 0:
        raise ReferencedColumnsDontExist(
            "These referenced columns don't exist on the relevant relation: "
            + f"{referenced_columns_that_dont_exist}"
        )


def _get_columns_that_exist(relation):
    columns = relation.selected_columns
    return set(column.name for column in columns)


def _db_function_to_sa_expression(db_function):
    """
    Everything is considered to be either a DbFunction subclass or a literal.
Contributor

What is "everything" here?

    """
    if isinstance(db_function, DbFunction):
        raw_parameters = db_function.parameters
        parameters = [
            _db_function_to_sa_expression(raw_parameter)
            for raw_parameter in raw_parameters
        ]
        db_function_subclass = type(db_function)
        return db_function_subclass.to_sa_expression(*parameters)
    else:
        return db_function
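
The recursion in `_db_function_to_sa_expression` can be illustrated without SQLAlchemy. In the sketch below the classes are simplified stand-ins whose `to_sa_expression` returns SQL-like strings (an assumption made only to keep the example self-contained), and `to_expression` is a hypothetical name for the same depth-first walk: parameters are converted first, then passed to the subclass's static method, and anything that is not a `DbFunction` is returned unchanged.

```python
class DbFunction:
    """Simplified stand-in; to_sa_expression returns strings, not SQLAlchemy objects."""

    def __init__(self, parameters):
        self.parameters = parameters


class ColumnReference(DbFunction):
    @staticmethod
    def to_sa_expression(column_name):
        return f'"{column_name}"'


class Literal(DbFunction):
    @staticmethod
    def to_sa_expression(value):
        return repr(value)


class Greater(DbFunction):
    @staticmethod
    def to_sa_expression(left, right):
        return f'({left} > {right})'


class And(DbFunction):
    @staticmethod
    def to_sa_expression(*operands):
        return '(' + ' AND '.join(operands) + ')'


def to_expression(node):
    """Depth-first conversion mirroring _db_function_to_sa_expression."""
    if isinstance(node, DbFunction):
        converted = [to_expression(parameter) for parameter in node.parameters]
        return type(node).to_sa_expression(*converted)
    # Anything that is not a DbFunction is passed through untouched.
    return node


tree = And([
    Greater([ColumnReference(['age']), Literal([18])]),
    Greater([ColumnReference(['score']), Literal([0.5])]),
])
print(to_expression(tree))
```

Read this way, "everything" in the docstring appears to mean every node of the parameter tree: each node is either a `DbFunction` instance or a raw value handed through as-is.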