Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to omit table joining if relations are not needed by API consumer #20

Closed
bazylhorsey opened this issue Oct 13, 2022 · 10 comments
Closed

Comments

@bazylhorsey
Copy link
Contributor

bazylhorsey commented Oct 13, 2022

After turning on SQL echo, it seems that there can be quite a bit more querying than needs to necessarily happen.
Queries can be much faster if related tables are not unwrapped.

Example body param:

    expand_fields: Optional[bool] = Query(
        default=False, description="Expand relational fields"
    ),

It would be EVEN BETTER if you could only select from one of the related fields. This could easily be implemented with an additional Enum representing the base table.
Example body param:

    expand_fields: Optional[List[Enum[{SOMETHING}]] = Query(
        default={SOMETHING}, description="Expand relational fields"
    ),

In cases where the expand_fields list is empty, it should not show the unwrapped location, and more importantly not perform the selectin.

@bazylhorsey
Copy link
Contributor Author

This also seems an appropriate time to ask why selectin was chosen as the lazy loading type?
Is there a way to change to a join at query time, this may be a good example if we can find an good spot for it.

@dongfengweixiao
Copy link
Contributor

I think cancel the relationship between all tables may be more flexible, and let the program itself handle the relationship.

@bazylhorsey
Copy link
Contributor Author

By program you mean the client consuming the API? What if a person needed in some cases to see the hero with their team, and in other cases just needed hero, and the spending time on the join was useless?

@jonra1993
Copy link
Owner

jonra1993 commented Oct 14, 2022

Hello @bazylhorsey the selectin was used on relationships due to the async implementation of sqlmodel I was facing this issue tiangolo/sqlmodel#74. I implemented selectin on the Relationship otherwise it should be done on the query. I agree that queries can be optimized so some joining can be omitted. What do you mean by "change to a join at query time"?

Do you want those body params to allow users to decide if they want to have the relationship response?

A sample with a custom query like this?

async def read_users_list_by_role_name(
    status: Optional[IUserStatus] = Query(
        default=IUserStatus.active,
        description="User status, It is optional. Default is active",
    ),
    role_name: str = Query(
        default="", description="String compare with name or last name"
    ),
    params: Params = Depends(),
    current_user: User = Depends(
        deps.get_current_user(required_roles=[IRoleEnum.admin])
    ),
):
    """
    Retrieve users by role name and status. Requires admin role
    """
    user_status = True if status == IUserStatus.active else False
    query = (
        select(User)
        .join(Role, User.role_id == Role.id)
        .where(and_(Role.name == role_name, User.is_active == user_status))
        .order_by(User.first_name)
    )
    users = await crud.user.get_multi_paginated(query=query, params=params)
    return create_response(data=users)

@jonra1993
Copy link
Owner

jonra1993 commented Oct 15, 2022

Hello @bazylhorsey lazy can also be joined, subquery, or selectin as described here https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html

I have tested elapsed time with different lazy techniques but the time is similar, a couple of us

import timeit

    elapsed_time = timeit.repeat(lambda: (await crud.hero.get_multi_paginated(params=params) for _ in '_').__anext__(), number=10000, repeat=5)
    for index, exec_time in enumerate(elapsed_time, 1):
        m_secs = round(exec_time * 10 ** 2, 2)
        print(f"Case {index}: Time: {m_secs}µs")
    
    print(f"Mean: {round(max(elapsed_time)* 10 ** 6, 2)}")

@bazylhorsey
Copy link
Contributor Author

bazylhorsey commented Nov 13, 2022

https://stackoverflow.com/questions/31237042/whats-the-difference-between-select-related-and-prefetch-related-in-django-orm

Finally found a general guideline for loading techniques! 🥂
So it looks like M2M relations and reverse foreign keys (in your example anywhere you're doing team.heroes in crud) are best for using selectin. Anywhere you have one-to-one relations or extending a model with the foreign key handy joined is a better choice.

My understanding django -> fastapi:

  • prefetch_related == selectin
  • select_related == joined

A colleague of mine had a solution to your bug. In your crud base you can use response.unique().scalar_one_or_none()
I am not sure if this could have undesirable results, as I know joins are an entirely different strategy than selectin and could have different output.

@jonra1993
Copy link
Owner

Hello @bazylhorsey thanks for this reference I am going to try "response.unique().scalar_one_or_none()" and see what is its output

@bazylhorsey
Copy link
Contributor Author

bazylhorsey commented Nov 21, 2022

I'd go for using joined in all your many-to-ones (side with foreign key), and one-to-one (both sides).
Local performance is much better as compute goes to psql. You can utilize useList: False in the relationship attributes, to enforce joins don't create duplicate rows. I'm skeptical of unique(), because my intuition is this actually allows sql to make its own PK so it can avoid key errors during joining. This might create malformed returns, so in that case I think selectin is a superior choice in the places it caused the bug referenced. This is my reasoning for doing joined everywhere except the places that return a list with its relationship (ie team.heroes).

@jonra1993
Copy link
Owner

Hello @bazylhorsey can you please share your sample code you were testing?

@jonra1993
Copy link
Owner

@bazylhorsey Based on loading relationships and the insight of your link. One-One and One-Many relationships have been updated to use sa_relationship_kwargs={"lazy": "joined"} and Many-Many relationships use sa_relationship_kwargs={"lazy": "selectin"}

8470af9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants