Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list index performance update for wwsympa.fcgi do_lists subroutine #925

Merged
merged 2 commits into from
May 12, 2020

Conversation

olivov
Copy link
Contributor

@olivov olivov commented Apr 2, 2020

For some background information, we're running Sympa 6.2.54 in a
RHEL 7 environment. We've got around 6,000 lists and close to
770,000 subscribers.

Since upgrading to Sympa 6.2 about two years ago, we've come across
a performance issue when an authenticated user uses the "list index"
page (the /sympa/lists URI) to view all lists they have access to
see. This page can take quite a while to load for an authenticated
user (30-45 seconds).

Digging around in the code, it appears the reason it takes a long
time for this page to load in our environment is that there is a
database lookup on a per list basis to check whether or not the user
is either a privileged owner, owner, editor or subscriber. This
lookup has become expensive for us due to the large number of active
lists we have in our environment.

With this update, we've made some changes to the do_lists subroutine
in wwsympa.fcgi. A hash is loaded with the user's list membership
and we use hash lookups to check for user's list membership to avoid
a large number of database lookups. We hope this change does not
obviate any list visibility checking mechanisms for which we have
not thought of but we've tried to leverage the existing list loop
checking mechanism to avoid any unwanted visibility issues. This
update has been vetted for Sympa 6.2.54 and we've seen a 3x
performance improvement in our environment with load times (albeit
long still) between 10-15 seconds.

…ntries instead of database lookups for list membership.
Comment on lines 4182 to 4187
foreach my $list (values %{$which->{'owner'}}) {
$which->{'privileged_owner'}->{$list->{'name'}} =
$list->is_admin('privileged_owner', $param->{'user'}{'email'});
$which->{'actual_editor'}->{$list->{'name'}} =
$list->is_admin('actual_editor', $param->{'user'}{'email'});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think actual_editor is not always the owner.

…e between priv owner and owner from get_which lookup.
@olivov
Copy link
Contributor Author

olivov commented Apr 3, 2020

Thanks for the review Soji and apologies for the misplacement there.
Looking in web_tt2/lists.tt2, it looks like actual_editor is the moderators for a list

[% ELSIF which.$listname.is_editor ~%]
	[%|loc%]Moderator[%END ~%]
	[% IF which.$listname.is_subscriber %]
		[%|loc%], [%END%]
	[%END ~%]
[% END ~%]

The update addresses this issue to get the actual_editor info and also properly differentiates between a privileged owner and normal owner.

@ikedas
Copy link
Member

ikedas commented Apr 4, 2020

Looking in web_tt2/lists.tt2, it looks like actual_editor is the moderators for a list

[% ELSIF which.$listname.is_editor ~%]
	[%|loc%]Moderator[%END ~%]
	[% IF which.$listname.is_subscriber %]
		[%|loc%], [%END%]
	[%END ~%]
[% END ~%]

The update addresses this issue to get the actual_editor info and also properly differentiates between a privileged owner and normal owner.

which.$listname.is_editor template variable should be true, if the user belongs to actual_editor.

This is part of Sympa::List::get_admin():

sympa/src/lib/Sympa/List.pm

Lines 2983 to 3022 in db8702f

my @users;
if ($role eq 'editor') {
@users =
grep { $_ and $_->{role} eq 'editor' } @{$admin_user || []};
} elsif ($role eq 'owner') {
@users =
grep { $_ and $_->{role} eq 'owner' } @{$admin_user || []};
} elsif ($role eq 'actual_editor') {
@users =
grep { $_ and $_->{role} eq 'editor' } @{$admin_user || []};
@users = grep { $_ and $_->{role} eq 'owner' } @{$admin_user || []}
unless @users;
} elsif ($role eq 'privileged_owner') {
@users = grep {
$_
and $_->{role} eq 'owner'
and $_->{profile}
and $_->{profile} eq 'privileged'
} @{$admin_user || []};
} elsif ($role eq 'receptive_editor') {
@users = grep {
$_
and $_->{role} eq 'editor'
and ($_->{reception} || 'mail') ne 'nomail'
} @{$admin_user || []};
@users = grep {
$_
and $_->{role} eq 'owner'
and ($_->{reception} || 'mail') ne 'nomail'
} @{$admin_user || []}
unless @users;
} elsif ($role eq 'receptive_owner') {
@users = grep {
$_
and $_->{role} eq 'owner'
and ($_->{reception} || 'mail') ne 'nomail'
} @{$admin_user || []};
} else {
die sprintf 'Unknown role "%s"', $role;
}

actual_editor is equal to editor, however if there are no such users on the list, equal to owner.

Therefore, the user belongs to actual_editor, if they belongs to both owner and actual_editor, or if they belongs to editor.

@olivov
Copy link
Contributor Author

olivov commented Apr 6, 2020

Thanks for the additional info Soji. In terms of the applications functionality, it makes sense that if a list has no 'moderator', then the lists 'owners' inherit that role.

The PR should take the 'actual_editor' in to consideration as the current code does. The current code sets the editor to 1 when calling the is_admin('actual_editor') method on the list object:

sympa/src/cgi/wwsympa.fcgi.in

Lines 4249 to 4254 in 7433eb4

if ( $param->{'user'}{'email'}
and $list->is_admin('actual_editor', $param->{'user'}{'email'})) {
$list_info->{'is_editor'} = 1;
# Compat. < 6.2b.2.
$list_info->{'admin'} = 1;
}

Then to establish the 'actual_editor' in the PR:

    foreach my $list (values %{$which->{'editor'}}) {
        $which->{'actual_editor'}->{$list->{'name'}} =
            $list->is_admin('actual_editor', $param->{'user'}{'email'});
    }
    ....
    if (    $param->{'user'}{'email'}
        and defined($which->{'actual_editor'}->{$listname})) {
        $list_info->{'is_editor'} = 1;
        # Compat. < 6.2b.2.
        $list_info->{'admin'} = 1;
    }    

We're still using the is_admin('actual_editor') method to determine whether or not the list for the user should have the editor set to 1 or not.

Does this update track with the current code implementation? Apologies if I am misunderstanding this!

@ikedas
Copy link
Member

ikedas commented Apr 14, 2020

Does this update track with the current code implementation?

I think you are right. To keep compatibility to current behavior, we'd be better to use actual_editor to determine moderators.

@olivov
Copy link
Contributor Author

olivov commented Apr 16, 2020

Okay, thanks for the additional info Soji. Please do let me know if you see any other areas of concern here from this update. I appreciate your insight!

@ikedas
Copy link
Member

ikedas commented Apr 19, 2020

@olivov, I don't have any more questions!

@salaun-urennes1
Copy link
Collaborator

Hello, I started a thread on sympa-users ML regarding wwsympa performances. My findings are based on code profiling. One of the issue I identified is close to your findings: too many database requests to build the list of lists.

However I'd be in favor of a more global approach, not solving the problem on the wwsympa side only. Actually you'd have the same performance issues in other processes that need to find out which lists a user is owner/editor/member of.

I'd suggest making changes in Sympa/List.pm:

  • get_admins() : already has a level of cache. When executing a high-level subroutine such as do_lists(), we could run a single SQL query in the admin_table (instead of running one query per list) and populate the cache with the information.
  • is_list_member() : also has a level of cache. We could also run a single SQL query to get all memberships for that user at once.

This change would really bring better performances and would benefit all Sympa interfaces (CLI, web, mail, SOAP).

@ikedas
Copy link
Member

ikedas commented May 6, 2020

@salaum-urennes1,
I think we’d be better to submit new issue for the new approach.

Additionally, “The cache caused tons of problems”, and it should be removed. See pinned issue “Sympa caching” and related issues (I don’t show hash-number so that this pr won’t be related to it).

@salaun-urennes1 salaun-urennes1 mentioned this pull request May 7, 2020
@ikedas
Copy link
Member

ikedas commented May 11, 2020

I feel the approach this PR makes is nice. I'll add a few fixes and merge it.

@racke
Copy link
Contributor

racke commented May 11, 2020

Very good. Thanks a lot to @olivov and @ikedas !

@salaun-urennes1
Copy link
Collaborator

I tried the PR code in our staging environment with 2384 mailing lists, with NYTProfiler enabled.
It appears that the code still runs too many SQL queries, as you can see on the NYTProfiler report below.

Once Sympa::List::get_lists() has been executed, do_prepare_query() should not be called 2393 times by Sympa::List::get_current_admins(). Also 4619 calls to do_prepare_query() from Sympa::List::has_included_users probably costs a lot.

Capture du 2020-05-11 16-37-20

@ikedas
Copy link
Member

ikedas commented May 11, 2020

@salaun-urennes1, please submit a new issue precisely describing your approach.

@ikedas ikedas added this to the 6.2.56 milestone May 12, 2020
ikedas added a commit that referenced this pull request May 12, 2020
…olivov & ikedas

list index performance update for wwsympa.fcgi do_lists subroutine (#925)
@ikedas ikedas merged commit bfbe662 into sympa-community:sympa-6.2 May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants