[dataviz] feat: add graph Active contributors grouped by age #1991
To define with the rest of the team:
What is an active contributor?
Updated the graph after comments on Discord; the updated graph now looks like this:
…d_by_contributors_age ... to make it more explicit
Hi @NatNgs The new graph is really cool 👌 I ran
The colours and the groups look good to me for now. Let's check with the rest of the team.
I haven't updated "First comparison date = last comparison date" yet, but I'll think about it.
import streamlit as st
from dateutil.relativedelta import relativedelta
This is not a direct dependency of the project. Should we add it to the file requirements.txt?
drop=True
)  # Keep only the required data, remove duplicates.

df.week_date = pd.to_datetime(df.week_date, infer_datetime_format=True, utc=True).astype(
Using astype raises a warning in the console, and will raise an exception in the future.
Using .astype to convert from timezone-aware dtype to timezone-naive dtype is deprecated and will raise in a future version. Use obj.tz_localize(None) or obj.tz_convert('UTC').tz_localize(None) instead
I don't know the best practice here (I have nearly zero experience with pandas), but it looks like the astype is used to make methods such as df.week_date.min() and .max() work. It should be possible to keep the timezone-aware datetime, right? Maybe by explicitly sorting the column week_date?
If there is no side effect, we may want to explicitly convert the timezone-aware datetime to naive dates.
In case it helps:
# works
df.week_date = pd.to_datetime(df.week_date, infer_datetime_format=True, utc=True)
df.week_date.cat.as_ordered().max()
# doesn't work
df.week_date = pd.to_datetime(df.week_date, infer_datetime_format=True, utc=True)
df.week_date.max()
*** TypeError: Categorical is not ordered for operation max
you can use .as_ordered() to change the Categorical to an ordered one
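The two options raised in this comment (keeping the timezone-aware values, or dropping the timezone explicitly as the warning suggests) could look roughly like the sketch below. It assumes week_date holds plain date strings rather than the categorical column seen in the traceback, and the sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real column comes from the project's dataset.
df = pd.DataFrame({"week_date": ["2022-01-03", "2022-01-10", "2022-01-03"]})

# Parse the strings as timezone-aware UTC datetimes.
week_date_utc = pd.to_datetime(df["week_date"], utc=True)

# Option 1: keep the timezone-aware values; min() and max() work directly.
first_week = week_date_utc.min()
last_week = week_date_utc.max()

# Option 2: drop the timezone explicitly, as the deprecation warning suggests,
# instead of going through .astype(). On a Series the methods live under .dt.
week_date_naive = week_date_utc.dt.tz_convert("UTC").dt.tz_localize(None)
```

On a Series the tz_convert and tz_localize methods are reached through the .dt accessor; the obj.tz_localize(None) form quoted in the warning applies to a DatetimeIndex.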
Thanks @NatNgs
related issues #1726
Add user-growth plot in Tournesol's streamlit app
Added a new plot in Streamlit, displaying the number of contributors, new users and active users in one line plot.
"Active" users in a given week are defined as users who have made at least one comparison during that week or earlier, and at least one comparison during that week or later.
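As a rough illustration of this definition (not the code from the PR), a user is active in week w when their first comparison week is at or before w and their last comparison week is at or after w. The column names and sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical comparison log: one row per (user, week) with at least one comparison.
comparisons = pd.DataFrame(
    {
        "username": ["alice", "alice", "bob", "carol", "carol"],
        "week_date": pd.to_datetime(
            ["2022-01-03", "2022-01-17", "2022-01-10", "2022-01-03", "2022-01-10"]
        ),
    }
)

# First and last week in which each user made a comparison.
span = comparisons.groupby("username")["week_date"].agg(
    first_week="min", last_week="max"
)

# Every week between the earliest and the latest comparison in the data.
weeks = pd.date_range(
    comparisons["week_date"].min(), comparisons["week_date"].max(), freq="7D"
)

# A user counts as active in week w when first_week <= w <= last_week.
active_per_week = pd.Series(
    {
        week: int(((span["first_week"] <= week) & (span["last_week"] >= week)).sum())
        for week in weeks
    }
)
```

With this definition, alice is counted as active in the middle week even though she made no comparison then, because she compared both before and after it.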
❤️ Thank you for your contribution!