Neurips24 #1970

Status: Open. Wants to merge 23 commits into base: main.

The file changes shown below are from 2 of the 23 commits.

Commits (23):
- 72af6d9 Added import for vouchers and scores in pipline/inputs (lenhoanglnh, May 12, 2024)
- df57fbf Important change: Modified qr_quantile using asymmetric Huber rather … (lenhoanglnh, May 15, 2024)
- 9f0ddb4 cleanup docstrings in Solidago (wip) (amatissart, May 2, 2024)
- fd1fb49 implement 'get_pipeline_kwargs' in TournesolInput (amatissart, May 12, 2024)
- 049d72e fix experiments script (amatissart, May 16, 2024)
- dde4c9f read vouches in TournesolInput (amatissart, May 16, 2024)
- 82e9c4f [solidago] gbt: estimate asymmetrical uncertainties based on increase… (amatissart, Jun 1, 2024)
- c58e424 cleanup docstrings in Solidago (wip) (amatissart, May 2, 2024)
- 5e6d598 implement 'get_pipeline_kwargs' in TournesolInput (amatissart, May 12, 2024)
- 051f088 fix experiments script (amatissart, May 16, 2024)
- 3483609 read vouches in TournesolInput (amatissart, May 16, 2024)
- 498f4a3 Fixed experiments calls to Tournesol inputs API (lenhoanglnh, Jun 1, 2024)
- fde2a83 Merge branch 'solidago-pipeline-docs-1' of github.com:tournesol-app/t… (lenhoanglnh, Jun 1, 2024)
- afc32d4 fix docstring (amatissart, Jun 1, 2024)
- 0032c86 Merge pull request #1971 from tournesol-app/solidago-pipeline-docs-1 (amatissart, Jun 3, 2024)
- a2dfbaa fix numerical issues in gbt implementations (amatissart, Jul 4, 2024)
- 39c5652 normalize weight per user in Standardize (amatissart, Jul 4, 2024)
- fdd40f3 normalize weight per user in QuantileZeroShift (amatissart, Jul 4, 2024)
- 23b6da3 Merge remote-tracking branch 'origin/main' into neurips24 (amatissart, Aug 6, 2024)
- 3b911d8 solidago: fix numerical instability in gbt (amatissart, Aug 22, 2024)
- ef9819c try to stabilize lbfgs (amatissart, Sep 5, 2024)
- de2434e fix wrong usage of 'med' in qr_uncertainty, expose high_likelihood_ra… (amatissart, Sep 8, 2024)
- 7aca462 add QuantileShift (in addition to QuantileZeroShift) to define target… (amatissart, Sep 8, 2024)
135 changes: 135 additions & 0 deletions solidago/experiments/data_analysis.py
@@ -0,0 +1,135 @@
from solidago.pipeline.inputs import TournesolInputFromPublicDataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.special  # scipy.special.erfinv is used in confidence_interval below

data = TournesolInputFromPublicDataset.download()

criteria = {
    "reliability": "Reliable and not misleading",
    "importance": "Important and actionable",
    "engaging": "Engaging and thought-provoking",
    "pedagogy": "Clear and pedagogical",
    "layman_friendly": "Layman-friendly",
    "diversity_inclusion": "Diversity and inclusion",
    "backfire_risk": "Resilience to backfiring risks",
    "better_habits": "Encourages better habits",
    "entertaining_relaxing": "Entertaining and relaxing"
}
entities = set(data.comparisons.entity_a) | set(data.comparisons.entity_b)
user_ids = set(data.comparisons.user_id)

def add_comparison_analysis_columns(comparisons):
    def is_first_comparison(comparisons):
        registered = { e: set() for e in entities }
        entity_a_firsts, entity_b_firsts = list(), list()
        for _, r in comparisons.iterrows():
            entity_a_first, entity_b_first = False, False
            if r.criteria == "largely_recommended" and r.user_id not in registered[r.entity_a]:
                registered[r.entity_a].add(r.user_id)
                entity_a_first = True
            if r.criteria == "largely_recommended" and r.user_id not in registered[r.entity_b]:
                registered[r.entity_b].add(r.user_id)
                entity_b_first = True
            entity_a_firsts.append(entity_a_first)
            entity_b_firsts.append(entity_b_first)
        return entity_a_firsts, entity_b_firsts

    entity_a_firsts, entity_b_firsts = is_first_comparison(comparisons)
    comparisons = comparisons.assign(entity_a_first=entity_a_firsts)
    comparisons = comparisons.assign(entity_b_first=entity_b_firsts)

    def score_of_first_comparison(comparisons):
        first_comparison_score = list()
        for _, r in comparisons.iterrows():
            if r.entity_a_first and (not r.entity_b_first):
                first_comparison_score.append(r.score)
            elif (not r.entity_a_first) and r.entity_b_first:
                first_comparison_score.append(- r.score)
            else:
                first_comparison_score.append(np.nan)
        return first_comparison_score

    comparisons = comparisons.assign(first_comparison_score=score_of_first_comparison(comparisons))

    def has_others(comparisons):
        with_others = dict()
        for _, r in comparisons[comparisons.criteria != "largely_recommended"].iterrows():
            if r.user_id not in with_others:
                with_others[r.user_id] = dict()
            if r.entity_a not in with_others[r.user_id]:
                with_others[r.user_id][r.entity_a] = set()
            with_others[r.user_id][r.entity_a].add(r.entity_b)
        has_others = list()
        for _, r in comparisons.iterrows():
            has_others.append(
                r.user_id in with_others
                and r.entity_a in with_others[r.user_id]
                and r.entity_b in with_others[r.user_id][r.entity_a]
            )
        return has_others

    comparisons = comparisons.assign(has_others=has_others(comparisons))

    def is_trusted(comparisons):
        return [data.users.loc[r.user_id, "trust_score"] >= 0.8 for _, r in comparisons.iterrows()]

    comparisons = comparisons.assign(is_trusted=is_trusted(comparisons))

    return comparisons

c = add_comparison_analysis_columns(data.comparisons)

def add_user_analysis_columns(users, comparisons):
    def n_comparisons(users, comparisons):
        return [
            len(comparisons[comparisons.user_id == user_id])
            for user_id, _ in users.iterrows()
        ]
    users = users.assign(n_comparisons=n_comparisons(users, comparisons))
    users = users.assign(
        n_main_comparisons=n_comparisons(
            users,
            comparisons[comparisons.criteria == "largely_recommended"]
        )
    )
    return users

u = add_user_analysis_columns(data.users, data.comparisons)

def add_score_analysis_columns():
    def _unsquash(scores):
        # Clamp the extremes so that the unsquashing below does not divide by zero.
        scores = scores.copy()
        scores.loc[scores.score == 100.0, "score"] = 99.99
        scores.loc[scores.score == -100.0, "score"] = -99.99
        return scores.score / np.sqrt(100.0**2 - scores.score**2)

    data.collective_scores = data.collective_scores.assign(unsquashed=_unsquash(data.collective_scores))
    data.individual_scores = data.individual_scores.assign(unsquashed=_unsquash(data.individual_scores))

def confidence_interval(scores, confidence=0.95):
    mean = scores.mean()
    z_deviation = np.sqrt(2) * scipy.special.erfinv(confidence)
    deviation = z_deviation * np.sqrt( scores.var() / len(scores) )
    return mean - deviation, mean + deviation

def plot_criteria(comparisons, figsize=(2, 3)):
    fig, axs = plt.subplots(3, 3, figsize=figsize)
    for n_plot, ax in enumerate(axs.flat):
        criterion = list(criteria.keys())[n_plot]
        cc = comparisons[comparisons.criteria == criterion]
        ax.hist(cc.score, bins=21)
        ax.set_title(criteria[criterion])

def n_extreme_values(scores, n_std_dev):
    mean = scores.mean()
    std_dev = np.sqrt(scores.var())
    return len(scores[np.abs(scores - mean) > n_std_dev * std_dev])

def plot(comparison_scores, colors=("g", "y", "r"), labels=None):
    if labels is None:
        plt.hist(comparison_scores, 21, density=True, histtype='bar', color=colors)
    else:
        plt.hist(comparison_scores, 21, density=True, histtype='bar', color=colors, label=labels)
1 change: 0 additions & 1 deletion solidago/experiments/engagement_bias.json
@@ -49,7 +49,6 @@
    }],
    "preference_learning": ["UniformGBT", {
        "prior_std_dev": 7,
        "comparison_max": 10,
        "convergence_error": 1e-05,
        "cumulant_generating_function_error": 1e-05
    }],
1 change: 0 additions & 1 deletion solidago/experiments/resilience.json
@@ -49,7 +49,6 @@
    }],
    "preference_learning": ["UniformGBT", {
        "prior_std_dev": 7,
        "comparison_max": 10,
        "convergence_error": 1e-05,
        "cumulant_generating_function_error": 1e-05
    }],
7 changes: 5 additions & 2 deletions solidago/experiments/synthetic.py
@@ -31,10 +31,13 @@ def sample_correlation(n_users, n_entities, seed, generative_model, pipeline) ->
    users, voting_rights, user_models, global_model = pipeline(*data)

    truth = entities["svd0"]
    estimate = [global_model(e, row)[0] for e, row in entities.iterrows()]
    estimate = [
        global_model(e, row)[0] if global_model(e, row) is not None else 0.
        for e, row in entities.iterrows()
    ]
    return np.corrcoef(truth, estimate)[0, 1]

def sample_n_correlations(n_users, n_entities, n_seeds, generative_model, pipeline, thread=True):
def sample_n_correlations(n_users, n_entities, n_seeds, generative_model, pipeline, thread=False):
    if not thread:
        return [
            sample_correlation(n_users, n_entities, seed, generative_model, pipeline)
132 changes: 107 additions & 25 deletions solidago/experiments/tournesol.py
@@ -1,11 +1,15 @@
import logging
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from threading import Thread

from solidago.pipeline.inputs import TournesolInputFromPublicDataset

from solidago.trust_propagation import LipschiTrust
from solidago.voting_rights import AffineOvertrust
from solidago.preference_learning import LBFGSUniformGBT
from solidago.preference_learning import UniformGBT
from solidago.scaling import ScalingCompose, Mehestan, QuantileZeroShift
from solidago.aggregation import StandardizedQrQuantile
from solidago.post_process import Squash
@@ -27,6 +31,18 @@
ch.setLevel(logging.INFO)
info_logger.addHandler(ch)

logger.info("Retrieve public dataset")
inputs = TournesolInputFromPublicDataset.download()
video_id_to_entity_id = {
    video_id: entity_id
    for entity_id, video_id in enumerate(inputs.entity_id_to_video_id)
}

logger.info("Preprocessing data for the pipeline")
users, vouches, all_entities, privacy = inputs.get_pipeline_objects()

# criteria = set(inputs.comparisons["criteria"])
criteria = { "largely_recommended" }

pipeline = Pipeline(
    trust_propagation=LipschiTrust(
@@ -40,12 +56,10 @@
        min_overtrust=2.0,
        overtrust_ratio=0.1,
    ),
    preference_learning=LBFGSUniformGBT(
    preference_learning=UniformGBT(
        prior_std_dev=7,
        comparison_max=10,
        convergence_error=1e-5,
        cumulant_generating_function_error=1e-5,
        n_steps=2,
    ),
    scaling=ScalingCompose(
        Mehestan(
@@ -65,35 +79,103 @@
    aggregation=StandardizedQrQuantile(
        quantile=0.2,
        dev_quantile=0.9,
        lipschitz=0.1,
        lipschitz=100,
        error=1e-5
    ),
    post_process=Squash(
        score_max=100
    )
)

logger.info("Retrieve public dataset")
inputs = TournesolInputFromPublicDataset.download()
user_outputs, entities, voting_rights, scaled_user_models = dict(), dict(), dict(), dict()

logger.info("Preprocessing data for the pipeline")
users, vouches, entities, privacy = inputs.get_pipeline_objects()
users = pipeline.trust_propagation(users, vouches)

for c in criteria:
    logger.info(f"Running the pipeline for criterion `{c}`")

    judgments = inputs.get_judgments(c)

    voting_rights[c], entities[c] = pipeline.voting_rights(users, all_entities, vouches, privacy)
    user_models = pipeline.preference_learning(judgments, users, entities[c])
    scaled_user_models[c] = pipeline.scaling(user_models, users, entities[c], voting_rights[c], privacy)

# criteria = set(inputs.comparisons["criteria"])
criteria = { "largely_recommended" }
# threads = [Thread(target=run_pipeline, args=(criterion,)) for criterion in criteria]
# for thread in threads:
# thread.start()
# for thread in threads:
# thread.join()

logger.info(f"Successful pipeline run.")

user_outputs, voting_rights, user_models, global_model = dict(), dict(), dict(), dict()
def run_pipeline(criterion):
    logger.info(f"Running the pipeline for criterion `{criterion}`")
    judgments = inputs.get_judgments(criterion)
    output = pipeline(users, vouches, entities, privacy, judgments)
    user_outputs[criterion], voting_rights[criterion] = output[0], output[1]
    user_models[criterion], global_model[criterion] = output[2], output[3]

threads = [Thread(target=run_pipeline, args=(criterion,)) for criterion in criteria]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
scores = inputs.collective_scores

squashed_user_models, global_model = dict(), dict()
quantiles = [0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9]

for q in quantiles:

    pipeline.aggregation.quantile = q
    squashed_user_models[q], global_model[q] = dict(), dict()
    for c in criteria:
        user_models, global_model[q][c] = pipeline.aggregation(voting_rights[c], scaled_user_models[c], users, entities[c])
        squashed_user_models[q][c], global_model[q][c] = pipeline.post_process(user_models, global_model[q][c], entities)

logger.info(f"Successful pipeline run.")
    q_scores = list()
    for _, row in scores.iterrows():
        try:
            entity_id = video_id_to_entity_id[row.video]
            q_scores.append(global_model[q][row.criteria](entity_id, None)[0])
        except:
            q_scores.append(0.)
    scores[f"score_q={q}"] = q_scores

comparisons = inputs.comparisons
s_main = scores[scores.criteria == "largely_recommended"]
c_main = comparisons[comparisons.criteria == "largely_recommended"]

entity_a_counts = c_main.value_counts("entity_a")
entity_b_counts = c_main.value_counts("entity_b")

def n_comparisons(video):
    total = 0
    if video not in video_id_to_entity_id:
        return 0
    if video_id_to_entity_id[video] in entity_a_counts:
        total += entity_a_counts[video_id_to_entity_id[video]]
    if video_id_to_entity_id[video] in entity_b_counts:
        total += entity_b_counts[video_id_to_entity_id[video]]
    return total

def n_contributors(video):
    if video not in video_id_to_entity_id:
        return 0
    entity = video_id_to_entity_id[video]
    contributors = set(c_main[c_main.entity_a == entity].user_id)
    contributors |= set(c_main[c_main.entity_b == entity].user_id)
    return len(contributors)

s_main.loc[:,"n_comparisons"] = [n_comparisons(r.video) for _, r in s_main.iterrows()]
s_main.loc[:,"n_contributors"] = [n_contributors(r.video) for _, r in s_main.iterrows()]

s_top_main = s_main[(s_main.n_comparisons > 100) & (s_main.n_contributors > 20)]
top_entities = set(s_top_main.video)
c_top_main = c_main[(c_main.entity_a.isin(top_entities)) | (c_main.entity_b.isin(top_entities))]


ranking = { q: s_top_main.sort_values(f"score_q={q}", ascending=False)["video"] for q in quantiles }
for q in quantiles:
    rk = list(ranking[q])
    s_top_main.loc[:, f"ranking_q={q}"] = [ rk.index(r.video) for _, r in s_top_main.iterrows() ]

ranking_cols = [f"ranking_q={q}" for q in quantiles]

s_top_main.loc[:, "ranking_delta"] = s_top_main["ranking_q=0.8"] - s_top_main["ranking_q=0.2"]
s_top_main.loc[:, "score_delta"] = s_top_main["ranking_q=0.8"] - s_top_main["ranking_q=0.2"]

largest_delta = set(s_top_main.sort_values("score_delta")[:5].video)
largest_delta |= set(s_top_main.sort_values("score_delta")[-5:].video)

s_plot = s_top_main[s_top_main.video.isin(largest_delta)][["video"] + ranking_cols].set_index("video")


16 changes: 16 additions & 0 deletions solidago/src/solidago/pipeline/inputs.py
@@ -94,6 +94,22 @@ def __init__(self, dataset_zip: Union[str, BinaryIO]):
            data=self.users.index, index=self.users["public_username"]
        )
        self.comparisons = self.comparisons.join(username_to_user_id, on="public_username")

        with (zipfile.Path(zip_file) / "vouchers.csv").open(mode="rb") as vouchers_file:
            # keep_default_na=False is required otherwise some public usernames
            # such as "NA" are converted to float NaN.
            self.vouchers = pd.read_csv(vouchers_file, keep_default_na=False)

        with (zipfile.Path(zip_file) / "collective_criteria_scores.csv").open(mode="rb") as collective_scores_file:
            # keep_default_na=False is required otherwise some public usernames
            # such as "NA" are converted to float NaN.
            self.collective_scores = pd.read_csv(collective_scores_file, keep_default_na=False)

        with (zipfile.Path(zip_file) / "individual_criteria_scores.csv").open(mode="rb") as individual_scores_file:
            # keep_default_na=False is required otherwise some public usernames
            # such as "NA" are converted to float NaN.
            self.individual_scores = pd.read_csv(individual_scores_file, keep_default_na=False)


    @classmethod
    def download(cls) -> "TournesolInputFromPublicDataset":
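The keep_default_na=False comments above refer to a pandas default worth knowing: read_csv treats the literal string "NA" (among others) as a missing value unless told otherwise. A minimal standalone sketch of that behaviour, separate from the loader itself and using a made-up CSV:

import io
import pandas as pd

csv_text = "public_username,score\nNA,42\n"

# Default parsing: the username "NA" becomes NaN.
print(pd.read_csv(io.StringIO(csv_text)).public_username.isna().tolist())  # [True]

# With keep_default_na=False, "NA" is kept as a plain string.
print(pd.read_csv(io.StringIO(csv_text), keep_default_na=False).public_username.tolist())  # ['NA']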
21 changes: 13 additions & 8 deletions solidago/src/solidago/primitives.py
@@ -85,18 +85,23 @@ def _qr_quantile_loss_derivative(
"""Computes the derivative of the loss associated to qr_quantile"""
regularization = (variable - default_value) / lipschitz

if quantile == 0.5:
quantile_term = 0.0
elif isinstance(voting_rights, (int, float)):
quantile_term = (1.0 - 2.0 * quantile) * voting_rights * len(values)
else:
quantile_term = (1.0 - 2.0 * quantile) * np.sum(voting_rights)

deltas = variable - values
uncertainties_2 = left_uncertainties_2 * (deltas < 0) + right_uncertainties_2 * (deltas > 0) + spacing
forces = voting_rights * deltas / np.sqrt(uncertainties_2 + deltas**2)

if quantile == 0.5:
return regularization + forces.sum()

left_strength = min(1.0, quantile / (1-quantile))
right_strength = min(1.0, (1-quantile) / quantile)

forces = np.where(
forces < 0,
forces * left_strength,
forces * right_strength,
)

return regularization + quantile_term + forces.sum()
return regularization + forces.sum()
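For intuition about the asymmetric scaling introduced above: when uncertainties are negligible, each value pulls the estimate with a force that saturates at plus or minus one, and weighting the two sides by quantile/(1-quantile) and (1-quantile)/quantile places the balance point near the requested quantile. The following is a toy re-implementation for illustration only, not Solidago's qr_quantile; it ignores voting rights and the Lipschitz regularization:

import numpy as np

def force(m, values, quantile, uncertainty=1e-3):
    deltas = m - values
    f = deltas / np.sqrt(uncertainty**2 + deltas**2)        # Huber-like pull, saturates at +/-1
    left_strength = min(1.0, quantile / (1 - quantile))     # applied to values above m
    right_strength = min(1.0, (1 - quantile) / quantile)    # applied to values below m
    return np.where(f < 0, f * left_strength, f * right_strength).sum()

rng = np.random.default_rng(0)
values = rng.normal(size=10_000)
q = 0.15

# The total force is increasing in m, so its root can be found by bisection.
lo, hi = values.min(), values.max()
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if force(mid, values, q) < 0 else (lo, mid)

print(f"force balance at {lo:.3f}, empirical {q}-quantile at {np.quantile(values, q):.3f}")

With tiny uncertainties the two printed numbers nearly coincide, since the balance requires (number below) * 1 = (number above) * q/(1-q), i.e. a fraction q of the values below the estimate.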
Comment on lines +92 to +104

Member:

@lenhoanglnh This change seems to change significantly the behaviour of the "zero shift" on current Tournesol data. Is it expected? Should we adjust the quantile parameter?

On "main", after applying the shift with score_shift_quantile = 0.15, about 13% of the individual scores are negative. On this branch "neurips24", that would be 37%.

As a consequence the distribution of Tournesol scores would be modified, with fewer videos reaching the recommendability threshold (1238 instead of 3013).

(I used the "legacy2023" pipeline, currently deployed on production. But I expect it would be similar with the new pipeline.)

[attached image]

Contributor Author:

This is unsatisfactory indeed.
I'm a bit disturbed. It feels like the quantile is now poorly estimated.
Maybe this is because videos with lower scores have higher uncertainty? Or less trust?

Contributor Author (lenhoanglnh, May 17, 2024):

OK I looked at the data and indeed, the uncertainties for bad videos are smaller than for good videos, which explains why the quantile increased with the new quantile definition. I see two simple fixes:

  • Reduce score_shift_quantile = 0.15 to score_shift_quantile = 0.05.
  • Remove uncertainties in quantile estimation.

The former is much more satisfactory.
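A back-of-the-envelope check of the first fix, with illustrative numbers rather than Tournesol data: if the zero shift subtracts an estimate of the q-quantile of the individual scores, then roughly a fraction q of the shifted scores is negative, so lowering score_shift_quantile from 0.15 to 0.05 directly lowers the share of negative scores. The gap between the nominal q and the 37% observed above comes from the uncertainty weighting, which this sketch deliberately ignores:

import numpy as np

rng = np.random.default_rng(42)
raw_scores = rng.normal(loc=20.0, scale=25.0, size=50_000)   # hypothetical individual scores

for q in (0.15, 0.05):
    shift = np.quantile(raw_scores, q)   # uncertainty-free quantile, unlike qr_quantile
    negative_share = np.mean(raw_scores - shift < 0)
    print(f"score_shift_quantile={q}: shift={shift:+.1f}, negative share={negative_share:.0%}")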



@njit
Binary file modified solidago/tests/data/tiny_tournesol.zip
Binary file not shown.