Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish ids from multiple providers #9896

Merged

Conversation

3link
Copy link
Contributor

@3link 3link commented May 3, 2023

Type of change

  • Other

Description of change

The goal of this PR is to sketch an implementation proposal for allowing multiple user id modules to expose the same kind of identifiers. E.g. two modules exposing UID2. The idea is that a config - like the one below - provides priority in case a collision happens.

pbjs.setConfig({
    userSync: {
        idPriority: {
            uid2: ['uid2', 'liveIntentId']
        }
...

modules/liveIntentIdSystem.js Outdated Show resolved Hide resolved
@3link 3link changed the title DRAFT: Distinguish ids from multiple providers Distinguish ids from multiple providers May 31, 2023
@3link 3link marked this pull request as ready for review May 31, 2023 14:38
Copy link
Collaborator

@dgirardi dgirardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have some concerns about this approach - it seems to care only about what ends up in the EIDs array; with no support for the old userId map. I think this means that we need to deprecate it, or if this catches on adapters that are using it will effectively become second class citizens (but I'll defer to @patmmccann for the decision).

By that I mean that for example, if your adapter uses userIdAsEids you can get uid2 from liveintent; but if you're using userId you won't.

It does have the advantage of being much simpler than a comprehensive solution.

@@ -453,7 +453,7 @@ function getSubmoduleId(submodules, sourceName) {
return {};
}
const submodule = submodules.filter(sub => isPlainObject(sub.idObj) &&
Object.keys(sub.idObj).length && USER_IDS_CONFIG[Object.keys(sub.idObj)[0]]?.source === sourceName);
Object.keys(sub.idObj).length && USER_IDS_CONFIG[Object.keys(sub.idObj)[0]]?.find(conf => conf.source === sourceName));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this says "the module that corresponds to this source is the first one I find that can provide IDs from this source". That's not necessarily true is it? I think we need a way to mark the "owner" of a source outside of the EID object. It could be as simple as the first one declared is the one you own.

Copy link
Collaborator

@dgirardi dgirardi May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, since this is only used in here:

function getUserIdsAsEidBySource(sourceName) {
return createEidsArray(getSubmoduleId(initializedSubmodules, sourceName))[0];
}

I think what we should do is reverse the filtering - instead of finding the module for a given source to get its EIDs, it makes more sense now to get all EIDs and filter them by source (since they can come from different modules).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, "get all EIDs and filter them by source". If there is a collision, i.e., 2 or more eids with the same source check the provider and choose the one where the provider and source are the same. The question then becomes what do we do when one eid has no provider value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eids#USER_IDS_CONFIG could be adjusted to always have a global provider for each user id that is then used as provider for all uids extracted based on it. That way every eid/uid would have a provider. For example:

  'lipb': {
  	provider: 'liveintent.com',
  	extractors: [
	    // liveIntentId
	    {
	      getValue: function(data) {
	        return data.lipbid;
	      },
	      source: 'liveintent.com',
	      atype: 3,
	      getEidExt: function(data) {
	        if (Array.isArray(data.segments) && data.segments.length) {
	          return {
	            segments: data.segments
	          };
	        }
	      }
	    },

	    // uid2
	    {
	      source: 'uidapi.com',
	      atype: 3,
	      getValue: function(data) {
	        return data.uid2;
	      }
	    }
	  ]
	}

And eids#createEidObjects would take care of building the uid.ext using the global provider value and output

[{
    source: 'liveintent.com',
    uids: [{
        id: 'some-random-id-value',
        atype: 1
    }],
    ext: {
        provider: 'liveintent.com',
        segments: ['s1', 's2']
    }
},
{
    source: 'uidapi.com',
    uids: [{
        id: 'some-random-id-value',
        atype: 3,
        ext: { 
            provider: 'liveintent.com' 
        }
    }]
}]

Copy link
Contributor Author

@3link 3link Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One related question: what if there is a collision and neither of the providers is the owner of the id?

Maybe it would make sense to return an eid with multiple uids when getUserIdsAsEidBySource is called and there is a collision?

For example: getUserIdsAsEidBySource('uidapi.com') would then return

[{
    source: 'uidapi.com',
    uids: [{
        id: 'some-random-id-value',
        atype: 3,
        ext: { provider: 'uidapi.com' }
    },
    {
        id: 'some-random-id-value',
        atype: 3,
        ext: { provider: 'liveintent.com' }
    }]
}]

when a UID2 could be resolved via LI's and the UID2 adapter. The caller then decides which uid to pick if there are multiple. We also can make sure that the uid of the id owner is always at index 0 in the uids array.

Same can be applied when building the eids array in general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently getUserIdsAsEidBySource returns a single object, and it's part of the public API, so that's not really up for change. if there's more than one EID we have to pick one.

I agree that the API should not be changed in this regard. getUserIdsAsEidBySource should return one single object - one eid. An eid, however, can have an array of uids. My proposal is rather about - in case of collisions - exposing multiple uids with their providers within the eid that is returned. By providing the "provider" field the uids become distinguishable and the caller can decide what id to use in case there are multiple based on the provider.

Copy link
Collaborator

@dgirardi dgirardi Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to do that you have to make the assumption that all providers for a source will always generate "compatible" EID objects; for example, they share atype and (edit - cancel that, atype is part of UID) everything they put in ext (through getEidExt) always applies equally no matter the provider. I would be very hesitant to rely on this especially since the EID objects are only likely to get more complicated over time. Choosing one when there's a conflict is an acceptable compromise IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not change the API, that will require changes to bid adapters and the servers behind them, that would be a massive undertaking. Can we make this configurable? Can the pub declare a list of priorities in the core user id module?

========================
In case of user id source collisions

  1. liveintent.com
  2. pubcid.org
  3. google.com

If this prioritization still doesn't produce a winner, pick one at random.

Copy link
Contributor Author

@3link 3link Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this configurable? Can the pub declare a list of priorities in the core user id module?

@jdwieland8282 Sounds good.

@jdwieland8282 What do you think, should that config allow to provide priority per identifier type - e.g. uid2 from uidapi.com has higher priority than uid2 from liveintent.com? Or should it be a config that defines that all identifiers from one user id module have priority over same types of identifiers exposed by any other module with lower priority? I personally tend towards the first option as it is more fine grained - but it is also more complex in the implementation and when writing the config.

Also, I would see if it is feasible to apply this prioritization logic when building the eids array, not only in getUserIdsAsEidBySource. Are we on the same page here?

Copy link

@deepthivenkat deepthivenkat Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We met today in Identity PMC call and made the following decisions after vote:
https://docs.google.com/document/d/1dXrTaA-gFXayP-QWhFZN_91kLWp7qFO1CH1GmgafUdc/edit?usp=sharing

We want to go with Option I.I which is to give publisher the ability to provide the priority of the modules for each identifier. In the event where no configuration is provided, the first id/source that gets populated from any provider will get picked.

We want this configuration defined in the core user id module so that publishers can define which. sub adapters should override the other when there are id conflicts

The priority field should appear as an optional parameter right next to name here for each id: https://github.com/prebid/Prebid.js/blob/master/modules/userId/userId.md

@patmmccann
Copy link
Collaborator

I'm pretty okay with this approach; we should also remember that the user ID method is in the public api and encourage the 57 adapters not using the eids method of the advantages to changing. It isn't really discriminating against an adapter if they can all change implementations at any time.

We hope live intent may help in this, letting their ssp partners know to prefer the eids method.

We should make sure this information is well published.

@deepthivenkat
Copy link

@patmmccann I can help come up with a public facing documentation explaining this.

@3link 3link force-pushed the distinguish_ids_from_multiple_providers branch from b8b0d17 to c0e5c62 Compare June 14, 2023 11:35
@3link
Copy link
Contributor Author

3link commented Jun 14, 2023

@jdwieland8282 @dgirardi @deepthivenkat

Adjusted the PR to make submodule priority configurable per user id. Also, closed the magnite id PR and added magnite id handling here.

@3link
Copy link
Contributor Author

3link commented Jun 14, 2023

The e2e test failures seem not to be caused by the changes in this PR. Ran the e2e test from local machine successfully.

@jdwieland8282
Copy link
Member

I'm not qualified to code review this, but maybe @dgirardi can when he has a free moment. I can review the docs pr does that exist yet @deepthivenkat ?

const currentIdPriorityByName = idPriority[key]?.reverse().indexOf(i.submodule.name);
const currentIdPriorityByAlias = idPriority[key]?.reverse().indexOf(i.submodule.aliasName);
// eslint-disable-next-line
const currentIdPriority = currentIdPriorityByName ? currentIdPriorityByName : currentIdPriorityByAlias ? currentIdPriorityByAlias : - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works (and I don't see it in a test case) - assuming you meant "use the name first, then the alias".
Because of how boolean casting works, this will always use the name (every value except 0 is truthy), except when it was the last element in the priority array (reversed becomes 0), in which case it will always use the alias, even if it was not in the array.

I am not sure that considering aliases here is necessary, but if we do, my approach would be to "resolve" them when the id priority is passed to setConfig; that would also help with performance. By that I mean, instead of just taking in idPriority verbatim from user input, do some processing so that when you get here idPriority[key] is already reversed, and does not contain any aliases - only the proper names.

Copy link
Contributor Author

@3link 3link Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgirardi Good catch! Addressed both concerns - flipping priority and alias handling - in 2256f92.

Kept the aliases handling for now - but if there are no strong arguments for keeping it, I can remove it.

.filter(i => isPlainObject(i.idObj) && Object.keys(i.idObj).length)
.reduce((carry, i) => {
Object.keys(i.idObj).forEach(key => {
carry[key] = i.idObj[key];
const currentIdPriorityByName = idPriority[key]?.reverse().indexOf(i.submodule.name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, doing the reverse only once may be a requirement and not just a suggestion - because IIRC it modifies the array in place so on the second call you'd actually be restoring it to the original order.

@3link 3link force-pushed the distinguish_ids_from_multiple_providers branch from edb0498 to 2256f92 Compare June 16, 2023 12:46
@patmmccann
Copy link
Collaborator

@deepthivenkat This looks ready to merge once we have a docs pull drafted?

@deepthivenkat
Copy link

deepthivenkat commented Jun 21, 2023 via email

@patmmccann patmmccann merged commit 1016e89 into prebid:master Jun 23, 2023
2 checks passed
github-ishara-chan-tung pushed a commit to ebuzzing/Prebid.js that referenced this pull request Jul 12, 2023
* Make id priority configurable

* Use mocked modules

* Support provided ext for medianet, uid2 and bidswitch eid.uid

* Add support for magnite id

* Delete unused findEid function

* Add a tests for getEncryptedEidsForSource

* Extract constant for the provider domain

* Make updatePPID take priority into account

* Compute effective priority only once and handle aliases

* Fix after merge

* Use Map#has instead of Map#get

* Remove duplicate test

* Add tests

* Run linter
musikele pushed a commit to rubicon-project/Prebid.js that referenced this pull request Aug 28, 2023
* Make id priority configurable

* Use mocked modules

* Support provided ext for medianet, uid2 and bidswitch eid.uid

* Add support for magnite id

* Delete unused findEid function

* Add a tests for getEncryptedEidsForSource

* Extract constant for the provider domain

* Make updatePPID take priority into account

* Compute effective priority only once and handle aliases

* Fix after merge

* Use Map#has instead of Map#get

* Remove duplicate test

* Add tests

* Run linter
santii7395 pushed a commit to themaven-net/Prebid.js that referenced this pull request Aug 28, 2023
* Make id priority configurable

* Use mocked modules

* Support provided ext for medianet, uid2 and bidswitch eid.uid

* Add support for magnite id

* Delete unused findEid function

* Add a tests for getEncryptedEidsForSource

* Extract constant for the provider domain

* Make updatePPID take priority into account

* Compute effective priority only once and handle aliases

* Fix after merge

* Use Map#has instead of Map#get

* Remove duplicate test

* Add tests

* Run linter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants