CLARIAH Requirements for infrastructure and Software/Services #5

Merged: 37 commits into main on Jan 18, 2022

Conversation

proycon
Member

@proycon proycon commented Jul 12, 2021

In this requirements branch we're working on requirements for the CLARIAH infrastructure and for Software/Services, as described in issue #4. This pull request tracks all changes. Feel free to comment either here or in issue #4. Very specific comments on the textual contents are better suited to be placed here, whereas generic observations are better suited for issue #4.

Feel free to simply push to this branch with your contributions (or an extra pull request if you prefer).

Note: this pull request is a work in progress (draft) tracking the relevant changes and should not be merged until ready.

@ddeboer
Contributor

ddeboer commented Jul 13, 2021

Note: this pull request is a work in progress tracking the relevant changes, DO NOT MERGE until the WIP marker is removed from the subject.

You can convert this to a draft PR to prevent it from getting merged accidentally.

@proycon
Member Author

proycon commented Jul 13, 2021

You can convert this to a draft PR to prevent it from getting merged accidentally.

Ha, thanks, that was the option I was looking for but couldn't find :) I was already surprised and wondering whether GitHub had it at all (in GitLab it is much easier to find).

@proycon proycon marked this pull request as draft July 13, 2021 07:55
docs/requirements/infrastructure-requirements.md Outdated
@proycon
Member Author

proycon commented Jul 28, 2021

@ddeboer Thanks for the feedback, I have processed your suggestions.

docs/requirements/software-requirements.md
All services open to end-users and which require some form of user authentication *MUST* be compatible with
CLARIAH's authentication and authorization infrastructure. That is, they should be able to communicate with CLARIAH's
[SATOSA](https://github.com/IdentityPython/SATOSA) Authentication Provider. It is *RECOMMENDED* to use OpenID Connect
for this communication. Instructions can be found [here](https://github.com/CLARIAH/IG-DevOps/tree/main/docs/authentication).
Contributor

@ddeboer ddeboer Sep 30, 2021

From https://github.com/CLARIAH/IG-DevOps/blob/main/docs/authentication/authentication_clariah_nl.md:

all of the identity providers registered at CLARIN (see "Participating Identity Federations" at CLARINs Service Provider Federation). This basically means: all European academic institutions. In addition to this CLARIN-provided collection of identity providers, Beeld en Geluid is also available as an identity provider in discovery.clariah.nl.

So this requirement makes a lot of sense for applications geared towards academic users. But what about applications with other types of users? Case in point: NDE’s Solid Collection Registration System, in which small heritage institutions manage their collections. What would be the added value for this type of application in using SATOSA rather than directly connecting to some external IdP such as Auth0?

So should we:

  • generalise this requirement to be about OIDC-compatibility on the part of the application, so it can be connected to any OIDC IdP?
  • or document more clearly how to use CLARIAH auth with non-academic users (I guess BenG is an example of this)?
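
To make the first option concrete, here is a minimal sketch of what "OIDC-compatible and connectable to any OIDC IdP" could look like on the application side. It is not taken from the CLARIAH documentation: the `OidcProvider` class, the helper names, and all URLs and client IDs are hypothetical placeholders. Only the well-known discovery path and the authorization request parameters come from the OpenID Connect specifications.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class OidcProvider:
    """Settings for one OIDC identity provider (all values are placeholders)."""
    issuer: str
    authorization_endpoint: str
    client_id: str


def discovery_url(issuer: str) -> str:
    # OpenID Connect Discovery publishes provider metadata at this well-known path.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def authorization_url(provider: OidcProvider, redirect_uri: str,
                      state: str, nonce: str, scope: str = "openid email") -> str:
    # Standard parameters of an authorization code flow request.
    params = {
        "response_type": "code",
        "client_id": provider.client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "nonce": nonce,
    }
    return provider.authorization_endpoint + "?" + urlencode(params)


# Hypothetical provider; pointing the application at a different IdP
# only means changing these three values.
example_idp = OidcProvider(
    issuer="https://idp.example.org/",
    authorization_endpoint="https://idp.example.org/authorize",
    client_id="my-client",
)
```

With this shape, the choice between a SATOSA proxy and some other IdP would be a deployment setting rather than a code change, which is the property the first option asks for.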

@proycon proycon marked this pull request as draft September 30, 2021 09:21
@proycon
Member Author

proycon commented Oct 1, 2021

So this requirement makes a lot of sense for applications geared towards academic users. But what about applications with other types of users? Case in point: NDE’s Solid Collection Registration System, in which small heritage institutions manage their collections. What would be the added value for this type of application in using SATOSA rather than directly connecting to some external IdP such as Auth0?

So should we:

  • generalise this requirement to be about OIDC-compatibility on the part of the application, so it can be connected to any OIDC IdP?
  • or document more clearly how to use CLARIAH auth with non-academic users (I guess BenG is an example of this)?

@ddeboer Very good point, I completely agree and have tried to raise this issue before: non-academic users that don't yet have an account in the federated authentication infrastructure need to be able to get (immediate) access as well. This is in fact part of the reason we held on to our own legacy registration system for the tools & services in Nijmegen. If we opt for a single CLARIAH-wide identity provider, I think it must have the additional option for new users to register (and immediately or after simple mail verification have an active account, that is, not hindered by a human in the loop). If participating services want to exclude such non-academic users, they can still do so based on authorization details. I don't know who's making the decisions for this, perhaps @janpieterk and @mmisworking can tell more?

@menzowindhouwer
Contributor

Non-academic users can use the CLARIN IdP, and register here: https://user.clarin.eu/user/register

@proycon
Member Author

proycon commented Oct 1, 2021

Non-academic users can use the CLARIN IdP, and register here: https://user.clarin.eu/user/register

Thanks for the quick reply! I indeed knew about that one, but I think it has a human in the loop who needs to verify the registration, right? (At least it used to be that way when I registered long ago, and the text still hints at it: "After your registration is processed (normally within two working days)".) I think such a delay is not acceptable if a user wants to use a service: users expect to use it immediately, or they lose interest and leave. (We used to have a similar verification stage in Nijmegen and got rid of it for the same reason.) I'd do it the other way round: give users access immediately after registration, but notify a human to keep an eye on registrations and revoke permissions (and possibly set IP bans etc.) if needed.
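
As a rough illustration of this "activate first, review afterwards" model, here is a minimal in-memory sketch. All names (`Account`, `Registry`, `review_queue`) are hypothetical and do not describe how the CLARIN IdP or any existing registration system works.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    email: str
    active: bool = True      # usable immediately after registration
    reviewed: bool = False   # no human has looked at it yet


@dataclass
class Registry:
    accounts: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)

    def register(self, email: str) -> Account:
        account = Account(email)
        self.accounts[email] = account
        # Notify a human by queueing the account, without blocking the user.
        self.review_queue.append(email)
        return account

    def approve(self, email: str) -> None:
        self.accounts[email].reviewed = True
        self.review_queue.remove(email)

    def revoke(self, email: str) -> None:
        # The reviewer can still withdraw access after the fact.
        account = self.accounts[email]
        account.active = False
        account.reviewed = True
        if email in self.review_queue:
            self.review_queue.remove(email)
```

The point of the sketch is only the ordering: the review step exists, but it sits after activation instead of in front of it.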

@menzowindhouwer
Contributor

We can propose this approach to CLARIN.eu, maybe they are willing to switch to this model. Can you propose it to [email protected], I think @dietervu also listens to that one ...

@roelandordelman
Contributor

For Media Suite, the CLARIN IdP route is often used to give temporary login access to users without a university account. However, apart from the manual step at CLARIN, we also have to whitelist the person with the CLARIN account. In practice this manual operation is not a problem; in fact, in my opinion it would often be a requirement. For example, at NISV only scholars are allowed access to NISV data, so the CLARIN users should belong to an "academic" user group. Also, we only provide temporary access via CLARIN (by enabling removal after a certain period of time). As another example, other collection owners could grant access to individual users and individual collections, e.g., known professor X is allowed to access collection Y. Whether there is a CLARIN IdP or something else, there will be a need for manually assigning access levels to individuals based on their credentials (and/or even membership of a CLARIAH organisation?).

So I would not be in favour of the model @proycon describes (turning it around). Ideally, a request from a non-academic user for access to a collection should be distributed via a CLARIAH-wide service to a local operator at the collection owner, who checks the request and grants it or not based on pre-defined criteria (established in a large agreement). Some of these criteria may be handled automatically, though. For example, members of an organisation that is a member of CLARIAH but not academic are granted access automatically via their IdP.

I think it is also key that access is blocked/granted at exactly the right spot. E.g., NISV metadata can be searched via Media Suite by anyone, while viewing content or analysing metadata via Jupyter Notebooks is restricted. So the blocking/granting part should be placed at the viewing level (or in environments where users are working with JNs), as is the case in WP5 currently.
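
The idea of blocking or granting at the level of the action, combined with group membership and temporary access, could be sketched roughly as follows. This is an illustration only and not how Media Suite implements it: the action names, the `"academic"` group label, and the `User` fields are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical action names: searching is open, viewing and analysing are not.
PUBLIC_ACTIONS = {"search_metadata"}
RESTRICTED_ACTIONS = {"view_content", "analyse_in_notebook"}


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)
    access_expires: Optional[date] = None  # None means no temporary limit


def is_allowed(user: Optional[User], action: str, today: date) -> bool:
    if action in PUBLIC_ACTIONS:
        return True                      # anyone, even anonymous, may search
    if action not in RESTRICTED_ACTIONS or user is None:
        return False                     # unknown action or not logged in
    if "academic" not in user.groups:
        return False                     # group assigned by an operator
    # Temporary accounts lose access once their expiry date has passed.
    return user.access_expires is None or today <= user.access_expires
```

For example, a guest whose temporary access ended yesterday can still search metadata but can no longer view content, while an anonymous visitor can search and nothing else.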

@proycon
Member Author

proycon commented Oct 1, 2021

Yes, I completely understand the need for proper checks, especially in case of sensitive data, but as you said, there is a need for manually assigning access for such services anyway. I'm not saying we shouldn't do that, it's just that in some cases you might not want it and right now that's impossible. So my concern is with the services that don't need much authorization but only need some simple authentication.

For example, the CLST RUN services attract a fair number of outside non-academic users, including private individuals and even commercial parties, who just want to try out the service. They are mostly processing services (like the ASR) and users bring their own data, so we don't have much to protect there. So as long as the demand on our resources is pretty insignificant, we're fine with anyone trying our services. When people come in hordes and overwhelm the servers we'd probably reconsider ;) but right now we're happy with every user that finds our stuff useful. An activation barrier with human verification would discourage people from trying out the service (people have short attention spans and lose interest quickly anyway, I'd do the same).

I suppose if CLARIAH/CLARIN doesn't provide this function, which is fair enough of course, the other option we have is to rely on an extra identity provider to accommodate such users, or to set up and manage our own. That does pose some extra technical challenges, and the user will then have to explicitly choose whether to use the CLARIN IdP or whatever else we provide.

Btw, the discussion deviates a bit from @ddeboer's original point (I guess we should have made a separate issue), which was whether we want to require all services in our infrastructure to use the CLARIAH/CLARIN authentication service (which I think we do). Adding additional IdPs would not violate this either (but avoiding them would have my preference, as that is simpler to implement).

hayco and others added 2 commits October 26, 2021 12:10
Update software-requirements.md
@proycon proycon closed this in b10995b Nov 17, 2021
proycon added a commit that referenced this pull request Nov 17, 2021
WP3 SPAQ presentation from Tech Day 20210325
@proycon proycon reopened this Nov 17, 2021
@proycon proycon added the RFC/proposal This is a proposal / request for comments. Input is much appreciated. label Nov 17, 2021
@@ -0,0 +1,4 @@
## CLARIAH Requirements
Contributor

@ddeboer ddeboer Jan 18, 2022

Is the path docs/requirements/ still accurate after the move to the clariah-plus repo? We now have top-level use-cases/ so should we make requirements/ top-level too?

@proycon
Member Author

proycon commented Jan 18, 2022

Is the path docs/requirements/ still accurate after the move to the clariah-plus repo? We now have top-level use-cases/ so should we make requirements/ top-level too?

No, this will have to be resolved/rebased when we merge this into the main branch. Perhaps the time has come to accept these proposals in the main branch and continue work from there; they have been open long enough and discussion seems to have stagnated a bit. What do you think? (also @roelandordelman)

@ddeboer
Contributor

ddeboer commented Jan 18, 2022

Yeah, let’s merge this and do any follow-up work in subsequent PRs.

@ddeboer ddeboer marked this pull request as ready for review January 18, 2022 10:37
proycon added a commit that referenced this pull request Jan 18, 2022
@proycon proycon merged commit ac47c38 into main Jan 18, 2022
@proycon proycon changed the title [WIP] CLARIAH Requirements for infrastructure and Software/Services CLARIAH Requirements for infrastructure and Software/Services Jan 18, 2022
@proycon
Member Author

proycon commented Jan 18, 2022

This PR is now merged, see the contents here https://github.com/CLARIAH/clariah-plus/tree/main/requirements

Labels
RFC/proposal This is a proposal / request for comments. Input is much appreciated.
6 participants