[Filebrowser] Add get delegation token logic for secure hadoop (#3301) (Related with: #3324 ) #3449

Open. Wants to merge 18 commits into base: master.


SoniaComp
Contributor

What changes were proposed in this pull request?
I am working on integrating Hue with secure Hadoop, with Hue acting as a proxy user.
I found that the proxy-user behavior of the WebHDFS FileBrowser needs to be enhanced:
when "security_enabled" is true, only the "read_url" method uses a delegation token.

I want to modify the code so that a user impersonated by Hue (the proxy user) can be issued a delegation token when using WebHDFS.
(#3323)
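
For context, here is a minimal sketch of the two kinds of WebHDFS REST URLs involved: one that requests a delegation token on behalf of an impersonated user, and one that then authenticates a file operation with that token. `op=GETDELEGATIONTOKEN`, `renewer`, `doas`, and `delegation` are WebHDFS query parameters. The NameNode address, the user names, and the helper function names are assumptions made for illustration and are not taken from the Hue code.

```python
from urllib.parse import urlencode

# Hypothetical NameNode address, for illustration only.
WEBHDFS_BASE = "http://namenode.example.com:9870/webhdfs/v1"


def delegation_token_url(renewer, doas):
    """Build the URL that asks the NameNode for a delegation token.

    `doas` is the end user that the proxy user (here, Hue) impersonates.
    """
    params = {"op": "GETDELEGATIONTOKEN", "renewer": renewer, "doas": doas}
    return f"{WEBHDFS_BASE}/?{urlencode(params)}"


def list_status_url(path, token):
    """Build a LISTSTATUS URL that authenticates with a delegation token."""
    params = {"op": "LISTSTATUS", "delegation": token}
    return f"{WEBHDFS_BASE}{path}?{urlencode(params)}"


print(delegation_token_url(renewer="alice", doas="alice"))
print(list_status_url("/user/alice", token="EXAMPLE_TOKEN"))
```

The sketch only builds URLs; in a Kerberized cluster the first request would additionally need SPNEGO authentication as the proxy user.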

@bjornalm
Collaborator

Hi @SoniaComp and thanks for contributing again :-)
@ranade1 @amitsrivastava can you have a look?

@github-actions

This PR is stale because it has been open 45 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Oct 15, 2023
@SoniaComp
Contributor Author

I have been using this bug fix successfully in my secure Hadoop environment, so I am confident that it works well.
I would like to get feedback from other engineers. Thank you for checking the code 🙇‍♀️

@github-actions github-actions bot removed the Stale label Oct 18, 2023
@bjornalm
Collaborator

@ranade1 friendly reminder from our contributor here, can you take a look at this PR?

@ranade1
Contributor

ranade1 commented Dec 4, 2023

@SoniaComp, the "cachetools" library you are using is not thread-safe. See https://cachetools.readthedocs.io/en/latest/, which says:

"Note: Please be aware that all these classes are not thread-safe. Access to a shared cache from multiple threads must be properly synchronized, e.g. by using one of the memoizing decorators with a suitable lock object."

Also, our gunicorn setup uses processes instead of threads. Have you tested your code on the latest Hue?
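
As an illustration of the synchronization the quoted note asks for, here is a minimal sketch that guards a shared cache with a lock. It uses only the Python standard library rather than cachetools, and the `LockedCache` class and `fetch_token` function are hypothetical names, not part of this PR.

```python
import threading


class LockedCache:
    """A dict-backed cache whose reads and writes share one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get_or_set(self, key, factory):
        # Holding the lock across the check and the write ensures that
        # `factory` runs at most once per key, even with many threads.
        with self._lock:
            if key not in self._data:
                self._data[key] = factory(key)
            return self._data[key]


calls = []
cache = LockedCache()


def fetch_token(user):
    calls.append(user)  # stands in for a request to the NameNode
    return f"token-for-{user}"


threads = [
    threading.Thread(target=cache.get_or_set, args=("alice", fetch_token))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # number of simulated NameNode requests
```

A lock like this only coordinates threads inside one process. With gunicorn worker processes, each worker holds its own copy of an in-memory cache, so each worker would fetch its own token.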

@SoniaComp
Contributor Author

@ranade1 Thank you for your comment! I will replace that library with a better one.

@SoniaComp
Contributor Author

@bjornalm @ranade1
I replaced cachetools with Django's core cache framework, which is thread-safe.

@SoniaComp
Contributor Author

I tested this code and confirmed that it works properly after fixing the bug.

@@ -32,12 +33,13 @@
import time
import urllib.request, urllib.error

from django.core.cache import caches
Contributor Author

Requesting a delegation token on every filebrowser operation can put stress on the Hadoop NameNode, so I used a cache.

CACHES[CACHES_WEBHDFS_DELEGATION_TOKEN_KEY] = {
  'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
  'LOCATION': CACHES_WEBHDFS_DELEGATION_TOKEN_KEY,
  'TIMEOUT': desktop.conf.KERBEROS.REINIT_FREQUENCY
Contributor Author

Kerberos tickets are renewed periodically, so I set the TIMEOUT option so that cached tokens expire on the same schedule.
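
The cache-with-timeout idea can be sketched as follows. This is a standard-library illustration of the pattern, not the Django cache backend used in the PR: `TTLCache`, `token_for`, and `fake_fetch` are hypothetical names, and the 3600-second TTL is an illustrative value standing in for the Kerberos reinit frequency. The clock is injectable so that expiry can be shown without waiting.

```python
import time


class TTLCache:
    """Minimal per-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller fetches a fresh token.
            del self._entries[key]
            return default
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def token_for(user, cache, fetch):
    """Return a cached token, calling `fetch` only on a miss or expiry."""
    token = cache.get(user)
    if token is None:
        token = fetch(user)
        cache.set(user, token)
    return token


now = [0.0]  # fake clock, advanced by hand below
cache = TTLCache(ttl=3600, clock=lambda: now[0])
fetched = []


def fake_fetch(user):
    fetched.append(user)  # stands in for a request to the NameNode
    return f"token-{len(fetched)}"


token_for("alice", cache, fake_fetch)  # miss: fetches a token
token_for("alice", cache, fake_fetch)  # hit: served from the cache
now[0] = 3600.0                        # advance the clock to the TTL
token_for("alice", cache, fake_fetch)  # expired: fetches again
print(len(fetched))
```

Keying the cache by user keeps each impersonated user's token separate, and the expiry bounds how long a stale token can be reused.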

if self._security_enabled:
  token = cache.get(self.user, None)
  if not token:
    token = self.get_delegation_token(self.user)
Contributor Author

To use impersonation to grant permissions to each user, you must use a delegation token. (reference: https://blog.cloudera.com/hadoop-delegation-tokens-explained/)

def get_delegation_token(self, renewer):
  """get_delegation_token(user) -> Delegation token"""
  # Workaround for HDFS-3988
  if self._security_enabled:
Contributor Author

This problem was fixed in Hadoop 2.6. (reference: https://issues.apache.org/jira/browse/HDFS-3988)

@bjornalm
Collaborator

bjornalm commented Jan 8, 2024

@SoniaComp Thanks! @ranade1 Can you have another look?

@SoniaComp
Contributor Author

SoniaComp commented Feb 4, 2024

Thank you for your reviews! Are there any parts in this PR that could be improved further?

@ranade1
Contributor

ranade1 commented Feb 5, 2024

Hello @SoniaComp, thank you for putting up these code changes. I have some specific questions.

  1. Have you tried this code with Hue HA enabled (for example, multiple Hue servers running behind a single Hue LB)?
  2. Have you tried it with multiple users on Hue HA (the configuration above)?

@SoniaComp
Contributor Author

Hi! I appreciate the good questions! 😊

  1. I am running Hue on a Kubernetes cluster using the Hue Helm chart, with an image built from the code in this PR.
  2. Multiple users authenticated through our in-house LDAP are using Hue's HDFS filebrowser with separate permissions on this Hue deployment. (Hue is configured as an HDFS proxy user.)

@SoniaComp
Contributor Author

Does this code need more testing?

@SoniaComp
Contributor Author

@ranade1 @bjornalm Hi! I am currently running Hue with this code for more than 100 users. If there are any areas that need further development, I will make corrections.

@bjornalm
Collaborator

@SoniaComp thank you. @ranade1 @amitsrivastava Can we please take a moment to see if we can get this PR merged or if additional changes/tests are required?

@github-actions github-actions bot added the Stale label Apr 26, 2024
@eubnara
Contributor

eubnara commented Apr 26, 2024

This is needed.

@github-actions github-actions bot removed the Stale label Apr 27, 2024
@github-actions github-actions bot added the Stale label Jun 11, 2024
@eubnara
Contributor

eubnara commented Jun 11, 2024

@SoniaComp
It is about to be closed. 😅
So I am just leaving a comment.

@github-actions github-actions bot removed the Stale label Jun 12, 2024
4 participants