Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smhp add features in LCS utills #392

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Smhp add features in LCS utills #392

wants to merge 3 commits into from

Conversation

gmgtamz
Copy link

@gmgtamz gmgtamz commented Aug 1, 2024

3 scripts

ssh_root.sh to add the SSH root capability
slurm_fix.sh to fix the plugstack bug
pam_adopt_cgroup_wheel.sh to add cgroup, adoptpam and wheel groups

none of them is linked into the python caller script (lifecycle_script.py) yet

smhp_conf="/opt/ml/config/resource_config.json"

pecho(){
#pretty echo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment right?

# node_type="$(cat "${smhp_conf}" | jq -r ".InstanceGroups[] | select(.Instances[0].CustomerIpAddress == \"${hip}\") | .Name")"
# [[ -n $node_type ]] && break
# done
pecho "node_type=${node_type}" # controller-machine, worker-group-1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in the comment add Ex, it reads more naturally # Ex: controller-machine, worker-group-1


if [[ $node_type == "controller-machine" ]] ;then
slurm_fix
# scontrol reconfigure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this?

@@ -0,0 +1,75 @@
#!/bin/bash
# https://slurm.schedmd.com/pam_slurm_adopt.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short sentence at the beginning to say what it does, people will not click on every link to understand what it does

@@ -0,0 +1,88 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No generation of ssh keys for root.

@@ -0,0 +1,88 @@
#!/bin/bash
# https://slurm.schedmd.com/pam_slurm_adopt.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same references as in the slurm_fix. Remove or can you explain why use use them here too. What does this file do too? Add a small note on the header, 1 sentence is ok, more if needed.


if ! [[ -f "${ssh_dir}/${key_priv}" ]] ;then
rm -f "${ssh_dir}/${key_pub}"
# rm -f "${ssh_dir}/${key_auth}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@@ -0,0 +1,249 @@
#!/bin/bash
# https://slurm.schedmd.com/pam_slurm_adopt.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy paste from other files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what the files does too, add a header

pyxis_conf="${plugstack_dir}/pyxis.conf"

smhp_conf="/opt/ml/config/resource_config.json"
# apt_opts='-o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

scancel -u ubuntu
squeue

# ssh ip-10-1-64-87 uname -a ; -u ubuntu ssh ip-10-1-64-87 uname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the below

squeue
ssh $node uname
# Access denied by pam_slurm_adopt: you have no active jobs on this node
# Connection closed by 10.1.64.87 port 22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?



# test
node="ip-10-1-64-87"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static...

grep -Hn "^UsePAM " ${sshd_conf}

# apt update
# # -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -0,0 +1,75 @@
#!/bin/bash
Copy link
Contributor

@mhuguesaws mhuguesaws Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this script to an explicit functionality.
This is unclear.

@mhuguesaws mhuguesaws requested a review from nghtm August 1, 2024 18:55
@perifaws
Copy link
Contributor

Any update on this @gmgtamz ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants