AGX Wireless bug fix #259

Merged: 1 commit, Sep 13, 2023

@emrahbillur (Contributor)

Separating out netvmExtraModules introduced a bug in AGX wireless; this PR fixes it with the appropriate corrections and improves the wireless condition for AGX. It also adds a small Docker fix, which is only included when Docker is enabled, in case Docker support is planned.

@emrahbillur (Contributor Author)

https://ssrc.atlassian.net/browse/SP-2906 The bug was introduced because netvmExtraModules in the hardware configuration was overriding netvmExtraModules in the target; that is fixed now. There were also unnecessary if statements in the SoM type checking for enabling wireless; those are fixed too. I also added a Docker fix in case Docker gets added, but it introduces no delay: if Docker support is not included, the fix is not included either.
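The thread does not show the offending code, so the following is only a hypothetical sketch of how one definition of netvmExtraModules can silently replace another in Nix. The names targetConfig, hardwareConfig, "targetModule" and "wifiModule" are invented for illustration and are not Ghaf's actual definitions. The `//` operator is right-biased, so the hardware list replaces the target list, whereas `++` keeps both. For NixOS module options declared with a list type, definitions from several modules are normally concatenated, so this kind of override usually comes from plain attribute-set updates or from priority helpers such as lib.mkForce.

```nix
# Hypothetical sketch: the names below are illustrative, not Ghaf's real code.
let
  targetConfig = { netvmExtraModules = [ "targetModule" ]; };
  hardwareConfig = { netvmExtraModules = [ "wifiModule" ]; };
in
{
  # `//` is right-biased: hardwareConfig's list replaces targetConfig's list.
  overridden = (targetConfig // hardwareConfig).netvmExtraModules;

  # `++` concatenates the lists, so both sets of modules are kept.
  merged = targetConfig.netvmExtraModules ++ hardwareConfig.netvmExtraModules;
}
```

Evaluating this file, for example with nix-instantiate --eval --strict, should give overridden = [ "wifiModule" ] and merged = [ "targetModule" "wifiModule" ].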

@vilvo (Contributor) commented Sep 12, 2023

> Added a fix for docker if somehow being added but this will not introduce any delay as if docker support is not included the fix is also not included.

I'd prefer the WiFi fix and the Docker fix as separate commits/PRs, since they are unrelated. This has a practical benefit: if the Docker overlay were a separate commit, it would be easier to revert once we eventually get the fix from upstream. Enabling the Docker daemon and adding the docker group are also separate from overlaying Docker.

@vilvo (Contributor) commented Sep 12, 2023

nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug (using a remote builder) fails to build; it gets stuck at (with --debug):
[2/0/234 built] building ell-0.5.6 on ssh://[email protected]: debug1: pledge: fork

pledge: fork seems to come from ssh clientloop.c - http://bxr.su/OpenBSD/usr.bin/ssh/clientloop.c#960

I already rebooted the remote builder, but it repeats. I'll check if it's build host related.

@vilvo (Contributor) commented Sep 12, 2023

> I'll check if it's build host related.

Nope, still the same. Ends up stuck in pledge: fork immediately.

@emrahbillur (Contributor Author)

I'll separate the two changes and recommit with only the WiFi fix.

@emrahbillur (Contributor Author)

> I'll check if it's build host related.
>
> Nope, still the same. Ends up stuck in pledge: fork immediately.

I checked the builds many times, for both the flash-script and image flakes of AGX and the image flakes of NX, but I'll try a rebuild switch to check again.

@emrahbillur (Contributor Author)

A small note to add: the Docker issue was caused by a bug in the Go compiler, so that fix will also solve issues in many other applications built with the Go compiler. It is a Docker fix, but I'll rename the Docker fix commit to "Docker fix due to Go issue".

@vilvo (Contributor) commented Sep 12, 2023

> I'll check if it's build host related.
>
> Nope, still the same. Ends up stuck in pledge: fork immediately.
>
> I checked the builds for many times for both flash-script and image flakes of agx and image flakes of nx of but I'll try a rebuild switch to check again.

OK, thanks for confirming. I'm now building it directly on the AGX, without the remote SSH builder connection.

@emrahbillur (Contributor Author)

> I'll check if it's build host related.
>
> Nope, still the same. Ends up stuck in pledge: fork immediately.
>
> I checked the builds for many times for both flash-script and image flakes of agx and image flakes of nx of but I'll try a rebuild switch to check again.
>
> Ok, thanks for confirming. I'm building it now directly on AGX without remote ssh builder connection.

I've removed the Docker fix locally. After confirming the build is correct, I'll commit and rename the pull request to the WiFi fix. Later I'll open a separate PR with the Docker fix for the Go compiler issue.

@vilvo (Contributor) commented Sep 12, 2023

> I'm building it now directly on AGX without remote ssh builder connection.

Gets stuck on:

[nix-shell:~/ghaf]$ nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug
[0/235 built, 0.0 MiB DL] waiting for lock on '/nix/store/28lpb19hzlw9v5a0alckln2dgfr03wmk-ell-0.56','/n`

This also happens with a local build on aarch64 hardware.

@vilvo (Contributor) commented Sep 12, 2023

> Gets stuck on:
>
> [nix-shell:~/ghaf]$ nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug
> [0/235 built, 0.0 MiB DL] waiting for lock on '/nix/store/28lpb19hzlw9v5a0alckln2dgfr03wmk-ell-0.56','/n`

Apparently I'm hitting NixOS/nix#2029.

@emrahbillur (Contributor Author)

> Gets stuck on:
>
> [nix-shell:~/ghaf]$ nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug
> [0/235 built, 0.0 MiB DL] waiting for lock on '/nix/store/28lpb19hzlw9v5a0alckln2dgfr03wmk-ell-0.56','/n`
>
> Apparently I'm facing this NixOS/nix#2029

I'll push a wireless-fix-only PR right after checking, so you can try that.

Signed-off-by: Emrah Billur <[email protected]>
Remove docker

Signed-off-by: Emrah Billur <[email protected]>
@emrahbillur (Contributor Author)

Reduced the PR to only the AGX wireless fix.

@vilvo (Contributor) commented Sep 12, 2023

> I'll commit a Wireless fix only PR just after check. So you could try on that.

Still the same:

[nix-shell:~/ghaf]$ git log --oneline -1
54bca8e (HEAD, emrah/main) AGX Wireless bug fix

[nix-shell:~/ghaf]$ nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug
[0/235 built] waiting for lock on ...

@emrahbillur (Contributor Author)

> I'll commit a Wireless fix only PR just after check. So you could try on that.
>
> Still the same:
>
> [nix-shell:~/ghaf]$ git log --oneline -1
> 54bca8e (HEAD, emrah/main) AGX Wireless bug fix
>
> [nix-shell:~/ghaf]$ nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug
> [0/235 built] waiting for lock on ...

root@emrah-ThinkPad-P14s-Gen-3:/home/emrah/ghaf-support-for-nx# nix build .#packages.aarch64-linux.nvidia-jetson-orin-nx-debug
warning: Git tree '/home/emrah/ghaf-support-for-nx' is dirty
It then builds with no issues. The same goes for nix build .#packages.aarch64-linux.nvidia-jetson-orin-agx-debug.

@vilvo (Contributor) commented Sep 12, 2023

I get the following lines in the kernel log, multiple times:

[  846.797982] INFO: task test-cipher:15875 blocked for more than 724 seconds.
[  846.798207]       Tainted: G           O      5.10.104 #1-NixOS
[  846.798376] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.798595] task:test-cipher     state:D stack:    0 pid:15875 ppid: 15836 flags:0x00000804
[  846.798599] Call trace:
[  846.798608]  __switch_to+0xc8/0x120
[  846.798615]  __schedule+0x2bc/0x824
[  846.798616]  schedule+0x50/0xd0
[  846.798619]  schedule_timeout+0x160/0x1a0
[  846.798620]  __wait_for_common+0xcc/0x220
[  846.798622]  wait_for_completion+0x34/0x40
[  846.798627]  skcipher_recvmsg+0x314/0x3ec
[  846.798632]  sock_read_iter+0xf0/0x120
[  846.798637]  new_sync_read+0x18c/0x1a0
[  846.798639]  vfs_read+0x134/0x1c0
[  846.798640]  ksys_read+0xec/0x10c
[  846.798641]  __arm64_sys_read+0x28/0x34
[  846.798645]  el0_svc_common.constprop.0+0x80/0x1c4
[  846.798647]  do_el0_svc+0x38/0xac
[  846.798650]  el0_svc+0x1c/0x30
[  846.798652]  el0_sync_handler+0xe0/0x10c
[  846.798654]  el0_sync+0x16c/0x180

targets/nvidia-jetson-orin.nix: 2 review threads (resolved)
@emrahbillur (Contributor Author)

> INFO: task test-cipher:15875 blocked for more than 724 seconds

I'm trying to reproduce the same error but haven't managed to. I remember the cipher part causing errors in some cross-compilation configuration, but I'm still not sure. I'll also try a cross build now.

@emrahbillur changed the title from "AGX Wireless bug and Docker bug fix" to "AGX Wireless bug fix" on Sep 13, 2023
@mikatammi (Contributor)

Are we ready to merge? I'm testing this now, and if everything goes OK I'll push the merge button.

@mikatammi (Contributor)

In my tests everything went OK, and the NetVM connects to the WLAN network.

@mikatammi merged commit 3372f53 into tiiuae:main on Sep 13, 2023
3 checks passed
@vilvo (Contributor) commented Sep 13, 2023

> INFO: task test-cipher:15875 blocked for more than 724 seconds
>
> I'm trying to reproduce the same error but could not manage to get this. I remember in some cross compilation configuration cipher part caused errors but still not sure. I'll also try cross build now.

I switched my remote builder from the Orin AGX (ghaf/nixox) to an M1 (nixos), and it got past the problematic package; it's now building chromium-unwrapped.

3 participants