
Failed to scrape node: remote error: tls: internal error #1480

Open
rarecrumb opened this issue May 1, 2024 · 5 comments · May be fixed by #1522
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@rarecrumb

What happened: Metrics server failed to scrape a node

What you expected to happen: Successfully scrape the node

Anything else we need to know?: Deployed with the Helm chart

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS

  • Container Network Setup (flannel, calico, etc.): Calico

  • Kubernetes version (use kubectl version): 1.29

  • Metrics Server manifest

spoiler for Metrics Server manifest:
      args:
      - --kubelet-insecure-tls
      containerPort: 4443
      hostNetwork:
        enabled: true
  • Kubelet config:
spoiler for Kubelet config:
  • Metrics server logs:
spoiler for Metrics Server logs:
E0501 16:40:35.362224       1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.3.10.48:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-3-10-48.ec2.internal"
  • Status of Metrics API:
spoiler for Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              app.kubernetes.io/version=0.7.1
              argocd.argoproj.io/instance=metrics-server
              helm.sh/chart=metrics-server-3.12.1
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2023-07-13T20:41:49Z
  Resource Version:    266080474
  UID:                 59bdff53-5db0-4819-a27e-6aff8526d41e
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       base
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-04-30T18:18:43Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
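
Since `--kubelet-insecure-tls` only disables certificate verification on the metrics-server side, the handshake can still fail if the kubelet cannot present a serving certificate at all. As a rough check (assuming shell access to something that can reach the kubelet address; the IP is the one from the scraper log above), the same failure can be reproduced directly:

# -k skips certificate verification, so any remaining failure happens
# during the TLS handshake on the kubelet side rather than in metrics-server.
curl -vk https://10.3.10.48:10250/metrics/resource

If curl reports the same handshake error here, the problem is with the kubelet's serving certificate rather than with metrics-server itself.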

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 1, 2024
@logicalhan
Contributor

/kind support
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2024
@kanhayaKy

Any update on this? I'm having similar issues.

The logs from the metrics-server:

E0704 07:13:21.054122       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.062399       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.120301       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.94.156:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-94-156.ec2.internal"
E0704 07:13:36.128872       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.66.224:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-66-224.ec2.internal"
E0704 07:20:51.104101       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.33.165:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-33-165.ec2.internal"


On the node we can see that the kubelet is listening on port 10250 and has also established connections to the Prometheus operator pods:

sh-4.2$ netstat -a | grep 10250
tcp6       0      0 [::]:10250              [::]:*                  LISTEN
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-84-140.:59798 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39802 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:44384 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39806 ESTABLISHED

This is very strange behavior, as we have not changed any configuration and the issue started out of nowhere.
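
A listening socket alone does not show whether the kubelet can actually complete a TLS handshake. As an illustrative follow-up (the address is one of the affected nodes from the logs above), the handshake itself can be tested from any host that can reach the node:

# On success this prints the kubelet's certificate chain; if the kubelet has no
# signed serving certificate, the handshake aborts with a TLS alert instead,
# which is consistent with the "internal error" seen by metrics-server.
openssl s_client -connect 172.31.68.188:10250 </dev/null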

@nathan-bowman

It seems that there is a PR in the works, but has anyone found a workaround for this issue?

@NahumLitvin

I am having the same issue.

@dcherniv

dcherniv commented Sep 4, 2024

Check whether the CSR for the node has been signed.
I ran into something similar recently: awslabs/amazon-eks-ami#1944
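
For anyone else hitting this, a rough sketch of what that check can look like (standard kubectl commands; <csr-name> is a placeholder for the node's pending request):

# Kubelet serving certificates are issued through CSRs with the
# kubernetes.io/kubelet-serving signer; look for any stuck in Pending.
kubectl get csr | grep Pending

# Approving the pending serving CSR lets the kubelet obtain its certificate,
# after which the scrape should succeed again.
kubectl certificate approve <csr-name>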
