add options to use Prometheus federation API on linkerd-viz with all necessary resources for it to work correctly #12212

Open: francRang wants to merge 5 commits into base: main

francRang commented Mar 6, 2024

Add options to use the Prometheus federation API via scrape configs or ServiceMonitors, and create the resource needed (an AuthorizationPolicy) to avoid 403 errors when the API is enabled (closes #11050).

There is currently no way to automatically configure Prometheus to use the federation API for linkerd-viz.

This PR addresses that in the two ways described by this doc, and creates the AuthorizationPolicy needed for federation to work.
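For context, a federation job on an external Prometheus could look roughly like the sketch below. It is modeled on the standard Prometheus `/federate` setup, not taken from this PR's templates; the `linkerd-federate` job name and the `linkerd-viz` namespace are assumptions. Without an AuthorizationPolicy like the one added here, these scrapes are rejected with HTTP 403, as reported in #11050.

```yaml
# prometheus.yml fragment for an EXTERNAL Prometheus (sketch).
# ASSUMPTIONS: linkerd-viz is installed in the "linkerd-viz" namespace;
# the job name "linkerd-federate" is illustrative.
scrape_configs:
  - job_name: 'linkerd-federate'
    # Keep the job/instance labels set by the viz Prometheus
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="linkerd-proxy"}'
        - '{job="linkerd-controller"}'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['linkerd-viz']
    relabel_configs:
      # Only scrape the viz Prometheus container, not its linkerd-proxy sidecar
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep
        regex: ^prometheus$
```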

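Multiple users have confirmed that the AuthorizationPolicy works: #11050 (comment). To check the Helm logic, I ran helm template with the values configured for each case, and the output is as expected. Each command below renders the chart, selects the relevant resource with yq, and then prints the federation settings from values.yaml used for that run. In the last two runs, disableAuthorizationPolicyForFederatedPrometheus is true, so no AuthorizationPolicy is rendered.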

➜  linkerd-2-copy git:(main) ✗ helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "AuthorizationPolicy" and .metadata.name == "prometheus-admin-federate")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20
# Source: linkerd-viz/templates/prometheus.yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-admin-federate
  namespace: default
  labels:
    linkerd.io/extension: viz
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: Linkerd
    app.kubernetes.io/version: linkerdVersionValue
    component: prometheus
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: prometheus-admin
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: kubelet
  federateUsingScrapeConfig: true
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: false
➜  linkerd-2-copy git:(main) ✗ helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "ServiceMonitor")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20
# Source: linkerd-viz/templates/prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    linkerd.io/extension: viz
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: Linkerd
    app.kubernetes.io/version: linkerdVersionValue
    component: prometheus
  name: linkerd-federate
  namespace: default
spec:
  endpoints:
    - interval: 30s
      scrapeTimeout: 30s
      params:
        match[]:
          - '{job="linkerd-proxy"}'
          - '{job="linkerd-controller"}'
      path: /federate
      port: admin-http
      honorLabels: true
      relabelings:
        - action: keep
          regex: '^prometheus$'
          sourceLabels:
            - '__meta_kubernetes_pod_container_name'
  jobLabel: app
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      component: prometheus
  federateUsingScrapeConfig: true
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: false
helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "ConfigMap" and .metadata.name == "prometheus-config")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20                  
# Source: linkerd-viz/templates/prometheus.yaml
###
### Prometheus
###
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-config
  namespace: default
  labels:
    linkerd.io/extension: viz
    component: prometheus
    namespace: default
  annotations:
    linkerd.io/created-by: linkerd/helm linkerdVersionValue
data:
  prometheus.yml: |-
    global:
      evaluation_interval: 10s
      scrape_interval: 10s
      scrape_timeout: 10s

    rule_files:
    - /etc/prometheus/*_rules.yml
    - /etc/prometheus/*_rules.yaml

    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    #  Required for: https://grafana.com/grafana/dashboards/315
    - job_name: 'kubernetes-nodes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
      metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(container|machine)_(cpu|memory|network|fs)_(.+)'
        action: keep
      - source_labels: [__name__]
        regex: 'container_memory_failures_total' # unneeded large metric
        action: drop

    - job_name: 'linkerd-controller'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - 'linkerd'
          - 'default'
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: admin-http
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: component

    - job_name: 'linkerd-service-mirror'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_label_component
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: linkerd-service-mirror;admin-http$
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: component

    - job_name: 'linkerd-proxy'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_container_port_name
        - __meta_kubernetes_pod_label_linkerd_io_control_plane_ns
        action: keep
        regex: ^linkerd-proxy;linkerd-admin;linkerd$
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      # special case k8s' "job" label, to not interfere with prometheus' "job"
      # label
      # __meta_kubernetes_pod_label_linkerd_io_proxy_job=foo =>
      # k8s_job=foo
      - source_labels: [__meta_kubernetes_pod_label_linkerd_io_proxy_job]
        action: replace
        target_label: k8s_job
      # drop __meta_kubernetes_pod_label_linkerd_io_proxy_job
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_job
      # __meta_kubernetes_pod_label_linkerd_io_proxy_deployment=foo =>
      # deployment=foo
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      # drop all labels that we just made copies of in the previous labelmap
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      # __meta_kubernetes_pod_label_linkerd_io_foo=bar =>
      # foo=bar
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_(.+)
      # Copy all pod labels to tmp labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: __tmp_pod_label_$1
      # Take `linkerd_io_` prefixed labels and copy them without the prefix
      - action: labelmap
        regex: __tmp_pod_label_linkerd_io_(.+)
        replacement:  __tmp_pod_label_$1
      # Drop the `linkerd_io_` originals
      - action: labeldrop
        regex: __tmp_pod_label_linkerd_io_(.+)
      # Copy tmp labels into real labels
      - action: labelmap
        regex: __tmp_pod_label_(.+)

  federateUsingScrapeConfig: true
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: false
helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "ServiceMonitor" and .metadata.name == "linkerd-federate")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20
# Source: linkerd-viz/templates/prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    linkerd.io/extension: viz
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: Linkerd
    app.kubernetes.io/version: linkerdVersionValue
    component: prometheus
  name: linkerd-federate
  namespace: default
spec:
  endpoints:
    - interval: 30s
      scrapeTimeout: 30s
      params:
        match[]:
          - '{job="linkerd-proxy"}'
          - '{job="linkerd-controller"}'
      path: /federate
      port: admin-http
      honorLabels: true
      relabelings:
        - action: keep
          regex: '^prometheus$'
          sourceLabels:
            - '__meta_kubernetes_pod_container_name'
  jobLabel: app
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      component: prometheus
  federateUsingScrapeConfig: false
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: false
helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "AuthorizationPolicy" and .metadata.name == "prometheus-admin-federate")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20
# Source: linkerd-viz/templates/prometheus.yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-admin-federate
  namespace: default
  labels:
    linkerd.io/extension: viz
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: Linkerd
    app.kubernetes.io/version: linkerdVersionValue
    component: prometheus
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: prometheus-admin
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: kubelet
  federateUsingScrapeConfig: false
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: false
linkerd-2-copy git:(main) ✗ helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "AuthorizationPolicy" and .metadata.name == "prometheus-admin-federate")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20 
  federateUsingScrapeConfig: false
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: true
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: true
helm template viz/charts/linkerd-viz/ | yq eval-all 'select(.kind == "AuthorizationPolicy" and .metadata.name == "prometheus-admin-federate")' - && cat viz/charts/linkerd-viz/values.yaml | grep "federateUsingScrapeConfig" -A 20
  federateUsingScrapeConfig: true
  # -- Makes use of Prometheus's federation API to copy data from one Prometheus to another by ServiceMonitor to configure Prometheus
  # adds necessary AuthorizationPolicy to avoid 403 errors.
  federateUsingServiceMonitor: false
  # -- Disables automatic creation of AuthorizationPolicy to prevent 403 scrape errors when using Prometheus federation API.
  # Ignored if both federateUsingScrapeConfig & federateUsingServiceMonitor are set to false
  disableAuthorizationPolicyForFederatedPrometheus: true
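The option combinations exercised above could also be kept in a small values file instead of editing values.yaml for each run. This is a hypothetical sketch: the file name is made up, and the parent key `prometheus:` is an assumption, since the grep output shows the three keys indented one level without showing their parent.

```yaml
# federation-values.yaml (hypothetical file name)
# ASSUMPTION: these keys live under the chart's `prometheus` block; the grep
# output above shows them indented one level but not the parent key.
prometheus:
  # Render the linkerd-federate ServiceMonitor
  federateUsingServiceMonitor: true
  # Leave the scrape-config option off for this run
  federateUsingScrapeConfig: false
  # Keep the prometheus-admin-federate AuthorizationPolicy (true skips it)
  disableAuthorizationPolicyForFederatedPrometheus: false
```

It would then be rendered with `helm template viz/charts/linkerd-viz/ -f federation-values.yaml`.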

I agree to the DCO for all the commits in this PR.

francRang marked this pull request as ready for review March 8, 2024 06:17
francRang requested a review from a team as a code owner March 8, 2024 06:17
adleong (Member) commented Mar 11, 2024

Thanks, @francRang! I think my preference would be to add the AuthorizationPolicy you have here, but omit the ServiceMonitor and scrape config changes. These are already well documented and omitting them from our templates helps keep them simple.

Does federation only require access to the /federate path? If so, we can narrow the authorization here by creating an HTTPRoute resource for /federate and then changing the authorization policy to authorize the kubelet to that HTTPRoute only.
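A sketch of the narrowing suggested above, assuming federation only needs the /federate path: an HTTPRoute attached to the existing prometheus-admin Server, with the AuthorizationPolicy retargeted from the Server to that route. The route name `prometheus-federate`, the `linkerd-viz` namespace, and the `policy.linkerd.io/v1beta3` HTTPRoute version are assumptions; use whichever HTTPRoute version the installed Linkerd CRDs serve.

```yaml
# ASSUMPTIONS: route name, namespace, and HTTPRoute API version are illustrative.
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: prometheus-federate
  namespace: linkerd-viz
spec:
  parentRefs:
    # Attach the route to the existing prometheus-admin Server
    - group: policy.linkerd.io
      kind: Server
      name: prometheus-admin
  rules:
    - matches:
        # Match only the federation endpoint
        - path:
            type: Exact
            value: /federate
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-admin-federate
  namespace: linkerd-viz
spec:
  # Authorize the route instead of the whole Server
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: prometheus-federate
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: kubelet
```

One caveat worth verifying against the Linkerd authorization policy docs: once any HTTPRoute is attached to a Server, requests that match no route on that Server are rejected, so other clients of prometheus-admin (for example metrics-api) would likely need routes of their own.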

adleong (Member) commented Apr 10, 2024

Hi @francRang, are you still interested in working on this?

francRang (Author)

@adleong Yes, I will resume work on this PR over the weekend. Thank you for the feedback!

kflynn (Member) commented May 30, 2024

I'm going to go ahead and close this one since it's been idle for a while. @francRang, feel free to reopen if you're still at it! And in any case, thanks for looking into this in the first place. 🙂

Successfully merging this pull request may close this issue: Prometheus metrics federation yields HTTP 403

3 participants