TROUBLESHOOTING

This document aims to help users solve common problems they might encounter while installing or using the rating-operator. Some of the problems listed below might be fixed in upcoming releases; this document will be updated accordingly. If you encounter an error that is not documented here, feel free to open an issue to discuss it, and we will add it here.

Common advice

  • When installing the dependencies, it is STRONGLY recommended to use the exact versions listed below. Because Kubernetes and operators in general evolve quickly, we pinned the version of each external component. Using other versions can cause undocumented incompatibility errors; do so at your own risk. Here is the list of versions we use:

    • Helm 3.1.2
    • Grafana 7.0.3
    • Rook-ceph 1.2.6
    • Prometheus latest
    • Metering-operator 4.2
  • After installing the metering-operator, it is STRONGLY recommended to wait for the first Reports to be generated before installing the rating.

  • After installing the rating-operator, it is advised to wait approximately 10 minutes before using it. Initialization of the rating-operator-api can take a long time, depending on the allocated resources.

  • To check that the rating-operator-api is answering, you can try:

$> kubectl get pods -l app.kubernetes.io/component=api -o name | cut -d/ -f2 | xargs -I{} kubectl port-forward {} 5012:5012
Forwarding from 127.0.0.1:5012 -> 5012
Forwarding from [::1]:5012 -> 5012
# From another terminal
$> curl http://localhost:5012/alive
I'm alive!
  • After adding a new configuration, always verify that it is accepted. If the message below does not appear, the RatingRules have not been validated and thus will not be used.
$> kubectl -n $RATING_NAMESPACE describe ratingrules.rating.smile.fr test-rules
[...]
  Type    Reason   Age   From  Message
  ----    ------   ----  ----  -------
  Normal  Logging  43m   kopf  RatingRule test-rules created, valid from 2020-04-22T12:46:41Z.
  • Do NOT create or modify RatedMetrics yourself; they are not designed to be used that way.
  • If a custom resource is stuck while deleting, the cause is probably its finalizer. To solve this problem, clear the finalizers with the patch command of kubectl:
$ kubectl delete ratingrules.rating.smile.fr rating-rating-default-rules
# The command hangs

# On another terminal
$ kubectl get ratingrules.rating.smile.fr
NAME                          AGE
rating-rating-default-rules   3d

$ kubectl patch ratingrules.rating.smile.fr/rating-rating-default-rules -p '{"metadata":{"finalizers":[]}}' --type=merge
ratingrule.rating.smile.fr/rating-rating-default-rules patched

# Then
$ kubectl get ratingrules.rating.smile.fr
No resources found in rating namespace.
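
Coming back to the advice about waiting for the rating-operator-api to initialize: instead of checking /alive by hand, the wait can be scripted as a polling loop. The sketch below is only an illustration. It assumes the port-forward shown earlier is already running in another terminal; the function name wait_for_api and its default values (60 attempts, 10 seconds apart, roughly the 10 minutes mentioned above) are hypothetical.

```shell
# Poll the /alive endpoint until the rating-operator-api answers.
# Assumption: the port-forward to 5012 is already running elsewhere.
# wait_for_api and its defaults are hypothetical names for this sketch.
wait_for_api() {
  local url="${1:-http://localhost:5012/alive}"
  local attempts="${2:-60}"   # 60 attempts x 10 s is about 10 minutes
  local delay="${3:-10}"      # seconds between two attempts
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: never hang on one try
    if curl -fs --max-time 5 "$url" >/dev/null; then
      echo "API answered after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "API did not answer after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_api http://localhost:5012/alive 60 10
```

A non-zero exit status means the API never answered within the allotted attempts, which is usually a sign to look at the pod logs before going further.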

Common questions

The rook operator pod is not running correctly

You might see this message:

$ kubectl -n rook-ceph describe pods -l app=rook-ceph-operator
[...]
      Message:   failed to run operator. Error starting agent daemonset: error starting agent daemonset: failed to create rook-ceph-agent daemon set. DaemonSet.apps "rook-ceph-agent" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
[...]

The fix is to run kube-apiserver with the --allow-privileged flag. This configuration detail is specific to rook-ceph and you may not need it with other storage plugins.

In our case with Juju:

$ juju config kubernetes-master allow-privileged=true

If that's not the issue you have, look here:


I just installed the rating and nothing is happening

Because the rating watches the Reports generated by the metering-operator, you might have to wait up to an hour before seeing metrics. Reports are generated every hour at HH:00, so you can expect RatedMetrics soon after (within seconds; in our case, rating 500 frames per metric takes 1.4 seconds).
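
To estimate how long the wait will be, the time left until the next HH:00 can be computed from the current Unix timestamp. This is a minimal sketch: seconds_until_next_report is a hypothetical helper name, and it assumes Reports are aligned on whole hours of the epoch, which holds for UTC and for time zones with a whole-hour offset.

```shell
# Seconds remaining until the next HH:00, given a Unix timestamp.
# With no argument, the current time is used.
# Assumption: Reports are aligned on whole hours of the epoch (UTC).
seconds_until_next_report() {
  local now="${1:-$(date +%s)}"
  # At exactly HH:00 the next Report is a full hour (3600 s) away.
  echo $(( 3600 - now % 3600 ))
}

# Usage: echo "Next Reports in $(seconds_until_next_report) seconds"
```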


I waited until the next hour but nothing is happening

It might be related to the configuration versioning system. If you deploy the rating before any Reports have been generated, the operator keeps trying to get frames from a timeframe in which none will ever exist. By default, the operator looks for frames with a timestamp between 1970-01-01T00:00:00 and the moment the first RatingRules was deployed (the installation time, by default), and tries to rate those with the base configuration. If no frames are found, the operator simply waits. You can fix this situation by removing and recreating the base RatingRules (rating-rating-default-rules by default).

To better understand why this happens, read this.
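
Removing and recreating the base RatingRules can be sketched as a small helper. This is an illustration under assumptions, not the project's own procedure: recreate_rating_rules is a hypothetical function name, and it expects a path to a manifest of the RatingRules that you already have on disk. The resource name and the RATING_NAMESPACE variable are the ones used earlier in this document.

```shell
# Delete the base RatingRules, then re-create it from an existing manifest.
# recreate_rating_rules and the manifest path are hypothetical; adapt them.
recreate_rating_rules() {
  local name="${1:-rating-rating-default-rules}"
  local manifest="$2"
  if [ -z "$RATING_NAMESPACE" ]; then
    echo "RATING_NAMESPACE is not set" >&2
    return 1
  fi
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # Stop here if the deletion fails, so nothing is applied on top of it.
  kubectl -n "$RATING_NAMESPACE" delete ratingrules.rating.smile.fr "$name" || return 1
  kubectl -n "$RATING_NAMESPACE" apply -f "$manifest"
}

# Usage: recreate_rating_rules rating-rating-default-rules ./default-rules.yaml
```

If the delete step hangs, the resource is probably stuck on its finalizer; the kubectl patch command from the advice section applies.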


I cannot connect to Grafana, what is the password?

If you use the Grafana installed by the Prometheus operator, the credentials are:

  • Login: admin
  • Password: prom-operator

If these do not work for you, read the credentials from the Secret:

$ kubectl get secret prometheus-grafana -o yaml -n monitoring
apiVersion: v1
kind: Secret
data:
  admin-password: cHJvbS1vcGVyYXRvcg==
  admin-user: YWRtaW4=
[...]
$ echo "cHJvbS1vcGVyYXRvcg==" | base64 -d
prom-operator
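
The two steps above (dump the Secret, then decode by hand) can be combined by asking kubectl for a single key with a jsonpath output and piping it to base64. A minimal sketch, assuming the same Secret name (prometheus-grafana) and namespace (monitoring) as above; the helper names decode_secret_value and grafana_credential are hypothetical.

```shell
# Decode one base64-encoded value, as stored under .data in a Secret.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Read a single key from the Grafana Secret and print it decoded.
# Secret name and namespace are the ones used above; adjust to your install.
grafana_credential() {
  local key="$1"   # admin-user or admin-password
  local encoded
  encoded="$(kubectl get secret prometheus-grafana -n monitoring \
    -o "jsonpath={.data.${key}}")" || return 1
  decode_secret_value "$encoded"
}

# Usage:
#   grafana_credential admin-user
#   grafana_credential admin-password
```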

After configuration, I don't see any data in Grafana

If you can list and query the endpoints in Grafana but do not get any results, check the time range in the top right corner of the Grafana screen. The scalable rating produces data frames that always have a round timestamp, and the time range of Grafana is non-inclusive. To see data, query AT LEAST the last 3 hours. You will never encounter this problem if you are using the reactive mechanism.


I cannot get multi-tenancy to work through Grafana

We use cookie-based sessions to authenticate user queries to the rating-operator-api. You have to log in through the /login endpoint of the rating-operator-api, THEN log into Grafana. If you have configured the datasource properly, enabled the session cookie and activated Basic authentication, you can verify that the cookie is present after login through your web browser's interface. If you cannot, go back to configuring Grafana or check your browser's cookie settings.