Create documentation about incidents and metrics #3134

Merged: 4 commits, Jun 25, 2018
15 changes: 15 additions & 0 deletions docs/component-statuses.md
@@ -0,0 +1,15 @@
# Component Statuses

Unlike incident statuses, component statuses in Cachet are numbered starting from 1.
When creating or updating a component, you need to specify a status for it.

A status can be one of the following:

Status|Name|Description
------|----|-----------
1|Operational|The component is working.
2|Performance Issues|The component is experiencing some slowness.
3|Partial Outage|The component may not be working for everybody. This could be a geographical issue, for example.
4|Major Outage|The component is not working for anybody.
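
If you set component statuses from a script, it can help to give the numeric codes names. The following is a minimal sketch of the table above as a Python enum. The `component_payload` helper and the `status` field name are assumptions made for illustration; check them against the API documentation before relying on them.

```python
from enum import IntEnum


class ComponentStatus(IntEnum):
    """Component status codes, as listed in the table above."""

    OPERATIONAL = 1
    PERFORMANCE_ISSUES = 2
    PARTIAL_OUTAGE = 3
    MAJOR_OUTAGE = 4


def component_payload(status: ComponentStatus) -> dict:
    """Build the body of a component update.

    Hypothetical helper: the field name "status" is an assumption.
    """
    return {"status": int(status)}


# Example: the body that would mark a component as fully down.
payload = component_payload(ComponentStatus.MAJOR_OUTAGE)
```

Because the statuses start at 1, `ComponentStatus(0)` raises a `ValueError`, which catches off-by-one mistakes early.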


41 changes: 41 additions & 0 deletions docs/incidents/index.md
@@ -0,0 +1,41 @@
# Incidents

An incident is something that should not happen, but happens anyway.

## What exactly is an incident

Your status page shows the state of some components. A component may be a
server, a database, or whatever you want.
If your database server crashes, that is an incident.

## Why should I create an incident

Having a status page is good; being honest about the state of your
components is better.
A status page is not only there to show a green light. It is also there to show
why something bad is happening and when it will be fixed.

So, when one of your components has a problem, it is good practice to create an
incident.

## How to use incidents

During an incident, the status page should stay up to date with what is
happening in the real world. That is what _incident updates_ are for.

How you manage your incidents is up to you, but if you are not sure where to
start, you can follow this process:

1. An incident happens. While the team works on a fix, someone creates an
incident on the status page. Be clear about what is happening. At the same
time, set the affected component to the right status (_Major Outage_,
_Performance Issues_ or another).
2. Once you have identified the origin of the problem, add an _incident update_
explaining what the problem is and how serious it is.
3. When you think the problem is fixed but are not sure, add an _incident
update_ saying so: it should be fixed, and you are watching to confirm that
everything stays healthy.
4. If it is not fixed, add an _incident update_ as in step 2, because the
problem is identified but not fixed. If it is fixed, congratulations! Add an
_incident update_ that explains the details and states that the problem is
resolved. Do not forget to set the component back to _Operational_.
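
The process above can be read as a small state machine. The sketch below models it in Python purely as an illustration: the `Step` names and the `ALLOWED` transition table are assumptions derived from the four steps, not something Cachet enforces, and you are free to manage incidents differently.

```python
from enum import Enum


class Step(Enum):
    """Stages of the suggested process (names are illustrative)."""

    INVESTIGATING = "investigating"  # step 1: incident created
    IDENTIFIED = "identified"        # step 2: origin of the problem known
    WATCHING = "watching"            # step 3: probably fixed, monitoring
    FIXED = "fixed"                  # step 4: confirmed fixed


# Which stage may follow which, according to the steps above.
ALLOWED = {
    Step.INVESTIGATING: {Step.IDENTIFIED},
    Step.IDENTIFIED: {Step.WATCHING},
    Step.WATCHING: {Step.IDENTIFIED, Step.FIXED},  # step 4: back to 2, or done
    Step.FIXED: set(),
}


def next_update(current: Step, target: Step) -> Step:
    """Return `target` if the suggested process allows moving to it."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

For example, an incident that turns out not to be fixed goes from `WATCHING` back to `IDENTIFIED` before it can reach `FIXED`.
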
54 changes: 54 additions & 0 deletions docs/metrics/create-metric.md
@@ -0,0 +1,54 @@
# Create a metric

This guide walks you through creating a metric.
You should already know [what a metric is][1].

## Filling the form

Creating a metric is as simple as filling in a form. You just need to know what
the fields mean.

To open the metric creation form, follow these steps:

- Log into your Cachet instance.
- Once on the Dashboard, click `Metrics` in the sidebar.
- Click the `Create a metric` button.

And you are there! You should now see the metric form.
Here is what each field means:

- `Name`: The name of the metric as it will be shown on the status page.
  Example: "API response time".
- `Suffix`: The suffix added in the tooltip when you hover over a point on the
  metric. It is usually the unit of the raw data. Example: "ms". If you send
  "42" to the metric, "42ms" is shown in the tooltip.
- `Description`: A description of the metric. What is it used for? What does it
  measure? Example: "The average response time of our API".
- `Calculation of metrics`: The computation applied to your data before it is
  displayed in the metric. It may be either _Sum_ or _Average_. Example:
  _Average_ to compute the average response time over a given period.
- `Default view`: The time range shown by default. Data from a year ago is
  rarely useful, but whether you first see the last hour, 12 hours, week or
  month is a matter of preference. Example: _Last 12 hours_ if you want to see
  the last 12 hours of data by default. This is only the default view; it can
  be changed in a select box.
- `Decimal places`: The number of decimal places displayed for each point. If
  you compute an average, you will almost certainly get a value with many
  decimals, like 42.424242. Example: 2 to get 42.42 instead of a long number.
- `How many minutes of threshold between metric points?`: The number of minutes
  between points in the metric. Depending on your needs, it may be 1, 5 or even
  30 minutes. It is really up to you. Example: 60 to get one point every hour.
- `Display chart on the status page?`: If checked, the chart is displayed on
  the status page. You can also create the metric without showing it.
- `Visibility`: Who should be able to see the chart? You have three choices:
  - `Visible to authenticated users`: Only authenticated users can see the
    chart. Useful for an internal metric.
  - `Visible to everybody`: Every user, authenticated or not, can see the
    chart.
  - `Always hidden`: Nobody can see the chart.
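
To see how `Calculation of metrics`, the threshold between points, `Decimal places` and `Suffix` fit together, here is a rough Python sketch. It is not Cachet's implementation: the `aggregate` and `tooltip` functions are hypothetical, and the way samples are grouped into buckets is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(points, threshold_minutes, calculation="average"):
    """Group (minute, value) samples into buckets of `threshold_minutes`.

    Returns {bucket_start_minute: combined_value}, using either the sum or
    the average of the samples that fall into each bucket.
    """
    buckets = defaultdict(list)
    for minute, value in points:
        bucket_start = (minute // threshold_minutes) * threshold_minutes
        buckets[bucket_start].append(value)
    combine = sum if calculation == "sum" else mean
    return {start: combine(values) for start, values in sorted(buckets.items())}


def tooltip(value, decimal_places, suffix):
    """Format a point the way the tooltip is described: rounded, plus suffix."""
    return f"{value:.{decimal_places}f}{suffix}"


# Three samples, a 60-minute threshold: one point per hour.
samples = [(0, 40), (30, 44), (60, 50)]
hourly_average = aggregate(samples, 60, "average")
hourly_sum = aggregate(samples, 60, "sum")
label = tooltip(42.424242, 2, "ms")
```

With _Average_, the two samples in the first hour become a single point; with _Sum_, they are added together instead.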



[1]: index.md
32 changes: 32 additions & 0 deletions docs/metrics/index.md
@@ -0,0 +1,32 @@
# Metrics

This guide explains the basics of metrics.

## What are metrics

When you monitor your services, servers, APIs or anything else, you collect
raw data. This data may be the response time to a request, the number of
queries handled in a minute, etc.

Metrics are this raw data. Using the [Cachet API][1], you can send data
about what you are monitoring to Cachet.
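
As a rough idea of what sending a point could look like, the sketch below builds (but does not send) an HTTP request with Python's standard library. The endpoint path, the `value` field and the `X-Cachet-Token` header are assumptions here and should be checked against the API documentation; `build_point_request` and the example URL are made up for illustration.

```python
import json
from urllib.request import Request


def build_point_request(base_url, metric_id, value, api_token):
    """Build a POST request that would add one point to a metric.

    Assumed (verify in the API documentation): the path
    /api/v1/metrics/<id>/points, a JSON body with a "value" field,
    and authentication through an X-Cachet-Token header.
    """
    url = f"{base_url.rstrip('/')}/api/v1/metrics/{metric_id}/points"
    body = json.dumps({"value": value}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Cachet-Token": api_token,
        },
    )


# Example with placeholder values; nothing is sent over the network here.
request = build_point_request("https://status.example.com/", 1, 42, "secret")
```

To actually send it, you would pass `request` to `urllib.request.urlopen` and handle any errors it raises.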


## What metrics can do for you

Having good metrics to show can be great for customers or partners.

Is your web service under heavy load? Then a short response time matters, and
a metric can show your users that the web service is responding quickly.
Imagine you have a metric named "Response time". Every 10 seconds you call your
web service and send the response time to that metric through the Cachet API.
On your status page you will then see, for example, the average response time
per minute.

That way, your users could see that over the last 10 minutes your response
time was worse than before, and that it is starting to improve.
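
The "Response time" scenario can be sketched in a few lines of Python. This is only an illustration under assumptions: `measure_ms` and `minute_average` are hypothetical helpers, the web service call is passed in as a plain function, and the sample values are invented for the example.

```python
import time
from statistics import mean


def measure_ms(call, clock=time.perf_counter):
    """Return how long `call()` takes, in milliseconds."""
    start = clock()
    call()
    return (clock() - start) * 1000.0


def minute_average(samples_ms):
    """Average the samples taken during one minute (six at one per 10 s)."""
    return mean(samples_ms)


# Six invented samples, one every 10 seconds, averaged into one point.
samples = [120, 130, 110, 140, 125, 125]
average_ms = minute_average(samples)
```

In a real probe, `call` would perform the request to your web service, and each measured value would be sent to the metric as it is collected.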



[1]: api-documentation.md