Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to add more specific pod annotations #777

Closed
yann-soubeyrand opened this issue Sep 19, 2018 · 63 comments
Closed

Ability to add more specific pod annotations #777

yann-soubeyrand opened this issue Sep 19, 2018 · 63 comments
Assignees

Comments

@yann-soubeyrand
Copy link
Contributor

Is your feature request related to a problem? Please describe.
When making use of the Kubernetes filter plugin, one can use a pod's annotation to specify a parser which Fluent Bit could use to parse this pod's logs. One can also use a pod's annotation to exclude this pod's logs. However, a pod can be composed of several containers each of which can use different log formats. Moreover, the containers can log to their stdout and stderr streams which can also use different log formats.

Describe the solution you'd like
To solve this problem, Fluent Bit could support more specific pod annotations. For example, a pod could specify the following annotations:

fluentbit.io/parser/<container_name>/<(stdout|stderr)>
fluentbit.io/exclude/<container_name>/<(stdout|stderr)>

The order of precedence for the pod's annotations could be (from the highest precedence to the lowest):

fluentbit.io/parser/<container_name>/<(stdout|stderr)>
fluentbit.io/parser/<container_name>
fluentbit.io/parser/<(stdout|stderr)>
fluentbit.io/parser

Describe alternatives you've considered
As a workaround, I send raw logs to Fluentd and do static parsing there (static meaning without making use of pod's annotations).

Additional context
What I'm trying to accomplish is parsing and filtering Kubernetes logs with Fluent Bit prior to sending it to Elasticsearch or Fluentd. See #776 for more context.

@donbowman
Copy link
Contributor

donbowman commented Sep 19, 2018

+1. This occurs when using istio. It injects a sidecar that uses JSON format logging, where the 'main' container does not.

Kubernetes allows a single slash in an annotation, so the above suggested format is not legal.

a qualified name must consist of alphanumeric characters, '-', '_' or '.', 
and must start and end with an alphanumeric character (e.g. 'MyName', 
or 'my.name',  or '123-abc', regex used for validation is 
'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional 
DNS subdomain prefix and '/' (e.g. 'example.com/MyName')

So it could use a dash if we dropped the stdout/stderr: fluentbit.io/parser-container

I'm not sure how you'd get the stdout/stderr since the log file might be merged.

donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 21, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser-container`).
donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 21, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser-container`).
donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 21, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser-container`).

Signed-off-by: Don Bowman <[email protected]>
@donbowman
Copy link
Contributor

for comment and test, #789

I made this bit fluentbit.io/parser[-container]: parsername.

donbowman added a commit to donbowman/fluent-bit-docs that referenced this issue Sep 21, 2018
This addresses fluent/fluent-bit#777
to include `fluentbit.io/parser[-container]: parser`.

Signed-off-by: Don Bowman <[email protected]>
@yann-soubeyrand
Copy link
Contributor Author

In my opinion, being able to specify a different parser for stdout and stderr is a must: Apache access logs and error logs are fundamentally different for example. If we cannot use '/' as a delimiter, maybe we can use '_' as it is not allowed in container names if I remember correctly.

@donbowman
Copy link
Contributor

donbowman commented Sep 21, 2018 via email

@edsiper edsiper self-assigned this Sep 21, 2018
@donbowman
Copy link
Contributor

donbowman commented Sep 21, 2018

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/identifiers.md

shows the rules.

I guess we could do fluentbit.io/parser[-stdout|stderr][-container] @ the cost of an ambiguity if there is a container called stdout or stderr in a pod.

@edsiper
Copy link
Member

edsiper commented Sep 21, 2018

it's bad that we are restricted in the format by k8s spec...

I just did a quick look but what about:

fluentbit.io/parser.<stream>.<container>

where

  • stream: can be stdout, stderr or * for both (fluentbit.io/parser.*.mycontainer)

@edsiper
Copy link
Member

edsiper commented Sep 21, 2018

@bigkraig @homer6 @StevenACoffman, would you please review this proposal and provide some feedback from an usability point of view ?

@donbowman
Copy link
Contributor

donbowman commented Sep 22, 2018

I don't think * is allowed. If I try to use '*' I get this:

The Pod "user-db-7dcc9649dc-b72rs" is invalid: metadata.annotations: Invalid value: "foo.bar.*.x": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')

So I can use all instead of *.

So fluentbit.io/parser.<all|stdout|stderr>.container

@edsiper
Copy link
Member

edsiper commented Sep 22, 2018

@donbowman good point, yeah, 'all' sounds good

@bigkraig
Copy link

I went a similar path recently in the ALB Ingress controller. Should the container name come before the stream? It should go from least to most specific and I think having the option of not providing a stream makes sense.

I don’t know if you would configure parsers on things other than containers in an annotation, but if so you may want to add the ‘container’ word as an identifier as well

@bigkraig
Copy link

fluentbit.io/parsers.containers.CONTAINERNAME

fluentbit.io/parsers.containers.CONTAINERNAME.stdout

Sorry for the lack of quotes. I’m on a phone on my way to a soccer game 😃

donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 22, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser-container`).

Signed-off-by: Don Bowman <[email protected]>
donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 22, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser-container`).

Signed-off-by: Don Bowman <[email protected]>
@donbowman
Copy link
Contributor

donbowman commented Sep 22, 2018 via email

@donbowman
Copy link
Contributor

Getting the stream is not that simple. flb_filter_do() calls the filters, but the msgpack there has only 2 entries(time, log). It no longer has access to the raw docker log (which had stream: stdout in it).

I tried adding Decode_Field_As escaped stream to the docker parser, but we still only get the 2.

Still looking.

@Globegitter
Copy link

Just want to add that being able to differentiate between stdout/stderr is also very important for nginx logs as they differentiate there fundamentally and there is no way to control the error logs of nginx.

donbowman added a commit to donbowman/fluent-bit that referenced this issue Sep 24, 2018
This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser[.container][.stream]`).

Stream should be one of stdout|stderr. So example annotations could be:

```
fluentbit.io/parser.istio-init.stdout": "init_stdout"
"fluentbit.io/parser.session-db": "redis"
```

Signed-off-by: Don Bowman <[email protected]>
donbowman added a commit to donbowman/fluent-bit-docs that referenced this issue Sep 24, 2018
This addresses fluent/fluent-bit#777
to include `fluentbit.io/parser[-container]: parser`.

Signed-off-by: Don Bowman <[email protected]>
@donbowman
Copy link
Contributor

OK I have committed this change to the PR #789

this supports:

fluentbit.io/parser[.container][.stream]

as to the early comment about nginx, this will work for nginx in kubernetes.

@donbowman
Copy link
Contributor

there's a side-affect of this. Elasticsearch does not like a mix - match of
fluentbit.io/parser: foo
fluentbit.io/parser.bob: foo

It ends up wanting, in the 2nd case, to parse that path.

curl -s 'elasticsearch.logging:9200/logstash-2018.09.24/_mapping/flb_type/field/kubernetes.annotations.fluent*' | python3 -m json.tool
{
    "logstash-2018.09.24": {
        "mappings": {
            "flb_type": {
                "kubernetes.annotations.fluentbit.io/parser.keyword": {
                    "full_name": "kubernetes.annotations.fluentbit.io/parser.keyword",
                    "mapping": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "kubernetes.annotations.fluentbit.io/parser": {
                    "full_name": "kubernetes.annotations.fluentbit.io/parser",
                    "mapping": {
                        "io/parser": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 256
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

@edsiper
Copy link
Member

edsiper commented Sep 25, 2018 via email

@donbowman
Copy link
Contributor

its absolutely an issue. elastic stops working. it confuses it that a field is sometimes a scalar and sometimes an object.

you see the mapping: io/parser above?

So i think we cannot use a 'dot' as a delimiter in this way.

this is a more general issue, e.g. if anyone (other than us) has a field which is somtimes A, sometimes A.B, it will lock up. fluentbit keeps trying to log it, elastic keeps returning an error.

{"took":604,"errors":true,"items":[{"index":{"_index":"logstash-2018.09.25","_type":"flb_type","_id":"CbqPDmYBf8jg_1jQcB1A","status":400,"error":{"type":"mapper_parsing_exception","reason":"object mapping for [kubernetes.annotations.fluentbit.io/parser] tried to parse field [fluentbit.io/parser] as object, but found a concrete value"}}},{"index":{"_index":"logstash-2018.09.25","_type":"flb_type","_id":"CrqPDmYBf8jg_1jQcB1A","status":400,"error":{"type":"mapper_parsing_exception","reason":"object mapping for [kubernetes.annotations.fluentbit.io/parser] tried to parse field [fluentbit.io/parser] as object, but found a concrete value"}}},{"index":{"_index":"logstash-2018.09.25","_type":"flb_type","_id":"C7qPDmYBf8jg_1jQcB1A","status":400,"error":{"type":"mapper_parsing_exception","reason":"object mapping for [kubernetes.annotations.fluentbit.io/parser] tried to parse field [fluentbit.io/parser] as object, but found a concrete value"}}},{"index

@donbowman
Copy link
Contributor

Apologies for the image, but i'm lazy :)

this shows the issue, the fact the annotation is sometimes a scalar, sometimes an object.

image

@donbowman
Copy link
Contributor

A work-around is:

    [FILTER]
        Name record_modifier
        Match *
        Remove_key kubernetes.annotations.fluentbit.io/parser

so, what is the suggested solution?

  • Disallow the 'legacy' syntax of fluentbit.io/parser
  • Don't use '.' as a delimiter
  • remove these fields so we don't log them to elastic, hope it doesn't happen on others?

One solution is:

fluentbit.io/stdout-parser[-containername]
fluentbit.io/stderr-parser[-containername]
fluentbit.io/parser: name

Other suggestions?

@donbowman
Copy link
Contributor

why? the current method works for all container names without restriction.
I think the _ would as well.

Kubernetes has a bit of an opinion on this internal to it. Its log-file format:

shipping-notify-64db5dd76b-r9mb8_sock-shop_istio-init-d22b9937e84aca64ab211756e2f7e86c1150ebfb13f1f30a4cfe77abb91e1473.log

is effectively
-.log

since we don't need namespace or pod (its implicit in the meta data already), we can use the pattern of _ if you want to be consistent. its disallowed as a first-char in a container name.

@edsiper
Copy link
Member

edsiper commented Oct 3, 2018

do you mean like this ?:

  • fluentbit.io/parser_stderr[-container]
  • fluentbit.io/parser_stdout[-container]
  • fluentbit.io/parser[-container]
  • fluentbit.io/parser

@donbowman
Copy link
Contributor

or all _ should work too.

i don't think the 2nd last can be done. its again ambiguous. i think just:
fluentbit.io/parser_stderr[_container]
fluentbit.io/parser_stdout[_container]
fluentbit.io/parser

if you want the 2nd last, i've currently got it set so that if stderr parser is not set it uses stdout, but we could create an 'all' stream:

fluentbit.io/parser_all_container.

i'll ask again, the current implementation, that's not ok? its a lot of semantic discussion over the order of where 'stdout' goes :)

@donbowman
Copy link
Contributor

ok, from slack, proposal (final unless someone complains):

  • fluentbit.io/parser: X
  • fluentbit.io/parser_stdout[-container]: X
  • fluentbit.io/parser_stderr[-container: X
  • fluentbit.io/parser-container: X

the first case is the current case, so no change.
2nd/3rd case the tag fluent-bit is picking up is "parser_std??", followed by optional container name.
Last case is 'both streams' case

or, simplified:

  • fluentbit.io/parser[-container]: X
  • fluentbit.io/parser_stdout[-container]: X
  • fluentbit.io/parser_stderr[-container: X

There is no ambiguity since a container cannot start w/ _ nor -
and since we are using parser[_stream], not parser[-stream]

comments please? I would like to finish this today.

@yann-soubeyrand
Copy link
Contributor Author

Sorry to be insistent, but to me, it seems more logical to order annotation parts by specificity and to use _ everywhere to be consistent:

  • fluentbit.io/parser__
  • fluentbit.io/parser_
  • fluentbit.io/parser_

even if it forbids having a container named stderr or stdout (wouldn't it be weird anyway?).

@donbowman
Copy link
Contributor

this would mean a container name starting w/ stderr/stdout would be treated incorrectly.

@yann-soubeyrand
Copy link
Contributor Author

No, just two container names would be problematic: stdout and stderr. _ is forbidden in container names (to my knowledge). If a container name could contain _, it would be problematic for the log filename nomenclature ;-)

@donbowman
Copy link
Contributor

so why break this?
the argument is that fluent-bit has 3 keys:

  • parser_stdout
  • parser_stderr
  • parser

and then the value is separated by a -

why should we make a restriction on container names that is trivially worked around by using the -?

if that's the case, we can use what i currently have implemented, the stdout-parser, stderr-parser instead.

@edsiper
Copy link
Member

edsiper commented Oct 3, 2018

@yann-soubeyrand

the thinking about the nomenclature is:

Fluent Bit annotation will have 4 components:

  • prefix
  • config key
  • config key metadata (optional, container name for the issue in question)
  • config value

so config key metadata follows after a -, e.g: fluentbit.io/parser_stderr-mymetadata

@StevenACoffman
Copy link
Contributor

StevenACoffman commented Oct 4, 2018

I surveyed what the rest of the kubernetes ecosystem does, and I can find no examples mixing both underscores and dashes, and my instinct is to immediately try to correct it. Just look at this odd duck:

  • fluentbit.io/parser_stderr_my-container-name

The contribution by @donbowman is a genuine improvement as is:

  • fluentbit.io/stderr-parser[-container]
  • fluentbit.io/stdout-parser[-container]
  • fluentbit.io/parser[-container]

The only prominent alternatives in the kubernetes ecosystem are Istio and Prometheus.

  • Istio make the value be a json field and match within it.

  • Prometheus initially supported pod annotations (e.g. prometheus.io/scrape, prometheus.io/path, prometheus.io/port) but have deprecated it in favor of ServiceMonitor Custom Resource Definitions (CRDs) for reasons described here:

The annotation is not supported by the operator. ServiceMonitors should be
used instead. The annotation is very limited by its nature and there's no
feasible way to support anything beyond "scrape on" and a single port. A
per-service scrape interval, multi-port pods, authentication, honorLabels,
and more are all not possible. It also provides no meaningful way to assign
services to be scraped by different Prometheus instances.

ServiceMonitors allow all these without exposing any complex relabelling
and there's really no practical reason for sticking with the annotation.

I strongly suggest using the @donbowman solution as is, and if the descriptive power of annotations is insufficient in the future, then follow the example of either Prometheus or Istio.

@yann-soubeyrand
Copy link
Contributor Author

@yann-soubeyrand

the thinking about the nomenclature is:

Fluent Bit annotation will have 4 components:

* prefix

* config key

* config key metadata (optional, container name for the issue in question)

* config value

so config key metadata follows after a -, e.g: fluentbit.io/parser_stderr-mymetadata

OK, I get the point with config key and config key metadata ;-)

@edsiper
Copy link
Member

edsiper commented Oct 4, 2018

@StevenACoffman thanks for the comments.

My only concern with stdout-parser and stderr-parser is that they do not start with parser, which I think makes easier to understand the annotation context (I am open to restrict containers starting with stdout and stderr)

@StevenACoffman
Copy link
Contributor

BTW, I looked more at how Amabassador with Istio uses annotations with json values and it looks like that would require a lot of careful thought to get it right if fluent-bit went down that path.

If each pod is allowed to opt-in to self-service configuration of parser selection for it's logs via an annotation value of json something like:

  annotations:
    fluentbit.io/parser_config: |
      ---
      container:
        apache:
          streams:
            stderr: apache_error
            stdout: apache2
      container:
         my-app:
           streams:
             stdout: json
             stderr: universal

This would potentially preclude containers named "container" or "streams" or "stdout" or "stderr".

@edsiper
Copy link
Member

edsiper commented Oct 4, 2018

if that means that potentially we need to add a Yaml parser I would prefer to avoid such situation.

@donbowman
Copy link
Contributor

i believe it would come in as json (given we are using the json API).

That being said, can we get some consensus here? I'm strongly opposed to any solution that doesn't work with 100% of container names (given the current one does and it doesn't seem to be compelling why we should have some restriction).

The current implementation... Whats wrong with it other than some don't like the stream on the left side of the word parser? Its all 'dash' based, so that objection is gone, it works with all container names.

Failing that, if I do the complex annotation like istio uses, would that be ok?

I really want to get this finished, i've spent far more time on it than I would have liked.

@edsiper
Copy link
Member

edsiper commented Oct 4, 2018

@donbowman

If this option do not restrict any container I am happy with it:

  • fluentbit.io/parser: X
  • fluentbit.io/parser[-container]: X
  • fluentbit.io/parser_stdout[-container]: X
  • fluentbit.io/parser_stderr[-container: X

disclaimer: I care a lot about readability and context (specially about configs), having the config name on the left is a must for me.

donbowman added a commit to donbowman/fluent-bit that referenced this issue Oct 5, 2018
…container]

Discussion occured in [777](fluent#777 (comment))
This allows the user to select either a global behaviour for the pod,
or a per stream behaviour, or a per-stream + per-container behaviour.

Signed-off-by: Don Bowman <[email protected]>
@donbowman
Copy link
Contributor

PR is updated w/ the commit above

@edsiper
Copy link
Member

edsiper commented Oct 5, 2018

thanks @donbowman for the hard work on this and community for their continuous feedback.

@donbowman so PR #789 is updated ?

@donbowman
Copy link
Contributor

yes PR #789 789 is updated

edsiper pushed a commit that referenced this issue Oct 5, 2018
…netes. (#789)

* Add container-specific annotation parser for Kubernetes.

This addresses #777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser[.container][.stream]`).

Stream should be one of stdout|stderr. So example annotations could be:

```
fluentbit.io/parser.istio-init.stdout": "init_stdout"
"fluentbit.io/parser.session-db": "redis"
```

Signed-off-by: Don Bowman <[email protected]>

* Change Kubernetes parser annotation to fluentbit.io/parser[_stream][-container]

Discussion occured in [777](#777 (comment))
This allows the user to select either a global behaviour for the pod,
or a per stream behaviour, or a per-stream + per-container behaviour.

Signed-off-by: Don Bowman <[email protected]>
@edsiper
Copy link
Member

edsiper commented Oct 5, 2018

got it. #789 merged, thanks!

@edsiper edsiper closed this as completed Oct 5, 2018
yann-soubeyrand pushed a commit to yann-soubeyrand/fluent-bit that referenced this issue Oct 5, 2018
…netes. (fluent#789)

* Add container-specific annotation parser for Kubernetes.

This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser[.container][.stream]`).

Stream should be one of stdout|stderr. So example annotations could be:

```
fluentbit.io/parser.istio-init.stdout": "init_stdout"
"fluentbit.io/parser.session-db": "redis"
```

Signed-off-by: Don Bowman <[email protected]>

* Change Kubernetes parser annotation to fluentbit.io/parser[_stream][-container]

Discussion occured in [777](fluent#777 (comment))
This allows the user to select either a global behaviour for the pod,
or a per stream behaviour, or a per-stream + per-container behaviour.

Signed-off-by: Don Bowman <[email protected]>
@yann-soubeyrand
Copy link
Contributor Author

@donbowman Thanks for your work!

@edsiper Could you please reopen this issue since only the parser annotation has been addressed and the issue was about the exclude annotation as well? Or do you prefer to track it in a new issue?

@edsiper
Copy link
Member

edsiper commented Oct 5, 2018

@yann-soubeyrand new issue please.

yann-soubeyrand pushed a commit to yann-soubeyrand/fluent-bit that referenced this issue Oct 29, 2018
…netes. (fluent#789)

* Add container-specific annotation parser for Kubernetes.

This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser[.container][.stream]`).

Stream should be one of stdout|stderr. So example annotations could be:

```
fluentbit.io/parser.istio-init.stdout": "init_stdout"
"fluentbit.io/parser.session-db": "redis"
```

Signed-off-by: Don Bowman <[email protected]>

* Change Kubernetes parser annotation to fluentbit.io/parser[_stream][-container]

Discussion occured in [777](fluent#777 (comment))
This allows the user to select either a global behaviour for the pod,
or a per stream behaviour, or a per-stream + per-container behaviour.

Signed-off-by: Don Bowman <[email protected]>
yann-soubeyrand pushed a commit to yann-soubeyrand/fluent-bit that referenced this issue Nov 26, 2018
…netes. (fluent#789)

* Add container-specific annotation parser for Kubernetes.

This addresses fluent#777.

Annotations on a Pod currently include `fluentbit.io/parser`. This
adds an optional -container (e.g. `fluentbit.io/parser[.container][.stream]`).

Stream should be one of stdout|stderr. So example annotations could be:

```
fluentbit.io/parser.istio-init.stdout": "init_stdout"
"fluentbit.io/parser.session-db": "redis"
```

Signed-off-by: Don Bowman <[email protected]>

* Change Kubernetes parser annotation to fluentbit.io/parser[_stream][-container]

Discussion occured in [777](fluent#777 (comment))
This allows the user to select either a global behaviour for the pod,
or a per stream behaviour, or a per-stream + per-container behaviour.

Signed-off-by: Don Bowman <[email protected]>
rawahars pushed a commit to rawahars/fluent-bit that referenced this issue Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants