record_transformer remove_keys not working as expected #2109

Closed
aleks-mariusz opened this issue Aug 22, 2018 · 7 comments
aleks-mariusz commented Aug 22, 2018

I have a pipeline set up to get Kubernetes-based log messages into Elasticsearch using Fluentd. The logs originate from Docker containers using the systemd logging driver; most examples online assume separate JSON files in /var/log/containers, which Docker seems to have moved away from recently.

The messages arrive in Elasticsearch fine, but along the path the log takes, I am trying to get Fluentd to delete some keys that are excessive and unnecessary. After using field_map in the systemd_entry block, I am using the record_transformer's remove_keys option inside a <filter> block, but certain keys do not get deleted. I'm wondering whether this is a bug or whether I am just using this functionality incorrectly.

Fluentd version: 1.2.4, running inside a Docker (1.13.1) container deployed by Kubernetes (1.11.2).

The Gemfile used to install Fluentd:

source 'https://rubygems.org'

gem 'fluentd', '<=1.2.4'
gem 'activesupport', '~>5.2.1'
gem 'fluent-plugin-kubernetes_metadata_filter', '~>2.1.2'
gem 'fluent-plugin-elasticsearch', '~>2.11.5'
gem 'fluent-plugin-systemd', '~>1.0.1'
gem 'fluent-plugin-detect-exceptions', '~>0.0.11'
gem 'fluent-plugin-prometheus', '~>1.0.1'
gem 'fluent-plugin-multi-format-parser', '~>1.0.0'
gem 'fluent-plugin-rewrite-tag-filter', '~>2.1.0'
gem 'oj', '~>3.6.5'

The Dockerfile used to create the container:

FROM debian:stretch-slim

ARG DEBIAN_FRONTEND=noninteractive

COPY clean-apt /usr/bin
COPY clean-install /usr/bin
COPY Gemfile /Gemfile

# 1. Install & configure dependencies.
# 2. Install fluentd via ruby.
# 3. Remove build dependencies.
# 4. Cleanup leftover caches & files.
RUN BUILD_DEPS="make gcc g++ libc6-dev ruby-dev libffi-dev" \
    && clean-install $BUILD_DEPS \
                     ca-certificates \
                     libjemalloc1 \
                     ruby \
    && echo 'gem: --no-document' >> /etc/gemrc \
    && gem install --file Gemfile \
    && apt-get purge -y --auto-remove \
                     -o APT::AutoRemove::RecommendsImportant=false \
                     $BUILD_DEPS \
    && clean-apt \
    # Ensure fluent has enough file descriptors
    && ulimit -n 65536

# Copy the Fluentd configuration file for logging Docker container logs.
COPY fluent.conf /etc/fluent/fluent.conf
COPY run.sh /run.sh

# Expose prometheus metrics.
EXPOSE 80

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

# Start Fluentd to pick up our config that watches Docker container logs.
CMD /run.sh $FLUENTD_ARGS

The configuration in question:

<source>
  @id journald
  @type systemd
  <storage>
    persistent true
    path /var/log/journald.pos
  </storage>
  tag journal
</source>

# rename some fields to be more obvious
<filter journal>
  @type systemd_entry
  field_map {"_PID": "system.pid", "_CMDLINE": "system.argv", "_COMM": "system.process-name", "_EXE": "system.exe", "_GID": "system.gid", "_UID": "system.uid", "_PRIORITY": "system.priority", "PRIORITY": "system.priority", "_HOSTNAME": "system.name"}
  field_map_strict false
  fields_strip_underscores true
</filter>

# get rid of some useless keys
<filter journal>
  @type record_transformer
  remove_keys BOOT_ID,CAP_EFFECTIVE,MACHINE_ID,SOURCE_REALTIME_TIMESTAMP,STREAM_ID,SYSTEMD_CGROUP,TRANSPORT,SOURCE_MONOTONIC_TIMESTAMP
</filter>

#Tag Kubernetes containers
<match journal>
  @type rewrite_tag_filter
  <rule>
    key SYSLOG_IDENTIFIER
    pattern /^(.*)/
    tag systemd.$1
  </rule>
  <rule>
    key CONTAINER_NAME
    pattern /^k8s_/
    tag kubernetes.journal.container
  </rule>
</match>

#Tell kubernetes_metadata that the logs are coming from journal
<filter kubernetes.journal.container>
  @type kubernetes_metadata
  use_journal true
</filter>

# rewrite_tag_filter does not support nested fields like
# kubernetes.container_name, so this exists to flatten the fields
# so we can use them in our rewrite_tag_filter
<filter kubernetes.journal.container>
  @type record_transformer
  enable_ruby true
  <record>
    kubernetes_namespace_container_name ${record["kubernetes"]["namespace_name"]}.${record["kubernetes"]["container_name"]}
  </record>
</filter>

# retag based on the namespace and container name of the log message
<match kubernetes.journal.container>
  @type rewrite_tag_filter
  # Update the tag to have a structure of kube.<namespace>.<containername>
  <rule>
    key kubernetes_namespace_container_name
    pattern /^(.+)$/
    tag kube.$1
  </rule>
</match>

<filter kube.**>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_namespace_container_name,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,CONTAINER_TAG,$["system.argv"],$["system.exe"],$["system.process-name"],$["kubernetes"]["container_image_id"],$["kubernetes"]["master_url"],$["kubernetes"]["namespace_id"],$["kubernetes"]["pod_id"]
</filter>

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>

And finally, what the entry looks like arriving in elasticsearch:

{
  "_index": "logstash-2018.08.22",
  "_type": "fluentd",
  "_id": "v84vYWUB_6s2akbsIdM-",
  "_version": 1,
  "_score": null,
  "_source": {
    "system.pid": "620",
    "system.argv": "/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver overlay2",
    "system.process-name": "dockerd-current",
    "system.exe": "/usr/bin/dockerd-current",
    "system.gid": "0",
    "system.uid": "0",
    "system.priority": "6",
    "system.name": "k8lab2",
    "MESSAGE": "k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: watch of *v1beta1.Event ended with: The resourceVersion for the provided watch is too old.",
    "docker": {
      "container_id": "64d860cc1f48ea1e1e4a2e42b9095b428d9ec1501d1ea04a8e8352c19b79174f"
    },
    "kubernetes": {
      "container_name": "kube-controller-manager",
      "namespace_name": "kube-system",
      "pod_name": "kube-controller-manager-k8lab2",
      "container_image": "k8s.gcr.io/kube-controller-manager-amd64:v1.11.2",
      "labels": {
        "component": "kube-controller-manager",
        "tier": "control-plane"
      },
      "host": "k8lab2"
    },
    "log.level": "W",
    "app.k8s.timestamp": "0822 10:29:57.341593",
    "app.k8s.pid": "1",
    "app.k8s.source": "reflector.go:341",
    "syslog.severity": 4,
    "@timestamp": "2018-08-22T10:29:57.410725103+00:00",
    "tag": "kube.kube-system.kube-controller-manager"
  },
  "fields": {
    "@timestamp": [
      "2018-08-22T10:29:57.410Z"
    ]
  },
  "sort": [
    1534933797410
  ]
}

Basically, my goal is to get rid of the system.argv and system.process-name keys; this is attempted in the next-to-last <filter> block (right before the <match **> above).

@cosmo0920 (Contributor) commented:

You can use $.system.argv and $.system.process-name, or $["system"]["argv"] and $["system"]["process-name"], instead of $["system.argv"] and $["system.process-name"].
You shouldn't use . inside a nested record accessor specifier.
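As a concrete sketch (not a verified fix), the last record_transformer filter rewritten with the suggested bracket accessors might look like this. Note the assumption: this form applies only if the system.* fields are truly nested under a system hash; if they are flat keys whose names contain dots (as the Elasticsearch document above suggests), the $['system.argv'] form is needed instead, which is what the rest of this thread turns out to hinge on:

```
<filter kube.**>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_namespace_container_name,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,CONTAINER_TAG,$['system']['argv'],$['system']['exe'],$['system']['process-name'],$['kubernetes']['container_image_id'],$['kubernetes']['master_url'],$['kubernetes']['namespace_id'],$['kubernetes']['pod_id']
</filter>
```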


grpubr commented Nov 27, 2018

I also hit this bug. To simplify the steps to reproduce, here is my Gemfile:

source 'https://rubygems.org'

gem 'fluentd', '<=1.2.4'
gem 'activesupport', '~>5.2.1'
gem 'fluent-plugin-concat', '~>2.3.0'
gem 'fluent-plugin-detect-exceptions', '~>0.0.11'
gem 'fluent-plugin-elasticsearch', '~>2.11.5'
gem 'fluent-plugin-kubernetes_metadata_filter', '~>2.0.0'
gem 'fluent-plugin-multi-format-parser', '~>1.0.0'
gem 'fluent-plugin-prometheus', '~>1.0.1'
gem 'fluent-plugin-systemd', '~>1.0.1'
gem 'oj', '~>3.6.5'

The second filter tries to remove the key 'foo.bar':

<filter kubernetes.**>
  @type record_transformer
  <record>
    foo.bar test1234
  </record>
</filter>
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  # remove_keys $['foo.bar']   # not working
  remove_keys foo.bar          # working
</filter>

According to the record_accessor syntax, they should be the same:

Syntax

  • dot notation: a parameter starting with $.; chain fields with .

    This is the simple syntax. For example, $.event.level for record["event"]["level"], and $.key1[0].key2 for record["key1"][0]["key2"].

  • bracket notation: a parameter starting with $[; chain fields with []

    Useful for special characters such as .: $['dot.key'][0]['space key'] for record["dot.key"][0]["space key"].


cosmo0920 commented Nov 27, 2018

@grpubr How about using remove_keys $['foo']['bar']?
Is the current syntax complicated for you?


grpubr commented Nov 27, 2018

@cosmo0920

remove_keys $['foo']['bar'] means that the nested key should be removed. It works as designed, e.g.:

{
  "foo": {
    "bar": "test1234"
  }
}

However, we are trying to remove a key containing a dot, which does not work as expected, e.g.:

{
"foo.bar": "test1234"
}

They are not the same.
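The distinction can be sketched in plain Ruby (a standalone illustration, not Fluentd internals): a flat key whose name happens to contain a dot lives at the top level of the record, while a nested key requires descending into a sub-hash first.

```ruby
# A record with a single flat key whose *name* contains a dot:
flat = { "foo.bar" => "test1234", "message" => "Hello" }

# A record with a genuinely nested key:
nested = { "foo" => { "bar" => "test1234" }, "message" => "Hello" }

# Removing the flat dotted key is one top-level delete;
# this is what remove_keys "$['foo.bar']" is meant to address:
flat.delete("foo.bar")

# Removing the nested key descends one level first;
# this is what remove_keys "$['foo']['bar']" addresses:
nested["foo"].delete("bar")

# flat is left with only "message"; nested keeps an empty "foo" hash.
```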

@cosmo0920 (Contributor) commented:

I asked @okkez to investigate why the remove_keys "$['foo.bar']" notation is not working.

okkez added a commit to okkez/fluentd that referenced this issue Nov 27, 2018
In the previous version, the following configuration does not work properly:

```
<source>
  @type dummy
  dummy [
    {"foo.bar": "test1234", "message": "Hello"}
  ]
  tag dummy
</source>

<filter dummy>
  @type record_transformer
  remove_keys "$['foo.bar']"
</filter>

<match dummy>
  @type stdout
</match>
```

This shows the following:

```
2018-11-27 15:19:18 +0900 [info]: #0 fluentd worker is now running worker=0
2018-11-27 15:19:19.008586045 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
2018-11-27 15:19:20.009721132 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
2018-11-27 15:19:21.010784035 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
```

In this version, it works well.

See also fluent#2109

Signed-off-by: Kenji Okimoto <okimoto@clear-code.com>

okkez commented Nov 27, 2018

@grpubr @aleks-mariusz Try #2192 please.


okkez commented Nov 29, 2018

Released v1.3.1.

@okkez okkez closed this as completed Nov 29, 2018