record_transformer remove_keys not working as expected #2109

Closed
aleks-mariusz opened this issue Aug 22, 2018 · 7 comments
aleks-mariusz commented Aug 22, 2018

I have a pipeline set up to get Kubernetes-based log messages into Elasticsearch using Fluentd. The logs originate from Docker containers using the systemd logging driver; most examples online assume separate JSON files in /var/log/containers, which Docker seems to have moved away from recently.

The messages arrive in Elasticsearch fine, but along the path the log takes, I am trying to get Fluentd to delete some keys that are excessive and unnecessary. After using field_map in the systemd_entry block, I am using the record_transformer's remove_keys option inside a <filter> block, but certain keys do not get deleted. I'm wondering whether this is a bug or whether I am just using this functionality incorrectly.

Fluentd version: 1.2.4, running inside a Docker (1.13.1) container deployed by Kubernetes (1.11.2).

The Gemfile used to install Fluentd:

source 'https://rubygems.org'

gem 'fluentd', '<=1.2.4'
gem 'activesupport', '~>5.2.1'
gem 'fluent-plugin-kubernetes_metadata_filter', '~>2.1.2'
gem 'fluent-plugin-elasticsearch', '~>2.11.5'
gem 'fluent-plugin-systemd', '~>1.0.1'
gem 'fluent-plugin-detect-exceptions', '~>0.0.11'
gem 'fluent-plugin-prometheus', '~>1.0.1'
gem 'fluent-plugin-multi-format-parser', '~>1.0.0'
gem 'fluent-plugin-rewrite-tag-filter', '~>2.1.0'
gem 'oj', '~>3.6.5'

The Dockerfile used to create the container:

FROM debian:stretch-slim

ARG DEBIAN_FRONTEND=noninteractive

COPY clean-apt /usr/bin
COPY clean-install /usr/bin
COPY Gemfile /Gemfile

# 1. Install & configure dependencies.
# 2. Install fluentd via ruby.
# 3. Remove build dependencies.
# 4. Cleanup leftover caches & files.
RUN BUILD_DEPS="make gcc g++ libc6-dev ruby-dev libffi-dev" \
    && clean-install $BUILD_DEPS \
                     ca-certificates \
                     libjemalloc1 \
                     ruby \
    && echo 'gem: --no-document' >> /etc/gemrc \
    && gem install --file Gemfile \
    && apt-get purge -y --auto-remove \
                     -o APT::AutoRemove::RecommendsImportant=false \
                     $BUILD_DEPS \
    && clean-apt \
    # Ensure fluent has enough file descriptors
    && ulimit -n 65536

# Copy the Fluentd configuration file for logging Docker container logs.
COPY fluent.conf /etc/fluent/fluent.conf
COPY run.sh /run.sh

# Expose prometheus metrics.
EXPOSE 80

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

# Start Fluentd to pick up our config that watches Docker container logs.
CMD /run.sh $FLUENTD_ARGS

The configuration in question:

<source>
  @id journald
  @type systemd
  <storage>
    persistent true
    path /var/log/journald.pos
  </storage>
  tag journal
</source>

# rename some fields to be more obvious
<filter journal>
  @type systemd_entry
  field_map {"_PID": "system.pid", "_CMDLINE": "system.argv", "_COMM": "system.process-name", "_EXE": "system.exe", "_GID": "system.gid", "_UID": "system.uid", "_PRIORITY": "system.priority", "PRIORITY": "system.priority", "_HOSTNAME": "system.name"}
  field_map_strict false
  fields_strip_underscores true
</filter>

# get rid of some useless keys
<filter journal>
  @type record_transformer
  remove_keys BOOT_ID,CAP_EFFECTIVE,MACHINE_ID,SOURCE_REALTIME_TIMESTAMP,STREAM_ID,SYSTEMD_CGROUP,TRANSPORT,SOURCE_MONOTONIC_TIMESTAMP
</filter>

#Tag Kubernetes containers
<match journal>
  @type rewrite_tag_filter
  <rule>
    key SYSLOG_IDENTIFIER
    pattern /^(.*)/
    tag systemd.$1
  </rule>
  <rule>
    key CONTAINER_NAME
    pattern /^k8s_/
    tag kubernetes.journal.container
  </rule>
</match>

#Tell kubernetes_metadata that the logs are coming from journal
<filter kubernetes.journal.container>
  @type kubernetes_metadata
  use_journal true
</filter>

# rewrite_tag_filter does not support nested fields like
# kubernetes.container_name, so this exists to flatten the fields
# so we can use them in our rewrite_tag_filter
<filter kubernetes.journal.container>
  @type record_transformer
  enable_ruby true
  <record>
    kubernetes_namespace_container_name ${record["kubernetes"]["namespace_name"]}.${record["kubernetes"]["container_name"]}
  </record>
</filter>

# retag based on the namespace and container name of the log message
<match kubernetes.journal.container>
  @type rewrite_tag_filter
  # Update the tag to have a structure of kube.<namespace>.<containername>
  <rule>
    key kubernetes_namespace_container_name
    pattern /^(.+)$/
    tag kube.$1
  </rule>
</match>

<filter kube.**>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_namespace_container_name,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,CONTAINER_TAG,$["system.argv"],$["system.exe"],$["system.process-name"],$["kubernetes"]["container_image_id"],$["kubernetes"]["master_url"],$["kubernetes"]["namespace_id"],$["kubernetes"]["pod_id"]
</filter>

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>

And finally, what the entry looks like arriving in elasticsearch:

{
  "_index": "logstash-2018.08.22",
  "_type": "fluentd",
  "_id": "v84vYWUB_6s2akbsIdM-",
  "_version": 1,
  "_score": null,
  "_source": {
    "system.pid": "620",
    "system.argv": "/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver overlay2",
    "system.process-name": "dockerd-current",
    "system.exe": "/usr/bin/dockerd-current",
    "system.gid": "0",
    "system.uid": "0",
    "system.priority": "6",
    "system.name": "k8lab2",
    "MESSAGE": "k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: watch of *v1beta1.Event ended with: The resourceVersion for the provided watch is too old.",
    "docker": {
      "container_id": "64d860cc1f48ea1e1e4a2e42b9095b428d9ec1501d1ea04a8e8352c19b79174f"
    },
    "kubernetes": {
      "container_name": "kube-controller-manager",
      "namespace_name": "kube-system",
      "pod_name": "kube-controller-manager-k8lab2",
      "container_image": "k8s.gcr.io/kube-controller-manager-amd64:v1.11.2",
      "labels": {
        "component": "kube-controller-manager",
        "tier": "control-plane"
      },
      "host": "k8lab2"
    },
    "log.level": "W",
    "app.k8s.timestamp": "0822 10:29:57.341593",
    "app.k8s.pid": "1",
    "app.k8s.source": "reflector.go:341",
    "syslog.severity": 4,
    "@timestamp": "2018-08-22T10:29:57.410725103+00:00",
    "tag": "kube.kube-system.kube-controller-manager"
  },
  "fields": {
    "@timestamp": [
      "2018-08-22T10:29:57.410Z"
    ]
  },
  "sort": [
    1534933797410
  ]
}

Basically, my goal is to get rid of the system.argv and system.process-name keys; this is attempted in the next-to-last <filter> block (right before the <match **> above).

@cosmo0920 (Contributor) commented:

You can use $.system.argv and $.system.process-name, or $["system"]["argv"] and $["system"]["process-name"], instead of $["system.argv"] and $["system.process-name"].
You shouldn't use . inside a nested record accessor specifier.
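As a concrete sketch (not a verified fix), the last record_transformer filter rewritten with the suggested bracket accessors might look like this. Note the assumption: this form applies only if the system.* fields are truly nested under a system hash; if they are flat keys whose names contain dots (as the Elasticsearch document above suggests), the $['system.argv'] form is needed instead, which is what the rest of this thread turns out to hinge on:

```
<filter kube.**>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_namespace_container_name,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,CONTAINER_TAG,$['system']['argv'],$['system']['exe'],$['system']['process-name'],$['kubernetes']['container_image_id'],$['kubernetes']['master_url'],$['kubernetes']['namespace_id'],$['kubernetes']['pod_id']
</filter>
```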


grpubr commented Nov 27, 2018

I also hit this bug. To simplify the steps to reproduce, here is my Gemfile:

source 'https://rubygems.org'

gem 'fluentd', '<=1.2.4'
gem 'activesupport', '~>5.2.1'
gem 'fluent-plugin-concat', '~>2.3.0'
gem 'fluent-plugin-detect-exceptions', '~>0.0.11'
gem 'fluent-plugin-elasticsearch', '~>2.11.5'
gem 'fluent-plugin-kubernetes_metadata_filter', '~>2.0.0'
gem 'fluent-plugin-multi-format-parser', '~>1.0.0'
gem 'fluent-plugin-prometheus', '~>1.0.1'
gem 'fluent-plugin-systemd', '~>1.0.1'
gem 'oj', '~>3.6.5'

The second filter tries to remove the key 'foo.bar':

<filter kubernetes.**>
  @type record_transformer
  <record>
    foo.bar test1234
  </record>
</filter>
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  # remove_keys $['foo.bar']   # not working
  remove_keys foo.bar          # working
</filter>

According to the record_accessor syntax, they should be the same:

Syntax

  • dot notation: a parameter starting with $.; chain fields with .

    This is the simple syntax. For example, $.event.level for record["event"]["level"], and $.key1[0].key2 for record["key1"][0]["key2"].

  • bracket notation: a parameter starting with $[; chain fields with []

    Useful for special characters such as .: $['dot.key'][0]['space key'] for record["dot.key"][0]["space key"].


cosmo0920 commented Nov 27, 2018

@grpubr How about using remove_keys $['foo']['bar']?
Is the current syntax complicated for you?


grpubr commented Nov 27, 2018

@cosmo0920

remove_keys $['foo']['bar'] means that the nested key should be removed. It works as designed, e.g.:

{
  "foo": {
    "bar": "test1234"
  }
}

However, we are trying to remove a key containing a dot, which does not work as expected, e.g.:

{
"foo.bar": "test1234"
}

They are not the same.
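The distinction can be sketched in plain Ruby (a standalone illustration, not Fluentd internals): a flat key whose name happens to contain a dot lives at the top level of the record, while a nested key requires descending into a sub-hash first.

```ruby
# A record with a single flat key whose *name* contains a dot:
flat = { "foo.bar" => "test1234", "message" => "Hello" }

# A record with a genuinely nested key:
nested = { "foo" => { "bar" => "test1234" }, "message" => "Hello" }

# Removing the flat dotted key is one top-level delete;
# this is what remove_keys "$['foo.bar']" is meant to address:
flat.delete("foo.bar")

# Removing the nested key descends one level first;
# this is what remove_keys "$['foo']['bar']" addresses:
nested["foo"].delete("bar")

# flat is left with only "message"; nested keeps an empty "foo" hash.
```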

@cosmo0920 (Contributor) commented:

I asked @okkez to investigate why the remove_keys "$['foo.bar']" notation is not working.

okkez added a commit to okkez/fluentd that referenced this issue Nov 27, 2018
In the previous version, the following configuration does not work properly:

```
<source>
  @type dummy
  dummy [
    {"foo.bar": "test1234", "message": "Hello"}
  ]
  tag dummy
</source>

<filter dummy>
  @type record_transformer
  remove_keys "$['foo.bar']"
</filter>

<match dummy>
  @type stdout
</match>
```

This shows the following:

```
2018-11-27 15:19:18 +0900 [info]: #0 fluentd worker is now running worker=0
2018-11-27 15:19:19.008586045 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
2018-11-27 15:19:20.009721132 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
2018-11-27 15:19:21.010784035 +0900 dummy: {"foo.bar":"test1234","message":"Hello"}
```

In this version, it works well.

See also fluent#2109

Signed-off-by: Kenji Okimoto <okimoto@clear-code.com>

okkez commented Nov 27, 2018

@grpubr @aleks-mariusz Try #2192 please.


okkez commented Nov 29, 2018

Released v1.3.1.

@okkez okkez closed this as completed Nov 29, 2018