On Using Fluentd to Parse the Elastic Common Schema from Kubernetes Pod Logs
Today I had to parse the JSON logs of our containers in Kubernetes. Our applications log in the Elastic Common Schema (ECS) format to STDOUT. This format is a JSON object with well-defined fields per log line.
We use a fluentd daemonset to read the container logs from the nodes. The docker daemon saves the STDOUT and STDERR output of the containers to `/var/log/containers` on the nodes. This setup is very common in Kubernetes environments; the official fluentd-elasticsearch addon uses the same approach.
In this article I will show the problems I encountered while parsing these logs.
The fluentd input to read the container logs is:
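A minimal sketch of such a tail input; the exact path, pos_file, and tag are assumptions, not necessarily our production values:

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    # docker's json-file driver writes timestamps like 2019-01-01T11:22:33.444444444Z
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
```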
As you can see in the snippet above, the input parses the container logs as JSON via the fluentd JSON parser.
The format of the container log files is the docker default json-file (details).
If the container prints the string `this is a log line\n` to STDOUT, the docker daemon saves the following line to the container log file:
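Along these lines (the timestamp is an example value):

```
{"log":"this is a log line\n","stream":"stdout","time":"2019-01-01T11:22:33.444444444Z"}
```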
After passing the fluentd input, the fluentd event has exactly three fields: `log`, `stream`, and the timestamp.
Since our containers log JSON to STDOUT, the lines in the container logs look like the following:
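For example, with made-up ECS fields inside the escaped `log` string:

```
{"log":"{\"@timestamp\":\"2019-01-01T11:22:33.444Z\",\"log.level\":\"info\",\"message\":\"something happened\"}\n","stream":"stdout","time":"2019-01-01T11:22:33.444444444Z"}
```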
This is where the problems start. We told the input to parse every line as JSON. The resulting fluentd event looks like this:
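Sketched with the example values from above; with `time_key time`, the docker timestamp becomes the event time, and the application JSON is still an escaped string:

```
{
  "log": "{\"@timestamp\":\"2019-01-01T11:22:33.444Z\",\"log.level\":\"info\",\"message\":\"something happened\"}\n",
  "stream": "stdout"
}
```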
The fluentd JSON parser cannot parse the output of our application as JSON, since the docker daemon escaped the JSON inside the `log` string. To solve this I used a filter with a JSON parser:
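A sketch of such a parser filter; the match tag is an assumption, and `hash_value_field` is what nests the parsed JSON under the `parsed` key discussed below:

```
<filter kubernetes.**>
  @type parser
  key_name log
  remove_key_name_field true
  reserve_time true
  reserve_data true
  emit_invalid_record_to_error false
  hash_value_field parsed
  <parse>
    @type json
  </parse>
</filter>
```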
After our fluentd event has passed this filter, it looks like this:
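Continuing the example:

```
{
  "stream": "stdout",
  "parsed": {
    "@timestamp": "2019-01-01T11:22:33.444Z",
    "log.level": "info",
    "message": "something happened"
  }
}
```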
The parser removes the original log line from the event since we set the option `remove_key_name_field`. We also tell the JSON parser to keep the already parsed timestamp and all other data via the options `reserve_time` and `reserve_data`.
Some applications in the cluster do not log in JSON format. Therefore, we set the option `emit_invalid_record_to_error` to `false`. Without that setting, every non-JSON log line gets the `@ERROR` label and requires special treatment by other filters.
The ECS logging schema does not have a field `parsed`. Moreover, the field `message` is at the top level of the object.
To mutate records, fluentd has the `record_transformer` filter. This filter can set event field values to the values of other fields, but it can't copy the content of the field `parsed` to the top level of the event. This was the hardest problem I encountered.
After some googling I found the `record_modifier` filter. I applied the trick shown in the `record_modifier` README to implement complex logic in Ruby:
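A sketch of the filter, again with an assumed match tag:

```
<filter kubernetes.**>
  @type record_modifier
  # drop the helper field and the now-merged "parsed" hash
  remove_keys _dummy_, parsed
  <record>
    _dummy_ ${record.merge!(record["parsed"])}
  </record>
</filter>
```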
We set the field `_dummy_` to the result of the ruby expression `${record.merge!(record["parsed"])}`. The content of our fluentd event is the hash `record`. The function `merge!` merges the event in place with the values of the field `record["parsed"]`. The return value of the merge is irrelevant, therefore we drop the field `_dummy_` after applying the filter. The filter also deletes the merged field `parsed`.
After applying that filter our event looks like this:
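Continuing the example:

```
{
  "stream": "stdout",
  "@timestamp": "2019-01-01T11:22:33.444Z",
  "log.level": "info",
  "message": "something happened"
}
```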
This event complies with the Elastic Common Schema. The final fluentd configuration is:
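Putting the pieces together; as above, the path, pos_file, and tags are assumptions, and an output section (for example to Elasticsearch) would still have to follow:

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

<filter kubernetes.**>
  @type parser
  key_name log
  remove_key_name_field true
  reserve_time true
  reserve_data true
  emit_invalid_record_to_error false
  hash_value_field parsed
  <parse>
    @type json
  </parse>
</filter>

<filter kubernetes.**>
  @type record_modifier
  remove_keys _dummy_, parsed
  <record>
    _dummy_ ${record.merge!(record["parsed"])}
  </record>
</filter>
```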