In February, before the lockdown, I presented my Fast logs talk at ConFoo. At the end, I received an interesting question that I had to cut short because of time constraints. This blog post aims to describe the relevant points of the talk, the question, as well as a possible answer.
One improvement for faster logs
In the talk, I highlight different ways to make logs faster. Nowadays, most logs are aggregated into a single place for later use. For example, a widespread architecture is based on the Elastic stack:
While this design works, it involves a lot of components. In particular, Logstash is there to parse the original log lines into JSON, so that Elasticsearch can index each field separately. For example, take the following log line:
2020-02-29 13:56:54.906 INFO 1 --- [ main] c.f.f.Main : Informative message
Logstash, via its Grok filter, parses it into the following structure:
{
"date": [[ "20-02-29" ]],
"MONTHDAY": [[ "29" ]],
"MONTHNUM": [[ "02" ]],
"YEAR": [[ "20" ]],
"time": [[ "13:56:54.906" ]],
"HOUR": [[ "13" ]],
"MINUTE": [[ "56" ]],
"SECOND": [[ "54.906" ]],
"level": [[ "INFO" ]],
"threadName": [[ "main" ]],
"class": [[ "c.f.f.Main" ]],
"message":
[[ "Informative message" ]]
}
One improvement is to configure the logging framework with a pattern so that it directly writes single-line, JSON-formatted logs. The following shows how to do that with Logback:
<configuration>
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>/var/log/sample.log</file>
        <encoder>
            <pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg" }%n</pattern>
        </encoder>
    </appender>
    <root level="debug">
        <appender-ref ref="FILE" />
    </root>
</configuration>
The output is now the following:
{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "c.f.f.Main", "message": "Informative message" }
This single-line JSON output can be sent directly to Elasticsearch, which simplifies the architecture by removing Logstash entirely:
Adding meta-data
Now, suppose the business requires additional fields on top of the log message for analytics purposes. With simple string messages, this is as easy as getting a reference to the value and adding it to the message:
logger.info("Client-Id -> {} | Informative message", client.getId());
When executed at an appropriate log level, this outputs the desired message:
2020-02-29 13:56:54.906 INFO 1 --- [ main] ch.frankel.fastlogs.Main : Client-Id -> XYZ | Informative message
With Logstash, it’s possible to improve the Grok pattern so that it successfully parses the Client-Id value and sends it as a dedicated field to Elasticsearch. Unfortunately, the JSON-formatted log option shown above doesn’t allow this:
{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Client-Id -> XYZ | Informative message" }
Notice that the Client-Id is part of the message, along with separators. It’s possible to send this to Elasticsearch and parse the message with custom ingest pipelines. Yet, this takes additional processing time for each indexed message. Are we back to square one, and to Logstash?
Mapped Diagnostic Context to the rescue
SLF4J - and similar frameworks - allow adding meta-data through the Mapped Diagnostic Context (MDC). In short, the MDC is a map-like context holder bound to a specific thread.
MDC can be leveraged to solve the issue. Here’s a sample code snippet:
MDC.put("client-id", client.getId());
logger.info("Informative message");
MDC.remove("client-id");
Note that we explicitly set the value before, and remove it after, the logging statement. Depending on the context, it can be beneficial to use a servlet filter to do that automatically, in order to avoid mistakes. For example, Logback provides an out-of-the-box MDCInsertingServletFilter that sets request-related data.
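For illustration, here's a minimal sketch of such a filter. The class name and the X-Client-Id header are made up for the example; adapt them to wherever the client identifier actually comes from:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.slf4j.MDC;

// Hypothetical filter: copies a client identifier from an HTTP header into the MDC
// for the duration of the request, and always cleans it up afterwards.
// With Servlet 4.0+, init() and destroy() have default implementations.
public class ClientIdMdcFilter implements Filter {

    private static final String MDC_KEY = "client-id";
    private static final String HEADER = "X-Client-Id"; // assumption: the id travels in this header

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String clientId = ((HttpServletRequest) request).getHeader(HEADER);
        if (clientId != null) {
            MDC.put(MDC_KEY, clientId);
        }
        try {
            chain.doFilter(request, response);
        } finally {
            // remove in a finally block so the value never leaks to the next
            // request handled by the same thread, even if the call throws
            MDC.remove(MDC_KEY);
        }
    }
}

The try/finally block guarantees the MDC entry is cleared even when the request fails, which avoids the mistake mentioned above.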
Let’s slightly update the pattern to use MDC:
<pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg", "Client-Id": "%X{client-id}" }%n</pattern>
Notice the %X{key} syntax. With that, we can keep our simple Logstash-free architecture, and still get the desired JSON output:
{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Informative message", "Client-Id": "XYZ" }
Conclusion
Logs can be made faster by removing the need to parse them. To achieve that, one can directly output single-line JSON messages. When additional meta-data is required, just use the MDC and the logging pattern appropriate to the logging implementation.
To go further: