Reflection is slow in Golang

Ersin Buckley - Sep 21 - Dev Community

Recently I’ve been processing huge amounts of AIS (Automatic Identification System) data at my day job. By improving the performance of the data processing, we enable prompt alerts and fast feedback loops, giving platform users real-time insight. Removing reflection from the hot loop resulted in a 29x performance improvement. Read on if you want to dive head first into what reflection is and why it’s not a good idea to call it thousands of times per second.

Generally speaking, Go is considered a performant and efficient language compared to its peers, but there are certain footguns when it comes to using the reflection package. If you are using the package, it’s definitely worth writing some simple benchmark tests to make sure you aren’t introducing a performance issue.

What is reflection anyway?

Reflection is provided by the reflect package in Go’s standard library. It lets you introspect the type system and figure out, at runtime, the concrete type and value behind an any value. It’s the perfect tool if you want to write code that iterates through the fields of a struct, or code that reads a struct tag and does something smart with it.

In short, it’s the way to ‘meta program’: to write code that works on your own program’s state at runtime. Let’s check out an example.

package main

import (
    "fmt"
    "reflect"
)

// A generic function to print struct details
func PrintStructDetails(data any) {
    val := reflect.ValueOf(data)
    typ := reflect.TypeOf(data)

    // Check if we're dealing with a struct
    if val.Kind() != reflect.Struct {
        fmt.Println("Not a struct!")
        return
    }

    fmt.Println("Type:", typ.Name())
    fmt.Println("Fields:")

    // Iterate over struct fields
    for i := 0; i < val.NumField(); i++ {
        field := typ.Field(i)
        value := val.Field(i)

        fmt.Printf("\tName: %s Type: %s Value: %v Tag: %s\n", field.Name, field.Type, value, field.Tag)
    }
}

// Example struct with a tag, so the program compiles and produces output
type Ship struct {
    Name string `ais:"name"`
    MMSI uint32 `ais:"mmsi"`
}

func main() {
    PrintStructDetails(Ship{Name: "Example Vessel", MMSI: 123456789})
}

Try it: https://play.golang.com/p/TVx-vokhiyY

In the example above, we have created a function, PrintStructDetails, that prints out the details of a struct programmatically. It takes an any value as its parameter, and if that value is a struct it iterates through the fields and prints out each field’s name, type, value and tag.

Why use reflection?

You have probably already used reflection in Go without realising it. For example, JSON encoding and decoding in the standard library is reflection based: encoding/json does something like the above to map parsed JSON values onto a struct’s fields, guided by the field names and json struct tags.
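As a quick illustration (this snippet is mine, not from the AIS codebase), here is encoding/json leaning on struct tags, with reflection doing the field mapping under the hood:

package main

import (
    "encoding/json"
    "fmt"
)

// The json struct tags tell the (reflection-based) decoder which
// JSON keys map onto which Go fields.
type Vessel struct {
    Name string `json:"name"`
    MMSI uint32 `json:"mmsi"`
}

func main() {
    var v Vessel
    // Unmarshal uses reflect internally to discover Vessel's fields and tags.
    if err := json.Unmarshal([]byte(`{"name":"Example","mmsi":123456789}`), &v); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", v)
}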

Although you are unlikely to need reflection in your day-to-day coding, it can be used to make some magical things happen in Go, and it’s a great way to extend the language. Advanced Go programmers should definitely add reflection to their tool belt.

Today, however, we won’t be going in depth on how and why to use reflection; instead, we’re going to show how we made parsing massively more performant by not using it.

Why is reflection slow?

To keep it dense and to the point: reflection is slow because of missed compiler optimisations and extra memory allocation. When we reflect on a type, the Go runtime has to allocate extra memory to describe its types, fields, names and tags. All of that runtime machinery adds indirection to the generated assembly. Processors are extremely efficient at doing the same operation many times in a row, because they can ‘pipeline’ multiple pieces of data through the same instructions. With reflection in the way, that pipelining is not possible, and on top of that you need slow heap allocations and lookups just to figure out what the type is before you can do the actual work on the data.

You don’t need to know exactly why this is slow in theory to get the benefits of not using reflection. To prove that it makes sense, just benchmark the code in practice, and use production telemetry to show that pulling out reflection results in a performance improvement.
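For instance, a minimal benchmark sketch (illustrative only, not the AIS parser’s actual test suite) that contrasts reading a field directly with reading it via reflect could be dropped into a _test.go file like this:

package main

import (
    "reflect"
    "testing"
)

type report struct {
    TrueHeading uint16
}

// Package-level sink stops the compiler from optimising the loops away.
var sink uint64

// Direct field access: the type and field offset are known at compile time.
func BenchmarkDirect(b *testing.B) {
    r := report{TrueHeading: 180}
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        sink += uint64(r.TrueHeading)
    }
}

// Reflection-based access: the type and field are resolved at runtime on every iteration.
func BenchmarkReflect(b *testing.B) {
    r := report{TrueHeading: 180}
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        v := reflect.ValueOf(r)
        sink += v.FieldByName("TrueHeading").Uint()
    }
}

Running it with go test -bench=. -benchmem shows both the per-operation time and the extra allocations that reflection introduces.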

TL;DR: if your types are large or you are invoking reflection frequently, look at removing the need for reflection from that code path.

A deep dive on AIS parsing

AIS (Automatic Identification System) is a radio-based protocol for helping ships, and anything else on the water, avoid collision. My team is concerned with ingesting a global dataset of the stuff, and speed of ingestion is paramount. The first step of all our work involves decoding []byte into a struct so that we can perform analytics and build datasets for training machine learning models.

There are many AIS message types (27 of them); most commonly they are a bag of fields related to a ship’s position, heading, course, callsign and MMSI (a kind of unique vessel identifier).

type PositionReport struct {
    Header                    `aisWidth:"38"`
    Valid                     bool            `aisEncodeMaxLen:"168"`
    NavigationalStatus        uint8           `aisWidth:"4"`
    RateOfTurn                int16           `aisWidth:"8"`
    Sog                       Field10         `aisWidth:"10"`
    PositionAccuracy          bool            `aisWidth:"1"`
    Longitude                 FieldLatLonFine `aisWidth:"28"`
    Latitude                  FieldLatLonFine `aisWidth:"27"`
    Cog                       Field10         `aisWidth:"12"`
    TrueHeading               uint16          `aisWidth:"9"`
    Timestamp                 uint8           `aisWidth:"6"`
    SpecialManoeuvreIndicator uint8           `aisWidth:"2"`
    Spare                     uint8           `aisWidth:"3" aisEncodeAs:"0"`
    Raim                      bool            `aisWidth:"1"`
    CommunicationStateNoItdma `aisWidth:"19"`
}

https://github.com/BertoldVdb/go-ais/blob/master/messages.go#L32-L50

In the AIS parsing library, it’s represented by the struct above. Each struct tag contains an aisWidth parameter that defines the number of bits that field occupies in the message. A message is just a packed sequence of bits, where fixed-width chunks of bits represent the different fields of data.

You can probably imagine how parsing works already. The parser iterates through the fields of the struct, reads the ‘width’ number of bits, reflects on the type of the field, and then calls the right conversion method to turn that data into something like a uint8, a bool or a string.
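A heavily simplified sketch of that approach looks something like the following. This is not the library’s actual code; the decodeWithReflection function and the bitsToUint helper are made up for illustration, and only unsigned integer and bool fields are handled:

package ais

import (
    "reflect"
    "strconv"
)

// bitsToUint is a hypothetical helper that reads width bits starting at
// offset from a slice where each element holds a single bit (0 or 1).
func bitsToUint(bits []byte, offset, width int) uint64 {
    var out uint64
    for i := 0; i < width; i++ {
        out = out<<1 | uint64(bits[offset+i])
    }
    return out
}

// decodeWithReflection fills in the unsigned integer and bool fields of msg
// (a pointer to a struct) by walking its aisWidth struct tags at runtime.
// Every call pays for reflect.ValueOf, NumField, Field and Tag lookups.
func decodeWithReflection(bits []byte, msg any) {
    v := reflect.ValueOf(msg).Elem()
    t := v.Type()
    offset := 0
    for i := 0; i < t.NumField(); i++ {
        width, err := strconv.Atoi(t.Field(i).Tag.Get("aisWidth"))
        if err != nil {
            continue // field has no aisWidth tag
        }
        raw := bitsToUint(bits, offset, width)
        switch v.Field(i).Kind() {
        case reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
            v.Field(i).SetUint(raw)
        case reflect.Bool:
            v.Field(i).SetBool(raw != 0)
        }
        offset += width
    }
}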

OK, so how does this all relate to the slowness of reflection? This approach works perfectly well at lower volumes of data, but on larger datasets the repeated calls to reflect.TypeOf and reflect.ValueOf really add up, and they come to dominate the time it takes to parse a message.

How did I speed it up?

One day, while getting to the bottom of another issue in the codebase, I ran the heap profiler on the decoder component. What I saw was a huge number of allocations in the reflect package, specifically in the code that iterates over struct fields and figures out their types. Because I knew reflection was a ‘slower than usual’ operation, I had the confidence to dive in and run some tests.
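If you want to reproduce that kind of investigation, one simple option (an illustrative sketch, not my exact workflow) is to dump a heap profile after running the decoder over a representative batch of data, then inspect it with go tool pprof:

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // ... run the decoder over a representative batch of messages here ...

    // Write a heap profile that can be inspected with `go tool pprof heap.out`.
    f, err := os.Create("heap.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(err)
    }
}

Running benchmarks with go test -bench=. -benchmem gives a similar allocation signal with less ceremony.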

Ultimately, this led me to write a small custom code generator that removes reflection from the parser in favour of a much simpler one. I was able to reuse the existing metadata in the struct tags, but generate a parser that doesn’t call reflect at runtime at all.
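In spirit, the generated code boils down to straight-line field assignments with the offsets and widths baked in. The sketch below is hand-written to show the idea rather than the generator’s actual output, and it reuses the hypothetical bitsToUint helper and the PositionReport struct from above:

package ais

// decodePositionReport shows what a generated, reflection-free decoder looks
// like in spirit: the offsets and widths from the aisWidth struct tags are
// baked in at generation time, so there is no runtime type inspection.
func decodePositionReport(bits []byte, msg *PositionReport) {
    // The embedded Header occupies bits 0-37.
    msg.NavigationalStatus = uint8(bitsToUint(bits, 38, 4))
    msg.PositionAccuracy = bitsToUint(bits, 60, 1) != 0
    msg.TrueHeading = uint16(bitsToUint(bits, 128, 9))
    msg.Timestamp = uint8(bitsToUint(bits, 137, 6))
    msg.Raim = bitsToUint(bits, 148, 1) != 0
    // ... the remaining (signed and fixed-point) fields follow the same
    // pattern with their own conversion helpers ...
}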

Removing reflection resulted in a 29x speed improvement

Fortunately I work for an organisation that values open source, and I was able to contribute this change back to the upstream parser.

https://github.com/BertoldVdb/go-ais/tree/master/parser_generator

It’s a dense 500 lines, but it gets the job done. The generator works by knowing which struct represents each parseable message and the message type integer that maps to it. It was important to reuse as much of the existing parser as possible, because it is known to be complete and correct. After running both implementations side by side over massive AIS datasets, I was able to confirm the outputs matched and move the change into production with confidence.

Would I recommend this to others?

The way parsing works in this AIS library is an interesting one. It works well as an expressive way of showing how the AIS format is laid out. However, I don’t think it’s a very idiomatic approach for a Go codebase, and I’m not sure I would recommend the code generation approach for building your own new parser either. Parsers are a really important advanced topic, and for anyone who hasn’t dabbled with them I would recommend a deep dive on the subject.

A good, small, hand-rolled parser is worth maintaining and developing, and I would avoid formal grammars and intermediate languages where possible. The problem with parser-generator approaches is that you often end up with a custom DSL that is difficult for others to pick up and learn.

There is a good amount of public knowledge backing this approach to building parsers, and you can see that most language implementations do not rely on formal or custom grammars like BNF, instead opting for a hand-rolled approach.

One recommendation I will make before wrapping up: check out Writing An Interpreter In Go. In it you take a step-by-step, test-first approach to building a lexer, parser and interpreter for a tiny language called ‘Monkey’. I found it to be an excellent intermediate-to-advanced level book, and it uses Go (even better!).
