I have been using trunk-based development for some time to improve the productivity of the team I supervise. Feature toggles play an important role in making trunk-based development run smoothly, so this article introduces feature toggles, the kinds of toggles that exist, and how to use them.
But before we get to that, let's clear up a distinction that is easily confused: what is the difference between a feature toggle and a configuration?
First, wherever a configuration is stored, whether in a file or in a central storage system such as Consul by HashiCorp, it is considered static. When the system is initialized, the entire configuration is loaded into each instance's memory to reduce unnecessary I/O overhead. That is, if a setting is modified, the associated instances must be restarted to reload the setting into memory.
A feature toggle does not work this way. Each time it is checked, it asks the management system for the current setting for that instance, or even for that specific feature. Therefore, a feature toggle is more dynamic than a configuration.
Let's use code as an example.
If it is a configuration file, it will look like the following.
const config = loadConfigFromSomewhere();
function foo() {
  if (config.featureA) doA();
  else doB();
}
As for the feature toggle, it is a little different.
const connection = initFeatureToggle();
function bar() {
  if (connection.isEnabled("featureA")) doA();
  else doB();
}
As you can see from the above example, the configuration is fixed after loadConfigFromSomewhere, and then the value is just read from the variable. However, the feature toggle may return a different result each time isEnabled is called, depending on the current environment.
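To make the difference concrete, here is a minimal runnable sketch. The `remoteStore` object is a hypothetical in-memory stand-in for the management system, and both functions are simplified assumptions rather than any real client's implementation.

```javascript
// Hypothetical in-memory stand-in for a remote toggle management system.
const remoteStore = { featureA: false };

// Configuration: values are copied once at startup and never refreshed.
function loadConfigFromSomewhere() {
  return { ...remoteStore };
}

// Feature toggle: every isEnabled call reads the current remote value.
function initFeatureToggle() {
  return {
    isEnabled: (name) => remoteStore[name] === true,
  };
}

const config = loadConfigFromSomewhere();
const connection = initFeatureToggle();

// An operator flips the toggle while the instance keeps running.
remoteStore.featureA = true;

console.log(config.featureA);                  // false: still the startup snapshot
console.log(connection.isEnabled("featureA")); // true: reflects the change
```

The configuration only picks up the new value after a restart, while the toggle sees it on the next call.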
What kinds of feature toggles are there?
There are four types of feature toggles.
- Release Toggles
- Ops Toggles
- Experiment Toggles
- Permission Toggles
There are two dimensions in the diagram, dynamism and longevity.
- Dynamism describes how frequently the toggle's state changes; the further to the right, the more frequently it changes.
- Longevity refers to how long the toggle stays in the source code; the closer to the bottom, the shorter its lifetime.
The release toggle in the blue area is relatively static and is only used when a feature is released, and will be removed when the feature is stable. On the other hand, the toggles in the green area are relatively dynamic and may change according to various needs.
Release Toggles
The release toggle is the most commonly used kind of toggle and is intended to control the scope of impact of each release.
Suppose this release contains a new requirement. The code for that requirement should be fully wrapped in a toggle for this release. At the moment of release, the toggle should default to off; in other words, the behavior of this release should be exactly the same as that of the last release.
Once the release has succeeded, the toggle can be turned on gradually, on demand. At first, perhaps 1% of calls get the new behavior, then 5%, then 10%, and so on, until the functionality is open to 100%. This approach is also known as a canary release.
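One common way to implement such a gradual rollout is to hash each caller into a stable bucket and compare the bucket with the current percentage. The sketch below is only an illustration under that assumption: `fnv1a`, `isInRollout`, and the `user-N` identifiers are hypothetical names, not part of any particular toggle product.

```javascript
// FNV-1a style hash over UTF-16 code units; returns an unsigned 32-bit integer.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Returns true when the caller falls inside the rollout percentage.
// The same (toggleName, callerId) pair always lands in the same bucket,
// so raising the percentage only ever adds callers, never removes them.
function isInRollout(toggleName, callerId, percentage) {
  const bucket = fnv1a(`${toggleName}:${callerId}`) % 100; // 0..99
  return bucket < percentage;
}

// Usage: count how many of 1000 hypothetical callers are enabled at each step.
const callers = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
for (const pct of [1, 5, 10, 100]) {
  const enabled = callers.filter((id) => isInRollout("featureA", id, pct)).length;
  console.log(`${pct}% rollout -> ${enabled} of ${callers.length} callers enabled`);
}
```

Because the bucket is derived from the caller rather than from a random number, a caller who already sees the new behavior keeps seeing it as the percentage grows.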
Alternatively, the toggle is turned fully on from the beginning. As soon as any problem is encountered, the toggle is turned off completely to prevent a disaster from affecting the entire system. This practice is comparable to blue-green deployment, where traffic is switched all at once between the old and new versions.
It is important to note that release toggles should have a short life cycle. They should be removed from the code as soon as the functionality is determined to be stable. Otherwise, the code fills up with more and more toggles, which in turn increases the maintenance effort.
Ops Toggles
Ops toggles differ from release toggles: release toggles are designed to respond to each release, while ops toggles are designed to handle changes to the infrastructure. When the infrastructure has to be upgraded or migrated, the change can be managed through ops toggles.
For example, if a system starts with a distributed tracing system called Elastic APM and wants to replace it with Jaeger for budget or maintenance reasons, an ops toggle can be used to switch between the two systems. Until we are sure that Jaeger operates correctly, the two systems will coexist, possibly splitting the monitored traffic 50-50. The ops toggle will remain in place until we are confident that Jaeger can replace the original system, so it will live much longer than a release toggle.
Another example of an ops toggle is a manual circuit breaker. A high-throughput system usually implements a rate-limiting algorithm, but once traffic reaches a certain level, the excess traffic must be cut off directly to avoid impacting the whole system. Ideally the system would adjust and recover by itself; however, such a mechanism is highly complex and difficult to get right at the beginning, so an ops toggle is a good choice.
This toggle will also exist for a long time, until the developers implement an automatic mechanism.
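A manual circuit breaker of this kind could be sketched as follows. The toggle name `shed-excess-traffic`, the `currentLoad` and `softLimit` parameters, and the in-memory `toggles` object are all assumptions made for illustration; a real system would read the toggle from its management service and measure load itself.

```javascript
// Hypothetical toggle client; a real one would query the management system.
const toggles = { "shed-excess-traffic": false };
const connection = {
  isEnabled: (name) => toggles[name] === true,
};

// Handles one request. `currentLoad` and `softLimit` are illustrative numbers.
function handleRequest(request, currentLoad, softLimit) {
  // Manual circuit breaker: operators turn the toggle on during an incident,
  // and only then is traffic above the limit rejected outright.
  if (connection.isEnabled("shed-excess-traffic") && currentLoad > softLimit) {
    return { status: 503, body: "Service temporarily unavailable" };
  }
  return { status: 200, body: `processed ${request.id}` };
}

// Usage: the same overloaded request is served or rejected depending on the toggle.
console.log(handleRequest({ id: 1 }, 1200, 1000).status); // 200: breaker is off
toggles["shed-excess-traffic"] = true;
console.log(handleRequest({ id: 2 }, 1200, 1000).status); // 503: breaker is on
```

The decision to shed traffic stays with a person, which is exactly what makes this an ops toggle rather than an automatic mechanism.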
Experiment Toggles
Experiment toggles, as the name suggests, are used to run experiments. When a feature has two different behaviors and we want to evaluate how effective each one is, we use experiment toggles; this kind of experimentation is called A/B testing.
It operates a bit like a release toggle, but whereas a release toggle lets old and new behaviors coexist, an experiment toggle lets two new behaviors coexist. Experiment toggles can even carry additional experimental parameters, making the whole experimental process more flexible and efficient.
Yes, most feature toggles can carry additional parameters, not just true or false. This makes them especially useful for speeding up iteration in a constantly changing experimental environment.
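As a rough illustration of a toggle that carries parameters, the sketch below assigns each user to one of two variants and reads the behavior from the variant's parameters. The experiment name, the variant parameters, and the `getVariant` helper are all hypothetical; real providers expose their own APIs for this.

```javascript
// Hypothetical experiment definition as a toggle service might return it.
const experiments = {
  "checkout-button": {
    variants: [
      { name: "A", params: { label: "Buy now", color: "green" } },
      { name: "B", params: { label: "Add to cart", color: "orange" } },
    ],
  },
};

// Picks a variant deterministically so a user keeps seeing the same behavior.
function getVariant(experimentName, userId) {
  const { variants } = experiments[experimentName];
  let sum = 0;
  for (const ch of String(userId)) sum += ch.codePointAt(0);
  return variants[sum % variants.length];
}

// The caller reads parameters from the variant instead of branching on true/false.
function renderButton(userId) {
  const { params } = getVariant("checkout-button", userId);
  return `<button style="color:${params.color}">${params.label}</button>`;
}

console.log(renderButton("alice"));
console.log(renderButton("bob"));
```

Changing a label or a color then only requires editing the experiment definition, not the code that renders the button.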
As with release toggles, experiment toggles should be removed from the code once the experiment is done and the results are confirmed.
Permission Toggles
This last one is the most complex case. From the diagram, we can see that it is both dynamic and long-lived.
So what exactly is this toggle for?
From my point of view, there are two scenarios: one is system-wide access control and the other is product-level access control.
System-wide access control means that certain functions or operations are only available to specific users, so we manage these restricted functions through permission toggles. A little hard to imagine? Here is some pseudocode.
function restrictedFunction({ userId, userLevel, userRole }) {
  // Attributes the toggle rules are evaluated against.
  const metadata = { userId, userLevel, userRole };
  if (connection.isEnabled("featureA", metadata)) doFunction();
  else return;
}
From the above example, we can see that this function is only available to certain users. The user's Id, Level and Role are checked to make sure the eligibility criteria are met before the function runs; otherwise it is skipped.
Such a toggle seems very stable, right? In fact, it is not. The verification method may change: originally only Id and Level may be required, until one day a new demand adds Role. So the rules of the toggle may change with requirements, and the usage context of the toggle may also change as requirements expand.
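The way such rules can evolve without touching the calling code might be sketched like this. The rule format (`allowedIds`, `minLevel`, `allowedRoles`) and the `createConnection` helper are assumptions made for illustration only; they stand in for whatever rule engine the toggle management system provides.

```javascript
// Hypothetical rule sets stored in the toggle management system, not in code.
// Version 1 checks Id and Level only; version 2 later adds Role.
const rulesV1 = { allowedIds: [1, 2, 3], minLevel: 5 };
const rulesV2 = { ...rulesV1, allowedRoles: ["admin"] };

// Builds a stand-in toggle client that evaluates the given rule set.
function createConnection(rules) {
  return {
    isEnabled(_name, { userId, userLevel, userRole }) {
      if (!rules.allowedIds.includes(userId)) return false;
      if (userLevel < rules.minLevel) return false;
      // The Role check applies only when the rule set defines it.
      if (rules.allowedRoles && !rules.allowedRoles.includes(userRole)) return false;
      return true;
    },
  };
}

const user = { userId: 2, userLevel: 7, userRole: "editor" };
console.log(createConnection(rulesV1).isEnabled("featureA", user)); // true
console.log(createConnection(rulesV2).isEnabled("featureA", user)); // false
```

The same call site gives a different answer once the rule set gains the Role requirement, which is why a permission toggle is dynamic even though it lives in the code for a long time.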
How to use feature toggles correctly?
We've already seen what the code looks like when using a feature toggle.
const connection = initFeatureToggle();
function bar() {
  if (connection.isEnabled("featureA")) doA();
  else doB();
}
As more and more toggles are added, more and more if-else branches appear, which violates most clean code principles. Therefore, you should be extra careful when using feature toggles and follow a few basic principles.
- Use toggles only where needed.
- Control the number of toggles.
- Do not let toggles overlap with each other.
- Regularly clean up toggles that are no longer needed.
In addition, every toggle is eventually removed, and the code that used it is reduced to a single branch, for example:
function bar() {
  doB();
}
Because of this, it is very important that toggles be easy to remove. I recommend wrapping the toggle check in a Factory Method, so that removing a toggle only changes which product the Factory Method returns and none of the external callers are touched. Here is a simple demonstration.
function factory() {
  if (connection.isEnabled("featureA", metadata)) {
    return new NewHandler();
  } else {
    return new OrigHandler();
  }
}
// qwer.js
factory().doA();
// asdf.js
factory().doB();
// zxcv.js
factory().doC();
Because the whole feature toggle check is wrapped in the factory method, the external callers don't need to know whether they get a new or an old handler. When removing the feature toggle in the future, we only need to modify the factory method and drop OrigHandler.
function factory() {
  return new NewHandler();
}
As a result, none of qwer.js, asdf.js, or zxcv.js needs to change.
Conclusion
This time we talked about the many scenarios where feature toggles can be used and how to treat them properly. You should now see why trunk-based development depends on feature toggles so much: new versions are released frequently, and they must not affect the online environment. So all unstable modifications must have a mechanism that isolates them, and feature toggles play exactly that role.
Another commonly mentioned use case is that once a toggle is modified, some online functionality must react immediately. Such a requirement can be met with the Observer Pattern, but I don't recommend this kind of coupling. After all, we know that these toggles will eventually be removed, so they should not become part of the domain model; instead, they play the role of a coordinator between people and systems.
In this article, we have only discussed how to use feature toggles and what to consider, without mentioning specific solutions. Next time I will introduce two well-known feature toggle providers, LaunchDarkly and Unleash, and offer some comments and guidelines based on my experience with each.