Microservices: Think Hard Before You Jump In

Tue, 08 Feb 2022 by garethbrown

While awareness among developers of the potential drawbacks of mis-prescribed or poorly designed microservice architecture is growing, I'm seeing first hand that the problem is still prevalent in too many software projects.

One of the best ways to ensure that your team keeps moving forward and producing value is to avoid costly mistakes. The world has enough problems to solve and we don't need bright minds wasting their time on ill-advised, overly complex, sledgehammer-to-crack-a-walnut software architecture. In too many cases, for too many teams, choosing microservice architecture is such a mistake.

Here are 5 common (potentially poor) justifications for microservice architecture:

1. The business wants to ensure that the project can scale to meet massive future demand and be resilient

This usually comes from a good place. The project may have been undertaken due to problems with a legacy platform, or may be considered a big investment. We often hear that FAANG uses microservice architecture, and they're too big and clever to be wrong, right?

Some businesses do work with spiky workloads, and it's advantageous from a cost and performance perspective to be able to scale dynamically with demand. Cloud services have readily available tools for managing this (such as load balancers, container management, virtualisation and API gateways). Such tools are often associated with microservices.

But these tools can be used with monoliths too. You can run multiple containerised instances of your modular monolith. You might need to code for cloud / centralised storage (such as AWS S3), but that's not usually a big challenge. Maybe a particular area of the application is by far the most resource intensive, but that doesn't necessarily mean it will help to separate it from the rest of the application. It's unlikely that it would hurt to replicate the less commonly used areas of the application as well. Unnecessary additional network communication consumes resources, and that's before we even begin to consider the additional complexity it creates. Your application might well be more efficient on a per request basis if more of its communication were in-process rather than across a network.

On redundancy, the cause of an outage is much more likely to be an issue with application code or your pipeline configuration than a failing cloud service, and microservice architecture makes for a greater surface area for failure (through additional code and more complex architecture). Unless you're mitigating this category of failure with Blue / Green deployment or similar, there's a good chance that such an issue is going to affect all instances regardless of your multi-region deployment.

2. Because the business sees conceptually separate areas of processing

Maybe the business frames various areas of processing as separate, possibly because they are easily identifiable or occur one after another. These conceptual boundaries should not be translated into application boundaries until the team understands Domain Driven Design and has analysed the application's data architecture.

Does each application boundary satisfy requirements for consideration as a separate Bounded Context? In short, each microservice should share minimal data. This is key. If you find that two microservices rely on much the same data, separating them will likely result in 'chatty' services, with data frequently being passed in both directions, with the added expense of network calls and messaging and the complexity that goes with that (more on this later).
Does each microservice have its own database? It should, as each service must be independently deployable. If services share a database, then there is often potential for modification of one service to affect the data relied on by another, and this makes them hard to deploy independently. If each does have its own database but the bounded context is poorly defined, then you will end up with duplication of data and the overhead of keeping that data in sync.
Would the services share a lot of logic that would need to be updated in lock step if changes were to be made? Authentication and common utilities such as logging are often good candidates for being independent services, since they usually have a clear area of responsibility and associated bounded context. If however you find yourself replicating other business specific logic or database queries between services, they may be easier to maintain in the same application / service.
Will each service have its own team going forward? One benefit of microservices is that they allow separate teams to operate independently of others, only requiring coordination when a service's public interface changes.

3. Because we might want to replace <insert module here> later

This doesn't necessarily mean it needs to be a separate service. Well-designed modular monoliths can be refactored to add or remove components.
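
As a rough illustration of that point, here is a minimal Python sketch of a replaceable module inside a single application. The names (Notifier, EmailNotifier, SmsNotifier, OrderService) are hypothetical and chosen only for the example; the idea is that the rest of the code depends on an interface, so the implementation behind it can be swapped without a service boundary.

```python
from dataclasses import dataclass
from typing import Protocol


class Notifier(Protocol):
    """Hypothetical module boundary: anything that can deliver a message."""

    def send(self, recipient: str, message: str) -> str: ...


class EmailNotifier:
    """First implementation of the module."""

    def send(self, recipient: str, message: str) -> str:
        return f"email to {recipient}: {message}"


class SmsNotifier:
    """A later replacement; callers do not need to change."""

    def send(self, recipient: str, message: str) -> str:
        return f"sms to {recipient}: {message}"


@dataclass
class OrderService:
    # The rest of the application depends only on the Notifier interface.
    notifier: Notifier

    def confirm(self, customer: str, order_id: int) -> str:
        return self.notifier.send(customer, f"order {order_id} confirmed")
```

Swapping the module is then a one line change where the application is wired together, for example OrderService(SmsNotifier()) in place of OrderService(EmailNotifier()).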

4. Because this is what FAANG uses (they might even have developed some of the tools)

You might be working for a large business with significant and mission critical workloads, but there's a high probability that you're not working at FAANG scale. FAANG and their peers work differently to many other businesses: at their scale it's accepted that small changes require large teams and huge amounts of process and communication. To support this they employ developers with high degrees of specialisation (and thousands of them too). Here microservice architecture makes sense, with services managed by dedicated teams. Your business might do better to take advantage of its smaller size and agility. The work required to develop and maintain microservices might eat into developer time, wasting that advantage.

5. Developers want to use modern technologies that are well known

As developers we need to keep our skills current. While most developers will spend time outside of their paid employment learning new skills and technologies, using those skills in a commercial setting gives a chance to battle test and learn what really works and what is just hype. Tech skills look better on a CV if backed by commercial experience on real world problems (and maybe at a larger scale than your hobby projects might achieve). The downside of this is that tools needed to support microservice architecture might be made to fit into projects that don't really need them.

Where will microservices cost you time and create opportunity for mistakes?

Here is how and where microservices might increase costs:

Developing an internal API

You'll need a means of communicating between microservices. This will likely use a machine to machine authentication mechanism distinct from your user authentication mechanism. This might drive you towards a SaaS authentication service, which can be expensive, with pricing that commonly depends on the number of users you have.

The data transferred between services is likely sensitive, and may require a network architecture such that it's not routed over the WAN with the rest of your services' API traffic.

Additional network calls

Calls that might have been in-process function calls or calls using a direct database connection become HTTP network calls or calls via a message bus using AMQP or similar. These are more time consuming to develop and more prone to failure. Naturally, where network calls are introduced between services (compared with in-process calls within a monolith), apps become less resilient, with less data integrity as a result of failed and partially completed transactions. As a result you will likely need:

Additional Error Handling and Retry Mechanisms:

What happens when calls between services fail? Will you retry the call? A common approach is to retry a limited number of times, often alongside the Circuit Breaker Pattern, which stops sending calls to a service that keeps failing so that it has time to recover. And what if acknowledgement of a call isn't received? Then you need to consider idempotent design.
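
To give a sense of the extra code this implies, here is a minimal Python sketch of a circuit breaker combined with a bounded retry. The class and function names, thresholds and delays are all assumptions made for illustration; in practice you would more likely reach for an established resilience library than hand roll this.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retry(breaker: CircuitBreaker, func: Callable[[], T],
                    attempts: int = 3, base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry with exponential backoff, but never retry through an open circuit."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # the breaker has given up on the service for now
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

None of this is needed when the same logic is an in-process function call, which is the cost being highlighted here.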

Idempotent Design:

Idempotent design is building an API in such a way that should the same data be received twice, that data can either be discarded or processed in such a way that it doesn't adversely affect application state.
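
One common way to achieve this is to give every message a unique ID and remember which IDs have already been applied. The sketch below assumes a hypothetical PaymentHandler with an in-memory record of processed messages; a real service would need to keep that record in a durable store and update it in the same transaction as the state change.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentHandler:
    """Hypothetical receiver that applies each message at most once."""

    balance: int = 0
    # Result of each processed message, keyed by message ID (in-memory only here).
    processed: dict[str, int] = field(default_factory=dict)

    def handle(self, message_id: str, amount: int) -> int:
        if message_id in self.processed:
            # Duplicate delivery: return the earlier result, change nothing.
            return self.processed[message_id]
        self.balance += amount
        self.processed[message_id] = self.balance
        return self.balance
```

With this in place a sender that never received an acknowledgement can safely resend the same message, since the second delivery leaves the balance untouched.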

Additional Configuration:

Any configuration common to each microservice will need to be replicated for each service. In addition, configuration for the comms between services will add to the amount of configuration that needs to be maintained.

Correlation IDs:

When investigating faults, you will likely find yourself tracking a series of related calls amongst multiple services. For this you will need the ability to create a correlation ID at the start of the series and include that correlation ID in each request in the chain. Once you have a correlation ID, you have a key that you can use to search logs and trace the fault. That can still be arduous, unless you have a means of aggregating and searching logs in one place.
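
A minimal Python sketch of the mechanics might look like the following. It assumes the ID travels in an HTTP header named X-Correlation-ID, which is a widespread convention rather than a standard, and the function names are hypothetical. Each service would call incoming_request when a request arrives and attach outgoing_headers to any calls it makes in turn.

```python
import contextvars
import logging
import uuid

# Holds the ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

HEADER = "X-Correlation-ID"  # assumed header name, not a formal standard


class CorrelationFilter(logging.Filter):
    """Stamps every log record with the current correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def configure_logging() -> logging.Handler:
    handler = logging.StreamHandler()
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(correlation_id)s] %(message)s"))
    logging.getLogger().addHandler(handler)
    return handler


def incoming_request(headers: dict[str, str]) -> str:
    """Reuse the caller's ID if present, otherwise start a new chain."""
    cid = headers.get(HEADER) or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outgoing_headers() -> dict[str, str]:
    """Headers to attach to calls made to other services."""
    return {HEADER: correlation_id.get()}
```

Every service in the chain needs equivalent plumbing, in whatever language and framework it uses, before the ID becomes useful for searching logs.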

Long Running Transactions (Sagas):

Long Running Transactions, or Sagas, are a mechanism for dealing with the potential for failure and partially completed transactions described above. Several technologies, such as MassTransit, include libraries for implementing long running transactions, though a complete solution requires a mature understanding of these processes and the time to implement them, including forward and rollback functions. Often you may find that understanding what constitutes correct handling for a rollback is less than straightforward. Mitigations such as correlation IDs (described above), logging, reporting and notifying of the failed transaction are required.
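
Stripped of messaging and persistence, the core idea can be sketched in a few lines of Python. The Step and run_saga names are hypothetical and this is not how any particular library implements it; each step simply pairs a forward action with a compensating action that undoes it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # forward operation, e.g. a call to a service
    compensate: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

The sketch ignores the hard parts: a compensation can itself fail, the process can crash halfway through, and some actions (such as sending an email) cannot truly be undone. Those cases are where most of the design effort goes.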

Log Aggregation and Search:

In microservice architecture, it is common to send logs from each service to a central source so they can be searched in a single place. Services with sufficient logging to trace errors across service boundaries often produce a lot of data, and aggregating this data is a non-trivial task. Teams will often opt to use a specialist service, often a third party (e.g. Datadog) where costs can be significant for high traffic applications, or an open source stack such as the ELK stack, which requires maintenance and can in itself be expensive to run.

Increased Efforts Around Security and Patching:

More applications means a greater attack surface area. There is no inherent reason why any one microservice should be vulnerable, but any security efforts will have to be replicated among all services, as will associated patching.

Packaging and Distribution of Common Code:

There will likely be standards and capabilities that you'll want to replicate between services. In an effort not to repeat yourself (DRY) you may end up considering common packages. Building, testing, versioning and distributing the common packages, along with testing the upgrades in each dependent application, will involve its own design, tooling and maintenance. In a modular monolith, instead of a package, you might just need a library project within the same code base.

Wrapping Up

The aim of this article is to prompt software engineers and architects to think hard before employing microservice architecture.

Many of the issues described above are not intractable; they have been solved many times over, by many teams. The question is whether you want to solve them, and whether all of your team has the skills to do so. Are there better, more valuable problems you could be solving? Architecture that is fun to build is not always fun to maintain. Is the aim of the project to explore software development techniques or to solve business problems? Assuming it's the latter, I would suggest that you need solid reasoning to opt for multiple microservices at the point of project inception.