Refactor to Purity

Pure functions are functions whose result depends only on their arguments and that can be executed without causing side effects. In functional programming, they are the rule rather than the exception. In most object-oriented languages, however, you encounter them less often, or at least they are rarely considered the preferred approach. In the .NET ecosystem, much emphasis is placed instead on Dependency Injection and more or less extensive abstractions using interfaces.
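To make the distinction concrete, here is a minimal sketch. The class and method names (PurityExamples, AddToTotalImpure, AddToTotalPure) are hypothetical and not part of the example repository. The first method reads and writes shared state, so its result depends on earlier calls; the second depends only on its arguments.

```csharp
using System;

public static class PurityExamples
{
    // Shared mutable state that the impure method changes on every call.
    private static decimal _runningTotal;

    // Impure: modifies _runningTotal, so the result depends on earlier calls.
    public static decimal AddToTotalImpure(decimal amount)
    {
        _runningTotal += amount;
        return _runningTotal;
    }

    // Pure: the result depends only on the arguments; nothing else is touched.
    public static decimal AddToTotalPure(decimal total, decimal amount)
        => total + amount;
}
```

Calling AddToTotalPure twice with the same arguments always yields the same value, which is what makes such a function trivial to test.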

This article demonstrates how to move from a codebase with many such indirections to a simpler version that removes a lot of unnecessary complexity.

Initial Situation

As a starting point for our refactoring, let’s consider a fictional example of an online shop. The source code is available on GitHub, and there’s a separate branch in the repository for each refactoring step.

Source code on GitHub, Branch steps/01-initial-state

The application consists of an application project and a project for associated tests. The structure doesn’t strictly adhere to an architectural style but is meant to illustrate the components you might expect in such a system. Additionally, we will focus on the backend side of our online shop here. The folder structure looks like this:

├───Refactor.Application
│   ├───Controllers
│   ├───CQRS
│   │   ├───Handlers
│   │   └───Requests
│   ├───Data
│   ├───Models
│   ├───Repositories
│   │   ├───Implementations
│   │   └───Interfaces
│   └───Services
└───Refactor.Application.Test
    ├───Controllers
    ├───CQRS
    │   └───Handlers
    ├───Repositories
    └───Services

The application is written in C# and uses ASP.NET controllers. Business logic is implemented in service classes, and domain models are located in the Models folder. Database access is done through the Repository pattern, and the POCO classes for the database are placed in the Data folder. For communication between controllers and services, the CQRS (Command Query Responsibility Segregation) pattern is employed.

All these individual components are managed and wired together using Dependency Injection.

Abstraction

In software development, abstraction is a commonly used concept. However, it often falls short of its primary goal, which is to reduce complexity and maintenance efforts. Additionally, abstraction layers are frequently introduced without providing concrete benefits but rather because “that’s how things are done.” This not only impacts code readability but also makes it challenging to understand the runtime behavior without a detailed analysis of dependencies. While the abstractions in our example might appear artificially enforced for a small demo, they are indeed encountered in real projects from time to time.

Base Classes and Marker Interfaces

All our model classes inherit from an abstract base record called ModelBase, which doesn’t provide any implementation. The database POCOs implement an interface called IData, which at least defines a property Id.

// ./Models
public abstract record ModelBase;

public record Customer(
    Guid Id,
    string FirstName,
    string LastName,
    string Email) : ModelBase;

// ./Data
public interface IData
{
    Guid Id { get; }
}

public record Customer(
    Guid Id,
    string FirstName,
    string LastName,
    string Email,
    bool Active) : IData;

Repository Interfaces

In the Repositories folder, you can find both a generic interface IRepository<T> and specific interfaces for each database table or POCO class, such as ICustomerRepository. Additionally, there’s an abstract base class called AbstractRepository<T>, which simply redeclares the methods of the generic interface one-to-one as abstract methods.

public interface IRepository<T> where T : IData
{
    T Get(Guid id);
    IEnumerable<T> GetAll();
    void Add(T entity);
    ...
}

public abstract class AbstractRepository<T> : IRepository<T> where T : IData
{
    protected readonly IDatabase _database;

    protected AbstractRepository(IDatabase database) => _database = database;

    public abstract T Get(Guid id);
    public abstract IEnumerable<T> GetAll();
    public abstract void Add(T entity);
    ...
}

Abstracting the concrete database access through an interface like IDatabase can be meaningful, as it allows you to replace external systems like a database with a mock object during testing. However, we will find a different solution to this problem as we proceed.

In most cases, the concrete repository implementations merely forward calls to the base class or to an IDatabase object.

public class CustomerRepository : AbstractRepository<Customer>, ICustomerRepository
{
    public CustomerRepository(IDatabase database) : base(database) { }

    public override void Add(Customer entity) => _database.Add(entity);
    public override void Update(Customer entity) => _database.Update(entity);
    ...
}

Services and CQRS

All our services are defined as interfaces with precisely one implementation; there is no need for multiple implementations or runtime swapping.

The example of ITaxService illustrates the frequent use of interfaces. The interface defines only a single method, which has no dependencies apart from its direct method parameters.

public interface ITaxService
{
    (decimal taxAmount, decimal grossPrice) CalculateTax(
        decimal netPrice, decimal taxRate);
}

public class TaxService : ITaxService
{
    public (decimal taxAmount, decimal grossPrice) CalculateTax(
        decimal netPrice, decimal taxRate)
    {
        var taxAmount = netPrice * taxRate / 100m;
        var grossPrice = netPrice + taxAmount;

        return (taxAmount, grossPrice);
    }
}

Tests

Now, what does a (unit) test for such code look like? Testing the GetOrderItems() method of the OrderItemService illustrates how much setup code is already required to mock dependencies and feed them with data. In the case of the ITaxService interface, even the business logic is implemented in the mock object.

[Test]
public void Should_Return_OrderItems()
{
    // Arrange
    var orderId = Guid.NewGuid();

    var orderItem1 = new OrderItem(Guid.NewGuid(),
        orderId, Guid.NewGuid(), 2, 19.75m);
    var orderItem2 = new OrderItem(Guid.NewGuid(),
        orderId, Guid.NewGuid(), 3, 9.66m);
    var orderItemData = new List<OrderItem> { orderItem1, orderItem2 };

    var orderItemRepository = Substitute.For<IOrderItemRepository>();
    orderItemRepository.GetByOrderId(orderId).Returns(orderItemData);

    var taxService = Substitute.For<ITaxService>();

    taxService.CalculateTax(default, default)
        .ReturnsForAnyArgs(info =>
        {
            var netPrice = info.ArgAt<decimal>(0);
            var taxRate = info.ArgAt<decimal>(1);

            var taxAmount = netPrice * taxRate / 100m;
            var grossPrice = netPrice + taxAmount;

            return (taxAmount, grossPrice);
        });

    var sut = new OrderItemService(orderItemRepository, taxService);

    // Act
    var orderItems = sut.GetOrderItems(orderId);

    // Assert
    orderItems.Should().NotBeNullOrEmpty();
    orderItems.Should().HaveCount(2);

    var firstOrderItem = orderItems.First();
    firstOrderItem.Id.Should().Be(orderItem1.Id);
    firstOrderItem.TaxRate.Should().Be(19);
    firstOrderItem.GrossPrice.Should().Be(19.75m * 1.19m);
}

As Step 1 of our refactoring shows, you can significantly reduce the setup code for tests with minimal effort.

Code Analysis

Exploring our application using Sonargraph reveals the number of dependencies between the individual classes we are dealing with at this stage.

Sonargraph Dependency graph initial state

At this point, the codebase consists of 917 lines of code in 53 files and has an Average Component Dependency (ACD) of 5.3.

Step 1: Test Dummies

In the first step, we focus on the test classes. A robust test suite is the foundation of safe refactoring, which is why we start here.

Following the motto “new is glue”, we move the instantiation of test data out of the test methods and into Dummies. There’s a dedicated blog post on dummy factories, “Simple test setup with dummy factories”, so we’ll only briefly touch on the changes to our example code here.

Source code on GitHub, Branch steps/02-introduce-dummies

We add a class called DataDummies that takes care of instantiating data objects for us. Additionally, we define some static properties that return predefined Customer objects for use in our tests.

internal static class DataDummies
{
    public static Customer JohnDoe => Customer(
        new Guid("bfbffb19-cdd4-42ac-b536-606a16d03eae"), "John",
        "Doe", "john.doe@example.com");

    public static Customer JaneDoe => Customer(
        new Guid("95a6db4a-4635-4fb3-b7f6-c206ff7272f1"), "Jane",
        "Doe", "jane.doe@example.com", false);

    public static Customer Customer(
        Guid? id = null, string firstName = "Peter", string lastName = "Parker",
        string email = "peter.parker@example.com", bool active = true)
    {
        return new Customer(id ?? Guid.NewGuid(),
            firstName, lastName, email, active);
    }

    ...
}

We follow the same approach for our domain objects. Here, we can take advantage of the fact that POCO classes and domain models are usually structured similarly, allowing us to use the data objects from DataDummies.

internal static class ModelDummies
{
    public static Customer JohnDoe => FromData(DataDummies.JohnDoe);
    public static Customer JaneDoe => FromData(DataDummies.JaneDoe);

    public static Customer FromData(Data.Customer data)
    {
        return Customer(id: data.Id, firstName: data.FirstName,
            lastName: data.LastName, email: data.Email);
    }

    ...
}

With this approach, our test setup becomes simpler and, most importantly, more resilient to changes in the data objects since we only need to adjust them in one place.

[Test]
public void Should_Return_OrderItems()
{
    // Arrange
    var orderId = Guid.NewGuid();

    var orderItem1 = OrderItem(price: 19.75m);
    var orderItem2 = OrderItem(price: 9.66m);
    var orderItemData = Collection(orderItem1, orderItem2);

    var orderItemRepository = Substitute.For<IOrderItemRepository>();
    orderItemRepository.GetByOrderId(orderId).Returns(orderItemData);

    ...
}
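The OrderItem() and Collection() helpers used in this test are not shown above. A minimal sketch of what they might look like follows; the record shape and parameter names (ProductId, Quantity, Price) and the default values are assumptions inferred from the constructor call in the original test, not code from the repository. The unqualified calls in the test presumably rely on a using static directive for DataDummies.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the data record; the real POCO may differ.
public record OrderItem(
    Guid Id, Guid OrderId, Guid ProductId, int Quantity, decimal Price);

internal static class DataDummies
{
    // Every parameter is optional, so a test overrides only what it cares about.
    public static OrderItem OrderItem(
        Guid? id = null, Guid? orderId = null, Guid? productId = null,
        int quantity = 1, decimal price = 9.99m)
    {
        return new OrderItem(id ?? Guid.NewGuid(), orderId ?? Guid.NewGuid(),
            productId ?? Guid.NewGuid(), quantity, price);
    }

    // Wraps any number of items in a read-only collection.
    public static IReadOnlyCollection<T> Collection<T>(params T[] items)
        => Array.AsReadOnly(items);
}
```

If the record gains a new property, only the factory method needs a new default value; existing tests keep compiling.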

After these preparations, we can proceed with refactoring the production code.

Step 2: Removing Interfaces

In the second step, we aim to remove unnecessary abstractions through interfaces and base classes. It’s often argued that test code can leverage these abstractions, replacing dependencies implemented as interfaces with mocks, stubs, or fakes. This is certainly true for external dependencies, such as databases or email servers. However, for self-created abstractions, it usually leads to unnecessary complexity and a high overhead in test setup. Mocks are hard to maintain and require knowledge of the internals of the real implementation, which then has to be recreated inside the mock.

Our first point of focus is the TaxService. The CalculateTax() method is already a pure function. Therefore, we can delete the ITaxService interface, make the class and the method static, and simply call it directly. There’s no need for dependency injection, and the test mock can also be eliminated.

public static class TaxService
{
    public static (decimal taxAmount, decimal grossPrice) CalculateTax(
        decimal netPrice, decimal taxRate)
    {
        ...
    }
}

The corresponding Git commit shows 43 deleted lines.
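With the interface gone, a test can call the function directly. The following is a minimal sketch that uses a plain boolean check instead of the NUnit and FluentAssertions calls used elsewhere in this article. The method body is copied from the original TaxService; TaxServiceExample and its input values are hypothetical.

```csharp
using System;

// Static version of the service, with the body of the original TaxService.
public static class TaxService
{
    public static (decimal taxAmount, decimal grossPrice) CalculateTax(
        decimal netPrice, decimal taxRate)
    {
        var taxAmount = netPrice * taxRate / 100m;
        var grossPrice = netPrice + taxAmount;

        return (taxAmount, grossPrice);
    }
}

public static class TaxServiceExample
{
    // No mock and no setup: call the function and compare the result.
    public static bool GrossPriceIsCorrect()
    {
        var (taxAmount, grossPrice) = TaxService.CalculateTax(100m, 19m);
        return taxAmount == 19m && grossPrice == 119m;
    }
}
```

Code that previously received an ITaxService through its constructor simply calls TaxService.CalculateTax() and exercises the real logic in its own tests.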

Now let’s turn our attention to the service classes OrderService and OrderItemService. From the dependencies provided via constructor injection (e.g., ICustomerRepository), we only need individual methods or even just the return value of a method. Instead of injecting repository classes, we pass method references (delegates) to the service classes. This eliminates the need for private fields, lets us make the classes stateless and static, and allows us to remove the interfaces.

The OrderService class previously had three dependencies.

public class OrderService : IOrderService
{
    private readonly ICustomerRepository _customerRepository;
    private readonly IOrderItemRepository _orderItemRepository;
    private readonly IOrderRepository _orderRepository;

    public OrderService(IOrderRepository orderRepository,
        ICustomerRepository customerRepository,
        IOrderItemRepository orderItemRepository)
    {
        _orderRepository = orderRepository;
        _customerRepository = customerRepository;
        _orderItemRepository = orderItemRepository;
    }

    public Order GetOrder(Guid id)
    {
        var orderData = _orderRepository.Get(id);
        return GetOrder(orderData);
    }

    private Order GetOrder(Data.Order orderData)
    {
        var customerData = _customerRepository.Get(orderData.CustomerId);
        var orderItemData = _orderItemRepository.GetByOrderId(orderData.Id);

        ...

        return orderModel;
    }
}

After the refactoring, the class looks like this:

public static class OrderService
{
    public static Order GetOrder(Guid id,
        Func<Guid, Data.Order> getOrder,
        Func<Guid, Customer> getCustomer,
        Func<Guid, IReadOnlyCollection<OrderItem>> getOrderItems)
    {
        var orderData = getOrder(id);
        var customerData = getCustomer(orderData.CustomerId);
        var orderItemData = getOrderItems(id);
        return GetOrder(orderData, customerData, orderItemData);
    }

    ...
}

The GetOrder() method is now simply called by passing the relevant methods of the repositories as parameters.

var order = OrderService.GetOrder(
    id: id,
    getOrder: _orderRepository.Get,
    getCustomer: _customerRepository.Get,
    getOrderItems: _orderItemRepository.GetByOrderId);

If the method signatures differ, we can easily adapt them using lambda expressions.

var order = OrderService.GetOrder(
    id: id,
    getCustomer: customerId => _customerRepository.Get(id: customerId, activeOnly: true),
    ...

The corresponding unit tests also become simpler. We no longer need to assemble mock objects but only define methods. These local lambda expressions are one-liners.

var getOrder = (Guid _) => DataDummies.Order(orderId, peterPan.Id);
var getCustomer = (Guid _) => peterPan;
var getByOrderId = (Guid _) => DataDummies.Collection(orderItem1, orderItem2);

// Act
var order = OrderService.GetOrder(orderId, getOrder, getCustomer, getByOrderId);

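Alternatively, you can pass the return values of these functions instead of the functions themselves. For pure functions this is always possible, because a call can be replaced by its result without changing the behavior of the program; this property is referred to as referential transparency. However, for methods that have side effects (such as database updates) or that filter large data sets, this is not always advisable.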

var orderData = _orderRepository.Get(id);
var customerData = _customerRepository.Get(orderData.CustomerId);
var orderItemData = _orderItemRepository.GetByOrderId(id);

var order = OrderService.GetOrder(orderData, customerData, orderItemData);

By passing dependencies as method parameters rather than using dependency injection to bring them into a class, we shift the responsibility for creating and managing dependencies to the calling code.

Step 3: Removing CQRS

Next, we remove the CQRS pattern, implemented with MediatR, from our codebase. The library is great, and CQRS is a powerful tool when there is a genuine need to separate commands and queries. In our example, however, we want to show that this separation is often unnecessary and may be a premature optimization whose benefit never materializes.

Instead of distributing the source code that connects the controller and domain logic across multiple IRequest and IRequestHandler<> implementations, we consolidate it into a few integration classes.

Where we previously had an AddOrderHandler along with its corresponding AddOrderRequest, we now have a single method that receives the required dependencies as parameters and orchestrates the invocation of the service classes.

public static class OrdersIntegration
{
    public static void AddOrder(Order order,
        ICustomerRepository customerRepository,
        IOrderItemRepository orderItemRepository,
        IOrderRepository orderRepository)
    {
        if (!order.Items.Any())
            throw new InvalidOperationException("Order must have at least one item.");

        var customerData = customerRepository.Get(order.Customer.Id);

        if (customerData.Active is false)
            throw new InvalidOperationException("Customer is not active.");

        foreach (var orderItem in order.Items)
        {
            var orderItemData = OrderItemService.AddOrderItem(orderItem, order);
            orderItemRepository.Add(orderItemData);
        }

        OrderService.AddOrder(order, orderRepository.Add);
    }

    ...
}

In another step, similar to what we did for the services, we can switch from injecting repositories to using method delegates. This allows us to remove all IRepository interfaces since we no longer need to substitute them in our tests. An example Git commit demonstrates this for the IOrderRepository.
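A delegate-based version of AddOrder() could look roughly like the following sketch. The records are simplified stand-ins for the article's models and POCOs, the delegate parameter names (getCustomer, addOrderItem, addOrder) are hypothetical, and the mapping to data objects is inlined here instead of going through OrderItemService and OrderService.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins; the real records have more members.
public record CustomerData(Guid Id, bool Active);
public record OrderItemData(Guid Id, Guid OrderId);
public record OrderData(Guid Id, Guid CustomerId);
public record Customer(Guid Id);
public record OrderItem(Guid Id);
public record Order(Guid Id, Customer Customer, IReadOnlyCollection<OrderItem> Items);

public static class OrdersIntegration
{
    // Repositories are replaced by the three operations the method actually uses.
    public static void AddOrder(Order order,
        Func<Guid, CustomerData> getCustomer,
        Action<OrderItemData> addOrderItem,
        Action<OrderData> addOrder)
    {
        if (!order.Items.Any())
            throw new InvalidOperationException("Order must have at least one item.");

        var customerData = getCustomer(order.Customer.Id);

        if (customerData.Active is false)
            throw new InvalidOperationException("Customer is not active.");

        // Assumption: mapping inlined for brevity.
        foreach (var orderItem in order.Items)
            addOrderItem(new OrderItemData(orderItem.Id, order.Id));

        addOrder(new OrderData(order.Id, order.Customer.Id));
    }
}
```

In a test, the delegates can be plain lambdas or the Add method of a List<T>, which records what would have been saved without any mocking library.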

Step 4: Static Repositories

After removing a few more abstract base classes and interfaces, let’s take another look at the repository classes. Each of them has a single dependency: IDatabase. Switching from constructor injection to method injection can be done quickly.

- public class OrderRepository
+ public static class OrderRepository
{
-    private readonly IDatabase _database;
-    public OrderRepository(IDatabase database) => _database = database;

-    public IEnumerable<OrderData> GetOrdersByDate(
-       DateTime startDate, DateTime endDate)
-        => _database.GetAll<OrderData>()
-               .Where(x => x.OrderDate >= startDate && x.OrderDate <= endDate);


+    public static IEnumerable<OrderData> GetOrdersByDate(
+        DateTime startDate, DateTime endDate, IDatabase db)
+            => db.GetAll<OrderData>()
+                .Where(x => x.OrderDate >= startDate && x.OrderDate <= endDate);
     ...
}

If we take it a step further here and expect method delegates as method parameters instead of an instance of IDatabase, we can completely remove the dependency on IDatabase from our repositories.

public static class OrderRepository
{
-    public static IEnumerable<OrderData> GetOrdersByDate(
-       DateTime startDate, DateTime endDate, IDatabase db)
-        => db.GetAll<OrderData>()
-            .Where(x => x.OrderDate >= startDate && x.OrderDate <= endDate);

+    public static IEnumerable<OrderData> GetOrdersByDate(
+        DateTime startDate, DateTime endDate, 
+        Func<IEnumerable<OrderData>> getAll)
+        => getAll().Where(x => x.OrderDate >= startDate &&
+                               x.OrderDate <= endDate);

    ...
}

Alternatively, it may be worth considering passing the return values of these methods to our service method instead of the methods themselves. This eliminates all side effects, resulting in a pure function.

public static IReadOnlyCollection<Order> GetOrdersByDate(
    DateTime startDate, DateTime endDate,
    IEnumerable<OrderData> allOrderData,
    IDictionary<Guid, CustomerData> customerData,
    ILookup<Guid, OrderItemData> orderItemData)
{
    return allOrderData
        .Where(x => x.OrderDate >= startDate && x.OrderDate <= endDate)
        .Select(order => GetOrder(order,
            customerData[order.CustomerId], orderItemData[order.Id]))
        .ToList();
}

The calling method is now responsible for collecting the data.

var allOrderData = db.GetAll<OrderData>();

var customerData = db.GetAll<CustomerData>()
    .ToDictionary(x => x.Id, x => x);

var orderItemData = db.GetAll<OrderItemData>()
    .ToLookup(x => x.OrderId);

var orders = OrderService.GetOrdersByDate(startDate, endDate,
    allOrderData, customerData, orderItemData);

For external data sources like databases or files, this approach is usually not suitable because the filtering happens late and too much data may be loaded. We don’t want to load the entire database into memory just to use a few records. However, for small data sets or data already in memory, this is a good way to reduce complexity.

Results

What have we achieved with these refactoring steps? Our codebase has become significantly smaller, with almost all interfaces removed.

The dependency graph shows significantly fewer connections. The number of lines of code has been reduced from 917 to 715 (about 22% fewer), the number of files from 53 to 34 (about 36% fewer), and the Average Component Dependency has dropped from 5.3 to 3.6.

Sonargraph Dependency graph pure functions

Besides the raw numbers, what’s more important is that the code is now easier to understand and follow. You no longer have to hunt for interfaces and potential implementations to understand the runtime behavior.

The entire source code is available on GitHub.