Did you ever see some piece of code that uses primitive data types, like Strings or integers, in it’s public API? I am sure you have, because most code is written like that.

But that can cause problems. Today, I want to talk about a code smell called “Primitive Obsession”. And I want to show you how code often becomes easier to understand and “nicer” by eliminating primitive data types - especially, but not only, from it’s public API.

Intro (TL;DR)

My name is David and in this video series, I am talking about things you can do to become a better software developer.

I am a freelance consultant and coach - and in the last 12 years I have strived to constantly learn and improve my skills as a software developer, architect, and also as a tester, team coach and trainer.

I have learned that improving your software design is crucial for delivering better software faster and cheaper. Having a low internal quality in your software costs you a lot, and in many different ways.

Today I want to talk about a very specific design smell: Primitive Obsession. You can notice this smell when your code uses a lot of primitive data types, like integers or booleans but also strings and dates, especially in its public API.

Primitive Obsession is a more general version of “Stringly Typed Code”, a code smell where strings are used in places where other data types would be more appropriate.

Question / Problem

How does primitive obsession look like and what exactly is the problem? Consider the following piece of code…

This code comes from a hypothetical software that helps me pay my taxes in Austria. In a first step, users would input their tax account number.

The software would first verify that the tax account number is correct and also load some information about the tax office, which the software will display to the users. And because we are only selling to Austrian customers and only support Austrian tax account number, it can use the first two digits - the tax office number - for doing so.

46-999/9460

The tax account number looks like this: The first two digits are the tax office number. In this example, 46 means “Linz”. The next six digits, ususally written with a slash in the middle, are the account number within this tax office. And the last digit is a checksum.

The example code gets the tax account number that was entered by the user, validates it, extracts the tax office number and loads the tax office information. If everything worked out correctly, it displays the tax office information. Otherwise, it outputs an error.

For some reason, maybe a performance optimization or because it is required by some backend, the tax office number string is also converted to an integer. If you think, right now, “Nobody would ever do that”: I have seen very similar things at multiple past clients. Well, maybe a little bit less obvious stuff. And not with tax account numbers, but with other structured data.

public void onTaxAccountNoChanged(String taxAccountNo) {
    String errorMessage = TaxAccountNoUtil.validateTaxAccountNo(taxAccountNo);
    if(errorMessage != null) {
        showTaxAccountNoValidationError(errorMessage);
    } else {
        String taxOfficeIdInput = taxAccountNo.substring(0, 2);
        Integer taxOfficeId = Integer.valueOf(taxOfficeIdInput);

        String taxOfficeName = taxOfficeRepository.findByNumber(taxOfficeId).getName();
        showTaxOfficeName(taxOfficeName);
    }
}

The first version of the code

This code uses primitive data types - Strings and Integers - all over the place. validateTaxAccountNo returns the error message as a string. The taxAccountNo is a string. We get the tax office number as a string and convert it to an integer. The tax office name is returned as a string by taxOffice.getName()

Use of primitive values in this code

But the code is reasonably short and easy to understand, so where is the problem?

First, these data types are too general. They allow more operations than should be possible with an tax account number or a tax office number. Integer allows us to add, substract, … and also to compare to other numbers. All of that should never be possible for a tax office number.

On the tax account number, we can again do operations that should not be possible, like calling the string manipulation functions, searching inside, and so on.

If you think, right now, “nobody would ever do that”, think again.

Second, the use of types does not add clarity or allow you to remove duplication. It’s even the other way around: To add functionality around tax account numbers, we needed a class called TaxAccountNoUtil. Now there are two places in the code that know about tax account number. We have duplicated the knowledge about tax accounts.

And third, you are voluntarily giving up compile time safety. Suppose you see a method call like this:

A function call - Which parameter is which argument?

When the first two parameters are integers and the other three are strings, nothing is stopping a programmer from supplying them in the wrong oder. When some programmer decides to re-order the arguments or add another argument, it will be really hard to find all the effected places.

The compiler could help you detect defects like that. But in this case, it cannot because you are using primitive data types.

Solution

As a first step, I will do the most simple thing. I will introduce types for TaxAccountNumber, TaxOfficeId and TaxOfficeName - and there is already a class for TaxOffice. They only wrap the primitive value - They do not do anything in addition.

public void onTaxAccountNoChanged(String taxAccountNo) {
    try {
        TaxAccountNumber taxAccountNumber = TaxAccountNumber.fromString(taxAccountNo);

        String taxOfficeIdInput = taxAccountNumber.asFormattedString().substring(0, 2);
        TaxOfficeId taxOfficeId = TaxOfficeId.fromString(taxOfficeIdInput);

        TaxOfficeName taxOfficeName = taxOfficeRepository.findBy(taxOfficeId).getName();
        showTaxOfficeName(taxOfficeName);
    } catch (TaxAccountNumberFormatException e) {
        showTaxAccountNoValidationError(e.getMessage());
    }
}

Even though this was just an intermediate step, I already made some progress. The fromString method of TaxAccountNumber should never return an invalid object. So I had to move the validation there. That’s what J.B. Rainsberger calls “Attractive Code” - Once you have a place where you can put code for a certain concept, this place will attract other code.

This method can still use the old TaxAccountNoUtil class, but I will later even inline the methods from there and remove TaxAccountNoUtil. Another win: We should not have “Util” classes anyway.

And if you do not like that exception being thrown, there would be other solutions. But that would be too much change for this video.

I now have types for TaxAccountNumber and TaxOfficeId, and so far, they only wrap the primitive type. But they already give me two other advantages:

  • I can make the constructor private, forcing everyone to use the factory methods. There I can do very specialized validation, depending on which factory method was called. Also, the factory methods allow me to give a “name” to the constructor, which would otherwise be impossible in Java.
  • I can make the wrapped primitive final. The classes now start to represent what “Domain Driven Design” calls a value object.

Private constructor, immutable fields

Now I have my attractive code. What else could I do with that?

What I will do now is “just” cleanup. I can remove the redundant calls to “fromString” and “asFormattedString” and move this code to the specialized types. And since I now also have a type for the tax office name, I do not have to name the mehtod for showing it showTaxOfficeName. I can name it showNameOf and rely on the fact that the compiler will select the right method.

Final result

Conclusion

The code now is shorter and easier to read.

It has added compile time safety: It is now impossible to supply the wrong number to findBy or to call the wrong method for showing the name. The compiler even forces me to deal with the validation error that I could have ignored in the first version of the code.

Yes, I wrote a lot of code around this method, so the overall amount of code was increased. That took me maybe half an hour.

But this code was easy to write. And it allows me to encapsulate the data better. It allows me to make all these objects immutable - that is, read-only - which makes writing correct correct easier.

The code also added clarity to onTaxAccountNoChanged which will be the first method a programmer will see when they debug a problem with tax account numbers later. If I can save the next programmer a few minutes of debugging time, and if I maybe can prevent a simple parameter ordering mistake, saving another programmer a few minutes, this change was already worth it.

CTA

Have you ever seen code with primitive obsession? What about the software you are working on right now - What are the worst places regarding to this, and did it ever cause a (small or larger) problem?

Or, do you already create classes for value objects and entities, like in the last version of my code? How does that feel? Which advantages does it give you?

Tell me in the comments or shoot me a Tweet to @dtanzer. And do not forget to subscribe to this channel or follow me on Twitter, so you’ll never miss any updates.