Simplicity and maintainability

We generally consider simple, efficient, and maintainable to be related qualities: an improvement along one of the three axes will usually bring benefits along the others.

However, there's an ambiguity in "simple", related to one in "maintainable". Consider the following case:

You have a delimited set of instances of some human-readable structure. For this to be a useful example, the number of cases should be somewhere between five and perhaps a hundred, and the set should be open - that is, there should be a reasonable likelihood that more instances will be added as time goes on. An example might be the specifications of message structures in an ASCII format for an evolving protocol.

These structures also aren't simple discrete units. They come in pairs or even triplets, with a base type and one or more derived types. The derivation is fairly simple for a human to do and could be captured in a set of rules for an algorithm.
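To make that concrete, here is a minimal, hypothetical illustration in C++. The message names, the fields, and the "_ACK" rule are all invented for the example; the point is only that the derived form follows mechanically from the base form.

    #include <string>

    // Hypothetical base specification of a message in an ASCII protocol.
    struct MessageSpec {
        std::string name;    // e.g. "ORDER_NEW"
        std::string fields;  // delimited field list, e.g. "id|qty|price"
    };

    // Derivation rule (invented): the acknowledgement form takes the base
    // name with an "_ACK" suffix and carries only an id and a status code.
    MessageSpec make_ack(const MessageSpec& base) {
        return { base.name + "_ACK", "id|status" };
    }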

You need all of these cases to be defined for a program to run. You have several possibilities:

1) Load everything in the data segment at the start of the program. This can be done if necessary by having a preprocessor process a configuration file (or the results of a database query) at compile time. 

2) Load everything from a configuration file (or a database) at runtime.

Note that in both these cases I am assuming that you load all the variations: that is, a human prepares the data for the variant forms as well as for the base forms.
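As a rough sketch of the two options, reusing the hypothetical MessageSpec above - the table contents and the "name,fields" file format are invented, and this is not meant as a definitive implementation:

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Option 1: everything baked into the binary. Plain character pointers
    // keep the table in static storage; in practice a preprocessor or code
    // generator could emit this from a configuration file at compile time.
    struct SpecRow { const char* name; const char* fields; };

    constexpr SpecRow kAllSpecs[] = {
        { "ORDER_NEW",     "id|qty|price" },
        { "ORDER_NEW_ACK", "id|status"    },
        // ... every base and derived form, all prepared by hand ...
    };

    // Option 2: everything read from a configuration file at startup,
    // assuming one "name,fields" record per line.
    std::vector<MessageSpec> load_specs(const std::string& path) {
        std::vector<MessageSpec> specs;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream row(line);
            MessageSpec s;
            if (std::getline(row, s.name, ',') && std::getline(row, s.fields))
                specs.push_back(s);
        }
        return specs;
    }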

This is, in fact, a very "simple" solution - so simple, in fact, that you can delegate maintenance to somebody on the data side with no programming involved whatsoever.

(You're going to have to hand some updates to the data side, notionally, in any case. Every genuinely new case has to get into the system somehow.)

It's also fast. Loading directly into memory and then doing simple lookups is about as fast as you can get.

In one sense it's maintainable: there are simple rules for updating, extending, and correcting the values in the set, and those rules are applied outside the program domain.

In another sense, though, the maintainability is poor. It's almost certain that variants which are all produced by hand will not follow the derivation rules exactly, and flushing out the resulting errors is difficult. This is, in fact, a violation of the DRY principle, except that it shows up in data rather than in logic.

Now consider the case where the variants are generated algorithmically. In the load-at-compile-time model you can do this with a preprocessing script which generates data structures declared for the data segment; if you're clever and the use case supports it, you can use constexpr algorithms to do the generation within the code itself. In the runtime model you can generate the variants at startup, or defer generation until a variant is needed and produce it on the fly from the base form.
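A sketch of the compile-time route, assuming C++17 or later and reusing the SpecRow type from the earlier sketch. The derivation rule is the same invented "_ACK" one, squeezed into a fixed-size buffer so that it stays constexpr-friendly:

    #include <array>
    #include <cstddef>

    // A derived form with an owned, literal-friendly name buffer.
    struct FixedSpec { char name[32]; const char* fields; };

    constexpr SpecRow kBases[] = {
        { "ORDER_NEW",    "id|qty|price" },
        { "ORDER_CANCEL", "id|reason"    },
    };

    // Derivation rule: base name plus "_ACK", fixed acknowledgement fields.
    constexpr FixedSpec make_ack(const SpecRow& base) {
        FixedSpec out{};
        std::size_t i = 0;
        for (; base.name[i] != '\0' && i < 27; ++i) out.name[i] = base.name[i];
        const char suffix[] = "_ACK";
        for (std::size_t j = 0; j < 5; ++j) out.name[i + j] = suffix[j];
        out.fields = "id|status";
        return out;
    }

    constexpr auto make_acks() {
        std::array<FixedSpec, std::size(kBases)> acks{};
        for (std::size_t i = 0; i < std::size(kBases); ++i)
            acks[i] = make_ack(kBases[i]);
        return acks;
    }

    // Every derived form now exists in the binary without being hand-written.
    constexpr auto kAcks = make_acks();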

In some sense all of these options which derive the other forms algorithmically are less simple than just providing all the variants up front. But in another sense the system is simpler, because it now has fewer inputs. In one sense, the algorithmic model is less maintainable: you now need someone who can tweak the algorithm when changes are required to do the maintaining. But on the other hand, a whole domain of problems has been eliminated by making the generation automatic. There is less _to_ maintain. And the algorithms themselves can be put under a set of unit tests, with edge cases dealt with thoroughly enough that the code itself can be used to define expected behaviour.
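For instance, a couple of tests pinning down the invented rule above would let the code itself state the expected behaviour. Plain asserts are used here for brevity; any test framework would serve:

    #include <cassert>
    #include <cstring>

    void test_ack_derivation() {
        constexpr SpecRow base{ "ORDER_NEW", "id|qty|price" };
        constexpr FixedSpec ack = make_ack(base);
        // The rule also holds at compile time, so a regression can fail the build.
        static_assert(ack.name[9] == '_' && ack.name[10] == 'A');
        assert(std::strcmp(ack.name, "ORDER_NEW_ACK") == 0);
        assert(std::strcmp(ack.fields, "id|status") == 0);
    }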

We thus come to a paradox: the code which is, in the most obvious way, harder to read is also easier to maintain and more reliable, because its expression of intent can be tested, and because it is a rule rather than a heap of data. Not only that: it can be used to provide diagnostics of data irregularities which might previously have escaped notice.
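One hypothetical form such a diagnostic could take, reusing the runtime MessageSpec sketch: regenerate each derived form from its base and flag any hand-supplied row that disagrees with the rule.

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    void report_irregularities(const std::vector<MessageSpec>& specs) {
        std::map<std::string, MessageSpec> byName;
        for (const auto& s : specs) byName[s.name] = s;
        for (const auto& s : specs) {
            auto it = byName.find(s.name + "_ACK");
            if (it == byName.end()) continue;           // no hand-made variant
            const MessageSpec expected = make_ack(s);   // rule from the sketch
            if (it->second.fields != expected.fields)
                std::cerr << "Irregular variant: " << it->first << '\n';
        }
    }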

If we tweak the parameters of the case, we can get a better sense of what one might do.

- If there are 500 rather than 50 or 100 distinct cases, or if there's a new case every few days, the burden of doing more than the minimal amount with them manually is just too great. There's no question that you will want to do as much as possible algorithmically.

- If the data is a genuine forever-closed set - really closed, such as one based on an enumeration of the Emperors of Rome, or the months of the year - then getting it right once and taking away the scaffolding, as long as the specification for the data is unlikely to change, can be more attractive. (And even if it does change, occasionally, if the changes are simple, such as adding a new field to all instances, you might just write a one-off script to update the data.)

- If the data ultimately comes from a source that already exists (such as a database), eliminating intermediate steps between the source and its use makes sense. It might still make sense to generate code at compile time, but this sourcing also makes it more likely that the behaviour of the application will be required to reflect the state of the data at runtime. Doing as much as possible with runtime processing becomes the obvious model. However, this also raises the importance of integration testing, as unit tests which mock database access can miss changes to the database.

- If you have a microservices-oriented or distributed system and several applications (typically on different servers) make use of the fully processed data, it makes sense to give one application the sole job of distributing current values to all the other consuming applications. In that context it also makes sense to do as much programmatically as possible within that one application, especially as it is now possible to do full scripted integration testing on the application to test the various cases.

- If the data is required only occasionally and in special cases, then unless there is a very tight performance constraint on all transactions, it may make sense to defer as much as possible. This does not necessarily mean that startup costs need to be kept low, though, so one's approach might go either way. (If you can do all your preparation with constexpr functions and structures, you might as well, unless your build times are prohibitive.)
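A sketch of the deferred approach under the same invented MessageSpec example: derive a variant only when it is first asked for, and cache it. This is not thread-safe as written.

    #include <string>
    #include <unordered_map>
    #include <vector>

    class SpecRegistry {
    public:
        explicit SpecRegistry(std::vector<MessageSpec> bases) {
            for (auto& b : bases) {
                std::string key = b.name;
                bases_.emplace(std::move(key), std::move(b));
            }
        }

        // Returns the acknowledgement form, deriving and caching it on first
        // use; returns nullptr for an unknown base name.
        const MessageSpec* ack_for(const std::string& baseName) {
            auto cached = acks_.find(baseName);
            if (cached != acks_.end()) return &cached->second;
            auto base = bases_.find(baseName);
            if (base == bases_.end()) return nullptr;
            auto inserted = acks_.emplace(baseName, make_ack(base->second));
            return &inserted.first->second;
        }

    private:
        std::unordered_map<std::string, MessageSpec> bases_;
        std::unordered_map<std::string, MessageSpec> acks_;
    };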

So we have to be careful when we talk about "simplicity", especially when we are contemplating the overall life of the code.
