Function sizes

Having expressed my discomfort with the idea of a fixed function size, what do I think should be the maximum size of a function?

First of all, counting semicolons is pointless. There is no essential difference between:

foo(generateX().createParam());

and

auto x = generateX();
auto param = x.createParam();
foo(param);

other than the greater scope in which the two intermediate variables live. (I note that many developers tend to be more comfortable with the latter. I see much more of

int n = std::accumulate(...);
process(n, param2);
//n not used further

than I do of

process(std::accumulate(...), param2);

in general code.)

There is an obvious outer limit on the normal maximum size: a function should not be too large to see on one screen, even allowing that screen to be a large monitor. Even that limit, however, has its exceptions.

Assume that you have received a message from an external source and that the type is expressed as a string. Based on that type we create a new message object of that specific type and dispatch it.

Some protocols have many, many message types.

You may find yourself with an inescapably long if/else chain. Even with the discipline of making each branch a single function call, the resulting construct may exceed a single screen by a considerable margin. Any way of breaking it up is arbitrary, and arbitrariness does not assist comprehensibility; better to live with one clear location. A long if/else chain has the advantage that its purpose is easy to name: determineType(), or dispatchBasedOnType(). Notional functions like determineTypeForNamesStartingWithA() are not named after coherent, meaningful concepts. (You can get rid of the if/else chain with a Command pattern, but then setting up the commands becomes the long function instead.)
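As a rough sketch of the two shapes described above, the following compares a single if/else chain (one call per branch) with a Command-style table of factories. The message classes and the type strings "HB", "NO" and "OC" are hypothetical, invented only for illustration; a real protocol would have many more branches or table entries, which is exactly the point.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical message hierarchy; names are illustrative only.
struct Message {
    virtual ~Message() = default;
    virtual std::string name() const = 0;
};
struct Heartbeat : Message { std::string name() const override { return "Heartbeat"; } };
struct OrderNew : Message { std::string name() const override { return "OrderNew"; } };
struct OrderCancel : Message { std::string name() const override { return "OrderCancel"; } };

// Shape 1: one long if/else chain, each branch a single call.
// With hundreds of types it grows past a screen, but stays in one place.
std::unique_ptr<Message> createMessage(const std::string& type) {
    if (type == "HB") {
        return std::make_unique<Heartbeat>();
    } else if (type == "NO") {
        return std::make_unique<OrderNew>();
    } else if (type == "OC") {
        return std::make_unique<OrderCancel>();
    }
    // ...one branch per remaining message type
    throw std::invalid_argument("unknown message type: " + type);
}

// Shape 2: a table of factory "commands". The chain disappears,
// but the function that builds the table becomes the long one.
using Factory = std::function<std::unique_ptr<Message>()>;

const std::unordered_map<std::string, Factory>& messageFactories() {
    static const std::unordered_map<std::string, Factory> factories = {
        {"HB", [] { return std::make_unique<Heartbeat>(); }},
        {"NO", [] { return std::make_unique<OrderNew>(); }},
        {"OC", [] { return std::make_unique<OrderCancel>(); }},
        // ...one entry per remaining message type
    };
    return factories;
}

std::unique_ptr<Message> createMessageFromTable(const std::string& type) {
    const auto& table = messageFactories();
    auto it = table.find(type);
    if (it == table.end()) {
        throw std::invalid_argument("unknown message type: " + type);
    }
    return it->second();
}
```

Either way the length has only moved, not disappeared; the table version trades a chain of comparisons for a chain of registrations.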

In general, though, if a single concept that is not an extended choice takes twenty-five lines to express, you probably aren't chunking the internal logic anything like adequately. You may nonetheless have written an excellent first draft, especially if you work top-down and aren't using TDD. Refactoring isn't just for revisiting code; it's part of the initial development process. (With TDD you're more likely to stub in the internal logic so that you can put the function under test as soon as possible. Built-in chunking is one of the benefits of TDD.)

Setting such cases aside, it is probably true that an operation expressing a single concept can be formulated as a short list of subsidiary conceptual operations. The list should be irreducibly short: combining any two of the subsidiary operations into one would create a function with two responsibilities. That determines the minimum size of the function. It will also normally be the maximum size, with some qualifications related to locality.
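A small sketch of what an "irreducibly short list" might look like in practice. The order-processing domain and every name in it (Line, Order, allQuantitiesPositive, computeTotalCents, completeOrder) are hypothetical. The top-level function is nothing but its list of subsidiary steps; folding validation and pricing into one helper would give that helper two responsibilities.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical domain types, invented for illustration.
struct Line { std::string sku; int quantity; int unitPriceCents; };
struct Order { std::vector<Line> lines; int totalCents = 0; bool valid = false; };

// Subsidiary operation 1: one responsibility, validation.
bool allQuantitiesPositive(const Order& order) {
    for (const auto& line : order.lines) {
        if (line.quantity <= 0) return false;
    }
    return !order.lines.empty();
}

// Subsidiary operation 2: one responsibility, pricing.
int computeTotalCents(const Order& order) {
    return std::accumulate(order.lines.begin(), order.lines.end(), 0,
        [](int sum, const Line& line) { return sum + line.quantity * line.unitPriceCents; });
}

// The single concept, expressed as its short list of steps.
void completeOrder(Order& order) {
    order.valid = allQuantitiesPositive(order);
    order.totalCents = order.valid ? computeTotalCents(order) : 0;
}
```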

Locality

How large a function should be may depend on the weight you give to locality.

In general, locality is a good idea. If we define operations in a close common context, they are likely to be more comprehensible, as a whole and individually, than if they are scattered all over. Locality is usually constrained by repetition: if a block of logic is used in multiple contexts, we have to push up the level of generality at which it is defined until we reach a level common to all those contexts. This level will normally be determined by how your packages are structured, always subject to the requirements of no cyclical dependencies and no unnecessary generality.

But if a given operation is called in only one function - and there is no reasonable likelihood that it will ever be called in another - then we have a choice.

We can still define the functionality separately, typically as a private method or subclass, so that the call site contains a single statement; or we can define the function locally. Before C++11 this would have been a local class; now it will normally be a lambda expression. If the logic is invoked via an STL algorithm or another function taking a function argument, or if the expression is used to initialize a member in a constructor's initializer list, you need either the separate function or the lambda.
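The local-definition options mentioned above can be sketched side by side. The class NameList and its members are hypothetical. It shows a local class used as a function object, the equivalent lambda at the call site, and a lambda invoked in place inside a constructor's member initializer list, where a sequence of statements could not otherwise appear.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class used only to illustrate local definitions.
class NameList {
public:
    explicit NameList(std::vector<std::string> names)
        : names_(std::move(names)),
          // Immediately invoked lambda: computes a member's value
          // in the initializer list (names_ is already initialized here).
          longest_([this] {
              std::size_t longest = 0;
              for (const auto& n : names_) longest = std::max(longest, n.size());
              return longest;
          }()) {}

    // Older style: a local class serves as the function object.
    // (Passing a local class to a template such as std::count_if
    // is only legal from C++11 onward.)
    std::ptrdiff_t countAtLeastLocalClass(std::size_t minLength) const {
        struct AtLeast {
            std::size_t threshold;
            bool operator()(const std::string& s) const { return s.size() >= threshold; }
        };
        return std::count_if(names_.begin(), names_.end(), AtLeast{minLength});
    }

    // Current style: the same predicate as a lambda at the call site.
    std::ptrdiff_t countAtLeastLambda(std::size_t minLength) const {
        return std::count_if(names_.begin(), names_.end(),
                             [minLength](const std::string& s) { return s.size() >= minLength; });
    }

    std::size_t longest() const { return longest_; }

private:
    std::vector<std::string> names_;  // declared before longest_ on purpose
    std::size_t longest_;
};
```

Note that the initializer-list lambda relies on member declaration order: names_ must be declared, and therefore initialized, before longest_.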

If your logic sits in the body as part of sequential evaluation, the other alternative is not to segregate it at all. This is probably a poor decision; there is little overhead in at least placing the statements in a lambda expression held in a meaningfully named variable and then calling it. If you do segregate the logic, though, extracting a full-fledged function, private to the class, is probably your best bet. This is similar to turning complex and/or tests into named, encapsulated functions even when they aren't repeated.
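A minimal sketch of both ideas in this paragraph, assuming a hypothetical Account record: a compound and/or test given its own name even though it is used once, and a short run of sequential statements placed in a named lambda before being called. All names here are illustrative, not from any real codebase.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only.
struct Account {
    std::string owner;
    int balanceCents;
    bool frozen;
    int overdueInvoices;
};

// A complex and/or test turned into a named function, though used once.
bool needsReview(const Account& a) {
    return !a.frozen && (a.balanceCents < 0 || a.overdueInvoices > 2);
}

std::vector<std::string> reviewList(const std::vector<Account>& accounts) {
    std::vector<std::string> result;
    for (const auto& account : accounts) {
        // Sequential statements segregated into a meaningfully named lambda.
        // A private member function would be the heavier alternative.
        auto formatEntry = [&account] {
            std::string entry = account.owner;
            entry += " (";
            entry += std::to_string(account.balanceCents);
            entry += ")";
            return entry;
        };
        if (needsReview(account)) {
            result.push_back(formatEntry());
        }
    }
    return result;
}
```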

If we choose a local lambda expression where some sort of function call is required, we lengthen the function but gain the comprehensibility conferred by locality, and the usage is idiomatic. Calling

coll.erase(std::ranges::remove_if(coll, [&](const auto &inVal) { /* short sequence of statements returning bool */ }).begin(), coll.end());

will be longer at the call site than if we had extracted the lambda into a functor (capturing by reference means you need a functor, not a plain function), but the comprehensibility of the class as a whole may be improved, and the comprehensibility of the function is likely to suffer only minimally, if at all. Its use in the algorithm already marks off the lambda as a separate piece of logic to be read as a unit.

Design

Size, then, comes down to applying the Single Responsibility Principle, modified slightly by the possibility of defining unique sub-operations locally. Note that this is not a maximum size, not an external limit to be placed on a function, but simply "the appropriate size" for a given operation.

Those operations will be dictated by design: using a Visitor pattern may lead to smaller functions overall than sequential processing of classes not set up or stored for visiting. Making the types of all your class's members inherit from a common interface and storing them in a vector allows you to operate on them with short STL algorithm calls rather than sequentially, at the cost of storing heap-based objects and possibly having to provide additional logic if you want your class to be copyable. Design, in turn, is affected by such things as scale (there's not much point in using a Command pattern in a thousand-line utility) and the needs of the execution environment (maybe everything in the class has to be stack-based for performance reasons).
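Here is a sketch of the common-interface design just described, using hypothetical Part, Wheel, Frame and Bicycle types. Whole-object operations become single algorithm calls over the vector, and the cost mentioned above shows up in the hand-written copy constructor, which has to deep-copy the heap-stored parts through a virtual clone().

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical common interface for a class's members.
struct Part {
    virtual ~Part() = default;
    virtual int cost() const = 0;
    virtual bool isValid() const = 0;
    virtual std::unique_ptr<Part> clone() const = 0;  // needed only for copying
};

struct Wheel : Part {
    int cost() const override { return 40; }
    bool isValid() const override { return true; }
    std::unique_ptr<Part> clone() const override { return std::make_unique<Wheel>(*this); }
};

struct Frame : Part {
    explicit Frame(int c) : c_(c) {}
    int cost() const override { return c_; }
    bool isValid() const override { return c_ > 0; }
    std::unique_ptr<Part> clone() const override { return std::make_unique<Frame>(*this); }
    int c_;
};

class Bicycle {
public:
    Bicycle() = default;
    void add(std::unique_ptr<Part> p) { parts_.push_back(std::move(p)); }

    // Short algorithm calls replace one statement per member.
    int totalCost() const {
        return std::accumulate(parts_.begin(), parts_.end(), 0,
            [](int sum, const std::unique_ptr<Part>& p) { return sum + p->cost(); });
    }
    bool isValid() const {
        return std::all_of(parts_.begin(), parts_.end(),
            [](const std::unique_ptr<Part>& p) { return p->isValid(); });
    }

    // The price of heap-stored members: copying needs explicit deep-copy logic.
    Bicycle(const Bicycle& other) {
        parts_.reserve(other.parts_.size());
        for (const auto& p : other.parts_) parts_.push_back(p->clone());
    }
    Bicycle& operator=(const Bicycle& other) {
        Bicycle copy(other);
        parts_.swap(copy.parts_);
        return *this;
    }
    Bicycle(Bicycle&&) = default;
    Bicycle& operator=(Bicycle&&) = default;

private:
    std::vector<std::unique_ptr<Part>> parts_;
};
```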

In many cases, design is imposed by legacy structures that were developed without much careful structure in mind, or that have grown beyond their intended boundaries. In these cases you are unlikely to be worrying about an ideal function size; you are more likely to be concerned with shrinking hundred-line (or N-thousand-line) functions by encapsulating, extracting, and testing functionality. Even there, though, and perhaps especially there, it's worth keeping an intended or emergent design in mind rather than just mechanically extracting blocks of code as you go.
