Boundaries
Imagine that you have a well-designed application, possibly using dependency injection. It will almost always have a very thin layer visible in main(): mainly capturing command-line parameters and environment variables[1], then calling an Application class which manages the rest of the initial logic - setting up resources such as logging, doing substantive validation of parameters, and creating and running the objects on the next level down which actually carry out the work of the application. This is simply following the SRP at a high level.
[1] If you have any real complexity to your parameters, there may well be good reason to move that out of the main file as well. I recently had a case where some key variables had default values, which could be overridden by environment variables, which could in turn be overridden - along with a lot of other options - at the command line. To make things worse, what the top level was actually working with was an abstract interface for which the command-parameter handler acted as a factory. After initially sketching it out, I moved all of the parameter handling to another module.
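That layering can be sketched as a small resolution function. (A minimal sketch: the option name, the MYAPP_ environment prefix, and the dictionary shapes are all invented for illustration.)

```python
def resolve(name, defaults, environ, cli_args):
    """Resolve one option: the command line overrides the environment,
    which overrides the built-in default. (Illustrative sketch.)"""
    if name in cli_args:                  # highest precedence: command line
        return cli_args[name]
    env_key = "MYAPP_" + name.upper()     # hypothetical naming convention
    if env_key in environ:                # next: environment variable
        return environ[env_key]
    return defaults[name]                 # lowest: built-in default
```

Once the precedence rules live in one place like this, moving them (and any factory built on top of them) out of the main file is straightforward.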
That well-defined boundary between the C function main() and its subsidiary OO levels is an obvious cleavage point. It might not be a package or module boundary: except in a few frameworks, that Application class will logically be part of the same package as the main() function. But it might well be one: you could code different front ends to collect the options (e.g. start up and pull options from a database), making the Application class shared between two actual applications.
That cleavage line is essentially identical, logically, to the boundary at the command line itself (notionally expanded to include the environment as well as the explicit parameters). All the main() function does is marshal that information.
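The shape of that thin layer can be sketched as follows (a sketch only: the class and option names are hypothetical, and the parsing is deliberately trivial):

```python
import os
import sys


class Application:
    """Manages the rest of the initial logic: substantive validation,
    resource setup, and creating the objects that do the real work."""

    def __init__(self, options):
        self.options = options

    def run(self):
        if "input" not in self.options:      # substantive validation lives here
            raise ValueError("missing required option: input")
        # ... set up logging, create and run the worker objects ...
        return 0


def main(argv=None, environ=None):
    """Thin layer: marshal argv and the environment, then delegate."""
    argv = sys.argv[1:] if argv is None else argv
    environ = os.environ if environ is None else environ
    options = {"verbose": environ.get("MYAPP_VERBOSE") == "1"}
    for arg in argv:                         # deliberately naive parsing
        key, _, value = arg.partition("=")
        options[key.lstrip("-")] = value
    return Application(options).run()
```

Taking argv and environ as optional parameters rather than reading the globals directly keeps even this thin layer testable.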
One of the old uses of Tcl/Tk was to write a GUI to collect user input and then generate a command-line call to an application written in a different language. The same applies to using web interfaces: capture some parameters, pass them to a native application on the server, capture the program output and pass it back to the client. For these applications, the logical boundary (though not the physical boundary) is the same as that between the main() level and the next level down in the design model above.
If you change your main() function to read its parameters from standard input, you have a potential component in a UNIX pipeline, with the same logical boundary taking another physical form.
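A sketch of that pipeline form (the function name and the one-parameter-set-per-line convention are assumptions of this sketch, not a standard):

```python
import io


def run_pipeline(stream, handle):
    """Read one whitespace-separated parameter set per line from `stream`
    and hand each to `handle`: argv's logical boundary in pipeline form."""
    results = []
    for line in stream:
        params = line.split()
        if params:                      # skip blank lines
            results.append(handle(params))
    return results


# In a real pipeline component, `stream` would be sys.stdin.
counts = run_pipeline(io.StringIO("alpha beta\ngamma\n"), len)
```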
Or if you modify your application a little, making it a server which sits and receives messages from, say, Solace, treating each message as a specification for a set of parameters, you have the essential architecture of a microservice, assuming that what the application does has an appropriately narrow focus. You could even have your microservice exec the command-line application, which is an easy form of code reuse, if one with more overhead than one might want. (A microservice architecture is essentially carving a large application up along cleavage lines which can be defined in terms of messages, and making as many operations as possible asynchronous to improve performance - but the same architecture specifies a way of converting the single application into a safe multithreaded application - safe because such an architecture presupposes no shared data. The advantage of the microservice model is that it allows easier mixing and matching of components.)
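A minimal sketch of that message-driven form, with a plain in-process queue standing in for a broker such as Solace (the JSON-message convention and the sentinel shutdown are assumptions of the sketch):

```python
import json
import queue


def serve(inbox, handle):
    """Receive messages, treat each as a parameter-set specification,
    and dispatch to the same logic main() would have called."""
    while True:
        message = inbox.get()
        if message is None:             # sentinel: shut the service down
            return
        handle(json.loads(message))


inbox = queue.Queue()                   # stand-in for the message broker
handled = []
inbox.put(json.dumps({"input": "data.txt"}))
inbox.put(None)
serve(inbox, handled.append)
```

Note that the boundary is unchanged: only the transport differs between this and the command-line form.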
One source of flexibility in separating an application into parts - whether as microservices, or pipelines, or simply along caller/callee lines - is that the separation not only gives you simpler units from an architectural and maintenance standpoint, but also allows you to mix different languages according to their capabilities, without going to the rather more considerable trouble of (usually) creating a C-callable library (also representing that boundary) to allow for integration by a linker.
In principle, though, a single application which respects the same logical boundaries should be as maintainable as, and more efficient than, a combination of microservices, or a pipeline, or a GUI shell calling a command-line application. (And, institutionally, every time you add another language to your set of supported tools you add an additional kind of cost, as you now have to ensure that the knowledge to maintain applications in that language is there.) The key lies in having the boundaries as well-defined, and well-documented, as they would be if they were boundaries between applications.
In a multithreaded context this is most likely to be achieved by transferring tasks between threads via well-defined messages. In single-threaded applications it means having a very clearly restricted set of calls at the boundary line. That's the dynamic reflection of the static module structure.
(Parameter objects, which correspond to the actual messages, thus represent domains at a very low level - few dependencies, which allows multiple packages to depend on them.)
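A sketch of both points together: a frozen dataclass as the parameter object, passed between threads over a queue. (The Task shape and the doubling "work" are invented for illustration.)

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Parameter object: the message that crosses the thread boundary.
    It has few dependencies, so many packages can depend on it."""
    name: str
    payload: int


def worker(inbox, results):
    """Consume Tasks until the None sentinel arrives."""
    while True:
        task = inbox.get()
        if task is None:
            return
        results.append((task.name, task.payload * 2))  # stand-in for real work


inbox = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(inbox, results))
t.start()
inbox.put(Task("double", 21))
inbox.put(None)                         # well-defined shutdown message
t.join()
```

Making the parameter object immutable (frozen) reinforces the no-shared-mutable-data property the surrounding text relies on.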
In both cases this is one place where true modules are of immense use. Modules clearly define what the affordances are in a manner the compiler will enforce: a compile-time separation to mirror the runtime separation. In C++, you can have two modules in the same directory, allowing the XApplication class to coexist physically with xmain.cpp. In Go, package and directory boundaries coincide, which is generally better but can mean a little awkwardness in this one case. (In Java, a module is an organization level above a package and is a bit of a different case, but the same reasoning applies; similarly for OSGi modules.) In Rust, for practical purposes, every file is a separate module.
What we collectively need to develop, though, is an effective set of conventions for documenting module boundaries. Automatic documentation descended from Javadoc and its ilk, like doxygen, tends to focus on the class and can overwhelm with detail.[2] It is suited to wide-interface classes and modules. The very traditional man page is actually well-suited to this and has (UNIX/Linux) system support, but is generally viewed as old-fashioned and requires formatting in *roff.
[2] True literate programming, like WEB and CWEB, has no problem with this, because the structure of the document is defined by the writer: it generates source code files, rather than being generated by source code files. Unfortunately, it's not widespread and also requires an additional set of skills for maintainers.
Free-form discursive writing in Word or OpenOffice or Google Docs can do anything, but it does not play well with source code control systems and thus becomes easily decoupled from the source code it is documenting. Free-form writing in LaTeX lives happily with source code and is highly flexible, but requires additional skills. Free-form writing in HTML is a bit like using crayons. Discursive writing on Confluence ties you in to one model (although it can be integrated with other Atlassian tools) and is difficult to export.
And in any case free-form documentation is merely a mechanism: without standards relating to what should be documented, under what headings, and at what levels of detail, it is not an effective mechanism for sharing information. This is true, doubled, redoubled, and in spades when one takes into account the fact that in general developers do not enjoy writing documentation.
Documentation at the module level should also have a similar form and content across languages. An approach that works only for Java or C++ or Go or Python is of limited general use. At the module level the language of implementation is a subsidiary detail, although an important one.[3] One can easily imagine, in a prototype-to-production development model, having a module in Python which gets converted to C++ with a thin wrapper around it for performance reasons.
[3] If communication across module boundaries is in practice done by message-passing, then documentation would be almost identical regardless of language. If communication is heavily tied to dependency injection, the documentation for use (as opposed to maintenance) may be very high-level indeed if that injection is done by a tool like Spring. (In C++ explicit dependency injection requires more specification, but that can and generally should be expressed in terms of interfaces.)
Likewise, automatic generation of documentation should not simply wipe out an earlier version; otherwise discursive text would have to be shoehorned into a source code structure to which it is poorly suited merely so that it would not be lost. At the same time, it would be desirable to at least have the option of flagging, in discursive text, references which code changes have rendered out-of-date.
Given the incomplete state of support for modules among major C++ compilers, it is also desirable to be able to define a module at a level independent of source code. This would lack compiler support but would allow for greater flexibility in application to older codebases, and would also allow for the effective generation of specifications for actual modules once they are introduced.
Most of the above considerations point towards the use of an XML schema which can be used to control the generation of documentation and allow tagging to support (e.g.) flagging of changes to a package when referenced in discursive text. Generation of TeX, RTF, HTML, or other formats would allow for variable types of presentation in easily readable form.
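To make the idea concrete, an instance document under such a schema might look something like this (every element and attribute name here is invented for illustration; no such schema currently exists):

```xml
<!-- Hypothetical instance document: a sketch, not a proposal in detail. -->
<module name="parameter-handling" language="python">
  <purpose>Resolve option values from defaults, environment, and command line.</purpose>
  <boundary>
    <message name="OptionSet" direction="in"/>
    <message name="ResolvedOptions" direction="out"/>
  </boundary>
  <discursive>
    Precedence is command line over environment over defaults.
    The <ref target="OptionSet" status="current"/> reference could be
    flagged automatically when the referenced interface changes.
  </discursive>
</module>
```

The boundary element is the point: it documents the messages, not the implementation, so the same document survives a change of implementation language.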