Considerations on an Optimization
An application – actually a successive pair of applications – is meant to process, in the end, about three hundred entries, converting a simple markup into complex LaTeX markup. It also enhances the data with information drawn from a flat-file text database (i.e. literally a set of .txt files in a single directory).
This application was originally written in Rust, which is not my favourite language, but one worth exercising to a degree. The Rust application covered a limited subdomain and generally avoided unsafe Rust code, although in some cases it had to fall back on cell mechanisms, which trade compile-time borrow checking for runtime checks.
The initial iteration had the text database as one large file, with some logic allowing for extended seeking through chapters. After extending my dataset to about twenty items, it became apparent that this was going to produce an unacceptable runtime: the lookups were taking longer than the subsequent TeX processing. In theory that would have been acceptable (the application, once finished, would not need to be run often), but the user experience was not ideal, especially for the use case of adding data to the source files, which in some cases means iteratively tweaking the source and re-running the application. So I converted it to a multi-file database, broken into smaller blocks, to reduce the seek times on the data. That dropped the processing time to a second or so, which was acceptable; the LaTeX processing was at that point the major limit on speed.
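The post doesn't show the lookup code, and the change itself was made in the Rust version, but the shape of the idea is easy to sketch in C++ (the language the application later moved to). All names and the on-disk layout here are assumptions for illustration: each block is its own .txt file in the database directory, so a lookup opens one small file instead of seeking through one large one.

```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <optional>
#include <string>

// Illustrative sketch only: each database block is assumed to live in its own
// .txt file, named after the block, inside the database directory. A lookup
// then reads one small file rather than seeking within one large file.
std::optional<std::string> read_block(const std::filesystem::path& db_dir,
                                      const std::string& block_name) {
    std::ifstream in(db_dir / (block_name + ".txt"));
    if (!in) return std::nullopt;  // block not present
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

int main() {
    // "db" and "chapter-03" are made-up names for the example.
    if (auto block = read_block("db", "chapter-03")) {
        std::cout << *block;
    }
}
```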
I then, for other reasons, ported the application to C++ in order to extend it. Partly this initial port was to confirm my suspicion about the two languages (and yes, the resulting code was shorter and clearer than the Rust code), but I was mainly concerned with extending the domain, and I thought C++ the more flexible way to do that. (I also used the opportunity to implement the C++ code in CWEB as an exercise in more extended documentation.)
While doing the port, I did a number of things mainly driven by an aim to produce patterns that were easier to manage and extend than the original Rust code. This mostly involved converting the iterated tests that Rust's match construct makes easy into Command pattern implementations using std::unordered_map and (where order was important) std::map. In one case I had to match the beginnings of lines in a parser, and I used a map with searches between a lower bound and an upper bound to limit the number of comparisons needed to find the appropriate command to execute. (The parser calls take place in what is essentially the main loop of the application; both kinds of dispatch are sketched below.) I was also able to make more extensive use of span and string_view where Rust would have complained about passing ownership around between functions, and particularly between closures.
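To make the two dispatch styles concrete, here is a minimal sketch; none of the keys, handlers, or markup names are taken from the actual application. A std::unordered_map covers commands selected by a whole keyword, while a std::map keyed by line prefixes, scanned only between a lower and an upper bound, covers the match-the-beginning-of-the-line case.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical handler type: each command receives the rest of the line.
using Handler = std::function<void(std::string_view)>;

// Where a whole keyword selects the command, an unordered_map is enough.
const std::unordered_map<std::string, Handler> keyword_commands = {
    {"emph",  [](std::string_view arg) { std::cout << "\\emph{" << arg << "}\n"; }},
    {"title", [](std::string_view arg) { std::cout << "\\section{" << arg << "}\n"; }},
};

// Where only the start of a line identifies the command, an ordered map
// keyed by prefix allows a bounded scan of the candidate keys.
const std::map<std::string, Handler> prefix_commands = {
    {"# ",  [](std::string_view rest) { std::cout << "heading: " << rest << "\n"; }},
    {"## ", [](std::string_view rest) { std::cout << "subheading: " << rest << "\n"; }},
    {"- ",  [](std::string_view rest) { std::cout << "item: " << rest << "\n"; }},
};

// Dispatch a line to the handler whose key is a prefix of it, examining only
// the keys that begin with the same (ASCII) character as the line.
bool dispatch(std::string_view line) {
    if (line.empty()) return false;
    const std::string first(1, line.front());
    auto lo = prefix_commands.lower_bound(first);
    auto hi = prefix_commands.upper_bound(first + '\x7f');  // past all ASCII keys with that first character
    for (auto it = lo; it != hi; ++it) {
        if (line.substr(0, it->first.size()) == it->first) {
            it->second(line.substr(it->first.size()));  // pass the remainder of the line
            return true;
        }
    }
    return false;
}

int main() {
    dispatch("# Introduction");            // prints "heading: Introduction"
    dispatch("- first point");             // prints "item: first point"
    keyword_commands.at("emph")("markup"); // prints "\emph{markup}"
}
```

The bound computation confines the scan to the few keys sharing the line's first character, which is what keeps the comparison count low inside the main loop.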
I expanded the scope of the base test set to represent the additional domain; it now includes about 50 items, roughly one-sixth of the expected ultimate size. I'm still doing a fair amount of searching through files on disk, though, and I contemplated reading the database files into memory to reduce seek times even further.
But then I decided to check the processing time. The application was now taking about 0.08 to 0.09 seconds to run. By comparison, the pdflatex processing for the largest of four output files is about 1.5 seconds. Even if the project expands to 500 items, ten times the current number, the runtime would still be (just) under a second. Note that this is an interactive user application (though it could certainly be run by a shell script in the background). The application’s runtime is essentially a rounding error compared to the formatting runtime.
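The post doesn't say how the measurement was taken (the shell's time builtin would be the obvious route); purely as an illustration, the run can be wrapped in std::chrono, with process_all_entries below standing in for whatever the application's real top-level function is.

```cpp
#include <chrono>
#include <iostream>

// Placeholder for the application's real work; the name is hypothetical.
void process_all_entries() { /* convert entries, consult the database, emit LaTeX */ }

int main() {
    const auto start = std::chrono::steady_clock::now();
    process_all_entries();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::cout << "processed in " << elapsed.count() << " s\n";  // ~0.08-0.09 s in the case described above
}
```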
I was sure that converting the DB lookup to in-memory would lead to some additional savings, but I could see no way of justifying the work, even just to myself. For practical human purposes, the current runtime is instantaneous, and it will be nearly so on the completion of the data set.
The application doesn’t have large setup costs, so it’s reasonable to treat the processing time per record as roughly 0.09 seconds spread over about 50 records, i.e. a little under two milliseconds per record.
Context is important. In an ongoing transaction processing application – or at least in any such application I’ve ever worked with – that sort of processing time would be undesirable, and obvious steps to improve the latency would be in order. (Only five hundred transactions per second? In some transaction contexts, some FX environments for example, that would be fine; in processing equities it’s nowhere near fast enough.)
If shifting to in-memory lookups would also simplify the code, the change would still be in order. However, this is another place where the parsing logic would remain unchanged while the complexity of the setup increased; that is, a change made purely for the purposes of optimization.
If the faster approach were likely to benefit some other application in the future, it would make sense to put that capability in a library now, calibrate the differences, and deploy it here. I can’t see any future application that is likely to need the faster input mechanism, though it might conceivably be handy.
Does it make any difference that the core (but only the core) of the additional functionality has another use? I have a couple of test objects which store a fixed-size array in memory to mock file access by line. If I take one of them, convert the storage to a vector, and put the general class in a library, I have an implementation for the storage, and I can simplify the test classes by removing the repeated structural logic. (This means three library classes, sketched below: the core implementation; a version designed to be used in tests, differing mainly by its constructor but also providing tracking of calls; and a version designed to be constructed from the contents of a file, for the optimization functionality.)
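A minimal sketch of that three-class split, with every name invented for the example rather than taken from the application: a vector-backed core that serves lines by index, a test variant that also records which lines were requested, and a file-backed variant that loads a database file into memory.

```cpp
#include <cstddef>
#include <fstream>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Core implementation: lines held in a std::vector, served by index, so the
// same interface can stand in for "file access by line".
class LineStore {
public:
    LineStore() = default;
    explicit LineStore(std::vector<std::string> lines) : lines_(std::move(lines)) {}
    virtual ~LineStore() = default;

    virtual std::optional<std::string_view> line(std::size_t n) const {
        if (n >= lines_.size()) return std::nullopt;
        return std::string_view(lines_[n]);
    }
    std::size_t size() const { return lines_.size(); }

protected:
    std::vector<std::string> lines_;
};

// Test variant: same storage, but it records which lines were asked for, so a
// test can assert on the access pattern.
class TrackingLineStore : public LineStore {
public:
    using LineStore::LineStore;
    std::optional<std::string_view> line(std::size_t n) const override {
        requested_.push_back(n);
        return LineStore::line(n);
    }
    const std::vector<std::size_t>& requested() const { return requested_; }

private:
    mutable std::vector<std::size_t> requested_;
};

// File-backed variant: built from the contents of a database file, which is
// the in-memory optimization being weighed above.
class FileLineStore : public LineStore {
public:
    explicit FileLineStore(const std::string& path) {
        std::ifstream in(path);
        for (std::string s; std::getline(in, s); ) lines_.push_back(std::move(s));
    }
};

int main() {
    TrackingLineStore mock({"alpha", "beta", "gamma"});
    mock.line(2);  // returns "gamma" and records the requested index
    return mock.requested().size() == 1 ? 0 : 1;
}
```

Making line() virtual lets production code take a LineStore& and stay indifferent to whether it is talking to the file-backed version or a test double.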