Convergence
This is an opinionated piece.
Consider:
fn handle_doubling_indicator(text: &str) -> (String, String) {
    if text.len() == 7 {
        match &text[6..] {
            "A" => ("First Mass".to_string(), text[0..5].to_string()),
            "B" => ("Second Mass".to_string(), text[0..5].to_string()),
            "C" => ("Third Mass".to_string(), text[0..5].to_string()),
            "D" => ("Alternative observance".to_string(), text[0..5].to_string()),
            "E" => ("Alternative propers".to_string(), text[0..5].to_string()),
            _ => (String::new(), text.to_string()),
        }
    } else {
        (String::new(), text.to_string())
    }
}
and compare:
using namespace std::string_literals;

std::pair<std::string, std::string>
handle_doubling_indicator(const std::string &text)
{
    if (text.length() == 7)
    {
        switch (text[6])
        {
        case 'A':
            return { "First Mass"s, text.substr(0, 5) };
        case 'B':
            return { "Second Mass"s, text.substr(0, 5) };
        case 'C':
            return { "Third Mass"s, text.substr(0, 5) };
        case 'D':
            return { "Alternative observance"s, text.substr(0, 5) };
        case 'E':
            return { "Alternative propers"s, text.substr(0, 5) };
        }
    }
    return { std::string(), text };
}
They implement the same logic. The major differences are that (1) the C++ code is slightly more concise, (2) the requirements of rust's match expression mean that the default return value has to be specified twice, and (3) rust's underlying string model makes it more natural to match on a string slice than on a single byte, so the C++ code is slightly more efficient. (There's a payment under the hood for using unicode strings in contexts where unicode isn't required.) But aside from that, they map fairly directly onto each other.
Both allocate and return pairs of string objects. We can do better: in C++:
using namespace std::string_view_literals;

std::pair<std::string_view, std::string_view>
handle_doubling_indicator(const std::string_view text)
{
    if (text.length() == 7)
    {
        switch (text[6])
        {
        case 'A':
            return { "First Mass"sv, text.substr(0, 5) };
        case 'B':
            return { "Second Mass"sv, text.substr(0, 5) };
        case 'C':
            return { "Third Mass"sv, text.substr(0, 5) };
        case 'D':
            return { "Alternative observance"sv, text.substr(0, 5) };
        case 'E':
            return { "Alternative propers"sv, text.substr(0, 5) };
        }
    }
    return { ""sv, text };
}
You can do the same thing in rust, but the same observations apply:
fn handle_doubling_indicator(text: &str) -> (&str, &str) {
    if text.len() == 7 {
        match &text[6..] {
            "A" => ("First Mass", &text[0..5]),
            "B" => ("Second Mass", &text[0..5]),
            "C" => ("Third Mass", &text[0..5]),
            "D" => ("Alternative observance", &text[0..5]),
            "E" => ("Alternative propers", &text[0..5]),
            _ => ("", text),
        }
    } else {
        ("", text)
    }
}
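Assuming a seven-character input of the form "12-25 A" (the key format here is invented for illustration; the real data presumably differs), the borrowed version behaves like this, abbreviated to two indicators:

```rust
// Abbreviated form of the borrowed version above: both returned slices
// point into the caller's string, so no allocation occurs.
fn handle_doubling_indicator(text: &str) -> (&str, &str) {
    if text.len() == 7 {
        match &text[6..] {
            "A" => ("First Mass", &text[0..5]),
            "B" => ("Second Mass", &text[0..5]),
            _ => ("", text),
        }
    } else {
        ("", text)
    }
}
```

The call site needs no cleanup: `let (label, key) = handle_doubling_indicator("12-25 A");` yields `"First Mass"` and `"12-25"`, both borrowed from the argument.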
There's still that pesky reduplication in the return values in rust. If we get rid of returning values and pass in a function, we can do better:
fn process_doubling_indicator<F>(&self, text: &str, mut cl: F)
where
    F: FnMut(&str, &str),
{
    if text.len() == 7 {
        if let Some(val) = self.special_types.get(&text[6..]) {
            cl(val, &text[0..5]);
        }
    }
}
void process_doubling_indicator(const std::string_view text, std::function<void(const std::string_view, const std::string_view)> func) {
    if (text.length() != 7)
        return;
    if (auto iter = special_types.find(text.substr(6)); iter != special_types.end()) {
        func(iter->second, text.substr(0, 5));
    }
}
where special_types is a hash map with the appropriate values, and the function has become a member of a class which has done prior setup.
These differ considerably syntactically[1], but functionally they are much the same.
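For concreteness, here's a sketch of what that prior setup might look like on the rust side; the type and field names are assumed, not taken from real code, and the function from above is repeated so the sketch is complete:

```rust
use std::collections::HashMap;

// Sketch of the "class which has done prior setup": the map is built once,
// in the constructor, and consulted on every call.
struct Processor {
    special_types: HashMap<String, String>,
}

impl Processor {
    fn new() -> Self {
        let special_types = [
            ("A", "First Mass"),
            ("B", "Second Mass"),
            ("C", "Third Mass"),
            ("D", "Alternative observance"),
            ("E", "Alternative propers"),
        ]
        .into_iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
        Processor { special_types }
    }

    fn process_doubling_indicator<F>(&self, text: &str, mut cl: F)
    where
        F: FnMut(&str, &str),
    {
        if text.len() == 7 {
            if let Some(val) = self.special_types.get(&text[6..]) {
                cl(val, &text[0..5]);
            }
        }
    }
}
```

Adding a new indicator now only touches the table in `new()`; the processing function and its call sites are untouched.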
[1] The designers of recent languages seem to be prone to what Fowler called "elegant variation" (it's not a compliment). There is really no reason in the world that the four preeminent OO systems programming languages should initialize variables in four different ways:
auto x = 7;
var x = 7;
x := 7
let x = 7;
Similarly, some languages require all if/else blocks to be braced, while others require all if tests to be parenthesized.
There's no obvious better choice here. And there's always lisp in the background, taunting us with "all you really need is parens".
(let ((x 7)) .. )
There's an expression-based language for you. It makes rust look half-hearted.
(There's a rub: in the C++ code the developer can happily -- and safely -- manipulate the string views and assign them to values in the outer scope, because the underlying data remains valid in that scope (text and special_types both outlive the call). In rust, the compiler will tell you that the borrow ends on leaving the closure, meaning that you have to copy the value to a string.)
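A minimal sketch of that rust constraint (the function here is a stripped-down stand-in for process_doubling_indicator): the closure's arguments cannot outlive the call, so keeping the result means copying into owned Strings.

```rust
// Stand-in for the functional version: hands two slices to the closure.
fn process<F: FnMut(&str, &str)>(text: &str, mut cl: F) {
    if text.len() == 7 {
        cl(&text[6..], &text[0..5]);
    }
}

// To hoist the values out of the closure they must be copied; storing
// the &str arguments directly is rejected by the borrow checker.
fn capture(text: &str) -> Option<(String, String)> {
    let mut result = None;
    process(text, |ind, key| {
        result = Some((ind.to_string(), key.to_string()));
    });
    result
}
```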
In C++ the functional version is likely more efficient, because it avoids branching on the value in a switch statement and the hash map lookup reduces the number of required comparisons. In rust, you probably lose any efficiency gains because of the extra string allocations. However, in both cases the functional version is both more concise and clearer than the earlier version, and they're probably about equally clear and maintainable.
The affordances of the languages are similar. We have closures, string views/slices, ranges (not used here in the C++ code, but available), coroutines, safe(-ish) high-level concurrency tools, expected return values (in C++23). The last few years have seen a lot of convergence between languages like rust, C++, and java. (C++ has yet to get an expression-based switch, which java added using rust-like syntax; but in java that's essentially syntactic sugar, whereas in rust the match expression is fundamental.)
If we really want to squeeze a bit more performance out of the C++ version, we can use an array as a jump table instead of a hash map:
void process_doubling_indicator(const std::string_view text, std::function<void(const std::string_view, const std::string_view)> func) {
    if (text.length() != 7)
        return;
    if (auto ind = text[6] - 'A';
        (ind >= 0) && (ind < special_types.size())) {
        func(special_types[ind], text.substr(0, 5));
    }
}
In rust, we can do the same thing logically, but it's a bit more cumbersome:
fn process_doubling_indicator<F>(&self, text: &str, mut cl: F)
where
    F: FnMut(&str, &str),
{
    if text.len() == 7 {
        let ind = (text.chars().nth(6).unwrap() as u32 - 'A' as u32) as usize;
        if let Some(val) = self.special_types.get(ind) {
            cl(val, &text[0..5]);
        }
    }
}
We can avoid explicit bounds checking in rust because the Option returned by get() handles it for us. However, the overhead of the Option check is probably more expensive than the C++ approach. (If we move to unsafe rust we can probably recover the performance by shifting the logic to match the C++ version and using unsafe functions.) Also, we jump through hoops to get at the character at index 6 of the string in rust, and have to unwrap an additional Option there as well.
In both cases, the code is cleaner in the hash map version because we're doing what the language expects us to, rather than being clever with jump tables and converting chars to zero-based indices. Being clever costs in documentation and maintainability, unless you're in an environment where such an approach is a standard part of the idiomatic toolkit (as it sometimes is). And the more clever your code gets, the more you need tests addressing edge cases as well as core functionality.
Rust provides more safety, but at a cost (which is why there is a plethora of unsafe methods to get around it), and that cost is not simply a runtime cost but a reduction in clarity. When std::expected is added to C++ it will not entail adding a set of new calls to standard library functions: there will be no std::vector version of at() returning an expected value to handle bounds-checking. (The philosophy of C++ is that you don't pay for what you don't use. If you want to write a vector wrapper providing such a function, go right ahead.)
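That trade-off is visible in rust's slice API itself, which offers checked, panicking, and unchecked access to the same element (a sketch; std::vector's operator[] and at() are the rough C++ analogues):

```rust
// Three ways to read the same element, trading safety for ceremony.
fn third_checked(v: &[i32]) -> Option<i32> {
    v.get(2).copied() // bounds-checked; returns None when out of range
}

fn third_indexed(v: &[i32]) -> i32 {
    v[2] // bounds-checked; panics when out of range
}

unsafe fn third_unchecked(v: &[i32]) -> i32 {
    *v.get_unchecked(2) // no check; the caller guarantees v.len() > 2
}
```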
In my own view rust tends towards wordier and thus less easily grasped code. Compare
let ind = (text.chars().nth(6).unwrap() as u32 - 'A' as u32) as usize;
and
auto ind = text[6] - 'A';
Yes, the need for explicit casts between integral types can, in some cases, be a protection (though not here). Yes, reserving the [] notation for slices (compare go, where indexing a string yields a byte and slicing yields a substring) makes things generally simpler in the sense that the notation means only one thing. (Also, although it makes no difference here with guaranteed-ASCII text, the choice between bytes() and chars() has to be taken into consideration, and is a driver of the wordier syntax.)
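The bytes()/chars() distinction can be made concrete: the two iterators agree on ASCII and diverge as soon as a multi-byte character appears (the sample word below is just an illustration).

```rust
// bytes() walks the raw u8s of the UTF-8 encoding; chars() decodes
// code points. On pure ASCII they line up index for index.
fn nth_byte(text: &str, n: usize) -> Option<u8> {
    text.bytes().nth(n)
}

fn nth_char(text: &str, n: usize) -> Option<char> {
    text.chars().nth(n)
}
```

In "Kyrié" the accented character occupies two bytes, so the byte count (6) and the character count (5) differ, and the two nth functions stop agreeing past that point.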
Of course, domains vary. If you're working in an environment where internationalization is important, rust's (and go's) support for Unicode out of the box is great. If you're working in an environment where ISO-Latin-1 is entirely adequate, it's a bit of a burden.
I'm using the above function in a context where the effective difference in efficiency is zero - no perceptible difference at the user end. The primary advantage of the functional version - in both languages - is that it lets you move a lot of picky detail out into setup, which is always a good thing, and gives you a simpler and more general call site.
The execution difference provided by the index-based approach is invisible. For general use, the map-based approach - in both languages - is the clearest and simplest at the call site, and the most open to extension (adding a new case requires no change at this location, only in the constructor which sets up the map).
Modern C++ has converged as far as many basic language features go with the other popular systems languages, and lacks nothing in safety as long as one holds to RAII, "no bare pointers", and modern error handling. It has, to my mind, two great advantages: first, it is a multi-paradigm language, and secondly its syntax tends, on the whole, to be slightly sparer and cleaner than its rivals.[2]
[2]With the exception of go. But then go was imagined more as a "better C"; its weakness relative to C++ at the language level is that C++ is a richer language with more paradigms (go has no streams, weaker generics, and less powerful inheritance). (Like almost all the other systems programming languages, go is richer in core library resources: C++ provides a lean core and lets its users select the implementations they want for, e.g., wrapping TCP/IP calls.)