LT Project: Refactoring Field Formatting in GTK

I was going to move on to the details involving formatting review fields, but I realized I had to deal with another matter first: how the individual fields are formatted.

The call is:

void NoStreamFieldFormatter::format(const LtLibrary::ELibraryRecord inRec,

                                    const std::string &inValue) const

{

  if ((inRec == LtLibrary::ELibraryRecord::Sort_Character)

      || (inRec == LtLibrary::ELibraryRecord::Book_Id)

      || (inRec == LtLibrary::ELibraryRecord::Work_id)

      || ((inRec == LtLibrary::ELibraryRecord::ISBN) && inValue == "[]"s))

    return;

  gtk_text_buffer_get_end_iter(m_buffer, &m_iter);

  if ((inRec == LtLibrary::ELibraryRecord::Private_Comment)

      && (inValue.length() > 3) && (inValue.substr(0, 3) == "Box"s))

    {

      insertCategory("Archived in");

      insertPlainText(inValue + "\n");

    }

  else

    {

      insertCategory(IdToCategoryName(inRec));

      if (((inRec == LtLibrary::ELibraryRecord::Subjects)

           || (inRec == LtLibrary::ELibraryRecord::Dewey_Wording))

          && (inValue.length() > 120))

        {

  insertSmallText(inValue);

          insertPlainText("\n");

        }

      else if (inRec == LtLibrary::ELibraryRecord::Review)

        {

          formatReview(inValue);

        }

      else

        {

          insertPlainText(inValue + "\n");

        }

    }

}

This is essentially a somewhat messy disguised switch statement, and could be reformatted that way, with a bit of tweaking to handle the common logic which would need to be repeated.  There are eight different field types referenced.

General Discussion

Consider the following function;

void foo(const Bar& inVal)

{

    preprocess(inVal);

    process(inVal, m_data);

 }

Where the class Bar supports the method

EType getType() const;

(EType being an enum) and process might be a free function or a member function.

Now assume that a little later our requirements are expanded, and we need special processing, in this context only, for a given type 

We now have:

void foo(const Bar& inVal)

{

    preprocess(inVal)

    if (inVal.getType() == EType::Special)

        process(transform(inVal), m_data);

    else

        process(inVal, m_data);

 }

There's probably not a major reason to change the dispatch method, although we could wrap the more complex one with a convenient alias:

void foo(const Bar& inVal)

{

    preprocess(inVal);

    if (inVal.getType() == EType::Special)

        processSpecial(inVal, m_data);

    else

        process(inVal, m_data);

 }

Three months later, another variant requirement: this one doesn't require preprocessing:

void foo(const Bar& inVal)

{

    if (inVal.getType() == EType::ExtraSpecial)

        process ExtraSpecial(inVal);

    else

    {

        preprocess(inVal);

        if (inVal.getType() == EType::Special)

            processSpecial(inVal, m_data);

        else

            process(inVal, m_data);

    }

 }

This is starting to look ugly. Also, if it's been touched several times, odds are it's going to be touched again soon.  Let's neaten it up a bit:

void foo(const Bar& inVal)

{

    switch (inVal.getType())

    {

        using enum EType;

        case  ExtraSpecial:

            process ExtraSpecial(inVal);

            break;

        case Special:

            preprocess(inVal);

            processSpecial(inVal, m_data);

            break;

        default:

            preprocess(inVal);

            process(inVal, m_data);

    break;

    }

 }

Now for the question: is that neat enough? (Subsidiary question: we're repeating that preprocess() call. Should we avoid that?)

To rephrase: at some point we may have twenty branches to that switch statement. At what point do we stop using switch and turn to a command pattern?

The answer is presumably not "as soon as we get two variants".  It is also, surely, well before we get to twenty variants. The earlier we convert over, the less overall work, but the smaller the number of branches, the more textual overhead there is in setting up the dispatching for the commands relative to the body of code being executed via the commands.

Another answer is "when the function length approaches a full screen", which means 24 lines even if you have a large monitor. The function as it stands is 17 lines, and would gain three to four lines per switch branch. That would imply handling four special cases.

That justification can also run into the problem that setting up a command model for a lot of special cases can take more lines than executing it via the switch statement.  It takes more text to write:

 m_actions.insert({ LtLibrary::ELibraryRecord::Dewey_Wording,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       formatLongRecord("Dewey Wording", inValue);

                     } });

than it does to write:

case Dewey_Wording:

    formatLongRecord("Dewey Wording", inValue);

and if you're setting up all the commands at once the setup function will be longer than the function with the switch statement. Note, though, that the switch statement is one indivisible statement whereas the setup calls can, themselves, be subdivided into functional groups.

More to the point, the general principle is that it's better to move complexity into setup (constructors) and out of processing.  This principle is implicit both in dependency injection and functional programming with pure functions, where data and classes or function objects have to be set up before any real processing takes place.

Yet another answer, to which I am partial, is the general principle that "when you have to deal with the same problem three times it's time to provide a general solution". (Usually this is cited as when to extract a pattern into a function.)

Either way, it's arguable that shifting to use a command pattern is usually desirable relatively early on.

This doesn't mean that switch statements have become a niche construct.  This is only talking about dispatching functionality.  A switch statement of the form

std::string val;

switch (var) {

case 1:

    val = "foo";

case 2:

    val = "bar";

...

};

/// Do something involving val

is not suitable for a command pattern, though it might be extracted to its own function.

Lambdas or Classes?

There are two ways, at a high level, of implementing a command pattern.

One is via classical inheritance: we create an abstract base class, e.g.

class IRecordFormatter {

 public:

    virtual ~IRecordFormatter();

    virtual void process(const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) = 0;

};

and then create a map:

std::map<LtLibrary::ELibraryRecord, std::unique_ptr<IRecordFormatter>> m_actions;

to which we add objects of subclasses:

class ISBNFormatter: public IRecordFormatter {

public:

    ISBNFormatter(NoStreamFieldFormatter* inParent): m_parent(inParent) { }

   ~ISBNFormatter() override;

   void process(const LtLibrary::ELibraryRecord inRec,

       const std::string &inValue) override

   {

        if (inValue != "[]"s)

             m_parent->formatNormalRecord("ISBN", inValue);

    }

private:

    NoStreamFieldFormatter* m_parent;

};

m_actions.insert({LtLibrary::ELibraryRecord::ISBN, std::make_unique<ISBNFormatter>(this) });

Our switch call then becomes one line; two or three if we have to handle some notional "not found" default.

The other approach is to use std::function objects:

  std::map<LtLibrary::ELibraryRecord,

           std::function<void(const LtLibrary::ELibraryRecord inRec,

                              const std::string &inValue)>>

      m_actions;

This allows us to use objects as above (though now not stored as pointers):

m_actions.insert({LtLibrary::ELibraryRecord::ISBN, ISBNFormatter(this) });

(although we would add a operator() wrapping the process() call) or , more generally, to use lambdas:

m_actions.insert({ LtLibrary::ELibraryRecord::ISBN,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       if (inValue != "[]"s)

                         formatNormalRecord("ISBN", inValue);

                     } });

The lambda has the advantage that the capture capability allows the handle to m_parent to be implicit.  It has the disadvantage that to be declared simply, as above, it has to be inside the scope of the parent class (though there are ways of managing this: a factory can have a reference to the parent and the reference can be captured explicitly, in which case it looks more like the class above).  The lambda has less syntactic overhead overall, although the complexity of the class model is easier to break down into small chunks (e.g. by separating declarations and definitions).

The interface classes have other advantages: they can factor out functionality, either into subfunctions or via inheritance.  They can be defined on their own, and tested as full-function classes.  If the parent class itself is defined in terms of an interface, they can depend on the interface rather than on the concrete parent class.

A class can more neatly move functionality out of the parent class, if it is called only in that context.  This is mainly a conceptual move -- it saves no lines of code, typically, but it is still conceptually neater.  This results in a smaller parent class, which is also a good thing in itself.

It's fairly clear that the function model, which supports both lambdas and custom classes is the way to go.  The only thing to be careful about is that your custom classes should support copy/move; the map of unique pointers has no such restriction.

Field Formatting

We are at eight or so distinct types for the format() call -- well into time to look at a command model.  In addition, there's that block of logic still to be looked at for reviews which can be moved out of the NoStreamFieldFormatter class if we use a class rather than a lambda to handle reviews.

Let's start by adding a map of actions:

  std::map<LtLibrary::ELibraryRecord,

           std::function<void(const LtLibrary::ELibraryRecord inRec,

                              const std::string &inValue)>>

      m_actions;

We do a bit of encapsulation of repeated blocks of code to make them simpler to call:

  void insertCategory(const std::string &inTitle) const override;

  void insertPlainText(const std::string &inValue) const override;

  void insertSmallText(const std::string &inValue) const override;

  void formatLongRecord(const std::string &inTopic,

                        const std::string &inValue);

  void formatNormalRecord(const std::string &inTopic,

                          const std::string &inValue);

The top three, for reasons we will get to in a moment, inherit from a new parent class:

class IRecordFieldSink

{

public:

  virtual ~IRecordFieldSink();

  virtual void insertCategory(const std::string &inTitle) const = 0;

  virtual void insertPlainText(const std::string &inValue) const = 0;

  virtual void insertSmallText(const std::string &inValue) const = 0;

};

We have three types in the old function which should have pure no-ops:

  if ((inRec == LtLibrary::ELibraryRecord::Sort_Character)

      || (inRec == LtLibrary::ELibraryRecord::Book_Id)

      || (inRec == LtLibrary::ELibraryRecord::Work_id)

      || ((inRec == LtLibrary::ELibraryRecord::ISBN) && inValue == "[]"s))

    return;

ISBN isn't a no-op, and gets a special handler, which will act as a base example:

  m_actions.insert({ LtLibrary::ELibraryRecord::ISBN,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       if (inValue != "[]"s)

                         formatNormalRecord("ISBN", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::ISBN);

added is a local set of values which have special handlers.  It's used later in setting the default handlers.

std::unordered_set<LtLibrary::ELibraryRecord> added;

We can encapsulate the addition of the no-op handlers:

void NoStreamFieldFormatter::initializeActionsWithNoOps(

    std::unordered_set<LtLibrary::ELibraryRecord> &outAdded)

{

  auto l = [&](const LtLibrary::ELibraryRecord inRecord) {

    m_actions.insert({ inRecord, [](const LtLibrary::ELibraryRecord inRec,

                                    const std::string &inValue) {} });

    outAdded.insert(inRecord);

  };

  l(LtLibrary::ELibraryRecord::Sort_Character);

  l(LtLibrary::ELibraryRecord::Book_Id);

  l(LtLibrary::ELibraryRecord::Work_id);

}

The next several special cases are straightforward:

  m_actions.insert({ LtLibrary::ELibraryRecord::Private_Comment,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       if (inValue.find("Box"s) == 0)

                         formatNormalRecord("Archived in", inValue);

                       else

                         formatNormalRecord("Private Comment", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Private_Comment);


  m_actions.insert({ LtLibrary::ELibraryRecord::Subjects,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       formatLongRecord("Subjects", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Subjects);


  m_actions.insert({ LtLibrary::ELibraryRecord::Dewey_Wording,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       formatLongRecord("Dewey Wording", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Dewey_Wording);

If we were using classes, here, we could declare a base class with logic for formatLongRecord() and a member field for the field name and pass it in for the two variants using formatLongRecord().  However, aside from moving the short function formatLongRecord()

void NoStreamFieldFormatter::formatLongRecord(const std::string &inTopic,

                                              const std::string &inValue)

{

  gtk_text_buffer_get_end_iter(m_buffer, &m_iter);

  insertCategory(inTopic);

  if (inValue.length() > 120)

    {

      insertSmallText(inValue);

      insertPlainText("\n");

    }

  else

    insertPlainText(inValue + "\n");

}

out of the general class, there's little benefit to this; we've already handled the duplication of logic by making it a single function call.  And the formatLongRecord() call would remain tightly tied to the parent class because of the four other member functions it calls.

We do use a class for the review formatting.  The implementation will be for the next post, but the interface is simple:

class ReviewFormatter

{

public:

  ReviewFormatter(IRecordFieldSink *inSink, GtkTextBuffer *inBuffer,

                  GtkTextIter *inIter):

      m_sink(inSink),

      m_buffer(inBuffer), m_iter(inIter)

  { }

  void operator()(const LtLibrary::ELibraryRecord inRec,

                  const std::string &inValue)

  {

    formatReview(inValue);

  }

private:

  void formatReview(const std::string &inValue) const;

  IRecordFieldSink *m_sink;

  GtkTextBuffer *m_buffer;

  GtkTextIter *m_iter;

};

and we have:

  m_actions.insert({ LtLibrary::ELibraryRecord::Review,

                     ReviewFormatter{ this, m_buffer, &m_iter } });

  added.insert(LtLibrary::ELibraryRecord::Review);

formatReview() is essentially the old formatReview() call from NoStreamFieldFormatter moved to the new location (and deleted from the old class). Note that this is where we use the interface we provided for the formatting functions.

Most keys get the default handler.  Here's the logic for adding it:

  std::ranges::for_each(

      std::ranges::iota_view{

          0, static_cast<int>(LtLibrary::ELibraryRecord::Library_Record_None) }

          | std::views::filter([&added](const int inVal) {

              return !added.contains(

                  static_cast<LtLibrary::ELibraryRecord>(inVal));

            }),

      [&](const int inVal) {

        m_actions.insert({ static_cast<LtLibrary::ELibraryRecord>(inVal),

                           [&](const LtLibrary::ELibraryRecord inRec,

                               const std::string &inValue) {

                             formatNormalRecord(IdToCategoryName(inRec),

                                                inValue);

                           } });

      });

We use an iota_view rather than just iterating over the numbers because it allows us to compound the view with a filter (using the added object) to remove the values already handled.

(This is an extreme case of the actions model being longer.  The other alternative is not to insert default handlers and have the calling function handle a not found result.  Instead of

void NoStreamFieldFormatter::format(const LtLibrary::ELibraryRecord inRec,

                                    const std::string &inValue) const

{

  m_actions.find(inRec)->second(inRec, inValue);

}

we would need:

void NoStreamFieldFormatter::format(const LtLibrary::ELibraryRecord inRec,

                                    const std::string &inValue) const

{

    auto action = m_actions.find(inRec);

    if (action == m_actions.end())

        formatNormalRecord(IdToCategoryName(inRec), inValue);

    else

        action->second(inRec, inValue);

}

But I can't see any reason for doing this other than being uncomfortable with the relative complexity of the insert for_each call.  As it stands, the execution is slightly faster and the call site is cleaner.)

Finally, we have a special handler for the sentinel value in the for_each loop, which is a special case anyway, and needs to be handled to ensure that all values of the enum are covered:

  m_actions.insert({ LtLibrary::ELibraryRecord::Library_Record_None,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       gtk_text_buffer_get_end_iter(m_buffer, &m_iter);

                       insertPlainText("No record found\n");

                     } });

The class constructor for the NoStreamFieldFormatter class is now much longer:

NoStreamFieldFormatter::NoStreamFieldFormatter(GtkTextBuffer *inBuffer):

    m_buffer(inBuffer),

    m_inserter(&NormalBibliographicTitleInserter::theInserter)

{

 std::unordered_set<LtLibrary::ELibraryRecord> added;

  initializeActionsWithNoOps(added);


  m_actions.insert({ LtLibrary::ELibraryRecord::ISBN,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       if (inValue != "[]"s)

                         formatNormalRecord("ISBN", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::ISBN);


  m_actions.insert({ LtLibrary::ELibraryRecord::Private_Comment,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       if (inValue.find("Box"s) == 0)

                         formatNormalRecord("Archived in", inValue);

                       else

                         formatNormalRecord("Private Comment", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Private_Comment);


  m_actions.insert({ LtLibrary::ELibraryRecord::Subjects,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       formatLongRecord("Subjects", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Subjects);


  m_actions.insert({ LtLibrary::ELibraryRecord::Dewey_Wording,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       formatLongRecord("Dewey Wording", inValue);

                     } });

  added.insert(LtLibrary::ELibraryRecord::Dewey_Wording);


  m_actions.insert({ LtLibrary::ELibraryRecord::Review,

                     ReviewFormatter{ this, m_buffer, &m_iter } });

  added.insert(LtLibrary::ELibraryRecord::Review);


  std::ranges::for_each(

      std::ranges::iota_view{

          0, static_cast<int>(LtLibrary::ELibraryRecord::Library_Record_None) }

          | std::views::filter([&added](const int inVal) {

              return !added.contains(

                  static_cast<LtLibrary::ELibraryRecord>(inVal));

            }),

      [&](const int inVal) {

        m_actions.insert({ static_cast<LtLibrary::ELibraryRecord>(inVal),

                           [&](const LtLibrary::ELibraryRecord inRec,

                               const std::string &inValue) {

                             formatNormalRecord(IdToCategoryName(inRec),

                                                inValue);

                           } });

      });


  m_actions.insert({ LtLibrary::ELibraryRecord::Library_Record_None,

                     [&](const LtLibrary::ELibraryRecord inRec,

                         const std::string &inValue) {

                       gtk_text_buffer_get_end_iter(m_buffer, &m_iter);

                       insertPlainText("No record found\n");

                     } });

}

but, as seen above, the execution of the frequently-called format() call is much simpler, and we've also, in passing, moved the special review formatting functionality out of the parent class and into a specialized assistant.  The logic for that assistant, in detail, will be the next post (I promise).

Comments

Popular posts from this blog

Boundaries

State Machines

Considerations on an Optimization