LT Project: Searchable Sets

There are two sets of traits which I wanted to be able to search on flexibly: Tags, which are arbitrary names, and Collections, which are equally arbitrary. On the website, tags are used for simple attribute naming, while collections are used as a standard way of viewing a group. Tags tend to be things like "Mystery Novels" and collections things like "To read" and "Your Library". Intrinsically there isn't very much difference, but because of the way they are used for display they tend to function very differently.

One thing they have in common is that a given book can belong to more than one collection and can have multiple tags. So each book has a set of tags and a set of collections, both normally non-empty, on which we want to search. There are some subtle differences in practice: I normally want to search on a whole tag (because I have tags like "History", "Social History" and "History of Technology"), but I can usually search on part of a collection name ("Currently reading" is the only collection to have either word in it).

Still, they have a lot of commonality, so the idea of a searchable set applies to both, even though "matches" will mean slightly different things:

class ISearchableSet
{
public:
  virtual ~ISearchableSet();
  virtual bool matches(const std::string &inTag) const = 0;
};

Much of the implementation is shared, using a GoF Template Method:

class CommonSearchableSet : public ISearchableSet
{
protected:
  using searchable_set_type = std::set<std::string>;
  ~CommonSearchableSet() override;

public:
  void addEncodedList(const std::string &inList);

protected:
  virtual void storeSingleRawField(const std::string &inField) = 0;
  bool matchesOnModifiedField(const std::string &inField) const;
  void store(const std::string &inVal);

private:
  searchable_set_type m_set;
};

This intermediate abstract class is the one which presents a mutable public call; once the sets have been populated, most contexts see only the single immutable call from the interface.

matchesOnModifiedField() has the simple implementation one might expect, as does store(), which gets used by the child classes in their implementations of storeSingleRawField():

bool CommonSearchableSet::matchesOnModifiedField(const std::string &inField) const
{
  return m_set.contains(inField);
}

void CommonSearchableSet::store(const std::string &inVal)
{
  m_set.insert(inVal);
}

The public call addEncodedList(), which is the template method, is a little more elaborate:

void CommonSearchableSet::addEncodedList(const std::string &inList)
{
  const static boost::char_separator<char> commaSeparator(",");
  std::ranges::for_each(
      boost::tokenizer<boost::char_separator<char>>(inList, commaSeparator),
      [&](const auto &inVal) { storeSingleRawField(inVal); });
}

This is a fine example of how the more recent C++ functionality doesn't so much let you do something different as let you do it more clearly and with less cruft in the code. The C++03 version of addEncodedList() would be:

void CommonSearchableSet::addEncodedList(const std::string &inList)
{
  const static boost::char_separator<char> commaSeparator(",");

  class Storer {
  public:
      Storer(CommonSearchableSet& inParent): m_parent(inParent) { }
      void operator()(const std::string& inVal) {
          m_parent.storeSingleRawField(inVal);
      }
  private:
      CommonSearchableSet& m_parent;
  };

  Storer storer(*this);
  boost::tokenizer<boost::char_separator<char> > tokens(inList, commaSeparator);
  std::for_each(tokens.begin(), tokens.end(), storer);
}

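It's not impossibly ugly, but it takes several times as much code to spell out what is implicit in the lambda and the range version of for_each. Strictly it is worse still: C++03 does not allow a local class such as Storer to be used as a template argument, so Storer would have to be moved out of the function body and given access to the protected storeSingleRawField().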

The TagSearchableSet simply defines the two necessary abstract functions:

class TagSearchableSet : public CommonSearchableSet {
public:
  ~TagSearchableSet() override;
  void storeSingleRawField(const std::string &inField) override;
  bool matches(const std::string& inTag) const override;
};

matches() is very straightforward:

bool TagSearchableSet::matches(const std::string& inTag) const {
  return matchesOnModifiedField(inTag);
}

Storing the data involves trimming surrounding whitespace and replacing any interior spaces with underscores, no more:

void TagSearchableSet::storeSingleRawField(const std::string &inField) {
  std::string modifiedName = boost::algorithm::trim_copy(inField);
  // replace() is a no-op when there are no spaces, so no contains() check is
  // needed first (std::string::contains() is C++23 in any case)
  std::ranges::replace(modifiedName, ' ', '_');
  store(modifiedName);
}

The CollectionSearchableSet has the same interface:

class CollectionSearchableSet : public CommonSearchableSet {
public:
  ~CollectionSearchableSet() override;
  void storeSingleRawField(const std::string &inField) override;
  bool matches(const std::string& inTag) const override;
};

But its implementations reflect the use of lower case only and the storage of *parts* of the collection names: only parts longer than 3 characters are kept, eliminating words like "to" or "but", which are not going to be used as significant search terms.

void CollectionSearchableSet::storeSingleRawField(const std::string &inField)
{
  const static boost::char_separator<char> sep(" ");
  std::ranges::for_each(
      boost::tokenizer<boost::char_separator<char>>(inField, sep)
          | std::views::filter(
              [](const auto &inVal) { return inVal.length() > 3; }),
      [&](const auto &inVal) {
        store(boost::algorithm::to_lower_copy(inVal));
      });
}

bool CollectionSearchableSet::matches(const std::string &inTag) const
{
  return matchesOnModifiedField(boost::algorithm::to_lower_copy(inTag));
}

The std::ranges::for_each call in CollectionSearchableSet::storeSingleRawField() separates the two concerns of filtering on length and storing the value in lower case by means of a C++20 filter_view. In earlier versions the for_each lambda (or the custom functor) would have to mix the two concerns together in its body.
