LT Project: Secondary Author Set

I'm going to take a brief break from talking about the Breviary project and shift attention, temporarily, to my other project.

I've been a LibraryThing member for a number of years and have about 3,000 books catalogued there.  LibraryThing is a very nice resource, but it can be slow, and some sorts of intersecting searches aren't possible using its interface.  However, it is possible to download one's library as a tab-delimited set of line records.  My LibraryThing project uses that downloaded data to provide a flexible means of quickly searching and displaying the records.

The data characteristics are in many ways quite different from the Breviary project.  There's a lot more searching and processing of longish record sets, no XML, and a different use of STL data structures.

Here I'm going to start with a very low-level class, the SecondaryAuthorSet.  LT stores secondary author names in one field of its record as pipe-separated values, and the corresponding roles in the next field as another set of pipe-separated values.  I want to be able to assemble these into a single set of name/role records.  I also want to support not only the form of the name as provided but also the form appropriate for a secondary author in a bibliographic record.  Finally, I want to be able to retrieve multiple names corresponding to a single role (e.g. Editor).
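
As a rough illustration of the pairing described above, here is a minimal sketch that splits two pipe-separated fields and matches names to roles by position.  The helper names splitField and pairNamesWithRoles are hypothetical and are not part of the project; the sketch also assumes the two fields have already been extracted from the tab-delimited line, and that a name with no role at the same position simply gets an empty role.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split a pipe-separated field into its components.
// An empty field yields no components; empty components between pipes are
// kept so that positions stay aligned between the two fields.
std::vector<std::string> splitField(const std::string &inField)
{
  std::vector<std::string> parts;
  if (inField.empty())
    return parts;
  std::size_t start = 0;
  while (true)
    {
      const auto bar = inField.find('|', start);
      if (bar == std::string::npos)
        {
          parts.push_back(inField.substr(start));
          break;
        }
      parts.push_back(inField.substr(start, bar - start));
      start = bar + 1;
    }
  return parts;
}

// Hypothetical helper: pair each name with the role at the same position.
// A name without a matching role gets an empty role.
std::vector<std::pair<std::string, std::string>>
pairNamesWithRoles(const std::string &inNames, const std::string &inRoles)
{
  const auto names = splitField(inNames);
  const auto roles = splitField(inRoles);
  std::vector<std::pair<std::string, std::string>> result;
  for (std::size_t i = 0; i < names.size(); ++i)
    result.emplace_back(names[i], i < roles.size() ? roles[i] : std::string());
  return result;
}
```

Calling pairNamesWithRoles("Smith, John|Doe, Jane", "Editor|Translator") yields two entries, the first pairing "Smith, John" with "Editor".  The class below builds the same kind of storage, but in two steps, because the names and the roles arrive from separate fields.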

The required (immutable) public interface is provided as follows:

class ISecondaryAuthorSet {
public:
  virtual ~ISecondaryAuthorSet();

  virtual std::vector<std::string>
  getSecondaryAuthorInfo(std::string_view inRole) const = 0;

  virtual bool empty() const = 0;

  virtual void process(
      std::function<void(const std::string &inName, const std::string &inRole)>
          inFunc) const = 0;
};

For the concrete class, we start by defining a storage type:

  using storage_type = std::vector<std::pair<std::string, std::string>>;

With C++11's auto, this sort of definition isn't as useful as it once was, but here I need it to declare a parameter in the interface of a nested class:

  class INameAdder
  {
  public:
    virtual ~INameAdder();

    virtual void addName(storage_type &outType, const std::string& inName) = 0;

    virtual INameAdder *clone() const = 0;
  };

This is used as a strategy to determine how names are stored when added to the set.

For ordinary purposes, we just store what gets passed in:

class StandardNameAdder : public LtLibrary::SecondaryAuthorSet::INameAdder
{
public:
  ~StandardNameAdder() override {}

  void addName(LtLibrary::SecondaryAuthorSet::storage_type &outType,
               const std::string& inName) override
  {
    outType.emplace_back(inName, ""s);
  }

  StandardNameAdder *
  clone() const override
  {
    return new StandardNameAdder();
  }
};


Note that the role is left empty here; it will be filled in from a later LT field.

For a bibliographic record, we want to reorder the name:

class BibliographicNameAdder : public LtLibrary::SecondaryAuthorSet::INameAdder
{
public:
  ~BibliographicNameAdder() override {}

  void addName(LtLibrary::SecondaryAuthorSet::storage_type &outType,
               const std::string& inName) override
  {
    auto val = inName.find(","s);
    if (val == std::string::npos)
      outType.emplace_back(inName, ""s);
    else
      {
        auto origVal = val;
        ++val;
        // Skip whitespace after the comma; the cast keeps std::isspace
        // well-defined for characters with negative values.
        while (std::isspace(static_cast<unsigned char>(inName[val])))
          ++val;
        outType.emplace_back(inName.substr(val)
                                 .append(" "s)
                                 .append(inName.substr(0, origVal)),
                             ""s);
      }
  }

  BibliographicNameAdder *
  clone() const override
  {
    return new BibliographicNameAdder();
  }
};


This converts Smith, John to John Smith; only the primary author in a bibliographic record has the last name first.
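
To make the reordering easier to try out in isolation, here is a minimal sketch of the same step as a free function.  The name toNaturalOrder is hypothetical and not part of the project; the logic follows the adder above, with an explicit bounds check and an unsigned char cast added as defensive assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical free function mirroring the reordering step:
// "Last, First" becomes "First Last"; names without a comma pass through.
std::string toNaturalOrder(const std::string &inName)
{
  const auto comma = inName.find(',');
  if (comma == std::string::npos)
    return inName;
  auto rest = comma + 1;
  // Skip whitespace after the comma; the cast avoids undefined behaviour
  // for characters with negative values.
  while (rest < inName.size()
         && std::isspace(static_cast<unsigned char>(inName[rest])))
    ++rest;
  return inName.substr(rest) + " " + inName.substr(0, comma);
}
```

Because only the first comma is considered, a name with a suffix such as "King, Martin Luther, Jr." comes out as "Martin Luther, Jr. King".  Whether that matters depends on how often such names occur in the catalogue.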

class SecondaryAuthorSet : public ISecondaryAuthorSet
{
public:
  using storage_type = std::vector<std::pair<std::string, std::string>>;

  class INameAdder
  {
  public:
    virtual ~INameAdder();

    virtual void addName(storage_type &outType, const std::string& inName) = 0;

    virtual INameAdder *clone() const = 0;
  };

  SecondaryAuthorSet(const bool inIsBibliographic);

  SecondaryAuthorSet(const SecondaryAuthorSet &inOther)
      : m_storage(inOther.m_storage), m_nameAdder(inOther.m_nameAdder->clone())
  { }

  ~SecondaryAuthorSet() {}

  SecondaryAuthorSet &
  operator=(const SecondaryAuthorSet &inOther)
  {
    if (this == &inOther)
      return *this;
    m_storage = inOther.m_storage;
    m_nameAdder.reset(inOther.m_nameAdder->clone());
    return *this;
  }

  SecondaryAuthorSet(SecondaryAuthorSet &&inOther) noexcept
      : m_storage(std::move(inOther.m_storage)),
        m_nameAdder(std::move(inOther.m_nameAdder))
  { }

  SecondaryAuthorSet &
  operator=(SecondaryAuthorSet &&inOther) noexcept
  {
    if (this == &inOther)
      return *this;
    m_storage = std::move(inOther.m_storage);
    m_nameAdder = std::move(inOther.m_nameAdder);
    return *this;
  }

  void addAuthorName(const std::string &inName);

  void addAuthorRole(const std::string &inName, const uint8_t inIndex);

  std::vector<std::string>
  getSecondaryAuthorInfo(std::string_view inRole) const override;

  bool empty() const override
  {
    return m_storage.empty();
  }

  void process(
      std::function<void(const std::string &inName, const std::string &inRole)>
          inFunc) const override;

private:
  storage_type m_storage;
  std::unique_ptr<INameAdder> m_nameAdder;
};

The NameAdder is stored in a std::unique_ptr, but we want normal value semantics, so we have to write the standard five functions ourselves: copy and move construction, copy and move assignment, and destruction.  Note that the concrete class has modifying interface calls, which are required because the records are created stepwise, but that once it has been created it will be accessed through its immutable interface.
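
The copy-by-clone idea can be shown in isolation with a minimal sketch.  The types IStrategy, PlainStrategy and Holder are hypothetical stand-ins, not the project's classes, and the sketch makes two choices that differ from the class above: clone() returns a std::unique_ptr rather than a raw pointer, and copying tolerates a moved-from source whose pointer is null.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for INameAdder.
struct IStrategy
{
  virtual ~IStrategy() = default;
  virtual std::string name() const = 0;
  virtual std::unique_ptr<IStrategy> clone() const = 0;
};

// Hypothetical stand-in for a concrete adder.
struct PlainStrategy : IStrategy
{
  std::string name() const override { return "plain"; }
  std::unique_ptr<IStrategy> clone() const override
  {
    return std::make_unique<PlainStrategy>(*this);
  }
};

class Holder
{
public:
  explicit Holder(std::unique_ptr<IStrategy> inStrategy)
      : m_strategy(std::move(inStrategy))
  { }

  // Copying clones the strategy, so the two objects never share it.
  Holder(const Holder &inOther)
      : m_strategy(inOther.m_strategy ? inOther.m_strategy->clone() : nullptr)
  { }

  Holder &operator=(const Holder &inOther)
  {
    if (this != &inOther)
      m_strategy = inOther.m_strategy ? inOther.m_strategy->clone() : nullptr;
    return *this;
  }

  // Moving a unique_ptr already does the right thing, so the defaults suffice.
  Holder(Holder &&) noexcept = default;
  Holder &operator=(Holder &&) noexcept = default;
  ~Holder() = default;

  const IStrategy *strategy() const { return m_strategy.get(); }

private:
  std::unique_ptr<IStrategy> m_strategy;
};
```

The null check is the main design difference: a moved-from Holder holds a null pointer, and copying it without the check would dereference null.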

The constructor just sets up the NameAdder based on the global configuration:

SecondaryAuthorSet::SecondaryAuthorSet(const bool inIsBibliographic)
    : m_nameAdder(inIsBibliographic
                      ? static_cast<INameAdder *>(new BibliographicNameAdder())
                      : static_cast<INameAdder *>(new StandardNameAdder()))
{
}

And addAuthorName() just makes use of the NameAdder.

void SecondaryAuthorSet::addAuthorName(const std::string &inName)
{
  m_nameAdder->addName(m_storage, inName);
}

addAuthorRole() depends on the fact that roles are stored in the same order as the names.  It does perform a basic sanity check, though:

void SecondaryAuthorSet::addAuthorRole(const std::string &inName,
                                       const uint8_t inIndex)
{
  if (m_storage.empty() || (inIndex >= m_storage.size()))
    return;
  m_storage[inIndex].second = inName;
}

The generic processing function just uses a range-based for loop:

void SecondaryAuthorSet::process(
    std::function<void(const std::string &inName, const std::string &inRole)>
        inFunc) const
{
  for (const auto &ref : m_storage)
    {
      inFunc(ref.first, ref.second);
    }
}
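
For a sense of how a caller might use this kind of callback interface, here is a minimal sketch.  It uses a free function process() as a stand-in for the member function, and the describe() helper and its "Name (Role)" output format are hypothetical, chosen only for illustration.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using storage_type = std::vector<std::pair<std::string, std::string>>;

// Stand-in for SecondaryAuthorSet::process(), written as a free function.
void process(
    const storage_type &inStorage,
    const std::function<void(const std::string &, const std::string &)> &inFunc)
{
  for (const auto &entry : inStorage)
    inFunc(entry.first, entry.second);
}

// Hypothetical caller: build display strings such as "John Smith (Editor)".
// Entries with an empty role are shown as the bare name.
std::vector<std::string> describe(const storage_type &inStorage)
{
  std::vector<std::string> lines;
  process(inStorage,
          [&lines](const std::string &inName, const std::string &inRole) {
            lines.push_back(inRole.empty() ? inName
                                           : inName + " (" + inRole + ")");
          });
  return lines;
}
```

The caller never sees the storage type, only name/role pairs, which is what lets the concrete class keep its representation private.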


The only really interesting function is the one for getting names matching roles.

std::vector<std::string>
SecondaryAuthorSet::getSecondaryAuthorInfo(std::string_view inRole) const
{
  if ((m_storage.size() == 1) && (m_storage[0].second == inRole))
    return { m_storage[0].first };
  else if (m_storage.size() > 1)
    {
      auto l = [&](const std::pair<std::string, std::string> &inVal) -> bool {
          return inVal.second == inRole;
        };
      if (std::ranges::any_of(m_storage, l))
        {
          auto vview = std::views::filter(m_storage, l) | std::views::keys;
          return {vview.begin(), vview.end()};
        }
    }
  return {};
}

It's fairly frequent that there will be only one secondary author, so it makes sense to provide a special case for that. (It's even more frequent that there will be no data...) If there are two or more secondary authors, we define a lambda which is used twice: first as the test in std::ranges::any_of(), and then as the predicate for std::views::filter.  The filtered view is piped through std::views::keys to pick out the names and assigned to an auto variable, so that both begin() and end() can be called on it when constructing the std::vector.
