LT Project: Adding Fields

I generally described the downloaded record from LibraryThing as a set of tab-separated fields in a line record format. To control the handling of the fields we provide an enum which has descriptors for the fields in the order in which they are downloaded:

enum class ELibraryRecord {

Book_Id = 0,

Title,

Sort_Character,

Primary_Author,

Primary_Author_Role,

Secondary_Author,

Secondary_Author_Roles,

Publication,

Date,

Review,

Rating,

Comment,

Private_Comment,

Summary,

Media,

Physical_Description,

Weight,

Height,

Thickness,

Length,

Dimensions,

Page_Count,

LCCN,

Acquired,

Date_Started,

Date_Read,

Barcode,

BCID,

Tags,

Collections,

Languages,

Original_Languages,

LC_Classification,

ISBN,

ISBNs,

Subjects,

Dewey_Decimal,

Dewey_Wording,

Other_Call_Number,

Copies,

Source,

Entry_Date,

From_Where,

OCLC,

Work_id,

Lending_Patron,

Lending_Status,

Lending_Start,

Lending_End,

Library_Record_Size,

Library_Record_None

};

There's a convenience function for incrementing the values:

inline ELibraryRecord increment(const ELibraryRecord inRec)

{

if (inRec == ELibraryRecord::Library_Record_None)

return ELibraryRecord::Library_Record_None;

else

return static_cast<ELibraryRecord>(static_cast<int>(inRec) + 1);

}
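As a rough sketch of how increment() might be used to walk the fields in download order, here is a loop over a cut-down stand-in enum. EDemoRecord and collectFieldIds() are hypothetical names introduced only for this illustration; the real enum has many more fields.

```cpp
#include <vector>

// Cut-down stand-in for ELibraryRecord (hypothetical, for illustration only).
enum class EDemoRecord {
    Book_Id = 0,
    Title,
    Sort_Character,
    Demo_Record_Size,
    Demo_Record_None
};

// Same shape as increment() above: it saturates at the None sentinel.
inline EDemoRecord increment(const EDemoRecord inRec)
{
    if (inRec == EDemoRecord::Demo_Record_None)
        return EDemoRecord::Demo_Record_None;
    return static_cast<EDemoRecord>(static_cast<int>(inRec) + 1);
}

// Visit every real field in download order and collect the numeric ids.
inline std::vector<int> collectFieldIds()
{
    std::vector<int> ids;
    for (auto field = EDemoRecord::Book_Id;
         field != EDemoRecord::Demo_Record_Size;
         field = increment(field))
        ids.push_back(static_cast<int>(field));
    return ids;
}
```

Stopping at the Size sentinel rather than at None keeps the two bookkeeping values out of the loop.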

An entire record looks like:

211176944 Embracing Modern C++ Safely 1 Lakos, John Author Romeo, Vittorio|Khlebnikov, Rostislav|Meredith, Alisdair Author|Author|Author Addison-Wesley Professional (2021), Edition: 1, 1376 pages 2021 Embracing Modern C++ Safely by John Lakos (2021) Paperback 1376 p.; 9.15 inches 38.205 pounds 9.15 inches 2 inches 7.4 inches 9.15 x 7.4 x 2 inches 1376 C++, Programming Your library, Willowdale, To Review English English [0137380356] 0137380356, 9780137380350 005.133 Computer programming, programs, data, security > Computing and Information > General Programming Languages > Information > Languages > Programming 1 amazon.com books [2022-01-16] Amazon.ca 27597342
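The splitting code itself is not shown in this post. As a minimal sketch, assuming the fields really are separated by single tab characters, a line could be broken up and keyed by position like this. EDemoRecord and splitRecord() are hypothetical names, and the enum is cut down to the first four fields.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Cut-down stand-in for ELibraryRecord (hypothetical, for illustration only).
enum class EDemoRecord { Book_Id = 0, Title, Sort_Character, Primary_Author, Demo_Record_Size };

// Split one tab-separated line and key each field by its position in the enum.
// Empty fields are kept so that positions stay aligned with the enum.
inline std::map<EDemoRecord, std::string> splitRecord(std::string_view inLine)
{
    std::map<EDemoRecord, std::string> fields;
    std::size_t start = 0;
    for (int index = 0; index < static_cast<int>(EDemoRecord::Demo_Record_Size); ++index)
    {
        const std::size_t tab = inLine.find('\t', start);
        const std::size_t len
            = (tab == std::string_view::npos) ? std::string_view::npos : tab - start;
        fields.emplace(static_cast<EDemoRecord>(index),
                       std::string(inLine.substr(start, len)));
        if (tab == std::string_view::npos)
            break; // last field on the line
        start = tab + 1;
    }
    return fields;
}
```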

The key functionality for processing the field records, controlled by the enum, is the IFieldAdder interface:

class IFieldAdder {

public:

using storage_type=std::map<ELibraryRecord, std::string>;

virtual ~IFieldAdder();

virtual void addField(storage_type &outStorage, ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const = 0;

virtual bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const = 0;

};

The storage_type alias is referenced throughout the code dealing with generalized field storage.

There's a NullObject version:

class NullFieldAdder : public IFieldAdder

{

public:

~NullFieldAdder() override;

void addField(storage_type &outStorage, ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{ }

bool processField(ISettableAuthorContainer &outKey, const ELibraryRecord inType,

const std::string &inVal) const override

{

return true;

}

};

The simplest implementation, used for most fields, is the GeneralFieldAdder:

class GeneralFieldAdder : public IFieldAdder

{

public:

~GeneralFieldAdder() override {}

void addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const LtLibrary::ELibraryRecord inType,

const std::string &inVal) const override

{

outStorage.emplace(inType, inVal);

}

bool processField(LtLibrary::ISettableAuthorContainer &outKey,

const LtLibrary::ELibraryRecord inType,

const std::string &inVal) const override

{

return true;

}

};

This just stores the field by enum id in the map, and there is no special processing associated with the field when it is encountered.

The AuthorFieldAdder is the simplest version which does special processing. This stores the field as a general value -- it's what will be used for output -- but also sets the associated Author field, which provides additional functionality (it was described a few posts ago).

class AuthorFieldAdder : public GeneralFieldAdder

{

public:

~AuthorFieldAdder() override {}

void addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const LtLibrary::ELibraryRecord inType,

const std::string &inVal) const override

{

GeneralFieldAdder::addField(outStorage, outKey, inType, inVal);

processField(outKey, inType, inVal);

}

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

outKey.setAuthor(inVal);

return true;

}

};

For the Primary_Author_Role field we don't need to store the value "Author", which is taken as the default, so we have a special handler for it:

class AuthorRoleFieldAdder : public LtLibrary::IFieldAdder

{

public:

~AuthorRoleFieldAdder() override {}

void addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

if (inVal != "Author"s)

outStorage.emplace(inType, inVal);

}

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

return true;

}

};

In some cases irregularities in the data require special handling on storage. The LT records have two successive fields, ISBN and ISBNs. The value of the first is frequently enclosed in square brackets, and the value of the second usually includes the first, e.g.

[0765348276] 0765348276, 9780765348272

We want to parse away the brackets and eliminate the duplicated information.

So for the ISBN field we have:

class ISBNFieldAdder : public IFieldAdder

{

public:

~ISBNFieldAdder() override {}

void

addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

if (!inVal.empty() && (inVal != "[]"s))

{

if (inVal[0] == '[')

{

auto s = inVal.substr(1);

auto offset = s.find(']');

if (offset != std::string::npos)

outStorage.emplace(inType, s.substr(0,offset));

else

outStorage.emplace(inType, s);

}

else

outStorage.emplace(inType, inVal);

}

}

(Yes, I could use inVal.starts_with("[") but doing a single-character comparison is just as clear and should be more efficient.)

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

return true;

}

};
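To make the bracket handling easier to try out in isolation, here is the same logic pulled into a free function. This is only a sketch: normalizeIsbn() is a hypothetical name, and returning std::optional stands in for "store nothing" in the adder.

```cpp
#include <optional>
#include <string>

// Returns the ISBN with any enclosing square brackets removed,
// or std::nullopt when there is nothing worth storing ("" or "[]").
inline std::optional<std::string> normalizeIsbn(const std::string &inVal)
{
    if (inVal.empty() || inVal == "[]")
        return std::nullopt;
    if (inVal[0] != '[')
        return inVal; // already bare
    const std::string s = inVal.substr(1);
    const auto offset = s.find(']');
    // A missing closing bracket is tolerated: keep everything after '['.
    return (offset != std::string::npos) ? s.substr(0, offset) : s;
}
```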

and for the ISBNs field we have:

class ISBNsFieldAdder : public IFieldAdder

{

public:

~ISBNsFieldAdder() override {}

void addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

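storage_type::iterator iter

= outStorage.find(ELibraryRecord::ISBN);

// The ISBN field may not have been stored (empty or "[]"), so check before dereferencing

if ((iter != outStorage.end())

&& (inVal.find(iter->second) != std::string::npos))

outStorage.erase(iter);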

outStorage.emplace(inType, inVal);

}

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override

{

return true;

}

};
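The deduplication can be exercised on a plain map. The sketch below uses hypothetical stand-ins (EDemoRecord, demo_storage, addIsbns()) and checks the find() result before using it, since a record with an empty ISBN field will have stored nothing under that key.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for ELibraryRecord and storage_type.
enum class EDemoRecord { ISBN, ISBNs };
using demo_storage = std::map<EDemoRecord, std::string>;

// Store the ISBNs list, dropping a previously stored single ISBN when the
// list already contains it. The end() check covers records with no ISBN.
inline void addIsbns(demo_storage &outStorage, const std::string &inVal)
{
    if (const auto iter = outStorage.find(EDemoRecord::ISBN);
        iter != outStorage.end()
        && inVal.find(iter->second) != std::string::npos)
        outStorage.erase(iter);
    outStorage.emplace(EDemoRecord::ISBNs, inVal);
}
```

With the values from the example above, the stored 0765348276 is erased and only the ISBNs entry remains.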

The most interesting field adder is the SortFieldAdder. This picks up the Sort_Character field (which isn't a character but an offset into the title) and needs to store it into the record class but *not* into the data for printing out. So it gets a reference to the overall record which is reset on parsing each record, and the handling is adjusted accordingly:

class SortFieldAdder : public IFieldAdder

{

class RefHolder

{

public:

RefHolder() : m_val(s_Default) {}

RefHolder(IMinimalLibraryBookRecord &inVal) : m_val(inVal) {}

void set(const int inVal)

{

m_val.setTitleSortOffset(inVal);

}

private:

IMinimalLibraryBookRecord &m_val;

inline static NullMinimalLibraryBookRecord s_Default;

};

public:

SortFieldAdder() : m_offset(std::make_unique<RefHolder>()) {}

~SortFieldAdder() override;

void registerReference(IMinimalLibraryBookRecord &inRef) const

{

m_offset = std::make_unique<RefHolder>(inRef);

}

void clearReference() const

{

m_offset = std::make_unique<RefHolder>();

}

void addField(storage_type &outStorage, ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override;

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override;

private:

mutable std::unique_ptr<RefHolder> m_offset;

};

void SortFieldAdder::addField([[maybe_unused]] storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const

{

processField(outKey, inType, inVal);

}

bool SortFieldAdder::processField([[maybe_unused]] ISettableAuthorContainer &outKey,

[[maybe_unused]] const ELibraryRecord inType,

const std::string &inVal) const

{

int offset = 0;

if (!inVal.empty())

{

offset = std::atoi(inVal.c_str());

if (offset > 0)

--offset;

}

m_offset->set(offset);

return true;

}

Irregularities in the data are treated as an offset of 0. We use std::atoi() rather than std::stoi() because the former does not throw an exception, and a return value of 0 is exactly what we want in the case of irregular data.
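The conversion can be checked on its own. This sketch lifts the arithmetic out of processField() into a hypothetical free function, toSortOffset(), so that the fallback to 0 is easy to see.

```cpp
#include <cstdlib>
#include <string>

// Convert the raw Sort_Character text into the offset handed to the record,
// mirroring the decrement in processField(). Empty or non-numeric input
// yields 0 because std::atoi() returns 0 when no conversion can be performed.
inline int toSortOffset(const std::string &inVal)
{
    int offset = 0;
    if (!inVal.empty())
    {
        offset = std::atoi(inVal.c_str());
        if (offset > 0)
            --offset;
    }
    return offset;
}
```

One caveat: a negative value in the field would pass through unchanged, since only positive values are decremented and nothing clamps the result.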

Field Adder Sets

Field Adders operate on records as a set. The set itself is a Composite pattern based on the IFieldAdder but with a few extra calls:

class IFieldAdderSet : public IFieldAdder {

public:

virtual ~IFieldAdderSet();

virtual void setSortOffset(IMinimalLibraryBookRecord &inVal) const = 0;

virtual void resetSortOffset() const = 0;

virtual bool isBibliographic() const noexcept = 0;

};

The standard implementation is for instances where we want to display all the relevant data, or a large subset of it. Some fields are either not generally populated by the LT site or are unwanted, so we have mechanisms to suppress them.

class FieldAdderSet : public IFieldAdderSet, public boost::noncopyable

{

using adder_storage_type

= std::map<ELibraryRecord, std::unique_ptr<IFieldAdder>>;

public:

FieldAdderSet(const bool inUseShortForms);

~FieldAdderSet() override;

void addField(storage_type &outStorage, ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override;

bool processField(ISettableAuthorContainer &outKey, const ELibraryRecord inType,

const std::string &inVal) const override;

void setSortOffset(IMinimalLibraryBookRecord &inVal) const override;

void resetSortOffset() const override;

bool isBibliographic() const noexcept override

{

return false;

}

private:

template <typename T>

void addIfNotSuppressed(const ELibraryRecord type,

const std::set<ELibraryRecord> *inSet)

{

if (inSet->contains(type))

m_adders.emplace(type, std::make_unique<NullFieldAdder>());

else

m_adders.emplace(type, std::make_unique<T>());

}

adder_storage_type m_adders;

// Tempting to make these static, but in fact only one instance of the

// class itself will be created, so we create once only in any case

std::set<ELibraryRecord> m_toSuppress

{ ELibraryRecord::Book_Id, ELibraryRecord::Work_id,

ELibraryRecord::Copies };

std::set<ELibraryRecord> m_toSuppressShortForm

{ ELibraryRecord::Book_Id, ELibraryRecord::Work_id,

ELibraryRecord::Copies, ELibraryRecord::Publication,

ELibraryRecord::Summary, ELibraryRecord::LC_Classification,

ELibraryRecord::Subjects, ELibraryRecord::Dewey_Decimal,

ELibraryRecord::Dewey_Wording, ELibraryRecord::Source,

ELibraryRecord::Entry_Date, ELibraryRecord::Physical_Description,

ELibraryRecord::Weight, ELibraryRecord::Height,

ELibraryRecord::Thickness, ELibraryRecord::Length,

ELibraryRecord::Dimensions, ELibraryRecord::ISBN,

ELibraryRecord::ISBNs, ELibraryRecord::Lending_Patron,

ELibraryRecord::Lending_Status, ELibraryRecord::Lending_Start,

ELibraryRecord::Lending_End };

};

The constructor handles the finicky per-field logic:

FieldAdderSet::FieldAdderSet(const bool inUseShortForms)

{

const std::set<ELibraryRecord> *suppressSet

= (inUseShortForms ? &m_toSuppressShortForm : &m_toSuppress);

std::ranges::for_each(

std::ranges::iota_view{

0, static_cast<int>(ELibraryRecord::Library_Record_Size) },

[&](const auto i) {

switch (ELibraryRecord type = static_cast<ELibraryRecord>(i); type)

{

using enum ELibraryRecord;

case Sort_Character:

m_adders.emplace(type, std::make_unique<SortFieldAdder>());

break;

case Primary_Author:

m_adders.emplace(type, std::make_unique<AuthorFieldAdder>());

break;

case Primary_Author_Role:

m_adders.emplace(type, std::make_unique<AuthorRoleFieldAdder>());

break;

case ISBN:

addIfNotSuppressed<ISBNFieldAdder>(type, suppressSet);

break;

case ISBNs:

addIfNotSuppressed<ISBNsFieldAdder>(type, suppressSet);

break;

default:

addIfNotSuppressed<GeneralFieldAdder>(type, suppressSet);

break;

}

});

}

Note that because addIfNotSuppressed() adds a NullObject Field Adder, there ends up being a processor for every field in a raw record.
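A stripped-down model of this arrangement may make the effect clearer. Everything in the sketch is a hypothetical stand-in (EDemoRecord, IDemoAdder, DemoAdderSet and so on), with the author key and the special-case adders left out. It shows only that every field gets an adder and that suppressed fields are silently dropped.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

enum class EDemoRecord { Book_Id = 0, Title, Copies, Demo_Record_Size };
using demo_storage = std::map<EDemoRecord, std::string>;

struct IDemoAdder {
    virtual ~IDemoAdder() = default;
    virtual void addField(demo_storage &outStorage, EDemoRecord inType,
                          const std::string &inVal) const = 0;
};

// NullObject: accepts the field and does nothing with it.
struct NullDemoAdder : IDemoAdder {
    void addField(demo_storage &, EDemoRecord, const std::string &) const override {}
};

// Stores the field under its enum id.
struct GeneralDemoAdder : IDemoAdder {
    void addField(demo_storage &outStorage, EDemoRecord inType,
                  const std::string &inVal) const override
    {
        outStorage.emplace(inType, inVal);
    }
};

class DemoAdderSet {
public:
    explicit DemoAdderSet(const std::set<EDemoRecord> &inSuppress)
    {
        // One adder per field: a null one if suppressed, a general one otherwise.
        for (int i = 0; i != static_cast<int>(EDemoRecord::Demo_Record_Size); ++i)
        {
            const auto type = static_cast<EDemoRecord>(i);
            if (inSuppress.count(type) > 0)
                m_adders.emplace(type, std::make_unique<NullDemoAdder>());
            else
                m_adders.emplace(type, std::make_unique<GeneralDemoAdder>());
        }
    }

    void addField(demo_storage &outStorage, EDemoRecord inType,
                  const std::string &inVal) const
    {
        if (const auto iter = m_adders.find(inType); iter != m_adders.end())
            iter->second->addField(outStorage, inType, inVal);
    }

    std::size_t adderCount() const { return m_adders.size(); }

private:
    std::map<EDemoRecord, std::unique_ptr<IDemoAdder>> m_adders;
};
```

The caller never has to ask whether a field is suppressed; it hands every field to the set and the null adders absorb the unwanted ones.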

To see the sort of difference modern C++ makes, consider what this would look like in C++03. First, there wouldn't be any unique_ptr, so the storage type would have to be different (probably bare pointers with cleanup in the class destructor). Then, without lambdas, ranges, switch initializers or using enum, the constructor would look something like:

FieldAdderSet::FieldAdderSet(const bool inUseShortForms)

{

const std::set<ELibraryRecord> *suppressSet

= (inUseShortForms ? &m_toSuppressShortForm : &m_toSuppress);

class Handler {

public:

Handler(const std::set<ELibraryRecord>& inSuppressSet,

FieldAdderSet::adder_storage_type& inAdders):

m_suppressSet(inSuppressSet),

m_adders(inAdders)

{ }

void operator()(const int inVal) {

ELibraryRecord type = static_cast<ELibraryRecord>(inVal);

switch (type)

{

case Sort_Character:

m_adders.insert(std::make_pair<ELibraryRecord, IFieldAdder*>(type, new SortFieldAdder()));

break;

case Primary_Author:

m_adders.insert(std::make_pair<ELibraryRecord, IFieldAdder*>(type, new AuthorFieldAdder()));

break;

case Primary_Author_Role:

m_adders.insert(std::make_pair<ELibraryRecord, IFieldAdder*>(type, new AuthorRoleFieldAdder()));

break;

case ISBN:

addIfNotSuppressed<ISBNFieldAdder>(type);

break;

case ISBNs:

addIfNotSuppressed<ISBNsFieldAdder>(type);

break;

default:

addIfNotSuppressed<GeneralFieldAdder>(type);

break;

}

}

private:

const std::set<ELibraryRecord>& m_suppressSet;

FieldAdderSet::adder_storage_type& m_adders;

template <typename T> void addIfNotSuppressed(const ELibraryRecord type)

{

if (m_suppressSet.count(type) > 0)

m_adders.insert(std::make_pair<ELibraryRecord, IFieldAdder*>(type, new NullFieldAdder()));

else

m_adders.insert(std::make_pair<ELibraryRecord, IFieldAdder*>(type, new T()));

}

};

Handler h(*suppressSet, m_adders);

for( int i = 0; i != static_cast<int>(ELibraryRecord::Library_Record_Size); ++i)

h(i);

}

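This does have the small advantage that addIfNotSuppressed() can migrate from the overall class to the handler class (although, because a local class cannot have member templates, the handler would in practice have to be declared at class or namespace scope rather than inside the constructor), but otherwise it's less easy to comprehend and a little more brittle. It's not a dramatic difference but it's still there.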

The two inherited processing functions are implemented in the usual way for a Composite pattern:

void FieldAdderSet::addField(storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const

{

if (adder_storage_type::const_iterator iter = m_adders.find(inType);

iter != m_adders.end())

iter->second->addField(outStorage, outKey, inType, inVal);

}

bool FieldAdderSet::processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const

{

if (adder_storage_type::const_iterator iter = m_adders.find(inType);

iter != m_adders.end())

iter->second->processField(outKey, inType, inVal);

return true;

}

The two specialized functions for resetting the sort adder reference are slightly ugly, but simple:

void FieldAdderSet::setSortOffset(IMinimalLibraryBookRecord &inVal) const

{

adder_storage_type::const_iterator iter

= m_adders.find(ELibraryRecord::Sort_Character);

if (iter != m_adders.end())

static_cast<const SortFieldAdder *>(iter->second.get())

->registerReference(inVal);

}

void FieldAdderSet::resetSortOffset() const

{

adder_storage_type::const_iterator iter

= m_adders.find(ELibraryRecord::Sort_Character);

if (iter != m_adders.end())

static_cast<const SortFieldAdder *>(iter->second.get())->clearReference();

}

The checks against the find() results are probably unnecessary -- the constructor should ensure that the object exists -- but they are good general form, avoiding "can't happen" assumptions.

There is a Bibliographic version of the Field Adder Set, which makes use of far fewer fields. Because it has fewer fields and avoids most of the special cases, its internal model is rather simpler:

class BibliographicFieldAdderSet : public IFieldAdderSet,

public boost::noncopyable

{

public:

BibliographicFieldAdderSet();

void addField(storage_type &outStorage, ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override;

bool processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const override;

void setSortOffset(IMinimalLibraryBookRecord &inVal) const override;

void resetSortOffset() const override;

bool isBibliographic() const noexcept override

{

return true;

}

private:

SortFieldAdder m_sortFieldAdder;

inline static std::set<ELibraryRecord> m_toInclude{

ELibraryRecord::Title,

ELibraryRecord::Primary_Author,

ELibraryRecord::Primary_Author_Role,

ELibraryRecord::Private_Comment,

ELibraryRecord::Media,

ELibraryRecord::Publication,

ELibraryRecord::Collections

};

};

There is no map of handlers here, so the addField() and processField() functions are a little less generic:

void BibliographicFieldAdderSet::addField(IFieldAdderSet::storage_type &outStorage,

ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const

{

if (processField(outKey, inType, inVal))

outStorage.insert(std::make_pair(inType, inVal));

}

bool BibliographicFieldAdderSet::processField(ISettableAuthorContainer &outKey,

const ELibraryRecord inType,

const std::string &inVal) const

{

if (m_toInclude.count(inType) > 0)

{

if (inType == ELibraryRecord::Primary_Author)

outKey.setAuthor(inVal);

return true;

}

else if (inType == ELibraryRecord::Sort_Character)

{

m_sortFieldAdder.processField(outKey, inType, inVal);

}

return false;

}

Because the SortFieldAdder is no longer part of a map, it can be referenced directly:

void BibliographicFieldAdderSet::setSortOffset(IMinimalLibraryBookRecord &inVal) const

{

m_sortFieldAdder.registerReference(inVal);

}

void BibliographicFieldAdderSet::resetSortOffset() const

{

m_sortFieldAdder.clearReference();

}
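Since setSortOffset() and resetSortOffset() have to be called in pairs around each record, one option is a small scope guard. This is not part of the code described above; SortOffsetGuard is a hypothetical helper, and DemoRecord and DemoAdderSet are minimal stand-ins for IMinimalLibraryBookRecord and IFieldAdderSet so that the sketch compiles on its own.

```cpp
// Minimal stand-ins (hypothetical) for the record and adder-set interfaces.
struct DemoRecord { int titleSortOffset = 0; };

class DemoAdderSet {
public:
    void setSortOffset(DemoRecord &inVal) const { m_current = &inVal; }
    void resetSortOffset() const { m_current = nullptr; }
    bool hasReference() const { return m_current != nullptr; }
private:
    mutable DemoRecord *m_current = nullptr;
};

// Registers the record on construction and clears the reference on scope
// exit, so an early return or exception cannot leave a stale reference.
class SortOffsetGuard {
public:
    SortOffsetGuard(const DemoAdderSet &inSet, DemoRecord &inRec) : m_set(inSet)
    {
        m_set.setSortOffset(inRec);
    }
    ~SortOffsetGuard() { m_set.resetSortOffset(); }
    SortOffsetGuard(const SortOffsetGuard &) = delete;
    SortOffsetGuard &operator=(const SortOffsetGuard &) = delete;
private:
    const DemoAdderSet &m_set;
};
```

The record-parsing loop would then create one guard per record, before feeding that record's fields to the adder set.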
