LT Project: In-memory DB based Records

The RefLibraryBookRecord and BaseLibraryBookRecord should be discussed together.

The BaseLibraryBookRecord is a record designed for the in-memory database when one is used. It is not an implementation of the ILibraryBookRecord general interface: it has no specific interfaces for querying, printing, or searching the data it contains, although it does support a general-purpose processing call.

class BaseLibraryBookRecord

{

public:

explicit BaseLibraryBookRecord(const std::string &inRec);

constexpr bool

operator<(const BaseLibraryBookRecord &inRec) const

{

return this < &inRec;

}

ILibraryBookRecord *createRecord(const IFieldAdderSet &inAdder,

const bool inBreakOutCollections,

const bool inBreakoutTags) const;

constexpr int

getId() const

{

return m_bookId;

}

void process(std::function<void(const ELibraryRecord, const std::string &)>

inVal) const;

// Used in unit testing

void print(std::ostream &outStream) const;

private:

std::vector<std::pair<ELibraryRecord, std::string>> m_fields;

int m_bookId = 0;

};

Note that the storage type is a vector and not a map. As everything that happens to this happens via iteration, having a storage format with better iteration properties makes a difference. Even when referenced in the derived RefLibraryBookRecord, iteration dominates over searching.

The constructor looks rather like that for LibraryBookRecord, except that less is done to special fields (only the ID is extracted) and the storage type is, of course, different.

BaseLibraryBookRecord::BaseLibraryBookRecord(const std::string &inRec)

{

boost::char_separator<char> theSeparator("\t", "", boost::keep_empty_tokens);

ELibraryRecord type = ELibraryRecord::Book_Id;

boost::tokenizer<boost::char_separator<char>> tokens(inRec, theSeparator);

std::ranges::for_each(tokens, [&](const auto &inVal) {

if (!inVal.empty())

{

m_fields.push_back(std::make_pair(type, inVal));

if (type == ELibraryRecord::Book_Id)

{

m_bookId = std::stoi(inVal);

}

}

else if (type == ELibraryRecord::Primary_Author)

{

m_fields.push_back(std::make_pair(type, "Anonymous"s));

}

else if (type == ELibraryRecord::Primary_Author_Role)

{

m_fields.push_back(std::make_pair(type, "Author"s));

}

type = increment(type);

});

}
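The parsing shape can also be sketched without Boost, which may make the role of the empty tokens easier to see. Everything here is an assumption for illustration: a shortened Field enum, a next() helper standing in for increment(), and a hand-written tab split in place of boost::tokenizer. Empty tokens are kept so that a token's position still determines its field type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, shortened stand-in for ELibraryRecord.
enum class Field { Book_Id, Primary_Author, Title, End };

// Stand-in for the post's increment(): step to the next field type.
constexpr Field next(Field inVal)
{
    return (inVal == Field::End)
               ? Field::End
               : static_cast<Field>(static_cast<int>(inVal) + 1);
}

struct ParsedRecord
{
    std::vector<std::pair<Field, std::string>> fields;
    int bookId = 0;
};

// Split on tabs, keeping empty tokens so that positions still map to types.
inline ParsedRecord parseRecord(std::string_view inRec)
{
    ParsedRecord rval;
    Field type = Field::Book_Id;
    std::size_t start = 0;
    while (start <= inRec.size() && type != Field::End)
    {
        const std::size_t tab = inRec.find('\t', start);
        const std::size_t stop =
            (tab == std::string_view::npos) ? inRec.size() : tab;
        const std::string token(inRec.substr(start, stop - start));
        if (!token.empty())
        {
            rval.fields.emplace_back(type, token);
            if (type == Field::Book_Id)
                rval.bookId = std::stoi(token); // throws on a non-numeric ID
        }
        else if (type == Field::Primary_Author)
        {
            rval.fields.emplace_back(type, "Anonymous"); // default for a blank author
        }
        type = next(type);
        start = stop + 1;
    }
    return rval;
}
```

Parsing "7\t\tEmma" with this sketch yields three fields: the ID, a defaulted author, and the title.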

The two methods which tie this to the actual LibraryBookRecord types are the general process() function --

void BaseLibraryBookRecord::process(

std::function<void(const ELibraryRecord, const std::string &)> inFunc)

const

{

std::ranges::for_each(m_fields, [&inFunc](const auto& inVal) {

inFunc(inVal.first, inVal.second);

});

}

-- which is used in one constructor for LibraryBookRecord and the createRecord() function --

ILibraryBookRecord *

BaseLibraryBookRecord::createRecord(const IFieldAdderSet &inAdder,

const bool inBreakOutCollections,

const bool inBreakoutTags) const

{

auto rec = new RefLibraryBookRecord(

std::ranges::ref_view(m_fields), getId(), inAdder, inBreakOutCollections,

inBreakoutTags);

rec->setIsOnHeap(true);

return rec;

}

-- which returns a RefLibraryBookRecord allocated on the heap. (Note that when an in-memory database is used, this is the *only* implementation of ILibraryBookRecord which is used.) Note that we take a ref_view; the RefLibraryBookRecord (as we shall see) takes ownership of the view. It would be possible to optimize the construction by filtering out records never used in the configuration of the record, but, to be honest, performance is not enough of an issue to make such a filtering step particularly helpful.

RefLibraryBookRecord is a template; it can hold any collection or view whose element type has two fields, an ELibraryRecord and a std::string, and which supports std::tuple-style access (so any of std::pair, std::tuple, or a handcrafted struct would work). As written, it ends up holding a ref_view to the vector in the base record. Because it moves the argument in, it could also take an owning copy of the vector, but copying would defeat the purpose. Holding a view in the class is a cheap way to avoid any significant copy penalty.

template <typename T> class RefLibraryBookRecord : public ILibraryBookRecord

{

class IAllocationStrategy

{

public:

virtual ~IAllocationStrategy() {}

virtual RefLibraryBookRecord *

getHeapAllocatedCopy(RefLibraryBookRecord &inVal) const = 0;

};

public:

RefLibraryBookRecord(T &&inRec, const int inId,

const IFieldAdderSet &inAdder,

const bool inBreakOutCollections,

const bool inBreakoutTags)

: m_fields(std::move(inRec)),

m_secondaryData(inId, inAdder.isBibliographic()),

m_allocationStrategy(&s_StackStrategy)

{

AdderScoper scoper(inAdder, m_secondaryData);

std::ranges::for_each(m_fields, [&](auto const &inVal) {

auto [type, val] = inVal;

inAdder.processField(m_secondaryData, type, val);

m_secondaryData.expandEntry(inBreakOutCollections, inBreakoutTags, type,

val);

});

m_secondaryData.rationalizeTitleSortOffset();

}

void printField(const ELibraryRecord inRec,

IFormatterWrapper &inFormatter) const override

{

if (IsAuthorExtensionField(inRec))

return;

if (isUnderReconsideration())

inFormatter.markAsUnderReconsideration();

std::ranges::for_each(m_fields, [&](const auto &inKey) {

auto [type, val] = inKey;

if ((type == inRec) && !handleSpecialTypes(inFormatter, inRec, val))

inFormatter.format(inRec, val);

});

}

void printAll(IFormatterWrapper &inFormatter) const override

{

if (isUnderReconsideration())

inFormatter.markAsUnderReconsideration();

std::ranges::for_each(m_fields | std::views::filter([](const auto &inVal) {

return !IsAuthorExtensionField(std::get<0>(inVal));

}),

[&](const auto &inRec) {

auto [type, val] = inRec;

if (!handleSpecialTypes(inFormatter, type, val))

inFormatter.format(type, val);

});

}

void printSome(IFormatterWrapper &inFormatter,

std::span<ELibraryRecord> inIncludes) const override

{

if (isUnderReconsideration())

inFormatter.markAsUnderReconsideration();

std::ranges::for_each(

m_fields | std::views::filter([&inIncludes](const auto &inVal) {

return std::ranges::any_of(inIncludes, [&](const auto &inField) {

return inField == std::get<0>(inVal);

});

}),

[&](const auto &inRec) {

auto [type, val] = inRec;

if (!handleSpecialTypes(inFormatter, type, val))

inFormatter.format(type, val);

});

}

bool matchesOnAuthor(std::string_view inName) const override

{

if (m_secondaryData.matchesOnAuthor(inName))

return true;

return std::ranges::any_of(m_fields, [&](const auto &inKey) {

auto [type, val] = inKey;

return (type == ELibraryRecord::Secondary_Author)

&& val.contains(inName);

});

}

Author getAuthor() const override

{

return m_secondaryData.getAuthor();

}

bool matchesOnCollection(const std::string &inName) const override

{

return m_secondaryData.matchesOnCollection(inName);

}

bool matchesOnTag(const std::string &inName) const override

{

return m_secondaryData.matchesOnTag(inName);

}

bool isArchived() const override

{

return std::ranges::any_of(m_fields, [](const auto &inKey) {

auto [type, val] = inKey;

return (type == ELibraryRecord::Private_Comment) && val.contains("Box");

});

}

bool isElectronic() const override

{

return std::ranges::any_of(m_fields, [](const auto &inKey) {

auto [type, val] = inKey;

return (type == ELibraryRecord::Media) && (val == "Ebook");

});

}

bool isUnderReconsideration() const override

{

return std::ranges::any_of(m_fields, [](const auto &inKey) {

auto [type, val] = inKey;

return (type == ELibraryRecord::Collections)

&& val.contains("Reconsideration");

});

}

bool isDeaccessioned() const override

{

return m_secondaryData.isDeaccessioned();

}

bool lessThan(const ILibraryBookRecord &inRec) const override

{

return m_secondaryData.lessThan(*this, inRec);

}

RefLibraryBookRecord * clone() const override

{

RefLibraryBookRecord *rval = new RefLibraryBookRecord(std::move(*this));

rval->setIsOnHeap(true);

return rval;

}

void addToCollection(ILibraryRecordSet &inSet) override

{

inSet.addToCollection(m_allocationStrategy->getHeapAllocatedCopy(*this));

}

std::string_view getTitle() const override

{

return m_secondaryData.getTitle();

}

void setIsOnHeap(const bool inVal)

{

if (inVal)

m_allocationStrategy = &s_HeapStrategy;

else

m_allocationStrategy = &s_StackStrategy;

}

constexpr int getId() const override

{

return m_secondaryData.getId();

}

int getTitleSortOffset() const override

{

return m_secondaryData.getTitleSortOffset();

}

void setTitleSortOffset(const int inVal) override

{

m_secondaryData.setTitleSortOffset(inVal);

}

static void SetBibliographicSort()

{

LibraryBookRecordSecondaryData::SetBibliographicSort();

}

static void SetTitleSort()

{

LibraryBookRecordSecondaryData::SetTitleSort();

}

std::string_view getAuthorLastName() const override

{

return m_secondaryData.getAuthorLastName();

}

private:

mutable T m_fields;

mutable LibraryBookRecordSecondaryData m_secondaryData;

IAllocationStrategy *m_allocationStrategy;

bool handleSpecialTypes(IFormatterWrapper &inFormatter,

const ELibraryRecord inType,

const std::string &inVal) const

{

if (inType == ELibraryRecord::Title)

{

inFormatter.formatTitle(inType, inVal,

m_secondaryData.getPostTitleContent());

return true;

}

else if (inType == ELibraryRecord::Primary_Author)

{

auto l = [&](const auto &ref) {

inFormatter.formatAuthor(m_secondaryData, inVal, ref,

m_secondaryData.getSecondaryAuthors());

};

l(std::get<1>(*(std::ranges::find_if(m_fields, [](const auto &inRec) {

return (std::get<0>(inRec) == ELibraryRecord::Primary_Author_Role);

}))));

return true;

}

return false;

}

class StackAllocationStrategy : public IAllocationStrategy

{

public:

~StackAllocationStrategy() override {}

RefLibraryBookRecord * getHeapAllocatedCopy(RefLibraryBookRecord &inVal) const override

{

return inVal.clone();

}

};

class HeapAllocationStrategy : public IAllocationStrategy

{

public:

~HeapAllocationStrategy() override {}

RefLibraryBookRecord * getHeapAllocatedCopy(RefLibraryBookRecord &inVal) const override

{

return &inVal;

}

};

inline static StackAllocationStrategy s_StackStrategy;

inline static HeapAllocationStrategy s_HeapStrategy;

};
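The setIsOnHeap() call and the two strategy classes are easier to follow in miniature. The sketch below is a reduced, hypothetical Record class (not the project's code) showing the same arrangement: a record built on the stack clones itself when asked for a heap copy, while one already on the heap hands back its own address.

```cpp
#include <cassert>

class Record
{
    struct IStrategy
    {
        virtual ~IStrategy() = default;
        virtual Record *heapCopy(Record &inVal) const = 0;
    };

    // Stack-resident records must be copied to outlive their scope.
    struct StackStrategy final : IStrategy
    {
        Record *heapCopy(Record &inVal) const override { return inVal.clone(); }
    };

    // Heap-resident records can simply hand over their own address.
    struct HeapStrategy final : IStrategy
    {
        Record *heapCopy(Record &inVal) const override { return &inVal; }
    };

    inline static const StackStrategy s_stack{};
    inline static const HeapStrategy s_heap{};

public:
    explicit Record(int inId) : m_id(inId) {}

    int getId() const { return m_id; }

    Record *clone() const
    {
        auto *rval = new Record(*this);
        rval->setIsOnHeap(true);
        return rval;
    }

    void setIsOnHeap(bool inVal)
    {
        if (inVal)
            m_strategy = &s_heap;
        else
            m_strategy = &s_stack;
    }

    Record *heapCopy() { return m_strategy->heapCopy(*this); }

private:
    int m_id;
    const IStrategy *m_strategy = &s_stack;
};
```

As in the listing above, ownership of the returned pointer passes to whoever receives it; the sketch keeps raw pointers only to mirror that shape.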

Consider the use of find_if in handleSpecialTypes().

For this reference version, operating on a single field is not as obviously find-then-operate as with the map-based owning version. There are other possibilities:

1) Apply a filter view to restrict a set to one element and then operate on it. This is stylish but expensive, as it continues to visit all the other elements after doing the required operation.

2) Using find_if as though it were a filter. There's nothing preventing a find_if functor from operating on the value tested before returning true or false. You can then ignore the return result of find_if because you dealt with the element just before a copy of the iterator was returned. You don't even have to dereference an iterator, because that's done for you on passing to the find_if functor.

In other words, instead of your lambda being

[](const auto& ref) { return test(ref); }

you write

[&](const auto& ref) { bool rval = test(ref); if (rval) process(ref); return rval; }

The only example of this sort I could find online was of somebody trying to connect in sequence to a list of servers and returning true in the first successful connection.

Why do this? Well, the ranges versions of the standard library algorithms have moved most iteration from explicit external iterators to implicit internal iterators. any_of, all_of, and none_of, along with contains(), have eliminated many of the use cases in which find() was called only to compare the result against end(). The outstanding exceptions are the find() use cases where one wants to operate on the found value.

Applying find_if() in this way gets rid of another explicit iterator case. (You could, perversely, use the same trick in binary_search() by having the test process the value the first time it is seen. I do think that that would violate expectations. It's also unnecessarily expensive, as find_if() does have to check against the known value in any case but the binary_search() functor is just an ordering operator, so the additional check for identity is an extra expense.)

Note that if you need to test the return value against the container's end() iterator, most of the advantage of this goes away. You can't get away from an if/else test, and you have to treat the iterator as such for that one test.

The last remaining case is std::set::find, std::map::find and their relatives, which use a comparison function set for the entire collection in its initialization. So you still have to write

std::map<Key, Value>::const_iterator iter = theMap.find(k); if (iter != theMap.end()) ...

(well, notionally, though auto would probably be used for the iterator type).
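For completeness, here is roughly how that looks with auto. The lookup() function is a hypothetical helper for illustration; the if-with-initializer form (C++17) does not remove the iterator, but it does confine it to the one statement that needs it.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Look up a key; the iterator and the end() test stay visible.
inline std::optional<std::string>
lookup(const std::map<int, std::string> &inMap, int inKey)
{
    // The iterator's scope is limited to this if statement.
    if (auto iter = inMap.find(inKey); iter != inMap.end())
        return iter->second;
    return std::nullopt;
}
```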

Using find_if in this way would lead to:

std::ranges::find_if(m_fields, [&](const auto &inRec) {

bool rval = (std::get<0>(inRec) == ELibraryRecord::Primary_Author_Role);

if (rval)

l(std::get<1>(inRec));

return rval;

});

which reduces to

std::ranges::any_of(m_fields, [&](const auto &inRec) {

bool rval = (std::get<0>(inRec) == ELibraryRecord::Primary_Author_Role);

if (rval)

l(std::get<1>(inRec));

return rval;

});
