Parsing Psalm Specifications

Psalms are a little more complicated than the Collect model.

Psalms are usually optional elements -- they default to the day of the week.  In practice, only Vespers needs to have psalm specifications spelled out in the XML configuration.

These can take several forms.

First, the psalms element can actually spell out a set of psalms.  Thus, one can have:

    <psalms>
      <psalm number="1" spec="110"/>
      <psalm number="2" spec="113"/>
      <psalm number="3" spec="116:2"/>
      <psalm number="4" spec="126"/>
      <psalm number="5" spec="139"/>
    </psalms>

These cases are very rare, though, and they can be represented by a reference to the feast name, since the Vespers psalm spec has a special constructor that takes the feast name. So there are a few cases that look like:

    <psalms ref="Corpus Christi"/>

Most psalm specifications, though, are neither of these. In many cases, the psalms may be those of a specific day:

    <psalms ref="MONDAY"/>

In a few cases, they default to the day of the week but for special reasons (usually festal specs being applied to an octave) this has to be spelled out, so:

    <psalms ref="FERIA"/>

which means "use today, whatever it is".

So the Psalms tag can have one attribute, but that attribute is interpreted differently depending on its value.
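To make the three readings concrete, here is a minimal standalone sketch of how a ref value could be resolved. The Days enum and ResolveRef() are hypothetical stand-ins, not the project's own types. Unlike the real tag, this sketch does not reject unknown feast names: anything that is neither FERIA nor a weekday is treated as a named exception.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-in for the project's Days enum; the real one may differ.
enum class Days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

// A ref resolves either to the day whose psalms are used,
// or to a named exception (a feast name) handled elsewhere.
using PsalmsRef = std::variant<Days, std::string>;

PsalmsRef ResolveRef(std::string_view inRef, Days inActualDay)
{
  // "FERIA" means "use today, whatever it is".
  if (inRef == "FERIA")
    return inActualDay;

  // Names listed in the same order as the Days enumerators.
  static constexpr std::array<std::string_view, 7> names{
      "SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
      "THURSDAY", "FRIDAY", "SATURDAY"};
  for (std::size_t i = 0; i < names.size(); ++i)
    if (inRef == names[i])
      return static_cast<Days>(i);

  // Anything else (e.g. "Corpus Christi") is a named exception.
  return std::string(inRef);
}
```

Returning a std::variant encodes the "day or named feast" choice in the type itself. The PsalmsTag class instead stores both possibilities side by side and relies on the caller to check them in the right order.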

Psalm Tag

The Psalm tag, seen above, is always closed, so there does not need to be a separate element object: all the relevant information can be gleaned from the tag.

class PsalmTag : public BreviaryTag
{
public:
  PsalmTag(): BreviaryTag("psalm") {}
  ~PsalmTag() override;
  int getNumber() const { return m_number; }
  const std::string &getPsalm() const { return m_psalm; }
private:
  int allowedAttributeCount() const override { return 2; }
  std::span<std::string> getAllowedAttributes() const override;
  bool validate(std::string_view inAttribute,
                std::string_view inValue) const override;
  void setValue(std::string_view inAttribute,
                std::string_view inValue) override;
  bool checkMandatoryAttributes() const override
  {
    return hasAttribute("number") && hasAttribute("spec") && isClosed();
  }
  std::string m_psalm;
  int m_number = 0;
};

We allow number and spec as attributes, and the checkMandatoryAttributes() function requires that both be set.

std::span<std::string> PsalmTag::getAllowedAttributes() const
{
  static std::array<std::string, 2> rval{ "number"s, "spec"s };
  return rval;
}

Validation enforces the tight constraint on number (a single digit from 1 to 5) and does a general consistency check on the spec attribute:

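bool PsalmTag::validate(std::string_view inAttribute,
                        std::string_view inValue) const
{
  // number: exactly one digit in the range 1-5
  if (inAttribute == "number"sv)
    return (inValue.length() == 1) && (inValue[0] >= '1')
           && (inValue[0] <= '5');
  // spec: starts with a digit and contains only digits and colons (e.g. "116:2");
  // the empty() test guards the inValue[0] access, and the unsigned char casts
  // keep std::isdigit well-defined for any input byte
  else if (inAttribute == "spec"sv)
    {
      if (inValue.empty()
          || !std::isdigit(static_cast<unsigned char>(inValue[0])))
        return false;
      return std::ranges::all_of(inValue, [](const unsigned char val) {
        return std::isdigit(val) || (val == ':');
      });
    }
  else
    return false;
}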

Because the numbers are guaranteed to be single characters, converting to integers is cheap:

void PsalmTag::setValue(std::string_view inAttribute, std::string_view inValue)
{
  if (inAttribute == "number"sv)
    {
      m_number = static_cast<int>(inValue[0] - '0');
    }
  else
    m_psalm = inValue;
}
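The digit-subtraction shortcut is only safe because validate() has already guaranteed a single character between '1' and '5'. The sketch below contrasts it with std::from_chars (C++17, <charconv>), the general-purpose conversion. SingleDigitValue() and GeneralValue() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Cheap path: correct only because validation has already
// guaranteed exactly one character in '1'..'5'.
int SingleDigitValue(std::string_view inValue)
{
  return inValue[0] - '0';
}

// General path for multi-digit numbers; returns std::nullopt unless
// the whole string is a decimal integer.
std::optional<int> GeneralValue(std::string_view inValue)
{
  int result = 0;
  const char *first = inValue.data();
  const char *last = inValue.data() + inValue.size();
  auto [ptr, ec] = std::from_chars(first, last, result);
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return result;
}
```

If the number attribute were ever widened beyond a single digit, the general path would be the one to switch to.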

Psalms Tag

The Psalms tag makes values derived from the ref attribute available. Because m_day will always hold a valid value, the surrounding context has to check getNamedException() before relying on the day value. "Specific" means "includes a specific list", that is, the tag has no ref attribute and spells out its psalms.

class PsalmsTag : public BreviaryTag
{
public:
  explicit PsalmsTag(const Days inActualDay):
      BreviaryTag(GetTagName()), m_actualDay(inActualDay)
  { }
  ~PsalmsTag() override;
  // true when there is no ref attribute, i.e. the tag holds an explicit list
  bool isSpecific() const { return m_specific; }
  // always a valid day; check getNamedException() first
  Days getReferenceDay() const { return m_day; }
  const std::string &getNamedException() const { return m_namedException; }
  static std::string GetTagName() { return "psalms"; }
private:
  int allowedAttributeCount() const override { return 1; }
  std::span<std::string> getAllowedAttributes() const override;
  bool validate(std::string_view inAttribute,
                std::string_view inValue) const override;
  void setValue(std::string_view inAttribute,
                std::string_view inValue) override;
  bool checkMandatoryAttributes() const override;
  Days m_actualDay;
  Days m_day = Days::SUNDAY;
  bool m_specific = true;
  std::string m_namedException;
};
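Because m_day always holds some enumerator, the order in which a caller interrogates the tag matters. Here is a minimal sketch of that decision. It assumes a hypothetical PsalmsTagValues struct standing in for PsalmsTag's getters and a hypothetical PsalmSource result type.

```cpp
#include <cassert>
#include <string>

enum class Days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

// Hypothetical stand-in for the three values PsalmsTag exposes.
struct PsalmsTagValues
{
  bool specific = true;           // no ref: an explicit list follows
  Days day = Days::SUNDAY;        // always holds *some* value
  std::string namedException;     // non-empty only for feast names
};

enum class PsalmSource { NamedFeast, ExplicitList, DayOfWeek };

// The day is only meaningful once the other two cases are ruled out.
PsalmSource Classify(const PsalmsTagValues &inTag)
{
  if (!inTag.namedException.empty())
    return PsalmSource::NamedFeast;
  if (inTag.specific)
    return PsalmSource::ExplicitList;
  return PsalmSource::DayOfWeek;
}
```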

The PsalmsTag handles the differing values of ref with the assistance of a static helper function:

namespace
{
bool IsWeekdayName(std::string_view inValue)
{
  return (inValue == "SATURDAY"sv) || (inValue == "SUNDAY"sv)
         || (inValue == "MONDAY"sv) || (inValue == "TUESDAY"sv)
         || (inValue == "WEDNESDAY"sv) || (inValue == "THURSDAY"sv)
         || (inValue == "FRIDAY"sv);
}
}


bool PsalmsTag::validate(std::string_view inAttribute,
                         std::string_view inValue) const
{
  // ref is the only attribute this tag accepts
  if (inAttribute != "ref"sv)
    return false;
  return (inValue == "FERIA"sv) || IsWeekdayName(inValue)
         || (inValue == "Corpus Christi"sv) || (inValue == "Christmas"sv)
         || (inValue == "Epiphany"sv)
         || (inValue == "Apostles Second Vespers"sv);
}

void PsalmsTag::setValue(std::string_view inAttribute,
                         std::string_view inValue)
{
  if (inValue == "FERIA"sv)
    m_day = m_actualDay;
  else if (IsWeekdayName(inValue))
    m_day = StringAsDay(inValue);
  else
    m_namedException = inValue;
  m_specific = false;
}

The final check requires *either* an open tag with no ref attribute (the psalms follow as element content) or a closed tag with a ref attribute, as the two content models cannot coexist in one element.

bool PsalmsTag::checkMandatoryAttributes() const
{
  return !((hasAttribute("ref") && !isClosed())
           || (!hasAttribute("ref") && isClosed()));
}
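The double negation above reduces to a single equality: the tag is valid exactly when "has a ref attribute" and "is closed" are both true or both false. The sketch below checks the two forms against each other over all four combinations. OriginalCheck() and DirectCheck() are hypothetical names, and the two booleans stand in for hasAttribute("ref") and isClosed().

```cpp
#include <cassert>
#include <initializer_list>

// The condition as written in PsalmsTag::checkMandatoryAttributes().
constexpr bool OriginalCheck(bool hasRef, bool closed)
{
  return !((hasRef && !closed) || (!hasRef && closed));
}

// The same rule stated directly: a ref attribute if and only if the tag is closed.
constexpr bool DirectCheck(bool hasRef, bool closed)
{
  return hasRef == closed;
}

// Exhaustive comparison over the four combinations.
constexpr bool ChecksAgree()
{
  for (bool hasRef : {false, true})
    for (bool closed : {false, true})
      if (OriginalCheck(hasRef, closed) != DirectCheck(hasRef, closed))
        return false;
  return true;
}

// Verified at compile time.
static_assert(ChecksAgree());
```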

Psalms Element

Now we can see how the element pulls this together:

class PsalmsElement : public MultiElementElement
{
public:
  PsalmsElement(std::string_view inText, const Days inActualDay, const bool inRomanUse);
  ~PsalmsElement() override;
  const std::vector<PsalmSpec>& getPsalms() const { return m_psalms; }
  const std::string& getNamedException() const { return m_tag.getNamedException(); }
private:
  const std::string &getStartTagName() const override { return m_tag.getName(); }
  std::size_t processOnePsalm(std::string_view inRest, const int inNumber);
  PsalmsTag m_tag;
  std::vector<PsalmSpec> m_psalms;
};

Named exceptions are just propagated to the next level up, as their values are sufficient to construct a VespersPsalmSpec.

As always, all the logic is in the constructor:

PsalmsElement::PsalmsElement(std::string_view inText, const Days inActualDay,
                             const bool inRomanUse):
    MultiElementElement(inText, PsalmsTag::GetTagName()),
    m_tag(inActualDay)
{
  auto val = m_tag.set(inText.substr(0, getLength()));
  if (!val.has_value())
    {
      throw OfficeParseException(val.error(), inText.substr(0, getLength()));
    }
  if (m_tag.isSpecific())
    {
      // explicit list: consume consecutive <psalm .../> tags, numbered from 1
      std::string_view rest(inText.substr(getLength()));
      rest.remove_prefix(compareActualWithExpectedIndex(
          rest, getNextTagIndex(rest), incrementOverWhitespace(rest)));
      int number = 1;
      while (rest.starts_with("<psalm "))
        {
          auto index = processOnePsalm(rest, number++);
          rest.remove_prefix(index);
          incrementLength(index);
          rest.remove_prefix(compareActualWithExpectedIndex(
              rest, getNextTagIndex(rest), incrementOverWhitespace(rest)));
        }
      if (!rest.starts_with(getEndTag()))
        throw OfficeParseException(
            getStartTagName()
                + " with unexpected element before closing tag: ",
            rest);
      incrementLength(getEndTag().length());
    }
  else if (m_tag.getNamedException().empty())
    {
      // day reference (explicit weekday or FERIA): expand it to that day's psalms
      VespersPsalmSpec vss(m_tag.getReferenceDay(), inRomanUse);
      std::ranges::copy(vss.getDetails(), std::back_inserter(m_psalms));
    }
  // a named exception leaves m_psalms empty; the caller uses getNamedException()
}

The constructor also takes care of converting a day reference into an actual list of psalms. As is typical, processOnePsalm() spends more code on error checking than on the simple data handling:

std::size_t PsalmsElement::processOnePsalm(std::string_view inRest,
                                           const int inNumber)
{
  auto index = inRest.find('>');
  if (index == std::string_view::npos)
    throw OfficeParseException("Unclosed psalm element", inRest);
  ++index;
  PsalmTag tag;
  auto val = tag.set(inRest.substr(0, index));
  if (!val.has_value())
    {
      throw OfficeParseException(val.error(), inRest.substr(0, index));
    }
  if (tag.getNumber() != inNumber)
    throw OfficeParseException("Unexpected gap in psalm spec numbering",
                               inRest);
  m_psalms.emplace_back(tag.getPsalm());
  return index;
}

It may be worthwhile thinking about how a token-driven lexer/parser would handle the above.

All the extraction and validation logic would be moved around, but the results of parsing would be returned to be managed on a stack by a more general context. Every tag (including closed ones, which transition back to the state they started in) would trigger a state transition, with the tag as the primary dispatch. Traditionally, this means a switch statement covering every possible state the parser can be in between tokens consumed from the lexer (e.g. inside-psalms-inside-second-vespers-inside-complete-day).
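Here is a minimal sketch of the traditional switch-per-state approach, reduced to three states and a handful of token kinds. State, Token and Transition() are illustrative names, not part of the code above.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily reduced state set: inside-complete-day,
// inside-second-vespers, inside-psalms.
enum class State { InDay, InVespers, InPsalms };

// Hypothetical token kinds a lexer might hand to the parser.
enum class Token { VespersOpen, VespersClose, PsalmsOpen, PsalmsRefClosed,
                   PsalmsClose, PsalmClosed };

// One case per state, each listing the tokens legal in that state.
// std::nullopt signals a parse error.
std::optional<State> Transition(State inState, Token inToken)
{
  switch (inState)
    {
    case State::InDay:
      if (inToken == Token::VespersOpen)
        return State::InVespers;
      break;
    case State::InVespers:
      if (inToken == Token::PsalmsOpen)
        return State::InPsalms;
      // a closed <psalms ref="..."/> returns to the state it started in
      if (inToken == Token::PsalmsRefClosed)
        return State::InVespers;
      if (inToken == Token::VespersClose)
        return State::InDay;
      break;
    case State::InPsalms:
      // likewise a closed <psalm .../> tag
      if (inToken == Token::PsalmClosed)
        return State::InPsalms;
      if (inToken == Token::PsalmsClose)
        return State::InVespers;
      break;
    }
  return std::nullopt;
}
```

Even at this size, every new state adds a case, and every new tag touches several cases. That is where the difficulty of getting the transitions right comes from.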

One can make that more object-oriented with a command pattern. The commands would be attached to specific states and would do the validation and convert the parsed data into the form the parser needs when assembling higher-level entities. In fact, they would look rather like the bodies of the elements, except that they would not store the results of their parsing; they would be stateless. The same logic would need to take the parsed data and either return it as part of a tuple or push the items onto a general stack through a mechanism supplied by dependency injection. At any transition out of a state, the outer level would have to look at the stack of parsed values in light of the state transition and decide whether to reduce the values to some simpler model or to assemble everything into a final value and conclude the parse.
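Here is a minimal sketch of the command-pattern variant. It assumes a hypothetical ParseStack (a std::vector<std::string>) that is injected into each call, and commands keyed by (state, tag name). Each command is a stateless lambda: it validates, converts, pushes, and keeps nothing between calls.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse stack shared by all commands; injected, not owned.
using ParseStack = std::vector<std::string>;

// A stateless command: validate the raw tag data, then push the converted
// value onto the injected stack.
using Command = std::function<void(const std::string &inTagData, ParseStack &ioStack)>;

enum class State { InPsalms };

// All commands are created once, before parsing starts.
std::map<std::pair<State, std::string>, Command> MakeCommands()
{
  std::map<std::pair<State, std::string>, Command> commands;
  const std::pair<State, std::string> psalmKey{State::InPsalms, "psalm"};
  commands.emplace(psalmKey, [](const std::string &inSpec, ParseStack &ioStack) {
    if (inSpec.empty())
      throw std::invalid_argument("empty psalm spec");
    ioStack.push_back(inSpec);
  });
  return commands;
}
```

Because the stack is a parameter rather than a member, the same command objects can be reused across parses, and a unit test can hand them a fresh stack.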

Thus in the state inside-psalms-inside-second-vespers-inside-complete-day you would have up to five string values on the top of the stack, each representing a parsed psalm spec. On encountering the "</psalms>" closing tag, these could be reduced to a single VectorPsalmSpec, reducing the depth of the stack. That value would stay on the stack until the "</vespers>" closing tag was encountered, at which point a fairly extended stack would be reduced to a Vespers element.
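The reduction step can be sketched as follows. The sketch assumes a hypothetical stack whose entries are either a single psalm spec (std::string) or a reduced list (std::vector<std::string>, standing in for VectorPsalmSpec). It also assumes that every string on top of the stack belongs to the psalms element being closed.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stack entry: one parsed psalm spec, or an already-reduced list.
using PsalmList = std::vector<std::string>;
using StackEntry = std::variant<std::string, PsalmList>;

// On </psalms>: pop the consecutive psalm-spec strings from the top of the
// stack and replace them with a single list entry in document order.
void ReduceOnPsalmsClose(std::vector<StackEntry> &ioStack)
{
  PsalmList psalms;
  while (!ioStack.empty() && std::holds_alternative<std::string>(ioStack.back()))
    {
      psalms.push_back(std::get<std::string>(std::move(ioStack.back())));
      ioStack.pop_back();
    }
  if (psalms.empty() || psalms.size() > 5)
    throw std::runtime_error("psalms element must contain one to five psalms");
  // Entries were popped last-first, so restore the original order.
  std::reverse(psalms.begin(), psalms.end());
  ioStack.emplace_back(std::move(psalms));
}
```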

This is generally more efficient than the element-driven parsing we are doing above. Fewer objects are created during the parse (all the stateless parsers are created at the start), and the (program) stack is shallower. A tag with no attributes would just generate a change of state (i.e. assign an integer), which is much cheaper than building an element to extract the required information. Why not do it? Because it's somewhat harder to get right. (I almost wrote "harder to write", but writing a simple state machine is fairly trivial. Getting the transitions correct once you get beyond a few states is a different matter.) Testing suddenly has to be integration testing, whereas splitting the parser into classes makes errors easier to catch and fix with unit tests. There is always a tradeoff, and in a case like this, where performance is acceptable, moving to the approach that is more work and less testable makes little sense.

