Command Line Arguments

Command-line options are the flip side of configuration files, though even more fundamental: configuration file locations tend to come from the command line, but not vice versa.

The same arguments apply to command-line parsing as apply to config files, in spades. Not only is command-line processing supported by general configuration packages, but for simple contexts there is always getopt(), in unistd.h. And in many production contexts I have seen, the developers should have stuck with getopt(). So what is the reason for doing anything else?

In my view, getopt() and getopt_long() are absolutely wonderful under C. They're OK in C++ too, given one drawback (short and long options for the same value, like -f and --filename, have to be tied together by hand in the option table or checked independently), one style condition (you're happy parsing in a while loop), and, most importantly, one assumption: that you want the option parsing to take place entirely at the application level. This is because getopt() has to know what you expect before you start parsing. (I admit that you can pass an option string or, worse, an option string plus option* array into a library function if you want to expose your dependencies.)
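For comparison, here is a minimal sketch of the getopt_long() style on a POSIX system. The names SimpleOptions and parseWithGetopt are invented for illustration and are not part of the breviary code. The point is that both the option string and the option table must be complete before the loop starts. In this sketch each long option's val field is set to its short option's character, which is one way of funnelling both spellings into a single case label.

```cpp
#include <getopt.h>

#include <string>

// Hypothetical result type for this sketch; not part of the post's code.
struct SimpleOptions
{
    bool verbose = false;
    std::string filename;
    bool ok = true;
};

// The option string and the option table both have to exist before parsing
// starts, so the caller must already know every option it will accept.
SimpleOptions parseWithGetopt(int argc, char **argv)
{
    static const option longOptions[] = {
        { "filename", required_argument, nullptr, 'f' },
        { "verbose", no_argument, nullptr, 'v' },
        { nullptr, 0, nullptr, 0 } // terminator entry
    };

    SimpleOptions result;
    optind = 1; // start scanning at the first real argument
    opterr = 0; // report problems through result.ok, not on stderr

    int c = 0;
    while ((c = getopt_long(argc, argv, "f:v", longOptions, nullptr)) != -1)
    {
        switch (c)
        {
        case 'f':
            result.filename = optarg;
            break;
        case 'v':
            result.verbose = true;
            break;
        default: // '?' is returned for anything unrecognised
            result.ok = false;
            break;
        }
    }
    return result;
}
```

Anything that wants to add an option of its own has to get it into both "f:v" and longOptions, which is the coupling the rest of this post tries to avoid.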

I personally prefer to have more of the mechanics taking place at a generic level.

Third-party libraries, especially those derived from frameworks or other large scale models (hello X11) may also want access to the command line. Sometimes you may have to scrub it (you pass in an option with one meaning for the application and another for X11; remove it before you initialize X11.) But in most cases it's useful to have a more robust, searchable, and usable copy of the command-line arguments at the application level.

And, as I said about config files, this is an exercise in coding.

The config code we looked at allows a command line to be entirely subsumed by a config file, but usually that's not the way to go, either; it makes good sense to distinguish between per-session types of options and settings which are likely to be persistent.

We want to support:

(1) Simple access to command line values without the need for raw C parsing of argv. This includes querying and general iteration.

(2) Some form of merging, so that if we have long and short options with the same meaning we can support them as a single key.

(3) Support for the following formats:

-o outputFile

-ooutputFile

--output_file outputFile

--output_file=outputFile

(4) Positional parameters (e.g. "the first argument is the configuration directory location").

Finally, we care more about clarity and ease of use than about cost of initialization. Optimization is *not* a particular concern. This is a one-time step and it's going to form no significant part of the runtime of any application much more elaborate than "hello, world".

Here's an interface that meets the requirements for (1):

class ICommandLineOptions

{

public:

virtual ~ICommandLineOptions();

virtual void

process(std::function<void(const std::string &, const std::string &)> inFunc)

const = 0;

virtual bool isSingleArg(const std::string &inVal) const = 0;

virtual const std::string &getDoubleArg(const std::string &inVal) const = 0;

virtual const std::string &getPositionalArg(const int inVal) const = 0;

virtual const std::string& getAppName() const = 0;

};

A higher level, which knows what options to expect, can determine what it needs using the functions above.

The implementation is somewhat longer.

First, we have a class to take care of the argument pairs as they are parsed and before they are stored:

class ArgPair

{

class IReductionPolicy

{

public:

virtual ~IReductionPolicy();

virtual std::string reduce(std::string_view inView) const = 0;

};

class SimpleReductionPolicy : public IReductionPolicy

{

public:

~SimpleReductionPolicy() override;

std::string reduce(std::string_view inView) const override

{

return std::string(inView);

}

};

class ActiveReductionPolicy : public IReductionPolicy

{

public:

ActiveReductionPolicy(const std::map<char, std::string> &inMap):

m_argMap{ inMap }

{ }

~ActiveReductionPolicy() override;

std::string reduce(std::string_view inView) const override;

private:

std::map<char, std::string> m_argMap;

};

public:

ArgPair(): m_reductionPolicy(std::make_unique<SimpleReductionPolicy>()) {}

ArgPair(const std::map<char, std::string> &inMap):

m_reductionPolicy(std::make_unique<ActiveReductionPolicy>(inMap))

{

}

void setFirst(std::string_view inView)

{

m_arg1 = m_reductionPolicy->reduce(inView);

}

void setSecond(std::string_view inView) { m_arg2 = std::string(inView); }

std::pair<std::string, std::string> toPair() const

{

return std::make_pair(m_arg1, m_arg2);

}

bool hasArg() const { return !m_arg1.empty(); }

void clear()

{

m_arg1.clear();

m_arg2.clear();

}

private:

std::string m_arg1;

std::string m_arg2;

std::map<char, std::string> m_argMap;

std::unique_ptr<IReductionPolicy> m_reductionPolicy;

};

Note that if an application does not use short and long options with the same meanings, it can choose not to have the extra overhead of converting the two to one (long) form. This class also takes care of the conversion between string_views, handy for cheap parsing of temporaries, and the strings that are wanted for permanent storage. (You can actually rely on the memory in the argv list being in place throughout the whole of the application's life. That doesn't necessarily mean that we want our long-lived options to depend on null-terminated C strings, for example.)
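The body of ActiveReductionPolicy::reduce() is not shown above. As a sketch of what such a reduction might look like, here is a hypothetical free function, reduceOption, which takes the map explicitly. It assumes that only one-character names are looked up, and that anything else (an existing long name, or a short name with no mapping) passes through unchanged.

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical free-function equivalent of ActiveReductionPolicy::reduce().
// Assumption: only one-character names are candidates for conversion; long
// names and unmapped short names are returned as they came in.
std::string reduceOption(const std::map<char, std::string> &inMap,
                         std::string_view inView)
{
    if (inView.length() == 1)
    {
        if (auto iter = inMap.find(inView[0]); iter != inMap.end())
            return iter->second; // short name replaced by its long form
    }
    return std::string(inView);
}
```

With a map of { 'o', "output_file" }, both -o and --output_file would then be stored under the single key "output_file".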

Here's the class that uses it:

class CommandLineOptions: public ICommandLineOptions

{

//... Where we define the local class, above

public:

CommandLineOptions(const int argc, char **argv): m_appName(argv[0])

{

init(argc, argv);

}

CommandLineOptions(const int argc, char **argv,

const std::map<char, std::string> &inMap):

m_appName(argv[0]),

m_tempArg(inMap)

{

init(argc, argv);

}

~CommandLineOptions() override;

void process(std::function<void(const std::string &, const std::string &)>

inFunc) const override;

bool isSingleArg(const std::string &inVal) const override;

const std::string &getDoubleArg(const std::string &inVal) const override;

const std::string &getPositionalArg(const int inVal) const override;

const std::string& getAppName() const override { return m_appName; }

private:

std::string m_appName;

ArgPair m_tempArg;

std::map<std::string, std::string> m_args;

std::map<int, std::string> m_positionalArgs;

void init(const int argc, char **argv);

void addDoubleArg(std::string_view inArg)

{

m_tempArg.setSecond(inArg);

m_args.insert(m_tempArg.toPair());

m_tempArg.clear();

}

void addSingleArg()

{

if (m_tempArg.hasArg())

{

m_args.insert(m_tempArg.toPair());

m_tempArg.clear();

}

}

};

The init() call, shared between constructors, does most of the work:

void CommandLineOptions::init(const int argc, char **argv)

{

std::ranges::for_each(

std::ranges::iota_view{ 1, argc }, [&](const int inVal) {

if (std::string_view arg{ argv[inVal] }; arg.starts_with('-')) // safe for an empty argument

{

switch (arg.length())

{

case 1:

{

std::cerr << "Unexpected command line argument '-'"

<< std::endl;

}

break;

case 2: // standard short option

if (arg[1] != '-') // ignore --

{

addSingleArg();

m_tempArg.setFirst(arg.substr(1));

}

break;

default:

addSingleArg();

if (arg[1] == '-') // long option

{

arg.remove_prefix(2);

if (auto index = arg.find("=");

index != std::string_view::npos) // --foo=bar

{

if (index == 0)

std::cerr << "Unexpected command line argument beginning with --=:" << arg

<< std::endl;

else

{

m_tempArg.setFirst(arg.substr(0, index));

addDoubleArg(arg.substr(index + 1));

}

}

else

m_tempArg.setFirst(arg);

}

else // short option with argument connected

{

m_tempArg.setFirst(arg.substr(1, 1));

addDoubleArg(arg.substr(2));

}

break;

}

}

else if (m_tempArg.hasArg())

{

addDoubleArg(arg);

}

else

{

m_positionalArgs.insert(

std::make_pair(inVal, std::string(arg)));

}

});

addSingleArg(); // flush a trailing option that takes no value

}

Note that we are ignoring the -- argument. By rights it ought to force every subsequent argument to be treated as a positional value, but in my own use that is not a situation I would create, since I put positional arguments first. In an institutional context it would be given its normal meaning.
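If the conventional meaning of -- were wanted, one way to get it would be to find the marker first and treat everything after it as positional. The sketch below is not part of the class above; findEndOfOptions and trailingPositionals are hypothetical helper names.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first "--" argument, or argc
// if there is none.
int findEndOfOptions(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i)
    {
        if (std::strcmp(argv[i], "--") == 0)
            return i;
    }
    return argc;
}

// Everything after "--" is collected verbatim, even if it starts with '-'.
std::vector<std::string> trailingPositionals(int argc, char **argv)
{
    std::vector<std::string> rval;
    for (int i = findEndOfOptions(argc, argv) + 1; i < argc; ++i)
        rval.emplace_back(argv[i]);
    return rval;
}
```

An init() built this way would run its normal loop only up to the index returned by findEndOfOptions() and then store the remaining arguments directly as positional values.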

Also, as this occurs right at the beginning of an application, it's fair to assume that potential error-handling has not been set up yet, so we just print error messages to stderr and continue. If there are substantive errors cascading from these items they will probably show up sooner rather than later.

The access functions are straightforward:

void CommandLineOptions::process(

std::function<void(const std::string &, const std::string &)> inFunc) const

{

std::ranges::for_each(

m_args, [&](const std::pair<std::string, std::string> &inPair) {

inFunc(inPair.first, inPair.second);

});

}

One effect of this model is that options will be encountered in alphabetical order (standard ASCII order, case matters). This may provide a small benefit to applications as they will know what order options will be encountered in on an iteration over the options.
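A small illustration of that ordering, using a hypothetical helper (keysInOrder) which is not part of the class: std::map compares its std::string keys character by character, so upper-case names sort ahead of lower-case ones.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collects the keys of the option map in the order in
// which an iteration (such as process()) would visit them.
std::vector<std::string>
keysInOrder(const std::map<std::string, std::string> &inArgs)
{
    std::vector<std::string> rval;
    for (const auto &entry : inArgs)
        rval.push_back(entry.first);
    return rval;
}
```

For options stored under the keys output_file, P, D and config, the visiting order is D, P, config, output_file.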

bool CommandLineOptions::isSingleArg(const std::string &inVal) const

{

if (auto iter = m_args.find(inVal); iter != m_args.end())

return iter->second.empty();

return false;

}

const std::string &

CommandLineOptions::getDoubleArg(const std::string &inVal) const

{

if (auto iter = m_args.find(inVal); iter != m_args.end())

return iter->second;

else

{

const static std::string rval;

return rval;

}

}

const std::string &CommandLineOptions::getPositionalArg(const int inVal) const

{

if (auto iter = m_positionalArgs.find(inVal); iter != m_positionalArgs.end())

return iter->second;

else

{

const static std::string rval;

return rval;

}

}

Given the length of the likely set of options there is likely to be at best a tiny improvement in using an unordered_map for the lookups.

Application configuration

For the breviary library, where a lot of required configuration can be determined at the library level, much of the interpretation of the command-line information is actually in the top-level factories (which is the subject of the next post). But there are still some values which need to be fielded at the very top level. So here is a sample top-level configuration class for an application using the utility.

class CommandLineArgs

{

public:

CommandLineArgs(const JSBUtil::ICommandLineOptions &inOptions);

unsigned int getDay() const { return m_day; }

unsigned int getMonth() const { return m_month; }

bool isPriest() const { return m_priest; }

const std::string &getConfig() const { return m_config; }

private:

bool m_priest = false;

unsigned int m_day = 0;

unsigned int m_month = 0;

std::string m_config;

};

CommandLineArgs::CommandLineArgs(

const JSBUtil::ICommandLineOptions &inOptions):

m_priest{inOptions.isSingleArg("P")},

m_config{inOptions.getPositionalArg(1)}

{

std::string date = inOptions.getDoubleArg("D");

if (!date.empty())

{

if ((date.length() != 5) || date[2] != '-')

throw std::runtime_error("Invalid value for date parameter");

std::string s1 = date.substr(0, 2);

std::string s2 = date.substr(3);

m_month = std::stoi(s1);

m_day = std::stoi(s2);

}

}

This allows a user to specify a date at the command line; in addition an optional first parameter can specify a location for the base configuration file which is not the default, and the choice of whether the office should reflect a priest as officiant rather than a layman is bumped all the way up to the command line.

This is essentially a wrapper which makes the meaning of the options clearer in use. It's also not used outside of main().
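The date handling in the constructor leans on std::stoi, which stops at the first non-digit and accepts leading whitespace or a sign, so a value such as "1x-2y" would be read as month 1, day 2. If stricter checking were wanted, something along the following lines could be used. This is only a sketch: parseTwoDigits and parseMonthDay are hypothetical names, and the range check is deliberately coarse.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical helper: accepts exactly two decimal digits, nothing else.
std::optional<unsigned int> parseTwoDigits(std::string_view inView)
{
    unsigned int value = 0;
    const char *first = inView.data();
    const char *last = first + inView.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if ((inView.size() != 2) || (ec != std::errc{}) || (ptr != last))
        return std::nullopt;
    return value;
}

// Parses "MM-DD" into (month, day); returns nullopt for anything else.
// Assumption: a coarse range check (1-12, 1-31) is enough at this level.
std::optional<std::pair<unsigned int, unsigned int>>
parseMonthDay(std::string_view inDate)
{
    if ((inDate.length() != 5) || (inDate[2] != '-'))
        return std::nullopt;
    const auto month = parseTwoDigits(inDate.substr(0, 2));
    const auto day = parseTwoDigits(inDate.substr(3));
    if (!month || !day)
        return std::nullopt;
    if ((*month < 1) || (*month > 12) || (*day < 1) || (*day > 31))
        return std::nullopt;
    return std::make_pair(*month, *day);
}
```

The constructor could then throw the same std::runtime_error whenever parseMonthDay() returns an empty optional.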

Options for the Base LT App

Using the CommandLineOptions class in the simple LT application resulted in a small addition. That application, as originally designed, takes its parameters as positional arguments with options following. The options class allowed access to positional arguments at specific offsets, which this application does not care about. It could just iterate through indexes (1, 2...), but that's rather clunky, and it wouldn't work for applications with a pattern like

app arg1 --option1=foo arg3 arg4

as a possible invocation, because the arguments would be at offsets 1, 3, and 4.

So we add

virtual void getPositionalArgs(std::map<int, std::string>& outVals) const = 0;

to the options interface, with a simple implementation in the concrete class, and then:

std::map<int, std::string> args;

options.getPositionalArgs(args);

std::ranges::for_each(args | std::views::values, [&](const auto &inVal) {

std::size_t off = inVal.find(':');

std::size_t off2 = inVal.find('?');

if ((off == std::string_view::npos) && (off2 == std::string_view::npos))

author = inVal;

else if (off == 1)

{

if (inVal[0] == 'c')

{

collection = inVal.substr(2);

}

else if (inVal[0] == 't')

{

tag = inVal.substr(2);

if (tagCriteria.empty())

tagCriteria

= LtLibrary::TagCriteriaParam(tag, false);

else

tagCriteria.addTag(tag);

}

}

else if ((off2 == 1) && (inVal[0] == 't'))

{

tagCriteria = LtLibrary::TagCriteriaParam(inVal.substr(2), true);

}

});

We use the values view because we don't care about the exact positions of the arguments.

A Note on Options and the GTK

The GTK is a C wrapper around X11, and the X11 aspects tend to peek through at the seams. (I am here dealing with the GTK proper, not the available C++ wrappers for it.)

One of the places this is evident is in its handling of options. It's a framework driven by one initial function, and everything else cascades from there: you can call the GTK from C++ code easily, but anything it runs should be an extern "C" function.

It all starts with one function:

int main (int argc, char *argv[])

{

return g_application_run(G_APPLICATION (ltdisp_app_new()), argc, argv);

}

Ignoring ltdisp_app_new() for now, as we are focussing only on the options, note that this takes argc and argv as parameters.

The user code can get the option values from a GTK callback. That looks like this:

void ltdisp_app_init(LtDispApp *app)

{

GOptionEntry entries[2];

entries[0].long_name = "mode";

entries[0].short_name = 'm';

entries[0].flags = G_OPTION_FLAG_NONE;

entries[0].arg = G_OPTION_ARG_CALLBACK;

entries[0].arg_data = reinterpret_cast<void *>(lt_mode_set);

entries[0].description = "Mode: driver | standard";

entries[0].arg_description

= "Sets mode to driver for command-line utility or in-memory dump to standard output";

entries[1].long_name = nullptr;

entries[1].short_name = '\0';

entries[1].flags = G_OPTION_FLAG_NONE;

entries[1].arg = G_OPTION_ARG_NONE;

entries[1].arg_data = nullptr;

entries[1].description = nullptr;

entries[1].arg_description = nullptr;

g_application_add_main_option_entries(G_APPLICATION(app), entries);

}

This extracts one option with both a short and long name and passes it to the function lt_mode_set; the second entry is a sentinel marking the end of user options.

int lt_mode_set(const char *optionName, const char *value, void *data,

GError **err)

{

if ((std::strcmp(optionName, "-m") == 0)

|| (std::strcmp(optionName, "--mode") == 0))

{

if (std::strcmp(value, "driver") == 0)

{

theMode = AppMode::CommandLine;

}

else if (std::strcmp(value, "standard") == 0)

{

theMode = AppMode::InMemory;

}

else if (std::strcmp(value, "windowed") == 0)

{

theMode = AppMode::Windowed;

}

else

std::cerr << "Bad mode option: " << value << std::endl;

}

return TRUE;

}

(There is a default mode, so we don't signal a failure on a bad option via return value.)

It's generally best to use this mechanism, as it also handles the possibility of using short option values which would otherwise be meaningful to GTK.
