Syntax Description

gershnik edited this page Mar 19, 2022 · 17 revisions

General

A command line as understood by Argum can contain:

  • Any number of Options each followed by a single optional Argument
  • Any number of Positional Arguments or simply Positionals
  • An optional single Option Terminator. If an Option Terminator occurs, everything following it is considered a Positional.

Options

An Option can be one of three types: short, multi-short or long, each described below. Options are recognized only before the option terminator is encountered. After the option terminator all command line arguments, even if they look like options, are treated as positional arguments.
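The effect of the option terminator can be illustrated with a minimal sketch. Python is used only for brevity; `split_at_terminator` is a hypothetical helper, not part of Argum's C++ API, and the default terminator `--` is taken from the Common Unix configuration described later on this page.

```python
def split_at_terminator(args, terminators=("--",)):
    """Split a command line at the first option terminator.

    Returns (before, after): `before` may still contain options,
    `after` holds arguments that are always positionals.
    """
    for i, arg in enumerate(args):
        if arg in terminators:
            return list(args[:i]), list(args[i + 1:])
    return list(args), []


before, after = split_at_terminator(["-c", "--", "-a", "file"])
print(before)  # ['-c']
print(after)   # ['-a', 'file']
```

Only the first terminator counts; a second `--` after it is just another positional.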

All 3 option types allow a single optional Argument to be given.

All options are recognized as such only if they are known to (sometimes called registered with) the parser. Thus if the parser knows that -c is the only possible option then -c will be recognized as such but -a won't be. What happens when a parser encounters an unknown option is described in Unknown Options below.

Short Options

A standalone short option starts with one of the defined short option prefixes followed by a single letter or digit. By default the only short option prefix is - but this is configurable. If no short option prefixes are configured the short options aren't recognized.

Thus -a can be a short option and so can -c. However -foo cannot (it has more than one letter) and neither can --c (wrong prefix).

A short option can have an argument either:

  • as a separate command line argument directly following it: -c arg or
  • attached to it: -carg.

In both cases the Argument is arg.

Multiple consecutive short options can be concatenated under a single prefix. For example, if -c and -a are both short options they can be specified:

  • separately as -c -a or
  • merged as -ca

If short options are merged, only the last one in the merged sequence can have an argument. Both

  • -ca arg and
  • -caarg

mean option -c followed by -a with an argument arg.
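The expansion of a merged sequence can be sketched as follows. This is an illustrative Python model, not Argum's implementation: `expand_merged` and the `takes_argument` table are hypothetical, and the sketch assumes that the first option in the sequence that accepts an argument consumes the rest of the token.

```python
def expand_merged(token, takes_argument, prefix="-"):
    """Expand a merged short-option token such as '-caarg'.

    `takes_argument` maps each known option letter to True if that
    option accepts an argument. Returns a list of (option, attached)
    pairs; `attached` is None when nothing is attached to the option.
    """
    body = token[len(prefix):]
    result = []
    for i, ch in enumerate(body):
        if ch not in takes_argument:
            raise ValueError(f"unknown option {prefix}{ch}")
        if takes_argument[ch]:
            # Everything after this letter is its attached argument.
            result.append((prefix + ch, body[i + 1:] or None))
            return result
        result.append((prefix + ch, None))
    return result


takes_argument = {"c": False, "a": True}
print(expand_merged("-ca", takes_argument))     # [('-c', None), ('-a', None)]
print(expand_merged("-caarg", takes_argument))  # [('-c', None), ('-a', 'arg')]
```

In the `-ca` case the argument for `-a`, if any, would be taken from the next command line argument.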

Multi-short options

A multi-short option starts with one of the defined short option prefixes followed by more than one letter or digit. By default the only short option prefix is - but this can be changed. If no short option prefixes are configured the multi-short options aren't recognized.

Thus -foo is a multi-short option and so is -bar.

Arguments to multi-short options can either follow them in a separate argument or be connected to the option with one of the value delimiter characters. By default the only value delimiter character is = but this can be changed.

Thus to pass an argument arg to a multi-short option -foo you can use

  • -foo arg or
  • -foo=arg

By default, multi-short options can be abbreviated, so if a -foo option is known to the parser, -fo and -f will be recognized as -foo as well. Abbreviation can be disabled.

Resolving ambiguities between multi-short and short options

Due to the definitions above, various ambiguities can arise. For example, if -bar, -baz and -b are all registered then -ba is potentially ambiguous and so is -b. Moreover, if -a and -r are also registered, does -bar mean -bar or merged -b, -a and -r? Argum resolves such ambiguities in the following manner:

  • Rule 1: If an option contains only a single character then it is matched only against known short options. There is no "competition" with any multi-short options
  • Otherwise, if an option contains multiple characters, e.g. -c1c2...cn
    • If -c1 is a known short option then only the entire sequence -c1c2...cn (until value separator =, if present) is matched against known multi-short options - no abbreviations are considered regardless of parser configuration.
      • Rule 2: If there is exactly one multi-short match then this is the matching multi-short option
      • Rule 3: If there is more than one match the parsing is ambiguous and an error is reported
      • Rule 4: Otherwise, the rest of the sequence c2...cn is processed as either an argument to -c1 or as merged short options as described in Short Options above
    • Otherwise, if -c1 is not a known short option, then the sequence -c1c2...cn is matched against known multi-short options (using abbreviations, if allowed).
      • Rule 5: If there are multiple possible matches then parsing is ambiguous and an error is reported
      • Rule 6: Otherwise, this is the matching multi-short option

To use the example above, let's suppose that -bar, -baz, -b, -a and -r are registered options. Then:

  • -b matches -b. There is no ambiguity by Rule 1.
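  • -ba has no exact multi-short match (abbreviations are not considered because -b is a known short option), so by Rule 4 it is processed as -b followed by a, which is either the merged short option -a or an argument to -b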
  • -bar matches -bar by Rule 2
  • -bor matches -b with argument or by Rule 4

If only -foobarabc and -foobaz are registered options and abbreviation is allowed:

  • -f, -fo, -foo, -foob and -fooba are all ambiguous by Rule 5
  • -foobar matches -foobarabc by Rule 6
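The rule list can be modelled in a few lines. The following Python sketch is illustrative only and is not Argum's code: `resolve` and its return tuples are hypothetical, option names are given without the `-` prefix, and `=` is assumed to be the only value delimiter. Rule 1 is read here as applying when the single character is a registered short option, so that a one-letter abbreviation such as -f can still match a multi-short option; that reading is an assumption.

```python
def resolve(token, shorts, multis, allow_abbrev=True):
    """Classify a '-...' token against registered option names."""
    body = token[1:]                 # drop the '-' prefix
    name = body.split("=", 1)[0]     # part before any '=' delimiter
    if not name:
        return ("unknown", token)
    if len(body) == 1 and body in shorts:
        return ("short", body)                          # Rule 1
    if name[0] in shorts:
        exact = [m for m in multis if m == name]        # no abbreviations
        if len(exact) == 1:
            return ("multi-short", exact[0])            # Rule 2
        if len(exact) > 1:
            # Rule 3; unreachable when names are unique, kept to
            # mirror the rule list.
            return ("ambiguous", sorted(exact))
        return ("short-with-rest", body[0], body[1:])   # Rule 4
    if allow_abbrev:
        matches = [m for m in multis if m.startswith(name)]
    else:
        matches = [m for m in multis if m == name]
    if len(matches) > 1:
        return ("ambiguous", sorted(matches))           # Rule 5
    if len(matches) == 1:
        return ("multi-short", matches[0])              # Rule 6
    return ("unknown", name)


shorts, multis = {"b", "a", "r"}, {"bar", "baz"}
print(resolve("-b", shorts, multis))    # ('short', 'b')
print(resolve("-bar", shorts, multis))  # ('multi-short', 'bar')
print(resolve("-bor", shorts, multis))  # ('short-with-rest', 'b', 'or')

multis2 = {"foobarabc", "foobaz"}
print(resolve("-foob", set(), multis2))    # ('ambiguous', ['foobarabc', 'foobaz'])
print(resolve("-foobar", set(), multis2))  # ('multi-short', 'foobarabc')
```

The "short-with-rest" result leaves the remainder to be handled as an attached argument or as merged short options, as described in Short Options.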

Long Options

A long option starts with one of the defined long option prefixes followed by one or more letters or digits. By default the only long option prefix is -- but that can be changed. If no long option prefixes are configured the long options aren't recognized. Long option prefixes cannot be the same as short option prefixes.

Thus --foo is a long option and so is --bar.

Arguments to long options can either follow them in a separate argument or be connected to the option with one of the value delimiter characters. By default the only value delimiter character is = but this can be changed. Note that long and multi-short options share the same value delimiters.

Thus to pass an argument arg to a long option --foo you can use

  • --foo arg or
  • --foo=arg

By default, long options can be abbreviated, so if a --foo option is known to the parser, --fo and --f will be recognized as --foo as well. Abbreviation can be disabled. If an abbreviation can match more than one long option, the parser reports an ambiguous option error.
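Abbreviation matching amounts to a prefix search over the registered names. A minimal Python sketch, with the hypothetical helper `match_long` (not an Argum function) and the assumption that an exact match always wins over longer candidates:

```python
def match_long(name, known, allow_abbrev=True):
    """Resolve a possibly abbreviated long option name.

    Returns the registered name, or raises LookupError when the
    name is unknown or ambiguous.
    """
    if name in known:
        return name  # exact match (assumed to take precedence)
    if allow_abbrev:
        candidates = sorted(k for k in known if k.startswith(name))
        if len(candidates) == 1:
            return candidates[0]
        if len(candidates) > 1:
            raise LookupError(f"ambiguous option {name}: {candidates}")
    raise LookupError(f"unknown option {name}")


print(match_long("--fo", {"--foo", "--bar"}))  # --foo
print(match_long("--f", {"--foo", "--bar"}))   # --foo
```

With `--foo` and `--fob` both registered, `--fo` would raise the ambiguity error instead.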

As you can see, long options behave exactly like multi-short options except that they use different prefixes and so do not "compete" with short options. Thus, unlike multi-short options, they need no special ambiguity resolution rules.

Unknown Options

If the parser encounters a command line argument that looks like an option (short, multi-short or long) but no such option is registered, it raises an unknown option error unless the argument can be interpreted as a number. Specifically, a command line argument can be interpreted as a number if and only if, after stripping leading and trailing spaces, either:

  • it can be converted to a long long without overflow in the current C locale "as if" by strtoll, or
  • it can be converted to a long double without overflow in the current C locale "as if" by strtold

If an unknown option is interpreted as a number, it is used as an argument to the preceding option or as a positional argument, as appropriate.
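The number check can be approximated as below. This Python sketch is only a rough model of the strtoll/strtold behaviour: it assumes a 64-bit long long, uses Python's double-precision float in place of long double, and accepts only plain decimal notation (hexadecimal, inf and nan forms that strtold would accept are left out). `looks_like_number` is a hypothetical name.

```python
import math
import re

_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")

LLONG_MIN, LLONG_MAX = -2**63, 2**63 - 1  # assumes 64-bit long long


def looks_like_number(arg):
    """Return True if `arg` should be treated as a number, not an option."""
    text = arg.strip()
    if _INT.fullmatch(text) and LLONG_MIN <= int(text) <= LLONG_MAX:
        return True
    if _FLOAT.fullmatch(text):
        # float() yields +/-inf on overflow, which counts as a failure here.
        return math.isfinite(float(text))
    return False


for arg in ["-5", "+3.14", "-1e3", "-x", "-1e999"]:
    print(arg, looks_like_number(arg))
```

Under these assumptions `-5`, `+3.14` and `-1e3` are numbers, while `-x` (not numeric) and `-1e999` (overflow) are not.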

The reason for this special treatment is that common option prefixes like - or + are also valid starts of a number. Without this exception, passing numeric arguments on the command line would be exceedingly convoluted. It also matches the POSIX command line guidelines, except that while POSIX only requires recognizing integers, Argum extends this both in range (to long long) and in type (floating point).

Option Arguments

When an option is registered with a parser you can specify whether it has no argument, an optional argument or a required argument. You cannot specify more than one argument; see Option arguments that look like an option below for the reason why.

An option is said to have an explicit argument when

  • It is a short option and it has a string attached to it, e.g. -farg for option -f.
  • It is a multi-short or a long option and it has an argument attached with a value separator, e.g. -foo=arg or --foo=arg

If an option has an explicit argument but its definition specifies that it takes no argument, the parser raises an error.

An implicit argument is a command-line argument that follows an option, as in -f arg, -foo arg or --foo arg. It is recognized as an argument only if the option is defined with an optional or required argument. Otherwise it is treated as a positional argument.
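How explicit and implicit arguments interact with the option definition can be summarized in a short decision function. This is a hedged Python sketch with hypothetical names (`Arity`, `take_argument`); it assumes the caller passes `next_arg=None` when the next command line argument is missing or is itself an option, and that a missing required argument is an error.

```python
from enum import Enum


class Arity(Enum):
    NONE = "none"
    OPTIONAL = "optional"
    REQUIRED = "required"


def take_argument(option, arity, explicit, next_arg):
    """Decide which argument an option occurrence receives.

    Returns (argument, consumed_next): `consumed_next` tells the caller
    whether `next_arg` was used up as an implicit argument.
    """
    if explicit is not None:
        if arity is Arity.NONE:
            raise ValueError(f"{option} does not take an argument")
        return explicit, False
    if arity is Arity.NONE:
        return None, False          # next_arg stays a positional
    if next_arg is not None:
        return next_arg, True       # implicit argument
    if arity is Arity.REQUIRED:
        raise ValueError(f"{option} requires an argument")
    return None, False


print(take_argument("-f", Arity.REQUIRED, "arg", None))     # ('arg', False)
print(take_argument("--foo", Arity.OPTIONAL, None, "arg"))  # ('arg', True)
print(take_argument("--foo", Arity.NONE, None, "arg"))      # (None, False)
```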

Option arguments that look like an option

It can happen that an option argument must have a form identical to that of an option. Consider a utility that has a --filename, -filename or -f option that takes the name of a file as an argument. What if you want to operate on a file named --myfile? If you pass --myfile as a separate command line argument it will be interpreted as an unknown option and raise an error. There is an easy way to "escape" such an option argument, which is to make it explicit:

  • For short options: attach the argument to the option, e.g. -f--myfile
  • For multi-short and long options: use value separator character(s), e.g. --filename=--myfile or -filename=--myfile

Note that this works only because (unlike some other command line parsers but like POSIX) Argum allows only a single option argument.

Positional Arguments

Any command line argument that is not interpreted as an option or an option argument is a positional argument. Specifically, all arguments after the option terminator are treated as positional arguments regardless of their form.

The maximum number of positional arguments (which may be unlimited) is made known to the parser. If there are more positional arguments than the parser is configured to recognize, it raises an error. See the section on Quantifiers for details on how the parser determines the number of positional arguments and their handlers.

Summary of settings

As described above, the following settings or configuration options are available for the parser:

  • Short option prefixes - a set of non-empty sets of strings. These start short and multi-short options. Each inner set in the outer set contains equivalent prefixes (see below). If the outer set is empty no short or multi-short options are recognized.
  • Long option prefixes - a set of non-empty sets of strings. These start long options. Each inner set in the outer set contains equivalent prefixes (see below). If the outer set is empty no long options are recognized.
  • Option terminators - a set of strings. These indicate that following command line arguments should be treated as positional arguments.
  • Value delimiters - a set of characters. These characters can separate an option from its argument value in explicit arguments of long and multi-short options.
  • Allow abbreviation - a boolean flag. If true, multi-short and long options can be abbreviated.

Short or long prefixes are considered equivalent if an option can start with any of them and be considered the "same" option. For example if - and / are equivalent (as is usually desirable on Windows) then either -c or /c will be dispatched identically by the parser.

The same string cannot occur in more than one equivalence set, nor in both long and short prefixes.
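Prefix equivalence and the uniqueness constraint can be sketched together. The Python model below is illustrative and not Argum's implementation: `build_prefix_table` and `canonicalize` are hypothetical, and using the first entry of each equivalence set as its canonical form is an assumption made for the example. Option terminators are assumed to have been handled before these helpers are called.

```python
def build_prefix_table(short_prefixes, long_prefixes):
    """Map each prefix to (kind, canonical prefix of its equivalence set)."""
    table = {}
    for kind, groups in (("short", short_prefixes), ("long", long_prefixes)):
        for group in groups:
            if not group:
                raise ValueError("equivalence sets must be non-empty")
            for prefix in group:
                if prefix in table:
                    raise ValueError(f"prefix {prefix!r} is used more than once")
                table[prefix] = (kind, group[0])
    return table


def canonicalize(token, table):
    """Rewrite a token to use the canonical prefix, or return None."""
    for prefix in sorted(table, key=len, reverse=True):  # longest first
        if token.startswith(prefix) and len(token) > len(prefix):
            kind, canonical = table[prefix]
            return kind, canonical + token[len(prefix):]
    return None


table = build_prefix_table([["-", "/"]], [["--"]])
print(canonicalize("-c", table))     # ('short', '-c')
print(canonicalize("/c", table))     # ('short', '-c')
print(canonicalize("--foo", table))  # ('long', '--foo')
```

Because `-c` and `/c` map to the same canonical form, they would be dispatched identically.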

Pre-defined configurations

Argum comes with some pre-defined configurations:

  • Common Unix (this is the default)
    Short prefixes: [["-"]]
    Long prefixes: [["--"]]
    Option terminators: ["--"]
    Value delimiters: ['=']
    Allow abbreviation: true
  • "Long-only" Unix
    Short prefixes: []
    Long prefixes: [["--", "-"]]
    Option terminators: ["--"]
    Value delimiters: ['=']
    Allow abbreviation: true
  • Windows short-only (see e.g. cmd command line)
    Short prefixes: [["/", "-"]]
    Long prefixes: []
    Option terminators: ["--"]
    Value delimiters: [':']
    Allow abbreviation: true
  • Windows long (see e.g. robocopy command line)
    Short prefixes: []
    Long prefixes: [["/", "-", "--"]]
    Option terminators: ["--"]
    Value delimiters: [':']
    Allow abbreviation: true

A common Unix extension is to add + to the short prefixes list.
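For comparison, the four configurations listed above can be written down as plain data. This Python sketch only restates the values from the list; `SyntaxConfig` and the constant names are hypothetical and do not correspond to identifiers in Argum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntaxConfig:
    short_prefixes: tuple            # tuple of equivalence sets (tuples)
    long_prefixes: tuple             # tuple of equivalence sets (tuples)
    option_terminators: tuple = ("--",)
    value_delimiters: tuple = ("=",)
    allow_abbreviation: bool = True


COMMON_UNIX = SyntaxConfig(short_prefixes=(("-",),), long_prefixes=(("--",),))
UNIX_LONG_ONLY = SyntaxConfig(short_prefixes=(), long_prefixes=(("--", "-"),))
WINDOWS_SHORT = SyntaxConfig(
    short_prefixes=(("/", "-"),), long_prefixes=(), value_delimiters=(":",)
)
WINDOWS_LONG = SyntaxConfig(
    short_prefixes=(), long_prefixes=(("/", "-", "--"),), value_delimiters=(":",)
)

for cfg in (COMMON_UNIX, UNIX_LONG_ONLY, WINDOWS_SHORT, WINDOWS_LONG):
    print(cfg)
```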

Further Reading

Customizing Syntax
Defining Options
Positional Arguments