PredicateRfc
This document aims to specify the structure and interpretation of semantic predicates in the DELPH-IN ecosystem.
Predicates (often abbreviated as preds) are symbols representing semantic entities or constructions. Examples of predicates in the English Resource Grammar are:
"_dog_n_1_rel" : a nominal (n) predicate for a dog or dogs in general
_a_q_rel : a quantifier (q) predicate for the "a" as in "a dog"
"_eat_v_1_rel" : a verbal (v) predicate for an eating event
parg_d_rel : an abstract predicate for passive constructions
These predicates are used in predications (a predicate with its semantic arguments and other constraints), as in the following MRS for The cake was eaten by a dog (see the MRS specification for more explanation of MRS semantics):
[ LTOP: h0
INDEX: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ]
RELS: < [ _the_q_rel<0:3> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
[ "_cake_n_1_rel"<4:8> LBL: h7 ARG0: x3 ]
[ "_eat_v_1_rel"<13:18> LBL: h1 ARG0: e2 ARG1: x8 [ x PERS: 3 NUM: sg IND: + ] ARG2: x3 ]
[ parg_d_rel<13:18> LBL: h1 ARG0: e9 [ e SF: prop ] ARG1: e2 ARG2: x3 ]
[ _a_q_rel<22:23> LBL: h10 ARG0: x8 RSTR: h11 BODY: h12 ]
[ "_dog_n_1_rel"<24:28> LBL: h13 ARG0: x8 ] >
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
Surface predicates represent overt words in a sentence. Note that not all words have a predicate, as in the semantically-empty but syntactically-required "to" in Kim gave a book to Sandy (compare: Kim gave Sandy a book, which has the same semantics). Abstract predicates are used for all other non-lexical situations, such as implicit quantifiers (e.g., the quantifier for "dog" in Dogs bark) or semantic constructions (such as parg_d_rel for the passive construction above). Abstract and surface predicates are further discussed below.
Surface predicates (also called "real" or "object-level") and abstract predicates (also called "grammar" or "meta-level" predicates) can be identified by their form, where the presence of a leading underscore indicates a surface predicate, and conversely the lack of a leading underscore indicates an abstract predicate (compare _the_q_rel and udef_q_rel). All predicates end with _rel; this convention creates a namespace for predicates so they don't collide with non-predicate types in a grammar. Surface predicates have an internal structure that combines a lemma, a pos (part of speech), and a sense, all separated by underscores. Thus we have two basic forms for predicates:
_(lemma)_(pos)_(sense)_rel : surface predicate form
(name)_rel : abstract predicate form
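As a minimal sketch of the distinction above, the following Python function classifies a predicate purely by its form. The function name `predicate_kind` is hypothetical and not part of any DELPH-IN tool; the sketch only checks the leading underscore and the _rel suffix.

```python
def predicate_kind(pred):
    """Classify a predicate symbol as 'surface' or 'abstract' by its form."""
    # Surrounding double quotes are not part of the predicate symbol itself.
    name = pred.strip('"')
    if not name.lower().endswith("_rel"):
        raise ValueError(f"not a predicate (no _rel suffix): {pred!r}")
    # A leading underscore marks a surface predicate; anything else is abstract.
    return "surface" if name.startswith("_") else "abstract"


print(predicate_kind("_the_q_rel"))  # surface
print(predicate_kind("udef_q_rel"))  # abstract
```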
Spaces may not occur in a predicate. (They may be accepted in string predicates, described below, or when escaped, but these usages are discouraged.) When a predicate represents a lexical unit that contains spaces, the + may be used (e.g., all+too_rel, _a+bit_a_1_rel, "_act_v_seem+to_rel"). The - character is used where it appears in the original word, or to show alternates in the sense (e.g., "_tri-state_a_1_rel", "_ally_v_to-with_rel", _from_p_place-in_rel). Other non-word characters sometimes appear in the lemma (e.g., / in "_24/7_a_1_rel"). Unicode characters are not a problem (e.g., "_魚つり_n_1_g_rel").
Both surface and abstract predicates can be specified as a grammar type or as a quoted string. Grammar-type predicates are defined somewhere in the grammar, perhaps in a type hierarchy. A predicate type-hierarchy means that predicates used in an MRS may unify with other predicates (e.g., via underspecification or a common subtype). Preds specified as a string are atomic types that do not exist in a hierarchy, and are not required to be specified in a grammar except by their lexical entries or lexical types. Quoted string preds may only use surrounding double quotes (e.g., "_quote_n_1_rel"). An open-single-quoted variant (e.g., 'null_coord_rel) used to be available, but it has been deprecated and modern tools will warn the user upon encountering one.
Note for Developers
A tool that compares *MRS representations without loading a grammar (such as pyDelphin) may fail to recognize semantic representations that are not exactly equivalent but do unify (i.e., one subsumes the other). Proposed changes to the Sem-I could make such recognition possible just by reading a .smi file instead of the whole grammar.
Surface predicates always have three fields (lemma, pos, and sense), although the sense field is occasionally unspecified (e.g., "_and_c_rel").
The lemma field of a surface pred may be just about anything that does not contain underscores or spaces.
The POS field must be a single character, and specifically one of n, v, a, j, r, s, c, p, q, x, u, or d (see RmrsPos, but note that it does not mention the d POS used, for instance, in parg_d_rel).
The sense field is specified like a lemma, although it should not be a single letter (so as to distinguish it from the POS field). Often the sense field is just a number (e.g., "_angstrom_n_1_rel"), but it may be more descriptive (e.g., "_argue_v_about_rel" vs "_argue_v_for_rel", etc.).
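The field constraints above can be approximated with a regular expression. The following is a sketch only: `SURFACE_RE` and `split_surface` are hypothetical names, and the pattern does not enforce the recommendation that the sense not be a single letter.

```python
import re

# _(lemma)_(pos)[_(sense)]_rel, where the lemma and sense contain no
# underscores or whitespace and the POS is one of the listed letters.
SURFACE_RE = re.compile(
    r"_(?P<lemma>[^_\s]+)"
    r"_(?P<pos>[nvajrscpqxud])"
    r"(?:_(?P<sense>[^_\s]+))?"
    r"_rel",
    re.IGNORECASE,
)


def split_surface(pred):
    """Return (lemma, pos, sense) for a surface predicate, else None.

    The sense is None when the sense field is unspecified.
    """
    match = SURFACE_RE.fullmatch(pred.strip('"'))
    if match is None:
        return None
    return match.group("lemma"), match.group("pos"), match.group("sense")


print(split_surface('"_argue_v_about_rel"'))  # ('argue', 'v', 'about')
print(split_surface('"_and_c_rel"'))          # ('and', 'c', None)
```

Because the sense group is optional, a predicate such as "_and_c_rel" still splits cleanly, while a predicate with an extra underscore in the lemma or sense does not match at all.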
When *MRS representations are serialized, surface predicates may appear decomposed to their lemma, pos, and sense values. In this context, they are called real predicates and are contrasted with surface predicates that use the underscore-delimited form discussed above. For example, in the XML format for MRS:
<realpred lemma="dog" pos="n" sense="1" />
Note for Developers
Real and surface predicates represent the same kinds of information (lemma, pos, sense), but for compatibility it is suggested that tools note whether a predicate was specified as a real or surface predicate, as well as note the presence of quotation marks in string preds. By doing this, a round trip (reading then writing a predicate) can maintain the original form.
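To illustrate the correspondence between the two forms, the sketch below converts a decomposed realpred element into the underscore-delimited surface form using Python's standard XML parser. The function name `realpred_to_surface` is hypothetical, and treating the sense attribute as optional is an assumption based on the sense field being occasionally unspecified.

```python
import xml.etree.ElementTree as ET


def realpred_to_surface(xml_text):
    """Build the surface form _(lemma)_(pos)[_(sense)]_rel from a <realpred>."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "realpred":
        raise ValueError(f"expected a <realpred> element, got <{elem.tag}>")
    lemma, pos, sense = elem.get("lemma"), elem.get("pos"), elem.get("sense")
    if not lemma or not pos:
        raise ValueError("realpred needs both lemma and pos attributes")
    # Assumption: a missing or empty sense attribute means "unspecified".
    fields = [lemma, pos] + ([sense] if sense else [])
    return "_" + "_".join(fields) + "_rel"


print(realpred_to_surface('<realpred lemma="dog" pos="n" sense="1" />'))
# _dog_n_1_rel
```

A tool that wants to round-trip the original form would additionally need to remember that the predicate arrived as a realpred; this sketch discards that information.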
Abstract predicates do not have the 3-part internal structure that surface predicates do. The only constraints on their form are that they do not start with an underscore, do not contain spaces, and end in _rel. Some examples from the ERG include: season_rel, some_q_rel, place_n_rel, free_relative_ever_q_rel, interval_p_end_rel.
Note for Developers
Notice that, despite the lack of internal structure, the abstract predicates listed above do seem to generally follow the format governing surface predicates, with POS and sometimes sense values. Tools may benefit from decomposing abstract predicates as if they were surface predicates, e.g., in order to detect quantifiers by looking for the q POS value.
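One way a tool might apply this idea is sketched below. It is a heuristic only, not a rule of the specification: the hypothetical function `looks_like_quantifier` splits the predicate on underscores and reports whether a bare q field appears after the first field.

```python
def looks_like_quantifier(pred):
    """Heuristically detect a quantifier by looking for a 'q' field."""
    name = pred.strip('"').lower()
    if name.endswith("_rel"):
        name = name[: -len("_rel")]
    # Drop the leading underscore of surface predicates, then split on "_".
    fields = name.lstrip("_").split("_")
    # The first field is treated as the name or lemma, never as a POS.
    return "q" in fields[1:]


for pred in ("some_q_rel", "free_relative_ever_q_rel", "place_n_rel", "_a_q_rel"):
    print(pred, looks_like_quantifier(pred))
```

Since abstract predicates have no guaranteed internal structure, a result from this check should be treated as a hint rather than a certainty.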
Predicates are always case-insensitive and quotes are ignored. Therefore the following predicates are equivalent:
_dog_n_1_rel
_DOG_N_1_REL
"_dog_n_1_rel"
'_dog_n_1_rel ; NB: this single-quoted form is deprecated; do not use it
Furthermore, a surface predicate and its corresponding decomposed "real" pred (see the Real vs Surface section above) are equivalent:
_dog_n_1_rel
<realpred lemma="dog" pos="n" sense="1" />
An abstract predicate and a surface predicate with the same apparent values (e.g., place_n_rel and a hypothetical _place_n_rel) are not equivalent, and grammar writers should avoid creating such similar predicates.
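The equivalences above suggest a simple normalization before comparison. The sketch below uses the hypothetical helpers `normalize` and `equivalent`; it handles only case and quoting, not unification through a predicate type hierarchy.

```python
def normalize(pred):
    """Strip quoting and fold case so equivalent predicates compare equal."""
    name = pred.strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = name[1:-1]  # double-quoted string predicate
    elif name.startswith("'"):
        name = name[1:]    # deprecated open-single-quote form
    return name.lower()


def equivalent(a, b):
    """True if two predicate symbols are the same after normalization."""
    return normalize(a) == normalize(b)


print(equivalent("_dog_n_1_rel", '"_DOG_N_1_REL"'))  # True
print(equivalent("place_n_rel", "_place_n_rel"))     # False
```

The leading underscore is deliberately preserved, so an abstract predicate never compares equal to a similar-looking surface predicate.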
Most tools that process predicates are admirably robust in parsing those that do not conform to the specs above, but such robustness can cause a grammar developer to miss the non-conformity. Therefore, tool developers are encouraged to warn the user or throw an error when encountering non-conforming predicates.
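A conformance check along these lines might look like the following sketch. The name `check_predicate` and the wording of the messages are hypothetical; the checks cover only the deprecated quote form, whitespace, the _rel suffix, and the surface predicate field structure.

```python
import re

# Surface form: _(lemma)_(pos)[_(sense)]_rel with no stray underscores.
_SURFACE = re.compile(
    r"_[^_\s]+_[nvajrscpqxud](?:_[^_\s]+)?_rel", re.IGNORECASE
)


def check_predicate(pred):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    name = pred
    if len(pred) >= 2 and pred[0] == pred[-1] == '"':
        name = pred[1:-1]
    elif pred.startswith("'"):
        problems.append("deprecated open-single-quote form")
        name = pred[1:]
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if not name.lower().endswith("_rel"):
        problems.append("missing _rel suffix")
    elif name.startswith("_") and not _SURFACE.fullmatch(name):
        problems.append("surface predicate is not _(lemma)_(pos)[_(sense)]_rel")
    return problems


for pred in ('"_dog_n_1_rel"', '"_only_child_n_1_rel"', '"coord"'):
    print(pred, check_predicate(pred))
```

A tool could pass each returned message to `warnings.warn` or raise an error, depending on how strict it wants to be.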
Some observed deviations are:
"_only_child_n_1_rel" : unescaped underscore in the lemma
"_くもる_v_1_g_rel" : unescaped underscore in the sense
"coord" : no _rel suffix