TdlRFC
Type Description Language and other aspects of DELPH-IN Joint Reference Formalism
- Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e. as predicate symbols, but not CARGs)
- Everything in TDL not inside of quotes.
- Lexicon look-up.
- Proper names?
- Acronyms?
- ... approach these with token mapping (preserve the information, and then downcase anyway)
- Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)
Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).
TDL definitions allow documentation strings ("docstrings") before any term in the top-level conjunction or before the terminating dot (.) character:
n_-_c_le := n_intr_lex_entry
"""Intransitive count noun (icn)
<ex>The dog barked.
<nex>Much dog bark.""".
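As a rough illustration of how a tool might pull the docstring out of the definition above, here is a minimal Python sketch. It reuses the TQString pattern from the grammar in this document; the function name `extract_docstrings` is hypothetical, and the sketch scans raw text rather than parsing TDL, so a triple quote inside a comment or an ordinary string would confuse it.

```python
import re

# TQString from the grammar: triple-quoted text in which backslash escapes are
# allowed and one or two double quotes may appear, but never three in a row.
TQSTRING = re.compile(r'"""((?:[^"\\]|\\.|"(?!")|""(?!"))*)"""', re.DOTALL)


def extract_docstrings(tdl_text):
    """Return the contents of every triple-quoted string, in source order."""
    return TQSTRING.findall(tdl_text)


example = '''n_-_c_le := n_intr_lex_entry
"""Intransitive count noun (icn)
<ex>The dog barked.
<nex>Much dog bark.""".'''

docs = extract_docstrings(example)
print(docs[0].splitlines()[0])  # prints: Intransitive count noun (icn)
```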
# File Contents

TdlTypeFile  := ( TypeDef | TypeAddendum | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type DefOp TypedDefBody Dot
TypeAddendum := Type AddOp ( DefBody | DocString ) Dot
LexRuleDef   := LexRuleId DefOp Affix? TypedDefBody Dot
LexRuleId    := Identifier Spacing

# Definition Bodies (top-level conjunctions of terms)
#
# The body of a type definition, type addendum, or lexical rule is
# essentially a conjunction of Terms, but there are two special features
# of top-level conjunctions (i.e., those outside of an AVM):
#
# (1) """DocStrings""" may precede any Term or the final Dot (.)
#
# (2) TypeDef and LexRuleDef require at least one Type (supertype)
#     somewhere in the conjunction (conventionally the first Term)

TypedDefBody := ( TopLevelConj And )? DocString? Type ( And TopLevelConj )? DocString?
DefBody      := TopLevelConj DocString?
TopLevelConj := DocString? Term ( And DocString? Term )*
DocString    := TQString

# Terms and Conjunctions

Conjunction  := Term ( And Term )*
Type         := Identifier Spacing
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier Spacing
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar     := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|(?!#))*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#[\]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
TQString     := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
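To make the literal and comment rules more concrete, the following Python sketch turns a subset of them into a small tokenizer. It is only an illustration under stated assumptions: the names `TOKEN` and `tokenize` are hypothetical, single-character delimiters are lumped into one `Punct` class, and QSymbol, Regex, letter sets, wild cards, and affix patterns are left out. The Identifier pattern is written with both square brackets escaped so that Python's `re` module accepts it.

```python
import re

# One named alternative per token class; longer literals come before shorter
# ones so that '"""' wins over '"' and '...' wins over '.'.
TOKEN = re.compile(r'''
    (?P<TQString>"""(?:[^"\\]|\\.|"(?!")|""(?!"))*""")
  | (?P<DQString>"(?:[^"\\]|\\.)*")
  | (?P<BlockComment>\#\|(?:[^|\\]|\\.|\|(?!\#))*\|\#)
  | (?P<LineComment>;[^\n]*)
  | (?P<DefOp>:=)
  | (?P<AddOp>:\+)
  | (?P<DLOpen><!)
  | (?P<DLClose>!>)
  | (?P<Ellipsis>\.\.\.)
  | (?P<Punct>[.&,\[\]<>])
  | (?P<Coreference>\#[^\s.:<=&,#\[\]$()>!^/]+)
  | (?P<Identifier>[^\s.:<=&,#\[\]$()>!^/]+)
  | (?P<Space>\s+)
''', re.VERBOSE | re.DOTALL)

SKIPPED = {"Space", "LineComment", "BlockComment"}


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in SKIPPED:
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize('n_-_c_le := n_intr_lex_entry & [ SYNSEM.LOCAL #x ]. ; comment')
for kind, lexeme in tokens:
    print(kind, lexeme)
```

Dropping comments and whitespace at the token level mirrors the way the grammar attaches Spacing to the end of almost every literal; a real parser would still need the rule-level structure above to decide where each token is legal.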
Multiple docstrings may be present on a single definition, but only the first one encountered is considered its primary docstring; implementers are free to store or discard the others as they see fit. Docstrings on type addenda should either be concatenated, separated by a newline, to the previous docstring(s) or be appended to a list of docstrings associated with the type.
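One possible way to implement this bookkeeping is sketched below in Python. It takes the "list of docstrings" option and keeps every docstring, which is just one of the choices the text leaves open. The registry layout and the function names are assumptions made for illustration, not part of any existing processor.

```python
def add_docstrings(registry, type_name, docstrings):
    """Record docstrings from a definition or addendum, in the order seen."""
    registry.setdefault(type_name, []).extend(docstrings)


def primary_docstring(registry, type_name):
    """The first docstring encountered for the type, or None if there is none."""
    docs = registry.get(type_name, [])
    return docs[0] if docs else None


def combined_docstring(registry, type_name):
    """All recorded docstrings for the type, joined with newlines."""
    return "\n".join(registry.get(type_name, []))


registry = {}
# Docstrings found on the original definition (hypothetical content).
add_docstrings(registry, "n_-_c_le", ["Intransitive count noun (icn)", "Secondary note"])
# Docstring found on a later type addendum (hypothetical content).
add_docstrings(registry, "n_-_c_le", ["Added by an addendum"])

print(primary_docstring(registry, "n_-_c_le"))
print(combined_docstring(registry, "n_-_c_le"))
```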
The syntax description above allows for comments anywhere that separating whitespace is allowed (not including those within strings, regular expressions, letter sets, etc.). This includes within a dotted attribute path (e.g., [ SYNSEM #| comment |# . #| comment |# LOCAL ... ]), although grammar developers may want to use this flexibility sparingly.
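The dotted-path case can be checked with a small Python sketch that removes comments while leaving quoted strings alone. The patterns follow LineComment, BlockComment, DQString, and TQString from the grammar; the helper name `strip_comments` is hypothetical, and the sketch does not protect regular expressions, letter sets, or affix patterns, so it is only a partial approximation of the rule stated above.

```python
import re

# Strings are matched first so that ';' or '#|' inside quotes is never
# mistaken for the start of a comment.
STRING_OR_COMMENT = re.compile(
    r'"""(?:[^"\\]|\\.|"(?!")|""(?!"))*"""'   # TQString
    r'|"(?:[^"\\]|\\.)*"'                     # DQString
    r'|(?P<comment>;[^\n]*|#\|(?:[^|\\]|\\.|\|(?!#))*\|#)',
    re.DOTALL,
)


def strip_comments(text):
    """Replace each comment with a single space; keep strings unchanged."""
    def replace(match):
        return " " if match.group("comment") is not None else match.group(0)
    return STRING_OR_COMMENT.sub(replace, text)


path = '[ SYNSEM #| comment |# . #| comment |# LOCAL x ]'
cleaned = " ".join(strip_comments(path).split())
print(cleaned)  # prints: [ SYNSEM . LOCAL x ]
```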
1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? (see this thread on the 'developers' mailing list)
2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?