GrammarPerformance
There are several ways to increase the performance of DELPH-IN grammars during parsing and generation.
This page attempts to give a rough idea of how to tweak your grammar for better performance. If you add a new technique, please link it here.
Contents
- Things to tweak for overall performance
- Restrictions on the application of morphological rules
- Current Issues
- Things to do to reduce noise during grammar engineering
- Things that magically just happen
- To Do
Make sure the two quick-check files are kept up to date.
- LKB: ${JACY}/lkb/checkpaths.lsp
- PET: ${JACY}/pet/qc.tdl
Quick check is a technique in which the feature paths where unification is most likely to fail are checked first. Unifications that are going to fail can then be rejected cheaply, before full unification is attempted. The paths most likely to fail are found by processing a test corpus and recording which failure points are most common. It is described in, e.g.:
Ulrich Callmeier. Preprocessing and Encoding Techniques in PET. In Stephan Oepen, Dan Flickinger, Jun-ichi Tsujii and Hans Uszkoreit editors, Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing, CSLI Publications, Stanford, 2002.
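To make the idea concrete, here is a toy Python sketch of the filter. It assumes a hypothetical data model: feature structures as nested dicts, a small invented type hierarchy, and made-up paths such as SYNSEM|HEAD. PET and the LKB use their own, much more efficient encodings. The types at the quick-check paths are collected once per edge, and two edges are passed to full unification only if the types at every path are compatible.

```python
"""Toy quick-check pre-filter (a sketch; not the PET or LKB data structures)."""
from typing import Dict, FrozenSet, Sequence, Tuple

Path = Tuple[str, ...]

# Hypothetical type hierarchy: each type maps to the set of leaf types below it
# (a leaf maps to itself). Two types are compatible iff these sets intersect,
# i.e. iff the two types have a common subtype.
LEAVES: Dict[str, FrozenSet[str]] = {
    "*top*": frozenset({"noun", "verb", "+", "-"}),
    "head": frozenset({"noun", "verb"}),
    "noun": frozenset({"noun"}),
    "verb": frozenset({"verb"}),
    "bool": frozenset({"+", "-"}),
    "+": frozenset({"+"}),
    "-": frozenset({"-"}),
}


def type_at(fs: dict, path: Path) -> str:
    """Type at `path`; a missing path is unconstrained, so it counts as *top*."""
    node = fs
    for feature in path:
        if not isinstance(node, dict) or feature not in node:
            return "*top*"
        node = node[feature]
    # Leaves are type names; embedded structures may carry an explicit TYPE.
    return node.get("TYPE", "*top*") if isinstance(node, dict) else node


def qc_vector(fs: dict, paths: Sequence[Path]) -> Tuple[str, ...]:
    """Computed once per edge: the types found at each quick-check path."""
    return tuple(type_at(fs, p) for p in paths)


def quick_check(v1: Tuple[str, ...], v2: Tuple[str, ...]) -> bool:
    """False means unification is certain to fail, so it can be skipped.

    Paths are compared in order (most frequent failure first) and all()
    stops at the first incompatible pair.
    """
    return all(LEAVES[a] & LEAVES[b] for a, b in zip(v1, v2))


# Hypothetical quick-check paths, ordered by how often they caused failures.
QC_PATHS = [("SYNSEM", "HEAD"), ("SYNSEM", "KEY-ARG")]
daughter = {"SYNSEM": {"HEAD": "noun"}}
candidate = {"SYNSEM": {"HEAD": "verb", "KEY-ARG": "+"}}
print(quick_check(qc_vector(daughter, QC_PATHS), qc_vector(candidate, QC_PATHS)))  # False
```

A type clash at any single path guarantees that full unification would fail. The filter therefore never discards a unification that could have succeeded; it only saves work.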
Make sure that:
- you are using compatible versions of flop and cheap
- your grammar is up to date.
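See ${JACY}/utils/make-qc.bash for a script. The test file must be segmented first (here with ChaSen):
grep -v '#' testsuites/mt-test-set-1.txt | chasen -F "%m " > testfile
Then move the old quick-check file out of the way, rebuild the grammar without it, compute the new quick-check paths, and rebuild:
mv pet/qc.tdl pet/qc.tdl.old
flop japanese
cat testfile | cheap -limit=100000 -packing -compute-qc=pet/qc.tdl japanese
flop japanese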
After you have made the quick-check file, you need to rebuild the grammar (the second flop call).
Note: computing the quick-check file is slow because quick check is, of course, turned off while the paths are collected. In general, compute it in the mode you normally parse in (e.g. with packing if you use packing).
The file is read in when you run flop, so add the following to flop.set:
;; list of files to load after everything else
postload-files := "pet/qc".
;; `pseudo' types outside the type hierarchy. these are ignored for
;; appropriateness, expansion etc.
pseudo-types :=
$qc_unif_trad $qc_unif_set $qc_subs_trad $qc_subs_set
$qc_unif_trad_pack $qc_unif_set_pack $qc_subs_trad_pack $qc_subs_set_pack.
For the LKB, see Copestake (2002: pp 196--197).
mv lkb/checkpaths.lsp lkb/checkpaths.lsp.old
Then, from within the *common-lisp* buffer, evaluate:
(lkb::with-check-path-list-collection
"~/delphin/grammars/japanese/lkb/checkpaths.lsp"
(parse-sentences
"~/delphin/grammars/japanese/testsuites/hinoki-test-a.100"
"~/delphin/grammars/japanese/testsuites/hinoki-test-a.100.results"))
- This would be nice to automate
- It would be nice to share the config between PET and the LKB (or convert)
- It may be worth doing a grid search to optimize how many quick-check paths should actually be checked.
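Ideally, a grid search would re-run the test suite with different numbers of paths and time each run. The toy Python model below only estimates the trade-off. The cost function, the 1:50 ratio between a path check and a full unification, and the counts are all invented assumptions. Checking more paths filters out more failures, but each extra path adds a cost to every unification attempt, so beyond some point adding paths stops paying off.

```python
"""Toy model of a grid search over the number of quick-check paths."""
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Path = Tuple[str, ...]
# One record per unification attempt: the path where it failed, or None if it succeeded.
Attempts = Sequence[Optional[Path]]


def rank_paths(attempts: Attempts) -> List[Path]:
    """Failure paths, most frequent first."""
    counts = Counter(p for p in attempts if p is not None)
    return [path for path, _ in counts.most_common()]


def estimated_cost(attempts: Attempts, ranked: List[Path], n: int,
                   check_cost: float, unify_cost: float) -> float:
    """Pessimistic cost: every attempt pays for all n checks, and every attempt
    not filtered out by the first n paths pays for a full unification."""
    checked = set(ranked[:n])
    unfiltered = sum(1 for p in attempts if p is None or p not in checked)
    return len(attempts) * n * check_cost + unfiltered * unify_cost


def grid_search(attempts: Attempts, max_n: int, check_cost: float = 1.0,
                unify_cost: float = 50.0) -> Tuple[int, Dict[int, float]]:
    """Try n = 0..max_n quick-check paths; return the cheapest n and all costs."""
    ranked = rank_paths(attempts)
    costs = {n: estimated_cost(attempts, ranked, n, check_cost, unify_cost)
             for n in range(min(max_n, len(ranked)) + 1)}
    best = min(costs, key=costs.get)
    return best, costs


# Made-up counts: 100 attempts, 96 of them failures spread over four paths.
A, B, C, D = ("A",), ("B",), ("C",), ("D",)
attempts = [A] * 60 + [B] * 30 + [C] * 5 + [D] * 1 + [None] * 4
best, costs = grid_search(attempts, max_n=10)
print(best, costs[best])  # 3 550.0
```

With these made-up numbers, the fourth path catches only one failure but adds a check to all 100 attempts, so the sketch settles on three paths.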
You can also improve performance by setting the order in which the daughters of rules are tried (Oepen & Carroll 2002: pp 204--206). The key daughter is matched first and the remaining daughters afterwards. The order can be specified in the grammar or in the configuration files for the LKB and PET.
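A toy Python model shows why the order matters. Every passive edge that matches the key daughter starts an active edge, so keying on the more selective daughter creates fewer active edges. The categories and counts are invented; a real parser matches feature structures, not category labels.

```python
"""Toy illustration of why the choice of key daughter matters."""
from typing import Sequence


def active_edges(passive_cats: Sequence[str], daughters: Sequence[str], key: int) -> int:
    """Active edges created when daughter `key` (1-based, as in the LKB and PET
    settings on this page) is matched first.

    Toy model: a passive edge matches a daughter iff the categories are equal,
    and each match against the key daughter yields one active edge that then
    waits for the remaining daughter(s).
    """
    required = daughters[key - 1]
    return sum(1 for cat in passive_cats if cat == required)


# Made-up chart: many NP edges, few verb edges.
chart = ["verb"] * 2 + ["np"] * 10 + ["adv"] * 3
rule = ["np", "verb"]  # hypothetical subject-head rule: NP, then verb
print(active_edges(chart, rule, 1), active_edges(chart, rule, 2))  # 10 2
```

Here keying on the verb (daughter 2) gives 2 active edges instead of 10. The rule-table comparison described below looks for the same kind of difference.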
* In the grammar
- Use the KEY-ARG feature, set per rule in the grammar; the daughter marked KEY-ARG + is the key:
binary_rule_left_to_right := rule &
[ ARGS < [ KEY-ARG + ] , [ KEY-ARG bool ] > ].
* In the LKB (lkb/globals.lsp)
(defparameter *rule-keys*
'((HEAD-ADJUNCT-RULE1 . 1)
(COMPOUNDS-RULE . 1)
(KARA-MADE-RULE . 2)
(HEAD_SUBJ_RULE . 2)
(HEAD-SPECIFIER-RULE . 2)
(HEAD-COMPLEMENT-RULE . 2)
(HEAD-COMPLEMENT2-RULE . 2)
(HEAD-ADJUNCT-RULE2 . 2)))
* In PET (pet/japanese.set)
;; assoc (rules -> keyarg position) (alternative to KEY-ARG mechanism)
rule-keyargs :=
$HEAD-ADJUNCT-RULE1 1
$HEAD-ADJUNCT-RULE2 2
$HEAD-ADJUNCT-RULE3 1
$RELATIVE-CLAUSE-RULE 1
$COMPOUNDS-RULE 1
$SENTENCE-TE-COORDINATION-RULE 1
$CONJ-RULE 1
$KARA-MADE-RULE 2
$HEAD_SUBJ_RULE 2
$HEAD-SPECIFIER-RULE 2
$HEAD-COMPLEMENT-HF-RULE 2
$HEAD-COMPLEMENT-HI-RULE 1
$HEAD-COMPLEMENT-AFFIXBIND-RULE 2
$HEAD-COMPLEMENT2-RULE 2
$HEAD-2OBL-COMPLEMENTS-RULE 2
$VN-LIGHT-RULE 2
$VEND-VEND-RULE 1
$VSTEM-VEND-RULE 2
$VN-VEND-RULE 2
$PREFIX-ATTACH-RULE 1
$NP-QUEST-FRAG-RULE 2.
Key mode in cheap is set with:
`-key=n' --- select key mode (0=key-driven, 1=l-r, 2=r-l, 3=head-driven)
The default is 0 (key-driven).
To find the best key for each rule, create two profiles, one with -key=1 and one with -key=2, with -rulestats turned on. First enable [Process, switches: write rule relation] in [incr tsdb()]. As with quick check, use the mode you would normally use (e.g. with packing if you use packing).
Then open [Analyze: rule table] for both profiles. For each rule, choose as key the daughter that produces fewer active edges when it is processed first. The passive edge counts should be the same in both profiles, modulo memory overflow errors.
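The comparison could be scripted along these lines. This Python sketch makes several assumptions: the two rule tables have already been exported into dictionaries of (active, passive) edge counts, the input format and counts shown are invented, and, for binary rules, -key=1 corresponds to daughter 1 as key and -key=2 to daughter 2.

```python
"""Choose a key daughter per rule from two profiles' rule tables (a sketch)."""
from typing import Dict, Tuple

# Hypothetical input: rule name -> (active edges, passive edges), as read off
# [Analyze: rule table] for the -key=1 and -key=2 profiles.
RuleTable = Dict[str, Tuple[int, int]]


def choose_keys(key1: RuleTable, key2: RuleTable) -> Dict[str, int]:
    """Return 1 or 2 for every rule present in both tables.

    Assumes binary rules, where -key=1 (l-r) starts from daughter 1 and
    -key=2 (r-l) from daughter 2; ties go to daughter 1.
    """
    keys: Dict[str, int] = {}
    for rule in sorted(key1.keys() & key2.keys()):
        (active1, passive1), (active2, passive2) = key1[rule], key2[rule]
        if passive1 != passive2:
            # Should only differ because of memory-overflow errors; worth a look.
            print(f"warning: {rule}: passive edges differ ({passive1} vs {passive2})")
        keys[rule] = 1 if active1 <= active2 else 2
    return keys


# Invented counts, for illustration only.
key1 = {"HEAD-ADJUNCT-RULE1": (100, 50), "HEAD_SUBJ_RULE": (900, 80)}
key2 = {"HEAD-ADJUNCT-RULE1": (400, 50), "HEAD_SUBJ_RULE": (300, 80)}
print(choose_keys(key1, key2))
# {'HEAD-ADJUNCT-RULE1': 1, 'HEAD_SUBJ_RULE': 2}
```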
- This would be nice to automate
- It would be nice to share the config between PET and the LKB (or convert)
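One way to share the configuration would be to keep a single rule-to-key table and generate both files from it. The Python sketch below renders the LKB and PET formats shown above. Keeping the table in Python, and assuming that rule names are identical in the LKB and PET versions of the grammar, are both assumptions of this sketch.

```python
"""Generate the LKB and PET key-daughter settings from one shared table (a sketch)."""
from typing import Dict


def _check(rule_keys: Dict[str, int]) -> None:
    """Reject an empty table or non-positive key positions."""
    if not rule_keys:
        raise ValueError("no rule keys given")
    for rule, key in rule_keys.items():
        if key < 1:
            raise ValueError(f"{rule}: key positions are 1-based, got {key}")


def to_lkb(rule_keys: Dict[str, int]) -> str:
    """Render *rule-keys* in the style used in lkb/globals.lsp."""
    _check(rule_keys)
    entries = [f"({rule} . {key})" for rule, key in rule_keys.items()]
    return "(defparameter *rule-keys*\n  '(" + "\n    ".join(entries) + "))"


def to_pet(rule_keys: Dict[str, int]) -> str:
    """Render rule-keyargs in the style used in pet/japanese.set."""
    _check(rule_keys)
    entries = [f"  ${rule} {key}" for rule, key in rule_keys.items()]
    return "rule-keyargs :=\n" + "\n".join(entries) + "."


# Single source of truth for both systems (rule names assumed to match).
RULE_KEYS = {"HEAD-ADJUNCT-RULE1": 1, "HEAD_SUBJ_RULE": 2}
print(to_lkb(RULE_KEYS))
print(to_pet(RULE_KEYS))
```

The LKB and PET lists above have drifted apart, and a shared table like this would keep them in sync.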
In PET only, you can restrict rules so that they apply only over the entire span of the input:
spanning-only-rules := $frg-np $frg-pp $frg-s-adv $frg-i-adv
$frg-pp-np $frg-i-adv-np $frg-pp-int
$runon_s.
Making these rules spanning-only in Jacy once reduced the number of tasks by 7.2% and sped things up by 5.1%.
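A toy Python model illustrates the effect of the restriction. This is not PET's implementation, and head-comp and head-subj are hypothetical ordinary rules. A spanning-only rule is considered for just one span instead of every span in the chart.

```python
"""Toy illustration of the spanning-only restriction (not PET's implementation)."""
from typing import FrozenSet, Sequence

# A few of the rule names from the spanning-only-rules setting above.
SPANNING_ONLY: FrozenSet[str] = frozenset({"frg-np", "frg-pp", "runon_s"})


def may_apply(rule: str, start: int, end: int, n_tokens: int) -> bool:
    """Spanning-only rules may only build an edge covering the whole input."""
    if rule in SPANNING_ONLY:
        return start == 0 and end == n_tokens
    return True


def count_candidates(rules: Sequence[str], n_tokens: int) -> int:
    """Count (rule, span) pairs that remain candidates for rule application."""
    spans = [(i, j) for i in range(n_tokens) for j in range(i + 1, n_tokens + 1)]
    return sum(1 for rule in rules for i, j in spans if may_apply(rule, i, j, n_tokens))


rules = ["frg-np", "frg-pp", "runon_s", "head-comp", "head-subj"]
print(count_candidates(rules, 5))  # 33 (75 without the restriction)
```

The toy numbers say nothing about real savings; the Jacy figures above are the measured ones.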
(Deprecated)
Trigger rules let you control when lexical entries with empty semantics are added to the generator chart. If all of them were added all the time, the chart would become too big.
See LkbGeneration for more discussion (note that trigger rules also work with ACE).
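Real trigger rules are defined in the grammar and matched against the input semantics (see LkbGeneration). The toy Python sketch below reduces that to sets of predicate names, just to show the filtering idea. The table format, entry names and predicates are all invented.

```python
"""Toy trigger-rule filter for semantically empty entries (a sketch only)."""
from typing import Dict, FrozenSet, List, Set

# Hypothetical trigger table: empty entry -> alternative sets of predicates.
# An entry is added if the input semantics contains every predicate of at
# least one set. Entry and predicate names are invented for illustration.
TRIGGERS: Dict[str, List[FrozenSet[str]]] = {
    "comp_empty": [frozenset({"_want_v_rel"})],
    "expletive": [frozenset({"_rain_v_rel"}), frozenset({"_seem_v_rel"})],
    "passive_by": [frozenset({"_buy_v_rel", "passive_rel"})],
}


def triggered_entries(input_preds: Set[str]) -> List[str]:
    """Empty entries to add to the generator chart for this input."""
    return sorted(entry for entry, conditions in TRIGGERS.items()
                  if any(cond <= input_preds for cond in conditions))


print(triggered_entries({"_rain_v_rel", "_now_a_rel"}))  # ['expletive']
```

Entries whose conditions are not met never enter the chart, so they cannot take part in any edges.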
- The Idiom optimizations don't seem to be working
- It would be nice to use supertagging
This page aims to document DELPH-IN techniques. It was started by Francis, inspired by the Capitol Hill Grammar Engineering Meeting and based on a page originally written for Jacy (JacyPerformance).