WeSearch_Adaptation_Background
Organising my thoughts
-
Maria Wolters and Mathias Kirsten. Exploring the Use of Linguistic Features in Domain and Genre Classification. EACL'99
-
Baldwin et al. How Noisy Social Media Text, How Diffrnt Social Media Sources?
-
Barbara Plank, Gertjan van Noord: Effective Measures of Domain Similarity for Parsing. ACL 2011
-
Lilja Øvrelid and Arne Skjærholt. Lexical categories for improved parsing of web data
-
Bonnie Webber. Genre distinctions for discourse in the Penn TreeBank http://aclweb.org/anthology//P/P09/P09-1076.pdf
-
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
http://aclweb.org/anthology//W/W10/W10-26.pdf
-
particularly Vincent Van Asch; Walter Daelemans. Using Domain Similarity for Performance Estimation
-
Daniel Gildea. Corpus Variation and Parser Performance
-
Tom Lippincott; Diarmuid Ó Séaghdha; Lin Sun; Anna Korhonen. Exploring variation across biomedical subdomains
-
David McClosky; Eugene Charniak; Mark Johnson. Automatic Domain Adaptation for Parsing http://aclweb.org/anthology//N/N10/N10-1004.pdf
-
Barbara Plank and Khalil Sima’an. Subdomain Sensitive Statistical Parsing using Raw Corpora
http://www.lrec-conf.org/proceedings/lrec2008/pdf/120_paper.pdf
-
Sujith Ravi; Kevin Knight; Radu Soricut. Automatic Prediction of Parser Accuracy
-
Jennifer Foster; Ozlem Cetinoglu; Joachim Wagner; Joseph Le Roux; Joakim Nivre; Deirdre Hogan; Josef van Genabith. From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0
-
Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner and Josef van Genabith, 2011. Comparing the Use of Edited and Unedited Text in Parser Self-Training
-
Satoshi Sekine. 1997. The Domain Dependence of Parsing
-
Adam Kilgarriff. 2001. Comparing Corpora.
http://www.kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf
- Different ways of looking at corpora:
  - which words are characteristic of a text/corpus:
    - chi-squared
    - Mann-Whitney (Wilcoxon) ranks test
    - t-test
    - mutual information (MI)
    - log-likelihood
    - Fisher's exact test (is this what log-likelihood approximates?)
    - TF.IDF
  - how similar are two corpora? how homogeneous is a corpus?
    - uses known-similarity corpora to evaluate similarity measures
    - Spearman rank correlation coefficient
    - chi-squared
    - perplexity
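To make two of the word-level statistics listed above concrete, here is a minimal sketch that scores one word against a 2x2 contingency table (word vs. all other words, corpus A vs. corpus B) with both chi-squared and log-likelihood. The function name `keyness` and the use of `Counter` objects as frequency lists are assumptions for illustration, not anything taken from the paper.

```python
import math
from collections import Counter


def keyness(word, counts_a, counts_b):
    """Return (chi_squared, log_likelihood) for `word` in corpus A vs. corpus B.

    `counts_a` and `counts_b` are Counters of word frequencies (hypothetical
    input format chosen for this sketch).
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    total = n_a + n_b

    # Observed 2x2 table: rows = this word / all other words, columns = A / B.
    observed = [
        [counts_a[word], counts_b[word]],
        [n_a - counts_a[word], n_b - counts_b[word]],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [n_a, n_b]

    chi2 = 0.0
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                continue  # word absent from both corpora: no evidence either way
            o = observed[i][j]
            chi2 += (o - expected) ** 2 / expected
            if o > 0:  # a zero cell contributes nothing to the log-likelihood sum
                g2 += 2 * o * math.log(o / expected)
    return chi2, g2
```

Ranking every word in the joint vocabulary by either score gives a list of candidate "characteristic" words; the two statistics tend to disagree most on rare words, where the expected counts are small.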
Low-frequency and high-frequency (closed-class) words should generally be treated separately, since they have very different statistical properties.
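One way to act on this when comparing corpora is to restrict the comparison to the most frequent words only. The sketch below computes a Spearman rank correlation over the `top_n` most frequent words of the joint corpus; `top_n`, the helper names, and the choice of average ranks for ties are all assumptions made for illustration. It also assumes the selected words do not all have identical counts in either corpus, since the correlation is undefined in that case.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; applied to ranks it yields Spearman's coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_similarity(tokens_a, tokens_b, top_n=500):
    """Spearman correlation of word ranks over the top_n joint-frequency words."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    joint = counts_a + counts_b
    vocab = [word for word, _ in joint.most_common(top_n)]
    ranks_a = average_ranks([counts_a[w] for w in vocab])
    ranks_b = average_ranks([counts_b[w] for w in vocab])
    return pearson(ranks_a, ranks_b)
```

Values near 1 mean the two corpora order their frequent words in much the same way; because only ranks are used, the measure does not depend on corpus size.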