WeSearch_Adaptation_Background

Organising my thoughts

Maria Wolters and Mathias Kirsten. Exploring the Use of Linguistic Features in Domain and Genre Classification. EACL'99

http://acl.ldc.upenn.edu/E/E99/E99-1019.pdf
Baldwin et al. How Noisy Social Media Text, How Diffrnt Social Media Sources?

http://aclweb.org/anthology//I/I13/I13-1041.pdf
Barbara Plank, Gertjan van Noord: Effective Measures of Domain Similarity for Parsing. ACL 2011

http://aclweb.org/anthology//P/P11/P11-1157.pdf
Lilja Øvrelid and Arne Skjærholt. Lexical categories for improved parsing of web data

http://aclweb.org/anthology/C/C12/C12-2088.pdf
Bonnie Webber. Genre distinctions for discourse in the Penn TreeBank

http://aclweb.org/anthology//P/P09/P09-1076.pdf
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

http://aclweb.org/anthology//W/W10/W10-26.pdf
- particularly Vincent Van Asch; Walter Daelemans. Using Domain Similarity for Performance Estimation
  
  http://aclweb.org/anthology//W/W10/W10-2605.pdf
Daniel Gildea. Corpus Variation and Parser Performance

http://aclweb.org/anthology//W/W01/W01-0521.pdf
Tom Lippincott; Diarmuid Ó Séaghdha; Lin Sun; Anna Korhonen. Exploring variation across biomedical subdomains

http://aclweb.org/anthology//C/C10/C10-1078.pdf
David McClosky; Eugene Charniak; Mark Johnson. Automatic Domain Adaptation for Parsing

http://aclweb.org/anthology//N/N10/N10-1004.pdf
Barbara Plank and Khalil Sima’an. Subdomain Sensitive Statistical Parsing using Raw Corpora

http://www.lrec-conf.org/proceedings/lrec2008/pdf/120_paper.pdf
Sujith Ravi; Kevin Knight; Radu Soricut. Automatic Prediction of Parser Accuracy

http://aclweb.org/anthology//D/D08/D08-1093.pdf
Jennifer Foster; Ozlem Cetinoglu; Joachim Wagner; Joseph Le Roux; Joakim Nivre; Deirdre Hogan; Josef van Genabith. From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0

http://aclweb.org/anthology//I/I11/I11-1100.pdf
Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner and Josef van Genabith, 2011. Comparing the Use of Edited and Unedited Text in Parser Self-Training

http://aclweb.org/anthology//W/W11/W11-2925.pdf
Satoshi Sekine. 1997. The Domain Dependence of Parsing

http://aclweb.org/anthology/A/A97/A97-1015.pdf
Adam Kilgarriff. 2001. Comparing Corpora.

http://www.kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf
- Different ways of looking at corpora:
- which words are characteristic of a text/corpus:
  - chi-squared
  - Mann-Whitney (Wilcoxon) ranks test
  - t-test
  - MI
  - log-likelihood
  - Fisher's exact test (is this what log-likelihood is approximating?)
  - TF.IDF
- how similar are two corpora? how homogeneous is a corpus?
  - uses known-similarity corpora to evaluate similarity measures
  - Spearman rank correlation coefficient
  - chi-squared
  - perplexity
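
To make the "characteristic words" statistics above concrete, here is a minimal sketch of the log-likelihood (G2) score over a 2x2 contingency table (word vs. all other words, corpus A vs. corpus B). The function names `g2` and `characteristic_words` and the toy token lists are illustrative assumptions, not taken from any of the papers listed.

```python
import math
from collections import Counter


def g2(a, b, n1, n2):
    """Log-likelihood (G2) for a word seen a times in corpus 1 (n1 tokens)
    and b times in corpus 2 (n2 tokens), using the full 2x2 table."""
    observed = [[a, b], [n1 - a, n2 - b]]
    row_totals = [a + b, (n1 - a) + (n2 - b)]
    col_totals = [n1, n2]
    total = n1 + n2
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_totals[i] * col_totals[j] / total
                score += o * math.log(o / e)
    return 2.0 * score


def characteristic_words(tokens_a, tokens_b, top=10):
    """Words relatively more frequent in A than in B, ranked by G2."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    na, nb = len(tokens_a), len(tokens_b)
    scored = []
    for word in set(fa) | set(fb):
        a, b = fa[word], fb[word]
        if a / na > b / nb:  # keep only words over-represented in A
            scored.append((g2(a, b, na, nb), word))
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:top]]


if __name__ == "__main__":
    # Toy data, purely for illustration.
    news = "the parser builds the parse tree for the sentence".split()
    web = "lol the tweet is so the best tweet ever".split()
    print(characteristic_words(news, web, top=3))
```

Swapping `g2` for a chi-squared statistic over the same table would only change the per-cell term to `(o - e) ** 2 / e`.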
Low-frequency and high-frequency (closed-class) words should generally be treated separately, since they have very different statistical properties.
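
As a rough sketch of one similarity measure from the list, the following compares two corpora by the Spearman rank correlation of their word frequencies, restricted to the most frequent words of the joint corpus so that the low-frequency tail is left out. The cutoff `n_words=500` and all function names are assumptions made for illustration; ties receive their mean rank.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from largest (rank 1) downwards; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def spearman_similarity(tokens_a, tokens_b, n_words=500):
    """Spearman correlation of frequency ranks over the top joint-corpus words."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    vocab = [w for w, _ in (fa + fb).most_common(n_words)]
    ranks_a = average_ranks([fa[w] for w in vocab])
    ranks_b = average_ranks([fb[w] for w in vocab])
    return pearson(ranks_a, ranks_b)
```

A value near 1 means the two corpora order their frequent words similarly; splitting a single corpus into halves and comparing them gives a comparable figure for homogeneity.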
