syntactic markers #43

Open
odanoburu opened this issue Jul 15, 2019 · 2 comments

Comments

@odanoburu

An adjective may be annotated with a syntactic marker indicating a limitation on the syntactic position the adjective may have in relation to the noun that it modifies. If so marked, the marker appears between the word and its following comma. If a lex_id is specified, the marker immediately follows it. The syntactic markers are:
(p) predicate position
(a) prenominal (attributive) position
(ip) immediately postnominal position

(from https://wordnet.princeton.edu/documentation/wninput5wn)
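
To make the three markers concrete, here is a minimal sketch of how they could be modeled as a closed type. The names `SyntacticMarker`, `markerCode`, and `parseMarker` are hypothetical and not taken from the project's code; only the codes `p`, `a`, and `ip` come from the quoted documentation.

```haskell
-- Hypothetical type: one constructor per marker listed in the WordNet docs.
data SyntacticMarker
  = Predicate              -- (p)  predicate position
  | Prenominal             -- (a)  prenominal (attributive) position
  | ImmediatelyPostnominal -- (ip) immediately postnominal position
  deriving (Show, Eq, Enum, Bounded)

-- The short code used in the lexicographer files, without parentheses.
markerCode :: SyntacticMarker -> String
markerCode Predicate              = "p"
markerCode Prenominal             = "a"
markerCode ImmediatelyPostnominal = "ip"

-- Inverse of markerCode; unknown codes give Nothing.
parseMarker :: String -> Maybe SyntacticMarker
parseMarker code =
  lookup code [(markerCode m, m) | m <- [minBound .. maxBound]]
```

Deriving `Enum` and `Bounded` lets the parser be built from `markerCode`, so the two functions should not drift apart if a marker is added later.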

how should we represent them in the text format?

I think they are similar to frames, so (1) we could encode them as frames. or (2) we could include them as another ad hoc thing, like frames, but with its own name. or (3) we could just put this information in a separate file. (I'm thinking we might want to have a few such files anyway, so this information could be shown in the emacs mode and even be editable there.)

Option 1, encoded as a frame:

w: abounding
w: galore 1 frame 3     # with 3 meaning (ip)
sim: adj.all:abundant
g: existing in abundance; "abounding confidence"; "whiskey galore"

Option 2, with its own name:

w: abounding
w: galore 1  marker ip
sim: adj.all:abundant
g: existing in abundance; "abounding confidence"; "whiskey galore"

Option 3, in a separate file:

adjs.all:galore:1	ip
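
As a rough illustration of what option 2 would ask of a parser, the sketch below reads a `w:` line with an optional lex id and an optional `marker` field. The `WordLine` record and `parseWordLine` function are hypothetical, the grammar is inferred only from the sample lines above, and trailing `#` comments and other word-level fields such as `frame` are deliberately not handled.

```haskell
import Data.Char (isDigit)

-- Hypothetical record for one parsed "w:" line.
data WordLine = WordLine
  { lemma  :: String
  , lexId  :: Maybe Int
  , marker :: Maybe String  -- "p", "a" or "ip" when present
  } deriving (Show, Eq)

-- Parse lines such as "w: abounding" or "w: galore 1  marker ip".
-- Anything else (including an unknown marker code) yields Nothing.
parseWordLine :: String -> Maybe WordLine
parseWordLine line = case words line of
  ("w:" : l : rest) -> do
    let (lid, rest') = takeLexId rest
    m <- takeMarker rest'
    pure (WordLine l lid m)
  _ -> Nothing
  where
    -- An all-digit token right after the lemma is taken as the lex id.
    takeLexId (t : ts)
      | not (null t), all isDigit t = (Just (read t), ts)
    takeLexId ts = (Nothing, ts)

    -- Either nothing is left, or exactly "marker <code>" with a known code.
    takeMarker [] = Just Nothing
    takeMarker ["marker", code]
      | code `elem` ["p", "a", "ip"] = Just (Just code)
    takeMarker _ = Nothing
```

Because `words` splits on any run of whitespace, the double space in `galore 1  marker ip` needs no special handling.
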
@arademaker
Member

I tend to prefer option 2.

@odanoburu
Author

that's my least favorite option from the implementation point of view, since it introduces more ad hoc things. in the wordsense and synset datatypes we've defined, we already have fields for frames, while all relations are lumped together in one field. plus, only a few adjectives have markers, but every wordsense would end up having this field, unless we can think of a better representation. I don't like the idea of treating adjectives specially, but maybe that's one way to go.

data WNWord = WNWord WordSenseIdentifier [FrameIdentifier] [WordPointer]
  deriving (Show,Eq)

-- synsets can be unvalidated or validated
data Unvalidated
data Validated

data Synset a = Synset
  { sourcePosition       :: SourcePosition
  , lexicographerFileId  :: LexicographerFileId
  , wordSenses           :: NonEmpty WNWord
  , definition           :: Text
  , examples             :: [Text]
  , frames               :: [Int]
  , relations            :: NonEmpty SynsetRelation
  } deriving (Show,Eq)
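
For comparison, here is a minimal sketch of how the separate-file idea (option 3) could keep these datatypes untouched: the markers live in a side table keyed by word sense, so only the few marked adjectives take up any space. Everything here is an assumption for illustration: `SenseKey` stands in for `WordSenseIdentifier`, whose real shape is not shown in this thread, and `parseMarkerLine`, `loadMarkers`, and `markerFor` are hypothetical names. Lemmas containing a colon are not handled, and malformed lines are silently skipped.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Stand-in for the project's WordSenseIdentifier (assumed shape):
-- (lexicographer file, lemma, lex id).
type SenseKey = (String, String, Int)

-- Side table: only marked word senses have an entry.
type MarkerTable = Map.Map SenseKey String

-- Split "adjs.all:galore:1" into ["adjs.all", "galore", "1"].
splitOnColon :: String -> [String]
splitOnColon s = case break (== ':') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnColon rest

-- Parse one line of the marker file, e.g. "adjs.all:galore:1<TAB>ip".
parseMarkerLine :: String -> Maybe (SenseKey, String)
parseMarkerLine line = case words line of
  [sense, code] | code `elem` ["p", "a", "ip"] ->
    case splitOnColon sense of
      [file, lemma, n] -> do
        lexId <- readMaybe n
        pure ((file, lemma, lexId), code)
      _ -> Nothing
  _ -> Nothing

-- Build the table from the file's lines, dropping lines that do not parse.
loadMarkers :: [String] -> MarkerTable
loadMarkers = Map.fromList . mapMaybe parseMarkerLine

-- Look up the marker of a word sense, if it has one.
markerFor :: MarkerTable -> SenseKey -> Maybe String
markerFor table key = Map.lookup key table
```

The trade-off is that the marker is no longer visible next to the word in the synset text, which is where tooling such as the emacs mode mentioned earlier would have to fill in.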
