What scores does Table 1 use in the paper? #26

Open
ghost opened this issue Mar 29, 2018 · 3 comments

Comments

@ghost

ghost commented Mar 29, 2018

In the paper, Table 1 (c) shows the entity linking scores, but how are they computed, especially the CoNLL scores?

(c) Entity Linking model Comparison:

| model | CoNLL |
| --- | --- |
| Link Count only | 68.614 |
| manual (oracle) | 98.217 |

For example, suppose some mentions and their candidate entities look like this:

| doc_id | mention | candidate entity | label |
| --- | --- | --- | --- |
| 1 | apple | Apple Pie | True |
| 1 | apple | Apple (company) | False |
| 1 | apple | Apple (fruits) | False |
| ... | ... | ... | ... |

If the model predicts the single entity with the highest score for each mention, I don't need the false candidates to compute accuracy, but I don't know whether Table 1 used the false candidates or not.
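
To make this concrete, here is roughly what I mean by using only the highest-scoring candidate per mention; the `score` column is a made-up placeholder for whatever the model outputs, not something from your code:

```python
import pandas as pd

# Toy candidate table: one row per (mention, candidate entity) pair.
# The "score" column is hypothetical; it only stands in for the model's ranking.
cands = pd.DataFrame([
    {"doc_id": 1, "mention": "apple", "candidate": "Apple Pie",       "label": True,  "score": 0.7},
    {"doc_id": 1, "mention": "apple", "candidate": "Apple (company)", "label": False, "score": 0.2},
    {"doc_id": 1, "mention": "apple", "candidate": "Apple (fruits)",  "label": False, "score": 0.1},
])

# Keep only the top-scoring candidate for each mention and check whether it is the true one;
# the false candidates matter only through the ranking, not in the accuracy formula itself.
top = cands.loc[cands.groupby(["doc_id", "mention"])["score"].idxmax()]
accuracy = top["label"].mean()
print(accuracy)  # 1.0 in this toy example, since "Apple Pie" has the highest score
```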

How did you compute the Table 1 (c) scores?

Paper: https://arxiv.org/pdf/1802.01021.pdf

@JonathanRaiman
Contributor

If I understand correctly, the data you are referring to also provides a "proposal set" of entities for each mention, and marks one of the proposed entities as correct while the others are incorrect?
Table 1 measures, for all mentions given in the CoNLL eval set, the accuracy at recovering the true entity, where the proposal set is any entity in Wikipedia/Wikidata (i.e. not just those proposed by CoNLL).
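
In pseudocode, that measurement is something like the sketch below; `predict_entity` is just a hypothetical stand-in for the full pipeline (candidate lookup over all of Wikipedia/Wikidata plus the type system), not a function in this repo:

```python
# Sketch of the Table 1 (c) metric: exact-match accuracy over every gold mention
# in the CoNLL eval set, with candidates drawn from all of Wikipedia/Wikidata.
def conll_accuracy(gold_mentions, predict_entity):
    """gold_mentions: iterable of (mention_text, document_text, gold_entity) triples."""
    correct = 0
    total = 0
    for mention, document, gold_entity in gold_mentions:
        total += 1
        # predict_entity picks one entity for this mention from the full candidate set
        if predict_entity(mention, document) == gold_entity:
            correct += 1
    return correct / total
```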

@ghost
Author

ghost commented Mar 29, 2018

Well, I just used the AIDA CoNLL-YAGO dataset and prepared it for computing accuracy like this:

```python
# Parse AIDA CoNLL-YAGO into [doc_id, doc_text, [(mention, true entity), ...]] per document.
out = []
with open('../conll_dataset/aida-yago2-dataset/AIDA-YAGO2-dataset.tsv') as f:
    index = 1
    me = []   # (mention, true entity) pairs of the current document
    ss = []   # tokens of the current document
    first = True
    for line in f:
        if line.startswith('-DOCSTART-'):
            if first:
                first = False
                continue
            out.append([index, ' '.join(ss), list(set(me))])
            index += 1
            me = []
            ss = []
        else:
            line_spl = line.replace('\n', '').split('\t')
            ss.append(line_spl[0])
            # rows with a Wikipedia URL (more than 4 columns) are linkable mentions;
            # keep only the 'B' token so each mention is counted once
            if len(line_spl) > 4:
                if line_spl[1] == 'B':
                    me.append((line_spl[2], line_spl[4].replace('http://en.wikipedia.org/wiki/', '')))
    # flush the last document: there is no trailing -DOCSTART- to trigger the append above
    out.append([index, ' '.join(ss), list(set(me))])
data = out
```

data[0] looks like this:

```python
# [doc_id, doc_text, [pairs of mention and true entity] ]
[1,
 'EU rejects German call to boycott British lamb .  Peter Blackburn  BRUSSELS 1996-08-22  The European Commission said on Thursday it disagreed with German advice to consumers to shun British lamb until scientists determine whether mad cow disease can be transmitted to ...... ',
 [('Loyola de Palacio', 'Loyola_de_Palacio'),
  ('Britain', 'United_Kingdom'),
  ('Germany', 'Germany'),
  ('European Commission', 'European_Commission'),
  ('France', 'France'),
  ('Europe', 'Europe'),
  ('BRUSSELS', 'Brussels'),
  ...
]]
```

and calculated accuracy:

```python
from functools import partial

import pandas as pd
from tqdm import tqdm_notebook

results = []
for d in tqdm_notebook(data):
    # full text of the target document
    sentence = d[1]

    # ts: target mentions in the document; true_entities: their gold Wikipedia titles
    ts = [str(t[0]) for t in d[2]]
    true_entities = [str(t[1]).replace('_', ' ') for t in d[2]]

    # tokenize the sentence using the target mentions;
    # model_probs is the output of the get_probs function from the notebook you added
    tokenize = partial(en_tokenize, ts=ts)
    sent_splits, model_probs = solve_model_probs(sentence, tagger, tokenize=tokenize)

    # predicted entities: the highest-scoring entity for each mention
    pred_entities = run(ts, sent_splits, model_probs, indices2title, type_oracle,
                        trie, trie_index2indices_values, trie_index2indices_counts)

    # append results: true -> gold entity, pred -> predicted entity
    results += [{'doc_id': d[0], 'mention': x, 'true': y, 'pred': z}
                for x, y, z in zip(ts, true_entities, pred_entities)]

df = pd.DataFrame(results)
assert len(df['pred']) == len(df['true'])
matched = df['pred'] == df['true']
accuracy = float(sum(matched)) / float(len(df))
print(accuracy)
```
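
One thing I was not sure about (my assumption, not something from the notebook): if the pipeline returns fewer predictions than there are gold mentions, the `zip()` above silently drops the unmatched tail, so those mentions never count as errors. A quick check:

```python
# Compare the number of gold mentions against the number of scored rows;
# any difference means some mentions were silently dropped instead of counted as wrong.
n_gold = sum(len(d[2]) for d in data)
print(n_gold, len(df), n_gold - len(df))
```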

Is this the correct way to calculate accuracy?

@lbozarth

I'm stuck on this same part; the accuracy calculated this way is only about 0.7, though.
