NLTK's sentence_nist() raises a ZeroDivisionError when the hypothesis and the reference are the same. Here is my code:
```python
from nltk.translate.nist_score import sentence_nist

ref_test = '太少'
hypo_test = '太少'

split_ref = ''.join(splitKeyword(ref_test))
split_hypo = ''.join(splitKeyword(hypo_test))

# Case 1: tokenized each Chinese character with space
nist_test = sentence_nist([split_ref], split_hypo)  # ZeroDivisionError: division by zero
print(nist_test)

# Case 2: without splitting up the Chinese characters
nist_test = sentence_nist([ref_test], hypo_test)  # ZeroDivisionError: division by zero
print(nist_test)
```
Both cases fail with the same error:

```
ZeroDivisionError: division by zero
```
I expected the NIST calculation to produce a high score when the hypothesis and the reference are identical.
How can I correctly calculate the NIST score of the Chinese sentence pair above?
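What I have tried so far: a minimal reproduction without my splitKeyword helper (here I use plain list() for character-level tokens, which I believe is roughly what splitKeyword does) hits the same error, and capping n at the number of tokens avoids the crash, though I am not sure that is the intended fix:

```python
from nltk.translate.nist_score import sentence_nist

# list() stands in for my splitKeyword helper and gives character tokens
ref = list('太少')   # ['太', '少']
hyp = list('太少')

# Default n=5 raises the ZeroDivisionError: a 2-token hypothesis has
# no 3-, 4-, or 5-grams, so some per-ngram denominator is zero.
# sentence_nist([ref], hyp)

# Capping n at the number of tokens runs without error:
print(sentence_nist([ref], hyp, n=2))  # prints 1.0 for me
```

So the crash seems to occur whenever n exceeds the number of tokens in the hypothesis, but hard-coding n per sentence feels wrong, and I would still like to know the proper way to score short Chinese sentence pairs.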
Detailed Traceback
```
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[41], line 6
      4 split_ref = ''.join(splitKeyword(ref_test))
      5 split_hypo = ''.join(splitKeyword(hypo_test))
----> 6 nist_test = sentence_nist([split_ref], split_hypo)
      7 #nist_test = sentence_nist([ref_test], hypo_test)
      8 print(nist_test)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:70, in sentence_nist(references, hypothesis, n)
     18 def sentence_nist(references, hypothesis, n=5):
     19     """
     20     Calculate NIST score from
     21     George Doddington. 2002. "Automatic evaluation of machine translation quality
    (...)
     68     :type n: int
     69     """
---> 70     return corpus_nist([references], [hypothesis], n)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:165, in corpus_nist(list_of_references, hypotheses, n)
    162 nist_precision = 0
    163 for i in nist_precision_numerator_per_ngram:
    164     precision = (
--> 165         nist_precision_numerator_per_ngram[i]
    166         / nist_precision_denominator_per_ngram[i]
    167     )
    168     nist_precision += precision
    169     # Eqn 3 in Doddington(2002)

ZeroDivisionError: division by zero
```
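For what it's worth, my reading of the failing line is that nist_precision_denominator_per_ngram[i] counts the i-grams in the hypothesis, which is zero once i exceeds the hypothesis length. This is only my own diagnosis, but it matches a quick check with nltk.util.ngrams:

```python
from nltk.util import ngrams

hyp = list('太少')          # my 2-token hypothesis
for i in range(1, 6):       # sentence_nist defaults to n=5
    print(i, list(ngrams(hyp, i)))  # empty for i >= 3, hence the zero denominator
```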