In [4]:
%pylab inline
from __future__ import print_function
import segeval as se

Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].


Comparing word syllabifications¶

Syllabification is the task of placing syllable boundaries within a word, e.g., the word syllable could be segmented into three syllables using two boundaries as syl·la·ble.

To evaluate whether an automatic syllabifier is as good as a human syllabifier, we must evaluate how similar two segmentations are. For this, we can use Boundary Similarity (B) and Boundary Edit Distance (BED). BED is an edit distance for boundaries placed within a segmentation that can quantify the number of near misses and full misses between two solutions, and B is a normalization of BED. These metrics are described in:

Chris Fournier. 2013. Evaluating Text Segmentation using Boundary Edit Distance. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA.

An implementation of these metrics is provided by the SegEval package.

Data¶

Let's assume that we have already run an automatic syllabifier over a few words, and that we can compare its output to a manual solution provided by a version of the CMU Pronouncing Dictionary that has been augmented with syllable boundaries.

In [26]:
SYLLABLE_BOUNDARY = u'·'

word_order = ['automatic', 'segmentation', 'is', 'fun']

manual_solution = {
    'automatic'    : u'au·to·ma·tic',
    'segmentation' : u'seg·men·ta·tion',
    'is'           : u'is',
    'fun'          : u'fun'
}

automatic_solution = {
    'automatic'    : u'au·tom·a·tic',
    'segmentation' : u'seg·ment·ation',
    'is'           : u'is',
    'fun'          : u'f·un'
}


We need a way to convert these syllabifications into something that the segeval package can understand. Specifically, we want to create a tuple containing the size (in letters) of each syllable.

In [27]:
def convert_to_masses(syllabification):
    '''
    Take a syllabification, e.g., au·to·ma·tic, and produce a
    tuple of segment masses, e.g., (2, 2, 2, 3).
    '''
    syllables = syllabification.split(SYLLABLE_BOUNDARY)
    return tuple(len(syllable) for syllable in syllables)

print(u'The \'{0}\' syllabification results in this tuple of masses: {1}'.format(automatic_solution['automatic'], convert_to_masses(automatic_solution['automatic'])))

The 'au·tom·a·tic' syllabification results in this tuple of masses: (2, 3, 1, 3)


BED is a bit difficult to interpret, so we want a way to read a high-level summary of its results.

In [28]:
def bed_to_str(bed):
    '''
    Take a tuple containing the three types of boundary edits
    (additions/deletions, substitutions, transpositions) and
    create a string summary.
    '''
    parts = list()
    misses      = len(bed[0])  # additions/deletions are full misses
    near_misses = len(bed[2])  # transpositions are near misses
    if misses > 0:
        parts.append('{0} miss(es)'.format(misses))
    if near_misses > 0:
        parts.append('{0} near'.format(near_misses))
    return ', '.join(parts)

word = 'automatic'
manual = se.boundary_string_from_masses(convert_to_masses(manual_solution[word]))
auto = se.boundary_string_from_masses(convert_to_masses(automatic_solution[word]))
bed = se.boundary_edit_distance(manual, auto)
print('BED of two segmentations of \'{0}\' can be interpreted as \'{1}\'.'.format(word, bed_to_str(bed)))

BED of two segmentations of 'automatic' can be interpreted as '1 near'.


Now, let's compare the automatic and manual syllabifications against each other using BED and B.

In [29]:
title = u'{0: <13}  {1: <15}  {2: <18}  {3: <18}  {4:}'.format('Word', 'Manual Solution', 'Automatic Solution', 'BED', 'B')
print(title)
print(''.join(['-'] * (len(title) + 3)))

all_b = list()

for word in word_order:
    manual_segmentation = convert_to_masses(manual_solution[word])
    manual_segmentation = se.boundary_string_from_masses(manual_segmentation)
    automatic_segmentation = convert_to_masses(automatic_solution[word])
    automatic_segmentation = se.boundary_string_from_masses(automatic_segmentation)

    b = se.boundary_similarity(manual_segmentation, automatic_segmentation, boundary_format=se.BoundaryFormat.sets)
    bed = se.boundary_edit_distance(manual_segmentation, automatic_segmentation)

    all_b.append(b)

    values = {
        'b'      : b,
        'bed'    : bed_to_str(bed),
        'word'   : word,
        'manual' : manual_solution[word],
        'auto'   : automatic_solution[word]
    }

    print(u'{word: <13}  {manual: <15}  {auto: <18}  {bed: <18}  {b:.2f}'.format(**values))

mean_b = sum(all_b) / len(all_b)
std_b = sqrt(sum([pow(b - mean_b, 2) for b in all_b]) / len(all_b))
std_err_b = float(std_b) / sqrt(len(all_b))

print('\nOverall mean B = {0:.4f} +/- {1:.4f}, n={2}'.format(mean_b, std_err_b, len(all_b)))

Word           Manual Solution  Automatic Solution  BED                 B
----------------------------------------------------------------------------
automatic      au·to·ma·tic     au·tom·a·tic        1 near              0.83
segmentation   seg·men·ta·tion  seg·ment·ation      1 miss(es), 1 near  0.50
is             is               is                                      1.00
fun            fun              f·un                1 miss(es)          0.00

Overall mean B = 0.5833 +/- 0.1909, n=4


In conclusion, the overall mean similarity (B) between the hypothetical automatic syllabifier and the manual solution is mediocre, and with only four syllabifications to test, the standard error is large and we cannot measure the similarity with much precision.
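As a rough back-of-the-envelope check of that caveat, the sketch below recomputes the mean and standard error from the rounded B values in the table (so the numbers differ slightly from the exact output above) and adds a normal-approximation 95% interval, which at n=4 spans most of the [0, 1] range:

```python
from __future__ import print_function
from math import sqrt

b_scores = [0.83, 0.50, 1.00, 0.00]  # rounded B values from the table above
n = len(b_scores)
mean_b = sum(b_scores) / n
# Population standard deviation, as in the cell above.
std_b = sqrt(sum((b - mean_b) ** 2 for b in b_scores) / n)
std_err = std_b / sqrt(n)

# Normal-approximation 95% confidence interval (a sketch only; the
# approximation is crude at n=4).
low, high = mean_b - 1.96 * std_err, mean_b + 1.96 * std_err
print('{0:.2f} +/- {1:.2f} -> [{2:.2f}, {3:.2f}]'.format(mean_b, 1.96 * std_err, low, high))
# -> 0.58 +/- 0.37 -> [0.21, 0.96]
```

An interval that wide is consistent with anything from a poor syllabifier to a near-perfect one, which is exactly why more test words would be needed before drawing a firm conclusion.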