Sentence boundary detection without speech recognition: A case of an under-resourced language

Jamil N., Ramli M.I., Seman N.

Abstract

Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends. Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both. Although the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component. This is a constraint for under-resourced languages such as Malay. This paper describes SBD for spontaneous spoken Malay audio. Experiments are conducted on a forty-two-minute question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and 4 female speakers. The speech data are first classified into speech and non-speech segments, and only the non-speech segments are further tested as sentence boundary candidates. Seven prosodic features, rate-of-speech and volume are then extracted from the boundary candidates for classification. Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate. For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.
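
The candidate-selection step described in the abstract (keep only non-speech segments as possible sentence boundaries) can be illustrated with a minimal sketch. The abstract does not say how speech and non-speech are separated, so the short-time energy threshold used here is an assumption, as are the function names and the parameters frame_len, energy_threshold and min_frames. The sketch stops at candidate selection and does not cover feature extraction or the AdaBoost classifier.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def boundary_candidates(
    samples: List[float],
    frame_len: int,
    energy_threshold: float,  # assumed: frames below this count as non-speech
    min_frames: int,          # assumed: shortest pause kept as a candidate
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) runs of non-speech frames, end exclusive."""
    energies = frame_energies(samples, frame_len)
    candidates = []
    run_start = None
    # The infinite sentinel closes a low-energy run that reaches the end of the audio.
    for idx, energy in enumerate(energies + [float("inf")]):
        if energy < energy_threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_frames:
                candidates.append((run_start, idx))
            run_start = None
    return candidates
```

Each returned frame range would then be passed on to feature extraction and classification. In practice the threshold and minimum pause length would need tuning on the recordings.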
