Research Of Singing With Note-name Recognition Algorithm Based On End-to-End Model


Bintao Deng, Shimin Wang, Xiaodan Ren, Haiyang Ni, Ning Wang, Ming Zhao


With the rapid development of artificial intelligence, people are no longer seeking only to meet material needs but are also pursuing happiness and spiritual fulfillment. Singing talent shows in particular have appeared, yet many of their results are controversial. To address the problem of evaluating singing with note-names, and to assess a singer's level in professional solfeggio and ear-training education, this paper uses artificial intelligence algorithms to recognize note-names. First, the singing audio is preprocessed with pre-emphasis, framing, and windowing to reduce the impact of noise, and the acoustic features of the singing sound are extracted. These features are organized into a form suitable for DCNN input, and a DCNN model is designed to obtain the pinyin form of the singing sound. The softmax layer adopts an end-to-end CTC structure, which classifies the input singing speech, finds the corresponding phoneme sequences, and produces the output; the same CTC structure is used to optimize the recognition process, and the state of the obtained features is output at the end. Next, a singing pronunciation dictionary generates candidate sequences from the mapping between phonemes and notes. Finally, based on the acoustic model score, the decoder selects the candidate note sequence with the highest score, which gives the note-name result for the singing sound. To further improve recognition performance, the paper introduces a new CTC-DCNN acoustic model. In this model, residual blocks pass input features to the output through shortcut connections, so that the multi-layer convolutional speech features are preserved as far as possible, while the deep structure allows better extraction and analysis of speech features. The improved CTC-DCNN acoustic model is obtained by optimizing the maxout function.
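As a rough illustration of the preprocessing stage named in the abstract (pre-emphasis, framing, windowing), the sketch below uses only the Python standard library. The pre-emphasis coefficient of 0.97, the Hamming window, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are common speech-processing defaults chosen for illustration; the abstract does not state the values the authors used, and the function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is kept as-is.

    alpha=0.97 is an assumed, conventional value, not taken from the paper.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len : i * hop_len + frame_len] for i in range(count)]


def hamming_window(frame_len):
    """Return Hamming window coefficients (an assumed window choice)."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a mono signal given as a list of floats."""
    window = hamming_window(frame_len)
    emphasized = pre_emphasis(signal, alpha)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]
```

Each windowed frame returned by `preprocess` would then be passed to a feature extractor (for example a spectrogram or filter-bank computation) before being fed to the DCNN; that step is omitted here because the abstract does not specify which acoustic features are used.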
The algorithm proposed in this paper obtains its information in the same way for every singer, and no person takes part at any stage of the scoring process. The scoring results obtained in this way are therefore expected to be more objective.
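The CTC output stage and the dictionary lookup described in the abstract can be sketched as follows. This is a simplified stand-in, not the authors' decoder: it uses greedy best-path CTC decoding (take the most probable label per frame, merge consecutive repeats, drop blanks) instead of scoring multiple candidate sequences. The label inventory `LABELS` and the dictionary `SYLLABLE_TO_DEGREE`, which maps sung syllables to scale degrees, are hypothetical and exist only to make the example runnable.

```python
BLANK = 0  # index of the CTC blank label (assumed to be 0 here)

# Hypothetical label inventory for the acoustic model output.
LABELS = ["<blank>", "do", "re", "mi", "fa", "sol", "la", "si"]

# Hypothetical singing pronunciation dictionary: syllable -> scale degree.
SYLLABLE_TO_DEGREE = {
    "do": 1, "re": 2, "mi": 3, "fa": 4, "sol": 5, "la": 6, "si": 7,
}


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Greedy (best-path) CTC decoding.

    frame_probs is a list of per-frame probability lists, one entry per label.
    Consecutive repeats are merged first, then blank labels are removed, so a
    blank between two identical labels keeps them as two separate outputs.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(index)
        previous = index
    return decoded


def recognize_notes(frame_probs):
    """Decode frame posteriors to syllables, then map them to scale degrees."""
    syllables = [LABELS[i] for i in ctc_greedy_decode(frame_probs)]
    return [SYLLABLE_TO_DEGREE[s] for s in syllables]
```

Given per-frame softmax outputs from an acoustic model, `recognize_notes` returns the sequence of sung scale degrees, which could then be compared against the expected score of the exercise. A beam-search decoder that keeps several candidates, as the abstract's "candidate sequence with the highest score" suggests, would replace `ctc_greedy_decode` in a fuller implementation.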
