A method and an apparatus for multi-speaker singing voice synthesis are provided. In the method, a music score file represented by symbols is parsed to extract multiple lyrics and multiple notes. Audio data of each of the notes is loaded from an audio database. A vocoder is used to perform acoustic modeling on the audio data, adjusting the audio data of each note and concatenating the adjusted audio data to generate singing voice data. A generator of a voice conversion model is used to convert multiple acoustic features in the singing voice data into output features conditioned on target attributes, and the voice conversion model is trained according to multiple types of losses of the voice conversion model to obtain synthetic singing voice data with optimized output features.
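
For illustration only, the score-parsing step might be sketched as follows. The token layout `NOTE:BEATS:LYRIC`, the `Note` record, and the helper names `parse_score`, `note_to_midi`, and `midi_to_hz` are assumptions made for this example and are not taken from the disclosure; an actual symbolic score format would carry more information.

```python
import re
from typing import List, NamedTuple

# Semitone offsets of the natural notes within one octave.
_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_NOTE_RE = re.compile(r"^([A-G])([#b]?)(-?\d+)$")


class Note(NamedTuple):
    """One parsed note (hypothetical record, not from the disclosure)."""
    name: str     # scientific pitch name, e.g. "C4"
    midi: int     # MIDI note number, usable as a key into an audio database
    beats: float  # duration in beats
    lyric: str    # syllable sung on this note


def note_to_midi(name: str) -> int:
    """Convert a pitch name such as 'A4' or 'F#3' to a MIDI note number."""
    match = _NOTE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = match.groups()
    shift = {"#": 1, "b": -1, "": 0}[accidental]
    # MIDI convention: C4 (middle C) is note 60, so octave -1 starts at 0.
    return 12 * (int(octave) + 1) + _SEMITONES[letter] + shift


def midi_to_hz(midi: int) -> float:
    """Equal-temperament frequency with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def parse_score(text: str) -> List[Note]:
    """Parse whitespace-separated 'NOTE:BEATS:LYRIC' tokens (assumed format)."""
    notes = []
    for token in text.split():
        name, beats, lyric = token.split(":", 2)
        notes.append(Note(name, note_to_midi(name), float(beats), lyric))
    return notes


score = parse_score("C4:1:twin G4:1:kle A4:2:star")
lyrics = [n.lyric for n in score]
pitches = [n.midi for n in score]
```

Here `lyrics` and `pitches` correspond to the extracted lyrics and notes; the later stages (audio lookup, vocoder adjustment, and voice conversion) are not modeled in this sketch.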