| A computer-implemented method for voice activity detection (VAD) is provided. In this method, multiple first features of an input utterance are extracted by multiple feature extractors, each extractor producing one of the first features, and a pre-trained classifier determines, based on the multiple first features, whether the input utterance corresponds to a target object. The feature extractors are trained on respective training sets, each training set corresponding to a different scenario. A system and a computer program product using this method are also provided. |
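
The data flow described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: `energy_extractor` and `zero_crossing_extractor` are hypothetical hand-written stand-ins for extractors that would actually be trained on different scenarios, `classify` is a hypothetical logistic model with fixed weights standing in for the pre-trained classifier, and concatenating the first features before classification is an assumption, since the abstract does not say how they are combined.

```python
import math
from typing import Callable, List, Sequence

Feature = List[float]
Extractor = Callable[[Sequence[float]], Feature]


def energy_extractor(samples: Sequence[float]) -> Feature:
    """Hypothetical stand-in for an extractor trained on one scenario."""
    if not samples:
        return [0.0]
    # Mean squared amplitude of the utterance.
    return [sum(s * s for s in samples) / len(samples)]


def zero_crossing_extractor(samples: Sequence[float]) -> Feature:
    """Hypothetical stand-in for an extractor trained on another scenario."""
    if len(samples) < 2:
        return [0.0]
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return [crossings / (len(samples) - 1)]


def extract_first_features(
    samples: Sequence[float], extractors: Sequence[Extractor]
) -> List[Feature]:
    """Run every extractor on the same utterance, one feature per extractor."""
    return [extractor(samples) for extractor in extractors]


def classify(
    features: Sequence[Feature],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Hypothetical classifier: logistic score over the concatenated features."""
    # Assumption: the first features are combined by simple concatenation.
    flat = [value for feature in features for value in feature]
    if len(flat) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = sum(w * x for w, x in zip(weights, flat)) + bias
    score = 1.0 / (1.0 + math.exp(-z))
    return score >= threshold


# Illustrative values only; real weights would come from training.
EXTRACTORS = [energy_extractor, zero_crossing_extractor]
WEIGHTS = [8.0, -1.0]
BIAS = -0.5

utterance = [0.5, -0.5, 0.5, -0.5]
first_features = extract_first_features(utterance, EXTRACTORS)
is_target = classify(first_features, WEIGHTS, BIAS)
```

Keeping the extractors behind a common callable type means an extractor for a further scenario could be appended to `EXTRACTORS` without changing the classification step, provided the classifier weights are extended to match.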