To analyze the relationship between young children’s language development and their listening environment, researchers record children's everyday lives and collect statistics on how often different sounds occur. Quantifying sound events requires labeling all of the audio. However, manually labeling days or weeks of audio is expensive and time-consuming. Automatic labeling with machine learning techniques requires a sufficiently large training data set and fine-tuning for each specific application, and is often too unreliable for practical use.


Instead of a fully manual or fully automatic approach, I designed an interactive sound annotation system in which a user and a machine interact with each other. The machine first presents an initial prediction based on a few labeled examples. The user then validates or corrects that prediction as feedback. As the user submits feedback in each round of the interaction loop, the machine's predictions become more accurate.
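
The interaction loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual implementation: the nearest-centroid classifier, the list-of-floats feature vectors, the batch size, and the `ask_user` callback (which stands in for the human validating or correcting a prediction) are all hypothetical choices made for the example.

```python
from math import dist
from statistics import fmean


def fit_centroids(features, labels):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    groups = {}
    for idx, label in labels.items():
        groups.setdefault(label, []).append(features[idx])
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest to the vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


def annotate(features, seed_labels, ask_user, batch_size=2):
    """Run the interaction loop until every segment has a confirmed label.

    ask_user(index, predicted_label) is a hypothetical stand-in for the human:
    it returns the accepted label, which equals predicted_label when the
    prediction is validated and differs when the user corrects it.
    """
    labels = dict(seed_labels)
    corrections_per_round = []
    while len(labels) < len(features):
        # Refit on everything confirmed so far, then predict the next batch.
        centroids = fit_centroids(features, labels)
        pending = [i for i in range(len(features)) if i not in labels][:batch_size]
        corrections = 0
        for i in pending:
            guess = predict(centroids, features[i])
            confirmed = ask_user(i, guess)
            corrections += int(confirmed != guess)
            labels[i] = confirmed  # user feedback becomes training data
        corrections_per_round.append(corrections)
    return labels, corrections_per_round


# Toy data: two-dimensional features for six audio segments (made up).
features = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0], [0.2, 0.2], [0.8, 0.7]]
truth = ["speech", "noise", "speech", "noise", "speech", "noise"]

# Two seed examples start the loop; the simulated user always answers truthfully.
labels, corrections = annotate(
    features, {0: "speech", 1: "noise"}, lambda i, guess: truth[i]
)
```

In this sketch the per-round correction count gives a rough picture of how much work the user still has to do: it should fall as the confirmed labels accumulate, although a real system would need real audio features and a stronger model.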

Development Process

Bongjun Kim interviewed researchers who had labeled recordings of children's everyday lives to learn about their manual annotation process and the difficulties of the annotation task. He analyzed the audio data to understand the characteristics of the sound environment and identified appropriate recognition algorithms. He then designed and implemented the interactive sound annotation system and updated it based on user feedback.