Audio Classification

Data Collection

Data is the first and most important ingredient for training any machine learning model. We gathered about 3500 audio samples of Indian birds from various websites and datasets, covering about 23 bird species found predominantly in Delhi. The data was then cleaned and converted into 10-second clips.
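The clip-cutting step can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: the function name `split_into_clips`, the zero-padding of the final clip, and the toy sample rate are all assumptions, since the text only states that the clips are 10 seconds long.

```python
def split_into_clips(samples, sample_rate, clip_seconds=10.0, pad_value=0.0):
    """Cut a mono recording (a sequence of samples) into fixed-length clips.

    The final clip is right-padded with pad_value so that every clip has
    the same length. Padding is an assumption made for this sketch; short
    tails could just as well be discarded.
    """
    clip_len = int(round(sample_rate * clip_seconds))
    if clip_len <= 0:
        raise ValueError("clip length must be positive")

    clips = []
    for start in range(0, len(samples), clip_len):
        clip = list(samples[start:start + clip_len])
        # Pad the last, shorter clip up to the common length.
        clip.extend([pad_value] * (clip_len - len(clip)))
        clips.append(clip)
    return clips


# Toy usage: a 25-second recording at a (hypothetical) 100 Hz sample rate.
recording = [1.0] * (25 * 100)
clips = split_into_clips(recording, sample_rate=100)
```

With these toy values the recording yields three clips of 1000 samples each, the last one half filled with padding.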

Detection / Classification

We first converted each sound sample into a mel spectrogram. A CNN-based classifier was then trained on a 70/30 data split to classify bird species. A similar model was trained for bird sound detection on the BADC 2017 dataset. Apart from the CNN, we also tried some feature-mapping techniques combined with an SVM or linear classifier.
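To make the mel spectrogram step more concrete, here is a rough sketch of the mel mapping applied to a single power-spectrum frame. It is not the project's code: it uses one common (HTK-style) mel formula, omits the short-time Fourier transform that would produce each frame, and all parameter values (40 mel bands, 1024-point FFT, 22050 Hz) are assumptions. In practice a library routine such as `librosa.feature.melspectrogram` would normally be used instead.

```python
import math


def hz_to_mel(hz):
    """HTK-style mel scale; other variants exist."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    n_bins = n_fft // 2 + 1
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_mels + 1))
                for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mel_frame_db(bank, power_frame, floor=1e-10):
    """Project one power-spectrum frame onto the mel bands, in decibels."""
    energies = [sum(w * p for w, p in zip(row, power_frame)) for row in bank]
    return [10.0 * math.log10(max(e, floor)) for e in energies]


# Toy usage with assumed parameters and a flat power spectrum.
bank = mel_filterbank(n_mels=40, n_fft=1024, sample_rate=22050)
frame_db = mel_frame_db(bank, [1.0] * (1024 // 2 + 1))
```

Stacking one such vector per time frame gives the two-dimensional mel spectrogram image that the CNN takes as input.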

Model Flowchart

For the final model we settled on a small 3-layer CNN designed to be invariant along the temporal axis of the spectrogram image, since the position of the bird sound within a clip should not affect the classification label. Adding further layers led to overfitting.
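The text does not say how the temporal invariance is obtained. One common way, shown here purely as an assumption, is to follow the convolutional layers with a global max pool over the time axis, so that only the strength of a detected pattern is kept and not its position. The toy sketch below uses a single hand-written convolution on a tiny integer "spectrogram"; the kernel, the sizes, and the helper names are hypothetical and stand in for a real 3-layer network.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation with 'valid' padding; image[freq][time]."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def global_max_over_time(feature_map):
    """Collapse the time axis, keeping one value per frequency row."""
    return [max(row) for row in feature_map]


def make_spectrogram(n_freq, n_time, onset):
    """Silent toy spectrogram with one small 'call' starting at onset."""
    spec = [[0] * n_time for _ in range(n_freq)]
    spec[1][onset], spec[1][onset + 1] = 3, 2
    spec[2][onset], spec[2][onset + 1] = 1, 4
    return spec


KERNEL = [[1, -1], [0, 2]]  # arbitrary illustrative filter


def time_invariant_features(spec):
    return global_max_over_time(relu(conv2d_valid(spec, KERNEL)))


# The same call placed early or late in the clip gives identical features.
early = time_invariant_features(make_spectrogram(4, 12, onset=2))
late = time_invariant_features(make_spectrogram(4, 12, onset=7))
```

Because the pooled features are identical for both onsets, any classifier placed on top of them assigns the same label regardless of where in the clip the call occurs.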

Results

Detection Test Accuracy (on BADC dataset): 86%

The main problem in classification was the absence of clean and accurate data of the kind available in BADC. We had to choose a trade-off between accuracy and the number of bird classes the app would be able to predict.

Classification Test Accuracy on 16-bird model: 69%

Classification Test Accuracy on 11-bird model: 77%

This work is still in the testing stage and is limited by the amount of data we have. We believe it can serve as a base model for further research and improvements.

Spectrograms of Some Common Birds