Dataset Preparation
The training dataset was prepared by scraping images with a Python script from the Oriental Image Database and The IBC Bird Collection (15,436 images across 35 classes). Bounding boxes for the training dataset were then annotated using RectLabel. The data was split into training, validation, and test sets (70:20:10) so that the models could be evaluated at a later stage.
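A 70:20:10 split could be sketched as follows. This is only an illustration: the write-up does not say whether the split was stratified by class, so the per-class shuffling, the `stratified_split` name, the `(path, label)` sample format, and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(70, 20, 10), seed=42):
    """Split (path, label) pairs into train/val/test sets, class by class.

    `ratios` are integer parts (70:20:10 by default). Splitting each class
    separately keeps the class balance similar across the three sets.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    total = sum(ratios)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in sorted(by_class.items()):
        rng.shuffle(paths)
        n_train = len(paths) * ratios[0] // total
        n_val = len(paths) * ratios[1] // total
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to the test set.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits
```

Integer arithmetic is used for the split sizes to avoid floating-point rounding surprises; any rounding remainder ends up in the test set.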
Image Detection Model Training
For detection, we tried several pre-trained models, from MobileNet Single Shot Detectors (SSD) to Mask R-CNN. We fine-tuned these models and evaluated the detection accuracy and the IoU score of their predicted bounding boxes against the bounding boxes prepared with RectLabel. The best results came from the MobileNet SSD architecture with weights pre-trained on the COCO dataset. After fine-tuning, around 94.2% of the birds in the test images were correctly detected with a confidence value > 0.5. The mean IoU of the detected birds was 0.647 on the test dataset, so we finalised this model as our detection model.
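The two detection metrics above can be illustrated with a minimal sketch. It assumes one annotated bird per image, boxes given as `(x_min, y_min, x_max, y_max)`, and that a bird counts as detected when a prediction exists with confidence above 0.5. The function names and the record format are hypothetical, not taken from the project code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate_detections(records, conf_threshold=0.5):
    """Return (detection_rate, mean_iou) over the given records.

    Each record is (ground_truth_box, predicted_box_or_None, confidence).
    The mean IoU is taken over detected birds only.
    """
    ious = [
        iou(gt, pred)
        for gt, pred, conf in records
        if pred is not None and conf > conf_threshold
    ]
    detection_rate = len(ious) / len(records) if records else 0.0
    mean_iou = sum(ious) / len(ious) if ious else 0.0
    return detection_rate, mean_iou
```

Averaging IoU only over detected birds matches the wording "mean IoU of the detected birds"; missed birds lower the detection rate instead.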

Classification Model Training
We then cropped the bounding boxes from the training images for classification. Several pre-trained networks, from VGG to ResNet-50, were tried, and we finally chose the ResNet-50 architecture. A transfer learning approach was followed for each model. For the initial trials, ImageNet weights were loaded and the initial layers were frozen. Only the final dense layers after the global average pooling layer were trained at each stage. Augmentation was applied to each image sample. For the dense head, different combinations of the number of layers and the number of nodes were tried, and the validation accuracy was recorded for each. With these approaches, however, we could not get beyond an F1 score of 0.65 and 70% accuracy. We therefore started experimenting with the number of frozen layers. Unfreezing more layers gave a better F1 score, at the cost of longer training time. In our final model, we froze the first 150 layers of ResNet-50 and trained only the last 25 layers plus the dense layers. With this configuration, we achieved 90% accuracy on the test dataset. Unfreezing beyond this point made the test accuracy start decreasing, signalling that the model had overfit given the limited number of image samples and the large number of trainable parameters.
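The freezing scheme of the final model can be sketched without any framework. The write-up does not name the deep learning library used, so the `Layer` class below is a hypothetical stand-in. In Keras-style APIs the same loop would run over `model.layers`, whose items expose a `trainable` flag. The backbone size of 175 layers simply follows from the 150 frozen plus 25 trainable layers described above.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for a framework layer; only the `trainable` flag matters here."""
    name: str
    trainable: bool = True


def freeze_first_n(layers, n_frozen):
    """Freeze the first `n_frozen` layers and leave the rest trainable.

    Returns the number of layers that remain trainable.
    """
    for index, layer in enumerate(layers):
        layer.trainable = index >= n_frozen
    return sum(layer.trainable for layer in layers)


# Hypothetical backbone with 175 layers: freeze 150, fine-tune the last 25.
backbone = [Layer(f"layer_{i}") for i in range(175)]
n_trainable = freeze_first_n(backbone, n_frozen=150)
```

Treating `n_frozen` as a tunable value makes it easy to repeat the experiment described above: lower it step by step and watch validation accuracy against training time.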
Results:
- Training Accuracy: 95.9%, Training F1 Score: 0.957
- Validation Accuracy: 90.1%, Validation F1 Score: 0.886
- Test Accuracy: 90.0%, Test F1 Score: 0.897
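For reference, accuracy and F1 can be computed from true and predicted labels as in the sketch below. The write-up does not state how the overall F1 score was averaged across the 35 classes, so the macro average (unweighted mean of per-class F1) shown here is an assumption, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if y_true else 0.0


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) per class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

With imbalanced classes, a macro average weights rare and common species equally, which is one reason the overall F1 can differ from the accuracy.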
| F1 score (Classification) |
| 0.8936170212765957 |
| 0.8947368421052632 |
| 0.9534883720930233 |
| 0.9448818897637795 |
| 0.8 |
| 0.9051094890510949 |
| 0.9 |
| 0.8450704225352113 |
| 0.9174311926605505 |
| 0.875 |
| 0.9354838709677419 |
| 0.9007633587786259 |
| 0.8918918918918919 |
| 0.9 |
| 0.7272727272727273 |
| 0.8990825688073395 |
| 0.9009009009009009 |
| 0.868421052631579 |
| 0.8985507246376812 |
| 0.875 |
| 0.905982905982906 |
| 0.8767123287671232 |
| 0.9333333333333333 |
| 0.9421487603305785 |
| 0.926829268292683 |
| 0.9357798165137615 |
| 0.9108910891089109 |
| 0.8695652173913043 |
| 0.9647058823529412 |
| 0.9541284403669725 |
| 0.8888888888888888 |
| 0.8620689655172413 |
| 0.8163265306122449 |
| 0.9565217391304348 |
| 0.9583333333333334 |
| Bird Class | Counts |
| Brahminy_maina | 234 |
| Bulbul | 593 |
| Collared_dove | 439 |
| Common_myna | 631 |
| Common_sparrow | 442 |
| Coppersmith | 729 |
| Crow_pheasant | 310 |
| Drongo | 372 |
| Golden_backed_woodpecker | 512 |
| Green Barbet | 314 |
| Hoopoe | 315 |
| House_Crow | 675 |
| Indian_hornbill | 392 |
| Indian_robin | 300 |
| Jungle_Crow | 306 |
| Jungle_babbler | 553 |
| Koel | 595 |
| Little_green_beeeater | 378 |
| Magpie_robin | 342 |
| Owlet | 333 |
| Parakeet | 563 |
| Pariah_kite | 342 |
| Partridge | 413 |
| Peacock | 639 |
| Pied_myna | 189 |
| Pied_wagtail | 580 |
| Pigeon | 505 |
| Pond_heron | 311 |
| Red_wattled_lapwing | 421 |
| Rufous_backed_shrike | 541 |
| Shikra | 573 |
| Sunbird | 517 |
| Tailor_bird | 430 |
| White_breasted_kingfisher | 358 |
| White_breasted_water_hen | 488 |