Bird Species Classification System Using Transfer Learning

Manual recognition is limited by the observer's expertise and knowledge, which can lead to errors when birds are observed by non-experts. The objective of this research is to create a machine learning algorithm that can accurately classify bird species based on physical features, and to develop a software system that includes the algorithm, allowing for efficient classification of different bird species. This research also evaluates the accuracy of the approach in real-world scenarios. The methodology follows the machine learning life cycle model and the software development life cycle model. The research aims to provide a user-friendly interface that recommends bird species classifications based on uploaded images, ultimately contributing to a more accessible and informative bird identification experience. In this research, the F1-score with fine-tuning is reported as 0.8889. Because it is close to 1, it suggests a well-balanced performance between correctly identifying positive instances (precision) and capturing the relevant positive instances (recall). Based on the result, the proposed system can enhance users' ability to accurately identify and classify various bird species through the utilization of a pre-trained convolutional neural network model.


Introduction
Recognizing the various bird species is essential for ecological research, conservation efforts, and biodiversity monitoring. Manual bird species identification can be challenging, especially when dealing with similar-looking species, leading to errors [1] [2] [3] [4] [5] in ecological studies and conservation initiatives. Various deep learning methods have been explored in recent research. Deep learning models such as Faster Region-based Convolutional Neural Network (Faster R-CNN) and You Only Look Once (YOLO) have shown promising results in bird detection using unmanned aerial vehicle imagery [6]. Therefore, a robust bird species recognition system is developed. The system compiles a comprehensive dataset of wild bird images and employs deep learning methods to identify and categorize different bird species based on their distinct physical characteristics, plumage patterns, and behaviors. By leveraging cutting-edge technology, this system can provide ecologists, conservationists, and especially bird watchers with a reliable tool for accurate image-based bird species identification. Accurate bird species recognition is vital for monitoring population trends, assessing habitat health, and implementing targeted conservation strategies [7] [8]. Moreover, advancements in bird species recognition may contribute to more informed conclusions about bird behaviors, migration patterns, and ecological roles, thereby aiding in the maintenance of ecosystems and the preservation of biodiversity [9]. The application of transfer learning with Convolutional Neural Networks (CNNs) has been recognized as a significant discipline in computer vision research, witnessing tremendous growth due to its efficacy [10].
Transfer learning allows the model to leverage knowledge learned from one task, such as recognizing objects, and apply it to a related but different task, such as identifying bird species [11]. Overall, this research highlights the potential of using advanced technology to improve animal welfare and to contribute to advancements in the field of machine learning and computer vision. Developing this system can help people recognize bird species and use the information for various purposes. Due to the vast number of distinct species and the similarities in their appearance, it is challenging to identify a specific bird species in the wild. Such identification is crucial for various purposes, including biodiversity surveys and disease management. Manual identification methods are prone to errors, particularly for individuals lacking expertise in the field. Firstly, the great similarity in appearance among bird species poses a significant challenge to accurate manual classification. Manual recognition is limited by the observer's expertise and knowledge, which can lead to misidentification and errors [12]. In addition, the accuracy of manual recognition relies heavily on the observer's ability to distinguish fine-grained features, which can be challenging and can lead to misclassifications [13]. Transfer learning has been shown to help generalize across bird monitoring studies with new species, in new ecosystems, and with images of differing resolutions and image capture features [14]. Additionally, transfer learning in the field of image recognition has been tested to evaluate the best model for bird recognition systems [15].
In conclusion, this research not only addresses the pressing issue of bird species recognition but also paves the way for broader applications in the field of bird welfare and conservation. The scope of the wild bird species recognition system research encompasses the development of a machine learning algorithm designed to accurately identify various bird species from images captured in their natural habitats. This endeavor necessitates in-depth research into the distinguishing physical characteristics that set different bird species apart, including plumage patterns, beak shapes, and distinctive markings.

Literature Review

Overview of Bird Species
There are numerous recognized bird species that showcase distinct characteristics and behaviors. Bird species belong to various families and orders within the avian classification. Well-known wild bird species include eagles, hawks, owls, and falcons, which are known for their impressive hunting abilities and keen eyesight. Additionally, there are smaller wild bird species such as finches, robins, and warblers, each with its own unique plumage patterns and songs.
Birds are a diverse group of animals, with over 10,000 species worldwide. They exhibit a wide range of behaviors, including vocalizations, migration, and adaptation to various environments. Research in this field has utilized advanced technologies such as CNNs and Recurrent Neural Networks (RNNs) to classify bird species based on vocalizations and spectrogram analysis [16]. Additionally, research on bird species has extended to ecological and environmental aspects, such as the impact of noise pollution on the song characteristics of certain birds [17]. Furthermore, the analysis of bird vocalizations has led to the development of richly annotated birdsong audio datasets, providing valuable resources for researching bird behavior and communication [18]. Bird species classification has also been explored using citizen science data, where volunteer contributions have been analyzed to understand spatial and temporal patterns in bird data contribution activities [19].
Apart from the widely recognized species, there are also exotic and lesser-known birds inhabiting various regions globally. These include the resplendent quetzal in Central America, the majestic peacock in Asia, and the elusive kakapo in New Zealand. Each bird species possesses its own set of physical attributes, vocalizations, and behavioral tendencies, making birds captivating subjects for species recognition research. In summary, the study of birds and bird species spans a wide range of disciplines, including ecology, computer science, artificial intelligence, and environmental science. The utilization of advanced technologies and methodologies has significantly contributed to our understanding of bird behavior, communication, and adaptation to changing environments.

Artificial Intelligence
Artificial Intelligence (AI) is a rapidly advancing technology that has become increasingly prevalent in various aspects of human life [20]. It encompasses the intelligence displayed by computers, which is achieved through the development of algorithms and models that enable machines to perform tasks that typically require human intelligence [21]. AI can be categorized into weak AI (Narrow AI) and strong AI (General AI), with the latter being designed to learn and display conformity with human procedures, potentially exceeding human capabilities in the form of artificial superintelligence [22]. The application of AI extends to diverse fields such as healthcare, marketing, entrepreneurship, and decision-making in organizations. Furthermore, AI has been implemented in sectors such as public health, where it has been utilized in the detection of diseases like Covid-19 through machine learning algorithms [23] [24]. In conclusion, AI is a multifaceted and rapidly evolving technology with wide-ranging applications across various domains, presenting both opportunities and challenges that necessitate ethical considerations and legal frameworks to ensure responsible and equitable use.

Machine Learning
Machine learning is a part of AI and computer science that focuses on using data and algorithms to mimic human learning processes and progressively increase accuracy. Over the last couple of decades, machine learning-based products such as Netflix's recommendation engine and self-driving cars have been made possible by technological advances in storage and processing power [25]. The core principle of machine learning is to enable systems to learn and adapt from data by iteratively optimizing and adjusting their internal parameters. This process involves training a model on a labeled dataset, where the model learns to generalize patterns and relationships present in the data.
Machine learning includes various techniques such as supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the model learns from labeled examples, in which the input data are paired with corresponding target labels. The goal of this technique is to train the model to classify or predict unknown data correctly. Meanwhile, unsupervised learning reveals hidden structure in unlabeled data. In unsupervised learning, there are no output variables to predict or classify. The objective of this technique is to find patterns or structure in the data based on the relationships between the data points themselves [26]. Reinforcement learning involves trial and error: the model tries different actions and receives feedback on whether each was a good move or a bad move. Reinforcement learning is used in various applications such as text summarization and translation.

Computer Vision
Computer vision is a field of computer science that focuses on giving computers the ability to recognize and comprehend people and objects in pictures and videos. Computer vision, like other forms of AI, aims to carry out and automate tasks that mimic human abilities. In this case, computer vision aims to mimic both how people see and how they perceive what they see [27]. This technique uses input from sensing devices, AI, machine learning, and deep learning to imitate the human vision system. Computer vision makes use of algorithms that have been trained on vast amounts of visual data or cloud-based images. They recognize patterns in visual data and apply those patterns to predict and conclude the content of other images [27]. Computer vision is widely used in many industries, ranging from energy and utilities to manufacturing and automotive. The market is expected to reach USD 48.6 billion by 2022 as it continues to grow [28]. This technique has considerable potential to improve human life with the help of AI.

a. Image Classification
Image classification is a technique in computer vision that works with cameras and AI software to identify objects, places, people, writing, and actions in digital images based on visual data [29]. It is a supervised learning problem in which a model is trained to recognize objects using labelled sample photos. Early computer vision models relied on raw pixel data as the input, but raw pixel data alone is insufficient and too unstable to represent all the possible ways an object could be captured and appear in an image. It is mentioned that content-based filtering is a technique that identifies items by using keywords and attributes on which the user profile is built. This means the data are supplied from the start.

b. Image Segmentation
Image segmentation is a process in which every pixel in an image is assigned a label shared by pixels with similar characteristics. It is primarily used to locate objects and detect boundaries such as lines and curves in the image. The goal of segmentation is to simplify or change the visual representation of an image into a format that is easier for machines to analyse [30].
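The difference between the two outputs can be sketched in a few lines of Python. This is only an illustration: the 4x4 toy image, the brightness threshold, and the function names `classify` and `segment` are hypothetical stand-ins for a trained model, not part of the system described in this paper.

```python
# Toy 4x4 grayscale "image": bright pixels (> 0.5) stand in for an object.
image = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.9, 0.7, 0.1],
    [0.1, 0.1, 0.0, 0.2],
]


def classify(img, threshold=0.5):
    """Image classification: return ONE label for the whole image."""
    has_object = any(p > threshold for row in img for p in row)
    return "object" if has_object else "background"


def segment(img, threshold=0.5):
    """Image segmentation: return one label PER PIXEL (a binary mask)."""
    return [[1 if p > threshold else 0 for p in row] for row in img]


label = classify(image)  # a single string for the entire image
mask = segment(image)    # a 4x4 grid with the same shape as the image
```

Classification collapses the image to a single class, whereas segmentation keeps the spatial layout and marks which pixels belong to the object.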

Deep Learning
Deep learning has spread widely and grown over the last few years as it has been adopted in many industries.
Deep learning has outperformed well-known machine learning techniques in various domains such as cybersecurity and bioinformatics, among many others [33]. Unlike traditional computer vision techniques that rely heavily on handcrafted features and complex algorithms, deep learning leverages the power of Artificial Neural Networks (ANNs) to automatically learn intricate patterns and representations directly from raw visual data [34]. Deep learning learns by discovering and identifying complex patterns in the input data provided for training. The networks can create multiple levels of abstraction to represent data by constructing computational models that are made up of multiple processing layers [35].
ANNs are computational models inspired by the human brain's neural structure [36]. They are widely used in various fields, including computer vision, forecasting, and decision support systems. CNNs are a specific type of ANN that have been particularly successful in computer vision tasks, especially after achieving remarkable results in the ImageNet Large Scale Visual Recognition Competition in 2012 [37]. CNNs are designed to learn spatial hierarchies of features automatically and adaptively through a backpropagation algorithm [38]. Transfer learning is a machine learning technique where a model developed for a particular task is reused as the starting point for a model on a second task [39]. It has been widely used to leverage pre-trained CNNs for various computer vision tasks, especially when the amount of labeled data is limited [36]. CNNs, as a class of ANNs, have been dominant in computer vision tasks since their success in the ImageNet competition [37]. They have been widely used in various applications, including forecasting, decision support systems, and industrial engineering. Transfer learning, on the other hand, has been applied in computer vision tasks to improve the performance of models, especially when dealing with limited labeled data [36]. It has been particularly useful in leveraging pre-trained CNNs for new tasks, thereby reducing the need for large amounts of labeled data. In summary, ANNs, including CNNs, have been widely used in various fields such as computer vision, forecasting, and decision support systems. Transfer learning, particularly in the context of leveraging pre-trained CNNs, has been an effective technique to improve model performance, especially when dealing with limited labeled data. Understanding the strengths and applications of these techniques is crucial for selecting the most suitable approach for specific problem domains.

Methodology
A hybrid approach combining the Machine Learning Life Cycle (MLLC) and the Software Development Life Cycle (SDLC) was chosen for the development of this research. This approach integrates the strengths of both methods to ensure successful development of the bird species classification system. The SDLC serves as a framework that provides a structured, step-by-step approach to effectively completing development tasks, while data collection and the detection of bird species are addressed in parallel using the MLLC.

Software Development Life Cycle
The Software Development Life Cycle (SDLC) is used in the early stages to specify the system's objectives and requirements. The process includes identifying the needs the system must serve, gathering user requirements, and determining any limitations or other factors. The SDLC consists of several phases that are crucial for the development of software products. These phases typically include the requirement phase, design phase, coding phase, testing phase, and maintenance phase, all of which are interconnected in a cyclical manner [40].

Machine Learning Life Cycle
The Machine Learning Life Cycle (MLLC) is an essential component of the SDLC, particularly in the development phase. The MLLC involves several key phases, including gathering data, data preparation, data wrangling, data analysis, model training, testing, and deployment. These phases are crucial for effectively developing and deploying machine learning models within the software development process [41], and they apply directly to a bird species recognition system using transfer learning. The model training stage involves training the pre-trained network on the bird dataset over multiple epochs, allowing it to learn the intricate features essential for accurate classification. Evaluation metrics, such as precision, recall, and F1-score, are then employed to assess the model's performance on a validation set. Finally, successful models are deployed for real-world applications, contributing to the advancement of deep learning over traditional machine learning algorithms in the intricate task of recognizing diverse bird species through image analysis.

Development of VGG16 for Bird Species Classification System
An approach based on the use of a CNN as the architectural foundation for a bird species classification system is proposed in order to achieve the goal of developing a machine learning algorithm that accurately classifies bird species based on physical features. CNNs are ideally suited for this application because of their remarkable performance in image classification tasks. The first step in the algorithm development process is to compile a large dataset of labelled bird images from diverse species. The CNN model is trained using this dataset as the basis. The dataset undergoes preprocessing procedures such as resizing, normalization, and augmentation to increase diversity and strengthen the algorithm's adaptability. During the training phase, the CNN model is optimized and further layers are added as necessary to improve the system's performance. Transfer learning is a method that uses models pre-trained on massive datasets to speed up training and increase classification precision.
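The three preprocessing steps named above can be illustrated with a minimal NumPy sketch. The 224x224 target size (the default VGG16 input size), the nearest-neighbour resizing, the horizontal flip as the only augmentation, and all function names are assumptions made for illustration; the paper does not state which settings or libraries were used.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Resize an H x W x C image with nearest-neighbour sampling."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows][:, cols]


def normalize(img):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return img.astype(np.float32) / 255.0


def augment(img, rng):
    """Randomly flip the image left-to-right (one simple augmentation)."""
    return img[:, ::-1] if rng.random() < 0.5 else img


def preprocess(img, rng, size=(224, 224)):
    """Resize, normalize, then augment a single image."""
    return augment(normalize(resize_nearest(img, *size)), rng)


# A random array stands in for a decoded bird photograph (hypothetical data).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
out = preprocess(raw, rng)
```

In practice, augmentation is applied only to training images, so that validation and test images reflect what the deployed system will receive.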
The CNN model can take advantage of these pre-trained models' insights and generalizations by reusing their knowledge and learned features. Transfer learning allows the CNN model to initialize its weights efficiently and use previously learned information as the starting point for learning from the bird species dataset. This can make the training process faster and improve the model's ability to classify bird species accurately. The bird species classification system's performance can be improved by adjusting model parameters, adding extra layers, fine-tuning the CNN model, and making use of pre-trained knowledge. This approach can save training time, improve performance, and enable accurate bird species classification based on the learned features and generalizations from the pre-trained model. Figure 1 shows the layers in a CNN and Figure 2 shows the pre-trained model.
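A transfer learning setup of this kind might look as follows in Keras. This is a sketch under stated assumptions rather than the configuration used in this research: the framework (TensorFlow/Keras), the head layout (global average pooling, a 256-unit dense layer, dropout of 0.5), the learning rate, and the choice to unfreeze only the last convolutional block during fine-tuning are all hypothetical.

```python
import tensorflow as tf


def build_bird_classifier(num_classes, weights="imagenet", fine_tune=False):
    """VGG16 backbone plus a small classification head (hypothetical layout)."""
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    # Feature extraction: every backbone layer stays frozen by default.
    # Fine-tuning: only the last convolutional block ("block5_*") is unfrozen.
    for layer in base.layers:
        layer.trainable = fine_tune and layer.name.startswith("block5")

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_bird_classifier(num_classes=..., weights="imagenet")` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is useful for checking shapes offline. A common pattern is to train the new head with the backbone frozen first, then rebuild or recompile with `fine_tune=True` and a small learning rate.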

System Architecture
The proposed system leverages transfer learning, a technique in machine learning, to facilitate the automated identification of bird species. The system consists of three main stages: image input, feature extraction using a pre-trained model, and species prediction. Firstly, users upload an image of the unidentified bird. The system then detects the bird in the image and uses the pre-trained model, which relies on the dataset it was trained on, to extract features from the input image. Lastly, the output is the predicted bird species. In addition to the species prediction, the system provides a clickable link on the species name that directs users to a Google web search for more information on the predicted species, which enhances the user experience and provides valuable supplementary information.
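The search link in the last stage can be generated from the predicted species name alone. The sketch below assumes the standard Google search URL pattern with a `q` query parameter; the function name `google_search_url` and the example species are illustrative, not taken from the implemented system.

```python
from urllib.parse import quote_plus


def google_search_url(species_name):
    """Build a Google web-search URL for the predicted species name."""
    # quote_plus escapes spaces and special characters for use in a query string.
    return "https://www.google.com/search?q=" + quote_plus(species_name)


url = google_search_url("Resplendent Quetzal")
```

Escaping the name matters because many species names contain spaces, hyphens, or apostrophes that would otherwise break the link.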

Evaluation for The Trained Model
The accuracy of the bird species classification system will be evaluated after all the models, including the CNN with transfer learning, have been trained. The evaluation process aims to assess the performance and effectiveness of the machine learning models in accurately classifying different bird species based on their physical features. The evaluation process will involve testing the trained models on a separate evaluation dataset that contains real-world images of various bird species. The models will classify each image, and their predictions will be compared against the ground truth labels to measure accuracy. By carrying out a complete evaluation of the machine learning models, this step will determine whether the resulting system is effective in accurately identifying bird species in real-life situations. This evaluation will help to validate the suggested techniques and methods for identifying bird species and serve as a guide for future adjustments and improvements.
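Comparing predictions against ground truth labels reduces to counting matches. The following sketch shows the calculation on five hypothetical held-out images; the labels are invented for illustration and are not results from this research.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for five held-out images.
y_true = ["owl", "eagle", "robin", "finch", "owl"]
y_pred = ["owl", "eagle", "finch", "finch", "owl"]
score = accuracy(y_true, y_pred)  # 4 of the 5 predictions are correct
```

Keeping the evaluation images separate from the training images is what makes this number an estimate of real-world performance rather than a measure of memorization.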

Development Design
This section explains the architecture, flowchart, and interface design of the Bird Species Classification System. It describes the development flow of the system and gives more insight into how the system applies machine learning to classify the images uploaded by users.

Model Training
The classification results provided to the user will include the predicted bird species along with a confidence score or probability associated with the prediction. This score indicates the model's level of certainty regarding the classification. The user will be able to view the classification results directly on the website interface, allowing for a seamless and user-friendly experience.
Figure 5 shows the flowchart of the model training.

Web Development
The integration of the VGG16 model for transfer learning involved incorporating the pre-trained model into the system and fine-tuning it on a dataset of bird images. This process allows the model to learn the specific features and characteristics of different bird species, enabling it to accurately classify bird images based on their visual attributes. An evaluation was conducted to assess the performance of the machine learning model. An image is uploaded to test the model, and the VGG16 model analyzes the image and gives the prediction. The accuracy of the classification process refers to the percentage of predictions that are correct.
The home page contains the image upload feature. The background of the webpage is an mp4 video of a forest. There is also a top bar on the home page. Figures 6 and 7 show the page interface. On this page, users insert an image through the white box, and the system predicts the bird species of the uploaded image based on the trained model.
Figure 8 shows the bird species classification page interface. The output shows the name of the predicted species and the probability percentage. A clickable link leading to a Google search for the bird species appears under the species title. A button is also provided to give the user the second and third predictions, without a probability percentage.
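The probability shown with the top prediction, and the second and third predictions behind the button, can both be derived from the model's output vector. The sketch below assumes the model ends in a softmax layer and uses hypothetical class names and scores; the helper names `softmax` and `top_k` are illustrative.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, class_names, k=3):
    """Return the k most probable (name, probability) pairs, best first."""
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


class_names = ["owl", "eagle", "robin", "finch"]  # hypothetical classes
logits = [2.0, 0.5, 1.0, -1.0]                    # hypothetical model scores
probs = softmax(logits)
best, *alternatives = top_k(probs, class_names)

headline = f"{best[0]} ({best[1]:.1%})"            # top prediction with percentage
other_names = [name for name, _ in alternatives]   # 2nd and 3rd, names only
```

Here `best` carries the species name and the probability displayed on the page, while `other_names` holds the two runner-up species that are shown without percentages.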

c. About Page
The about page contains an explanation of the website and how to use it to identify bird species. There is also a contact form for users to get in touch with the developer team and give feedback on the system. Figure 9 shows the "About" page interface.

Accuracy and Loss
The overall accuracy achieved by this system was 89.66%, obtained after 50 epochs of training. The training accuracy fluctuated between 14.5% and 78.2%, while the validation accuracy ranged from 21.9% to 81.2%. This accuracy metric is essential as it provides a quantitative measure of the system's performance in correctly classifying bird species. The training process took approximately 1 hour and 26 minutes. These findings provide insights into the model's performance.
Figure 10 shows the accuracy and loss graph of the model.

Confusion Matrix
A confusion matrix is a fundamental tool for evaluating machine learning models, particularly in classification tasks, offering a comprehensive and insightful overview of their performance. It provides a detailed breakdown of predictions compared to actual class labels, categorizing results into true positives, true negatives, false positives, and false negatives. Figure 11 shows the confusion matrix. In this research, the confusion matrix is plotted only for the first 10 classes in the dataset to summarize the overall accuracy. Almost all labels were predicted correctly; a few images were misclassified, as the accuracy is not 100%.
The F1-score is calculated using Equation 1.

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{1}$$

The F1-score ranges from 0 to 1, where a higher score indicates better overall model performance. It is especially beneficial when there is an uneven distribution between the positive and negative classes.
Figure 13 shows the F1-score. In this research, the F1-score with fine-tuning is reported as 0.8889. It is close to 1, which suggests a well-balanced performance between correctly identifying positive instances (precision) and capturing the relevant positive instances (recall).
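For a multi-class problem, precision, recall, and F1-score are usually computed per class from the confusion matrix and then averaged. The sketch below uses a small hypothetical three-class example and macro-averaging; the averaging method is an assumption, since the paper does not state which one was used, and the labels are invented for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_scores(matrix):
    """Return (precision, recall, f1) for each class of a confusion matrix."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as i, actually another class
        fn = sum(matrix[i]) - tp                 # actually i, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_average(scores):
    """Unweighted mean of each metric over all classes."""
    n = len(scores)
    return tuple(sum(s[j] for s in scores) / n for j in range(3))


# Hypothetical ground truth and predictions for eight images.
labels = ["owl", "eagle", "robin"]
y_true = ["owl", "owl", "owl", "eagle", "eagle", "robin", "robin", "robin"]
y_pred = ["owl", "owl", "eagle", "eagle", "eagle", "robin", "owl", "robin"]

cm = confusion_matrix(y_true, y_pred, labels)
macro_p, macro_r, macro_f1 = macro_average(per_class_scores(cm))
```

With macro-averaging, the reported F1-score is the mean of the per-class F1-scores, so it does not have to equal the value obtained by inserting the averaged precision and averaged recall into the F1 formula.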

Conclusion
The evaluation of the machine learning model proposed for bird species classification, specifically with fine-tuning, demonstrated promising results. The precision score, the proportion of positive identifications that were actually correct, is 0.9180, while the recall score, the proportion of actual positives that were identified correctly, is 0.8900. A high precision means that when the model predicts something as positive, it is likely to be correct, while a high recall means that the model is good at detecting all the positive cases. The model achieved an accuracy of 91.80%, reflecting its proficiency in accurately classifying bird species based on the images provided. Meanwhile, the recall, with a value of 89.66%, gauges the model's ability to capture and correctly predict all instances of the actual positive class.
Precision and recall metrics further revealed a well-balanced performance, resulting in an F1-score of 0.8889. This score signifies the model's ability to effectively identify positive instances while capturing a substantial proportion of the actual positive class. These findings collectively underscore the reliability and suitability of the fine-tuned model for the intricate task of bird species classification through image uploads. The recommendations outlined for future work aim to address these limitations and enhance the bird species classification system so that it can become a pivotal resource for bird classification and for research in bird-related fields.
Several recommendations for future work address the limitations and further enhance the image classification system:
• Continuous validation and refinement of the classification algorithms are essential to ensure the system's accuracy and reliability across different backgrounds and conditions.
• Collaboration with ornithologists and bird experts can provide valuable insights for refining the system and expanding its applicability in real-world bird classification scenarios.
• The training dataset needs to be updated and expanded with more bird species to ensure the system remains adaptable and inclusive of various bird species.

Figure 5. Flowchart of Bird Species Classification Model Training

Figure 8. Bird Species Classification Page Interface

Figure 10. Accuracy and loss for model's performance

Figure 11. Confusion Matrix for Model's Performance

Table 1. Comparison between Image Classification and Image Segmentation [32]

Image segmentation does not process the whole image. This technique is widely used in the healthcare field for medical imaging analysis. Given its ability to segment objects, this technique is well suited to diagnosing various types of diseases.

c. Comparison between Image Classification and Segmentation
Segmentation and classification are clearly different to some extent. One difference between these computer vision techniques is that classification is easier than segmentation because, in a single image, all objects are categorized in a single class. Meanwhile, in segmentation, every object of a single class in an image is highlighted with different shades to make the objects easily recognizable by computer vision [31]. Image segmentation is a process of dividing an image into meaningful regions, while image classification is a process of labelling the entire image with a class according to the images the model has been trained on [32]. The two techniques have different purposes and use different approaches, which means each requires specific and different types of training data to accomplish successful model training. Table 1 compares classification and segmentation on various aspects.

Table 2 compares the CNN, ANN and Transfer Learning.

Table 2. Comparison between CNN, ANN and Transfer Learning