This project uses computer vision to classify wildlife species in camera trap images. The dataset is provided by the Wild Chimpanzee Foundation and the Max Planck Institute for Evolutionary Anthropology and consists of images captured in Taï National Park.
The task is to develop a machine learning model capable of classifying images into one of the following categories:
- antelope_duiker
- bird
- blank
- civet_genet
- hog
- leopard
- monkey_prosimian
- rodent
Each image either contains one of these animal species or has no animals (blank). The goal is to build a model that predicts the probability distribution across these classes.
The competition is set up as a multi-class classification problem: every camera trap image carries exactly one label, either a species group or blank. The added difficulty is that images are captured under varying environmental conditions, and the model must generalize to sites within the park that it never saw during training.
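Because the model is judged on sites it has not seen, it can help to hold out whole sites for validation rather than splitting images at random. The following is a minimal sketch of that idea, assuming the image metadata is loaded into a pandas DataFrame with a `site` column; the helper name `split_by_site`, the toy ids, and the site codes are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_site(features: pd.DataFrame, test_size: float = 0.25, seed: int = 0):
    """Split rows into train/validation parts so that no site appears in both."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    # groups= makes the splitter assign whole sites to one side or the other.
    train_idx, val_idx = next(splitter.split(features, groups=features["site"]))
    return features.iloc[train_idx], features.iloc[val_idx]


# Toy stand-in for the real metadata (hypothetical ids and site codes).
features = pd.DataFrame(
    {
        "id": [f"img{i}" for i in range(8)],
        "site": ["S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4"],
    }
)
train_part, val_part = split_by_site(features)
shared_sites = set(train_part["site"]) & set(val_part["site"])
print(f"sites shared between the two parts: {len(shared_sites)}")
```

A validation score measured this way is usually a closer proxy for performance on new sites than a score from a random image-level split.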
The dataset consists of images taken by camera traps at different sites in Taï National Park. The training and test sets are split by site, so no site appears in both.
- train_features: Images of wildlife and blank scenes.
- train_labels: Corresponding labels for each image in train_features, with one-hot encoded values indicating the species class.
- test_features: Images without labels, which are to be predicted by the model.

Each image record includes the following fields:

- id: Unique identifier for each image.
- filepath: Path to the image file.
- site: The site where the image was captured.
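As an illustration of how these pieces fit together, the sketch below joins the metadata with the one-hot labels and recovers a single class name per image. It assumes the metadata and labels are stored as CSV tables keyed by `id`, with one label column per class; the in-memory CSV text and the file paths in it are made up for the example.

```python
import io

import pandas as pd

CLASSES = [
    "antelope_duiker", "bird", "blank", "civet_genet",
    "hog", "leopard", "monkey_prosimian", "rodent",
]

# In-memory stand-ins for the real tables (assumed layout, invented rows).
features_csv = io.StringIO(
    "id,filepath,site\n"
    "img0,train_features/img0.jpg,S1\n"
    "img1,train_features/img1.jpg,S2\n"
)
labels_csv = io.StringIO(
    "id," + ",".join(CLASSES) + "\n"
    "img0,0,1,0,0,0,0,0,0\n"
    "img1,0,0,1,0,0,0,0,0\n"
)

features = pd.read_csv(features_csv, index_col="id")
labels = pd.read_csv(labels_csv, index_col="id")

# Each label row contains exactly one 1, so idxmax over the columns
# returns the name of the class that image belongs to.
features["label"] = labels[CLASSES].idxmax(axis=1)
print(features[["site", "label"]])
```

With real files, the two `io.StringIO` objects would be replaced by the paths to the metadata and label tables.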
The model is a Convolutional Neural Network (CNN) built with transfer learning: it starts from a pre-trained network (e.g., ResNet50 or EfficientNet) and is fine-tuned on the provided dataset to predict either a species class or blank.
- Image augmentation techniques such as rotation, flipping, and color jittering are applied to improve model generalization.
- Images are resized to a consistent shape for input into the CNN.
The model's performance is evaluated using log loss (cross-entropy loss), which is a measure of how well the predicted probabilities match the actual labels. The lower the log loss, the better the model's performance.
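To make the metric concrete, here is a small NumPy sketch of multi-class log loss: the mean negative log of the probability assigned to the true class. The function name `multiclass_log_loss` and the clipping constant are choices made for this example, and the competition's official scorer may differ in such details.

```python
import numpy as np


def multiclass_log_loss(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class.

    y_true: one-hot array of shape (n_samples, n_classes)
    y_prob: predicted probabilities with the same shape
    """
    # Clip to avoid log(0), then renormalize so each row sums to 1 again.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    y_prob = y_prob / y_prob.sum(axis=1, keepdims=True)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


y_true = np.array([[1, 0, 0], [0, 1, 0]])
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
uniform = np.full((2, 3), 1 / 3)

loss_confident = multiclass_log_loss(y_true, confident)  # about 0.164
loss_uniform = multiclass_log_loss(y_true, uniform)      # ln(3), about 1.099
print(loss_confident, loss_uniform)
```

Confident correct predictions score close to 0, while a uniform guess over K classes scores ln(K), which is a useful baseline to compare against.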
The following libraries and tools are required to run the code:
- Python 3.x
- TensorFlow or PyTorch (for model development)
- Keras (for deep learning)
- scikit-learn (for data preprocessing and evaluation metrics)
- OpenCV (for image processing)
- Pandas (for data handling)
- Clone this repository:
git clone https://github.com/RamamAgarwal/conser-vision.git && cd conser-vision
- Install the required dependencies:
pip install -r requirements.txt