SCM/TCM Code and Annotations for PanAfrican2019 Videos
------------------------------------------------------

(associated with the paper: X Yang, M Mirmehdi, T Burghardt. 'Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending.' Computer Vision for Wildlife Conservation (CVWC) Workshop at the IEEE International Conference on Computer Vision (ICCVW), October 2019.)

CONTENTS:
This public and freely available dataset contains a PyTorch code section for the SCM and TCM components described in the above paper, a cleaned annotation dataset for video object detection in the PanAfrican2019 videos, and the detailed split of the videos into training/validation/test portions.

ANNOTATION SET:
This is a cleaned dataset compared to the one originally used, leading to the slightly improved performance results given below:

                    ResultsA   ResultsB   ResultsC
  Backbone Res50
    baseline         80.79      82.88      84.41
    +TCM             90.02      90.11      90.31
    +SCM+TCM         90.81      90.57      91.22
  Backbone Res101
    baseline         85.25      86.02      87.23
    +SCM+TCM         90.21      89.44      91.07

ResultsA - pretrained on ImageNet VID and fine-tuned on the uncleaned annotation set (paper)
ResultsB - trained on this annotation set
ResultsC - pretrained on ImageNet VID and fine-tuned on this annotation set

DATA:
The code sections are provided in a file called context_block.py. Users are free to utilise the SCM and TCM components under the Non-Commercial Government Licence for public sector information. All other code components of the network described in the paper are available as adaptations and combinations of other sources. The annotation dataset contains bounding box annotations and the animal species tags (chimp/gorilla) found within the 500 mp4 camera trap videos (identified by 10-character filenames plus the .mp4 extension) in the PanAfrican2019 dataset provided by the Pan African Programme. The data is split into 3 subsets: 85% for training, 5% for validation and 10% for testing.
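Applied to the 500 videos, the 85/5/10 percentages correspond to 425/25/50 videos per subset. A minimal sketch of that arithmetic (illustration only; the authoritative membership of each subset is fixed by the split files provided with the dataset):

```python
def split_counts(n_videos=500, fractions=(0.85, 0.05, 0.10)):
    """Map the stated split percentages onto the video count."""
    counts = [round(n_videos * f) for f in fractions]
    # sanity check: every video lands in exactly one subset
    assert sum(counts) == n_videos
    return counts  # [training, validation, test]

print(split_counts())  # [425, 25, 50]
```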
Txt files are provided to specify the exact split. Almost all video files contain 360 frames (15 seconds at 24 frames per second); thus, the dataset comprises approx. 180,000 frames in total. Users are free to utilise these annotations under the Non-Commercial Government Licence for public sector information.

DETAILED FILE FORMATS:
(CODE) For the PyTorch code, a single context_block.py file is provided specifying the SCM and TCM layers fully.
(SPLITS) The files trainingdata.txt, validationdata.txt and testdata.txt list all video file names for each of the 3 subsets. The file alldata.txt contains all file names of the dataset.
(ANNOTATIONS) The annotation data is contained in the annotations.zip archive. Once unzipped, there is a folder per video file that contains one XML file per frame. Each XML file has a name of the form VIDEOFILENAME_frame_FRAMENUMBER, starting with FRAMENUMBER=1. Each XML file contains information about each of the animal bounding boxes of that frame. As an example, the values contained in one such file (XML tags omitted here) are structured as follows:

  1DtGa13tFZ                     (video file name)
  1                              (frame number)
  404 720 3                      (frame dimensions and channels)
  True                           (flag)
  Great Ape chimpanzee           (class and species, first animal)
  273.00 277.91 350.55 403.79    (bounding box, first animal)
  Great Ape chimpanzee           (class and species, second animal)
  455.56 274.21 487.12 343.24    (bounding box, second animal)

FURTHER DESCRIPTION:
The annotations and metadata contained in this dataset relate to the video source data "PanAfrican2019", which contains 500 videos of approx. 15s each. The videos show great apes, that is gorillas or chimpanzees, passing by camera traps. This PanAfrican2019 video data is a small subset of the data gathered in the Pan African Programme. It was used and fully manually annotated for the above ICCVW2019 paper.

VIDEO DATA COPYRIGHT AND ACCESS:
Please contact the copyright holder, the Pan African Programme, at http://panafrican.eva.mpg.de directly to obtain the related PanAfrican2019 video dataset. The Pan African Programme holds the copyright of all video data.
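The per-frame XML files can be read with the Python standard library. The sketch below assumes a Pascal-VOC-style layout with hypothetical tag names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`); these are NOT confirmed field names, so inspect one file from annotations.zip and adjust before relying on them:

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style frame annotation built from the example values in
# this README; the real tag names in annotations.zip may differ.
SAMPLE = """
<annotation>
  <filename>1DtGa13tFZ_frame_1</filename>
  <object>
    <name>chimpanzee</name>
    <bndbox><xmin>273.00</xmin><ymin>277.91</ymin><xmax>350.55</xmax><ymax>403.79</ymax></bndbox>
  </object>
  <object>
    <name>chimpanzee</name>
    <bndbox><xmin>455.56</xmin><ymin>274.21</ymin><xmax>487.12</xmax><ymax>343.24</ymax></bndbox>
  </object>
</annotation>
"""

def parse_frame(xml_text):
    """Return a list of (species, (xmin, ymin, xmax, ymax)) per animal."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(float(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return boxes

for species, box in parse_frame(SAMPLE):
    print(species, box)
```

Collecting the outputs over all frame files of one video folder would yield the per-video tracks used for training the detector.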
The University of Bristol holds a copy of the video data for project documentation purposes only and cannot release the dataset without the copyright holder's explicit permission.

ACKNOWLEDGEMENTS:
We would like to thank all annotators. We would also like to thank the entire team of the Pan African Programme: "The Cultured Chimpanzee" and its collaborators for allowing the use of their video data. Particularly, we thank: H Kuehl, C Boesch, M Arandjelovic, and P Dieguez. We would also like to thank: K Zuberbuehler, K Corogenes, E Normand, V Vergnes, A Meier, J Lapuente, D Dowd, S Jones, V Leinert, E Wessling, H Eshuis, K Langergraber, S Angedakin, S Marrocoli, K Dierks, T C Hicks, J Hart, K Lee, and M Murai. Thanks also to the technical team at https://www.chimpandsee.org. The work that allowed for the collection of the video dataset was funded by the Max Planck Society, Max Planck Society Innovation Fund, and Heinz L. Krekeler. In this respect we would also like to thank: Foundation Ministère de la Recherche Scientifique, and Ministère des Eaux et Forêts in Côte d'Ivoire; Institut Congolais pour la Conservation de la Nature and Ministère de la Recherche Scientifique in DR Congo; Forestry Development Authority in Liberia; Direction des Eaux, Forêts, Chasses et de la Conservation des Sols, Senegal; and Uganda National Council for Science and Technology, Uganda Wildlife Authority, and National Forestry Authority in Uganda.