SCM/TCM Code and Annotations for PanAfrican2019 Videos
------------------------------------------------------

(associated with the paper: X Yang, M Mirmehdi, T Burghardt. 'Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending.' Computer Vision for Wildlife Conservation (CVWC) Workshop at the IEEE International Conference on Computer Vision (ICCVW), October 2019.)

CONTENTS:
This public and freely available dataset contains a PyTorch code section for the SCM and TCM components described in the above paper, a cleaned annotation dataset for video object detection in the PanAfrican2019 videos, and the detailed split of the videos into training/validation/test portions.

ANNOTATION SET:
This is a cleaned dataset compared to the one originally used, leading to the slightly improved performance results given below:

                    ResultsA   ResultsB   ResultsC
  Backbone Res50
    baseline         80.79      82.88      84.41
    +TCM             90.02      90.11      90.31
    +SCM+TCM         90.81      90.57      91.22
  Backbone Res101
    baseline         85.25      86.02      87.23
    +SCM+TCM         90.21      89.44      91.07

ResultsA - pretrained on ImageNet VID and fine-tuned on the uncleaned annotation set (paper)
ResultsB - trained on this annotation set
ResultsC - pretrained on ImageNet VID and fine-tuned on this annotation set

DATA:
The code sections are provided in a file called context_block.py. Users are free to utilise the SCM and TCM components under the Non-Commercial Government Licence for public sector information. All other code components of the network described in the paper are available as adaptations and combinations of other sources. The annotation dataset contains bounding box annotations and the animal species tags (chimp/gorilla) found within the 500 mp4 camera trap videos (identified by 10-character filenames plus the .mp4 extension) in the PanAfrican2019 dataset provided by the Pan African Programme. The data is split into 3 subsets: 85% for training, 5% for validation and 10% for testing.
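Applied to the 500 videos, the 85/5/10 percentages correspond to 425/25/50 videos per subset. A minimal sketch of that arithmetic (illustration only; the authoritative membership of each subset is fixed by the split files provided with the dataset):

```python
def split_counts(n_videos=500, fractions=(0.85, 0.05, 0.10)):
    """Map the stated split percentages onto the video count."""
    counts = [round(n_videos * f) for f in fractions]
    # sanity check: every video lands in exactly one subset
    assert sum(counts) == n_videos
    return counts  # [training, validation, test]

print(split_counts())  # [425, 25, 50]
```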
Txt files are provided to specify the exact split. Almost all video files contain 360 frames (15 seconds at 24 frames per second); thus, the dataset comprises approx. 180,000 frames in total. Users are free to utilise these annotations under the Non-Commercial Government Licence for public sector information.

DETAILED FILE FORMATS:
(CODE) For the PyTorch code, a single context_block.py file is provided specifying the SCM and TCM layers fully.
(SPLITS) The files trainingdata.txt, validationdata.txt and testdata.txt list all video file names for each of the 3 subsets. The file alldata.txt contains all file names of the dataset.
(ANNOTATIONS) The annotation data is contained in the annotations.zip archive. Once unzipped, there is a folder per video file that contains one XML file per frame. Each XML file has a name of the form VIDEOFILENAME_frame_FRAMENUMBER, starting with FRAMENUMBER=1. Each XML file contains information about each of the animal bounding boxes of that frame. As an example, the values contained in one such file (XML tags omitted here) are structured as follows:

  1DtGa13tFZ                     (video file name)
  1                              (frame number)
  404 720 3                      (frame dimensions and channels)
  True                           (flag)
  Great Ape chimpanzee           (class and species, first animal)
  273.00 277.91 350.55 403.79    (bounding box, first animal)
  Great Ape chimpanzee           (class and species, second animal)
  455.56 274.21 487.12 343.24    (bounding box, second animal)

FURTHER DESCRIPTION:
The annotations and metadata contained in this dataset relate to the video source data "PanAfrican2019", which contains 500 videos of approx. 15s each. The videos show great apes, that is gorillas or chimpanzees, passing by camera traps. This PanAfrican2019 video data is a small subset of the data gathered in the Pan African Programme. It was used and fully manually annotated for the above ICCVW2019 paper.

VIDEO DATA COPYRIGHT AND ACCESS:
Please contact the copyright holder, the Pan African Programme, at http://panafrican.eva.mpg.de directly to obtain the related PanAfrican2019 video dataset. The Pan African Programme holds the copyright of all video data.
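The per-frame XML files can be read with the Python standard library. The sketch below assumes a Pascal-VOC-style layout with hypothetical tag names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`); these are NOT confirmed field names, so inspect one file from annotations.zip and adjust before relying on them:

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style frame annotation built from the example values in
# this README; the real tag names in annotations.zip may differ.
SAMPLE = """
<annotation>
  <filename>1DtGa13tFZ_frame_1</filename>
  <object>
    <name>chimpanzee</name>
    <bndbox><xmin>273.00</xmin><ymin>277.91</ymin><xmax>350.55</xmax><ymax>403.79</ymax></bndbox>
  </object>
  <object>
    <name>chimpanzee</name>
    <bndbox><xmin>455.56</xmin><ymin>274.21</ymin><xmax>487.12</xmax><ymax>343.24</ymax></bndbox>
  </object>
</annotation>
"""

def parse_frame(xml_text):
    """Return a list of (species, (xmin, ymin, xmax, ymax)) per animal."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(float(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return boxes

for species, box in parse_frame(SAMPLE):
    print(species, box)
```

Collecting the outputs over all frame files of one video folder would yield the per-video tracks used for training the detector.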
The University of Bristol holds a copy of the video data for project documentation purposes only and cannot release the dataset without the copyright holder's explicit permission.

ACKNOWLEDGEMENTS:
We would like to thank all annotators. We would also like to thank the entire team of the Pan African Programme: "The Cultured Chimpanzee" and its collaborators for allowing the use of their video data. Particularly, we thank: H Kuehl, C Boesch, M Arandjelovic, and P Dieguez. We would also like to thank: K Zuberbuehler, K Corogenes, E Normand, V Vergnes, A Meier, J Lapuente, D Dowd, S Jones, V Leinert, E Wessling, H Eshuis, K Langergraber, S Angedakin, S Marrocoli, K Dierks, T C Hicks, J Hart, K Lee, and M Murai. Thanks also to the technical team at https://www.chimpandsee.org. The work that allowed for the collection of the video dataset was funded by the Max Planck Society, Max Planck Society Innovation Fund, and Heinz L. Krekeler. In this respect we would also like to thank: Foundation Ministère de la Recherche Scientifique, and Ministère des Eaux et Forêts in Côte d'Ivoire; Institut Congolais pour la Conservation de la Nature and Ministère de la Recherche Scientifique in DR Congo; Forestry Development Authority in Liberia; Direction des Eaux, Forêts, Chasses et de la Conservation des Sols, Senegal; and Uganda National Council for Science and Technology, Uganda Wildlife Authority, and National Forestry Authority in Uganda.