We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions. For example, when an onion is peeled, diced and cooked, we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. The data is published under the Creative Commons Attribution-NonCommercial 4.0 International License.

Complete download (zip, 28.4 GiB)

Alternative title: EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Creator(s): Ahmad Dar Khalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen
Publication date: 15 Aug 2022
Language: eng
Publisher: University of Bristol
Licence: Non-Commercial Government Licence for public sector information
DOI: 10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7
Citation: Ahmad Dar Khalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen (2022): EPIC-KITCHENS VISOR. https://doi.org/10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7
Total size: 28.4 GiB


Data Resources