The 2021 RoboTHOR Challenge is a continuation of our 2020 RoboTHOR Challenge, held in conjunction with the Embodied AI Workshop at CVPR. The challenge focuses on the problem of simulation-to-real transfer. Our goal with this challenge is to encourage researchers to work on this important problem, and to create a unified benchmark for tracking progress over time. Due to COVID-19, the challenge will be held entirely in simulation.
For decades, the AI community has sought to create perceptive agents that can augment human capabilities in real-world tasks. The widespread availability of large, open computer vision and natural language datasets, massive amounts of compute, and standardized benchmarks has been critical to the fast-paced progress witnessed over the past few years. In stark contrast, the considerable costs involved in acquiring physical robots and experimental environments, compounded by the lack of standardized benchmarks, remain principal hindrances to progress in real-world embodied AI.
Recently, the vision community has leveraged progress in computer graphics to create a host of simulated perceptual environments, with the promise of training models in simulation that can be deployed on robots in the physical world. These environments are free to use, continue to be improved, and lower the barrier to entry for research in real-world embodied AI, democratizing work in this direction. This has led to progress on a variety of tasks in simulation, including visual navigation and instruction following. But the elephant in the room remains: how well do models trained in simulation generalize to the real world?
The RoboTHOR Challenge 2021 deals with the task of Visual Semantic Navigation from egocentric RGB-D camera input. The agent starts from a random location in an apartment and is expected to navigate to an object specified by its type. Different object types are used as targets across episodes; however, every object type that appears in the validation and test sets also appears in the training set.
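For orientation, the sketch below steps a RoboTHOR agent through a few actions using the ai2thor Python package. The controller parameters shown (grid size, rotation step, etc.) are illustrative defaults, not necessarily the official challenge configuration:

```python
from ai2thor.controller import Controller

# Start a RoboTHOR scene with the LoCoBot agent and depth rendering enabled.
# Parameter values here are illustrative, not the official challenge config.
controller = Controller(
    agentMode="locobot",
    scene="FloorPlan_Train1_1",
    gridSize=0.25,
    rotateStepDegrees=30,
    renderDepthImage=True,
)

# Step through a few discrete navigation actions.
for action in ["MoveAhead", "RotateRight", "LookDown"]:
    event = controller.step(action=action)
    rgb = event.frame          # (H, W, 3) uint8 egocentric RGB image
    depth = event.depth_frame  # (H, W) float32 depth map in meters
    print(action, "succeeded:", event.metadata["lastActionSuccess"])

controller.stop()
```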
The dataset is divided into the following splits:

| Split | Episodes | Scene Count |
|-------|----------|-------------|
| Debug | 4        | 1           |
| Train | 108,000  | 60          |
| Val   | 1,080    | 15          |
| Test  | 2,040    | 10          |
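The public train and validation scenes follow the FloorPlan_Train{i}_{j} and FloorPlan_Val{i}_{j} naming convention (as in the episode example below); test scenes are held out. Assuming that convention, a small snippet enumerates them:

```python
# Enumerate the public RoboTHOR scene names, assuming the
# FloorPlan_{Split}{i}_{j} convention (test scenes are held out).
train_scenes = [
    f"FloorPlan_Train{i}_{j}" for i in range(1, 13) for j in range(1, 6)
]  # 12 buildings x 5 layouts = 60 scenes
val_scenes = [
    f"FloorPlan_Val{i}_{j}" for i in range(1, 4) for j in range(1, 6)
]  # 3 buildings x 5 layouts = 15 scenes

assert len(train_scenes) == 60 and len(val_scenes) == 15
```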
Each episode is provided in the following format:

    {
      "id": "FloorPlan_Train1_1_AlarmClock_0",
      "scene": "FloorPlan_Train1_1",
      "object_type": "AlarmClock",
      "initial_position": {
        "x": 3.75,
        "y": 0.9,
        "z": -2.25
      },
      "initial_orientation": 150,
      "initial_horizon": 30,
      "shortest_path": [
        {"x": 3.75, "y": 0.0045, "z": -2.25},
        ...,
        {"x": 9.25, "y": 0.0045, "z": -2.75}
      ],
      "shortest_path_length": 5.57
    }
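As a sanity check on this format, the following sketch (the file name is hypothetical) loads an episode and recomputes the shortest-path length as the sum of Euclidean distances between consecutive waypoints:

```python
import json
import math

# Load one episode; the file name here is hypothetical.
with open("train_episodes.json") as f:
    episodes = json.load(f)

episode = episodes[0]
path = episode["shortest_path"]

# Recompute the geodesic length as the sum of straight-line segment
# lengths between consecutive waypoints along the shortest path.
length = sum(
    math.dist(
        (a["x"], a["y"], a["z"]),
        (b["x"], b["y"], b["z"]),
    )
    for a, b in zip(path, path[1:])
)

assert abs(length - episode["shortest_path_length"]) < 1e-2
```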
Winners of the challenge will have the opportunity to present their work at the virtual CVPR 2021 Embodied AI Workshop.
We have built support for this challenge into the AllenAct framework. For more information, see here.
SPL, Success weighted by (normalized inverse) Path Length, is a quick and common navigation metric (see Anderson et al. and Batra et al.), defined as

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{L_i}{\max(P_i, L_i)},$$

where, for each episode $i$, $S_i$ is the binary indicator variable denoting whether the episode was successful, $L_i$ is the shortest path length (in meters) from the agent's starting position to the target, and $P_i$ is the path length (in meters) that the agent took. The metric ranges inclusively from $0$ to $1$.

A navigation episode is considered successful if both of the following criteria are met:

1. The target object is within 1 meter (geodesic distance) of the agent's camera when the agent issues the Stop action, terminating the episode.
2. The target object is visible in the final action's frame.
Our evaluation feedback also reports the SPL with criterion 2 ignored, but this variant is not used for official evaluation. It may be helpful to check out the SPL section of the RoboTHOR documentation, which discusses an object's visibility in more detail.
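For concreteness, here is a minimal sketch of the SPL computation over a set of episode results. The data structure is ours for illustration, not the official evaluator's:

```python
def spl(episodes):
    """Compute SPL over episodes, each a dict with keys:
    'success' (bool), 'shortest_path_length' (meters, L_i),
    and 'agent_path_length' (meters, P_i)."""
    total = 0.0
    for ep in episodes:
        if ep["success"]:
            L = ep["shortest_path_length"]
            P = ep["agent_path_length"]
            total += L / max(P, L)
    return total / len(episodes)

# Example: one success with a 25% longer path than optimal, one failure.
results = [
    {"success": True, "shortest_path_length": 4.0, "agent_path_length": 5.0},
    {"success": False, "shortest_path_length": 3.0, "agent_path_length": 6.0},
]
print(spl(results))  # (4/5 + 0) / 2 = 0.4
```

Note that failed episodes contribute zero regardless of path length, so SPL simultaneously penalizes failure and inefficiency.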
To cite this work, please cite the RoboTHOR paper:
@InProceedings{RoboTHOR,
author = {Matt Deitke and Winson Han and Alvaro Herrasti and Aniruddha Kembhavi and Eric Kolve and Roozbeh Mottaghi and Jordi Salvador and Dustin Schwenk and Eli VanderBilt and Matthew Wallingford and Luca Weihs and Mark Yatskar and Ali Farhadi},
title = {RoboTHOR: An Open Simulation-to-Real Embodied AI Platform},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}