The RoboTHOR challenge at CVPR 2020 involves deploying navigation models onto the LoCoBot and running them through the physical RoboTHOR environment. This physical environment is located at the Allen Institute for AI (AI2) offices in Seattle. Due to the COVID-19 situation, all AI2 employees are currently working from home and will be doing so for the foreseeable future. This prevents us from running experiments on the physical robot. As a result, the RoboTHOR challenge at CVPR 2020 will only involve simulated scenes. The final winners will be chosen based on performance in the simulated scenes in the Test-Challenge set. The component of the RoboTHOR challenge involving the physical robot will be held at a later date, to be determined as we get more clarity on the COVID-19 stay-at-home order. We wish you all the best of health through this challenging time.
For decades, the AI community has sought to create perceptive agents that can augment human capabilities in real-world tasks. The widespread availability of large, open computer vision and natural language datasets, massive amounts of compute, and standardized benchmarks has been critical to the fast-paced progress witnessed over the past few years. In stark contrast, the considerable costs involved in acquiring physical robots and experimental environments, compounded by the lack of standardized benchmarks, are proving to be principal hindrances to progress in real-world embodied AI.
Recently, the vision community has leveraged progress in computer graphics and created a host of simulated perceptual environments with the promise of training models in simulation that can be deployed on robots in the physical world. These environments are free to use, continue to be improved, and lower the barrier to entry for research in real-world embodied AI, democratizing research in this direction. This has led to progress on a variety of tasks in simulation, including visual navigation and instruction following. But the elephant in the room remains: How well do these models trained in simulation generalize to the real world?
The RoboTHOR challenge is an embodied AI challenge focused on the problem of simulation-to-real transfer. The goals of the challenge are to encourage researchers to work on this important problem, to create a unified benchmark, and to track progress over time.
The RoboTHOR Challenge 2020 deals with the task of Visual Semantic Navigation. An agent starts from a random location in an apartment and is expected to navigate towards an object that is specified by its name. The 2020 challenge will restrict teams to using an ego-centric RGB camera mounted on the robot. Participants will train their models in simulation and these models will be evaluated by the challenge organizers using a real robot in physical apartments.
Training and evaluation are performed within the RoboTHOR framework. RoboTHOR is composed of 4 parts: 60 simulated apartments in Train, 15 simulated apartments in Val, 4 simulated apartments with real counterparts in Test-Dev, and 10 simulated apartments with real counterparts in Test-Challenge. The RoboTHOR challenge consists of 3 phases:
Participants are provided with 60 simulated apartments (Train set) to train their models and 15 simulated apartments (Val set) to validate their performance in simulation. Each scene comes with a set of data points (a data point is a starting location along with a target object category name). The data points in Train are provided for convenience. However, participants may generate a different (and potentially larger) set of Train data points using metadata provided in the RoboTHOR environment.
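As a rough illustration of what a data point contains, and of how a larger Train set might be sampled, here is a minimal Python sketch. The `DataPoint` class, its field names, the `sample_data_points` helper, and the example scene name, positions and object categories are all hypothetical; they are not part of the RoboTHOR API, and the actual episode format and metadata should be taken from the RoboTHOR environment itself.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (x, y, z) coordinates in meters; the coordinate convention is an assumption.
Position = Tuple[float, float, float]


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical container: a starting location plus a target category name."""
    scene: str                 # placeholder scene identifier
    start_position: Position   # where the agent starts
    target_object: str         # object category the agent must navigate to


def sample_data_points(
    scene: str,
    positions: Sequence[Position],
    categories: Sequence[str],
    count: int,
    seed: int = 0,
) -> List[DataPoint]:
    """Pair random starting positions with random target categories."""
    if not positions or not categories:
        raise ValueError("positions and categories must be non-empty")
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    return [
        DataPoint(scene, rng.choice(positions), rng.choice(categories))
        for _ in range(count)
    ]


# Illustrative inputs only; real values would come from environment metadata.
example_positions = [(0.0, 0.9, 0.0), (1.5, 0.9, -2.0), (3.0, 0.9, 1.0)]
example_categories = ["Television", "Mug", "Laptop"]
extra_points = sample_data_points(
    "train_scene_01", example_positions, example_categories, count=5
)
```

Fixing the seed keeps a generated set stable across runs, which makes it easier to compare models trained on the same sampled data points.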
The goal of this phase is to enable teams to iterate on their models and improve their generalization to the real world. This is similar to a practitioner improving their model, while having access to a development set of data.
Participants will upload their models via EvalAI. These models will be evaluated on the Test-Dev-Simulation apartments. In this phase, the top 5 entries each week on the simulation leaderboard will be evaluated on the Test-Dev-Real environment, which consists of a LoCoBot navigating in RoboTHOR apartments built at the AI2 office in Seattle. Teams will gain access to the results of their model as well as metadata such as images observed, actions taken by the agent, and a video feed, which should help them create improved models.
The top 3 entries as per the Test-Dev-Real leaderboard will be evaluated on the Test-Challenge-Real apartments. These apartments will have different layouts, object instances and object configurations compared to the Train, Val and Test-Dev apartments. Final results will be posted on this website.
The evaluation metric used is Success weighted by (normalized inverse) Path Length (SPL), defined in Anderson et al. as

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$

where, for each episode $i$ of the $N$ evaluation episodes, $S_i$ is the binary indicator variable denoting if the episode was successful (as defined below), $\ell_i$ is the shortest path length (in meters) from the agent's starting position to the target, and $p_i$ is the path length (in meters) that the agent took. A navigation episode is considered successful if both of the following criteria are met:

Our evaluation feedback also provides the SPL that would result if criterion 2 were ignored, but this metric is not used for evaluation.
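The SPL definition can be sketched in a few lines of Python. This is an illustrative implementation of the formula from Anderson et al., not the official evaluation code; the function name `spl` and its argument names are assumptions, as is the handling of a zero-length episode (counted as a ratio of 1 when successful).

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by (normalized inverse) Path Length over N episodes."""
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # S_i = 0: a failed episode contributes nothing
        denominator = max(taken, shortest)
        if denominator == 0.0:
            # Assumption: a successful episode with zero path length counts fully.
            total += 1.0
        else:
            total += shortest / denominator
    return total / n


# Three episodes: a success with a path twice the optimal length (0.5),
# a failure (0.0), and a success along the shortest path (1.0).
example_spl = spl(
    successes=[True, False, True],
    shortest_lengths=[2.0, 3.0, 4.0],
    path_lengths=[4.0, 3.0, 4.0],
)
print(example_spl)  # 0.5
```

Taking the maximum of the agent's path length and the shortest path length in the denominator caps each episode's contribution at 1, so an agent is never rewarded beyond the optimal path.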
To cite this work, please cite the RoboTHOR paper:
@InProceedings{RoboTHOR,
author = {Matt Deitke and Winson Han and Alvaro Herrasti and Aniruddha Kembhavi and Eric Kolve and Roozbeh Mottaghi and Jordi Salvador and Dustin Schwenk and Eli VanderBilt and Matthew Wallingford and Luca Weihs and Mark Yatskar and Ali Farhadi},
title = {RoboTHOR: An Open Simulation-to-Real Embodied AI Platform},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}