RoboTHOR Challenge 2021

Held in conjunction with the CVPR 2021 Embodied AI Workshop

The 2021 RoboTHOR Challenge is a continuation of our 2020 RoboTHOR Challenge, held in conjunction with the Embodied AI Workshop at CVPR. The challenge focuses on the problem of simulation-to-real transfer. Our goal is to encourage researchers to work on this important problem, to create a unified benchmark, and to track progress over time. Due to COVID-19, the challenge will be run only in simulation.


For decades, the AI community has sought to create perceptive agents that can augment human capabilities in real-world tasks. The widespread availability of large, open computer vision and natural language datasets, massive amounts of compute, and standardized benchmarks has been critical to the fast-paced progress witnessed over the past few years. In stark contrast, the considerable costs involved in acquiring physical robots and experimental environments, compounded by the lack of standardized benchmarks, are proving to be principal hindrances to progress in real-world embodied AI.

Recently, the vision community has leveraged progress in computer graphics and created a host of simulated perceptual environments with the promise of training models in simulation that can be deployed on robots in the physical world. These environments are free to use, continue to be improved, and lower the barrier to entry for research in real-world embodied AI, democratizing research in this direction. This has led to progress on a variety of tasks in simulation, including visual navigation and instruction following. But the elephant in the room remains: how well do these models trained in simulation generalize to the real world?

The RoboTHOR Challenge 2021 deals with the task of Visual Semantic Navigation from ego-centric RGB-D camera input. The agent starts from a random location in an apartment and is expected to navigate towards an object that is specified by its type. Across different episodes, different object types are used as targets. However, all of the object types that appear in the validation and testing sets also appear in the training set.

The dataset is divided into the following splits:

Split    Episodes    Scene Count

Each episode is then provided in the following format:

    {
        "id": "FloorPlan_Train1_1_AlarmClock_0",
        "scene": "FloorPlan_Train1_1",
        "object_type": "AlarmClock",
        "initial_position": {
            "x": 3.75,
            "y": 0.9,
            "z": -2.25
        },
        "initial_orientation": 150,
        "initial_horizon": 30,
        "shortest_path": [
            {"x": 3.75, "y": 0.0045, "z": -2.25},
            {"x": 9.25, "y": 0.0045, "z": -2.75}
        ],
        "shortest_path_length": 5.57
    }
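
As an illustration of how an episode in this format might be consumed, the following minimal sketch parses the sample episode with Python's standard `json` module and measures the length of the listed waypoints. The helper name `polyline_length` is hypothetical and not part of the challenge code, and the sketch assumes each episode is a single JSON object with the fields shown above.

```python
import json
import math

# Sample episode copied from the format above, written as one JSON object.
EPISODE_JSON = """
{
    "id": "FloorPlan_Train1_1_AlarmClock_0",
    "scene": "FloorPlan_Train1_1",
    "object_type": "AlarmClock",
    "initial_position": {"x": 3.75, "y": 0.9, "z": -2.25},
    "initial_orientation": 150,
    "initial_horizon": 30,
    "shortest_path": [
        {"x": 3.75, "y": 0.0045, "z": -2.25},
        {"x": 9.25, "y": 0.0045, "z": -2.75}
    ],
    "shortest_path_length": 5.57
}
"""


def polyline_length(waypoints):
    """Sum of straight-line distances between consecutive waypoints.

    Hypothetical helper for illustration; each waypoint is a dict
    with "x", "y", and "z" keys, as in the episode format.
    """
    coords = [(p["x"], p["y"], p["z"]) for p in waypoints]
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


episode = json.loads(EPISODE_JSON)
listed_length = polyline_length(episode["shortest_path"])

print(episode["scene"], episode["object_type"])
print(f"length of listed waypoints: {listed_length:.2f} m")
```

The straight-line length of the waypoints listed in the sample need not equal the stored `shortest_path_length` field, so the sketch reads the two independently rather than assuming they agree.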
2021 RoboTHOR Challenge Announced: Feb 17, 2021
Submissions Close: May 31, 2021
Winner Announcement: Jun 19, 2021

To participate in the challenge, please refer to our GitHub page. It includes instructions for installation, downloading the dataset, and a simple example.

Winners of the challenge will have the opportunity to present their work at the virtual CVPR 2021 Embodied AI Workshop.


AllenAct Baseline Models

We have built support for this challenge into the AllenAct framework. This support includes:

  1. Several CNN-to-RNN baseline model architectures, along with our best pretrained model checkpoint (trained for 300M steps), which obtains a test-set success rate of ~26%.
  2. Reinforcement/imitation learning pipelines for training with Decentralized Distributed Proximal Policy Optimization (DD-PPO) and DAgger.
  3. Utility functions for visualization and caching (to improve training speed).

For more information see here.

SPL, Success weighted by (normalized inverse) Path Length, is a common navigation metric (see Anderson et al. and Batra et al.) that is quick to compute. It is defined as

$$\text{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \cdot \frac{\ell_i}{\max(p_i, \ell_i)},$$

where, for each episode $i \in \lbrace 1, 2, 3, \ldots, N \rbrace$, $S_i$ is the binary indicator variable denoting whether the episode was successful, $\ell_i$ is the shortest path length (in meters) from the agent's starting position to the target, and $p_i$ is the length (in meters) of the path that the agent took. The metric takes values in the closed interval $[0, 1]$.
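
The SPL definition can be sketched in a few lines of Python. This is a minimal illustration rather than the official evaluation code: the function name `spl` and its argument names are assumptions, and treating an episode with a zero denominator as contributing 0 is a choice made here to avoid division by zero.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by (normalized inverse) Path Length.

    successes        -- S_i, truthy if episode i succeeded
    shortest_lengths -- l_i, shortest path length in meters
    path_lengths     -- p_i, length in meters of the path the agent took
    """
    n = len(successes)
    if n == 0:
        raise ValueError("SPL is undefined for zero episodes")
    if not (len(shortest_lengths) == len(path_lengths) == n):
        raise ValueError("all inputs must have one entry per episode")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        denom = max(p, l)
        # Failed episodes contribute 0; so do degenerate zero-length ones
        # (an assumption of this sketch).
        if s and denom > 0:
            total += l / denom
    return total / n


# Three episodes: an optimal success, a failure, and a success that
# took twice the shortest path.
score = spl([1, 0, 1], [5.0, 3.0, 4.0], [5.0, 6.0, 8.0])
print(score)  # (1 + 0 + 0.5) / 3 = 0.5
```

An agent that always succeeds along the shortest path scores 1, and any detour or failure pulls the average down, which is why the metric rewards efficiency as well as success.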

A navigation episode is considered successful if both of the following criteria are met:

  1. An object of the specified category is within 1 meter (Euclidean distance) of the agent's camera, and the agent issues the STOP action, which indicates the termination of the episode.
  2. The object is visible in the final action's frame.

Our evaluation feedback also reports the SPL that would result if criterion 2 were ignored, but this metric is not used for evaluation.
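
The success criteria can be expressed as a small predicate. The sketch below is illustrative only: `is_success` and its parameters are hypothetical names rather than part of the RoboTHOR API, positions are assumed to be `(x, y, z)` tuples in meters, and "within 1 meter" is interpreted here as a distance of at most 1 meter.

```python
import math


def is_success(camera_position, object_position, issued_stop, object_visible,
               max_distance=1.0, require_visibility=True):
    """Return True if an episode meets the success criteria.

    camera_position, object_position -- (x, y, z) tuples in meters
    issued_stop        -- whether the agent ended the episode with STOP
    object_visible     -- whether the target is visible in the final frame
    require_visibility -- set to False to ignore criterion 2
    """
    close_enough = math.dist(camera_position, object_position) <= max_distance
    criterion_1 = close_enough and issued_stop
    criterion_2 = object_visible or not require_visibility
    return criterion_1 and criterion_2


camera = (0.0, 0.9, 0.0)
target = (0.3, 0.9, 0.4)  # 0.5 m from the camera

strict = is_success(camera, target, issued_stop=True, object_visible=False)
relaxed = is_success(camera, target, issued_stop=True, object_visible=False,
                     require_visibility=False)
print(strict, relaxed)
```

Setting `require_visibility=False` mirrors the additional feedback value that ignores criterion 2; only the strict variant corresponds to the success definition used for evaluation.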


It may be helpful to check out the SPL section of the RoboTHOR documentation. It discusses an object's visible attribute, visibilityDistance, and several evaluation helper methods.

We will be using an AI2 Leaderboard to host challenge submissions. Submissions will open towards the end of February.

Where do I ask a question about the RoboTHOR Challenge?
Please open up a discussion on our GitHub Page! We are happy to help.
Can we use external data for training our models?
Yes, you can use any external data (in addition to the provided dataset) for training your models.
The RoboTHOR 2021 challenge is being organized by the PRIOR team at the Allen Institute for AI. The organizers are listed below in alphabetical order.
Matt Deitke
Winson Han
Alvaro Herrasti
Ani Kembhavi
Apoorv Khandelwal
Eric Kolve
Roozbeh Mottaghi
Jordi Salvador
Dustin Schwenk
Eli VanderBilt
Luca Weihs