For decades, the AI community has sought to create perceptive agents that can augment human capabilities in real-world tasks. The widespread availability of large, open computer vision and natural language datasets, massive amounts of compute, and standardized benchmarks has been critical to the fast-paced progress witnessed over the past few years. In stark contrast, the considerable costs of acquiring physical robots and experimental environments, compounded by the lack of standardized benchmarks, are proving to be principal hindrances to progress in real-world embodied AI.
Recently, the vision community has leveraged progress in computer graphics to create a host of simulated perceptual environments, with the promise that models trained in simulation can be deployed on robots in the physical world. These environments are free to use, continue to be improved, and lower the barrier of entry to research in real-world embodied AI, democratizing research in this direction. This has led to progress on a variety of tasks in simulation, including visual navigation and instruction following. But the elephant in the room remains: how well do these models trained in simulation generalize to the real world?
The RoboTHOR challenge is an embodied AI challenge focused on the problem of simulation-to-real transfer. The goals of the challenge are to encourage researchers to work on this important problem, to create a unified benchmark, and to track progress over time.
The RoboTHOR Challenge 2020 deals with the task of Visual Semantic Navigation. An agent starts from a random location in an apartment and is expected to navigate towards an object that is specified by its name. The 2020 challenge will restrict teams to using an ego-centric RGB camera mounted on the robot. Participants will train their models in simulation and these models will be evaluated by the challenge organizers using a real robot in physical apartments.
Training and evaluation are performed within the RoboTHOR framework. RoboTHOR is composed of 4 parts: 60 simulated apartments in Train, 15 simulated apartments in Val, 4 simulated apartments with real counterparts in Test-Dev, and 10 simulated apartments with real counterparts in Test-Challenge. The RoboTHOR challenge consists of 3 phases:
Phase 1: Training (in sim)
Participants are provided with 60 simulated apartments (Train set) to train their models and 15 simulated apartments (Val set) to validate their performance in simulation. Each scene comes with a set of data points, where a data point is a starting location paired with a target object category name. The data points in Train are provided for convenience; participants may generate a different (and potentially larger) set of Train data points using metadata provided in the RoboTHOR environment.
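As a rough illustration of what a data point contains and how a larger Train set might be sampled, here is a minimal Python sketch. The `DataPoint` class, the `sample_data_points` helper, and the example scene name, positions and object categories are all hypothetical. The sketch does not call the RoboTHOR API, and it assumes that a list of reachable positions and a list of object categories have already been extracted from the environment metadata.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]  # assumed (x, y, z) coordinates in meters


@dataclass(frozen=True)
class DataPoint:
    """One navigation episode: where the agent starts and what it must find."""
    scene: str                 # hypothetical scene identifier
    start_position: Position   # starting location of the agent
    target_object_type: str    # target object category name


def sample_data_points(
    scene: str,
    reachable_positions: Sequence[Position],
    object_types: Sequence[str],
    n: int,
    seed: int = 0,
) -> List[DataPoint]:
    """Draw n random (start location, target category) pairs for one scene."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    return [
        DataPoint(scene, rng.choice(reachable_positions), rng.choice(object_types))
        for _ in range(n)
    ]


# Illustrative inputs only; real values would come from the environment metadata.
positions = [(0.0, 0.9, 0.0), (1.0, 0.9, 0.5), (2.5, 0.9, 1.0)]
categories = ["Television", "Mug", "Apple"]
points = sample_data_points("example_scene", positions, categories, n=5)
```

Fixing the seed keeps a generated set identical across runs, which makes comparisons between training runs easier to interpret.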
Phase 2: Test-Dev (in sim and on a real robot)
The goal of this phase is to enable teams to iterate on their models and improve their generalization to the real world. This is similar to a practitioner improving their model, while having access to a development set of data.
Participants will upload their models via EvalAI. These models will be evaluated on the Test-Dev-Simulation apartments. In this phase, the top 5 entries each week on the simulation leaderboard will be evaluated in the Test-Dev-Real environment, which consists of a LocoBot navigating in RoboTHOR apartments built at the AI2 office in Seattle. Teams will gain access to the results of their model as well as metadata such as the images observed, the actions taken by the agent, and a video feed, which should help them create improved models.
Phase 3: Test-Challenge (on a real robot)
The top 3 entries as per the Test-Dev-Real leaderboard will be evaluated on the Test-Challenge-Real apartments. These apartments will have different layouts, object instances and object configurations compared to the Train, Val and Test-Dev apartments. Final results will be posted on this website.
The evaluation metric used is Success weighted by (normalized inverse) Path Length (SPL) defined in Anderson et al.
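For reference, SPL over N episodes is the mean of S_i * l_i / max(p_i, l_i), where S_i is the binary success indicator, l_i is the shortest-path length from the start to the target, and p_i is the length of the path the agent actually took. The following is a minimal Python sketch of that formula, not the organizers' evaluation code. The function name and argument names are illustrative, and the sketch assumes every shortest-path length is strictly positive.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by (normalized inverse) Path Length, averaged over episodes."""
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            # Assumption of this sketch: the agent never starts at the target.
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A failed episode contributes 0; a successful one contributes the
            # ratio of the shortest path to the path taken, capped at 1.
            total += shortest / max(taken, shortest)
    return total / len(successes)


# Three episodes: an optimal success, a success with a path twice the shortest, a failure.
score = spl([True, True, False], [5.0, 5.0, 3.0], [5.0, 10.0, 4.0])
```

In this example the per-episode terms are 1, 0.5 and 0, so the score is 0.5. An agent is therefore penalized both for failing and for reaching the target by a needlessly long route.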
We consider an episode of navigation successful if both of the following criteria are met:
1. The specified object category is within 1 meter (geodesic distance) of the agent’s camera, and the agent issues the STOP action, which indicates the termination of the episode.
2. The object is within the field of view of the agent.
We will also evaluate the agents under a relaxed setting that ignores criterion 2.
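The two evaluation settings can be sketched as a single check with a switch for the visibility criterion. This is a minimal Python illustration under stated assumptions: the `EpisodeEnd` record and its field names are hypothetical, the distance is assumed to be precomputed, and treating exactly 1 meter as "within" is an assumption that the text does not settle.

```python
from dataclasses import dataclass


@dataclass
class EpisodeEnd:
    """State at the end of an episode (hypothetical record for illustration)."""
    distance_to_target_m: float  # distance from the agent's camera to the target
    issued_stop: bool            # whether the agent ended the episode with STOP
    target_in_view: bool         # whether the target is in the agent's field of view


def is_success(
    end: EpisodeEnd,
    require_visibility: bool = True,
    threshold_m: float = 1.0,
) -> bool:
    """Apply criterion 1, and criterion 2 unless require_visibility is False."""
    # Criterion 1: close enough to the target and explicitly stopped.
    # Using <= for the boundary case is an assumption of this sketch.
    close_and_stopped = end.issued_stop and end.distance_to_target_m <= threshold_m
    if not require_visibility:
        return close_and_stopped
    # Criterion 2: the target must also be visible to the agent.
    return close_and_stopped and end.target_in_view


# The agent stopped 0.8 m from the target while facing away from it.
end = EpisodeEnd(distance_to_target_m=0.8, issued_stop=True, target_in_view=False)
strict = is_success(end)
relaxed = is_success(end, require_visibility=False)
```

Here `strict` is False and `relaxed` is True, which is exactly the kind of episode on which the two reported numbers differ.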
|Milestone|Date|
|Training data released (Simulation)|Feb 11, 2020|
|Validation data released (Simulation)|Feb 11, 2020|
|Test-Dev (Simulation) opens on EvalAI|Mar 15, 2020|
|Test-Dev phase|Apr 1, 2020 - May 14, 2020|
|Test-Challenge phase|May 18, 2020 - Jun 1, 2020|
To participate in the challenge, please refer to our GitHub page. It includes instructions for installation, downloading the dataset, and a simple example.
- Can we use external data for training our models?
- Yes, you can use any external data (in addition to the provided training apartments) for training your models.
- Can we access the real robot and apartments to evaluate our models?
- During the Test-Dev phase, we will evaluate your model in the real apartments and provide you with results and other metadata. During the Test-Challenge phase we will evaluate your model in the real apartments and only provide you with the final metrics.
- What set do I use to report numbers in a paper?
- To report an ablation analysis in simulation, we recommend using the Val set. To report final numbers on the real robot, we recommend using the Test-Dev-Real set. If your entry is selected for a final evaluation on Test-Challenge, we also recommend that you report these numbers.
The organizers are listed below in alphabetical order.