Documentation

Requirements for using AI2-THOR

  • OS: Mac OS X 10.9+, Ubuntu 14.04+
  • Graphics Card: DX9 (shader model 3.0) or DX11 with feature level 9.3 capabilities.
  • CPU: SSE2 instruction set support.
  • Python 2.7 or Python 3.5+
  • Linux users: X server with GLX module enabled

Installing using pip

You can install AI2-THOR using pip. Create a Python 2.7/3.5/3.6 virtual environment, and run

$ pip install ai2thor

Before running the code below, make sure an X server with OpenGL is running and that the OpenGL extensions for your graphics card are installed.

Initialization

Prior to running any additional commands, it is assumed you have run the following:

import ai2thor.controller
controller = ai2thor.controller.Controller()

The first time a Controller is initialized, the game environment containing the 3D scenes will be downloaded to $HOME/.ai2thor. The size of the binary is approximately 500MB.

To initialize an agent in a scene, do the following.

controller = ai2thor.controller.Controller(scene='FloorPlan_Train1_1', agentMode='bot')
Parameter Type Description Default
scene string The name of the scene to initialize. See below for a full list of RoboTHOR scenes.  
agentMode string Always set this to ‘bot’ in order to correctly initialize the RoboTHOR agent.  
agentType string Set this to stochastic if stochastic movement is desired. null
gridSize float Size of the grid that the agent navigates in. This determines the step size that the agent takes when the actions MoveAhead or MoveBack are taken. This grid is only used when agentType is not stochastic. 0.25
applyActionNoise bool If the agentType is set to stochastic, this bool will apply movement and rotation Gaussian noise to Move and Rotate actions automatically. True
movementGaussianMu float This value is only applied if agentType is set to stochastic and applyActionNoise is set to True. If nonzero, the agent’s position after a move action will be offset by a random value taken from a Gaussian with mean movementGaussianMu. 0.001
movementGaussianSigma float This value is only applied if agentType is set to stochastic and applyActionNoise is set to True. If nonzero, the agent’s position after a move action will be offset by a random value taken from a Gaussian with standard deviation movementGaussianSigma. 0.005
rotateGaussianMu float This value is only applied if agentType is set to stochastic and applyActionNoise is set to True. If nonzero, the agent’s rotation after a rotate action will be offset by a random value taken from a Gaussian with mean rotateGaussianMu. 0.0
rotateGaussianSigma float This value is only applied if agentType is set to stochastic and applyActionNoise is set to True. If nonzero, the agent’s rotation after a rotate action will be offset by a random value taken from a Gaussian with standard deviation rotateGaussianSigma. 0.5
renderDepthImage bool When enabled a depth image is sent and made available on the returned Event as the attribute depth_frame. False
renderClassImage bool When enabled a class segmentation image is sent and made available on the returned Event as the attribute class_segmentation_frame. False
renderObjectImage bool When enabled an object segmentation image is sent and made available on the returned Event as the attribute instance_segmentation_frame. False
visibilityDistance float Distance in meters from the agent’s camera (positioned near the top of the agent) that an object should be considered visible. 1.5
width int Sets the width of the returned images in pixels. Must be >= 300. The image quality is improved with larger images, as opposed to simply resizing the default 300px by 300px image. 300
height int Sets the height of the returned images in pixels. Must be >= 300. The image quality is improved with larger images, as opposed to simply resizing the default 300px by 300px image. 300
rotateStepDegrees float The number of degrees the agent will rotate when using the RotateLeft or RotateRight actions. 90.0
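
As a rough illustration of how the stochastic parameters in the table fit together, the sketch below collects them into a keyword dictionary that could be passed to Controller. The helper name stochastic_controller_kwargs is hypothetical and not part of ai2thor; the parameter names and default values are the ones listed in the table above.

```python
def stochastic_controller_kwargs(scene, movement_sigma=0.005, rotate_sigma=0.5):
    """Build keyword arguments for a stochastic RoboTHOR agent (hypothetical helper)."""
    return {
        "scene": scene,
        "agentMode": "bot",           # required to initialize the RoboTHOR agent
        "agentType": "stochastic",    # enables noisy movement and rotation
        "applyActionNoise": True,     # apply the Gaussian noise automatically
        "movementGaussianMu": 0.001,  # documented default
        "movementGaussianSigma": movement_sigma,
        "rotateGaussianMu": 0.0,      # documented default
        "rotateGaussianSigma": rotate_sigma,
    }


kwargs = stochastic_controller_kwargs("FloorPlan_Train1_1")
# With ai2thor installed and a display available, this would be used as:
#   controller = ai2thor.controller.Controller(**kwargs)
```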

Changing Scenes

Use this method to change to a new floorplan or reset the current floorplan after a Controller has been initialized.

controller.reset(scene='FloorPlan_Train1_3')
Parameter Type Description Default
scene string The name of the scene to load.  

Training Scenes

There are 60 total training scenes: 12 different wall configurations with 5 sets of object variations for each wall configuration. The wall configuration is denoted by the first number in the scene name, and the object variation set is denoted by the second number.

Training Scenes
FloorPlan_Train1_1, FloorPlan_Train1_2, FloorPlan_Train1_3, FloorPlan_Train1_4, FloorPlan_Train1_5
FloorPlan_Train2_1, FloorPlan_Train2_2, FloorPlan_Train2_3, FloorPlan_Train2_4, FloorPlan_Train2_5
FloorPlan_Train3_1, FloorPlan_Train3_2, FloorPlan_Train3_3, FloorPlan_Train3_4, FloorPlan_Train3_5
FloorPlan_Train4_1, FloorPlan_Train4_2, FloorPlan_Train4_3, FloorPlan_Train4_4, FloorPlan_Train4_5
FloorPlan_Train5_1, FloorPlan_Train5_2, FloorPlan_Train5_3, FloorPlan_Train5_4, FloorPlan_Train5_5
FloorPlan_Train6_1, FloorPlan_Train6_2, FloorPlan_Train6_3, FloorPlan_Train6_4, FloorPlan_Train6_5
FloorPlan_Train7_1, FloorPlan_Train7_2, FloorPlan_Train7_3, FloorPlan_Train7_4, FloorPlan_Train7_5
FloorPlan_Train8_1, FloorPlan_Train8_2, FloorPlan_Train8_3, FloorPlan_Train8_4, FloorPlan_Train8_5
FloorPlan_Train9_1, FloorPlan_Train9_2, FloorPlan_Train9_3, FloorPlan_Train9_4, FloorPlan_Train9_5
FloorPlan_Train10_1, FloorPlan_Train10_2, FloorPlan_Train10_3, FloorPlan_Train10_4, FloorPlan_Train10_5
FloorPlan_Train11_1, FloorPlan_Train11_2, FloorPlan_Train11_3, FloorPlan_Train11_4, FloorPlan_Train11_5
FloorPlan_Train12_1, FloorPlan_Train12_2, FloorPlan_Train12_3, FloorPlan_Train12_4, FloorPlan_Train12_5

Validation Scenes

There are 15 total validation scenes: 3 different wall configurations with 5 sets of object variations for each wall configuration. The wall configuration is denoted by the first number in the scene name, and the object variation set is denoted by the second number.

Validation Scenes
FloorPlan_Val1_1, FloorPlan_Val1_2, FloorPlan_Val1_3, FloorPlan_Val1_4, FloorPlan_Val1_5
FloorPlan_Val2_1, FloorPlan_Val2_2, FloorPlan_Val2_3, FloorPlan_Val2_4, FloorPlan_Val2_5
FloorPlan_Val3_1, FloorPlan_Val3_2, FloorPlan_Val3_3, FloorPlan_Val3_4, FloorPlan_Val3_5
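
Because the scene names follow a regular pattern, the lists above can be generated rather than typed out. The following is a small sketch; the helper name scene_names is hypothetical and not part of ai2thor.

```python
def scene_names(split, num_wall_configs, num_variations=5):
    """Return scene names such as FloorPlan_Train1_1 for the given split."""
    return [
        f"FloorPlan_{split}{wall}_{variation}"
        for wall in range(1, num_wall_configs + 1)
        for variation in range(1, num_variations + 1)
    ]


train_scenes = scene_names("Train", 12)  # 12 wall configurations x 5 variations
val_scenes = scene_names("Val", 3)       # 3 wall configurations x 5 variations
print(len(train_scenes), len(val_scenes))  # prints: 60 15
```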

Navigation

The following actions are used to navigate within a scene. Set agentType to stochastic upon initialization to allow a noise value to be passed in to individual actions. To have movement and rotation Gaussian noise applied automatically to all Move and Rotate actions, set agentType to stochastic, set applyActionNoise to True, and set the movement and rotation Gaussian parameters upon initialization. If applyActionNoise is True and the movement and rotation Gaussians are set, the noise parameter of individual actions is applied on top of the intrinsic Gaussian noise.

MoveAhead

Moves the agent forward by gridSize.

event = controller.step(action='MoveAhead')
Parameter Type Description Default
moveMagnitude float When moveMagnitude is passed in, the agent moves by a distance of moveMagnitude, instead of moving by the default gridSize value. If any movementGaussian values were initialized, they will be applied as well unless applyActionNoise was initialized to False. 0.0 (will default to gridSize if not specified)
noise float Only applies if agentType is initialized to stochastic. Adds a noise value to the movement. This is applied on top of any Gaussian noise set upon initialization. 0.0

MoveBack

Moves the agent backward by gridSize (without changing its view direction).

event = controller.step(action='MoveBack')
Parameter Type Description Default
moveMagnitude float When moveMagnitude is passed in, the agent moves by a distance of moveMagnitude, instead of moving by the default gridSize value. If any movementGaussian values were initialized, they will be applied as well unless applyActionNoise was initialized to False. 0.0 (will default to gridSize if not specified)
noise float Only applies if agentType is initialized to stochastic. Adds a noise value to the movement. This is applied on top of any Gaussian noise set upon initialization. 0.0

RotateRight

Rotates the agent to the right by rotateStepDegrees degrees, which is set upon initialization.

event = controller.step(action='RotateRight')
Parameter Type Description Default
noise float Only applies if agentType is initialized to stochastic. Adds a noise value to the rotation. This is applied on top of any Gaussian noise set upon initialization. 0.0

RotateLeft

Rotates the agent to the left by rotateStepDegrees degrees, which is set upon initialization.

event = controller.step(action='RotateLeft')
Parameter Type Description Default
noise float Only applies if agentType is initialized to stochastic. Adds a noise value to the rotation. This is applied on top of any Gaussian noise set upon initialization. 0.0

LookUp

Angles the agent’s view up in 30 degree increments (max upward angle is 30 degrees above the forward horizon).

event = controller.step(action='LookUp')

LookDown

Angles the agent’s view down in 30 degree increments (max downward angle is 60 degrees below the forward horizon).

event = controller.step(action='LookDown')

Get Reachable Positions

Sets actionReturn in the event’s metadata to an array of Vector3s. This array gives valid position coordinates that the Agent can reach without colliding with the environment or Sim Objects in the current scene based on the current gridSize. This can be used in tandem with Teleport to warp the Agent as needed. This is useful for things like randomizing the initial position of the agent without clipping into the environment.

event = controller.step(action='GetReachablePositions')

Teleport

Moves the agent to any location in the scene. Using this command it is possible to put the agent into places that would not normally be possible to navigate to, but it can be useful if you need to place an agent in the exact same spot for a task.

controller.step(action='Teleport', x=0.999, y=1.01, z=-0.3541)
Parameter Type Description Default
x float x coordinate in 3D scene space 0.0
y float y coordinate in 3D scene space 0.0
z float z coordinate in 3D scene space 0.0
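
One way GetReachablePositions and Teleport might be combined to randomize the agent's starting position is sketched below. The function name is hypothetical, and the sketch assumes each Vector3 in actionReturn is a dictionary with x, y and z keys. The controller argument only needs a step method, so any Controller instance should work.

```python
import random


def teleport_to_random_position(controller, rng=random):
    """Teleport the agent to a randomly chosen reachable position (hypothetical helper)."""
    event = controller.step(action="GetReachablePositions")
    positions = event.metadata["actionReturn"]  # assumed: list of {'x', 'y', 'z'} dicts
    if not positions:
        raise RuntimeError("GetReachablePositions returned no positions")
    target = rng.choice(positions)
    return controller.step(
        action="Teleport", x=target["x"], y=target["y"], z=target["z"]
    )
```

Passing a seeded random.Random instance as rng makes the chosen position reproducible across runs.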

TeleportFull

Moves the agent to any location in the scene. Using this command it is possible to put the agent into places that would not normally be possible to navigate to, but it can be useful if you need to place an agent in the exact same spot for a task. Identical to Teleport, but also allows rotation and horizon to be passed in.

event = controller.step(action='TeleportFull', x=0.999, y=1.01, z=-0.3541, rotation=90.0, horizon=30.0)
Parameter Type Description Default
x float x coordinate in 3D scene space 0.0
y float y coordinate in 3D scene space 0.0
z float z coordinate in 3D scene space 0.0
rotation float Rotation about the Y axis to change the forward orientation of the Agent relative to world x/z axes. 0.0
horizon float Rotation about the X axis to change the Up/Down look angle of the Agent. Any angle can be used here, but values of -30.0, 0.0, 30.0, 60.0 will mimic the maximum and minimum angles used by the LookUp and LookDown actions. 0.0

The horizon angle values describe the rotation about the Agent’s X-Axis. This axis has “right hand” facing with respect to the forward Z-Axis, and because of this the values are slightly misleading as the (-30.0) horizon will actually angle the agent’s forward Z direction 30 degrees upward. Because horizon values describe changes about the X-Axis, positive and negative angles can result in the same end position.

Horizon Angle Value Change in Forward Z
-30.0 (330.0) Look 30 Degrees Up
0.0 Look straight ahead
30.0 Look 30 Degrees Down
60.0 Look 60 Degrees Down
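
The equivalence between negative and wrapped angles in the table (for example -30.0 and 330.0) can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper names are hypothetical and not part of ai2thor.

```python
def signed_horizon(angle):
    """Map any horizon angle in degrees to the range (-180, 180]."""
    wrapped = angle % 360.0
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


def describe_horizon(angle):
    """Describe the look direction: negative horizon looks up, positive looks down."""
    signed = signed_horizon(angle)
    if signed == 0:
        return "straight ahead"
    direction = "down" if signed > 0 else "up"
    return f"{abs(signed):g} degrees {direction}"


for value in (-30.0, 330.0, 0.0, 30.0, 60.0):
    print(value, "->", describe_horizon(value))
```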

Metadata

Event Object

This is the return object from controller.step().

event = controller.step(action=<SOME ACTION>)
Attribute Type Description
metadata dict all attributes about agent, objects, visibility, etc. See description below for more detailed documentation.
screen_width int width of the player; extracted from event.metadata[‘screenWidth’].
screen_height int height of the player; extracted from event.metadata[‘screenHeight’].
frame Numpy Array Current RGB image from the agent’s camera. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8
depth_frame Numpy Array Numpy Array containing depth information in millimeters with a max set of 5 meters. Shape: (h, w) dtype: numpy.float32
cv2img Numpy Array Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order.
color_to_object_id dict Dictionary: key=RGB tuple, value=string that corresponds to either an objectId or object type. This structure is populated only when renderObjectImage is set to True when Initialize is called for a scene.
object_id_to_color dict Inverse of the color_to_object_id structure.
instance_segmentation_frame Numpy Array Segmentation image by individual object, Shape: (h, w, c) colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call.
class_segmentation_frame Numpy Array Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call.
instance_detections2D dict 2D bounding boxes of detected objects. Dictionary: key=objectId value=bounding box. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.
class_detections2D dict 2D bounding boxes of detected classes. Dictionary: key=object class value=list of bounding boxes. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.
instance_masks dict Dictionary of object masks that can be applied to other images from the event. key=objectId value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.
class_masks dict Dictionary of class masks that can be applied to other images from the event. key=object class value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.
actionReturn   Certain actions will return metadata via this variable. Details will be listed in the action documentation if a specific action uses this return.

Metadata attributes

These elements are retrieved by using the instance variable ‘metadata’.

event.metadata
Attribute Type Description Example
agent agent attributes pertaining to agent’s location, camera position and rotation.  
errorMessage string string explaining why the last action failed (if lastActionSuccess is false).  
lastAction string The action that was issued to the agent to generate the response. MoveAhead
lastActionSuccess boolean True/False whether the last action succeeded. True
objects array of objects Array of all objects in the scene.  
screenHeight number Height of the image rendered by Unity. 300
screenWidth number Width of the image rendered by Unity. 300
sequenceId number Used to ensure that commands and responses are aligned.  
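
A common pattern is to check lastActionSuccess after every step and surface errorMessage when an action fails. The wrapper below is a minimal sketch of that pattern; step_checked is a hypothetical name, and only the metadata keys documented in the table above are used.

```python
def step_checked(controller, **action_kwargs):
    """Run controller.step and raise if the action failed (hypothetical helper)."""
    event = controller.step(**action_kwargs)
    metadata = event.metadata
    if not metadata["lastActionSuccess"]:
        raise RuntimeError(
            f"{metadata['lastAction']} failed: {metadata['errorMessage']}"
        )
    return event


# With a real controller this would be called as, for example:
#   event = step_checked(controller, action='MoveAhead')
```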

Agent attributes

event.metadata['agent']
Attribute Type Description Example
cameraHorizon float Position of camera relative to the horizon. 0.0 is looking straight ahead, 30.0 degrees is looking down by 30 degrees and 330 is looking up by 30.0 degrees. 0.0
position Vector3 X,Y,Z coordinates of the agent in the world reference frame.  
rotation Vector3 X,Y,Z rotations of the agent in degrees in global space.  

Object attributes

event.metadata['objects']
Attribute Type Description Example
distance float Distance from centerpoint of object to the agent’s camera 3.541793
name string Name of the object in the Scene. Note these are not guaranteed to be unique identifiers of object instances, as multiple objects can have the same string name. alarm_clock_bajk_v
objectId string A unique id for the object within the scene. This is composed of the object’s Object Type and the position coordinates of the object in the scene. This can be used to uniquely identify object instances. A full list of Object Types found in RoboTHOR can be found in the Environment section. AlarmClock|-02.08|+00.94|-03.62
position Vector3 X,Y,Z coordinates of the object in global space.  
rotation Vector3 X,Y,Z rotations of the object in degrees in global space.  
visible boolean Boolean indicating whether the object is visible to the agent. True
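
The object metadata can be filtered and parsed with ordinary Python. The sketch below assumes the objectId format shown in the example above (Object Type followed by three coordinates, separated by |); ids with a different shape would need extra handling. Both helper names are hypothetical.

```python
def parse_object_id(object_id):
    """Split an objectId into its Object Type and (x, y, z) position."""
    object_type, x, y, z = object_id.split("|")
    return object_type, (float(x), float(y), float(z))


def visible_objects(metadata):
    """Return the entries of metadata['objects'] that are visible to the agent."""
    return [obj for obj in metadata["objects"] if obj["visible"]]


print(parse_object_id("AlarmClock|-02.08|+00.94|-03.62"))
# prints: ('AlarmClock', (-2.08, 0.94, -3.62))
```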

Vector3 attributes

Attribute Type Description Example
x float A float representing the position in space on, or rotation about the x axis. 90.0
y float A float representing the position in space on, or rotation about the y axis. 90.0
z float A float representing the position in space on, or rotation about the z axis. 90.0

Evaluation

Import the module:

import ai2thor.util.metrics

To evaluate your model, you can use the SPL (Success weighted by Path Length) metric.

If you are using the dataset we provide, the shortest_path and shortest_path_length are available as fields of each datapoint.

Shortest Path

To get the shortest path to an object type, create your controller on a specific scene.

get_shortest_path_to_object_type(
    controller,
    object_type,
    initial_position
)
Parameter Type Description Default
controller ai2thor.controller.Controller() The controller used to get the shortest path.  
object_type string Type of object in the scene.  
initial_position Vector3 Initial position of the route.  


If there is no object, or more than one object, of type object_type, the method raises a ValueError exception.

Returns an array with a sequence of Vector3 objects representing the corners of the path.

When a scene contains multiple objects of the same type (e.g. Bottle), use the version of the method that takes a specific object id.

get_shortest_path_to_object(
    controller,
    object_id,
    initial_position
)
Parameter Type Description Default
controller ai2thor.controller.Controller() The controller used to get the shortest path.  
object_id string The objectId of the object in the scene. See object attributes  
initial_position Vector3 Initial position of the route.  


If there is no object with id object_id the method raises a ValueError exception.

Returns an array with a sequence of Vector3 objects representing the corners of the path.

path_distance(path)
Parameter Type Description Default
path array of Vector3 The points of the path.  


Returns a floating point number with the total Euclidean distance of the path.
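
To make the quantity concrete, the total Euclidean distance of a path can be sketched in plain Python as the sum of the distances between consecutive corners. This is an illustration of the computation, not the library's implementation; it assumes each Vector3 is a dictionary with x, y and z keys, and the function name path_length is hypothetical.

```python
import math


def path_length(path):
    """Sum the straight-line distances between consecutive points of a path."""
    total = 0.0
    for start, end in zip(path, path[1:]):
        total += math.sqrt(
            (end["x"] - start["x"]) ** 2
            + (end["y"] - start["y"]) ** 2
            + (end["z"] - start["z"]) ** 2
        )
    return total


corners = [
    {"x": 0.0, "y": 0.0, "z": 0.0},
    {"x": 3.0, "y": 0.0, "z": 4.0},
]
print(path_length(corners))  # prints: 5.0
```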

Batch Shortest Paths

Computes shortest path for an episode sequence.

get_episodes_with_shortest_paths(
    controller,
    episodes,
    initialize_func=None
)
Parameter Type Description Default
controller ai2thor.controller.Controller() The controller used to get the shortest paths.  
episodes array of Episode Sequence of episodes for which shortest paths will be computed.  
initialize_func function(controller) Optional initialization function for the controller, called after loading the scene with reset. None


Returns a copy of the same episode sequence with an extra shortest_path field in each episode representing the corners of the shortest path.

SPL

Computes the spl metric for a single path.

compute_single_spl(
    path,
    shortest_path,
    successful_path
)
Parameter Type Description Default
path array of Vector3 The path to evaluate.  
shortest_path array of Vector3 The shortest path as given by get_shortest_path_to_object.  
successful_path boolean Indicates whether the object was found using path.  


Returns a floating point number representing the spl metric for the episode.
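
For intuition, the per-episode value under the standard SPL definition is S * l / max(p, l), where S is 1 for a successful episode and 0 otherwise, l is the shortest path length, and p is the length of the path actually taken. The sketch below works on precomputed lengths and is not the library's implementation; the function name is hypothetical, and returning 1.0 when both lengths are zero is a choice made here for illustration.

```python
def single_episode_spl(path_length, shortest_length, success):
    """Success weighted by path length for one episode (illustrative sketch)."""
    if not success:
        return 0.0
    longest = max(path_length, shortest_length)
    if longest == 0:
        # Assumption: the agent started at the target and did not need to move.
        return 1.0
    return shortest_length / longest


print(single_episode_spl(10.0, 5.0, True))   # prints: 0.5
print(single_episode_spl(10.0, 5.0, False))  # prints: 0.0
```

A path twice as long as the shortest one scores 0.5, and an unsuccessful episode scores 0.0 regardless of its length.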

Batch SPL

Computes the spl metric for an episode sequence.

compute_spl(episodes_with_golden)
Parameter Type Description Default
episodes_with_golden array of Episodes as returned by get_episodes_with_shortest_paths Sequence of episodes to evaluate.  


Returns a floating point number representing the spl metric for the episode list.

Episode attributes

Dictionary with fields:

Parameter Type Description Default
scene string The name of the scene in which the episode takes place.  
initial_position Vector3 The starting position in the scene to search for the object.  
initial_rotation Vector3 The initial orientation the agent has.  
target_object_type string The object type to look for. One of the Object Types listed below. ’’
target_object_id string The object id to look for. objectId parameter in object metadata. ’’
shortest_path array of Vector3 Shortest path from starting position to object. None


Either target_object_type or target_object_id must be present in the object when passed to get_episodes_with_shortest_paths.

shortest_path must be present when episodes are passed to compute_spl.
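
The two requirements above can be checked before calling the metric functions. The sketch below shows one possible episode dictionary and a validation helper. The helper name is hypothetical, the position and rotation values are made up for illustration, and treating scene, initial_position and initial_rotation as required is an assumption based on their having no listed default.

```python
def missing_episode_fields(episode, for_spl=False):
    """List the fields an episode dictionary lacks (hypothetical helper)."""
    required = ("scene", "initial_position", "initial_rotation")  # assumed required
    missing = [key for key in required if key not in episode]
    if not (episode.get("target_object_type") or episode.get("target_object_id")):
        missing.append("target_object_type or target_object_id")
    if for_spl and episode.get("shortest_path") is None:
        missing.append("shortest_path")  # needed by compute_spl
    return missing


episode = {
    "scene": "FloorPlan_Train1_1",
    "initial_position": {"x": 0.0, "y": 0.9, "z": 0.0},   # illustrative values
    "initial_rotation": {"x": 0.0, "y": 90.0, "z": 0.0},  # illustrative values
    "target_object_type": "Television",
}
print(missing_episode_fields(episode))                # prints: []
print(missing_episode_fields(episode, for_spl=True))  # prints: ['shortest_path']
```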

Object Types

Target Object Types are guaranteed to have only one instance in each scene, making them suitable for use as target objects.

Background Object Types appear throughout the scenes, but a given type is not guaranteed to be present in every scene.

Target Object Types
Apple
BaseballBat
BasketBall
Bowl
GarbageCan
HousePlant
Laptop
Mug
Remote
SprayBottle
Vase
AlarmClock
Television


Background Object Types
Pillow
Bottle
PepperShaker
SaltShaker
CD
DeskLamp
Statue
Newspaper
Book
CellPhone
Candle
Watch
Box
FloorLamp
Pen
Pencil
Cup
TennisRacket
ButterKnife
Pot
Plate
TeddyBear
Painting
Chair
ArmChair
Bed
CoffeeTable
Desk
DiningTable
Dresser
SideTable
Shelf
Sofa
TVStand