Event Metadata

Each call to controller.step() returns an Event object containing detailed information about the state of the environment and of each object within it.

from ai2thor.controller import Controller
controller = Controller(scene='FloorPlan28', gridSize=0.25) 
event = controller.step(action=<SOME ACTION>)


Metadata Object

This is the return object from controller.step().

event = controller.step(action=<SOME ACTION>)
Attribute Type Description  
metadata dict all attributes about agent, objects, visibility, etc. See description below for more detailed documentation  
screen_width int Width of the rendered image; taken from event.metadata['screenWidth']  
screen_height int Height of the rendered image; taken from event.metadata['screenHeight']  
frame Numpy Array Current RGB image from the agent’s camera. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8  
depth_frame Numpy Array Depth information in millimeters, capped at a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32  
cv2img Numpy Array Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order.  
color_to_object_id dict Dictionary: key=RGB tuple, value=string that corresponds to either an objectId or object type. This structure is populated only when renderObjectImage is set to True when Initialize is called for a scene.  
object_id_to_color dict Inverse of the color_to_object_id structure.  
instance_segmentation_frame Numpy Array Segmentation image by individual object. Shape: (h, w, c). Colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call.  
class_segmentation_frame Numpy Array Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call.  
instance_detections2D dict 2D bounding boxes of detected objects. Dictionary: key=objectId value=bounding box. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.  
class_detections2D dict 2D bounding boxes of detected classes. Dictionary: key=object class value=list of bounding boxes. bounding box=[start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during Initialize call.  
instance_masks dict Dictionary of object masks that can be applied to other images from the event. key=objectId value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.  
class_masks dict Dictionary of class masks that can be applied to other images from the event. key=object class value=Numpy array shape: (h, w) dtype=numpy.bool. Only available when renderObjectImage is enabled during Initialize call.  
third_party_camera_frames List List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8  
third_party_class_segmentation_frames List List of current segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Segmentation image by class of object (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during Initialize call.  
third_party_instance_segmentation_frames List List of current segmentation images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Segmentation image by individual object. Shape: (h, w, c). Colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during Initialize call.  
third_party_depth_frames List List of current depth images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Each image is a Numpy Array containing depth information in millimeters, capped at a maximum of 5 meters. Shape: (h, w) dtype: numpy.float32. Only available when renderDepthImage=True is passed to the Initialize action.  
sceneBounds sceneBounds object This object provides all coordinates that are within bounds of the scene. This can be used in tandem with actions like PlaceObjectAtPoint to make sure the coordinate used is not out of bounds. This returns a sceneBounds object that includes an 8x3 matrix of xyz coordinates that represent the 8 corners of the box encompassing the entire scene, an xyz dictionary for the coordinates of the center of that box, and an xyz dictionary for the size (extents) of that box.  
actionReturn   Certain actions will return metadata via this variable. Details will be listed in the action documentation if a specific action uses this return  
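
The segmentation attributes are designed to be used together: a pixel color read from instance_segmentation_frame is a key into color_to_object_id. The sketch below illustrates that lookup under some assumptions. So that it runs without the simulator, nested lists stand in for the Numpy array, and the colors, object ids and the helper name object_at_pixel are invented for illustration; they are not part of the library.

```python
def object_at_pixel(segmentation_frame, color_to_object_id, x, y):
    """Return the id stored for the color at column x, row y, or None."""
    # Frames have shape (h, w, c), so the row (y) is indexed first.
    r, g, b = segmentation_frame[y][x][:3]
    # Convert to plain ints so the tuple matches dictionary keys.
    return color_to_object_id.get((int(r), int(g), int(b)))


# Hypothetical 2x3 frame (h=2, w=3) showing two objects.
RED, BLUE = (255, 0, 0), (0, 0, 255)
frame = [
    [RED, RED, BLUE],
    [RED, BLUE, BLUE],
]
colors = {
    RED: "Mug|+00.10|+00.90|-01.20",
    BLUE: "Table|+00.00|+00.00|-01.00",
}

hit = object_at_pixel(frame, colors, x=2, y=0)
print(hit)
```

With a real event, the same call would presumably take event.instance_segmentation_frame and event.color_to_object_id as its first two arguments, provided renderObjectImage was enabled.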

Multi-Agent Event Object

If the environment has been initialized with more than one agent a Multi-Agent Event Object will be returned from the step() method. The agentId can be 0…N (where N = number of agents - 1).

controller = Controller(scene='FloorPlan28', gridSize=0.25, agentCount=2) 
event = controller.step(action=<SOME ACTION>, agentId=0)
Attribute Type Description
metadata dict Metadata for the active agent (agent that received the most recent action). All attributes about agent, objects, visibility, etc. See description below for more detailed documentation
screen_width int Width of the rendered image; taken from event.metadata['screenWidth']
screen_height int Height of the rendered image; taken from event.metadata['screenHeight']
cv2img Numpy Array cv2img for the active agent. Numpy Array suitable for use with OpenCV. Shape: (h, w, c) Channels are in BGR order.
events list Array of event objects. One per agent. Element 0 corresponds to the first agent, element 1 for the second.
third_party_camera_frames List List of current RGB images from any third party cameras in the scene. The order of the list corresponds to the order in which the cameras were added. Channels are in RGB order. Shape: (h, w, c) dtype: numpy.uint8
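
Because the events list holds one Event per agent, per-agent state can be collected by iterating over it. The following is a minimal sketch rather than library code: SimpleNamespace objects stand in for the real event objects, and the helper names agent_positions and fake_event are hypothetical.

```python
from types import SimpleNamespace


def agent_positions(multi_event):
    """Map agent index -> position dict, read from each per-agent event."""
    return {
        agent_id: agent_event.metadata["agent"]["position"]
        for agent_id, agent_event in enumerate(multi_event.events)
    }


def fake_event(x, z):
    # Stand-in for an Event; only the fields used above are filled in.
    return SimpleNamespace(
        metadata={"agent": {"position": {"x": x, "y": 0.9, "z": z}}}
    )


multi_event = SimpleNamespace(
    events=[fake_event(0.0, 1.0), fake_event(2.5, -1.0)]
)
positions = agent_positions(multi_event)
print(positions)
```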

Metadata attributes

These elements are retrieved by using the instance variable ‘metadata’.

event.metadata
Attribute Type Description Example
agent agent attributes pertaining to agent’s location, camera position and rotation  
errorMessage string string explaining why the last action failed (if lastActionSuccess is false)  
lastAction string The action that was issued to the agent to generate the response MoveAhead
lastActionSuccess boolean True/False whether the last action succeeded True
objects array of objects Array of all objects in the scene  
screenHeight number Height of the image rendered by Unity 300
screenWidth number Width of the image rendered by Unity 300
sequenceId number Used to ensure that commands and responses are aligned  
thirdPartyCameras List<thirdPartyCamera> List of third party camera attributes  
currentTime float Time in seconds since the start of the episode. If the agent is initialized with agentMode = 'drone', this is instead the time in seconds synced with the delayed physics updates that drone mode uses. 0.0
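
A common use of lastActionSuccess and errorMessage is to stop early when an action fails. The sketch below shows one way to do that on a plain metadata dictionary. The exception class, the helper name require_success and the sample error text are assumptions made for illustration.

```python
class ActionFailed(RuntimeError):
    """Raised when the metadata reports that the last action failed."""


def require_success(metadata):
    """Return metadata unchanged, or raise with the reported error message."""
    if not metadata["lastActionSuccess"]:
        raise ActionFailed(
            f"{metadata['lastAction']} failed: {metadata.get('errorMessage', '')}"
        )
    return metadata


ok = {"lastAction": "MoveAhead", "lastActionSuccess": True, "errorMessage": ""}
blocked = {
    "lastAction": "MoveAhead",
    "lastActionSuccess": False,
    "errorMessage": "agent is blocked",  # made-up message for illustration
}

require_success(ok)
failure = None
try:
    require_success(blocked)
except ActionFailed as err:
    failure = str(err)
print(failure)
```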

Agent attributes

event.metadata['agent']
Attribute Type Description Example
cameraHorizon float Position of camera relative to the horizon. 0.0 is looking straight ahead, 30.0 degrees is looking down by 30 degrees and 330 is looking up by 30.0 degrees. 0.0
position xyz dictionary x, y, z coordinates of the agent in the world reference frame  
rotation xyz dictionary x, y, z rotations of the agent in degrees in global space  
isStanding bool Whether the agent is currently in the “standing” position. This value changes when the agent uses the Stand or Crouch actions, which change the camera height. Note that only the Physics controller allows these actions at this time. True
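
Since cameraHorizon reports an upward tilt as a value near 360 (330 means looking up by 30 degrees), it can be convenient to convert it to a signed angle. This is a small sketch of that conversion; the function name signed_horizon is hypothetical.

```python
def signed_horizon(camera_horizon):
    """Map cameraHorizon to the range [-180, 180).

    Positive values look down, negative values look up.
    """
    return (camera_horizon + 180.0) % 360.0 - 180.0


print(signed_horizon(0.0), signed_horizon(30.0), signed_horizon(330.0))
```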

Object attributes

event.metadata['objects']

# Example return of object metadata for a single sim object of type `DeskLamp`
[{'name': 'DeskLamp_33ba15a6',
  'position': {'x': -1.31978738, 'y': 1.23870516, 'z': -0.994436145},
  'rotation': {'x': 359.900757, 'y': 89.95341, 'z': 359.986816},
  'visible': False,
  'receptacle': False,
  'toggleable': True,
  'isToggled': True,
  'breakable': False,
  'isBroken': False,
  'canFillWithLiquid': False,
  'isFilledWithLiquid': False,
  'dirtyable': False,
  'isDirty': False,
  'canBeUsedUp': False,
  'isUsedUp': False,
  'cookable': False,
  'isCooked': False,
  'ObjectTemperature': 'RoomTemp',
  'canChangeTempToHot': False,
  'canChangeTempToCold': False,
  'sliceable': False,
  'isSliced': False,
  'openable': False,
  'isOpen': False,
  'pickupable': False,
  'isPickedUp': False,
  'moveable': True,
  'mass': 2.06,
  'salientMaterials': ['Metal', 'Fabric'],
  'receptacleObjectIds': None,
  'distance': 4.10230827,
  'objectType': 'DeskLamp',
  'objectId': 'DeskLamp|-01.32|+01.24|-00.99',
  'parentReceptacles': ['Dresser|-01.33|+00.01|-00.74'],
  'isMoving': False,
  'axisAlignedBoundingBox': {'cornerPoints': [[-1.24880266,
     1.56558228,
     -0.902912259],
    [-1.24880266, 1.56558228, -1.086083],
    [-1.24880266, 1.23842049, -0.902912259],
    [-1.24880266, 1.23842049, -1.086083],
    [-1.390816, 1.56558228, -0.902912259],
    [-1.390816, 1.56558228, -1.086083],
    [-1.390816, 1.23842049, -0.902912259],
    [-1.390816, 1.23842049, -1.086083]],
   'center': {'x': -1.31980932, 'y': 1.40200138, 'z': -0.994497657},
   'size': {'x': 0.142013311, 'y': 0.3271618, 'z': 0.1831708}},
  'objectOrientedBoundingBox': {'cornerPoints': [[-1.24066579,
     1.22984934,
     -1.10477746],
    [-1.24084544, 1.22990012, -0.8839622],
    [-1.39887786, 1.22962642, -0.8840907],
    [-1.39869821, 1.22957551, -1.104906],
    [-1.241268, 1.57760191, -1.10485792],
    [-1.24144769, 1.57765281, -0.88404274],
    [-1.39948022, 1.57737911, -0.884171247],
    [-1.39930058, 1.57732821, -1.10498643]]}}]
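
The center and size of an axisAlignedBoundingBox follow directly from its cornerPoints, which also makes a point-in-box test straightforward. The sketch below recomputes both from the DeskLamp corners shown above and checks whether a point lies inside the box. The helper names box_from_corners and contains, and the test point, are illustrative assumptions.

```python
def box_from_corners(corner_points):
    """Compute center and size dicts from a list of [x, y, z] corners."""
    center, size = {}, {}
    for axis, name in enumerate("xyz"):
        values = [corner[axis] for corner in corner_points]
        low, high = min(values), max(values)
        center[name] = (low + high) / 2
        size[name] = high - low
    return center, size


def contains(box, point):
    """True if an xyz point lies inside a box given by 'center' and 'size'."""
    return all(
        abs(point[name] - box["center"][name]) <= box["size"][name] / 2
        for name in "xyz"
    )


# axisAlignedBoundingBox of the DeskLamp from the example above.
aabb = {
    "cornerPoints": [
        [-1.24880266, 1.56558228, -0.902912259],
        [-1.24880266, 1.56558228, -1.086083],
        [-1.24880266, 1.23842049, -0.902912259],
        [-1.24880266, 1.23842049, -1.086083],
        [-1.390816, 1.56558228, -0.902912259],
        [-1.390816, 1.56558228, -1.086083],
        [-1.390816, 1.23842049, -0.902912259],
        [-1.390816, 1.23842049, -1.086083],
    ],
    "center": {"x": -1.31980932, "y": 1.40200138, "z": -0.994497657},
    "size": {"x": 0.142013311, "y": 0.3271618, "z": 0.1831708},
}

center, size = box_from_corners(aabb["cornerPoints"])
inside = contains(aabb, {"x": -1.32, "y": 1.4, "z": -1.0})
print(center, size, inside)
```

The recomputed values agree with the reported center and size up to floating-point rounding.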
Attribute Type Description Example
name string Name of the object in Unity Scene. These names are unique within any individual scene. Table_akjlis2j
position xyz dictionary X,Y,Z coordinates of the object in global space  
rotation xyz dictionary X,Y,Z rotations of the object in degrees in global space  
distance float Distance from the sim object’s pivot point to the center of the agent. Note that the sim object’s pivot point is not guaranteed to be the exact center of the object. 4.10230827
visible boolean This will be True if this object is currently within view of the Agent’s camera True
receptacle boolean If True, this object has the Receptacle property and can contain other objects. True
toggleable boolean If True, this object has the Toggleable property and can be manipulated with the ToggleObjectOn and ToggleObjectOff actions. True
isToggled boolean Only used if this object is toggleable = True. If True, this object is toggled on. If False, the object is toggled off. True
breakable boolean If True, this object has the Breakable property and can be broken with the BreakObject action or contextually if enough force is applied to it. True
isBroken boolean Only used if this object is breakable = True. If True, this object is currently in its broken state. If False, this object is not broken. True
canFillWithLiquid boolean If True, this object has the Fillable property and can be filled with liquid using the FillObjectWithLiquid action or via contextual interactions with some parts of the environment. True
isFilledWithLiquid boolean Only used if this object is canFillWithLiquid = True. If True, this object is currently filled. If False, this object is empty. True
dirtyable boolean If True, this object has the Dirty property, allowing it to be switched between clean and dirty states using the DirtyObject and CleanObject actions. True
isDirty boolean Only used if this object is dirtyable = True. If True, this object is currently in its dirty state. If False, this object is in its clean state. True
canBeUsedUp boolean If True, this object has the UsedUp property and can change its state to a depleted form with the UsedUpObject action. True
isUsedUp boolean Only used if this object is canBeUsedUp = True. If True, this object is currently in its used up state. If False, the object is full and not used up yet. True
cookable boolean If True, this object has the cookable property and can change its state to a cooked form with the CookObject action. True
isCooked boolean Only used if this object is cookable = True. If True, this object is currently in its cooked state. If False, the object is not cooked. True
ObjectTemperature string String that indicates what this object’s current abstracted temperature is. Valid strings are: Hot, Cold, RoomTemp Hot
canChangeTempToHot boolean If True, this object is a source of Heat and can contextually change other objects’ temperature to Hot. True
canChangeTempToCold boolean If True, this object is a source of Cold and can contextually change other objects’ temperature to Cold. True
sliceable boolean If True, this object has the sliceable property and can be sliced into multiple pieces with the SliceObject action. True
isSliced boolean Only used if this object is sliceable = True. If True, this object has been sliced. If False, this object is full and not sliced True
openable boolean If True, this object has the openable property and can be switched to an open or closed state with the OpenObject and CloseObject actions True
isOpen boolean Only used if this object is openable = True. If True, this object is currently in its opened state. If False, this object is in its closed state True
pickupable boolean If True, this object has the Pickupable property and can be picked up by the agent with the PickupObject action. True
isPickedUp boolean Only used if this object is pickupable = True. If True, this object is currently being picked up by the agent. If False, the object is not being held by the agent. True
moveable bool If True, this object has the Moveable property and can be moved around the environment with actions like PushObject. False
mass float The mass of a Pickupable or Moveable sim object in Kilograms 0.5
salientMaterials array of strings Array of strings listing the salient materials a pickupable object is composed of. Valid strings are: Metal, Wood, Plastic, Glass, Ceramic, Stone, Fabric, Rubber, Food, Paper, Wax, Soap, Sponge, Organic Metal, Plastic
receptacleObjectIds array of strings If the object is a receptacle, this is an array of objectIds that the receptacle contains Spoon|-02.1|+00.93|2.62, Knife|-01.1|+00.93|4.34
distance float Distance from centerpoint of object to the agent’s camera 3.541793
objectType string The type of the object. See the Objects section of the documentation for a full list of objectTypes.  
objectId string A unique id for the object within the scene. This is composed of the object’s type and the position coordinates of the object in the scene. TableTop|-02.08|+00.94|-03.62
parentReceptacles list of strings A list of objectId strings of all receptacles that contain this object.  
isMoving bool A bool tracking if this object is actively in motion. This can be useful when tracking object behaviors while using the PausePhysicsAutoSim and AdvancePhysicsStep actions. False
axisAlignedBoundingBox axisAlignedBoundingBox object Returns an axisAlignedBoundingBox object that includes an 8x3 matrix of xyz coordinates that represent the 8 corners of the box, an xyz dictionary for the coordinates of the center of the box, and an xyz dictionary for the size (extents) of the box. This axis aligned bounding box is the smallest box aligned to the world axes that completely encloses the sim object. This means that if the object is rotated or moved, the overall size and volume of this box will change in order to remain aligned with the static world axes. This is best used for rough approximations of the area/volume an object takes up. Note that large, oddly shaped objects like countertops that wrap around kitchens may have awkwardly defined axis-aligned bounds.  
objectOrientedBoundingBox objectOrientedBoundingBox object Returns an objectOrientedBoundingBox object that includes an 8x3 matrix of xyz coordinates that represent the 8 corners of the object oriented box. This object oriented bounding box is a box that completely encloses a sim object. Unlike the axis aligned box above, this box’s dimensions are fixed relative to the object, so it always has the same total volume regardless of how the object is manipulated, rotated or moved. Note that only Pickupable objects have an objectOrientedBoundingBox at this time.  
numStructureHits int Only used if the agent is initialized in agentMode = 'drone'. The total number of times this object has hit structure objects while moving through the scene. These structure objects are environmental structures like the walls, floor, or other static structures. 0
numFloorHits int Only used if the agent is initialized in agentMode = 'drone'. The total number of times this object has hit the floor structure specifically while moving through the scene. 0
numSimObjHits int Only used if the agent is initialized in agentMode = 'drone'. The total number of times this object has hit other sim objects while moving through the scene 0
isCaught bool Only used if the agent is initialized in agentMode = 'drone'. This is true if the object has been caught by the drone agent’s basket. False
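
The boolean properties and the distance field make it easy to filter the objects array, and an objectId can be split back into its type and coordinates. The sketch below shows both on a small hand-written list. The helper names and the sample objects are made up for illustration, and parse_object_id assumes the four-field Type|x|y|z form shown in the examples above.

```python
def visible_pickupable(objects):
    """Visible, pickupable objects ordered from nearest to farthest."""
    matches = [obj for obj in objects if obj["visible"] and obj["pickupable"]]
    return sorted(matches, key=lambda obj: obj["distance"])


def parse_object_id(object_id):
    """Split 'Type|x|y|z' into (type, (x, y, z)).

    Assumes exactly four '|'-separated fields; other forms raise ValueError.
    """
    object_type, x, y, z = object_id.split("|")
    return object_type, (float(x), float(y), float(z))


# Hand-written stand-in for event.metadata['objects'] (only the keys used here).
objects = [
    {"objectId": "Mug|+00.50|+00.90|-01.00", "visible": True,
     "pickupable": True, "distance": 2.0},
    {"objectId": "Apple|+00.20|+00.95|-00.40", "visible": True,
     "pickupable": True, "distance": 1.2},
    {"objectId": "DeskLamp|-01.32|+01.24|-00.99", "visible": False,
     "pickupable": False, "distance": 4.10230827},
]

nearest_first = [obj["objectId"] for obj in visible_pickupable(objects)]
parsed = parse_object_id("DeskLamp|-01.32|+01.24|-00.99")
print(nearest_first, parsed)
```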

thirdPartyCamera attributes

event.metadata['thirdPartyCameras']
Attribute Type Description Example
thirdPartyCameraId int id of the camera. Used in conjunction with the UpdateThirdPartyCamera action to change the position/rotation of a camera. 0
position xyz dictionary x, y, z coordinates of the camera in the world reference frame  
rotation xyz dictionary x, y, z rotations of the camera in degrees in global space
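
To read the attributes of one particular camera, the list can be searched by thirdPartyCameraId. This is a minimal sketch on a hand-written metadata dictionary; the helper name camera_by_id and the sample values are assumptions.

```python
def camera_by_id(metadata, camera_id):
    """Return the thirdPartyCamera entry with the given id, or None."""
    for camera in metadata["thirdPartyCameras"]:
        if camera["thirdPartyCameraId"] == camera_id:
            return camera
    return None


# Hand-written stand-in for event.metadata with two cameras.
metadata = {
    "thirdPartyCameras": [
        {"thirdPartyCameraId": 0,
         "position": {"x": 0.0, "y": 2.0, "z": 0.0},
         "rotation": {"x": 90.0, "y": 0.0, "z": 0.0}},
        {"thirdPartyCameraId": 1,
         "position": {"x": 1.5, "y": 1.0, "z": -2.0},
         "rotation": {"x": 0.0, "y": 180.0, "z": 0.0}},
    ]
}

camera = camera_by_id(metadata, 1)
print(camera["position"])
```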