Event Metadata

Each call to the controller.step() function returns an Event object that contains detailed information about the state of the environment and each of the objects within it.

import ai2thor.controller
controller = ai2thor.controller.Controller()
controller.start()
# can be any one of the scenes FloorPlan###
controller.reset('FloorPlan28')
event = controller.step(dict(action='Initialize', gridSize=0.25))


Event Object

# return object from controller.step()
event = controller.step(dict(action='MoveAhead'))  # 'MoveAhead' can be replaced with any valid action
metadata (dict): All attributes about the agent, objects, visibility, etc. See the Metadata attributes section below for detailed documentation.
screen_width (int): Width of the rendered frame in pixels; extracted from event.metadata['screenWidth'].
screen_height (int): Height of the rendered frame in pixels; extracted from event.metadata['screenHeight'].
frame (Numpy Array): Current RGB image from the agent's camera. Shape: (h, w, c), dtype: numpy.uint8. Channels are in RGB order.
depth_frame (Numpy Array): Depth information in millimeters, with a maximum of 5 meters. Shape: (h, w), dtype: numpy.float32.
cv2image (Numpy Array): Image suitable for use with OpenCV. Shape: (h, w, c). Channels are in BGR order.
color_to_object_id (dict): key=RGB tuple, value=string that corresponds to either an objectId or an object type. This structure is populated only when renderObjectImage is set to True in the Initialize call for a scene.
object_id_to_color (dict): Inverse of the color_to_object_id structure.
instance_segmentation_frame (Numpy Array): Segmentation image by individual object. Shape: (h, w, c); colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during the Initialize call.
class_segmentation_frame (Numpy Array): Segmentation image by object class (e.g. all mugs are the same color). Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during the Initialize call.
instance_detections2D (dict): 2D bounding boxes of detected objects. key=objectId, value=bounding box [start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during the Initialize call.
class_detections2D (dict): 2D bounding boxes of detected classes. key=object class, value=list of bounding boxes [start_x, start_y, end_x, end_y]. Only available when renderObjectImage is enabled during the Initialize call.
instance_masks (dict): Object masks that can be applied to other images from the event. key=objectId, value=Numpy array, shape: (h, w), dtype=numpy.bool. Only available when renderObjectImage is enabled during the Initialize call.
class_masks (dict): Class masks that can be applied to other images from the event. key=object class, value=Numpy array, shape: (h, w), dtype=numpy.bool. Only available when renderObjectImage is enabled during the Initialize call.
third_party_camera_frames (List): Current RGB images from any third party cameras in the scene, in the order in which the cameras were added. Shape: (h, w, c), dtype: numpy.uint8. Channels are in RGB order.
third_party_class_segmentation_frames (List): Current class segmentation images (e.g. all mugs are the same color) from any third party cameras in the scene, in the order in which the cameras were added. Colors correspond to keys found in color_to_object_id. Only available when renderClassImage is enabled during the Initialize call.
third_party_instance_segmentation_frames (List): Current instance segmentation images from any third party cameras in the scene, in the order in which the cameras were added. Shape: (h, w, c); colors correspond to the keys found in color_to_object_id. Only available when renderObjectImage is enabled during the Initialize call.
third_party_depth_frames (List): Current depth images from any third party cameras in the scene, in the order in which the cameras were added. Each image contains depth information in millimeters, with a maximum of 5 meters. Shape: (h, w), dtype: numpy.float32. Only available when renderDepthImage=True is passed to the Initialize action.
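
Several of the image attributes above are only populated when the corresponding render flags are enabled during the Initialize call. The following sketch, which uses only the flags and attributes documented above, shows one way to request and read them:

import ai2thor.controller

controller = ai2thor.controller.Controller()
controller.start()
controller.reset('FloorPlan28')

# enable the optional image outputs described above
event = controller.step(dict(
    action='Initialize',
    gridSize=0.25,
    renderObjectImage=True,   # instance segmentation, 2D detections, masks
    renderClassImage=True,    # class segmentation
    renderDepthImage=True))   # depth frame

rgb = event.frame                                 # (h, w, c), uint8, RGB order
depth = event.depth_frame                         # (h, w), float32, millimeters
instance_seg = event.instance_segmentation_frame  # colors keyed by color_to_object_id

# pick an arbitrary detected object and inspect its boolean mask
object_id = next(iter(event.instance_masks))
mask = event.instance_masks[object_id]            # (h, w) boolean array
print(object_id, mask.sum(), 'pixels')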


Metadata attributes

# retrieved by using the instance variable 'metadata'
event.metadata
agent (agent): Attributes pertaining to the agent's location, camera position and rotation.
errorMessage (string): Explanation of why the last action failed (if lastActionSuccess is false).
lastAction (string): The action that was issued to the agent to generate the response. Example: MoveAhead
lastActionSuccess (boolean): True/False, whether the last action succeeded. Example: True
objects (array of objects): Array of all objects in the scene.
screenHeight (number): Height of the image rendered by Unity. Example: 300
screenWidth (number): Width of the image rendered by Unity. Example: 300
sequenceId (number): Used to ensure that commands and responses are aligned.
thirdPartyCameras (List<thirdPartyCamera>): List of third party camera attributes.
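
For example, continuing from the controller set up above, the outcome of an action can be checked directly from the metadata (all keys used here are documented in the table above):

event = controller.step(dict(action='MoveAhead'))
if not event.metadata['lastActionSuccess']:
    # errorMessage explains why the last action failed
    print('MoveAhead failed:', event.metadata['errorMessage'])
print(event.metadata['lastAction'],
      event.metadata['screenWidth'],
      event.metadata['screenHeight'])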


Agent attributes

event.metadata['agent']
cameraHorizon (float): Position of the camera relative to the horizon. 0.0 is looking straight ahead, 30.0 is looking down by 30 degrees and 330.0 is looking up by 30 degrees. Example: 0.0
position (vector3): X,Y,Z coordinates of the agent in the world reference frame.
rotation (vector3): X,Y,Z rotations of the agent in degrees in global space.
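
A short sketch of reading the agent's pose from the metadata, continuing from the example above and using only the keys documented here:

agent = event.metadata['agent']
print('position:', agent['position'])            # dict with x, y, z keys
print('rotation:', agent['rotation'])            # degrees, global space
print('cameraHorizon:', agent['cameraHorizon'])  # degrees relative to the horizon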


Object attributes

distance (float): Distance from the centerpoint of the object to the agent's camera. Example: 3.541793
isopen (boolean): Whether the object is open or closed. Example: True
name (string): Name of the object; not guaranteed to be unique.
objectId (string): Unique id for the object within the scene. Example: TableTop|-02.08|+00.94|-03.62
openable (boolean): Whether the object can be opened, such as a cabinet or drawer. Example: True
pickupable (boolean): Whether the object can be picked up by the agent. The object can only actually be picked up if it is also visible to the agent. Example: True
receptacle (boolean): Whether the object is a receptacle that can contain other objects. Example: True
position (vector3): X,Y,Z coordinates of the object in global space.
rotation (vector3): X,Y,Z rotations of the object in degrees in global space.
visible (boolean): Whether the object is visible to the agent. Example: True
receptacleObjectIds (array of strings): If the object is a receptacle, an array of objectIds that the receptacle contains.
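
These attributes are typically consumed by iterating over event.metadata['objects']. A small sketch, continuing from the example above and using only the fields documented here:

for obj in event.metadata['objects']:
    if obj['visible'] and obj['pickupable']:
        print(obj['objectId'], 'is visible and pickupable, distance:', obj['distance'])
    if obj['receptacle'] and obj['receptacleObjectIds']:
        print(obj['objectId'], 'contains:', obj['receptacleObjectIds'])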

Vector3 attributes

x (float)
y (float)
z (float)


thirdPartyCamera attributes

event.metadata['thirdPartyCameras']
thirdPartyCameraId (int): Id of the camera. Used in conjunction with the UpdateThirdPartyCamera action to change the position/rotation of a camera. Example: 0
position (vector3): X,Y,Z coordinates of the camera in the world reference frame.
rotation (vector3): X,Y,Z rotations of the camera in degrees in global space.
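
A sketch of repositioning a third party camera and reading back its metadata, continuing from the controller above. It assumes a camera has already been added to the scene (for example via an AddThirdPartyCamera action, which is not covered in this section), and the parameter names passed to UpdateThirdPartyCamera are an assumption based on the attribute names documented here:

# assumes a third party camera with id 0 already exists in the scene
event = controller.step(dict(
    action='UpdateThirdPartyCamera',
    thirdPartyCameraId=0,
    position=dict(x=-1.0, y=1.0, z=-1.5),
    rotation=dict(x=0, y=90, z=0)))

for camera in event.metadata['thirdPartyCameras']:
    print(camera['thirdPartyCameraId'], camera['position'], camera['rotation'])

print(len(event.third_party_camera_frames))  # one RGB frame per third party camera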


Next Steps

Continue on to the Object Types documentation.