This notebook serves as an introduction to the new functionality added to the nuScenes devkit for the prediction challenge.
It covers the prediction challenge data splits, the PredictHelper class, additions to the Map API, input representations for models, the provided model implementations, and the submission format for the challenge.
from nuscenes import NuScenes
# This is the path where you stored your copy of the nuScenes dataset.
DATAROOT = '/data/sets/nuscenes'
nuscenes = NuScenes('v1.0-mini', dataroot=DATAROOT)
This section assumes basic familiarity with the nuScenes schema.
The goal of the nuScenes prediction challenge is to predict the future location of agents in the nuScenes dataset. Agents are indexed by an instance token and a sample token. To get the list of agents in the train and val splits of the challenge, we provide a function called get_prediction_challenge_split.
The get_prediction_challenge_split function returns a list of strings of the form {instance_token}_{sample_token}. In the next section, we show how to use an instance token and sample token to query data for the prediction challenge.
from nuscenes.eval.prediction.splits import get_prediction_challenge_split
mini_train = get_prediction_challenge_split("mini_train", dataroot=DATAROOT)
mini_train[:5]
We provide a class called PredictHelper that offers methods for querying past and future data for an agent. It is instantiated by wrapping an instance of the NuScenes class.
from nuscenes.prediction import PredictHelper
helper = PredictHelper(nuscenes)
To get the data for an agent at a particular point in time, use the get_sample_annotation method.
instance_token, sample_token = mini_train[0].split("_")
annotation = helper.get_sample_annotation(instance_token, sample_token)
annotation
To get the past or future of an agent, use the get_past_for_agent/get_future_for_agent methods. If the in_agent_frame parameter is set to True, the coordinates will be in the agent's local coordinate frame. Otherwise, they will be in the global frame.
future_xy_local = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=True)
future_xy_local
The agent's coordinate frame is centered on the agent's current location, and the agent's heading is aligned with the positive y axis. For example, the last coordinate in future_xy_local corresponds to a location 0.31 meters to the left of and 9.67 meters in front of the agent's starting location.
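As a quick sanity check, you can read that final waypoint directly off the array (nothing new here, just indexing the result above):
# The rows of future_xy_local are (x, y) offsets in the agent frame,
# so the last row is the agent's displacement after 3 seconds.
future_xy_local[-1]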
future_xy_global = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=False)
future_xy_global
Note that you can also return the entire annotation record by passing just_xy=False. However, in this case, in_agent_frame is not taken into account.
helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=True, just_xy=False)
If you would like to return the data for the entire sample, as opposed to one agent in the sample, you can use the get_annotations_for_sample
method. This will return a list of records for each annotated agent in the sample.
sample = helper.get_annotations_for_sample(sample_token)
len(sample)
Note that there are get_future_for_sample and get_past_for_sample methods that are analogous to the get_future_for_agent and get_past_for_agent methods.
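For example, the sample-level methods can be called like this (a minimal sketch; we assume they accept the same seconds and in_agent_frame arguments as the agent-level methods and return one entry per annotated agent, keyed by instance token):
# Futures for every annotated agent in the sample (argument names assumed
# to mirror the agent-level methods).
future_for_sample = helper.get_future_for_sample(sample_token, seconds=3, in_agent_frame=True)
past_for_sample = helper.get_past_for_sample(sample_token, seconds=3, in_agent_frame=True)
len(future_for_sample)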
We also provide methods to compute the velocity, acceleration, and heading change rate of an agent at a given point in time.
# We get new instance and sample tokens because these methods require computing the difference between records.
instance_token_2, sample_token_2 = mini_train[5].split("_")
# Meters / second.
print(f"Velocity: {helper.get_velocity_for_agent(instance_token_2, sample_token_2)}\n")
# Meters / second^2.
print(f"Acceleration: {helper.get_acceleration_for_agent(instance_token_2, sample_token_2)}\n")
# Radians / second.
print(f"Heading Change Rate: {helper.get_heading_change_rate_for_agent(instance_token_2, sample_token_2)}")
We've added a couple of methods to the Map API to help query lane center line information.
from nuscenes.map_expansion.map_api import NuScenesMap
nusc_map = NuScenesMap(map_name='singapore-onenorth', dataroot=DATAROOT)
To get the closest lane to a location, use the get_closest_lane method. To see the internal data representation of the lane, use the get_lane method.
You can also explore the connectivity of the lanes with the get_incoming_lane_ids and get_outgoing_lane_ids methods.
x, y, yaw = 395, 1095, 0
closest_lane = nusc_map.get_closest_lane(x, y, radius=2)
closest_lane
lane_record = nusc_map.get_lane(closest_lane)
lane_record
nusc_map.get_incoming_lane_ids(closest_lane)
nusc_map.get_outgoing_lane_ids(closest_lane)
To help manipulate the lanes, we've added an arcline_path_utils module. For example, something you might want to do is discretize a lane into a sequence of poses.
from nuscenes.map_expansion import arcline_path_utils
poses = arcline_path_utils.discretize_lane(lane_record, resolution_meters=1)
poses
Given a query pose, you can also find the closest pose on a lane.
closest_pose_on_lane, distance_along_lane = arcline_path_utils.project_pose_to_lane((x, y, yaw), lane_record)
print(x, y, yaw)
closest_pose_on_lane
# Meters.
distance_along_lane
To find the entire length of the lane, you can use the length_of_lane function.
arcline_path_utils.length_of_lane(lane_record)
You can also compute the curvature of a lane at a given distance along the lane.
# 0 means it is a straight lane.
arcline_path_utils.get_curvature_at_distance_along_lane(distance_along_lane, lane_record)
It is common in the prediction literature to represent the state of an agent as a tensor containing information about the semantic map (such as the drivable area and walkways), as well as the past locations of surrounding agents.
Each paper in the field chooses to represent the input in a slightly different way. For example, CoverNet and MTP rasterize the map information and agent locations into a three-channel RGB image, whereas Rules of the Road uses a "taller" tensor with information represented in different channels.
We provide a module called input_representation that is meant to make it easy for you to define your own input representation. In short, you need to define your own StaticLayerRepresentation, AgentRepresentation, and Combinator.
The StaticLayerRepresentation controls how the static map information is represented. The AgentRepresentation controls how the locations of the agents in the scene are represented. The Combinator controls how these two sources of information are combined into a single tensor.
For more information, consult input_representation/interface.py.
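As an illustration, a custom Combinator might look roughly like this (a sketch only; it assumes the Combinator interface in interface.py exposes a single combine method that receives the list of image layers produced by the other two components):
from typing import List
import numpy as np
from nuscenes.prediction.input_representation.interface import Combinator

class MeanCombinator(Combinator):
    """Toy combinator that averages the static-layer and agent images."""

    def combine(self, data: List[np.ndarray]) -> np.ndarray:
        # Assumes all layers share the same shape; a real implementation may
        # want to validate this and handle differing channel counts.
        return np.mean(np.stack(data), axis=0).astype(np.uint8)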
To help get you started, we've provided implementations of the input representations used in CoverNet and MTP.
import matplotlib.pyplot as plt
%matplotlib inline
from nuscenes.prediction.input_representation.static_layers import StaticLayerRasterizer
from nuscenes.prediction.input_representation.agents import AgentBoxesWithFadedHistory
from nuscenes.prediction.input_representation.interface import InputRepresentation
from nuscenes.prediction.input_representation.combinators import Rasterizer
static_layer_rasterizer = StaticLayerRasterizer(helper)
agent_rasterizer = AgentBoxesWithFadedHistory(helper, seconds_of_history=1)
mtp_input_representation = InputRepresentation(static_layer_rasterizer, agent_rasterizer, Rasterizer())
instance_token_img, sample_token_img = 'bc38961ca0ac4b14ab90e547ba79fbb6', '7626dde27d604ac28a0240bdd54eba7a'
anns = [ann for ann in nuscenes.sample_annotation if ann['instance_token'] == instance_token_img]
img = mtp_input_representation.make_input_representation(instance_token_img, sample_token_img)
plt.imshow(img)
We've provided PyTorch implementations for CoverNet and MTP. Below we show how to make predictions using the input representation created above.
from nuscenes.prediction.models.backbone import ResNetBackbone
from nuscenes.prediction.models.mtp import MTP
from nuscenes.prediction.models.covernet import CoverNet
import torch
Both models take a CNN backbone as a parameter. We've provided wrappers for ResNet and MobileNet v2. In this example, we'll use ResNet50.
backbone = ResNetBackbone('resnet50')
mtp = MTP(backbone, num_modes=2)
# Note that the value of num_modes depends on the size of the lattice used for CoverNet.
covernet = CoverNet(backbone, num_modes=64)
The second input is a tensor containing the velocity, acceleration, and heading change rate for the agent.
agent_state_vector = torch.Tensor([[helper.get_velocity_for_agent(instance_token_img, sample_token_img),
helper.get_acceleration_for_agent(instance_token_img, sample_token_img),
helper.get_heading_change_rate_for_agent(instance_token_img, sample_token_img)]])
image_tensor = torch.Tensor(img).permute(2, 0, 1).unsqueeze(0)
# Output has 50 entries.
# The first 24 are x,y coordinates (in the agent frame) over the next 6 seconds at 2 Hz for the first mode.
# The second 24 are the x,y coordinates for the second mode.
# The last 2 are the logits of the mode probabilities.
mtp(image_tensor, agent_state_vector)
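Based on that layout, you could unpack the MTP output roughly as follows (a sketch; the reshape simply follows the ordering described in the comments above, and the MTP loss implementation remains the authoritative reference):
# Split the 50-dimensional output into 2 trajectories of 12 (x, y) points
# each, plus the 2 mode logits.
mtp_output = mtp(image_tensor, agent_state_vector)
mtp_trajectories = mtp_output[:, :-2].reshape(-1, 2, 12, 2)
mtp_mode_logits = mtp_output[:, -2:]
mtp_trajectories.shape, mtp_mode_logits.shape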
# CoverNet outputs a probability distribution over the trajectory set.
# These are the logits of those probabilities.
logits = covernet(image_tensor, agent_state_vector)
print(logits)
The CoverNet model outputs a probability distribution over a set of trajectories. To interpret the predictions and perform inference with CoverNet, you need to download the trajectory sets from the nuScenes website and unzip them in a directory of your choice.
Uncomment the following code once you have done so:
#import pickle
# Epsilon is the amount of coverage in the set,
# i.e. a real-world trajectory is at most 8 meters from a trajectory in this set.
# We released sets for epsilon = 2, 4, and 8. Consult the CoverNet paper for more
# information on how these sets were created.
#PATH_TO_EPSILON_8_SET = ""
#trajectories = pickle.load(open(PATH_TO_EPSILON_8_SET, 'rb'))
# The trajectory set is saved as a list of lists.
#trajectories = torch.Tensor(trajectories)
# Print the 5 most likely predictions.
#trajectories[logits.argsort(descending=True)[0][:5]]
We also provide two physics-based models: a constant velocity and heading model and a physics oracle. The physics oracle estimates the future trajectory of the agent with several physics-based models and chooses the one that is closest to the ground truth. It represents the best performance a purely physics-based model could achieve on the dataset.
from nuscenes.prediction.models.physics import ConstantVelocityHeading, PhysicsOracle
cv_model = ConstantVelocityHeading(sec_from_now=6, helper=helper)
physics_oracle = PhysicsOracle(sec_from_now=6, helper=helper)
The physics models can be called as functions. They take as input a string consisting of the agent's instance token and sample token concatenated with an underscore ("_").
The output is a Prediction data type, which stores the predicted trajectories and their associated probabilities for the agent. We'll go over the Prediction type in greater detail in the next section.
cv_model(f"{instance_token_img}_{sample_token_img}")
physics_oracle(f"{instance_token_img}_{sample_token_img}")
Participants must submit a zipped json file containing serialized Predictions for each agent in the validation set.
The previous section introduced the Prediction data type. In this section, we explain the format in greater detail.
A Prediction consists of four fields:
- instance: the instance token of the agent, as a string.
- sample: the sample token of the agent, as a string.
- prediction: a numpy array of the predicted trajectories, with shape (number_of_modes, number_of_timesteps, 2). At most 25 modes may be predicted.
- probabilities: a numpy array with shape (number_of_modes,), giving the probability of each predicted mode.
You will get an error if any of these conditions are violated.
from nuscenes.eval.prediction.data_classes import Prediction
import numpy as np
# This would raise an error because instance is not a string.
#Prediction(instance=1, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)), probabilities=np.array([1]))
# This would raise an error because sample is not a string.
#Prediction(instance=instance_token_img, sample=2,
# prediction=np.ones((1, 12, 2)), probabilities=np.array([1]))
# This would raise an error because prediction is not a numpy array.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)).tolist(), probabilities=np.array([1]))
# This would throw an error because probabilities is not a numpy array. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)), probabilities=[0.3])
# This would throw an error because there are more than 25 predicted modes. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((30, 12, 2)), probabilities=np.array([1/30]*30))
# This would throw an error because the number of predictions and probabilities don't match. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((13, 12, 2)), probabilities=np.array([1/12]*12))
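A well-formed Prediction, by contrast, can be constructed like this (illustrative values only):
# A valid example: one mode, 12 timesteps of (x, y), and a matching
# probability array of shape (number_of_modes,).
valid_prediction = Prediction(instance=instance_token_img, sample=sample_token_img,
                              prediction=np.ones((1, 12, 2)), probabilities=np.array([1.0]))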
To make a submission to the challenge, store your model predictions in a Python list and save it to a json file. Then, upload a zipped version of the file to the evaluation server.
For an example, see eval/prediction/baseline_model_inference.py.
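A minimal sketch of that workflow, reusing the valid_prediction constructed above, might look like this (it assumes Prediction exposes a serialize() method that returns a plain dict; check the Prediction class for the exact API):
import json
# One Prediction per agent in the split; serialize each to a dict and dump to json.
predictions = [valid_prediction]
json.dump([pred.serialize() for pred in predictions], open('submission.json', 'w'))
# Zip submission.json and upload it to the evaluation server.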