This notebook serves as an introduction to the new functionality added to the nuScenes devkit for the prediction challenge.
It covers the prediction challenge data splits, the PredictHelper class, additions to the Map API, input representations for models, the provided model implementations, and the submission format for the challenge.
from nuscenes import NuScenes
# This is the path where you stored your copy of the nuScenes dataset.
DATAROOT = '/data/sets/nuscenes'
nuscenes = NuScenes('v1.0-mini', dataroot=DATAROOT)
This section assumes basic familiarity with the nuScenes schema.
The goal of the nuScenes prediction challenge is to predict the future location of agents in the nuScenes dataset. Agents are indexed by an instance token and a sample token. To get the list of agents in the train and val splits of the challenge, we provide a function called get_prediction_challenge_split.
The get_prediction_challenge_split function returns a list of strings of the form {instance_token}_{sample_token}. In the next section, we show how to use an instance token and sample token to query data for the prediction challenge.
from nuscenes.eval.prediction.splits import get_prediction_challenge_split
mini_train = get_prediction_challenge_split("mini_train", dataroot=DATAROOT)
mini_train[:5]
We provide a class called PredictHelper that offers methods for querying past and future data for an agent. It is instantiated by wrapping an instance of the NuScenes class.
from nuscenes.prediction import PredictHelper
helper = PredictHelper(nuscenes)
To get the data for an agent at a particular point in time, use the get_sample_annotation method.
instance_token, sample_token = mini_train[0].split("_")
annotation = helper.get_sample_annotation(instance_token, sample_token)
annotation
To get the past or future of an agent, use the get_past_for_agent/get_future_for_agent methods. If the in_agent_frame parameter is set to True, the coordinates will be in the agent's local coordinate frame. Otherwise, they will be in the global frame.
future_xy_local = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=True)
future_xy_local
The agent's coordinate frame is centered on the agent's current location, and the agent's heading is aligned with the positive y axis. For example, the last coordinate in future_xy_local corresponds to a location 0.31 meters to the left of and 9.67 meters in front of the agent's starting location.
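As a quick sanity check, you can read that final waypoint directly off the array (nothing new here, just indexing the result above):
# The rows of future_xy_local are (x, y) offsets in the agent frame,
# so the last row is the agent's displacement after 3 seconds.
future_xy_local[-1]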
future_xy_global = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=False)
future_xy_global
Note that you can also return the entire annotation record by passing just_xy=False. However, in this case, in_agent_frame is not taken into account.
helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=True, just_xy=False)
If you would like to return the data for the entire sample, as opposed to one agent in the sample, you can use the get_annotations_for_sample
method. This will return a list of records for each annotated agent in the sample.
sample = helper.get_annotations_for_sample(sample_token)
len(sample)
Note that there are get_future_for_sample and get_past_for_sample methods that are analogous to the get_future_for_agent and get_past_for_agent methods.
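For example, the sample-level methods can be called like this (a minimal sketch; we assume they accept the same seconds and in_agent_frame arguments as the agent-level methods and return one entry per annotated agent, keyed by instance token):
# Futures for every annotated agent in the sample (argument names assumed
# to mirror the agent-level methods).
future_for_sample = helper.get_future_for_sample(sample_token, seconds=3, in_agent_frame=True)
past_for_sample = helper.get_past_for_sample(sample_token, seconds=3, in_agent_frame=True)
len(future_for_sample)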
We also provide methods to compute the velocity, acceleration, and heading change rate of an agent at a given point in time.
# We get new instance and sample tokens because these methods require computing the difference between records.
instance_token_2, sample_token_2 = mini_train[5].split("_")
# Meters / second.
print(f"Velocity: {helper.get_velocity_for_agent(instance_token_2, sample_token_2)}\n")
# Meters / second^2.
print(f"Acceleration: {helper.get_acceleration_for_agent(instance_token_2, sample_token_2)}\n")
# Radians / second.
print(f"Heading Change Rate: {helper.get_heading_change_rate_for_agent(instance_token_2, sample_token_2)}")
We've added a couple of methods to the Map API to help query lane center line information.
from nuscenes.map_expansion.map_api import NuScenesMap
nusc_map = NuScenesMap(map_name='singapore-onenorth', dataroot=DATAROOT)
To get the closest lane to a location, use the get_closest_lane method. To see the internal data representation of the lane, use the get_lane method.
You can also explore the connectivity of the lanes with the get_incoming_lane_ids and get_outgoing_lane_ids methods.
x, y, yaw = 395, 1095, 0
closest_lane = nusc_map.get_closest_lane(x, y, radius=2)
closest_lane
lane_record = nusc_map.get_lane(closest_lane)
lane_record
nusc_map.get_incoming_lane_ids(closest_lane)
nusc_map.get_outgoing_lane_ids(closest_lane)
To help manipulate the lanes, we've added an arcline_path_utils module. For example, something you might want to do is discretize a lane into a sequence of poses.
from nuscenes.map_expansion import arcline_path_utils
poses = arcline_path_utils.discretize_lane(lane_record, resolution_meters=1)
poses
Given a query pose, you can also find the closest pose on a lane.
closest_pose_on_lane, distance_along_lane = arcline_path_utils.project_pose_to_lane((x, y, yaw), lane_record)
print(x, y, yaw)
closest_pose_on_lane
# Meters.
distance_along_lane
To find the entire length of the lane, you can use the length_of_lane function.
arcline_path_utils.length_of_lane(lane_record)
You can also compute the curvature of a lane at a given distance along the lane.
# 0 means it is a straight lane.
arcline_path_utils.get_curvature_at_distance_along_lane(distance_along_lane, lane_record)
It is common in the prediction literature to represent the state of an agent as a tensor containing information about the semantic map (such as the drivable area and walkways), as well as the past locations of surrounding agents.
Each paper in the field chooses to represent the input in a slightly different way. For example, CoverNet and MTP rasterize the map information and agent locations into a three-channel RGB image, whereas Rules of the Road uses a "taller" tensor with information represented in different channels.
We provide a module called input_representation that is meant to make it easy for you to define your own input representation. In short, you need to define your own StaticLayerRepresentation, AgentRepresentation, and Combinator.
The StaticLayerRepresentation controls how the static map information is represented. The AgentRepresentation controls how the locations of the agents in the scene are represented. The Combinator controls how these two sources of information are combined into a single tensor.
For more information, consult input_representation/interface.py.
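As an illustration, a custom Combinator might look roughly like this (a sketch only; it assumes the Combinator interface in interface.py exposes a single combine method that receives the list of image layers produced by the other two components):
from typing import List
import numpy as np
from nuscenes.prediction.input_representation.interface import Combinator

class MeanCombinator(Combinator):
    """Toy combinator that averages the static-layer and agent images."""

    def combine(self, data: List[np.ndarray]) -> np.ndarray:
        # Assumes all layers share the same shape; a real implementation may
        # want to validate this and handle differing channel counts.
        return np.mean(np.stack(data), axis=0).astype(np.uint8)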
To help get you started, we've provided implementations of the input representations used in CoverNet and MTP.
import matplotlib.pyplot as plt
%matplotlib inline
from nuscenes.prediction.input_representation.static_layers import StaticLayerRasterizer
from nuscenes.prediction.input_representation.agents import AgentBoxesWithFadedHistory
from nuscenes.prediction.input_representation.interface import InputRepresentation
from nuscenes.prediction.input_representation.combinators import Rasterizer
static_layer_rasterizer = StaticLayerRasterizer(helper)
agent_rasterizer = AgentBoxesWithFadedHistory(helper, seconds_of_history=1)
mtp_input_representation = InputRepresentation(static_layer_rasterizer, agent_rasterizer, Rasterizer())
instance_token_img, sample_token_img = 'bc38961ca0ac4b14ab90e547ba79fbb6', '7626dde27d604ac28a0240bdd54eba7a'
anns = [ann for ann in nuscenes.sample_annotation if ann['instance_token'] == instance_token_img]
img = mtp_input_representation.make_input_representation(instance_token_img, sample_token_img)
plt.imshow(img)
We've provided PyTorch implementations for CoverNet and MTP. Below we show how to make predictions using the input representation created above.
from nuscenes.prediction.models.backbone import ResNetBackbone
from nuscenes.prediction.models.mtp import MTP
from nuscenes.prediction.models.covernet import CoverNet
import torch
Both models take a CNN backbone as a parameter. We've provided wrappers for ResNet and MobileNet v2. In this example, we'll use ResNet50.
backbone = ResNetBackbone('resnet50')
mtp = MTP(backbone, num_modes=2)
# Note that the value of num_modes depends on the size of the lattice used for CoverNet.
covernet = CoverNet(backbone, num_modes=64)
The second input is a tensor containing the velocity, acceleration, and heading change rate for the agent.
agent_state_vector = torch.Tensor([[helper.get_velocity_for_agent(instance_token_img, sample_token_img),
helper.get_acceleration_for_agent(instance_token_img, sample_token_img),
helper.get_heading_change_rate_for_agent(instance_token_img, sample_token_img)]])
image_tensor = torch.Tensor(img).permute(2, 0, 1).unsqueeze(0)
# Output has 50 entries.
# The first 24 are x,y coordinates (in the agent frame) over the next 6 seconds at 2 Hz for the first mode.
# The second 24 are the x,y coordinates for the second mode.
# The last 2 are the logits of the mode probabilities.
mtp(image_tensor, agent_state_vector)
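Based on that layout, you could unpack the MTP output roughly as follows (a sketch; the reshape simply follows the ordering described in the comments above, and the MTP loss implementation remains the authoritative reference):
# Split the 50-dimensional output into 2 trajectories of 12 (x, y) points
# each, plus the 2 mode logits.
mtp_output = mtp(image_tensor, agent_state_vector)
mtp_trajectories = mtp_output[:, :-2].reshape(-1, 2, 12, 2)
mtp_mode_logits = mtp_output[:, -2:]
mtp_trajectories.shape, mtp_mode_logits.shape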
# CoverNet outputs a probability distribution over the trajectory set.
# These are the logits of those probabilities.
logits = covernet(image_tensor, agent_state_vector)
print(logits)
The CoverNet model outputs a probability distribution over a set of trajectories. To interpret the predictions and perform inference with CoverNet, you need to download the trajectory sets from the nuScenes website and unzip them in a directory of your choice.
Uncomment the following code once you have done so:
#import pickle
# Epsilon is the amount of coverage in the set,
# i.e. a real-world trajectory is at most 8 meters from a trajectory in this set.
# We released sets for epsilon = 2, 4, and 8. Consult the CoverNet paper for more
# information on how these sets were created.
#PATH_TO_EPSILON_8_SET = ""
#trajectories = pickle.load(open(PATH_TO_EPSILON_8_SET, 'rb'))
# The trajectory set is saved as a list of lists.
#trajectories = torch.Tensor(trajectories)
# Print the 5 most likely predictions.
#trajectories[logits.argsort(descending=True)[0][:5]]
We also provide two physics-based models: a constant velocity and heading model and a physics oracle. The physics oracle estimates the future trajectory of the agent with several physics-based models and chooses the one that is closest to the ground truth. It represents the best performance a purely physics-based model could achieve on the dataset.
from nuscenes.prediction.models.physics import ConstantVelocityHeading, PhysicsOracle
cv_model = ConstantVelocityHeading(sec_from_now=6, helper=helper)
physics_oracle = PhysicsOracle(sec_from_now=6, helper=helper)
The physics models can be called as functions. They take as input a string consisting of the agent's instance token and sample token concatenated with an underscore ("_").
The output is a Prediction data type, which stores the predicted trajectories and their associated probabilities for the agent. We'll go over the Prediction type in greater detail in the next section.
cv_model(f"{instance_token_img}_{sample_token_img}")
physics_oracle(f"{instance_token_img}_{sample_token_img}")
Participants must submit a zipped json file containing serialized Predictions for each agent in the validation set.
The previous section introduced the Prediction data type. In this section, we explain the format in greater detail.
A Prediction consists of four fields:
- instance: the instance token of the agent, as a string.
- sample: the sample token of the agent, as a string.
- prediction: a numpy array of the predicted trajectories, with shape (number_of_modes, number_of_timesteps, 2). At most 25 modes may be predicted.
- probabilities: a numpy array with shape (number_of_modes,), giving the probability of each predicted mode.
You will get an error if any of these conditions are violated.
from nuscenes.eval.prediction.data_classes import Prediction
import numpy as np
# This would raise an error because instance is not a string.
#Prediction(instance=1, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)), probabilities=np.array([1]))
# This would raise an error because sample is not a string.
#Prediction(instance=instance_token_img, sample=2,
# prediction=np.ones((1, 12, 2)), probabilities=np.array([1]))
# This would raise an error because prediction is not a numpy array.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)).tolist(), probabilities=np.array([1]))
# This would throw an error because probabilities is not a numpy array. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((1, 12, 2)), probabilities=[0.3])
# This would throw an error because there are more than 25 predicted modes. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((30, 12, 2)), probabilities=np.array([1/30]*30))
# This would throw an error because the number of predictions and probabilities don't match. Uncomment to see.
#Prediction(instance=instance_token_img, sample=sample_token_img,
# prediction=np.ones((13, 12, 2)), probabilities=np.array([1/12]*12))
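A well-formed Prediction, by contrast, can be constructed like this (illustrative values only):
# A valid example: one mode, 12 timesteps of (x, y), and a matching
# probability array of shape (number_of_modes,).
valid_prediction = Prediction(instance=instance_token_img, sample=sample_token_img,
                              prediction=np.ones((1, 12, 2)), probabilities=np.array([1.0]))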
To make a submission to the challenge, store your model predictions in a Python list and save it to a json file. Then, upload a zipped version of the file to the evaluation server.
For an example, see eval/prediction/baseline_model_inference.py.
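A minimal sketch of that workflow, reusing the valid_prediction constructed above, might look like this (it assumes Prediction exposes a serialize() method that returns a plain dict; check the Prediction class for the exact API):
import json
# One Prediction per agent in the split; serialize each to a dict and dump to json.
predictions = [valid_prediction]
json.dump([pred.serialize() for pred in predictions], open('submission.json', 'w'))
# Zip submission.json and upload it to the evaluation server.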