Operations

How to run

We provide various basic operations for modifying the built databases for ML applications. These operations include building databases from different data providers, aggregating datasets from diverse sources, splitting datasets into training/test sets, and sanity-checking/filtering scenarios. All commands can be run with python -m scenarionet.[command], e.g. python -m scenarionet.list for listing available operations. The parameters of each script can be listed by adding the -h flag.

Note

When running python -m, make sure your current working directory doesn’t contain a folder called scenarionet. Otherwise, the command may fail. This usually happens if you install ScenarioNet or MetaDrive via git clone and place it under a directory you frequently work in, such as your home directory.

List

This command can list all operations with detailed descriptions:

python -m scenarionet.list

Convert

ScenarioNet doesn’t provide any data itself. Instead, it provides converters that parse common open-source driving datasets into an internal scenario description, from which scenario databases are built. Converting scenarios to our internal scenario description is thus the first step of building a database. Currently, we provide converters for the Waymo, nuPlan, and nuScenes (Lyft) datasets, as well as a converter for procedurally generated (PG) scenarios.

Convert Waymo

python -m scenarionet.convert_waymo [-h] [--database_path DATABASE_PATH]
                        [--dataset_name DATASET_NAME] [--version VERSION]
                        [--overwrite] [--num_workers NUM_WORKERS]
                        [--raw_data_path RAW_DATA_PATH]
                        [--start_file_index START_FILE_INDEX]
                        [--num_files NUM_FILES]

Build database from Waymo scenarios

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        A directory, the path to place the converted data
  --dataset_name DATASET_NAME, -n DATASET_NAME
                        Dataset name, will be used to generate scenario files
  --version VERSION, -v VERSION
                        version
  --overwrite           If the database_path exists, whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use
  --raw_data_path RAW_DATA_PATH
                        The directory storing all Waymo tfrecord files
  --start_file_index START_FILE_INDEX
                        Control how many files to use. We will list all files
                        in the raw data folder and select
                        files[start_file_index: start_file_index+num_files]
  --num_files NUM_FILES
                        Control how many files to use. We will list all files
                        in the raw data folder and select
                        files[start_file_index: start_file_index+num_files]

This script converts the recorded Waymo scenarios into our scenario description. A detailed guide is available in Section Waymo.
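
For example, assuming the downloaded Waymo tfrecord files are stored in /path/to/waymo_origin (a placeholder path), a command like the following converts the first 100 files with 8 workers:

python -m scenarionet.convert_waymo -d /path/to/waymo_database \
    --raw_data_path /path/to/waymo_origin \
    --start_file_index 0 --num_files 100 --num_workers 8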

Convert nuPlan

python -m scenarionet.convert_nuplan [-h] [--database_path DATABASE_PATH]
                     [--dataset_name DATASET_NAME] [--version VERSION]
                     [--overwrite] [--num_workers NUM_WORKERS]
                     [--raw_data_path RAW_DATA_PATH] [--test]

Build database from nuPlan scenarios

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        A directory, the path to place the data
  --dataset_name DATASET_NAME, -n DATASET_NAME
                        Dataset name, will be used to generate scenario files
  --version VERSION, -v VERSION
                        version of the raw data
  --overwrite           If the database_path exists, whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use
  --raw_data_path RAW_DATA_PATH
                        The directory storing the .db files
  --test                for test use only. convert one log

This script converts the recorded nuPlan scenarios into our scenario description. It requires installing nuplan-devkit and downloading the source data from https://www.nuscenes.org/nuplan. A detailed guide is available in Section nuPlan.
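
For example, assuming the downloaded .db files are stored in /path/to/nuplan_raw (a placeholder path), a command like the following builds the database with 8 workers; adding --test would convert only one log as a quick sanity check:

python -m scenarionet.convert_nuplan -d /path/to/nuplan_database \
    --raw_data_path /path/to/nuplan_raw --num_workers 8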

Convert nuScenes (Lyft)

python -m scenarionet.convert_nuscenes [-h] [--database_path DATABASE_PATH]
                           [--dataset_name DATASET_NAME]
                           [--split
{v1.0-mini,mini_val,v1.0-test,train,train_val,val,mini_train,v1.0-trainval}]
                           [--dataroot DATAROOT] [--map_radius MAP_RADIUS]
                           [--future FUTURE] [--past PAST] [--overwrite]
                           [--num_workers NUM_WORKERS]

Build database from nuScenes/Lyft scenarios

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        directory, The path to place the data
  --dataset_name DATASET_NAME, -n DATASET_NAME
                        Dataset name, will be used to generate scenario files
  --split
    {v1.0-mini,mini_val,v1.0-test,train,train_val,val,mini_train,v1.0-trainval}
                        Which splits of nuScenes data should be used. If set
                        to ['v1.0-mini', 'v1.0-trainval', 'v1.0-test'], it
                        will convert the full log into scenarios with 20
                        second episode length. If set to ['mini_train',
                        'mini_val', 'train', 'train_val', 'val'], it will
                        convert segments used for nuScenes prediction
                        challenge to scenarios, resulting in more converted
                        scenarios. Generally, you should choose this parameter
                        from ['v1.0-mini', 'v1.0-trainval', 'v1.0-test'] to
                        get complete scenarios for planning unless you want to
                        use the converted scenario files for prediction task.
  --dataroot DATAROOT   The path of nuscenes data
  --map_radius MAP_RADIUS
                        The size of map
  --future FUTURE       6 seconds by default. How many future seconds to
                        predict. Only available if split is chosen from
                        ['mini_train', 'mini_val', 'train', 'train_val',
                        'val']
  --past PAST           2 seconds by default. How many past seconds are used
                        for prediction. Only available if split is chosen from
                        ['mini_train', 'mini_val', 'train', 'train_val',
                        'val']
  --overwrite           If the database_path exists, whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use

This script converts the recorded nuScenes scenarios into our scenario description. It requires installing nuscenes-devkit and downloading the source data from https://www.nuscenes.org/nuscenes. For Lyft, this API can only convert the old Lyft data, as the old data can be parsed via nuscenes-devkit. However, Lyft is now a part of Woven Planet, and the new data has to be parsed via a new toolkit. We are working on supporting this new toolkit to handle the new Lyft dataset. A detailed guide is available in Section nuScenes.
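
For example, assuming the nuScenes data is stored in /data/nuscenes (a placeholder path), the mini split can be converted with:

python -m scenarionet.convert_nuscenes -d /path/to/nuscenes_database \
    --split v1.0-mini --dataroot /data/nuscenes --num_workers 8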

Convert PG

python -m scenarionet.convert_pg [-h] [--database_path DATABASE_PATH]
                     [--dataset_name DATASET_NAME] [--version VERSION]
                     [--overwrite] [--num_workers NUM_WORKERS]
                     [--num_scenarios NUM_SCENARIOS]
                     [--start_index START_INDEX]

Build database from synthetic or procedurally generated scenarios

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        A directory, the path to place the data
  --dataset_name DATASET_NAME, -n DATASET_NAME
                        Dataset name, will be used to generate scenario files
  --version VERSION, -v VERSION
                        version
  --overwrite           If the database_path exists, whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use
  --num_scenarios NUM_SCENARIOS
                        how many scenarios to generate (default: 30)
  --start_index START_INDEX
                        which index to start

PG refers to Procedural Generation. Scenario databases generated in this way are created by a set of rules with hand-crafted maps. The scenarios are collected by driving the ego car with an IDM policy on these maps. A detailed guide is available in Section PG.
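
For example, the following command generates 100 synthetic scenarios with 8 workers (the database path is a placeholder):

python -m scenarionet.convert_pg -d /path/to/pg_database \
    --num_scenarios 100 --num_workers 8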

Merge

This command merges existing databases to build a larger one. This is why we can build a ScenarioNet! After converting data recorded in different formats to the unified scenario description, we can aggregate the databases freely and enlarge them.

python -m scenarionet.merge [-h] --to DATABASE_PATH --from FROM [FROM ...]
                [--exist_ok] [--overwrite] [--filter_moving_dist]
                [--sdc_moving_dist_min SDC_MOVING_DIST_MIN]

Merge a list of databases, e.g. scenarionet.merge --from db_1 db_2 db_3 ... db_n
--to db_dest

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH, --to DATABASE_PATH
                        The name of the new combined database. It will create
                        a new directory to store dataset_summary.pkl and
                        dataset_mapping.pkl. If exist_ok=True, those two .pkl
                        files will be stored in an existing directory, turning
                        that directory into a database.
  --from FROM [FROM ...]
                        Which databases to combine. It takes any number of
                        directory paths as input
  --exist_ok            Still allow writing if the directory already exists.
                        This write will only create two .pkl files and the
                        directory will become a database.
  --overwrite           When exist_ok is set but summary.pkl and map.pkl
                        already exist in the existing dir, whether to
                        overwrite both files
  --filter_moving_dist  Add this flag to select cases with SDC moving dist >
                        sdc_moving_dist_min
  --sdc_moving_dist_min SDC_MOVING_DIST_MIN
                        Select cases with sdc_moving_dist > this value. We
                        will add more filter conditions in the future.
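
For example, assuming several databases built by the converters above (placeholder paths), they can be aggregated into one:

python -m scenarionet.merge --to /path/to/merged_database \
    --from /path/to/waymo_database /path/to/nuplan_database /path/to/pg_database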

Split

The split operation extracts a subset of scenarios from an existing database and builds a new database. It is usually used to create training/test/validation sets.

python -m scenarionet.split [-h] --from FROM --to TO [--num_scenarios NUM_SCENARIOS]
            [--start_index START_INDEX] [--random] [--exist_ok]
            [--overwrite]

Build a new database containing a subset of scenarios from an existing
database.

optional arguments:
  -h, --help            show this help message and exit
  --from FROM           Which database to extract data from.
  --to TO               The name of the new database. It will create a new
                        directory to store dataset_summary.pkl and
                        dataset_mapping.pkl. If exist_ok=True, those two .pkl
                        files will be stored in an existing directory, turning
                        that directory into a database.
  --num_scenarios NUM_SCENARIOS
                        how many scenarios to extract (default: 30)
  --start_index START_INDEX
                        which index to start
  --random              If set to true, it will choose scenarios randomly from
                        all_scenarios[start_index:]. Otherwise, the scenarios
                        will be selected sequentially
  --exist_ok            Still allow writing if the to_folder already exists.
                        This write will only create two .pkl files and the
                        directory will become a database.
  --overwrite           When exist_ok is set but summary.pkl and map.pkl
                        already exist in the existing dir, whether to
                        overwrite both files
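
For example, the following command (with placeholder paths) extracts 500 randomly chosen scenarios into a new test set:

python -m scenarionet.split --from /path/to/merged_database \
    --to /path/to/test_set --num_scenarios 500 --random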

Copy (Move)

Because the databases built by ScenarioNet store scenarios through a virtual mapping, directly moving or copying an existing database to a new location with the cp or mv command will break the soft links. To move or copy the scenarios to a new path, use this command instead. When --remove_source is added, the copy becomes a move.

python -m scenarionet.cp [-h] --from FROM --to TO [--remove_source] [--copy_raw_data]
               [--exist_ok] [--overwrite]

Move or Copy an existing database

optional arguments:
  -h, --help       show this help message and exit
  --from FROM      Which database to move.
  --to TO          The name of the new database. It will create a new
                   directory to store dataset_summary.pkl and
                   dataset_mapping.pkl. If exist_ok=True, those two .pkl
                   files will be stored in an existing directory, turning
                   that directory into a database.
  --remove_source  Remove the `from_database` if set this flag
  --copy_raw_data  Instead of creating virtual file mapping, copy raw
                   scenario.pkl file
  --exist_ok       Still allow writing if the to_folder already exists. This
                   write will only create two .pkl files and the directory
                   will become a database.
  --overwrite      When exist_ok is set but summary.pkl and map.pkl already
                   exist in the existing dir, whether to overwrite both files
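
For example, the following command (with placeholder paths) copies a database to a new location; adding --remove_source turns the copy into a move:

python -m scenarionet.cp --from /path/to/database --to /path/to/new_database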

Num

Report the number of scenarios in a database.

python -m scenarionet.num [-h] --database_path DATABASE_PATH

The number of scenarios in the specified database

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        Database to check number of scenarios
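
For example (with a placeholder path):

python -m scenarionet.num -d /path/to/database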

Filter

Some scenarios contain overpasses, short ego-car trajectories, or traffic signals. These scenarios can be filtered out of the database with this command. Currently, we only provide filters for ego-car moving distance, number of objects, traffic lights, overpasses, and scenario IDs. If you would like to contribute new filters, feel free to open an issue or pull request on our GitHub repo.

python -m scenarionet.filter [-h] --database_path DATABASE_PATH --from FROM
                      [--exist_ok] [--overwrite] [--moving_dist]
                      [--sdc_moving_dist_min SDC_MOVING_DIST_MIN]
                      [--num_object] [--max_num_object MAX_NUM_OBJECT]
                      [--no_overpass] [--no_traffic_light] [--id_filter]
                      [--exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]]

Filter unwanted scenarios out and build a new database

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        The name of the new database. It will create a new
                        directory to store dataset_summary.pkl and
                        dataset_mapping.pkl. If exist_ok=True, those two .pkl
                        files will be stored in an existing directory, turning
                        that directory into a database.
  --from FROM           Which dataset to filter. It takes one directory path
                        as input
  --exist_ok            Still allow writing if the dir already exists. This
                        write will only create two .pkl files and the
                        directory will become a database.
  --overwrite           When exist_ok is set but summary.pkl and map.pkl
                        already exist in the existing dir, whether to
                        overwrite both files
  --moving_dist         add this flag to select cases with SDC moving dist >
                        sdc_moving_dist_min
  --sdc_moving_dist_min SDC_MOVING_DIST_MIN
                        Selecting case with sdc_moving_dist > this value.
  --num_object          add this flag to select cases with object_num <
                        max_num_object
  --max_num_object MAX_NUM_OBJECT
                        case will be selected if num_obj < this argument
  --no_overpass         Scenarios with overpass WON'T be selected
  --no_traffic_light    Scenarios with traffic light WON'T be selected
  --id_filter           Scenarios with indicated name will NOT be selected
  --exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]
                        Scenarios with indicated name will NOT be selected
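
For example, the following command (with placeholder paths and an illustrative threshold) builds a new database containing only scenarios where the ego car moves more than 20 meters and no traffic light appears:

python -m scenarionet.filter -d /path/to/filtered_database --from /path/to/database \
    --moving_dist --sdc_moving_dist_min 20 --no_traffic_light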

Build from Errors

This script generates a new database that excludes (or only includes) broken scenarios. This is useful for debugging broken scenarios or building a completely clean database for training or testing.

python -m scenarionet.generate_from_error_file [-h] --database_path DATABASE_PATH --file
                               FILE [--overwrite] [--broken]

Generate a new database excluding or only including the failed scenarios
detected by 'check_simulation' and 'check_existence'

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        The path of the newly generated database
  --file FILE, -f FILE  The path of the error file, should be xyz.json
  --overwrite           If the database_path exists, overwrite it
  --broken              By default, only successful scenarios will be picked
                        to build the new database. If this flag is turned on,
                        it will generate a database containing only broken
                        scenarios.
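
For example, given an error file produced by check_existence or check_simulation (placeholder paths), the following builds a database without the failed scenarios; adding --broken instead collects only the failed ones for debugging:

python -m scenarionet.generate_from_error_file -d /path/to/clean_database -f /path/to/error.json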

Sim

Load a database into the simulator and replay the scenarios. We provide different render modes that allow users to visualize them. For more details on the simulation, please check Section Simulation or the MetaDrive documentation.

python -m scenarionet.sim [-h] --database_path DATABASE_PATH
          [--render {none,2D,3D,advanced,semantic}]
          [--scenario_index SCENARIO_INDEX]

Load a database to simulator and replay scenarios

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        The path of the database
  --render {none,2D,3D,advanced,semantic}
  --scenario_index SCENARIO_INDEX
                        Specifying a scenario to run
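
For example, the following command (with a placeholder path) replays the scenario at index 0 with the 2D renderer:

python -m scenarionet.sim -d /path/to/database --render 2D --scenario_index 0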

Check Existence

We provide a tool to check whether the scenarios in a database exist on your machine and are runnable. This check is needed because scenarios are included in a database, which is a folder, through a virtual mapping: each database only records the path of each scenario relative to the database directory. This script thus makes sure that all original scenario files exist and can be loaded.

If it finds broken scenarios, an error file will be written to the specified path. Using generate_from_error_file, a new database can then be created to exclude or only include these broken scenarios. In this way, we can debug the broken scenarios to find out what causes the errors, or simply remove them to make the database intact.

python -m scenarionet.check_existence [-h] --database_path DATABASE_PATH
                          [--error_file_path ERROR_FILE_PATH] [--overwrite]
                          [--num_workers NUM_WORKERS] [--random_drop]

Check if the database is intact and all scenarios can be found and recorded in
internal scenario description

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        Dataset path, a directory containing summary.pkl and
                        mapping.pkl
  --error_file_path ERROR_FILE_PATH
                        Where to save the error file. One can generate a new
                        database excluding or only including the failed
                        scenarios. For more details, see operation
                        'generate_from_error_file'
  --overwrite           If an error file already exists in error_file_path,
                        whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use
  --random_drop         Randomly make some scenarios fail. For test only!
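
For example, the following command (with placeholder paths) checks a database with 8 workers and writes the error file to the given path:

python -m scenarionet.check_existence -d /path/to/database \
    --error_file_path /path/to/error_dir --num_workers 8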

Check Simulation

This is an upgraded version of the existence check. It not only detects the existence and completeness of the database, but also checks whether all scenarios can be loaded and run in the simulator.

python -m scenarionet.check_simulation [-h] --database_path DATABASE_PATH
                       [--error_file_path ERROR_FILE_PATH] [--overwrite]
                       [--num_workers NUM_WORKERS] [--random_drop]

Check if all scenarios can be simulated in the simulator. We recommend doing
this before closed-loop training/testing

optional arguments:
  -h, --help            show this help message and exit
  --database_path DATABASE_PATH, -d DATABASE_PATH
                        Dataset path, a directory containing summary.pkl and
                        mapping.pkl
  --error_file_path ERROR_FILE_PATH
                        Where to save the error file. One can generate a new
                        database excluding or only including the failed
                        scenarios. For more details, see operation
                        'generate_from_error_file'
  --overwrite           If an error file already exists in error_file_path,
                        whether to overwrite it
  --num_workers NUM_WORKERS
                        number of workers to use
  --random_drop         Randomly make some scenarios fail. For test only!
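
For example (with placeholder paths):

python -m scenarionet.check_simulation -d /path/to/database \
    --error_file_path /path/to/error_dir --num_workers 8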

Check Overlap

This script checks whether there is any overlap between two databases. Its main purpose is to ensure that the training and test sets are isolated.

python -m scenarionet.check_overlap [-h] --d_1 D_1 --d_2 D_2 [--show_id]

Check if there are overlapped scenarios between two databases. If so, return
the number of overlapped scenarios and id list

optional arguments:
  -h, --help  show this help message and exit
  --d_1 D_1   The path of the first database
  --d_2 D_2   The path of the second database
  --show_id   whether to show the id of overlapped scenarios
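
For example, the following command (with placeholder paths) verifies that a training set and a test set share no scenarios and shows the IDs of any overlaps:

python -m scenarionet.check_overlap --d_1 /path/to/train_set --d_2 /path/to/test_set --show_id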