Raw Data Preprocessing#

Typical Scenario#

  • Real-time or offline trajectory data production;

  • Update and test the algorithms used in the trajectory pre-processing module;

  • Develop a data adapter for another data source;

  • Improve the OSM map parsing procedure.

Step 0: Install additional dependencies#

If your application involves parsing OSM data or performing map-matching, you need to install additional packages. This step can be complicated and error-prone, so we have written a separate guide to help you through it. Please refer to the page Dependencies for preprocessing.

Step 1: Create the configuration file for your application#

The command line tool create_config_file, provided by mtldp.utils and installed automatically with the library, creates the configuration file from command line arguments:

(mtldp) $ create_config_file --help

You can also create the configuration file by running this script interactively. Note that the value in the square brackets is the default value; you can accept it by hitting the Enter key. The difference here is that we add the --mode command line option with the value preprocessing to start this tool.

(mtldp) $ create_config_file --mode preprocessing
Region ID: my_region
...

Once you create the configuration file successfully, you will see this in the console:

Configuration file generated at: "configs/my_region.json"
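For orientation, a generated configuration file might look roughly like the sketch below. The keys shown are assumptions inferred from the fields used later in this guide (region_id, city_id, raw_dir); treat the file actually generated by create_config_file as the authoritative schema.

```json
{
  "region_id": "my_region",
  "city_id": "my_city",
  "raw_dir": "raw_data"
}
```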

Step 2: Download the raw data as input for your application#

Map data: Please refer to this guide Preparation of Signal Phase and Timing (SPaT) data. The downloaded OSM data should be named map.osm and placed in the ${raw_dir}/${region_id}/raw_map directory.

Trajectory data: Please contact henryliu@umich.edu to get the Google Drive link to the raw trajectory data. The downloaded files should first be unzipped and then placed in the ${raw_dir}/${region_id}/raw_trajs directory.

SPaT data: Please contact henryliu@umich.edu to get the Google Drive link to the raw SPaT data. The downloaded files should first be unzipped and then placed in the ${raw_dir}/${region_id}/raw_spat directory.

Step 3: Edit the template files for overwriting data in the process#

Currently, only map data can be overwritten when parsing the OSM data.

In the ${raw_dir}/${region_id}/raw_map directory, there are four CSV files for overwriting the attributes of nodes, links, segments, and movements, respectively. Each file follows the same pattern:

  • The first row is the header;

  • The header should always be ${type}_id,${attribute_1_name},${attribute_2_name},… , where ${attribute_1_name} is the name of the first attribute that you want to overwrite;

  • Each of the following rows gives the overwrite details for one element: ${id},${attribute_1_value},${attribute_2_value},… , where ${attribute_1_value} is the value used to overwrite ${attribute_1_name}.
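To illustrate the pattern, here is a short Python sketch that builds one such overwrite file for links. The attribute names speed_limit and num_lanes (and the link ids) are purely illustrative, not part of the library; use whichever attributes your map elements actually carry.

```python
import csv
import io

# Hypothetical overwrite file for links: the header is ${type}_id followed
# by the attribute names, and each later row is one element's id followed
# by the values that overwrite those attributes.
rows = [
    ["link_id", "speed_limit", "num_lanes"],
    ["1234_5678", "35", "2"],
    ["2345_6789", "45", "3"],
]

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
overwrite_csv = buf.getvalue()
print(overwrite_csv)
```

The same pattern applies to the node, segment, and movement files, with the corresponding `${type}_id` column name.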

Step 4: Build the traffic network data#

You can use the command line tool build_traffic_network to build the standard network.

Here is an example of using this tool:

(mtldp) $ build_traffic_network -c configs/my_region.json

If you want to add extra processing, you can also build your network from scratch with the functions in mtldp.preproc.build_network.

Here is an example of using this function:

from mtldp.preproc.build_network import build_network
from mtldp.utils.config import ProdRegion

def main():
    region = ProdRegion('configs/my_region.json')
    region.network = build_network(region_id=region.region_id,
                                   city_id=region.city_id,
                                   osm_file_path=region.input.osm_path,
                                   arterial_file_path=region.input.arterial_path,
                                   overwrite_node_path=region.input.overwrite_node_path,
                                   overwrite_movement_path=region.input.overwrite_movement_path,
                                   overwrite_segment_path=region.input.overwrite_segment_path,
                                   overwrite_link_path=region.input.overwrite_link_path,
                                   overwrite_json_path=region.output.overwrite_json_path,
                                   filtered_osm_file_path=region.output.filtered_osm_path,
                                   logger_dir=region.output.network_dir,
                                   shp_file_dir=region.output.shapefile_dir)
    # manipulate the network object
    ...

    # save the network object
    # (dump_traffic_network_to_pickle must also be imported from the
    # corresponding mtldp module)
    dump_traffic_network_to_pickle(region.network, region.output.network_pickle_path)

if __name__ == '__main__':
    main()

Step 5: Parse the SPaT data (optional)#

You can use the command line tool parse_spat_data to parse the CSV SPaT data.

Here is an example of using this tool:

(mtldp) $ parse_spat_data -c configs/my_region.json

Step 6: Process the trajectory data#

There are three procedures in the trajectory data processing:

  • Split the raw trajectory data into date-based files;

  • Map-matching;

  • Trajectory-based traffic index calculation.

First, you need to split the raw trajectory data into date-based files. You can use the command line tool split_traj_data to do this.

(mtldp) $ split_traj_data -c configs/my_region.json -s 2023-10-01 -e 2023-10-07

Then, you can use the command line tool match_trajs_to_map to do the map-matching.

(mtldp) $ match_trajs_to_map -c configs/my_region.json -s 2023-10-01 -e 2023-10-07

Finally, you can use the command line tool calc_traj_index to calculate the traffic index.

(mtldp) $ calc_traj_index -c configs/my_region.json -s 2023-10-01 -e 2023-10-07
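When processing many date ranges or regions, the three stages above can be chained from a small Python driver. This is only a sketch, not part of mtldp: it builds the command lines for the tools named above, and the subprocess call is left commented out so you can decide how to run them.

```python
import subprocess  # used when you actually invoke the tools

def stage_commands(config, start, end):
    """Build the three trajectory-processing commands in pipeline order."""
    tools = ("split_traj_data", "match_trajs_to_map", "calc_traj_index")
    return [[tool, "-c", config, "-s", start, "-e", end] for tool in tools]

cmds = stage_commands("configs/my_region.json", "2023-10-01", "2023-10-07")
for cmd in cmds:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to run; check=True stops at the first failure
```

Running the stages with check=True ensures map-matching never starts on data that failed to split, and index calculation never runs on unmatched trajectories.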

Tip: All the command line tools have a --help option; use it to get more information about the parameters.