Raw Data Preprocessing#
Typical Scenario#
- Real-time or offline trajectory data production;
- Updating and testing the algorithms used in the trajectory preprocessing module;
- Developing a data adapter for a new data source;
- Improving the OSM map parsing procedure.
Step 0: Install additional dependencies#
If your application involves parsing OSM data or performing map-matching, you need to install additional packages. This step can be complicated and error-prone, so we wrote a separate guide to help you through it. Please refer to this page: Dependencies for preprocessing.
Step 1: Create the configure file for your application#
The command line tool create_config_file can create the configuration file from command line arguments. It is provided by mtldp.utils and is automatically installed when installing the library:
(mtldp) $ create_config_file --help
You can also create the configuration file by running this script interactively. Note that the value in square brackets is the default value; you can accept it by hitting the Enter key. The difference here is that we add the --mode command line option with the value preprocessing to start this tool.
(mtldp) $ create_config_file --mode preprocessing
(mtldp) $ Region ID: my_region
(mtldp) $ ...
Once you create the configuration file successfully, you will see this in the console:
Configuration file generated at: "configs/my_region.json"
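A generated configuration file might look roughly like the sketch below. Every key shown here is an assumption for illustration only; inspect the file actually produced by create_config_file for the real schema.

```json
{
  "region_id": "my_region",
  "city_id": "my_city",
  "mode": "preprocessing",
  "raw_dir": "data/raw",
  "output_dir": "data/output"
}
```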
Step 2: Download the raw data as input for your application#
Map data: Please refer to this guide Preparation of Signal Phase and Timing (SPaT) data. The downloaded OSM data should be named map.osm and placed in the ${raw_dir}/${region_id}/raw_map directory.
Trajectory data: Please contact henryliu@umich.edu to get the Google Drive link to the raw trajectory data. The downloaded files should first be unzipped and then placed in the ${raw_dir}/${region_id}/raw_trajs directory.
SPaT data: Please contact henryliu@umich.edu to get the Google Drive link to the raw SPaT data. The downloaded files should first be unzipped and then placed in the ${raw_dir}/${region_id}/raw_spat directory.
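The three inputs above share a common directory layout. A quick way to create the expected directories (raw_dir=data and region_id=my_region are example values, not defaults of the tool):

```shell
raw_dir=data
region_id=my_region

# Create the input directories that the download steps above expect.
mkdir -p "${raw_dir}/${region_id}/raw_map" \
         "${raw_dir}/${region_id}/raw_trajs" \
         "${raw_dir}/${region_id}/raw_spat"
```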
Step 3: Edit the template files for overwriting the data in the process#
Currently, only map data can be overwritten when parsing the OSM data. In the ${raw_dir}/${region_id}/raw_map directory, there are four CSV files for overwriting the attributes of nodes, links, segments, and movements, respectively. Each file follows the same pattern:
- The first row is the header. It should always be ${type}_id,${attribute_1_name},${attribute_2_name},…, where ${attribute_1_name} is the name of the attribute that you want to overwrite.
- Each of the following rows describes the overwrite for one element: ${id},${attribute_1_value},${attribute_2_value},…, where ${attribute_1_value} is the value used to overwrite attribute_1.
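The pattern above can be illustrated with a small node overwrite file. The column names (latitude, longitude) and node IDs below are hypothetical examples, not part of the mtldp schema:

```python
import csv
import io

# Hypothetical node overwrite file: the header names the attributes to
# overwrite, and each following row gives the new values for one node.
node_csv = """node_id,latitude,longitude
node_001,42.3001,-83.7150
node_002,42.3012,-83.7098
"""

# Read the file into a dict mapping node_id -> attribute overrides.
overrides = {}
for row in csv.DictReader(io.StringIO(node_csv)):
    node_id = row.pop("node_id")
    overrides[node_id] = row  # remaining columns overwrite the node attributes

print(overrides["node_001"]["latitude"])  # prints 42.3001
```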
Step 4: Build the traffic network data#
You can use the command line tool build_traffic_network to build the standard network. Here is an example of using this tool:
(mtldp) $ build_traffic_network -c configs/my_region.json
If you want to add some extra processing, you can also build your network from scratch with the functions in mtldp.preproc.build_network. Here is an example of using these functions:
from mtldp.preproc.build_network import build_network
from mtldp.utils.config import ProdRegion
# dump_traffic_network_to_pickle must also be imported from mtldp;
# see the library's API reference for its module path.


def main():
    region = ProdRegion('configs/my_region.json')
    # build the network object from the OSM file and the overwrite templates
    region.network = build_network(region_id=region.region_id,
                                   city_id=region.city_id,
                                   osm_file_path=region.input.osm_path,
                                   arterial_file_path=region.input.arterial_path,
                                   overwrite_node_path=region.input.overwrite_node_path,
                                   overwrite_movement_path=region.input.overwrite_movement_path,
                                   overwrite_segment_path=region.input.overwrite_segment_path,
                                   overwrite_link_path=region.input.overwrite_link_path,
                                   overwrite_json_path=region.output.overwrite_json_path,
                                   filtered_osm_file_path=region.output.filtered_osm_path,
                                   logger_dir=region.output.network_dir,
                                   shp_file_dir=region.output.shapefile_dir)
    # manipulate the network object
    ...
    # save the network object
    dump_traffic_network_to_pickle(region.network, region.output.network_pickle_path)


if __name__ == '__main__':
    main()
Step 5: Parse the SPaT data (optional)#
You can use the command line tool parse_spat_data to parse the CSV SPaT data. Here is an example of using this tool:
(mtldp) $ parse_spat_data -c configs/my_region.json
Step 6: Process the trajectory data#
There are three procedures in the trajectory data processing:
- Split the raw trajectory data into date-based files;
- Map-matching;
- Trajectory-based traffic index calculation.
First, you need to split the raw trajectory data into date-based files. You can use the command line tool split_traj_data to do this.
(mtldp) $ split_traj_data -c configs/my_region.json -s 2023-10-01 -e 2023-10-07
Then, you can use the command line tool match_trajs_to_map to do the map-matching.
(mtldp) $ match_trajs_to_map -c configs/my_region.json -s 2023-10-01 -e 2023-10-07
Finally, you can use the command line tool calc_traj_index to calculate the traffic index.
(mtldp) $ calc_traj_index -c configs/my_region.json -s 2023-10-01 -e 2023-10-07
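The -s/-e options above bound an inclusive date range, so the split step produces one file per day. A small sketch of the enumeration (the per-day file naming is an assumption for illustration, not the tool's documented output):

```python
from datetime import date, timedelta


def date_range(start: date, end: date):
    """Yield every date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


# Mirror the example commands: -s 2023-10-01 -e 2023-10-07
names = [d.strftime("%Y-%m-%d") + ".csv"
         for d in date_range(date(2023, 10, 1), date(2023, 10, 7))]
print(len(names))   # prints 7 (one daily file per date in the range)
print(names[0])     # prints 2023-10-01.csv
```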
Tips:
All the command line tools support the --help option; use it to get more information about their parameters.