Task Order 5320
Traffic Operations Research
Highway Traffic Data Sensitivity Analysis
Mike Mauch
California PATH
Benjamin Coifman
Ohio State University
Summary
Highway traffic data are used for diverse applications, ranging from real-time operations, e.g., traffic responsive ramp metering, to offline planning applications, e.g., reporting for the Highway Performance Monitoring System (HPMS). Many studies have revealed detection errors from the existing surveillance infrastructure, including several by our group, many of which were supported by PATH. Non-invasive detectors, which can be installed on the wayside without obstructing lanes, form a special sub-group worth noting. Because of their greater distance from the vehicles and occlusion across lanes, these detectors normally do not perform as well as functioning loop detectors.
The response to these findings has typically been to either live with the errors or retreat to specifications, discarding the sensors and possibly the data. In recent years, a third alternative has received attention in the literature: cleaning the data by identifying statistically anomalous measurements.
But the fundamental question of how good the data need to be has gone unanswered. An error of 20% is undesirable, but the response may be the same even given the uncertainty in the measurement, e.g., the difference between 10 mph and 12 mph. Furthermore, real variability in the data may outweigh the impacts of measurement errors.
From an operations standpoint, the most important function of a traffic surveillance system is determining reliably whether the facility is free flowing or congested. The second most important function is responding rapidly when the facility becomes congested. The main impediment is differentiating between a noisy sample that erroneously suggests that conditions may have changed and an accurate sample that shows conditions really are changing. Such errors are difficult to identify in real time, but many become evident in time series data (e.g., a 10 mph sample recorded between two 60 mph samples). In planning applications, count or volume is the most important metric. Errors in count are difficult to detect automatically, though vehicle conservation can be used in some cases. As discussed in the next section, the analysis will seek ways to relax "difficult to maintain" detector specifications without sacrificing performance or accuracy.
Methodology
There are several common vehicle detectors, each with a different performance range. Similarly, there are numerous applications that use detector data, each with its own specific needs. A practical application should be robust to some noise, since it is impractical to demand perfect detection. Ultimately, this trade-off has direct impacts on Caltrans' ability to monitor the roadways accurately, as well as on the cost of doing so. One objective of the proposed research is to find a balance between performance and cost. Of course, the costs are not simply monetary, e.g., a longer sampling period will generally reduce measurement noise but degrade response time.
The proposed research will develop a methodology to establish the data needs for any given application and then apply this methodology to the two applications listed in the RFP: ramp metering and identifying the onset of congestion. It is recognized that a practical solution cannot ignore the large fixed investment in the existing detection infrastructure, nor should it lock future investments to past standards. As such the research will consider both typical detector deployments found in California, most of which provide data to PeMS, and how the situation changes if one were to deploy new detectors (loops or otherwise) in the course of maintenance or larger projects.
Identify Control Conditions: The first task of this research is identifying precisely what the needs of Caltrans are. We will work with Caltrans engineers to establish the input and response of the two core applications, ramp metering and identifying the onset of congestion. These interviews and discussions will leverage our existing ties with engineers in Caltrans Districts 3, 4, and Headquarters. We will also pursue additional contacts provided by Caltrans and specifically our project manager. In this work we will identify the various implementations of the two applications across California and where applicable, other applications that build off of them, e.g., incident detection or incident verification that need to identify the onset of congestion. It is anticipated that most of these deployments will be documented, though some will not be. For the latter, the research will add the necessary documentation as needed.
The focus of this task will be to identify the control or decision algorithms, or more precisely, the input needed for these algorithms so that we can establish the algorithms' sensitivity to input noise in later tasks. The set of parameters in these algorithms may be discrete, continuous, or a combination. For example, traffic responsive ramp metering may be limited to specific hours (a discrete control parameter) and may be proportional to measured occupancy (a continuous control parameter).
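As an illustration of how discrete and continuous control parameters can combine, the following is a minimal sketch of a hypothetical traffic responsive metering rule. It is not a Caltrans algorithm: the function name, the operating hours, the rate limits, the gain, and the occupancy at which the rate starts to fall are all assumptions chosen only to make the example concrete.

```python
from datetime import time


def metering_rate(now, occupancy_pct,
                  start=time(6, 0), end=time(10, 0),   # assumed metering hours
                  max_rate=900.0, min_rate=240.0,      # assumed limits, veh/h
                  gain=30.0, free_occ=10.0):           # assumed response
    """Return a metering rate in veh/h, or None when the meter is off.

    Discrete parameter: the meter only operates between `start` and `end`.
    Continuous parameter: the rate falls linearly with mainline occupancy
    above `free_occ`, clipped to the range [min_rate, max_rate].
    """
    if not (start <= now < end):
        return None
    rate = max_rate - gain * max(0.0, occupancy_pct - free_occ)
    return max(min_rate, min(max_rate, rate))


if __name__ == "__main__":
    for occ in (8.0, 20.0, 50.0):
        print(occ, metering_rate(time(7, 30), occ))
```

With a rule of this form, an occupancy error only matters when it moves the measurement across the sloped part of the response. Errors well below the assumed 10% occupancy, or well into the clipped region, leave the metering rate unchanged.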
Establish the Frequency of Measurement Errors: For this task we will process data from PeMS and BHL to quantify the frequency, severity and impact of measurement errors. The work will examine the noise patterns in the data, making use of both manual verification and automated analysis, to quantify the extent to which the data are noisy.
Most of Caltrans' operational traffic detectors now feed into the PeMS database. The proposed research will sample these measurements from across the state to identify the frequency and severity of errors. The analysis will employ a combination of techniques including the identification of statistically anomalous measurements. There are other errors that are within the normal statistical distribution for individual measurements, but become evident when examining the time series data at a single station, e.g., one low speed measurement bounded by two free flow measurements. Information from adjacent lanes or successive detector stations on the same facility should usually be consistent and will also be used to identify potential problems in the data stored in the PeMS database, e.g., the conservation of vehicles on a freeway segment without ramps.
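Two of the checks described above can be sketched in a few lines. This is a simplified illustration rather than the method the research will adopt: the function names, the 50 mph free flow threshold, and the 0.5 drop ratio are assumptions, and the sample values are made up for the example.

```python
def flag_isolated_drops(speeds, free_flow=50.0, drop=0.5):
    """Return indices of samples that look like isolated low-speed errors.

    A sample is flagged when both neighbours are at or above `free_flow`
    and the sample itself is below `drop` times the slower neighbour.
    """
    flagged = []
    for i in range(1, len(speeds) - 1):
        prev, cur, nxt = speeds[i - 1], speeds[i], speeds[i + 1]
        if prev >= free_flow and nxt >= free_flow and cur < drop * min(prev, nxt):
            flagged.append(i)
    return flagged


def conservation_imbalance(upstream_counts, downstream_counts):
    """Relative count difference between two stations with no ramps between.

    Over a long enough period the totals should agree, so a persistent
    non-zero value suggests over- or under-counting at one station.
    """
    up = sum(upstream_counts)
    down = sum(downstream_counts)
    return (down - up) / up


if __name__ == "__main__":
    # Made-up speeds in mph: one isolated drop, then a genuine slowdown.
    print(flag_isolated_drops([60, 10, 60, 58, 30, 25, 20]))
    print(conservation_imbalance([100, 120, 110], [103, 124, 113]))
```

The time series check deliberately ignores the sustained drop at the end of the sample series, since a genuine onset of congestion persists across samples. The conservation check is only meaningful over periods long enough that the change in the number of vehicles stored between the two stations is small relative to the total count.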
Some errors could still pass all of the above tests, e.g., if all detectors over-count the number of vehicles by a small percentage. So the research will also employ manual verification to ensure detectors are working properly and to find additional features observable in aggregated data that are indicators of problems. This effort will compare measured data to concurrently collected video. The work will include such a data set collected in the BHL during the summer of 2003. This task will likely include additional field data collection to address issues that become evident in the analysis.
Establish the Sensitivity to Measurement Errors: The first two tasks seek to establish what is needed and what is available, respectively. This third task addresses the fundamental question of how good the data need to be, e.g., variability in the real traffic flow conditions may outweigh the impacts of measurement errors. It directly complements the previous task, which seeks to identify the extent to which the system is prone to errors. Much of the work will be based on numerical experiments to establish the sensitivity of the algorithms from the first task to detection accuracy and precision. We will investigate the impacts of these problems independent of the detection technology. Comparisons will be made both in terms of absolute error and percent error throughout the range of feasible measurements.
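A numerical experiment of this kind could take the following form. The sketch assumes a hypothetical speed-threshold rule for declaring congestion (40 mph) and zero-mean Gaussian measurement noise. Neither assumption comes from an existing Caltrans algorithm, and the function name and parameter values are illustrative only.

```python
import random


def misclassification_rate(true_speed, sigma, threshold=40.0,
                           trials=10000, seed=0):
    """Fraction of noisy samples that land on the wrong side of `threshold`.

    true_speed : actual speed in mph
    sigma      : standard deviation of the assumed Gaussian noise, in mph
    threshold  : assumed speed below which the facility is called congested
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    truly_congested = true_speed < threshold
    wrong = 0
    for _ in range(trials):
        measured = rng.gauss(true_speed, sigma)
        if (measured < threshold) != truly_congested:
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    for speed in (10.0, 35.0, 45.0, 60.0):
        for sigma in (2.0, 5.0, 10.0):
            rate = misclassification_rate(speed, sigma)
            print(f"true={speed:5.1f} mph  sigma={sigma:4.1f}  wrong={rate:.3f}")
```

Under these assumptions, the same absolute noise is harmless when the true speed is far from the threshold and damaging when it is close. This is why the comparisons need to cover the full range of feasible measurements rather than a single operating point.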
Conventional practice employs fixed sample periods, typically ranging from 20 sec to 5 min, and the research will specifically address this existing standard. The research will also, however, consider other possible sampling strategies, employing high resolution detector event data that provide the time of every vehicle passage, sampled at 60 Hz in the BHL and other locations. The event data can then be aggregated to any sampling period, facilitating direct comparisons between various alternatives. Building on this, we will investigate whether the detector infrastructure is overbuilt for the two subject applications and determine what the response time is in the presence of noisy measurements, e.g., having to wait one more sample period to eliminate the possibility of a transient measurement error.
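The re-aggregation of event data can be sketched as follows. The input layout (a list of arrival time and detector on-time pairs, in seconds) is an assumption for illustration and is not the BHL file format. As a simplification, each vehicle's on-time is credited entirely to the period in which it arrives.

```python
def aggregate_events(passages, period_s, duration_s):
    """Aggregate per-vehicle events into fixed sample periods.

    passages   : list of (arrival_time_s, on_time_s) tuples, one per vehicle
    period_s   : sample period in seconds
    duration_s : total length of the record in seconds
    Returns one (flow_veh_per_hour, occupancy_pct) tuple per complete period.
    """
    n = int(duration_s // period_s)
    counts = [0] * n
    on_time = [0.0] * n
    for t, on in passages:
        k = int(t // period_s)
        if 0 <= k < n:                 # ignore events outside the record
            counts[k] += 1
            on_time[k] += on
    return [(c * 3600.0 / period_s, 100.0 * o / period_s)
            for c, o in zip(counts, on_time)]


if __name__ == "__main__":
    # Made-up events: four vehicles in one minute.
    events = [(1.0, 0.3), (5.0, 0.3), (25.0, 0.4), (31.0, 0.2)]
    for period in (20, 30, 60):
        print(period, aggregate_events(events, period, 60))
```

Running the same events through several periods shows the trade-off directly: shorter periods respond sooner but produce flows and occupancies that vary more from sample to sample.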
Establish the Impacts of Measurement Errors: After quantifying the measurement errors, this analysis will explicitly examine their impacts on the two core applications. Systematic measurement errors and noisy data might have no significant impact on a given application or may render the application totally unreliable. For example, an algorithm that seeks to identify the onset of congestion using average vehicle speeds may function just fine in the presence of relatively large measurement errors of flow or vehicle count. By comparison, a density based algorithm that compares the difference in cumulative arrivals at successive detector stations to find some critical vehicular density and identify the onset of congestion is very sensitive to systematic over (or under) counting of vehicular flows. The output of either of these algorithms (speed-based or density-based) could serve as an input to ramp metering control algorithms, which, in turn, could be robust and largely insensitive to erroneous inputs or could be rendered basically ineffective. Other examples are equally relevant: detectors operating in pulse mode, as opposed to presence mode, may still record reliable vehicle counts while recorded occupancies would be invalid. In this situation speeds from dual loops may or may not be reliable. Likewise, an unexpectedly high proportion of long vehicles will bias speed estimates from single loop detectors, and incorrectly matched pulses at a dual loop detector can produce implausible speed measurements.
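The sensitivity of a cumulative-count approach to systematic counting errors can be shown with a small sketch. The 2% over-count, the steady 10 vehicles per sample, and the function names are assumptions for illustration and are not measured values.

```python
def stored_vehicles(upstream_counts, downstream_counts, initial=0.0):
    """Running estimate of vehicles between two stations.

    Each sample adds the upstream count and subtracts the downstream
    count, i.e. the difference in cumulative arrivals at the two stations.
    """
    stored = initial
    series = []
    for up, down in zip(upstream_counts, downstream_counts):
        stored += up - down
        series.append(stored)
    return series


def apply_count_bias(counts, bias):
    """Scale every count by (1 + bias) to mimic systematic miscounting."""
    return [c * (1.0 + bias) for c in counts]


# Steady traffic: 10 vehicles per sample at both stations, 100 samples.
true_counts = [10.0] * 100
unbiased = stored_vehicles(true_counts, true_counts)
# Assume the upstream station over-counts by 2%.
biased = stored_vehicles(apply_count_bias(true_counts, 0.02), true_counts)

if __name__ == "__main__":
    print(unbiased[-1], round(biased[-1], 3))
```

Because the error accumulates, even a small bias produces an apparent build-up of vehicles that grows without bound, here about 20 phantom vehicles after 100 samples. A speed-based rule has no such memory, which is one reason the two approaches respond so differently to the same detector fault.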
We will use empirical results, deterministic and stochastic queueing models, and micro-simulation to quantify the impacts of various levels of measurement errors on the core applications. This work will combine the results from the first three tasks to quantify and explain the impacts of errors on the core applications for each of the traffic sensing technologies, leading to better-informed decision making when choosing detection technologies and how the data should be sampled. It will also illuminate the impacts of choosing poorly, resulting in a "mismatch" between the available traffic sensing technologies and the core applications.