update citation JOSS | min_spacing

UTEL-UIUC · Mar 7, 2024 · 8e92735
1 parent 14e00b3

Showing 4 changed files with 25 additions and 14 deletions.
diff --git a/gtfs_segments/utils.py b/gtfs_segments/utils.py
@@ -75,24 +75,28 @@ def plot_hist(
 
 
 def summary_stats(
-    df: pd.DataFrame, max_spacing: float = 3000, export: bool = False, **kwargs: Any
+    df: pd.DataFrame, max_spacing: float = 3000, min_spacing: float = 10, export: bool = False, **kwargs: Any
 ) -> pd.DataFrame:
     """
-    It takes in a dataframe, and returns a dataframe with summary statistics
+    It takes in a dataframe, and returns a dataframe with summary statistics.
+    The max_spacing and min_spacing serve as threshold to remove outliers.
 
     Args:
       df: The dataframe that you want to get the summary statistics for.
+      max_spacing: The maximum spacing between two stops. Defaults to 3000[m]
+      min_spacing: The minimum spacing between two stops. Defaults to 10[m]
       export: If True, the summary will be exported to a csv file. Defaults to False
 
     Returns:
       A dataframe with the summary statistics
     """
     print("Using max_spacing = ", max_spacing)
+    print("Using min_spacing = ", min_spacing)
     percent_spacing = round(
         df[df["distance"] > max_spacing]["traversals"].sum() / df["traversals"].sum() * 100,
         3,
     )
-    df = df[df["distance"] <= max_spacing]
+    df = df[(df["distance"] <= max_spacing) & (df["distance"] >= min_spacing)]
     seg_weighted_mean = (
         df.groupby(["segment_id", "distance"]).first().reset_index()["distance"].mean()
     )
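
Review note: the effect of the new `min_spacing` bound can be checked with a minimal sketch. The helper name `filter_by_spacing` and the sample rows are hypothetical and are not part of `gtfs_segments`; only the column names (`segment_id`, `distance`, `traversals`) and the inclusive comparison come from the hunk above. Note that `percent_spacing` in the hunk counts only traversals above `max_spacing`, whereas this sketch reports the share removed by both thresholds.

```python
import pandas as pd


def filter_by_spacing(
    df: pd.DataFrame, min_spacing: float = 10, max_spacing: float = 3000
) -> pd.DataFrame:
    """Hypothetical helper: keep rows with min_spacing <= distance <= max_spacing."""
    # Same inclusive bounds as the changed line in summary_stats.
    mask = (df["distance"] >= min_spacing) & (df["distance"] <= max_spacing)
    return df[mask]


# Made-up sample data, in metres; not taken from any real feed.
segments = pd.DataFrame(
    {
        "segment_id": ["a", "b", "c", "d"],
        "distance": [4.0, 10.0, 350.0, 5200.0],
        "traversals": [10, 20, 60, 10],
    }
)

kept = filter_by_spacing(segments)

# Share of traversals removed by either threshold, in percent.
dropped_share = round(
    (1 - kept["traversals"].sum() / segments["traversals"].sum()) * 100, 3
)

print(kept["segment_id"].tolist())
print(dropped_share)
```

With these sample rows, segment `a` (4 m) and segment `d` (5200 m) are dropped, while segment `b` at exactly 10 m is kept because both bounds are inclusive.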

diff --git a/paper/heatmap_and_histogram.pdf b/paper/heatmap_and_histogram.pdf
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -1,11 +1,18 @@
-@misc{devunuri2023bus,
-  title         ={ {Bus Stop Spacings Statistics: Theory and Evidence}},
-  author        = {Devunuri, Saipraneeth and Qiam, Shirin  and Lehe, Lewis  and Pandey, Ayush  and Monzer, Dana },
-  year          = {2023},
-  eprint        = {2208.04394},
-  archiveprefix = {arXiv},
-  primaryclass  = {stat.ME},
-  doi           = {10.48550/arXiv.2208.04394}
+@article{Devunuri2024,
+  title = {Bus Stop Spacing Statistics: {{Theory}} and Evidence},
+  shorttitle = {Bus Stop Spacing Statistics},
+  author = {Devunuri, Saipraneeth and Lehe, Lewis J. and Qiam, Shirin and Pandey, Ayush and Monzer, Dana},
+  year = {2024},
+  month = jan,
+  journal = {Journal of Public Transportation},
+  volume = {26},
+  pages = {100083},
+  issn = {1077-291X},
+  doi = {10.1016/j.jpubtr.2024.100083},
+  url = {https://www.sciencedirect.com/science/article/pii/S1077291X24000031},
+  urldate = {2024-03-07},
+  abstract = {Discussions of bus stop consolidation sometimes refer to average stop spacings, but there are no reliable statistics about spacings, nor methodologies for calculating them. This paper aims to clarify discussions of bus stop spacings by introducing clear definitions, a methodology for creating statistics from General Transit Feed Specification (GTFS) files, and a python package, gtfs-segments, which splits bus networks into isolated `segments.' With the package, we calculate national-level statistics from 539 US transit providers and 83 Canadian providers, as well as agency-level statistics for 30 providers in the US, 10 in Canada, and a sample of 38 providers from other countries. Our estimates of US and Canadian mean spacings are both around 350~m (slightly wider than five stops per mile). US spacings are wider than sometimes claimed but narrower than those in other countries. Finally, the paper gives examples of metrics created by combining GTFS with data from other sources and proposes research ideas and applications to transit planning involving fine-grained stop spacing data.},
+  keywords = {Bus stop,GTFS,Public Transit,Stop Spacings,Transit Planning}
 }
 
 @misc{devunuri2023chatgpt,

diff --git a/paper/paper.md b/paper/paper.md
@@ -32,7 +32,7 @@ The GTFS Segments (gtfs-segments) library is an open-source Python toolkit for c
 
 The choice of bus stop spacing involves a tradeoff between accessibility and speed: wider spacings mean passengers must travel farther to/from stops, but they allow the bus to move faster [@Wu2022]. Many US transit agencies have recently carried out *stop consolidation* campaigns that systematically remove stops, due partly to the perception US stop spacings are much narrower than those abroad. However, there are no reliable data sources to obtain current stop spacings despite the wide adoption of General Transit Feed Specification (GTFS) [@Voulgaris2023Predictors], because GTFS does not include data on stop spacings directly. Spacings must be computed from route shape geometries, stop locations, and stop sequences. A challenge is that stop locations are not placed on top of route shapes and therefore must be somehow projected onto the route's `LINESTRING`. To make spacings available for analysis, `gtfs-segments` use k-dimensional spatial trees and k-nearest neighbor heuristics to snap stops to routes and divide routes into segments for computation of spacings, as described below.
 
-`gtfs-segments` was designed for researchers, transit planners, students and anyone interested in bus networks. The package has been used in several scholarly articles [@devunuri2023bus; @devunuri2023chatgpt; @lehe4135394bus] and to create databases of spacings for over 550 agencies in the US [@DVN/SFBIVU_2022] and 80 agencies in Canada [@DVN/QFTAPM_2023]. Several transit agencies, such as Regional Transportation District Denver (RTD- Denver), have used the package to visualize the effects of their bus stop consolidation efforts. Filtering functions allow the user to explore datasets, identify errors and compute specialized statistics.
+`gtfs-segments` was designed for researchers, transit planners, students and anyone interested in bus networks. The package has been used in several scholarly articles [@Devunuri2024; @devunuri2023chatgpt; @lehe4135394bus] and to create databases of spacings for over 550 agencies in the US [@DVN/SFBIVU_2022] and 80 agencies in Canada [@DVN/QFTAPM_2023]. Several transit agencies, such as Regional Transportation District Denver (RTD- Denver), have used the package to visualize the effects of their bus stop consolidation efforts. Filtering functions allow the user to explore datasets, identify errors and compute specialized statistics.
 
 # Functionality
 
@@ -50,7 +50,7 @@ The fundamental unit of analysis used by `gtfs-segments` is the *segment*, which
 
 `gtfs-segments` overcomes these challenges by increasing the route resolution (i.e., adding points in between geo-coordinates), using spatial k-d trees, and using more than one nearest neighbor. The increase in resolution allows stops to be snapped to nearby points. Using k-d trees reduces the time complexity to $O(nlog(m))$ and makes it possible to compare among several snapping points without added computation. \autoref{fig:interpolate} shows an example where initially snapping to the nearest point produces out-of-order stops (3/4/2) and stop 5 is snapped far away from its location. Increasing the resolution (second panel) fixes 5's location problem but the ordering problem persists. By using `k=3` nearest neighbors, we find a proper ordering (last panel). Once every stop has been snapped to a geo-coordinate on the route shape, the shape is segmented between stops and each segment's geometry is stored in a GeoDataFrame.
 
-![Improvement in snapping due to an increase in resolution and suing k-nearest neighbors.\label{fig:interpolate}. Adapted from "Bus Stop Spacings Statistics: Theory and Evidence" [@devunuri2023bus]](interpolation.jpg)
+![Improvement in snapping due to an increase in resolution and suing k-nearest neighbors.\label{fig:interpolate}. Adapted from "Bus Stop Spacings Statistics: Theory and Evidence" [@Devunuri2024]](interpolation.jpg)
 
 Packages such as `gtfs2gps` [@pereira2023exploring] and `gtfs_functions` [@Toso2023] also compute segments. In addition to its snapping algorithm, visualization, download, and statistical functionalities, `gtfs-segments` is distinguished from those in two ways. First, it has a faster processing rate[^1] to compute segments both with and without parallel processing (see Table 1). Second, `gtfs-segments` is tolerant to deviations from GTFS standards. For example, because the Chicago Transit Authority does not have an agency_id in its routes.txt, `gtfs2gps` fails to read it even though this field is not needed for obtaining segments.
 
@@ -67,7 +67,7 @@ The package can create maps of stops and segments (with basemap), including inte
 
 ## Calculating stop spacing summary statistics
 
- Discussions about stop spacings, commonly include statistical metrics such as means and medians, used to spacings between different agencies or track changes within an agency over time. `gtfs-segments` can produce weighted mean, median, and standard deviations for an agency, using different weighting systems (e.g., weighting segments by the number of times a bus traverses it or the number of routes that include it) as outlined by @devunuri2023bus. For each route, `gtfs-segments` can give metrics such as mean spacing, headways, speeds, number of buses in operation and route lengths.
+ Discussions about stop spacings, commonly include statistical metrics such as means and medians, used to spacings between different agencies or track changes within an agency over time. `gtfs-segments` can produce weighted mean, median, and standard deviations for an agency, using different weighting systems (e.g., weighting segments by the number of times a bus traverses it or the number of routes that include it) as outlined by @Devunuri2024. For each route, `gtfs-segments` can give metrics such as mean spacing, headways, speeds, number of buses in operation and route lengths.
 
 # Acknowledgments