A major challenge in developing reinforcement learning algorithms for autonomous vehicles is making them better reflect the preferences of travelers. One way to do that is to build users' schedule preferences, expressed as reliability-based route selection criteria, into the learning mechanism. This work, led by Jinkai Zhou, investigates the potential of such an integration. We used data collected from Google Maps queries to mimic airport shuttle services, and trained a multi-armed bandit algorithm on it to see how the algorithm is affected when on-time arrival reliability is taken into account. The work was funded by NSF CMMI-1652735.
https://journals.sagepub.com/doi/full/10.1177/0361198119850457
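
To make the idea concrete, here is a minimal sketch of a multi-armed bandit whose reward is on-time arrival rather than travel time. It is not the method from the paper: epsilon-greedy is used here only as a simple stand-in, and the route names, travel-time distributions, time budget, and the function name `on_time_bandit` are all hypothetical values chosen for illustration, not data from the study.

```python
import random


def on_time_bandit(routes, budget, n_trips=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: each arm is a route, reward is 1 for an on-time trip.

    routes maps a route name to an assumed (mean, std) travel time in minutes.
    budget is the traveler's time budget in minutes (hypothetical).
    """
    rng = random.Random(seed)
    names = list(routes)
    pulls = {r: 0 for r in names}
    on_time_rate = {r: 0.0 for r in names}  # running estimate of P(on time)

    for _ in range(n_trips):
        if rng.random() < epsilon:
            route = rng.choice(names)  # explore a random route
        else:
            route = max(names, key=lambda r: on_time_rate[r])  # exploit

        mean, sd = routes[route]
        travel_time = rng.gauss(mean, sd)  # simulated trip, synthetic data
        reward = 1.0 if travel_time <= budget else 0.0

        # Incremental mean update of the on-time arrival estimate.
        pulls[route] += 1
        on_time_rate[route] += (reward - on_time_rate[route]) / pulls[route]

    return on_time_rate, pulls


# Hypothetical routes: the highway is faster on average but far more variable.
ROUTES = {"highway": (30.0, 12.0), "arterial": (34.0, 3.0)}

rates, pulls = on_time_bandit(ROUTES, budget=40.0)
best = max(rates, key=rates.get)
print("estimated on-time rates:", rates)
print("trips per route:", pulls)
print("preferred route:", best)
```

Under these assumed distributions the slower but more predictable route is favored, which is the kind of shift a reliability-based criterion is meant to capture. A bandit rewarded on mean travel time alone would lean toward the faster, riskier route instead.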