Training · arXiv cs.AI · 7 d ago

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

DoorDash has deployed a reinforcement learning system that adapts dispatch objective weights in a three-sided marketplace using delayed operational feedback such as delivery speed and courier utilization. A store-level policy selects a discrete multiplier that sets the tradeoff between delivery quality and batching efficiency. The policy is trained on centralized offline data with Double Q-learning plus a conservative regularizer that mitigates value overestimation. The results suggest that decision policies can be adapted safely in real time from feedback in complex economic and logistics environments, improving batching and reducing costs without compromising delivery quality.
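To make the training recipe concrete, here is a minimal tabular sketch of offline Double Q-learning with a conservative penalty. It is an illustration under stated assumptions, not the deployed system: the summary does not specify the regularizer, so a CQL-style penalty (log-sum-exp of the Q-values minus the Q-value of the logged action) is swapped in, and the multiplier set `MULTIPLIERS`, the integer store-state buckets, the transition tuple layout, and both function names are hypothetical. Delayed feedback is assumed to be already joined to each logged decision as a scalar reward.

```python
import numpy as np

# Hypothetical discrete action set: multipliers applied to a dispatch objective weight.
MULTIPLIERS = np.array([0.5, 1.0, 1.5, 2.0])


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def train_conservative_double_q(transitions, n_states, n_actions,
                                gamma=0.9, lr=0.1, alpha=0.5,
                                epochs=200, seed=0):
    """Offline tabular Double Q-learning with a CQL-style penalty (assumed form).

    transitions: list of (state, action, reward, next_state, done) tuples
    taken from logged data; no environment interaction happens here.
    """
    rng = np.random.default_rng(seed)
    q_a = np.zeros((n_states, n_actions))
    q_b = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for idx in rng.permutation(len(transitions)):
            s, a, r, s_next, done = transitions[idx]
            # Double Q-learning: update one table, evaluate with the other.
            if rng.random() < 0.5:
                q_upd, q_eval = q_a, q_b
            else:
                q_upd, q_eval = q_b, q_a
            a_star = int(np.argmax(q_upd[s_next]))  # action chosen by the updated table
            target = r if done else r + gamma * q_eval[s_next, a_star]
            td_error = target - q_upd[s, a]
            # Gradient of logsumexp(Q[s]) - Q[s, a] with respect to Q[s]:
            # softmax(Q[s]) minus a one-hot vector at the logged action.
            penalty_grad = softmax(q_upd[s])
            penalty_grad[a] -= 1.0
            q_upd[s, a] += lr * td_error
            q_upd[s] -= lr * alpha * penalty_grad  # pushes down actions absent from the logs
    return q_a, q_b


def greedy_multiplier(q_a, q_b, s):
    """Pick the multiplier with the highest summed value across both tables."""
    return float(MULTIPLIERS[int(np.argmax(q_a[s] + q_b[s]))])


# Toy log for one store-state bucket: only multipliers 1.0 and 1.5 were ever tried.
logged = [(0, 1, 0.2, 0, True), (0, 2, 1.0, 0, True)]
q_a, q_b = train_conservative_double_q(logged, n_states=1, n_actions=4)
chosen = greedy_multiplier(q_a, q_b, 0)
```

In this toy log the two untried multipliers receive only the penalty term, so their values drift below those of the logged actions. That is the intended effect of a conservative regularizer: the greedy policy stays within actions the offline data supports.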

reinforcement-learning · marketplace · feedback
