Thesis Outline: |
As a fundamental computer vision task, Multi-Object Tracking (MOT) aims to analyze the motion patterns of dynamic targets in real-world scenarios, with widespread applications in traffic management, sports analytics, industrial monitoring, and biological research. This study investigates the critical issue of target representation in MOT systems based on Detection-Oriented Tracking (DOT), where target representation transforms raw detections into compact descriptors that encode spatial, motion, and appearance attributes and serve as the foundation for the subsequent association process.
However, current DOT approaches face significant challenges in handling representation ambiguities caused by appearance similarities and motion uncertainties, particularly under complex conditions involving viewpoint changes, partial occlusions, and crowded scenes with frequent close-range target interactions. While modern DOT approaches employ learning-based Re-Identification (Re-ID) features to distinguish targets with different appearances, these methods show notable limitations for groups with uniform appearance, in cross-camera scenarios, and during long-term occlusion events. Moreover, most feature extractors are optimized for classification or Re-ID tasks rather than for spatio-temporal association, which leads to misaligned target representations, especially under these challenging conditions.
To address these issues, this thesis proposes a series of methods that extend existing object representations by incorporating contextual cues behind detections, including partial keypoint features, scene dynamics, geometric relationships, and expanded temporal representations. Through a comprehensive analysis of current limitations, this thesis provides both theoretical insights and practical solutions for improving tracking robustness, particularly in scenarios involving visually similar targets and persistent occlusion. The proposed methods demonstrate significant performance improvements in complex real-world tracking situations.
|