Abstract: Multi-target tracking is a key research area in computer vision. Existing methods mainly focus on inferring robust and discriminative features for data association based on ...
Abstract: We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique ...