Abstract: Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating distinct modalities to improve the performance or robustness of the model.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results