Exploring data-efficient multi-modal learning in computer vision