Systematically working with multimodal data