CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction