CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction