Harmonizing Visual Representations for Unified Multimodal Understanding and Generation