Aligning Bag of Regions for Open-Vocabulary Object Detection