Task-Oriented Visual Understanding For Scenes And Events