Tracking objects and distinguishing their states by watching egocentric videos