Learning From Pixels: Image-Centric State Representation Reinforcement Learning For Goal Conditioned Surgical Task Automation