Learning From Human Feedback: Ranking, Bandit, And Preference Optimization