Abstract:
How do we teach machines to do something that we can perform
reasonably well, but cannot easily express as a utility maximization problem? Can machines learn the underlying utility of a domain
from many human demonstrations?
The goal of the field of Inverse Reinforcement Learning (IRL) is
to infer the underlying goal of a domain from expert (human)
demonstrations. This thesis presents a categorized survey of the
current IRL literature, with a formal introduction and motivation for
the problem. We discuss the central challenges of the field
and examine how different algorithms address them.
We propose a reformulation of the IRL problem
that includes ranked sets of trajectories at different levels of
expert capability, and discuss how this might lead to a
new family of algorithms in the field, motivated by some
recently developed approaches. We conclude by discussing
some broad advances in the research area and possibilities for
further extension.