Learning in Partially Observable Markov Decision Processes

Sachan, Mohit

Files

Mohit_thesis_updated.pdf (816.47 KB)

Date

2013-08-21

Authors

Sachan, Mohit

Language

American English

Committee Chair

Mukhopadhyay, Snehasis

Committee Members

Raje, Rajeev
Al Hasan, Mohammad
Fang, Shiaofen

Degree

M.S.

Degree Year

2012

Department

Department of Computer and Information Science

Grantor

Purdue University

Abstract

Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the need to address a number of realistic problems. A number of methods exist for learning in POMDPs, but learning with a limited amount of information about the POMDP model remains a sought-after capability. Learning with minimal information is desirable in complex systems, because methods that require complete information among decision makers become impractical as problem dimensionality increases.

In this thesis we address the problem of decentralized control of POMDPs with unknown transition probabilities and rewards. We propose learning in POMDPs using a tree-based approach. The states of the POMDP are guessed using this tree. Each node in the tree contains an automaton and acts as a decentralized decision maker for the POMDP. The start state of the POMDP is known as the landmark state. Each automaton in the tree uses a simple learning scheme to update its action choice and requires minimal information. The principal result is that, without proper knowledge of the transition probabilities and rewards, the automata tree of decision makers converges to a set of actions that maximizes the long-term expected reward per unit time obtained by the system. The analysis is based on learning in sequential stochastic games and on properties of ergodic Markov chains. Simulation results are presented to compare the long-term rewards of the system under different decision control algorithms.

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Partially Observable Markov Decision Processes, Learning in POMDP, Learning automata tree, POMDP

LC Subjects

Computer programming, Data structures (Computer science), Stochastic systems -- Research, Game theory -- Mathematical models, Sequences (Mathematics), Markov processes, Decision making -- Simulation methods, User interfaces (Computer systems)

Permanent Link

https://hdl.handle.net/1805/3451
http://dx.doi.org/10.7912/C2/2304

Collections

Computer & Information Science Department Theses and Dissertations
