Title: Learning in Partially Observable Markov Decision Processes
Contributors: Mukhopadhyay, Snehasis; Sachan, Mohit; Raje, Rajeev; Al Hasan, Mohammad; Fang, Shiaofen
Date issued: 2013-08-21
Handle: https://hdl.handle.net/1805/3451
DOI: http://dx.doi.org/10.7912/C2/2304
Institution: Indiana University-Purdue University Indianapolis (IUPUI)
Language: en-US

Abstract: Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the need to address a range of realistic problems. Many methods exist for learning in POMDPs, but learning with only limited information about the POMDP model remains an open challenge. Learning with minimal information is desirable in complex systems, since methods that require complete information sharing among decision makers become impractical as the problem dimensionality grows. This thesis addresses the problem of decentralized control of POMDPs with unknown transition probabilities and rewards. We propose a tree-based approach to learning in POMDPs: the states of the POMDP are estimated ("guessed") using a tree, and each node of the tree contains a learning automaton that acts as a decentralized decision maker for the POMDP. The start state of the POMDP is designated the landmark state. Each automaton in the tree uses a simple learning scheme to update its action choice and requires only minimal information. The principal result is that, even without knowledge of the transition probabilities and rewards, the tree of automata converges to a set of actions that maximizes the long-term expected reward per unit time obtained by the system. The analysis is based on learning in sequential stochastic games and on properties of ergodic Markov chains. Simulation results compare the long-term rewards of the system under different decision control algorithms.

Subjects: Partially Observable Markov Decision Processes; Learning in POMDP; Learning automata tree; POMDP; Computer programming; Data structures (Computer science); Stochastic systems -- Research; Game theory -- Mathematical models; Sequences (Mathematics); Markov processes; Decision making -- Simulation methods; User interfaces (Computer systems)
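
The abstract describes a tree of learning automata in which each node updates its action choice from a scalar reward signal using a "simple learning scheme." As a minimal sketch only, assuming the classic linear reward-inaction (L_R-I) update common in the learning-automata literature (the abstract does not name the exact scheme, so this choice is an assumption), a single node's automaton could look like the following:

    import random

    class LearningAutomaton:
        """One tree node's decision maker: a variable-structure learning
        automaton with an assumed linear reward-inaction (L_R-I) update,
        standing in for the 'simple learning scheme' the abstract mentions."""

        def __init__(self, num_actions, learning_rate=0.05):
            self.num_actions = num_actions
            self.learning_rate = learning_rate
            # Start from a uniform action-probability vector.
            self.probs = [1.0 / num_actions] * num_actions

        def choose_action(self):
            """Sample an action index according to the current probabilities."""
            r, cum = random.random(), 0.0
            for a, p in enumerate(self.probs):
                cum += p
                if r < cum:
                    return a
            return self.num_actions - 1

        def update(self, action, reward):
            """L_R-I rule: on a favorable response (reward in [0, 1]) shift
            probability mass toward the chosen action; on reward 0 leave the
            probabilities unchanged (the 'inaction' part). The total
            probability mass is preserved by construction."""
            lam = self.learning_rate * reward
            for a in range(self.num_actions):
                if a == action:
                    self.probs[a] += lam * (1.0 - self.probs[a])
                else:
                    self.probs[a] -= lam * self.probs[a]

    # Toy usage with a hypothetical two-action environment whose actions
    # succeed with probabilities 0.8 and 0.3; the automaton's probability
    # of choosing action 0 should approach 1 over time.
    env = [0.8, 0.3]
    la = LearningAutomaton(num_actions=2)
    for _ in range(5000):
        a = la.choose_action()
        la.update(a, 1.0 if random.random() < env[a] else 0.0)
    print(la.probs)

The reward-inaction property (no change on an unfavorable response) is what gives schemes of this family the convergence behavior typically invoked in learning-automata analyses; the thesis's actual scheme, and how the automata interact through the tree and the landmark state, are specified in the thesis itself rather than in this sketch.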