Bibliography

Research Report

Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies

Kárný Miroslav, Hůla František

: ÚTIA AV ČR, v.v.i., (Praha 2018)

: Research Report 2376

: GA16-09848S, GA ČR, GA18-15970S, GA ČR

: Exploitation, Exploration, Bayesian estimation, Adaptive systems, Fully probabilistic design, Kullback-Leibler divergence, Decision policy, Markov decision process

: http://library.utia.cas.cz/separaty/2018/AS/karny-0495875.pdf

(eng): Adaptive decision making learns an environment model serving a design of a decision policy. The policy-generated actions influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The inherent dimensionality curse of decision making under incomplete knowledge prevents the realisation of the optimal design.

: BC

: 10201