arXiv: https://arxiv.org/abs/1806.01709

The paper lays out a simulation that generalizes from one Atari game to another. Atari frames are parsed by an image preprocessor, and the output of that preprocessor is fed to DORA. DORA is hooked into the game via OpenAI Gym and is given access to the score as a value function. DORA surpasses human-level performance in a relatively small number of trials on one game, then generalizes to a second game. Competing models (e.g., DQNs as in Mnih et al. 2015, DNNs as in LeCun et al. 2015) need more trials and then fail to generalize to game 2. Once competing models reach criterion on game 2, they are unable to return to game 1 without a similar retraining phase, whereas DORA can move fluently from game 1 to game 2 and back, showing human-like performance on the first attempt and exceeding it in later trials.
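
To make the setup concrete, here is a minimal sketch of the evaluation loop as I read it: frames come from an OpenAI Gym Atari environment, an image preprocessor parses each frame into the description DORA consumes, and the score change serves as the value signal. The preprocess function and DoraAgent class are placeholders of my own, not the paper's code, and the classic Gym step API is assumed.

```python
import gym

# Hypothetical stand-ins for illustration only; the paper's preprocessor and
# DORA interface are not public, so these are trivial placeholders.
def preprocess(frame):
    # The real preprocessor parses the frame into objects and relations for DORA.
    return frame

class DoraAgent:
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, scene):
        # Placeholder policy; the real model selects an action from its relational representation.
        return self.action_space.sample()

    def learn(self, scene, action, reward):
        # Placeholder update; the real model uses the score change as its value signal.
        pass

env = gym.make("Breakout-v0")   # game 1; swap the env id to test transfer to game 2
agent = DoraAgent(env.action_space)

frame = env.reset()             # classic Gym API; newer versions return (obs, info)
done = False
while not done:
    scene = preprocess(frame)                      # image preprocessor parses the frame
    action = agent.act(scene)                      # agent picks an action
    frame, reward, done, info = env.step(action)   # reward is the score change
    agent.learn(scene, action, reward)             # score serves as the value function
```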

There are two novel results in this paper:

1) The way that similarity & magnitude computation is handled has only appeared in conference proceedings to date. In short, the model relies on a direct mapping in which more active neurons code for larger magnitudes; it activates two laterally-inhibiting units that conjunctively code the pattern of activation over two items sharing at least one magnitude dimension, and it learns the settling pattern. This extends the current DORA work by learning connections to patterns of activity rather than patterns of activation - so the model codes for fast settling with one unit active, fast settling with the other unit active, or coactivation (a toy sketch of the comparator follows this list). It's a subtle (but possibly confusing) extension. This replaces the clumsier metric array module from LISA, and builds on earlier work in the DORA lab to provide a principled source for magnitude and similarity representations.

2) Using DORA to play video games is novel, and showing human-like performance on a novel (albeit similar) video game is unprecedented. While other models might not need the image preprocessing DORA depends on, DORA generalizes to entire schemas that lie outside its training space.
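
Back to the comparator in point 1, here is a toy sketch of how I read it: magnitude is carried by how many neurons are active for each item, those counts drive two mutually inhibiting units, and the settling pattern (which unit ends up active, how fast, or coactivation when the magnitudes match) is what gets learned. The dynamics and parameter values below are my own illustrative guesses, not the paper's equations.

```python
# Two laterally-inhibiting units driven by the number of active magnitude-coding
# neurons for items A and B. Parameter values are guesses for demonstration.
def compare(n_active_a, n_active_b, gain=0.05, inhibition=0.5, decay=0.1,
            steps=200, tol=1e-4):
    a = b = 0.0
    for t in range(steps):
        # Each unit is excited by its item's active-neuron count and inhibited
        # by the other unit; activations are clipped at zero.
        new_a = max(0.0, a + gain * n_active_a - inhibition * b - decay * a)
        new_b = max(0.0, b + gain * n_active_b - inhibition * a - decay * b)
        if abs(new_a - a) < tol and abs(new_b - b) < tol:
            break
        a, b = new_a, new_b
    # The settling pattern (which unit wins, or coactivation) plus the time taken
    # to settle is what would get bound into a more/less/same representation.
    if abs(a - b) < 0.05:
        pattern = "coactivation (same)"
    elif a > b:
        pattern = "unit A active (A more)"
    else:
        pattern = "unit B active (B more)"
    return pattern, t

print(compare(12, 5))   # more neurons active for A -> unit A wins
print(compare(7, 7))    # equal counts -> coactivation
```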

The paper is dense - the learning algorithm isn't fully specified until the supplementary materials, the invariance learned for magnitude/similarity rests on a different underlying process (settling, versus activation for LISA-style object/role representations), and acronyms and terms are used before being defined. It asks a lot of the reader - even as a DORA lab member, I had to carefully reread several sections to make sure I could follow the argument and explication. I also feel that the labels used for explication can confuse people - everything from labels in figures (e.g., more-x, more-height) to explanations couched in terms of what DORA does. Making the computational framework more explicit and attributing actions to specific processes rather than to the framework as a whole might make the paper easier to parse.
