Posterior Attention Models for Sequence to Sequence Learning

Shankar, Shiv; Sarawagi, Sunita (2019). Posterior Attention Models for Sequence to Sequence Learning. In: ICLR 2019 Conference.

Abstract

Modern neural architectures critically rely on attention for mapping structured inputs to sequences. In this paper we show that prevalent attention architectures do not adequately model the dependence among the attention and output tokens across a predicted sequence. We present an alternative architecture called Posterior Attention Models that, after a principled factorization of the full joint distribution of the attention and output variables, proposes two major changes. First, the position where attention is marginalized is changed from the input to the output. Second, the attention propagated to the next decoding stage is a posterior attention distribution conditioned on the output. Empirically, on five translation and two morphological inflection tasks, the proposed posterior attention models yield better BLEU scores and alignment accuracy than existing attention models.
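
The two changes named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy encoder states `enc`, the output projection `W`, the prior attention `prior`, and all dimensions are assumptions made up for illustration, and the decoder state is omitted entirely. The sketch only contrasts marginalizing attention at the input (averaging encoder states, then predicting) with marginalizing at the output (predicting per position, then averaging), and then applies Bayes' rule to obtain an attention distribution conditioned on the chosen output token.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical toy setup (not from the paper): 4 input positions,
# 3-dimensional encoder states, and a vocabulary of 5 output tokens.
rng = np.random.default_rng(0)
m, d, vocab = 4, 3, 5
enc = rng.normal(size=(m, d))        # assumed encoder states x_1..x_m
W = rng.normal(size=(d, vocab))      # assumed output projection
prior = softmax(rng.normal(size=m))  # assumed prior attention P(a = j)

# Marginalizing at the input: average the encoder states under the
# attention weights first, then predict from the single context vector.
context = prior @ enc
p_input_marg = softmax(context @ W)

# Marginalizing at the output: predict a token distribution from each
# input position separately, then mix the distributions with the attention.
p_y_given_pos = softmax(enc @ W, axis=-1)  # shape (m, vocab): P(y | a = j)
p_output_marg = prior @ p_y_given_pos      # P(y) = sum_j P(a = j) P(y | a = j)

# Posterior attention: condition the attention on the emitted token y
# via Bayes' rule, P(a = j | y) = P(a = j) P(y | a = j) / P(y).
y = int(p_output_marg.argmax())
posterior = prior * p_y_given_pos[:, y] / p_output_marg[y]

print("prior attention    :", np.round(prior, 3))
print("posterior attention:", np.round(posterior, 3))
```

In this sketch `posterior`, rather than `prior`, is the distribution that would be carried to the next decoding step; how the paper couples it to the next step's attention is not specified in the abstract and is left out here.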

Item Type:	Conference or Workshop Item (Paper)
Keywords:	posterior inference; attention; seq2seq learning; translation
ID Code:	128325
Deposited On:	19 Oct 2022 09:04
Last Modified:	14 Nov 2022 07:36
