On the Instability of Softmax Attention-Based Deep Learning Models in Side-Channel Analysis

Hajra, Suvadeep ; Alam, Manaar ; Saha, Sayandeep ; Picek, Stjepan ; Mukhopadhyay, Debdeep (2024) On the Instability of Softmax Attention-Based Deep Learning Models in Side-Channel Analysis IEEE Transactions on Information Forensics and Security, 19 . pp. 514-528. ISSN 1556-6013

Full text not available from this repository.

Official URL: https://doi.org/10.1109/TIFS.2023.3326667

Related URL: http://dx.doi.org/10.1109/TIFS.2023.3326667

Abstract

In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points, remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that attack efficacy can be significantly improved by combining information from the sparsely occurring PoIs. In Deep Learning (DL), a common approach for combining the information from the sparsely occurring PoIs is softmax attention. This work studies the training instability of softmax attention-based CNN models on long traces. We show that the softmax attention-based CNN model suffers from unstable training when applied to longer traces (e.g., traces longer than 10K sample points). We also explore the use of batch normalization and multi-head softmax attention to make the CNN models stable. Our results show that using a large number of batch normalization layers and/or multi-head softmax attention (replacing the vanilla softmax attention) can make the models significantly more stable, resulting in better attack efficacy. Moreover, we found that our models achieve results similar to or better than the state of the art (up to an 85% reduction in the minimum number of traces required to reach a guessing entropy of 1) on several synchronized and desynchronized datasets. Finally, by plotting the loss surface of the DL models, we demonstrate that using multi-head softmax attention instead of vanilla softmax attention in the CNN models can make the loss surface significantly smoother.
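
As an illustration of the pooling mechanism the abstract refers to, the following is a minimal NumPy sketch of single-head and multi-head softmax attention over the time axis of a feature map. It is not the authors' architecture: the abstract does not specify one, so the linear scoring vectors (`w`, `W`), the function names, and the shapes used here are all assumptions chosen for illustration. The idea shown is only that softmax weights over a long trace (here 10,000 positions) select a few positions, and that several heads each produce their own weighting.

```python
import numpy as np


def softmax(x, axis=0):
    # Subtract the max along the axis for numerical stability.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_pool(features, w):
    """Single-head softmax attention pooling (illustrative, assumed form).

    features: (T, d) feature map over T time positions.
    w:        (d,) hypothetical learned scoring vector.
    Returns the pooled (d,) vector and the (T,) attention weights.
    """
    scores = features @ w            # one score per time position
    alpha = softmax(scores, axis=0)  # weights over time, summing to 1
    return alpha @ features, alpha


def multi_head_attention_pool(features, W):
    """Multi-head variant: each column of W (d, H) scores the trace separately.

    Returns the concatenated (H * d,) pooled vector and (T, H) weights.
    """
    scores = features @ W            # (T, H)
    alpha = softmax(scores, axis=0)  # normalize over time, per head
    pooled = alpha.T @ features      # (H, d): one pooled vector per head
    return pooled.reshape(-1), alpha


# Synthetic data only; no real side-channel traces are used here.
rng = np.random.default_rng(0)
T, d, H = 10_000, 8, 4
features = rng.normal(size=(T, d))
w = rng.normal(size=d)
W = rng.normal(size=(d, H))

pooled, alpha = attention_pool(features, w)
mh_pooled, mh_alpha = multi_head_attention_pool(features, W)
```

With a single column in `W`, the multi-head function reduces to the single-head one, which is a convenient sanity check for the sketch.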

Item Type:	Article
Source:	Copyright of this article belongs to IEEE.
Keywords:	Side-channel Analysis; Deep Learning; Softmax Attention; Multi-head Attention
ID Code:	142797
Deposited On:	24 Jun 2026 07:14
Last Modified:	24 Jun 2026 07:14
