Dynamic batch-sizing for datasets with vastly diverse sequence lengths

Making efficient use of GPU compute

Anuj Arora
Published in ML@ABEJA
5 min read · Oct 25, 2023


When training and evaluating models, it is common to use fixed-size inputs and batch sizes. In sequence tasks with transformer models, however, input lengths can vary widely across samples. This variability poses a challenge: batching variable-length sequences together requires padding them to a common length, which can leave GPU resources underutilized and slow down training. In this blog, we will show two strategies for tackling this issue. But before that, let's understand the problem better:
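As a toy illustration (a minimal sketch assuming PyTorch, with made-up sequence lengths), padding a batch up to its longest member shows how much of the resulting tensor, and hence of the GPU work, is spent on padding tokens:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical token counts: one long sequence forces the whole
# batch to be padded up to its length.
lengths = [12, 18, 25, 512]
batch = [torch.randint(0, 1000, (n,)) for n in lengths]

# pad_sequence pads every sample to max(lengths) with the pad value.
padded = pad_sequence(batch, batch_first=True, padding_value=0)
print(padded.shape)  # torch.Size([4, 512])

# Fraction of the batch tensor occupied by padding rather than real tokens.
real_tokens = sum(lengths)
total_slots = padded.numel()
print(f"padding waste: {1 - real_tokens / total_slots:.1%}")  # ~72.3%
```

In this contrived batch, roughly 72% of the positions the GPU processes are padding. The more the sequence lengths diverge within a batch, the worse this ratio gets, which is exactly the inefficiency the strategies below aim to reduce.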
