Dynamic batch-sizing for datasets with vastly diverse sequence lengths

Making efficient use of GPU compute

Anuj Arora
Published in ML@ABEJA
5 min read · Oct 25, 2023


When training and evaluating models, it is common to use fixed-size inputs and batch sizes. In sequence tasks with transformer models, however, input lengths can vary widely across samples. This variability poses a challenge: batching variable-length sequences together requires padding them to a common length, which can leave GPU resources underutilized and slow down training. In this blog, we will show two strategies for tackling this issue. But before that, let's understand the problem better:
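As a toy illustration (a minimal sketch assuming PyTorch, with made-up sequence lengths), padding a batch up to its longest member shows how much of the resulting tensor, and hence of the GPU work, is spent on padding tokens:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical token counts: one long sequence forces the whole
# batch to be padded up to its length.
lengths = [12, 18, 25, 512]
batch = [torch.randint(0, 1000, (n,)) for n in lengths]

# pad_sequence pads every sample to max(lengths) with the pad value.
padded = pad_sequence(batch, batch_first=True, padding_value=0)
print(padded.shape)  # torch.Size([4, 512])

# Fraction of the batch tensor occupied by padding rather than real tokens.
real_tokens = sum(lengths)
total_slots = padded.numel()
print(f"padding waste: {1 - real_tokens / total_slots:.1%}")  # ~72.3%
```

In this contrived batch, roughly 72% of the positions the GPU processes are padding. The more the sequence lengths diverge within a batch, the worse this ratio gets, which is exactly the inefficiency the strategies below aim to reduce.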
