Dynamic batch-sizing for dataset with vastly diverse sequence lengths
Making an efficient use of GPU compute
5 min read · Oct 25, 2023
When training and evaluating models, it is common to use fixed-size inputs and batch sizes. In sequence tasks with transformer models, however, input lengths can vary widely across samples. This variability poses a challenge: it can leave GPU resources underutilized and slow down training. In this blog, we will present two strategies to tackle this issue. But first, let's understand the problem better:
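To see why variable lengths hurt, consider a quick back-of-the-envelope sketch (the sequence lengths below are made up for illustration): with fixed-size batching, every sequence in a batch is padded to the length of the longest one, so the GPU spends most of its compute on padding tokens.

```python
# Hypothetical example: fraction of compute wasted when a batch of
# variable-length sequences is padded to the longest sequence.
lengths = [12, 48, 500, 7]             # made-up token counts for four samples
max_len = max(lengths)                  # fixed-size batching pads all to 500
real_tokens = sum(lengths)              # tokens that carry information
processed_tokens = max_len * len(lengths)  # tokens the GPU actually processes
waste = 1 - real_tokens / processed_tokens
print(f"padding overhead: {waste:.0%}")
```

Here roughly 70% of the batch is padding, which is exactly the kind of waste dynamic batch sizing aims to avoid.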