Working with global statistics instead of per-batch statistics

In my earlier article, I discussed an implementation of importance sampling based on per-batch statistics. There, samples whose loss values fell in the top nth percentile of their corresponding batch were retained for training.
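To recap, the per-batch scheme can be sketched roughly as follows. This is a simplified illustration assuming a classification model trained with cross-entropy loss; the function and parameter names (`filter_batch_by_loss`, `percentile`) are placeholders, not the original code.

```python
import torch
import torch.nn.functional as F

def filter_batch_by_loss(model, inputs, targets, percentile=70):
    """Keep only the samples whose loss lands in the top (100 - percentile)%
    of the current batch, i.e. the hardest samples of this batch."""
    with torch.no_grad():
        logits = model(inputs)
        # Per-sample losses (no reduction) so each sample can be ranked.
        losses = F.cross_entropy(logits, targets, reduction="none")

    # Batch-local threshold: the nth percentile of this batch's losses.
    threshold = torch.quantile(losses, percentile / 100.0)
    mask = losses >= threshold
    return inputs[mask], targets[mask]
```

Note that the threshold is recomputed from scratch for every batch, which is exactly the property discussed next.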

The shortcoming of this approach is that most batches may contain only easy samples. Even after filtering such a batch, the retained samples are still too easy for the model. A filtering scheme that depends only on individual batch statistics therefore cannot fully exploit the benefits of importance sampling.

With that thought, I wanted to employ statistics computed over the whole dataset to filter the training samples…
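To make the idea concrete, one way such dataset-level filtering could be sketched is to keep a running history of per-sample losses across all batches and threshold each new batch against a global percentile instead of a batch-local one. The class and function names below (`GlobalLossThreshold`, `filter_with_global_stats`) and the fixed-size history buffer are illustrative assumptions, not the exact implementation described in this series.

```python
import collections
import torch
import torch.nn.functional as F

class GlobalLossThreshold:
    """Tracks recent per-sample losses across all batches and exposes a
    dataset-level percentile threshold (illustrative sketch only)."""

    def __init__(self, percentile=70, history_size=50_000):
        self.percentile = percentile
        self.history = collections.deque(maxlen=history_size)

    def update(self, losses):
        # Record the current batch's losses into the global history.
        self.history.extend(losses.detach().cpu().tolist())

    def threshold(self):
        # Percentile computed over losses seen across the whole dataset,
        # not just the current batch.
        all_losses = torch.tensor(list(self.history))
        return torch.quantile(all_losses, self.percentile / 100.0)

def filter_with_global_stats(model, inputs, targets, tracker):
    """Keep only samples whose loss exceeds the dataset-level threshold."""
    with torch.no_grad():
        losses = F.cross_entropy(model(inputs), targets, reduction="none")
    tracker.update(losses)
    mask = losses >= tracker.threshold()
    return inputs[mask], targets[mask]
```

With this kind of tracker, a batch made up entirely of easy samples can be skipped almost completely, since its losses would all fall below the global threshold rather than being compared only against each other.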

Anuj Arora

Budding AI Researcher
