Abstract
In recent years, video action recognition has become a hot research topic owing to its wide range of applications, such as video surveillance and video analytics. Training a decent action recognition neural network requires large volumes of training data, comprising more than 56,000 video sequences. However, such large-scale video recordings, which are supposed to capture the dynamics of every action category, are prohibitively expensive and often impractical to gather. In practice, training samples are frequently scarce (e.g., when the target action classes are not present in the currently available public datasets). To address this, we propose a frugal approach consisting of two pipelines: a data augmentation pipeline and a pseudo-labeling pipeline. With this approach, a labeled subset of 10% of the data is sufficient to train an initial neural network that predicts labels for the unlabeled data. We start with only 10% of the data and pass it through the augmentation pipeline. We then train a neural network on the augmented data and use the best-performing network weights to perform pseudo-labeling. In our experiments, all of the unlabeled data had been labeled by the 4th iteration through the pipelines. The approach increased the model's accuracy by 10% on a dataset with no augmentation. The types of augmentation can be changed according to the use case.
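The loop formed by the two pipelines can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: a nearest-centroid classifier on toy 2-D points stands in for the action recognition network, Gaussian jitter stands in for video augmentation, and the function names (`augment`, `train`, `predict`, `pseudo_label_loop`), the confidence threshold, and the rule for relaxing it are all hypothetical choices introduced for the example.

```python
import math
import random


def augment(samples, rng, copies=2, noise=0.05):
    """Hypothetical augmentation: keep the originals and add jittered copies."""
    out = list(samples)
    for x, y in samples:
        for _ in range(copies):
            out.append(([v + rng.gauss(0.0, noise) for v in x], y))
    return out


def train(samples):
    """Toy stand-in for network training: compute one centroid per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence); needs at least two classes in the model.

    Confidence is a distance-margin score in [0.5, 1.0] (an assumption,
    used here in place of a softmax probability).
    """
    dists = sorted((math.dist(x, c), y) for y, c in model.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    conf = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return y1, conf


def pseudo_label_loop(labeled, unlabeled, rng, threshold=0.9, max_iters=10):
    """Alternate augmentation + training with pseudo-labeling of the pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    iterations = 0
    while unlabeled and iterations < max_iters:
        iterations += 1
        model = train(augment(labeled, rng))
        rejected = []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:
                labeled.append((x, y))  # accept the confident pseudo-label
            else:
                rejected.append(x)
        if len(rejected) == len(unlabeled):
            threshold -= 0.1  # assumption: relax when nothing was accepted
        unlabeled = rejected
    return train(augment(labeled, rng)), labeled, unlabeled, iterations


# Toy data: two well-separated classes, with 10% of each class labeled.
rng = random.Random(0)
centers = {0: (0.0, 0.0), 1: (10.0, 10.0)}
seed_set, pool = [], []
for y, (cx, cy) in centers.items():
    pts = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(100)]
    seed_set += [(x, y) for x in pts[:10]]  # labeled 10%
    pool += pts[10:]                        # remaining 90%, labels hidden

model, final_labeled, remaining, iterations = pseudo_label_loop(seed_set, pool, rng)
print(f"iterations={iterations}, labeled={len(final_labeled)}, unlabeled={len(remaining)}")
```

In this sketch only predictions above the confidence threshold are added to the labeled set, so each retraining round sees more data than the last. How many rounds are needed to label the whole pool depends on the data, the model, and the threshold.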