Study and Analysis of DeepFashion2 Dataset for the E-commerce industry

Author(s): Vedansh Surjan, Prateek Khandelwal

Abstract

With the rapid growth of e-commerce and the increasing application of artificial intelligence in the fashion and retail domain, demand for fashion image datasets has grown. In recent years, the fashion datasets most commonly used in the public domain have been Fashion-Ai, FashionGen, DeepFashion and DeepFashion2. Among these, DeepFashion2 is the most extensive, with rich annotations and images collected partly from DeepFashion and partly from online fashion retail stores. It contains more than 491,000 images covering 801,000 clothing items divided into 13 categories. The annotations for each clothing item in the training and validation sets include bounding box points, landmark points, scale, occlusion, zoom-in, viewpoint and category name.
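To make the per-item annotation fields concrete, the following is a minimal sketch of a sanity check over one clothing-item record. The key names (`category_name`, `bounding_box`, `zoom_in` and so on) and the `[x1, y1, x2, y2]` box layout are assumptions modeled on the annotation types listed in the abstract, and `check_item` is a hypothetical helper, not part of any DeepFashion2 tooling.

```python
from typing import Any

# Hypothetical field names, modeled on the annotation types named in the abstract.
REQUIRED_FIELDS = (
    "category_name", "bounding_box", "landmarks",
    "scale", "occlusion", "zoom_in", "viewpoint",
)


def check_item(item: dict[str, Any], width: int, height: int) -> list[str]:
    """Return a list of problems found in one clothing-item annotation."""
    problems = []

    # Every annotation type should be present for a labelled item.
    for field in REQUIRED_FIELDS:
        if field not in item:
            problems.append(f"missing field: {field}")

    # Assumed box layout: [x1, y1, x2, y2] in pixel coordinates.
    box = item.get("bounding_box")
    if box is not None:
        if len(box) != 4:
            problems.append("bounding box must have 4 values")
        else:
            x1, y1, x2, y2 = box
            inside = 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
            if not inside:
                problems.append("bounding box outside image or degenerate")

    return problems
```

A check like this only catches structural problems such as missing fields or boxes that fall outside the image. Semantic errors, for example a wrong category label, still need manual review.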

In our analysis, we highlight various errors in the DeepFashion2 dataset. Until 2019, only half of the dataset had been released, comprising labelled data for only 191,000 training images and 52,000 validation images. For this analysis we evaluated a random subset of the data: we manually checked 5,000 images, found that 20% of them have annotation errors, and classified those errors into categories. We also trained an SSD-MobileNet model and show a gain in mAP (mean average precision) on the cleaned dataset compared with the original dataset.
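Because the 20% figure comes from a random sample rather than the full dataset, it carries sampling uncertainty. The sketch below computes a Wilson score interval for a sample proportion. It is an illustration, not the authors' analysis: the count of 1,000 flagged images is an assumption derived from "20% of 5,000", and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Assumed count: 20% of the 5,000 manually checked images.
low, high = wilson_interval(errors=1000, n=5000)
print(f"estimated error rate: 20% (approx. 95% interval {low:.3f} to {high:.3f})")
```

With a sample of this size the interval is narrow, so the sample estimate is a reasonable guide to the error rate in the released portion of the dataset, provided the sample was drawn uniformly at random.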
