Study and Analysis of DeepFashion2 Dataset for the E-commerce industry

Author(s): Vedansh Surjan, Prateek Khandelwal

Abstract

With the rapid growth of e-commerce and the increasing application of artificial intelligence in the fashion and retail domain, demand for fashion image datasets has grown. The fashion datasets used in the public domain in recent years are primarily Fashion-Ai, FashionGen, DeepFashion and DeepFashion2. Among these, DeepFashion2 is the most extensive, with rich annotations and a large collection of images drawn partly from DeepFashion and partly from online fashion retail stores. It contains more than 491,000 images covering 801,000 clothing items in 13 categories. The annotations for each clothing item in the training and validation sets include bounding box points, landmark points, scale, occlusion, zoom-in, viewpoint and category name.
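To make the annotation fields listed above concrete, the following is a minimal Python sketch of an automated sanity check on one per-image annotation record. It is not the procedure used in this study, which relied on manual inspection. The key names (`item1`, `bounding_box`, `landmarks`, `category_name`) and the `[x1, y1, x2, y2]` box layout are assumptions based on the publicly released DeepFashion2 JSON files, and the helper names and the sample record are hypothetical.

```python
def iter_items(annotation):
    """Yield (key, item) pairs for every clothing item in one image record.

    Assumes items are stored under keys such as "item1", "item2", ...
    """
    for key, value in annotation.items():
        if key.startswith("item") and isinstance(value, dict):
            yield key, value


def check_item(item, width, height):
    """Return a list of problems found in one clothing-item annotation."""
    problems = []

    # Assumed layout: [x1, y1, x2, y2] in pixel coordinates.
    x1, y1, x2, y2 = item["bounding_box"]
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        problems.append("bounding_box out of range or degenerate")

    # Assumed layout: flat list of (x, y, visibility) triples.
    landmarks = item.get("landmarks", [])
    if len(landmarks) % 3 != 0:
        problems.append("landmarks not in (x, y, v) triples")

    if not item.get("category_name"):
        problems.append("missing category_name")

    return problems


# Hypothetical record for a 400 x 600 image with one valid and one broken item.
sample_annotation = {
    "source": "shop",
    "item1": {
        "category_name": "short sleeve top",
        "bounding_box": [50, 80, 300, 420],
        "landmarks": [60, 90, 2, 120, 95, 1],
    },
    "item2": {
        "category_name": "",
        "bounding_box": [350, 100, 450, 90],  # exceeds width, y2 < y1
        "landmarks": [10, 20],                # incomplete triple
    },
}

report = {
    key: check_item(item, width=400, height=600)
    for key, item in iter_items(sample_annotation)
}
```

A check like this only catches structurally invalid records. Errors such as a wrong category label or a box drawn around the wrong garment still need the kind of manual review described in this work.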

Through our analysis, we highlight various errors in the DeepFashion2 dataset. Up until 2019, only half of the dataset had been released, containing labelled data for only 191,000 training images and 52,000 validation images. In the course of this analysis, a random subset of the data was evaluated: we manually checked 5,000 images, found that 20% of them have annotation errors, and classified those errors into different categories. We trained an SSD-MobileNet model and show a gain in mAP (mean average precision) on the cleaned dataset compared to the original dataset.
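The sampling-based review described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' tooling. The function names are hypothetical, the image identifiers are stand-in integers, and the count of 1,000 erroneous images is simply 20% of 5,000 taken at face value. The interval uses the normal approximation for a proportion, which the abstract does not report.

```python
import math
import random


def sample_image_ids(image_ids, sample_size, seed=0):
    """Draw a reproducible random sample of image ids, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(image_ids), sample_size)


def error_rate_interval(num_checked, num_with_errors, z=1.96):
    """Return (rate, lower, upper) using a normal-approximation interval."""
    rate = num_with_errors / num_checked
    half_width = z * math.sqrt(rate * (1 - rate) / num_checked)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Stand-in ids for the 191,000 labelled training images.
training_ids = range(191_000)
review_sample = sample_image_ids(training_ids, sample_size=5_000)

# Assumed count: 20% of the 5,000 manually checked images.
rate, lower, upper = error_rate_interval(num_checked=5_000, num_with_errors=1_000)
print(f"estimated error rate: {rate:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

With a sample of this size the interval is narrow, about one percentage point on either side of the estimate, so a 5,000-image review is a reasonable basis for a dataset-wide error estimate provided the sample was drawn at random.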

Association of Data Scientists
