Announcing the Data-Centric AI Competition: Revolutionizing Object Detection through Smart Data Curation

Jimmy Guerrero - Aug 1 - Dev Community

Author: Harpreet Sahota (Hacker in Residence at Voxel51)

Are you ready to challenge the status quo in AI development? Voxel51 is thrilled to announce the first-ever Data-Centric AI competition on Hugging Face Spaces, focusing on the often-overlooked yet crucial aspect of AI: data curation.

Why This Competition Matters

"Data is the new oil."  

Yup, that's true. But try putting oil in your gas tank and see what happens. You need gas to go, and you get gas by putting crude oil through a refinery. And what better tool to refine your data than FiftyOne?

At Voxel51, we say "Data eats models for lunch." Why? Because the quality and quantity of your training data often matter more than the sophistication of your model architecture.

This competition is your chance to prove this principle and hone your skills in one of the most critical areas of AI development: data curation.

The Challenge: Optimize Data, Not Just Models

Your mission, should you choose to accept it, is to curate a subset of our provided dataset that achieves two seemingly contradictory goals:

  1. Reduce the overall size of the dataset

  2. Maintain or improve the performance of a YOLOv8m object detection model

This isn't just about deleting random images. It's about understanding which data points contribute most to model performance and which are redundant or even detrimental.
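
One possible way to spot redundant images is to compare image embeddings and drop near-duplicates. The sketch below is only an illustration of that idea, not the competition's prescribed method: the `embeddings` dictionary, the function names, and the 0.95 similarity threshold are all assumptions made for this example, and in practice the vectors would come from whatever embedding model you choose.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_near_duplicates(embeddings, threshold=0.95):
    """Greedily keep an image only if it is not too similar to any kept image.

    `embeddings` maps a (hypothetical) image ID to its embedding vector.
    `threshold` is an arbitrary cutoff chosen for illustration.
    """
    kept = []
    for image_id, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) < threshold for k in kept):
            kept.append(image_id)
    return kept


# Toy 2-D embeddings: img_b points in almost the same direction as img_a.
example = {
    "img_a": [1.0, 0.0],
    "img_b": [0.99, 0.05],
    "img_c": [0.0, 1.0],
}
kept_ids = prune_near_duplicates(example)
print(kept_ids)
```

This greedy pass compares every image against every kept image, so it is quadratic in the worst case. For a dataset of tens of thousands of images you would likely want an approximate nearest-neighbor index instead.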

What You'll Be Working With

  • A dataset of 65,986 images
  • 43 object classes across categories like clothing, people, transportation, and more
  • FiftyOne, an open-source tool for dataset curation and analysis (a tutorial notebook is available to help you get started)
  • The YOLOv8m model from the Ultralytics Model Zoo

Prizes

The Data-Centric Visual AI Challenge offers the following rewards for top performers:

  • First Place: $1,000 - The team or individual with the highest-performing solution will receive a cash prize of $1,000.
  • Second Place: Top-Tier Community Swag Package - The runner-up will be awarded a top-tier community swag package, filled with exclusive merchandise and goodies.
  • Third Place: Mid-Tier Community Swag Package - The third-place finisher will receive a mid-tier community swag package as recognition for their efforts.

The Rules of the Game

To keep the playing field level and focus on data curation skills, we've set some ground rules:

What You Can Do

  • Remove images or individual annotations
  • Fix labeling mistakes
  • Apply data augmentation techniques to existing images
  • Adjust model hyperparameters

What You Can't Do

  • Use external data
  • Add new annotations or object classes
  • Generate synthetic images

How You'll Be Judged

We've crafted a unique scoring metric that balances dataset size reduction with model performance:

Score = (mAP * log(N)) / N

Where mAP is the Mean Average Precision on our hidden test set, and N is the number of images in your curated dataset.

This metric encourages you to find a sweet spot between a small, efficient dataset and high model performance.
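
To get a feel for the trade-off, here is a minimal sketch of the scoring formula. The post does not say which base the logarithm uses, so the natural log is an assumption here. The function name `curation_score` is hypothetical, and the mAP values are made up for illustration rather than taken from any real run.

```python
import math


def curation_score(map_value: float, num_images: int) -> float:
    """Score = (mAP * log(N)) / N, with log assumed to be the natural log."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    return map_value * math.log(num_images) / num_images


# Illustrative mAP values only; these are not competition results.
full_score = curation_score(0.50, 65_986)  # the full training set
half_score = curation_score(0.48, 30_000)  # a smaller set with slightly lower mAP

print(f"full set: {full_score:.3e}")
print(f"curated:  {half_score:.3e}")
print("curated set scores higher:", half_score > full_score)
```

Because log(N) / N shrinks as N grows (for any N above e, roughly 2.72), a smaller dataset that holds mAP roughly steady scores noticeably higher than a larger one.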

Timeline and Support

  • Competition Launch: August 1, 2024
  • Submission Period: September 1 - October 27, 2024
  • Results Announced: November 6, 2024

We're not leaving you to figure this out alone: support is available in the #competitions channel of the FiftyOne Community Slack.

Why You Should Participate

  1. Skill Development: Hone your data curation skills, a critical yet often undervalued aspect of AI development.
  2. Real-world Impact: The techniques you develop here can be applied to numerous AI projects, potentially revolutionizing how we approach model training.
  3. Community and Learning: Engage with fellow AI enthusiasts and learn from diverse approaches to the same challenge.
  4. Recognition: Showcase your skills on a global platform and potentially win recognition from industry leaders.

Ready to Dive In?

  1. Install FiftyOne:

     pip install -U fiftyone

  2. Load the dataset:

     import fiftyone as fo
     import fiftyone.utils.huggingface as fouh

     # Download the competition training set from the Hugging Face Hub
     dataset = fouh.load_from_hub("Voxel51/Data-Centric-Visual-AI-Challenge-Train-Set")

  3. Start exploring and curating!

Remember, in this competition, less might actually be more. It's not about who has the biggest dataset or the most complex model. It's about who can create the most efficient, effective dataset for the task at hand.

Are you ready to prove that data truly eats models for lunch? Join us in this groundbreaking competition and be part of the data-centric AI revolution!

For full details, rules, and resources, visit our competition page on Hugging Face and the example submission repo.

If you have questions about the competition or need help with the instructions, join the #competitions channel in the FiftyOne Community Slack. We are happy to help!

Let's make Visual AI a reality, one carefully curated dataset at a time!
