Training a Vision Transformer on Amazon SageMaker

In this series of three videos, I focus on training a Vision Transformer model on Amazon SageMaker.

In the first video, I start from the "Dogs vs. Cats" dataset on Kaggle, extract a subset of images, and upload it to S3. Then, using SageMaker Processing, I run a script that loads the images directly from S3 into memory, extracts their features using the Vision Transformer feature extractor, and stores them in S3 as Hugging Face datasets for image classification.
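
To make the processing step more concrete, here is a minimal sketch of what such a script could look like. It is not the code from the video. The label mapping, the `cat.N.jpg` / `dog.N.jpg` file naming, the checkpoint name, and the helper names (`label_from_key`, `select_subset`, `build_dataset`) are all assumptions made for illustration. Newer Transformers releases expose the same functionality as `ViTImageProcessor`.

```python
import io
import random

# Assumptions: label mapping and checkpoint name are illustrative choices.
LABELS = {"cat": 0, "dog": 1}
MODEL_NAME = "google/vit-base-patch16-224-in21k"


def label_from_key(key):
    """Map an S3 key such as 'train/dog.42.jpg' to a class id."""
    name = key.rsplit("/", 1)[-1]
    return LABELS[name.split(".")[0]]


def select_subset(keys, per_class, seed=0):
    """Pick the same number of images per class, reproducibly."""
    rng = random.Random(seed)
    subset = []
    for label in sorted(LABELS.values()):
        matching = sorted(k for k in keys if label_from_key(k) == label)
        subset += rng.sample(matching, per_class)
    return subset


def build_dataset(bucket, keys, output_dir):
    """Load images from S3 into memory, extract features, save a dataset."""
    # Heavy imports stay local so the helpers above work without them.
    import boto3
    from datasets import Dataset
    from PIL import Image
    from transformers import ViTFeatureExtractor

    s3 = boto3.client("s3")
    extractor = ViTFeatureExtractor.from_pretrained(MODEL_NAME)
    pixel_values, labels = [], []
    for key in keys:
        # Read the object straight into memory, no local file needed.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(body)).convert("RGB")
        features = extractor(images=image, return_tensors="np")
        pixel_values.append(features["pixel_values"][0].tolist())
        labels.append(label_from_key(key))
    dataset = Dataset.from_dict({"pixel_values": pixel_values, "label": labels})
    # Hypothetical output path: a directory under /opt/ml/processing that is
    # declared as a processing output gets copied to S3 when the job ends.
    dataset.save_to_disk(output_dir)
```

Inside a processing job, `build_dataset` would typically write to a directory such as `/opt/ml/processing/output/train`, assuming that path is configured as a job output.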

In the second video, I start from the image classification dataset that I prepared in the first video. Then, I download a pre-trained Vision Transformer from the Hugging Face hub, and I fine-tune it on my dataset, using a training script based on the Trainer API in the Transformers library.
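
A fine-tuning script along these lines can stay quite short. The sketch below is an illustration under assumptions, not the script from the video: the checkpoint name, the default hyperparameters, and the function names `parse_args` and `train` are mine. It expects a dataset with `pixel_values` and `label` columns saved with `save_to_disk`, and a SageMaker input channel named `train`.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and the paths that SageMaker passes to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch-size", type=int, default=16)
    parser.add_argument("--lr", type=float, default=5e-5)
    # Assumed checkpoint; any pre-trained ViT on the hub could be used.
    parser.add_argument("--model-name", default="google/vit-base-patch16-224-in21k")
    # SM_* variables are set inside SageMaker training containers.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument(
        "--train-dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    return parser.parse_args(argv)


def train(args):
    """Fine-tune a pre-trained ViT on the prepared dataset with the Trainer API."""
    # Heavy imports stay local so parse_args can be used without them.
    from datasets import load_from_disk
    from transformers import (
        Trainer,
        TrainingArguments,
        ViTForImageClassification,
        default_data_collator,
    )

    train_dataset = load_from_disk(args.train_dir)
    # num_labels=2 adds a freshly initialized two-class head (cats vs dogs).
    model = ViTForImageClassification.from_pretrained(args.model_name, num_labels=2)
    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.batch_size,
        learning_rate=args.lr,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,
    )
    trainer.train()
    # Files saved in the model directory are uploaded to S3 when the job ends.
    trainer.save_model(args.model_dir)
```

The script entry point would simply call `train(parse_args())`. Hyperparameters set on the SageMaker estimator arrive as command-line arguments, which is why they are parsed with `argparse` here.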

In the third video, I start from the image classification dataset that I prepared in the first video. Then, I download a pre-trained base Vision Transformer from the Hugging Face hub, and I use PyTorch Lightning to append a classification layer to it. Finally, I train the model using the Trainer API in PyTorch Lightning.
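
The idea of appending a classification layer to a base model can be sketched as follows. This is a hedged illustration rather than the code from the video: the `HParams` dataclass, the `make_classifier` and `fit` helpers, the checkpoint name, and the choice of the first ([CLS]) token as the image representation are assumptions. The batch keys `pixel_values` and `label` assume the dataset layout used in the earlier sketches.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    # Assumed defaults, chosen for illustration only.
    model_name: str = "google/vit-base-patch16-224-in21k"
    num_labels: int = 2
    lr: float = 1e-4
    batch_size: int = 16
    epochs: int = 3


def make_classifier(hp):
    """Build a LightningModule: base ViT plus one linear classification layer."""
    # Heavy imports stay local so HParams can be used without them.
    import pytorch_lightning as pl
    import torch
    from transformers import ViTModel

    class ViTClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.vit = ViTModel.from_pretrained(hp.model_name)
            self.head = torch.nn.Linear(self.vit.config.hidden_size, hp.num_labels)
            self.loss_fn = torch.nn.CrossEntropyLoss()

        def forward(self, pixel_values):
            hidden = self.vit(pixel_values=pixel_values).last_hidden_state
            # Classify from the first token of the sequence.
            return self.head(hidden[:, 0])

        def training_step(self, batch, batch_idx):
            logits = self(batch["pixel_values"])
            loss = self.loss_fn(logits, batch["label"])
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.AdamW(self.parameters(), lr=hp.lr)

    return ViTClassifier()


def fit(train_dir, hp):
    """Train the classifier with the PyTorch Lightning Trainer."""
    import pytorch_lightning as pl
    from datasets import load_from_disk
    from torch.utils.data import DataLoader

    # Return PyTorch tensors so the default collate function can stack them.
    dataset = load_from_disk(train_dir).with_format("torch")
    loader = DataLoader(dataset, batch_size=hp.batch_size, shuffle=True)
    trainer = pl.Trainer(max_epochs=hp.epochs)
    trainer.fit(make_classifier(hp), loader)
```

Compared with `ViTForImageClassification`, this version spells out the head and the training step by hand, which is where PyTorch Lightning gives more control over the training loop.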

Resources:

Vision Transformer paper: https://arxiv.org/abs/2010.11929
Dataset: https://www.kaggle.com/c/dogs-vs-cats/
Code: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/vision-transformer
More Hugging Face on SageMaker notebooks: https://github.com/huggingface/notebooks/tree/master/sagemaker

New to Transformers? Check out the Hugging Face course!