We are in the process of moving our company's websites onto the Azure platform. One of the challenges was moving image files out of the website project and into blob storage. This week I have moved 150,000 of them.
One thing I keep banging on about is that your source code should not contain data. If it does, then every time you do a deployment you need to consider where these images are located and ensure you don’t overwrite or lose any. It also goes without saying that deployments of a few MB are a lot quicker than deployments of hundreds of MB.
Azure blob storage also gives you advantages like distributing storage across multiple datacenters, which would be very hard to achieve with traditional files on a single server.
So now that we have established that this is a good idea, let's look at how we could move large amounts of data. In my case all the filenames are stored in a SQL database, so the plan of action was simply to loop through the files in the database, download each one from its current storage (either local or other cloud storage), upload it to Azure and tidy up afterwards. Due to the number of images, I update the database to mark when a file has been processed, so I can spread the move over several days.
This is my code
First of all, I create a tmp folder in the root of my website, if it doesn’t already exist, to store my images temporarily.
I then use an Entity Framework model to query the database for files that haven’t been moved yet, and I use the Take() method to limit how many results I process. (I have been passing in 1000 at a time.)
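The batching idea can be sketched outside Entity Framework too. The following is a minimal Python illustration using the standard library's sqlite3 module, not the original code. The table name `images`, the columns `id`, `filename` and `moved`, and the helper `fetch_unmoved` are all hypothetical names chosen for the example.

```python
import sqlite3


def fetch_unmoved(conn, batch_size=1000):
    """Return up to batch_size (id, filename) rows not yet marked as moved."""
    cur = conn.execute(
        "SELECT id, filename FROM images WHERE moved = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    )
    return cur.fetchall()


# Demo against an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "id INTEGER PRIMARY KEY, filename TEXT, moved INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO images (filename, moved) VALUES (?, ?)",
    [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 0)],
)

batch = fetch_unmoved(conn, batch_size=1)
```

Because rows are filtered on the processed flag, running the same query again after a batch has been marked picks up where the previous run stopped, which is what makes it possible to spread the move over several days.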
I then use a foreach loop over all these files to perform the following actions.
- Create additional subfolders if the filename value stored in the database isn’t just a filename but a file path. Note that you will have to split the path into folder and filename, which I haven’t included code for here.
- Download file from the original url and save into the temporary folder
- Upload to Azure
- Delete temporary file
- Update the database with a success or fail status
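The steps in the list can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the original code: `move_batch` is a hypothetical helper, and `download` and `upload` are hypothetical callables standing in for the old storage and for whichever Azure storage client you use.

```python
import os
import shutil
import tempfile


def move_batch(rows, download, upload, tmp_root):
    """Process (id, path) rows and return {id: True/False} for success or fail.

    download(path) -> bytes and upload(path, local_file) are hypothetical
    callables supplied by the caller; no real storage API is assumed here.
    """
    results = {}
    os.makedirs(tmp_root, exist_ok=True)
    for row_id, path in rows:
        # The stored value may be a path such as "products/b.jpg",
        # so split it into a subfolder part and a filename part.
        subdir, name = os.path.split(path)
        local_dir = os.path.join(tmp_root, subdir)
        local_file = os.path.join(local_dir, name)
        try:
            os.makedirs(local_dir, exist_ok=True)
            data = download(path)
            with open(local_file, "wb") as fh:
                fh.write(data)
            upload(path, local_file)
            results[row_id] = True
        except Exception:
            # Record the failure and carry on with the next file.
            results[row_id] = False
        finally:
            # Delete the temporary copy whether or not the upload worked.
            if os.path.exists(local_file):
                os.remove(local_file)
    # Remove the temporary folder once the whole batch is done.
    shutil.rmtree(tmp_root, ignore_errors=True)
    return results


# Demo with fake storage on both sides.
uploaded = {}


def fake_download(path):
    if path == "missing.jpg":
        raise IOError("not found")
    return b"image-bytes"


def fake_upload(path, local_file):
    with open(local_file, "rb") as fh:
        uploaded[path] = fh.read()


tmp_root = os.path.join(tempfile.mkdtemp(), "tmp")
results = move_batch(
    [(1, "a.jpg"), (2, "products/b.jpg"), (3, "missing.jpg")],
    fake_download,
    fake_upload,
    tmp_root,
)
```

The returned dictionary is what you would write back to the database as the success or fail status for each row. Catching the exception per file means one broken image does not stop the rest of the batch.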
Once the foreach is finished, I commit the database changes and delete the temporary folder. I am sure there must be other ways to do this transfer, but this was quick and easy to set up, and now I have a copy of all the files in Azure storage so I can test out other issues with my website.
One last tip about how to schedule this code. I called the above code from an MVC controller and then wrote an Azure Function to call this code on a schedule.