TL;DR: This post demonstrates how to connect PyTorch Lightning logging to Azure ML natively with MLflow.
If you are new to Azure, you can get started with a free subscription using the link below.
Create your Azure free account today | Microsoft Azure
Azure ML and PyTorch Lightning
In my last post on the subject, I outlined the benefits of both PyTorch Lightning and Azure ML for simplifying the training of deep learning models. If you haven’t yet, check it out!
Training Your First Distributed PyTorch Lightning Model with Azure ML
Once you’ve trained your first distributed PyTorch Lightning model with Azure ML, it is time to add logging.
Why do we care about logging?
Logs are critical for troubleshooting and tracking the performance of machine learning models. Since we often train on remote clusters, logs provide a simple mechanism for having a clear understanding of what’s going on at each phase of developing our model.
As opposed to simple print statements, logs are time-stamped, can be filtered by severity, and are used by Azure ML to visualize critical metrics during training, validation, and testing. Logging metrics with Azure ML is also a prerequisite for using the Azure ML HyperDrive service to help us find optimal model configurations.
Tune hyperparameters for your model - Azure Machine Learning
Logging is a perfect demonstration of how PyTorch Lightning and Azure ML combine to simplify model training: just by using Lightning, we can save ourselves dozens of lines of PyTorch logging code in our application, gaining readability in the process.
Logging with PyTorch Lightning
In vanilla PyTorch, keeping track of and maintaining logging code can get complicated very quickly.
ML frameworks and services such as Azure ML, TensorBoard, TestTube, Neptune.ai, and Comet ML each have their own unique logging APIs. This means that ML engineers often need to maintain multiple log statements at each phase of training, validation, and testing.
PyTorch Lightning simplifies this process by providing a unified logging interface that comes with out-of-the-box support for the most popular machine learning logging APIs.
Loggers - PyTorch Lightning 1.0.2 documentation
Multiple loggers can even be chained together, which greatly simplifies your code.
from pytorch_lightning.loggers import TensorBoardLogger, TestTubeLogger

logger1 = TensorBoardLogger('tb_logs', name='my_model')
logger2 = TestTubeLogger('tb_logs', name='my_model')
trainer = Trainer(logger=[logger1, logger2])
Once loggers have been provided to a PyTorch Lightning trainer, they can be accessed in any lightning module function or hook outside of __init__.
class MyModule(LightningModule):

    def some_lightning_module_function_or_hook(self):
        some_img = fake_image()
        # Option 1
        self.logger.experiment[0].add_image('generated_images', some_img, 0)
        # Option 2
        self.logger[0].experiment.add_image('generated_images', some_img, 0)
Azure ML Logging with PyTorch Lightning and MLflow
Since Azure ML has native integration with MLflow, we can take advantage of PyTorch Lightning’s MLflow logger module to get native metric visualizations across multiple experiment runs and utilize HyperDrive with very minor changes to our training code.
Below I’ll outline the code needed to take advantage of Azure ML logging with PyTorch Lightning.
Step #1 Environment
Add the PyTorch Lightning, Azure ML, and MLflow packages to the run environment (pip requirements below; a sketch of building the environment with the SDK follows the list).
pip
- azureml-defaults
- mlflow
- azureml-mlflow
- pytorch-lightning
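If you prefer to define the run environment with the Azure ML SDK instead of a YAML file, the same pip packages can be attached to an Environment object. This is a minimal sketch, assuming azureml-core is installed locally; the environment name pl-mlflow-env is a hypothetical placeholder.
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Hypothetical environment name; use whatever fits your workspace conventions.
env = Environment(name="pl-mlflow-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=[
        "azureml-defaults",
        "mlflow",
        "azureml-mlflow",
        "pytorch-lightning",
    ]
)
You would then pass this environment to your run configuration when submitting the training script, just as in the previous post.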
Step #2 Get the Azure ML Run Context and MLflow Tracking URI
from azureml.core.run import Run

run = Run.get_context()
mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()
Step #3 Initialize the PyTorch Lightning MLFlowLogger and Link the Run ID
from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
mlf_logger._run_id = run.id
Step #4 Add Logging Statements to the PyTorch Lightning training_step, validation_step, and test_step Hooks
def training_step(self, batch, batch_idx):
    # Calculate train loss here
    self.log("train_loss", loss)
    # return train loss

def validation_step(self, batch, batch_idx):
    # Calculate validation loss here
    self.log("val_loss", loss)
    # return validation loss

def test_step(self, batch, batch_idx):
    # Calculate test loss here
    self.log("test_loss", loss)
    # return test loss
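For concreteness, here is a minimal sketch of what a filled-in training_step might look like for a simple classification model. The forward pass and cross-entropy loss are assumptions for illustration, not part of the required logging changes; only the self.log call matters for Azure ML.
import torch.nn.functional as F

def training_step(self, batch, batch_idx):
    # Hypothetical classification example: unpack the batch and run the forward pass.
    x, y = batch
    logits = self(x)
    loss = F.cross_entropy(logits, y)
    # self.log routes the metric to every logger attached to the Trainer,
    # including the MLFlowLogger, so it appears in Azure ML.
    self.log("train_loss", loss)
    return loss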
Step #5 Add the MLflow Logger to the PyTorch Lightning Trainer
trainer = pl.Trainer.from_argparse_args(args)
trainer.logger = mlf_logger  # enjoy the default logging implemented by PyTorch Lightning!
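Putting the pieces together, here is a hedged sketch of the final training call; MyLightningModule and MyDataModule are placeholder names for your own module and data module, not part of the steps above.
# MyLightningModule and MyDataModule are hypothetical placeholders for your own classes.
model = MyLightningModule()
trainer = pl.Trainer.from_argparse_args(args)
trainer.logger = mlf_logger
trainer.fit(model, datamodule=MyDataModule())
trainer.test(model)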
And there you have it! Now when you submit your PyTorch Lightning training script, you will get real-time metric visualizations and metrics that HyperDrive can consume at train, validation, and test time, with a fraction of the code normally required.
You shouldn’t run into any issues, but if you do, let me know in the comments.
Next Steps
In the next post, I will show you how to configure multi-node distributed training with PyTorch and Azure ML using low-priority compute instances to reduce training cost by an order of magnitude.
Plan and manage costs - Azure Machine Learning
Acknowledgements
I want to give a major shout-out to Minna Xiao and Alex Deng from the Azure ML team for their support and commitment to working towards a better developer experience with open-source frameworks such as PyTorch Lightning on Azure.
About the Author
Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies and computational medicine. As an Open Source Engineer at Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community, to solve real world problems with game changing technologies that are then documented, open sourced, and shared with the rest of the world.