Hiteshwar is an SRE based in Mumbai, India, who specializes in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them and creating tools to manage and monitor them. He likes to share what he learns by writing articles and blogs on Medium and LinkedIn. He is an active speaker at meetups and developer groups and also teaches DevOps and SRE practices at learning centers.
1. How did you become an SRE?
I have loved computers since I was young. In 10th grade, I started building PCs and troubleshooting them, installing a lot of software and games on both Windows and Linux, which led me to like systems administration. I discovered my love for troubleshooting and maintaining systems in college, where I managed all of its infrastructure.
As I started my professional journey, I was tasked with a lot of automation projects relevant to my role in DevOps. All this experience propelled me onto the path of being an SRE.
2. What's the most challenging part of your job?
For me, it's expecting the unexpected. To quote John Wilkes of Google: “Computer components are very reliable but once you have a lot of them they fail all the time.” Maintaining that illusion of stability for external users, even when you have thousands of failure domains internally, is probably the hardest thing for an SRE to achieve. So when there is a failure, detecting it quickly, building an automated recovery and finding a resilient fix are what keep me busy.
3. What processes, tools and techniques can't you live without?
Currently, I rely on Atom as my IDE and Ansible for pushing configs around. I am always looking for good articles and open source projects on the web. I try to run them in my own dev environments and, where possible, contribute to them as well. I have been using Kubernetes a lot lately, both at work and in my personal projects, so it's what I am currently most focused on.
4. What, according to you, is the future of SRE?
SREs are here to stay. Any organization that cares about its users and systems wants to build a culture around SRE. Finding and hiring a good SRE is hard because, apart from being good technically, one has to respect the process and have a sense of ownership and passion towards the systems they manage. In my opinion, an SRE's job is a never-ending one: as your business scales, your users scale, your systems scale and so do your failure domains. To keep those running reliably, you need someone with incredible passion and grit who is determined to learn new skills and is curious about everything that happens in the tech world.
5. Any productivity hacks that you would give to new SREs?
- Keep notes on what you learn and on anything interesting or challenging you come across along the way.
- Build snippets of code and tools that help you to do your tasks efficiently.
- Always keep revising your fundamentals of OS, Network, Orchestration, Cloud, Monitoring, etc.
- Keep an eye on new and emerging technologies by reading blogs and following the keynotes and deep dives of conferences such as SREcon, KubeCon, other CNCF events, VelocityConf, etc.
- Try to replicate systems in your dev environments and then try to break them. This will give you an idea of what fails, when it fails and how to fix it, which is what SRE is all about.
6. What are some of the things people get wrong about this role?
SRE has become a buzzword in the tech world. Everyone wants to be an SRE, assuming it's just like system administration, and companies are enabling this by renaming SysAdmin jobs to SRE. But the reality is that an SRE is a software engineer who knows their way around the infrastructure that is running the software. An SRE understands how that software will behave when the infrastructure running it fails and how to bring it back up after a failure.
With this knowledge, SREs can infuse reliability into software at the design level. Software designed with reliability and failure domains in mind will be more reliable when taking hits from all sorts of expected and unexpected failures, at both the service and infrastructure level.
Follow the journey of more such inspiring SREs from around the globe through our SRE Speak Series.
_Squadcast is an Incident Management tool that’s purpose-built for SRE. Get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. Work in collaboration using virtual incident war rooms and use automation to eliminate toil._