Refactoring is something all developers do at some point. We tend to have a sixth sense for knowing when code needs it. Some folks have solid principles to point to when making refactoring decisions. Others simply sense that a piece of code needs work while they are in it.
Whatever camp you land in, refactoring is a necessary process in all code bases. The reality is that requirements evolve. Decisions happen quickly. Production support incidents force us to react more quickly than we would like. A wide variety of things can happen that make refactoring code necessary.
But how do we know when it is necessary? The answer varies from codebase to codebase, but there are at least three common opportunities that most developers recognize.
1. Duplicated Code
As developers, we seem to have an innate ability to detect when we are repeating ourselves. We have learned, rightly or wrongly, that repeating ourselves is bad and we shouldn't do it.
So, naturally, the first thing we tend to spot as needing refactoring is code where we are repeating ourselves. We even have a coding best practice named after this one.
Don't Repeat Yourself (DRY) is a principle of software development aimed at reducing repetition of software patterns, replacing it with abstractions or using data normalization to avoid redundancy. (Source: Wikipedia)
Duplicated code tends to manifest itself accidentally or as something temporary that we will fix later. It might look something like this.
function doSomething(x, y, z) {
  let squared = Math.pow(x, 2)
  let summed = y + z
  let result = summed / squared
  if (result > 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}

function doAnotherThing(x, y, z) {
  let squared = Math.pow(x, 2)
  let summed = y + z
  let result = summed / squared
  if (result <= 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}
This is some trivial code, but it highlights our instincts as developers very well. When we look at this code, especially with the two functions sitting right next to each other as they are here, we immediately notice the duplicated logic. We see that a result gets calculated from the parameters and 'Yes' or 'No' is returned based on that result. The return value depends on whether `result` is less than or equal to 0 for `doAnotherThing` or greater than 0 for `doSomething`.
The refactoring we can make here is straightforward.
function getResult(x, y, z) {
  let squared = Math.pow(x, 2)
  let summed = y + z
  return summed / squared
}

function doSomething(x, y, z) {
  let result = getResult(x, y, z)
  if (result > 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}

function doAnotherThing(x, y, z) {
  let result = getResult(x, y, z)
  if (result <= 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}
Now our two functions call a common function, `getResult`, that contains the shared code.
How does this benefit us, though? It allows common logic to be centralized in one place.
For example, we have a bug in our code: `summed / squared` is prone to a divide-by-zero problem because we don't check that `x` is not 0 (in JavaScript the division won't throw, but it will produce Infinity or NaN instead of a meaningful result). Before our refactor we would have had to handle this in both places where the code was duplicated. Now we can handle it once in our new function and both functions get the fix.
function getResult(x, y, z) {
  let squared = Math.pow(x, 2)
  let summed = y + z
  if (squared > 0) {
    return summed / squared
  } else {
    return 0
  }
}

function doSomething(x, y, z) {
  let result = getResult(x, y, z)
  if (result > 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}

function doAnotherThing(x, y, z) {
  let result = getResult(x, y, z)
  if (result <= 0) {
    return 'Yes'
  } else {
    return 'No'
  }
}
Duplicated code can cause many bugs because bugs get duplicated along with it. Developers are bad at updating every place where the duplication occurs, so one area might get the fix while another doesn't. Without our refactor above, we could easily have added our check for zero to one function and missed the other.
But is all duplicated code bad?
No. Sometimes it is reasonable for us as developers to violate the DRY principle. Some folks even have a different take on DRY: they won't extract a common function until they have had to duplicate the code more than twice (sometimes called the "rule of three").
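To make that concrete, here is a minimal, hypothetical sketch of that approach. The form-validation functions and their rules are invented for this example: two similar functions are tolerated, and only when a third near-duplicate appears do we extract a shared helper.

// Two similar validators exist; under the "rule of three" we leave the repetition alone for now.
function validateContactForm(form) {
  return form.name.trim() !== '' && form.email.includes('@')
}

function validateSignupForm(form) {
  return form.name.trim() !== '' && form.email.includes('@') && form.password.length >= 8
}

// A third occurrence appears, so the shared shape is now worth pulling out.
function hasNameAndEmail(form) {
  return form.name.trim() !== '' && form.email.includes('@')
}

function validateNewsletterForm(form) {
  return hasNameAndEmail(form)
}

At that point, `validateContactForm` and `validateSignupForm` can be rewritten on top of `hasNameAndEmail` as well.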
Other scenarios, like microservice-based architectures, also support violating the principle when necessary, because shared code can create coupling when it spans service boundaries. These are things to keep in mind when you encounter the Duplicated Code refactoring opportunity.
2. Long Functions
Moving on, the next thing we tend to recognize as needing a fresh coat of paint: long functions. Unlike duplicated code, we refactor long functions so that we can better understand what is happening.
This is different from the advantage gained from refactoring duplicated code. In that scenario, we refactor to centralize shared logic so we don't repeat ourselves. In this new opportunity, we refactor into smaller functions, not because their logic can be shared, but so that we can better understand what each section handles.
Let's check out an example of a long function.
// Note: this assumes Client comes from the node-postgres package (const { Client } = require('pg'))
// and that both clients are connected before the queries run.
const run = async () => {
  const databaseOneClient = new Client({
    user: process.env.DB1_USER,
    host: process.env.DB1_HOST,
    database: process.env.DB1_DB,
    password: process.env.DB1_PASSWORD,
    port: 5439,
  })
  const databaseTwoClient = new Client({
    user: process.env.DB2_USER,
    host: process.env.DB2_HOST,
    database: process.env.DB2_DB,
    password: process.env.DB2_PASSWORD,
    port: 5432,
  })
  const databaseOneCountResult = await databaseOneClient.query(`select count(*) from table_one`)
  const databaseOneCount = databaseOneCountResult.rows[0].count
  const result = await databaseTwoClient.query(`select count(*) from copy_table`)
  // count(*) comes back in a column named "count"
  const existingCount = result.rows.length != 1 ? 0 : result.rows[0].count
  for (var i = existingCount; i <= databaseOneCount; i++) {
    let newObj = {
      x: `lat_x_${i}`,
      y: `long_y_${i}`,
      z: `d_z_${i}`,
      url: ''
    }
    try {
      const response = await fetch(`https://somurl-to-call.com/api/do/${i}`)
      const jsonResponse = await response.json()
      if (jsonResponse.location >= 10) {
        newObj['url'] = jsonResponse.lookup
      }
      const insertQuery = `insert into copy_table(x, y, z, url) values ($1, $2, $3, $4) `
      await databaseTwoClient.query(insertQuery, [newObj.x, newObj.y, newObj.z, newObj.url])
    } catch (error) {
      console.error(error)
    }
  }
}
On the surface, this doesn't look all that bad. We are connecting to two different databases. Once connected, we get the row counts from a table in each database. If those row counts are not equal, we copy some generated data into the second database after making an API call to an external service.
When we put it into words like that, it becomes clear. But when we read through the code, there is quite a bit of cognitive load involved in following what is happening. Luckily, this function isn't too bad because it still fits on one screen.
If it didn't fit on one screen, we could lose the context of what happens at the top by the time we scroll down to the bottom.
The point of refactoring a long function is to keep the literal meaning of the code the same, but reduce the cognitive load it takes to understand it. To do this we move chunks of the function out into their own separate functions.
Here is one example of what that can look like.
const run = async () => {
  const databaseOneClient = initializeDatabaseOne()
  const databaseTwoClient = initializeDatabaseTwo()
  const databaseOneCount = await getDatabaseOneTotalCount(databaseOneClient)
  const existingCount = await getDatabaseTwoTotalCount(databaseTwoClient)
  for (var i = existingCount; i <= databaseOneCount; i++) {
    let newObj = {
      x: `lat_x_${i}`,
      y: `long_y_${i}`,
      z: `d_z_${i}`,
      url: ''
    }
    try {
      const response = await fetch(`https://somurl-to-call.com/api/do/${i}`)
      const jsonResponse = await response.json()
      if (jsonResponse.location >= 10) {
        newObj['url'] = jsonResponse.lookup
      }
      const insertQuery = `insert into copy_table(x, y, z, url) values ($1, $2, $3, $4) `
      await databaseTwoClient.query(insertQuery, [newObj.x, newObj.y, newObj.z, newObj.url])
    } catch (error) {
      console.error(error)
    }
  }
}

const getDatabaseOneTotalCount = async (databaseOneClient) => {
  const databaseOneCountResult = await databaseOneClient.query(`select count(*) from table_one`)
  return databaseOneCountResult.rows[0].count
}

const getDatabaseTwoTotalCount = async (databaseTwoClient) => {
  const result = await databaseTwoClient.query(`select count(*) from copy_table`)
  const existingCount = result.rows.length != 1 ? 0 : result.rows[0].count
  return existingCount
}

const initializeDatabaseOne = () => {
  return new Client({
    user: process.env.DB1_USER,
    host: process.env.DB1_HOST,
    database: process.env.DB1_DB,
    password: process.env.DB1_PASSWORD,
    port: 5439,
  })
}

const initializeDatabaseTwo = () => {
  return new Client({
    user: process.env.DB2_USER,
    host: process.env.DB2_HOST,
    database: process.env.DB2_DB,
    password: process.env.DB2_PASSWORD,
    port: 5432,
  })
}
This is still a lot of text on one screen, but now the logic of our one big function is divided up into smaller functions.
- We now have two functions that initialize the database clients (`initializeDatabaseOne` and `initializeDatabaseTwo`).
- We also have two functions that get the table counts from each database (`getDatabaseOneTotalCount` and `getDatabaseTwoTotalCount`).
These are not shared functions, so we are not reducing duplicated code, but we are making the code in our main function easier to understand.
This small refactor makes the beginning of our `run()` function much easier to digest when we first look at it. It is very clear that we connect to two databases and get the counts from each one. If the row counts are not equal, we copy some generated data into the second database after making an API call.
Notice that we didn't refactor our loop or its internals. That's because complex loops are our third common refactoring opportunity, so let's explore that now.
3. Complex Loops
The third and final opportunity for refactoring is a bit of an extension of the previous one. We often use loops in our code to iterate over collections and perform operations and/or transformations on them.
We see that in our sample code from up above.
const run = async () => {
  const databaseOneClient = initializeDatabaseOne()
  const databaseTwoClient = initializeDatabaseTwo()
  const databaseOneCount = await getDatabaseOneTotalCount(databaseOneClient)
  const existingCount = await getDatabaseTwoTotalCount(databaseTwoClient)
  for (var i = existingCount; i <= databaseOneCount; i++) {
    let newObj = {
      x: `lat_x_${i}`,
      y: `long_y_${i}`,
      z: `d_z_${i}`,
      url: ''
    }
    try {
      const response = await fetch(`https://somurl-to-call.com/api/do/${i}`)
      const jsonResponse = await response.json()
      if (jsonResponse.location >= 10) {
        newObj['url'] = jsonResponse.lookup
      }
      const insertQuery = `insert into copy_table(x, y, z, url) values ($1, $2, $3, $4) `
      await databaseTwoClient.query(insertQuery, [newObj.x, newObj.y, newObj.z, newObj.url])
    } catch (error) {
      console.error(error)
    }
  }
}
We are iterating from `existingCount` all the way up to `databaseOneCount`. With each iteration we are building a new object, `newObj`, and after an external API call we are inserting that object into the second database.
The internals of this loop could be simplified and made easier to understand with one small tweak.
Note: I have omitted some of the code from above to focus on this complex loops section.
const run = async () => {
  const databaseOneClient = initializeDatabaseOne()
  const databaseTwoClient = initializeDatabaseTwo()
  const databaseOneCount = await getDatabaseOneTotalCount(databaseOneClient)
  const existingCount = await getDatabaseTwoTotalCount(databaseTwoClient)
  for (var i = existingCount; i <= databaseOneCount; i++) {
    await insertNewObject(i, databaseTwoClient)
  }
}

const insertNewObject = async (i, databaseTwoClient) => {
  let newObj = {
    x: `lat_x_${i}`,
    y: `long_y_${i}`,
    z: `d_z_${i}`,
    url: ''
  }
  try {
    const response = await fetch(`https://somurl-to-call.com/api/do/${i}`)
    const jsonResponse = await response.json()
    if (jsonResponse.location >= 10) {
      newObj['url'] = jsonResponse.lookup
    }
    const insertQuery = `insert into copy_table(x, y, z, url) values ($1, $2, $3, $4) `
    await databaseTwoClient.query(insertQuery, [newObj.x, newObj.y, newObj.z, newObj.url])
  } catch (error) {
    console.error(error)
  }
}
💥 Our complex loop has been drastically simplified.
Wait, didn't we just move the internals of the loop to a separate function? Yup, that's exactly what we did. With a descriptive function name, we can now see at a glance that `run` iterates from `existingCount` up to `databaseOneCount` and calls `insertNewObject` for each index.
That small change makes a huge difference in reducing the amount of cognitive load it takes for us to understand what this code is doing.
Of course, we could take this one step further and refactor the `insertNewObject` function as well. We could separate out the API call from the actual insertion into the database. But that is an exercise I'll leave to you, the reader.
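That said, if you want a starting point, here is one rough, hypothetical sketch of what that split could look like. The `buildNewObject` name is invented for this example, the API endpoint and table are the same made-up ones used throughout, and the behavior is meant to match the original, with errors from the lookup still caught in one place.

const buildNewObject = async (i) => {
  let newObj = {
    x: `lat_x_${i}`,
    y: `long_y_${i}`,
    z: `d_z_${i}`,
    url: ''
  }
  // Only the external API lookup lives here now.
  const response = await fetch(`https://somurl-to-call.com/api/do/${i}`)
  const jsonResponse = await response.json()
  if (jsonResponse.location >= 10) {
    newObj['url'] = jsonResponse.lookup
  }
  return newObj
}

const insertNewObject = async (i, databaseTwoClient) => {
  try {
    // Build the object (including the API call), then only the insert remains here.
    const newObj = await buildNewObject(i)
    const insertQuery = `insert into copy_table(x, y, z, url) values ($1, $2, $3, $4) `
    await databaseTwoClient.query(insertQuery, [newObj.x, newObj.y, newObj.z, newObj.url])
  } catch (error) {
    console.error(error)
  }
}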
Conclusion
Refactoring is a necessary process in every development team. The reality is that the code we write today is based on the assumptions, time constraints, and decisions we have at this moment in time. All of those things are bound to change in the future, and as they do, our code will need to change as well.
The art of refactoring is a big topic, and we have merely scratched the surface here. There are dozens of other patterns and opportunities to spot that we haven't discussed. But the three covered here should be common across any tech stack and should get you more familiar with the practice.
Want to check out my other projects?
I am a huge fan of the DEV community. If you have any questions or want to chat about different ideas relating to refactoring, leave a comment below or reach out on Twitter.
Outside of blogging, I created a Learn AWS By Using It course. In the course, we focus on learning Amazon Web Services by actually using it to host, secure, and deliver static websites. It's a simple problem, with many solutions, but it's perfect for ramping up your understanding of AWS. I recently added two new bonus chapters to the course that focus on Infrastructure as Code and Continuous Deployment.
I also curate my own weekly newsletter. The Learn By Doing newsletter is packed full of awesome cloud, coding, and DevOps articles each week. Sign up to get it in your inbox.