JS Minty Fresh: Identifying and Eliminating Smells in Your Code Base by Rebecca Murphey
Thank you, Christian, for introducing me, and for everyone for having me here. It was a long flight, and then there was drinking, and I'm better now.
In the vein of drinking, I'm from the South in the United States, and in the South, we enjoy drinking mint juleps in the summer. This is a mint julep served in a classic pewter cup. Ideally, you would sit out on your wraparound porch, and with the screens, and watch the horses go by...I don't know. Anyway, when I was putting together this presentation, I thought, "Ooh, mint. Mint julep. Yay", and then I went to the speaker dinner, and then I stayed out until 4:30
in the morning, so water has been way more my friend than the mint juleps for the last 36 hours. But thank you Francis, and Alex, and Jake for getting me into that trouble.
Also, when I was looking for pictures of water, I found a picture of a dog opening a car door. I just thought that was cool.
Like Christian said, I work for Bocoup. It's a company in Boston, and we do training. I do a lot of training. We also do consulting, and perhaps most importantly, we are huge supporters of open source. If I don't have client work to be doing, it's this unspoken expectation that I am doing something good for the open web, that I'm doing something good for open source in that time.
There was some conversation after David DeSandro's talk this morning, and Christian, you asked during the Q&A portion, "Why do developers have to do this in their free time? Why aren't companies supporting developers in their paid time to do this?" Because certainly companies benefit from open source. And giving people a little bit of time to give back probably would be kind of them.
And so, I am very grateful that Bocoup gives us that time. I know there are lots of other companies that do give that time. If you work for one of them, thank them. If you don't work for one of them, quit and go work for one that does, because we wouldn't be here doing what it is we do without open source.
That's what I have to say about that. That's me, that's me, that's Bocoup. I want to talk about code smells today. Code smells are generally pretty easy to spot, at least the obvious ones and especially once you have some experience with writing code. They're generally pretty easy to spot. You may not always know what to do about them, but you will often have that feeling in your gut that there's something not quite right about those lines of code that you're seeing on the screen.
Just like Phil said in almost the exact same slide that he had in his talk, except his was red with white letters, and mine's green with white letters. They're suggestive of a deeper problem. They are indicative of a deeper problem. Everything may be fine. But they're suggestive that there's something else bad going on underneath it.
You know, when you come home and you're like, [sniff] I should have taken the trash out. It's like that. They smell. The harm of code smells is...There's these obvious ones. When you have crappy code, it's harder to debug. When there's a problem, it's harder to figure out the nature and the source of that problem.
If you are coming to a new code base and the code base is smelly, then working with that code base can be really challenging, can be really difficult. Maintenance is a tough thing when people have written smelly code.
The biggest thing that I find, and the way that I made a lot of money for a few years, was, features are really hard to add. It's hard to add new stuff to bad code, because the code was written in a way that anything that you...if you're just opening it in your text editor, stuff might break. Features get a lot harder to add when your code is smelly.
Those are the traditional reasons why we care about this. But the bigger reason, in my mind...Well, there's two bigger reasons, two reasons why you should care. Maybe you don't care if your code is maintainable and maybe you don't care if features are hard to add, because you get paid the same no matter how long it takes you.
But the thing about smells, number one. Smells beget other smells. There's the broken window theory, if you've read a Malcolm Gladwell book or read what's his name, "Coding Horror"? What's his real name? Jeff Atwood has a blog post about this. It's in "Pragmatic Programmer," and lots of people talk about this broken window theory, where basically the idea is that in a neighborhood, if the neighborhood starts to decline, the decline becomes more precipitous.
If people see code that didn't do things as well as it could have done...the code works. That's the important thing about smells and the devious thing about smells, is the code works. But it's not as good as it could be, and there are smells that are indicating that things are going to get difficult if you continue to work with this code.
And so, when you encounter code like that, and I know I've done it, and probably you all have done it too, when you encounter code like that, it's really easy to decide, "This is good enough". It's really important, especially when you're starting a new project. It's really important to have that at the top of your mind, and be vigilant about it from the get-go.
Again, like Phil said, you never come back and fix it later. I'm going to show you how to fix it later, but the idea is that you don't do it in the first place. Maybe you make these mistakes while you're prototyping something, or just getting something working. But you don't want to commit this code to your project. You certainly don't want it out in production, because once it's there, it becomes harder and harder to take it back. It becomes a sign to your fellow developers that "This is OK. We won't mind if you write code like this."
It's really difficult to see opportunities when your code is not good, and so opportunities for abstraction, opportunities for reuse, opportunities for testing, or even, how do I test this? Those things can be really hard to see when your code is not as good as it could be.
We're going to take a look at a few different code samples. This is probably, I've been here all day, I think I've watched every talk, and I probably have more code in the first three slides than has been...other than, again, Phil. That one ASP tag is more code than my whole presentation, but there's going to be a lot of code in this talk.
This talk will be online. The slides are online, so you can check it out. There's a GitHub repost, so you can check it out. I'm not going to pick over every single line, but just talk about some high-level concepts and look at smelly code, how to make it better, and then what opportunities we see once we take the time to make it better.
Repetitive code is probably the number one smell that I see. I get to see a lot of code. I get to see code from students. When I was doing consulting work for several years, I got to see a lot of code, and repetitive code is probably the biggest smell, and also one of the easiest to take care of.
Here's some code, where what it's doing doesn't super-matter, but basically there's getting a date out of a date-picker and using it to populate the day, month, and year, region. Can you see...I'm having the same thing where being up close this looks terrible. But I think from far away, it's probably hopefully OK. Let's try that. Let's see what happens.
This is some code where they're using a date-picker to update the day, month, and year region of Menu Item 1, and then they're using another date-picker to update the day, month, and year region of Menu Item 2. You can almost imagine the developer. They wrote that, and they're like, "Cool. It works". Even here, we've got some repetition going on that's not ideal. We're making three selections, all with the same root of Menu Item 1.
The right thing to do here would be to make that Menu Item 1 selection one time, and then find the day, find the month, find the year using the Find method. By the way, all these examples are going to use jQuery, because that's probably what most of you are using. There's nothing jQuery-specific about any of these...repetitive code is repetitive code.
You can, like I said, imagine the developer got this working, and we'll forgive them for that repetitive selection. And then you know what happened. They went like that, and they hit Command-C, and they went like that, and Command-V, changed a couple things, and they were like, "Score. My job is done, let's have a beer."
That's, again, that's fine for getting it working on your computer in your development environment to make sure that yes, this will work, whether I'm working with Menu Item 1, or end date, or start date". That's fine. It should never make it into production, but it does all the time.
Here's what we could do instead. A function is a really great way to solve this problem. We can have a function that takes that menu item selector, takes the date picker selector, and then does all that stuff that we were doing two times. It just does it one time, and now, we have a function that we pass in those two arguments, and stuff just happens.
The cool thing is if there's later menu item three, and four, and five, and six, and...I don't know what other kinds of dates you might have, "Pick your birthday and put it in menu item three". If there are later more of these, and as soon as you have two of them, that's a good sign that there might someday be three, and certainly once you have three of them, stop and write and function to do this for you.
As soon as you have two of them, that's the first moment that you should be thinking, "How likely is it that this need is going to present itself again?" When you have the third occasion, the need has presented itself too many times that you're still doing it in this repetitive way.
Once we see this, now we can see that, "Oh, we have this relationship between menu item one and start date, and menu item two and end date." Now, we can even take this and break it down even further using the each iteration method. Iterate over the keys and values in an object, and then this object makes it really, really easy if we have more cases of this same thing.
We just use these key value pairs as the arguments to our function here. We've gotten rid of all of that repetitiveness. We've made our code more reusable, and made it so that the next time this presents itself, we aren't copying and pasting and feeling guilty about it.
Here's some more repetitive code. It's a little bit different this time. Here, someone has selected something from a select menu, and based on what they have selected in the select menu, we're going to center a map on a location. You're like, "Well, if they chose Ireland, then let's center the location on Ireland. If they chose Clare, we'll center it on Clare. If they chose Dublin, we'll center it on Dublin."
It's like, "That's cool. What are you going to do when they center it on Amsterdam, or London, or anywhere else in the world besides these three places?" Again, maybe this will never present itself. Maybe they know that whatever it is that this user is selecting never in a million years will be anything other than Ireland, Clare, or Dublin.
But that's pretty risky, and even if that is true, there's still a better way to do this. We're literally doing the same thing. What changes from one location to the next is the latitude and longitude, and the zoom level of our map. Those three things change from one location to another. When we have a set of things, a set of data that are related to another thing, such as a location, then an object is a really good way to handle this instead.
Get rid of those repetitive if statements, and have an object that says, "For Ireland, here's the information. For Clare, here's the information", et cetera, and then we just look up which location we need from that object. Then have all that map centering code happen one time, instead of happening three different times.
The other fun thing about this code is that when it gets to...Let's say that the user has selected Ireland. When it gets to here in this code, if the user selected Ireland, we're done. We don't need to check if they selected Clare. We don't need to check if they selected Dublin. We know they selected Ireland. If we were to leave this code like this, we can throw in a return at the end of this, and be out, not make our code do any more work than it has to do.
Generally, you wouldn't want to have those ifs at all. You'd just want to have this straight up lookup. The cool thing, once we realize that we can put this data into an object, rather than spelling it out for each location, there are a couple of cool things. Number one, it becomes really easy to add another location. We just add another key value pair to that object, so it becomes really easy to add another location, and support another location, as long as we provide that same data.
All of these other bits of code that I've shown you so far, to some extent, I made them up. To some extent, I saw them from people asking questions in the jQuery IRC channel. This one's real, and this one is from a very large company that, at least in the U.S. Maybe you've even heard of them here, but they're from a very large networking company. I obfuscated it a little bit so it's not entirely clear.
The best part, truly the best part, is that was a real usage of this function -- can't make that up. That's good stuff.
It's almost not even worth talking about what the rest of this code is doing because the whole point is that it's really hard to tell that this code is doing, because this is a 33-line function, if I recall correctly. What it's doing is updating two table cells in a row. That's what it's doing. It's doing that based on all of this information that's been passed into it.
Then, the logic is really just...It's hard to follow. Of course, there are no tests to tell you what they should actually end up doing. You don't even really know if you're doing it right, if you accidentally should had said "standard price" instead of "not standard price," or something like that. It is finding these two table cells -- typos abound. It's finding these two table cells and updating them with some HTML.
Again, the reason we know --just at a glance -- that this is bad is two reasons. Number one, all those arguments coming in. Number two, the fact that it's 33 lines long and like nested IFs. Those are both pretty good signs. All three of those things are pretty good signs that like this is due for some refactoring.
What we can do instead is...All that information about the price, the standard price, the discount price, whether the price is deferred, or prorated, or all those things. We can put that into an object instead of having it be individual arguments. Then, instead of having this one big function that's responsible for not only figuring out what the HTML is that goes into these table cells, but also updating... yeah, finding those table cells, and then putting HTLM on the table cells.
We're going to break that into three separate tasks. We're going to have fairly simple function that just figures out what the HTML is for the price cell. We're going to have a fairly simple function. You can see it here. We're going to have a fairly simple function that figures out the HTML for the discount table cell. Then, we're going to have a function -- this update row -- that takes the row.
Whatever we figured out should be the HTML for the price cell. Whatever we figured out should be the HTML for the discount cell. We're going to have this function be responsible solely for taking that... It doesn't care what the HTML is. It doesn't care how you figured out what it was. It just says, "Thank you for this HTML. I will put it in a TD now."
What's really cool, though, is now we can write tests. Testing that other function, it's possible. I'm not going to say that it's not possible. You would be writing kind of functional tests at that point for like, "Did the right thing eventually happen?" If the wrong thing happened, you would have to go in and debug line by line to figure out where that wrong thing happened.
Now, we can write a test that says, "Hey, when I gave you this discount HTML, and this price HTML, did you put it in the table where it belongs? Good work". Then, we can write another one that says, "Hey, when I gave you price information that said the discount price was zero, did you send me back the right HTML?"
We can state all of these expectations really, really, clearly. I'm not saying that you shouldn't still write functional tests, but you can state your expectations of these individual functions [cut] patience of this whole big function. Again, the real whole big function is like a work of art. I initially tried to include it in this. It would have taken me the whole 50 minutes to try to explain what was going on.
The tests are a big benefit. The ability to write tests are a really big benefit of writing better, cleaner, more organized code. Again, that old function that they had, it worked. It worked just fine. It's on a site that probably serves, certainly, millions a month of page views. It works. If anything goes wrong, they don't know where it went wrong.
They don't have tests to prove that it's working. They might discover that it's not working by finding out that they just sold a product for $00.00 by mistake, or that they told a customer that the product was going to be $00.00, and then the back end system charges then $150.00, which is possibly worse.
We routinely see people who are assigning a value to a variable in their success callback on an AJAX request, then two lines later outside of the AJAX request, they're trying to use the value that they assigned because this concept of asynchronicity. We're so used to code that just. Line two runs, Line three, Line four, Line five, all the way down.
With AJAX, the code on Line three might run a second after the code on Line 5, or even longer, or maybe not at all because something went wrong. That can be a hard thing for people to get used to.
What's been really exciting for me has been the introduction of deferreds and promises into jQuery. How many people have heard of deferreds and promises? All right. Cool. That's like most everyone -- probably like 80 percent of you. How many of you have used them successfully? All right, more than a few of you. Nowhere near half, I would say. I'm going to talk about them just a little bit.
Here's an example where we have a function that's making two AJAX requests. We need to have both of them completed before we can execute the callback that was provided to the function. This is probably the most -- I don't mean this in a mean way at all -- naive way to achieve this is fire the first AJAX request when you get a successful response back, then fire the second one.
The problem with this, or course, is that you just lost a lot of the benefit of the asynchronicity. Because now, if each of these requests take 300 milliseconds, you're 600 milliseconds down the road before you're actually calling that callback. Because browsers can make more than one request at a time, if you could have fired them both at the same time and then known when they were done, you could have probably saved most of 300 of those milliseconds.
Another way that you'll often see people doing this is fire them simultaneously, and then trigger some counter when each of them succeeds. Then, check to see if that counter is two. If that counter is two, then we know we can fire the callback. That's' another way that you'll see people solve this.
In the world where deferreds and promises exist, I am going to go out and go on a limb and call both of those smells actually. They work, but they're not fully taking advantage of the tools available to you. Because you're not taking advantage of the tools available to you, you're missing out on some opportunity...some really nice opportunities that you don't get with a solution like this.
Here's what we can do instead. I've shortened those AJAX calls down to just "getJSON", because otherwise this gets long. I've shortened those AJAX calls down to just "getJSON". As of jQuery 1.5 and Dojo long, long ago had...I don't even know, from the beginning perhaps has had this. I can't remember if it was Mootools or MochiKit that first introduced promises. This concept has been around for a long time, this concept of promises and deferreds.
In jQuery 1.5, we saw if arrive in jQuery. That has sort of made these a mainstream tool that people are becoming much more familiar with and aware of.
In the old days, pre-jQuery 1.5... Late 2010, early 2011, I think, pre-jQuery 1.5 -- the return value of jQuery.ajax, or jQuery.get, or jQuery.post -- the return value of that was you got something back. You basically got the raw XMLHttpRequest. It wasn't a super useful object unless you understood that API. In jQuery 1.5, they made it so that, instead, what you get back is what's called a deferred. This deferred is magic. This deferred is awesome.
Here's a case where we're using the deferred. The idea of a deferred is that it gives you the ability to register your interest in the eventual outcome of an asynchronous thing. If eventually it fails, you can say, "I want to know about that". If eventually it succeeds, you can say, "I want to know about that". If it has already failed by the time you're registering you interest, or if it's already succeeded by the time you're registering your interest, it'll let you know about that, too.
What we can do here is say, "I want to know about this people request. I want to know about this task request." We can pass both of this to this also magically method called a "When." We end up writing this code that like reads...It's very English. It's very much like a sentence.
When these two things are done, then do this other thing. That's what we're saying in this code. When these two things are done, when this people request and this task request is done, then run that callback that was passed into our 'get two things' function.
We have to do a little bit of weirdness. I am going to make this go away on the next slide -- this thing where we have to access the zero item of what's returned. Already, we have dramatically shortened our code. We have completely managed the simultaneity -- the fact that we need to wait for both of these to be done. We've completely managed that without jumping through any hoops at all. That's cool.
The opportunity that we get out of this is even cooler. That whole "people zero" and "task zero" is pretty nasty. This getJSON returns a deferred, like we talked about. We can then call the pipe method on that deferred, and that lets us change, basically return the ultimate resolution of this deferred.
We can say, "Give me what you've got. I'm going to tell you what I want you to give to everyone else". That's what we achieve here, so we can say, "I know you got this object that has a people property, and that's where the people I'm after really are."
We can say, "Everyone who consumes the result of this, they really just want that list of people. They don't want the object that has the people property that contains an array of people. They just want that array". We can change the result of that Ajax request, and make that changed result available to everyone who consumes this on down the line.
We do the same thing with tasks. We, again, alter what's returned, so that what we're exposing to everyone else is nicer than what we got. Then, we just pass this callback as what we want to do when we're done, when the people requests and the task requests are done, and when this piping thing has already happened. Let's run that callback. So that's cool.
This is the other exciting thing, is we're returning this from this function. Now, whoever called this function has the opportunity to register yet more interest in the ultimate outcome of this. And so, maybe this deferred gets passed on up to who knows? If you want to do something in addition to this call back that was passed in initially, then you can.
We could even eliminate this callback that we're passing into the function entirely and just return the deferred. Or even better, even better, we can return the promise, which is a read only deferred, basically. Anyone who has a promise can learn about what happened, but can't change what happened, can't, or resolve any synchronous thing before the real asynchronous thing resolved.
Last thing I want to talk about is how writing mintier code opens the doors to bigger abstractions, abstractions that you just, like, you couldn't see, couldn't necessarily see the path from point A to point B or point A to point C. You have to get to B first before you can get to C.
And so, cleaning up these smells, eliminating these smells, leaves you with code that you'll start to look at it and be like, "Oh. I could do that and that and that, and then..."
It's also just nasty. Man, this is ugly, ugly stuff. And really easy. I know I've done it to miss a quote or get a double quote where I meant a single quote or whatever. Yeah, and goodness help you if you aren't carefully escaping the content that you're throwing into this and all that. Things can go downhill fast with this.
Probably the other smell about this is that this one function is doing a number of things. Number one, it's going out and getting the data. Number two, it's manipulating the data. Number three, it's turning that data into HTML, and number four, it's putting that HTML into the page. This one function that's probably 20 lines long is doing four distinct tasks.
Again, if you went to test this, all you could really test is, was the eventual outcome. You don't have a good insight into where something went wrong. The answer to this, and probably a lot of you are doing that... This was a way more controversial and mind-bending thing a couple of years ago. Probably a lot of you are doing this by now.
We've broken that get tasks section out to its own function, so that's not cluttering things up. We're taking advantage of the fact that we know about deferreds now, so we're making that request. We're munging the data, and then we are using that data to populate our template.
We can even do one better, and we're not going to go through all of this example here. But we can even do one better. Of course, if we put that template in our HTML, that template is coming down with every load of the HTML. And so, we can't. If anything else changes about our whole HTML page, then that's not a cacheable resource.
And so, if we're using lots of templates in our code, then we can't...We may be sending down quite a bit of data in our HTML page that's not cacheable if we're using that script tag strategy.