Pete Hunt - Scaling up and down: evolving your testing strategies

Fronteers 2014 | Amsterdam, October 10, 2014

Projects have different testing needs at different points in their lifecycle. In this session we'll have a look at the evolution of testing at Facebook and Instagram, how our needs changed over time, things we did right and mistakes that we made.


Pete, you probably want to be up here.

Are you giving a talk from down there? Is that the plan? Oh, OK.

Come on then.

So I think things are getting a lot better, in terms of like performance UX.

And I think we really care about these things because they're immediately front facing.

But I think we give less care to things that are a little bit more hidden.

I know we were speaking about accessibility earlier.

But another one of those things is testing.

And here to show us his battle scars in testing in large organizations, big welcome, is Pete Hunt! [CLAPPING]

Hey everybody.

How's it going? Yeah.

There's so much energy at like the afternoon right now on the last day of the conference, right? So my name is Pete.

And I'm here to talk about evolving your testing strategy.

Whatever that means.

I just want to give you a little bit of background on the types of stuff that I've worked on, because a lot of this is about battle scars, mistakes that I've made, and stuff that I found works for the projects that I've been on.

And maybe they work for you.

Maybe not.

I don't know.

We'll see.

But I kicked off my career working on Facebook video.

And this was a project that was not getting a lot of love back when I first joined.

It was a hackathon project that a couple of people wrote and then promptly stopped maintaining.

And I was the only person working on it.

And so obviously there were a lot of quality issues there.

Then I moved more towards full stack development on Facebook photos, building a lot of stuff on timeline back when timeline first came out.

Later I moved over to Instagram web doing full front end stuff.

And that's when I really started getting into JavaScript and having-- like I guess it's fun.

And then I kind of took a detour and stopped working on more product stuff and started working more on JavaScript frameworks, stuff that targets developers.

And I learned a lot about the different requirements for testing products rather than testing developer frameworks, and underlying systems like that.

After that I moved over to Instagram monetization.

And so this was actually more of an engineering management and project management type of situation for me.

And I realized that the code that I typed into the computer and the way that you manage the people typing that code into the computer results in a different way of thinking about testing.

And finally, I recently left the nest and I started my own company called Authbox.

And we're building a brand new kind of software as a service platform.

And we have our own kind of new views on testing as well.

So I've kind of had my hand in a lot of different pies, and had to use a bunch of different quality strategies.

And I'd kind of like to share some of those with you today.

Sound good? All right.

All right.

There's a little bit of energy there.

$312 billion is blown every year on software bugs, according to these guys.

I have no idea how they measured that.

It's probably not accurate.

But it sounds really impressive.

And it seems to be motivating to think about software quality in that way.

So $312 billion buys a lot of stuff. That is a lot, you know-- we just shouldn't be wasting that much money.

So software quality is a big deal.

But when we say software quality, like what does that even mean? And there's this kind of expert in this field.

He's a consultant.

He's worked with Microsoft, and Amazon, and a bunch of other big companies.

And his name happens to be Dave Chappell.

Not the comedian.

But he's a-- Or I think he goes by David.

And he breaks down software quality by kind of three different perspectives.

The first one is called functional quality.

And that's the idea that the software does what it's intended to do.

So we set out.

We have a spec for our software, or we have an idea like what place in the market we want to fulfill and how we want to touch our users.

And functional quality means, yes, the software fulfills its mission.

There's another perspective on software quality called structural quality.

And that's, is the code itself well structured? So can we understand the code? Is it easy to write unit tests for the code? You know, when you dive into the code base, are you excited to work with it, or do you hate it? The third one is one that we don't talk about a lot.

And it's process quality.

And that is basically, how does our software development process and methodology, and how does the structure of our teams and how people coordinate their efforts, how does that affect the quality of the system as a whole? And so what I'd like to focus on today is improving process quality to improve functional quality.

So I want to improve the process that we go through when we think about software quality, and improve the kind of end result of our system.

Does it do what we intended it to do? Now, I haven't really touched on structural quality here because that's a religious war.

Everybody has different ideas of what good structural quality is.

There's a couple of high level principles there.

But everybody fights over it all the time.

And it's the web platform.

So it's hard to get good structural quality anyway.

So I like to think of software quality as requiring answers to three main questions.

The first question is, is there a defect? So a defect would be a bug, a missing feature, a performance regression, that kind of thing.

The second question I want to answer is, how bad is the defect? So when we have this giant backlog of stuff to do, how does this defect rank relative to other defects and other things that we need to do? How do we prioritize this defect? And the third thing is, how can we fix this defect? So we know the defect is there.

We've decided how important it is.

Now how do we actually go about executing the fix for that defect? So I'm going to ask you to remember these questions because we're going to come back to them a lot.

So is there a defect? How bad is the defect? And how can we fix the defect? All right.

You'll remember that? Can I stop driving that home? Because I'm going to drive it home way more later.

Let's talk about agile development.

Let's throw some buzzwords around.

So when you say agile development, I think a lot of people probably roll their eyes and think of consultants coming in with a bunch of UML diagrams, and talking about how we should draw circles on the whiteboard and that'll make our software better.

But it's actually, I think, a testament to how influential it really was.

Because I think all of us in this room, if you're building on the web platform, you're probably doing some sort of iterative or agile process.

I don't think anybody really does the waterfall methodology anymore, because this is just so ingrained as a best practice at this point.

And the reason why agile was so successful was because it wanted to solve one main issue, which is responding to unpredictability in software development.

So if you go to agilemethodology.org, which is the primary source for this, "The Agile movement proposes alternatives to traditional project management.

Agile approaches are typically used in software development to help businesses respond to unpredictability."

Now, in software-- Let's imagine that we have a timeline here.

The general life cycle of development looks something like this.

So we don't really know what we're building at the beginning.

We have an idea of kind of the high level goals of what we want to accomplish.

But we don't know much about the actual problem that we're solving.

And by the end of the process, we pretty much know what we've built, because we built it.

So how do we get from not knowing anything to knowing something? Well, we have to build.

And we have various milestones as we build.

So we start with a high level concept.

And this will come from design, or marketing, or some sort of PM type of person or team.

And we'll go from concept to prototype.

Now, your milestones for your particular projects may be different.

These aren't really that important.

But the general progression is what I'm trying to hammer home here.

Which is we take a concept, and then we build kind of a rough sketch in software, in code for what we want it to do.

And then, rather than throw out that prototype and build a brand new production system, what usually happens is we usually evolve this prototype, and we add features and fix bugs in it until we feel pretty good about it.

So if you've ever talked to anybody about throwing out the code and rewriting the whole system, it generally doesn't work out well.

So you want to be able to evolve the system, if at all possible.

And then finally, you take your finished product and you push it to production.

Now, what I've just described to you is kind of how software evolves and is built. Nothing specific about what Agile has to do with this.

So this begins the pseudo science part of the presentation.

A lot of this stuff is very difficult to objectively measure.

So I'm going to try to present a couple of ideas that I've had which are just kind of frameworks for thinking about things.

And they might help you stumble upon some good development techniques.

So the way that I look at the difference between the waterfall school of developing software and the Agile, or iterative, school of developing software is that your progress for developing waterfall software looks kind of like this, in that it's often a step function.

So for each milestone you kind of sit down and you think really hard about what kind of code you're going to have to write.

How do I design this system? You think through all the unknowns that you have.

And then you execute on it.

You write a bunch of code.

And then you're up to the next milestone.

And you do the same thing.

So you've got lots of periods of not a lot of progress, not a lot of code committed into master.

And then you briefly make progress.

So it looks like a step function.

Now, the problem with that is you don't really know what you're building.

So these steps right here are going to get longer, and longer, and longer, as you find out the decisions that you made before were wrong and you have to correct them.

Contrast this with iterative or agile development, which tries to smooth this process over.

So you have a bunch of small steps.

And you have hypotheses at each step.

And you test them out.

You build a little bit.

You kind of commit a small chunk at a time.

And as you gradually develop this space, you have a lot less risk upfront.

So the general hand wavy idea is that you make progress at every iteration, rather than having to backtrack, as you would with the waterfall methodology, or invest way too much time up front with very little progress.

So let's go back to those three questions.

Is there a defect? How bad is the defect? And how can we fix the defect? And how do we apply these questions to a piece of software developed using agile or iterative development? So if we want to ask a question, is there a defect, and we want to ask it at various stages in this process.

It doesn't really make sense to ask if the concept has a defect, at least if you're an engineer.

Because if the concept has a defect, that's really a problem with marketing, or design, or PM.

So I'd like to take asking if a concept has a defect out of the question.

So really the first time we have to deal with this question, is there a defect, is when we build our prototype.

And we're kind of experimenting, we're writing a lot of code, building this prototype.

And asking this question, do we have a defect, helps us learn a lot about what we're actually building.

And throughout the rest of this development process, feature development and production, we need to know if there's a defect as well.

Right, we need to know if there are bugs.

The question about how bad is the defect, on the other hand, we don't actually really care that much about how bad the defect is when you're building a prototype, because you don't really know what you're building yet when you're building this prototype.

So you just care if it works or not.

You don't really care about the quality of the prototype until you're much further along in the development process, because this is a vehicle for exploring the problem space.

So really you're asking this how bad is a defect problem when you're starting to build the production features, the final code, "final code," that you're going to push to your users.

So you don't want to ask this question earlier on in the development process, because it's going to be a waste of time.

And the last question that we're talking about, how can we fix the defect, we really only care about that towards the end of our software development process.

Because the features are changing all the time.

The prototype changes all the time.

And we don't actually have real users for this feature yet.

We're still in development mode.

We're always used to diving into the code and building out these things because they're not finished yet.

But when you get to production, it's important that whatever quality process you have in place lets you triage issues to the right team and lets that team act quickly on those issues.

Because it's in production.

And the longer we have this bug in production, the worse off we're going to be.

So we have a number of different techniques for answering these three questions.

The first is that we can just design the system to be resilient to problems.

So we could design our system to have certain properties and abstractions that make it easy for us to debug at various levels, and make it harder for us to introduce bugs or other defects.

Another thing we can do is code review.

And what code review is is basically before you commit anything to master, somebody else looks at your code and decides whether it's OK or not.

Contrast that with audits, which are similar to code review, but happen after you've committed the code into master.

There's also a great suite of static analysis techniques that you can use to improve code quality as well.

So common ones would be type systems and other linting tools.

In the JavaScript world, which is where I mostly work, that would be something like TypeScript and a linting tool like JSHint.

You could add runtime assertions too.

So what that means is when you write a function, you add a bunch of assertions at the top.

And if those properties aren't true, then you throw an exception.

So you can use that for type checking in a dynamically typed language, or checking that the range of some number is within an acceptable bound, or that an array isn't empty, or something like that.
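
As a rough illustration of the runtime assertions described above, here is a minimal TypeScript sketch. The `invariant` helper and the `averageRating` function are hypothetical names invented for this example; they are not from the talk. The sketch covers the three kinds of check mentioned: a type check, a non-empty array check, and a range check.

```typescript
// Hypothetical helper: throws if a stated precondition does not hold.
function invariant(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error("Invariant violation: " + message);
  }
}

// Hypothetical example function: averages a list of ratings between 1 and 5.
function averageRating(ratings: unknown): number {
  // Type check, useful when callers are untyped JavaScript.
  invariant(Array.isArray(ratings), "ratings must be an array");
  const list = ratings as unknown[];

  // The array must not be empty, otherwise the average is undefined.
  invariant(list.length > 0, "ratings must not be empty");

  let total = 0;
  for (const rating of list) {
    invariant(typeof rating === "number", "each rating must be a number");
    const value = rating as number;
    // Range check on every element.
    invariant(value >= 1 && value <= 5, "each rating must be between 1 and 5");
    total += value;
  }
  return total / list.length;
}
```

Anyone reading `averageRating` can rely on those properties holding for the rest of the function body, which is the communication benefit of assertions at the top of a function.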

You can also add monitoring in production.

So if our little line graph drops below some threshold, we see an alarm, and we realize, hey, we screwed up.

We should fix this.
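
The threshold alarm described above could be sketched roughly like this in TypeScript. The metric, the sample values, and the threshold of 100 are all made up for illustration; a real monitoring system would pull samples from production rather than from a hard-coded array.

```typescript
// One sample of a hypothetical production metric,
// for example successful uploads per minute.
interface MetricSample {
  minute: number;
  value: number;
}

// Returns the samples that fall below the threshold,
// i.e. the points at which an alarm should fire.
function findAlarms(samples: MetricSample[], threshold: number): MetricSample[] {
  return samples.filter((sample) => sample.value < threshold);
}

// Illustrative data only.
const uploadsPerMinute: MetricSample[] = [
  { minute: 0, value: 120 },
  { minute: 1, value: 118 },
  { minute: 2, value: 40 }, // a bad push lands here
  { minute: 3, value: 35 },
];

// Holds the samples for minutes 2 and 3, the ones below the threshold.
const firedAlarms = findAlarms(uploadsPerMinute, 100);
```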

You have unit tests as well.

Everybody has heard of those.

Integration tests, which is another form of automated testing.

And manual quality assurance.

So somebody actually clicking on the buttons and testing the app.

Now, if I was to get real hand wavy and add more pseudo science to this presentation, I'd show a chart like this, which is my interpretation of these various techniques and what questions they answer.

So I wouldn't look too closely at this, but this is just my general way of thinking about these things.

What techniques do I have in my toolbox? And what questions do they solve? Do they solve, is there a defect, how bad is the defect, or how can we fix the defect? So I've introduced these three questions, a bunch of potential answers to them, and then I kind of mentioned this agile development thing.

How can we tie all of this stuff together and fit in these code quality techniques with agile development? So I'm going to show you an example.

Let's imagine that I'm going to write fine grained unit tests for a prototype.

So I'm at this stage.

We have a concept for what we want to build.

And now we're just starting to build this prototype.

So let's actually think about what a unit test really is.

"In computer programming, unit testing is a software testing method by which individual units of source code, or sets of one or more computer program modules together with associated control data," that's a little wordy, "are tested to determine if they're fit for use."

So what that means is you take your modules, and by some definition of modules, maybe that's a JavaScript file or a CommonJS module, maybe it's a function, maybe it's a class, and you just test that one unit as opposed to testing the integration of all the separate units.

So if that one unit breaks or doesn't behave like you want it to, the unit test will fail, and it will tell you, hey, this one specific module doesn't work.
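
To make the idea of testing one unit concrete, here is a small TypeScript sketch. The `formatDuration` function is a hypothetical unit invented for this example, and `runUnitTests` is a hand-rolled stand-in for a real test framework, so treat it as an illustration rather than a recommended setup.

```typescript
// The unit under test: a hypothetical, self-contained function.
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const padded = seconds < 10 ? "0" + seconds : "" + seconds;
  return minutes + ":" + padded;
}

// A tiny hand-rolled runner. It exercises only formatDuration and
// returns one message per failing case.
function runUnitTests(): string[] {
  const failures: string[] = [];
  const cases: Array<[number, string]> = [
    [0, "0:00"],
    [59, "0:59"],
    [60, "1:00"],
    [125, "2:05"],
  ];
  for (const [input, expected] of cases) {
    const actual = formatDuration(input);
    if (actual !== expected) {
      failures.push(
        "formatDuration(" + input + ") returned " + actual + ", expected " + expected
      );
    }
  }
  return failures;
}
```

Because each failure message names the function and the exact input, a failing case points straight at the module that broke.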

This sounds pretty great.

This will tell us whether there's a defect.

It'll tell us where that defect is.

And it's so fine grained, it might even tell us how to fix it.

But the problem is when you adopt unit testing prematurely you get a lot of false positives.

And I like to compare this, kind of the downside to prematurely unit testing, to the downsides of the waterfall development methodology.

So you remember this model of software development where we have to constantly backtrack, or we have to plan way too much up in advance for each milestone.

And then these step functions just-- it doesn't scale as the software gets bigger and bigger.

With premature unit testing you start to see kind of a similar thing.

So I'm sure a lot of people have been in this situation where you take your prototype and you refactor it, and then after you refactor your prototype a bunch of unit tests fail.

And those unit tests actually aren't even valid anymore.

And so that's kind of annoying.

But when you're in a bigger organization with a prototype that's maybe sat around for too long, sometimes it's not even clear if those unit tests should be failing or not failing, which ones are OK failures and which ones aren't OK failures.

And you'll also see this with some static type systems as well.

You'll refactor your prototype.

The types won't check out.

And then you have to spend a long time getting the type system to agree with you.

Again, these are our three questions.

At the prototype stage we only care about the first one.

We care if our prototype works.

We don't care how bad the defects are within the prototype.

And we don't care how we can fix the defects, because we don't even think-- we're not even sure if we're solving the right problem yet.

So we only really care about this first column over here.

And the good news is that means we can pull any of these techniques off the shelf and try them on our prototype, because we're that early in the process.

So when I'm very early in the development process, and I'm exploring a new problem space that I don't know what the final design's going to look like, I actually really favor a technique called integration testing.

And integration testing is the phase in software testing where the individual software modules are combined and tested as a group.

And this makes a lot of sense when you're early on in the process.

Because you just want to know if your prototype works.

And I don't know if I'm going to keep any of the code around in my prototype or not.

I don't know if those individual modules are going to stay the way they are, or be thrown out, or kept around forever, or maybe reconfigured in different ways.

So unit testing might not make sense for an early prototype.

But integration testing, which tests everything at once, and it's pretty easy to write, usually makes sense, because it validates the vague idea of what we want our thing to do.
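
A minimal TypeScript sketch of that idea follows. The three modules and the `tagLine` pipeline are hypothetical, invented only to show the shape of the test: the integration test feeds one input through everything and checks the end result, so it keeps passing even if the modules inside are later merged, split, or rewritten.

```typescript
// Three hypothetical modules of a prototype. Their boundaries may change.
function parseTags(raw: string): string[] {
  return raw
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}

function dedupeTags(tags: string[]): string[] {
  const seen: { [tag: string]: boolean } = {};
  const result: string[] = [];
  for (const tag of tags) {
    const key = tag.toLowerCase();
    if (!seen[key]) {
      seen[key] = true;
      result.push(key);
    }
  }
  return result;
}

function renderTags(tags: string[]): string {
  return tags.map((tag) => "#" + tag).join(" ");
}

// The whole pipeline, which is the only thing the integration test knows about.
function tagLine(raw: string): string {
  return renderTags(dedupeTags(parseTags(raw)));
}

// Integration test: one input in, one expected output, nothing about internals.
function tagLineIntegrationTestPasses(): boolean {
  const output = tagLine(" Amsterdam, fronteers ,amsterdam,, Testing");
  return output === "#amsterdam #fronteers #testing";
}
```

The trade-off is visible here too: if this test fails, it says nothing about which of the three modules is at fault.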

However, integration tests aren't necessarily a panacea.

Now, imagine that we've gotten this far along in our development process and our system is in production, and now we're kind of in maintenance mode.

So when we were prototyping we built a bunch of integration tests that helped us get from prototype to production.

But once we're in production, we need to answer some more questions.

We don't only care that there's a defect in our code.

We also need to know how bad it is so we can prioritize this fix amongst all the other things that we need to do.

And we need to figure out quickly how we can fix this.

So we can triage this issue to the right team.

And integration tests don't help us with this at all.

Because it only tells us really if there's a defect.

And it might tell us how bad that defect is.

But since an integration test is testing so many things, it doesn't help us at all with this third bullet.

So it's not enough to help us out.

But even worse, having these integration tests around, and relying on them, and treating them as reliable, can be really bad when your system starts to scale up.

Because your integration test has so many interrelated systems, they're not only slow.

They also lead to a lot of false negatives.

So, again, if we go to this kind of method of development where we get paused at each milestone, if you rely too much on integration tests, you'll make a code change, and then you'll have to re-run your flaky tests until they pass.

And each run of those tests may take a very long time.

And that sucks.

So we've got all of these different testing techniques.

I think we need to add another column here.

Not only which questions do they answer, which helps guide which ones are appropriate at all to use at each phase of the development life cycle.

But looking at their costs as well.

So when we talk about designing the system better, we're talking about thinking through the abstractions that we're going to use, and deciding which ones are right and which ones are wrong for our system.

So this is why a lot of people use frameworks, for example.

Because a lot of these framework developers have already thought through a lot of these problems for you.

But basically, you don't know what abstractions you really need until you need them.

So spending a lot of time upfront designing this crazy, complex data fetching model, or permissioning system, might not make sense until you've actually used that data fetching, or used that permissioning system.

So what I prefer to do is get the product shipped, and then refactor your way to a cleaner design, after you've discovered the entire problem space.

Code review.

I think you should always do code review.

It's cheap.

It doesn't take that much time for somebody to review code.

And it's also really important for team communication.

So this is the way that we onboard a lot of new engineers on the teams that I've worked on.

We start by reviewing their code, teaching them some of the idioms in our code base, and then they start to review other people's code, which gets them exposed to other parts of the code base, eventually developing competency throughout the whole thing.

And the best part about this is that humans are involved in code review on both sides of the equation.

So we can vary the strictness of the code reviews based on which stage in the project life cycle we're in.

So for prototyping, for example, code reviews might be more of a like, hey, that looks cool.

I'm just going to accept it and be really lenient.

But we're also developing that competency in that prototype amongst the entire team.

Whereas later in the development life cycle, when we're in production, and it really, really matters if there are defects, we can be really strict on the code reviews.

I treat audits as less effective code reviews.

They're mainly effective when you need to bring in a team to review your code base after the code's been committed.

And the reason that audits are a little less effective than code review is because you have to hold a lot more of the system in your brain when you're going through an audit.

But sometimes they're just really needed.

When you bring in an external security consultant, for example, and say, hey, make sure my thing works and is safe for my users, really the only way to do that effectively is to bring them in after the code has been committed, and have them audit it.

Unless you're fortunate enough to have a security team at your company.

Static analysis is another technique that we have.

And it can make it harder to iterate quickly if your prototype is incorrect.

So this is, I think, a source of a lot of tension between people who like dynamic languages and people who like statically typed languages.

Which is when you're programming in a statically typed language, it's a lot easier to refactor.

You can move code around, and you're pretty much guaranteed that you didn't make a stupid mistake when you refactor.

But it also feels less productive for some reason when you're starting to build out your software initially.

And the reason for that is that you don't have a coherent design for your software in the beginning, when you're prototyping.

You don't know what you're building.

So you can't really rely on the type system to save you from bugs at that point, because you don't even know what you're building yet.

But as you start developing, and as you start kind of productionizing your system, you're going to want the freedom to refactor.

And you're going to want the guarantees that the type system gives you.

So I like to bring this in later on in development.

And this is actually what Facebook did as well.

So Facebook started out as this big PHP app.

And PHP plays fast and loose with the type system.

And later on in Facebook's history, they brought in Hack, which is a strongly typed version of PHP-- statically typed version of PHP, and gradually adopted that throughout the code base.

So what products would do is they would start with this weakly typed PHP, and they would gradually, as they kind of understood their problem domain better, add these type annotations, and eventually get the guarantees that a type system can give you.
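
The same progression can be sketched in TypeScript, used here as an analogy for the PHP-to-Hack path rather than as a picture of Facebook's actual code. The `Photo` shape and both functions are hypothetical. The first version uses `any`, which switches type checking off; the second writes the shape down once it has settled.

```typescript
// Stage 1: prototype code. `any` turns off checking, much like untyped PHP.
function describePhotoLoose(photo: any): string {
  return photo.owner + " posted " + photo.caption;
}

// Stage 2: once the shape of the data has settled, write it down.
interface Photo {
  owner: string;
  caption: string;
  likeCount: number;
}

function describePhoto(photo: Photo): string {
  return photo.owner + " posted " + photo.caption;
}

const samplePhoto: Photo = { owner: "pete", caption: "Amsterdam", likeCount: 3 };

// Both versions behave the same at runtime for well-formed input.
const looseDescription = describePhotoLoose(samplePhoto);
const typedDescription = describePhoto(samplePhoto);

// The difference shows up with a typo such as `capton`: the loose version
// accepts it and reads an undefined property, whereas passing the same
// object literal to describePhoto would be rejected by the compiler.
const typoDescription = describePhotoLoose({ owner: "pete", capton: "Amsterdam" });
```

Adding the annotation costs little once the problem domain is understood, and from then on a refactor that renames `caption` is checked everywhere the typed function is used.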

Runtime assertions.

I think they're always a great idea.

It's very similar to code review, in that it doesn't take a lot of work to add them to your code base.

And they're really great for communication and thinking through problems too.

I found that just adding a couple of runtime assertions to my functions makes me think about the problem a little better.

And when I submit that code for review, my code reviewer has a little bit of a better time thinking about this, because they can be guaranteed to know these assertions will be true.

Monitoring is your last line of defense.

It doesn't really make sense to bring that in until the end.

Because, again, you're prototyping.

You don't know what you're building.

By the time you get close to going to production though, you better have monitoring so you know the success and failure modes of your system.

Unit testing.

I used that in my case study earlier.

When you start to feel like the modules or the units that you're going to be testing in your code start to feel permanent, it makes sense to add unit tests.

So a good example here is React.

The design of React has stayed pretty consistent for the past couple of years.

So unit testing each of those modules in a very specific way makes a lot of sense.

Because one change to one module might break just that individual module, but we can run those tests very quickly and in an automated way, and point to exactly which flaw or which commit messed up that module.

And this is what we're actually doing at my new company as well.

Because even though we're at the early prototyping stage, we understand the design of our system, and we know kind of this problem space very well.

So even though I didn't suggest using unit tests for prototyping in that previous example, if you know certain parts of your system upfront, adding unit tests can actually be a very powerful way of developing, if you think-- if you have enough confidence that you know how the system's going to end up.

Now, integration tests are one of my favorite techniques for improving code quality.

And I actually think it's the most powerful way to improve the quality of your product early on, with the exception of code review.

When I joined the Facebook videos team it was in pretty bad shape.

So video encodes would fail sporadically.

It touched all these different parts of the code base.

And the people working on those parts of the code base weren't necessarily thinking of video at the time, because video wasn't that important back then.

So people would make changes to the code base all the time.

There was no test coverage.

Very bad monitoring.

So by the time we pushed the code to production we had the problem baking in there for about a week.

So we would have to roll back the push, and then we would have to fix that change, and then push it out again.

And it just really slowed us down.

So I just committed this test that encoded a video and waited on the other end.

So the video went into the encode pipeline, and went through all this PHP, touched all these different back-end systems.

And it exercised our cache code paths.

It exercised our DB code paths and our video code paths, and the read paths as well.

And finally, asserted on the other end that we got a valid video.

Now, this is not a very high signal test if you're doing unit tests because it's testing so many different things.

But as an integration test, it really helped.

Because any time somebody committed something that may mess up videos, it told us that, hey, you shouldn't commit this.

This is going to break videos.

So this actually really helped us kind of just like stop the bleeding on Facebook video.

But as we started to refactor the code base, and clean up the code, and split apart these big modules into smaller modules, we realized that this integration test was slow, and it would fail all the time for reasons that like weren't necessarily legitimate.

So once our code base was in a better spot, and we had more people working on it, and it was more modular, we were able to actually delete that integration test.

So they're really great early on.

But as you kind of evolve the software, you might want to rely less on them, or pull them out entirely.

And finally, if you can afford manual QA, it's great at every phase of the development life cycle.

I mean, if you're building a prototype, and your prototype is kind of a consumer facing or UI type of product, a lot of times it's easier to just manually test it as the engineer, or product manager, or designer, rather than write Selenium tests or something like that.

But as you get towards the end of the product development life cycle, and if you have a QA team, it's really great to start kind of focusing on that towards the end, rather than try to like hand off prototypes to your QA team earlier on, just because the communication, and the thrash, and the features.

And so one of the ways that we kind of discovered that this was useful, at Instagram anyway, was we were redesigning our entire Instagram.com experience a couple of years ago.

And we found that we had these just giant style sheets.

And I hate CSS.

I don't know if everybody else here likes CSS.

But I hate CSS.

And we found that if our style sheets loaded in the wrong order, our UI would break slightly.

And that's because we had conflicting CSS rules that depended on the way that they happened to be packaged.

And that was just a really big problem, because we had just this rat's nest of CSS code.

So we relied a lot on manual testing.

And what we would do is we'd load up the page.

And we'd navigate from the feed to the profile, and from the profile to the feed, and if it broke, we knew.

And if it didn't break, we were good, and we pushed.

And that didn't scale.

So what we ended up doing was trying to automate away manual QA.

And we wrote this open source tool called Huxley.

And it basically records user interaction via WebDriver, and plays it back, and then does kind of screenshot testing based on those interactions.

So it verifies that the pixels are the same.

And what we found is that asserting that the pixels are the same isn't a great way to write tests.

Because, again, if you're moving quickly, those pixels are going to change all the time.

So what we found is instead of treating that as a test case, we treated it more like a part of the code review process.

So when we would submit changes for code review we would have a bot that would automatically go, run through these flows, take screenshots, and attach those screenshots to our code review tool for that diff.

And then we could basically very quickly manually review, and say, hey, do those changes look good, without having to spin up a development environment and actually test those manually.

So that was kind of this hybrid code review/testing/QA approach that I thought was pretty clever.
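A minimal sketch of the "report the pixel diff to a reviewer instead of asserting on it" idea. This is not Huxley's code or API: the `Screenshot` and `DiffReport` types, the function names, and the flow name are all hypothetical, and screenshots are modelled as raw RGBA byte arrays rather than real browser captures.

```typescript
// Hypothetical types: a screenshot is just raw RGBA bytes, 4 per pixel.
interface Screenshot {
  width: number;
  height: number;
  pixels: Uint8Array;
}

// What a bot might attach to a code review for one recorded flow.
interface DiffReport {
  flow: string;
  changedPixels: number;
  totalPixels: number;
  needsHumanReview: boolean;
}

// Count pixels whose RGBA bytes differ between two screenshots.
function countChangedPixels(baseline: Screenshot, candidate: Screenshot): number {
  if (baseline.width !== candidate.width || baseline.height !== candidate.height) {
    // A size change is treated as "the whole larger frame changed".
    return Math.max(baseline.width * baseline.height, candidate.width * candidate.height);
  }
  let changed = 0;
  for (let i = 0; i < baseline.pixels.length; i += 4) {
    if (
      baseline.pixels[i] !== candidate.pixels[i] ||
      baseline.pixels[i + 1] !== candidate.pixels[i + 1] ||
      baseline.pixels[i + 2] !== candidate.pixels[i + 2] ||
      baseline.pixels[i + 3] !== candidate.pixels[i + 3]
    ) {
      changed++;
    }
  }
  return changed;
}

// Build a report for a human reviewer; nothing here fails a build.
function reviewReport(flow: string, baseline: Screenshot, candidate: Screenshot): DiffReport {
  const changedPixels = countChangedPixels(baseline, candidate);
  return {
    flow,
    changedPixels,
    totalPixels: candidate.width * candidate.height,
    needsHumanReview: changedPixels > 0,
  };
}

// Helper that fabricates a single-colour screenshot for the demo.
function solid(width: number, height: number, rgba: [number, number, number, number]): Screenshot {
  const pixels = new Uint8Array(width * height * 4);
  for (let i = 0; i < pixels.length; i += 4) pixels.set(rgba, i);
  return { width, height, pixels };
}

const baseline = solid(2, 2, [255, 255, 255, 255]);
const candidate = solid(2, 2, [255, 255, 255, 255]);
candidate.pixels[0] = 0; // one pixel changes colour in the new diff
const diffReport = reviewReport("feed-to-profile", baseline, candidate);
console.log(diffReport);
```

The design choice mirrors the talk: a changed pixel flags the flow for a quick look during code review, and the decision about whether the change is good stays with a person.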

So I thought I was really clever in thinking about testing in this way, and using different testing techniques at different points in the life cycle.

But then I realized that I made a big mistake.

And I forgot to plan ahead.

So when I was at this stage in building some products at Facebook, we weren't doing a lot of testing outside of integration testing.

And we did code review.

But that was pretty much it.

And by the time we started building features, and by the time we started shipping to customers, we realized that our testing tools and our continuous integration and stuff like that wasn't quite ready for the phase of the life cycle of the project that we were in.

So when you're at the prototyping phase, just because you're not investing time into testing, or into fine grained unit testing, doesn't mean you shouldn't be planning on it a month or two out, or depending on how long your sprints are.

So I think it's really important to use the right testing tool for the job.

But also think ahead and realize that you're going to have to evolve these over time.

And start that early.

Thanks for listening to me.

I hope that this was informative.

[CLAPPING] (host) Come along to the question lounge.

All right.

(host) Oh, I nearly fell up the stairs there.

I've been waiting for that to happen all day.

So how does test-driven development blend into the three questions approach that you had?

(pete hunt) So we're actually doing TDD for Authbox right now.

And it kind of goes against one of those first examples that I gave, which was when you're at the prototyping stage, which Authbox is at right now, if you don't understand the thing that you're building, you shouldn't write these fine grained unit tests.

But it turns out actually we understand pretty well what our product is going to be right now.

So TDD makes a lot of sense, for this particular product.

Now, for a lot of other ones, especially in my career, TDD made no sense.

I'm thinking for user interfaces in particular.

Those change so often that I found that TDD is just kind of-- it makes you feel good, because there's a lot of green in the console.

But it often wastes a lot of time, unless it's easily testable or a well understood problem.

(host) So it was the churn in the interface that was making that not make sense for you.

Is that right?

(pete hunt) Yeah.

Interface churn was one problem.

The tools available for testing UIs are also-- I think that the state space in UIs is a lot bigger and harder to understand than it is in a lot of systems where TDD has been used successfully.

So like number one, humans are looking at the user interface, not computers.

And so you can't like automate a human.


But if you just have a back-end system, you can understand how that data's being consumed.

And the other thing is that there's just a lot of different states that UIs can be in.


(host) So not so much on the TDD side, but what testing tools have you used for testing sort of the more visual side? Is there something you've done?

(pete hunt) So we built this tool called Huxley, which is on the Facebook open source page.

And that started out as like a set of helper methods for Selenium.

And we ended up making it more of a code review tool.

There are other visual regression testing tools from, I think, "The Guardian" might have one, or--

(host) The BBC did as well.

(pete hunt) Was it the BBC? OK.

(host) Is that something you've got experience with? Or have you found those useful?

(pete hunt) I mean, they're used in a different way.

So a lot of companies, I know that Google does this as well for some of their UIs, they have like a setup-- They call them goldens, where they're screenshots where we say, this is how the UI should look.

And then we assert that the UI always looks like that.

And that's something that would be great when you're in production and you know what you want to build.

But you want to hold off on that until you actually understand what your product is.

So when you're prototyping, moving quickly with a small team, you probably don't want to do that.

(host) So you were talking about auditing and analysis.

If the failure is at that point, the code's already in master, right? So what's the strategy there?

(pete hunt) For auditing and what?

(host) Yeah.

So if a commit goes to master, it's passed review.

But like the analysis phase, or the auditing phase, it fails at that point.

Is the code already in master at this point?

(pete hunt) I think at a lot of places it would already be in master.

I want to say that, I don't know this for sure, but I think that they develop Windows with like very, very long lived feature branches that last for like years and then get merged in.

So I'm not sure how that works.

But on all the projects I've worked in, it's all about minimizing the cost.

So if you can catch it at compile time, that's ideal.

Right, you run JSHint, and it catches your bug.

That's amazing.

Then the cost goes up a little bit if you have to manually run code and click on a button.

That's a little bit of your time.

Then if it gets into master, and we have to catch it in QA, and then it has to get kicked back to engineering, and then kicked back to QA, that gets really expensive.

So it's really about how can we minimize the cost of these flaws.

(host) I found that some static analysis tools would sort of generate what looks like an error-severity failure when it is more of a style issue.

So do you block things like builds or pushes based on these analysis tools failing? And what's the process then if you want to say, no, actually this bit's OK.

(pete hunt) We block commits on JSHint on the JavaScript side.

And I think-- So we're not blocking on type checking yet at Authbox.

But we use TypeScript at Authbox.

And we have Travis basically continually building our code base.

So we'll eventually catch that in CI.

But it's really up to the engineer to individually like make sure it compiles, and make sure the tests run.

(host) So what have been your experiences with TypeScript then? Has that improved things on the testing side? I guess that gives you a better heads up on potential errors based on type, right?

(pete hunt) TypeScript is cool.

I find myself fighting the type system a lot of times.

So we have a big code base that is using a lot of promises.

And promises are kind of this-- it's a promise of type T.

And a lot of times it can't quite figure out those types correctly.

I haven't kind of dived into why.

So what we ended up doing is we sat down for a while and we thought about the design of our system.

And the parts that took a lot of thinking about, and talking about, and nuance, we implemented those in TypeScript.

Because we wanted that to compile.

We wanted the type checker to really give us that sense of integrity.

But then for a lot of the plug-ins that plug into this general framework, we would just hack them out in JavaScript as quickly as possible, because a lot of them were pretty basic, regular expressions, or something like that, that plug into this framework.

So we use TypeScript for the parts of the code where we want to move really slowly and really confidently.

And we use JavaScript without any sort of type checking for the parts where we want to move quickly and we care a little less about integrity there.
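A rough sketch of that split, assuming a small typed core and plug-ins that are little more than regular expressions. None of these names come from Authbox: `MatcherPlugin`, `Framework`, and the two sample plug-ins are hypothetical, and in the setup described above the plug-ins would live in plain JavaScript files rather than next to the typed core.

```typescript
// Hypothetical contract that the carefully designed, type-checked core exposes.
interface MatcherPlugin {
  id: string;
  matches(input: string): boolean;
}

// The slow-moving core: this is the part worth type checking.
class Framework {
  private plugins: MatcherPlugin[] = [];

  register(plugin: MatcherPlugin): void {
    this.plugins.push(plugin);
  }

  // Return the ids of every plug-in that matched the input.
  run(input: string): string[] {
    return this.plugins.filter((p) => p.matches(input)).map((p) => p.id);
  }
}

// The fast-moving part: plug-ins quick enough to "hack out",
// each one a single regular expression.
const core = new Framework();
core.register({ id: "has-digits", matches: (s) => /\d/.test(s) });
core.register({ id: "all-caps", matches: (s) => /^[A-Z\s]+$/.test(s) });

const matched = core.run("HELLO 42");
console.log(matched); // only the "has-digits" plug-in matches this input
```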

(host) So when you're using Promises, I mean, one of the complaints that I've heard, especially with the ES6 Promises, is it kind of gets in the way of monitoring.

So if you have a promise that rejects-- unless you have something in the promise chain explicitly catching and doing monitoring, you're not going to get window.onerror.

Has this been a problem? Have you got workarounds for this?

(pete hunt) So we were fortunate to be working mostly in Node.

And we can kind of pick our platform there.

So I noticed that too with Promises.

And that's a complaint that's commonly thrown at them.
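To make the complaint concrete, here is a minimal sketch using only native promises, not Bluebird. The `monitored` helper and the `reported` array are hypothetical names: the helper puts an explicit catch into the chain so a rejection reaches monitoring, then rethrows so callers still see the failure.

```typescript
// Hypothetical helper: report a rejection, then keep the chain rejected.
function monitored<T>(promise: Promise<T>, report: (reason: unknown) => void): Promise<T> {
  return promise.catch((reason) => {
    report(reason); // this is where a real system would call its monitoring
    throw reason;   // rethrow so downstream handlers still run
  });
}

const reported: unknown[] = [];

const settled = monitored(
  Promise.reject(new Error("save failed")),
  (reason) => reported.push(reason)
).catch(() => "handled downstream");

settled.then((value) => console.log(value, reported.length));
```

Without something like this in every chain, a rejected native promise simply stays rejected and no global error handler sees it, which is the gap Bluebird's default behavior is described as closing.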

There's a Promise implementation called Bluebird which hacks into, I think, some of the V8 internals, but I think it also works in SpiderMonkey as well, through some sort of magic.

And it does two things that I think are really great.

First, it has pretty good default, uncaught exception behavior, and lets you register handlers for those.

And the second thing is that it gives you really good stack traces, which is another thing that people complain about with Promises.

And I've found that the really serious libraries that I really like, they think a lot about what the error cases are.

And when you're a developer, like how long does it take for you to figure out what an error message means.

So I can't say enough good things about Bluebird, because those error messages are just so clear.

I mean, you call Promise.longStackTraces().

And then it's like, oh, stack traces make sense.

This is pretty cool.

(host) So one of the-- Well, you suggested with the Promises stuff that if a promise rejects, and it has no handler, either way, then that's when we should be sort of logging to the console, or to window.onerror, that there's been a problem.

But the problem with Promises is that you can attach these handlers sometime later, right.

You can handle an error 5 seconds, 10 seconds later.

Have you found that to be a problem? Because I'm not sure if this is a problem that the real world has.

(pete hunt) So I don't actually like Promises that much out of the box.

I think it's better than callbacks, just because it flattens your code out a little bit, and it makes you think about error handling a little more.

But we actually use Promises with this thing in ES6 called generators, which I think a lot of people have probably been talking about lately.

And so Bluebird has this thing called bluebird.coroutine, which will turn a generator into a promise.

And basically you can yield a promise.

And then it will either send the return value back in.

Or it will throw the exception, using the traditional exception throwing mechanisms in JavaScript.

And so that actually gives you kind of the exact same exception semantics as you would in regular JavaScript.
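The mechanism can be sketched in a few lines. This is a hand-rolled illustration, not Bluebird's implementation: the `coroutine` function below is a hypothetical stand-in that drives a generator, sends each resolved value back in, and throws each rejection at the `yield`, so an ordinary try/catch works. It is written in present-day TypeScript, which does support generators.

```typescript
// Hypothetical runner: turn a generator that yields promises into a
// function that returns one promise for the generator's return value.
function coroutine<T>(
  genFn: () => Generator<Promise<unknown>, T, unknown>
): () => Promise<T> {
  return () =>
    new Promise<T>((resolve, reject) => {
      const gen = genFn();

      function step(advance: () => IteratorResult<Promise<unknown>, T>): void {
        let result: IteratorResult<Promise<unknown>, T>;
        try {
          result = advance();
        } catch (err) {
          reject(err); // the generator threw and nothing inside caught it
          return;
        }
        if (result.done) {
          resolve(result.value);
          return;
        }
        result.value.then(
          (value) => step(() => gen.next(value)), // send the resolved value back in
          (err) => step(() => gen.throw(err))     // throw the rejection at the yield
        );
      }

      step(() => gen.next());
    });
}

const demo = coroutine<number>(function* () {
  const first = (yield Promise.resolve(1)) as number;
  let recovered = 0;
  try {
    yield Promise.reject(new Error("boom"));
  } catch {
    recovered = 2; // a plain try/catch sees the rejected promise
  }
  return first + recovered;
});

demo().then((total) => console.log(total)); // logs 3
```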

So that's been awesome for us, and is also one of the reasons why more of our code base isn't in TypeScript.

So our kind of core framework where we plug everything into, that's written in TypeScript without generators because TypeScript doesn't support generators yet.

(host) Ah, OK.

But we're just really careful with constructing promises and handling errors in that part of the code base.

And then the rest of our JavaScript code base is written without TypeScript, but with generators and promises.

(host) So you were talking about unit tests versus integration tests.

I've always found that integration tests are really useful as an indicator of good API design.

Because especially if you're writing the tests before you create the stuff, you can use it as a playground for how APIs might interact.

And you can spot a bad API before you've written any actual code to use it.

Do we really need unit testing at that point? Can you get away with just a good, solid set of integration tests?

(pete hunt) Well, I mean, the difference between unit tests and integration tests is just the size of the unit.

Like you're talking about either one function, or a couple of functions that call each other, or like an entire system.

So I don't know.

I like to push integration tests for a while, because you get a lot of bang for your buck with integration tests.

And it's really when you start seeing those tests breaking all the time, and then you're not sure why they're breaking, at that point you need to be ready to start being like, OK, we're going to test things in a more fine grained way.

And so the trick is to be able to see that coming a little while in advance and start thinking, OK, we're going to need to start unit testing our individual modules pretty soon, because these integration tests are starting to get slow, or they're starting to get flaky, or we're pretty sure that we know our code base is going to look like this for a while.

And so then you want to start writing code in a way that can be testable.

And a lot of people have spoken about different ways to make your code testable using dependency injection, or a service locator, or something like that.

So that was one mistake that I made in my career was like pushing the integration tests for too long.

And then when it was appropriate for us to use unit tests, we were like, oh great, we wrote all this code.

And now we have to rewrite it to use dependency injection.

Like that is not fun or a creative endeavor.
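For anyone who has not seen the pattern, a minimal sketch of dependency injection follows. The example is hypothetical and not from the talk: `Clock`, `isExpired`, and the two clock objects are illustrative names. The point is only that code which receives its collaborators as arguments can be unit tested without a rewrite.

```typescript
// Hypothetical dependency: anything that can tell the current time.
interface Clock {
  now(): number;
}

// The clock is passed in rather than read from a global,
// so a test can substitute a fixed one.
function isExpired(expiresAt: number, clock: Clock): boolean {
  return clock.now() >= expiresAt;
}

// Production wiring uses the real clock.
const systemClock: Clock = { now: () => Date.now() };

// A unit test injects a clock frozen at a known instant.
const fixedClock: Clock = { now: () => 1000 };

console.log(isExpired(999, fixedClock));  // true: 1000 >= 999
console.log(isExpired(1001, fixedClock)); // false: 1000 < 1001
console.log(isExpired(0, systemClock));   // true: any real time is past 0
```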

(host) But if you're writing unit tests after the fact, you're going to miss some [INAUDIBLE] cases.

You're not going to get 100% coverage, right? How much of a problem is that?

(pete hunt) With getting 100% test coverage? I actually like to use code coverage tools on the unit tests only.

Not the actual code being tested.

Because the number of times that I've written a unit test, and it's been green, and I've like committed the code, and then realized like part of my unit test wasn't running because like a promise wasn't fulfilling, or some condition wasn't true that I was expecting to be true, is actually kind of a problem.

So I wish that all these test runners out of the box would just like fail a test if it doesn't have 100% test coverage.

Because I can't imagine a situation where a unit test wouldn't have 100% code coverage of itself.
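Coverage of the test file itself needs a coverage tool, so the sketch below swaps in a simpler stand-in for the same idea: count the assertions that actually ran and fail the test when the count is short. The `AssertionCounter` class and `runTest` function are hypothetical and do not belong to any real test runner.

```typescript
// Hypothetical assertion helper that remembers how many checks ran.
class AssertionCounter {
  private count = 0;

  equal<T>(actual: T, expected: T): void {
    this.count++;
    if (actual !== expected) {
      throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
    }
  }

  get total(): number {
    return this.count;
  }
}

// Run a test body and fail it if fewer assertions ran than promised.
function runTest(expectedAssertions: number, body: (t: AssertionCounter) => void): string {
  const t = new AssertionCounter();
  try {
    body(t);
  } catch (err) {
    return `FAIL: ${(err as Error).message}`;
  }
  if (t.total !== expectedAssertions) {
    return `FAIL: ran ${t.total} of ${expectedAssertions} assertions`;
  }
  return "PASS";
}

// The fixture is accidentally empty, so the loop body never executes.
// A naive runner would report this test as green.
const items: number[] = [];
const outcome = runTest(1, (t) => {
  for (const item of items) {
    t.equal(item > 0, true);
  }
});
console.log(outcome); // FAIL: ran 0 of 1 assertions
```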

But within the code base itself, I mean, again, I think it's one of those situations where you see a lot of green in the console, or you see that 100% code coverage, that indicates you wrote a lot of tests.

Does it indicate that you covered 100% of the state space of your application? Absolutely not.

(host) So when it comes to like runtime assertions, is that running on production servers? Is that being shipped out in the JavaScript that sits on the client? And how much of an impact does that have on performance and file size?

(pete hunt) The runtime assertions?

(host) Mm.

(pete hunt) So when I was kind of learning about all this stuff, one rule of thumb that they have at Facebook is that code in production behaves exactly the same as code in development.

And what they mean by that is that there's no different branching in production than there is in development.

So that means that you can reproduce, if you put the same set of inputs into your development environment as in production, you can repro the bug and fix it.

And that's like saved our asses a bunch.

With that said, if you have these debug, these runtime assertions, that means you have to run those assertions in production, which like might be slow.

So if you use React, for example, we're pretty performance sensitive on the React team.

And so our runtime assertions for user code, and things that are expensive, they're actually runtime warnings.

So they don't throw exceptions.

They just log to the console.

So that's one thing: in the cases where it is a performance problem, we just log that to the console.

(host) Because React uses a different build for production than it does for development, right?

(pete hunt) Right.

(host) Is that why? Is that really for performance reasons?

(pete hunt) Yeah.

Pretty much.

It's for runtime performance and JavaScript byte size.

So our debug assertions, they have these conditions, and then they have like these English language strings that we try to make pretty descriptive.

And so even for the ones that we're running in production, we strip out those friendly error messages.

And we actually save a nontrivial amount of bytes.

Even after GZip.
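A small sketch of the two kinds of check described here, under stated assumptions. `makeChecks`, `invariant`, `warning`, and `messageOf` are hypothetical helpers and React's internal utilities differ in detail. The sketch also assumes warnings are dropped from the production build, and it imitates message stripping with a runtime flag, whereas a real build would remove the strings at build time to save bytes.

```typescript
// Hypothetical factory: one set of checks per build flavour.
function makeChecks(isDev: boolean, log: (message: string) => void) {
  return {
    // Hard check: runs in both builds, so behavior stays the same,
    // but only the development build keeps the friendly message.
    invariant(condition: boolean, message: string): void {
      if (!condition) {
        throw new Error(isDev ? message : "Invariant violation");
      }
    },
    // Soft check for expensive cases: logs a warning and never throws.
    warning(condition: boolean, message: string): void {
      if (isDev && !condition) {
        log("Warning: " + message);
      }
    },
  };
}

// Run a function and return the message of whatever it throws.
function messageOf(fn: () => void): string {
  try {
    fn();
  } catch (err) {
    return (err as Error).message;
  }
  return "no error";
}

const logged: string[] = [];
const dev = makeChecks(true, (m) => logged.push(m));
const prod = makeChecks(false, (m) => logged.push(m));

dev.warning(false, "slow path used");
prod.warning(false, "slow path used");

console.log(messageOf(() => dev.invariant(false, "props.id must be a string")));
console.log(messageOf(() => prod.invariant(false, "props.id must be a string")));
console.log(logged);
```

Both builds throw on the same condition, which keeps the "no different branching" rule intact; only the wording of the error and the presence of warnings differ.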

(host) That's really cool.

Thank you very much.

Pete Hunt.

(pete hunt) Thanks.