Daniel Espeset - Making maps, the role of frontend infrastructure at Etsy

Fronteers 2014 | Amsterdam, October 9, 2014

The scope of Etsy's frontend is massive - our JavaScript codebase has grown by 50% in the last year to more than 3,000 files totaling almost 800,000 lines. Because of Etsy's dedication to continuous deployment, the code running in production changes 25 to 50 times every day. Because of our experimentation-driven development cycle, there may be multiple production versions of our features at any given time. These factors can lead to uncertainty and fear about rolling out upgrades, deleting old code, or confidently making changes. We'll see how the Frontend Infrastructure team works within this ecosystem to mitigate those risks, manage the asset build pipeline, build tools to understand our frontend, automate migrations, and delete as much code as possible.



--here to tell us how it is done properly.

Give an overenthusiastic welcome to Daniel Espeset.



Good morning.

This is lovely, isn't it? Thanks, Jake, for that lovely intro.

I once, early in my career, brought down a website that I was working on because we were live editing files on the server.

There was only one production server.

And I had turned on tap-to-click on my trackpad.

And while scrolling through the files, I apparently had accidentally grabbed something, and dragged, and dropped it, a whole folder of code.

And it took quite some time to track down exactly where it had gone.

So I'm incredibly sympathetic to these kinds of version control difficulties.

So I'm really excited to be here.

Thanks so much to Fronteers.

What an amazing conference this is.

This is cool.

I've definitely never spoken to this many people and never in a room as beautiful as this.

These little boxes on the sides are incredible.

I feel like I could live in one.

So my name's Daniel Espeset, as Jake said.

And it says up here in big letters.

And today I'm going to be talking about the role of frontend infrastructure at Etsy.

So the slides of this talk and some associated resources are available here at talks.desp.in/fronteers2014.

Some of the resources here are links to other talks that talk about some of the things I'm going to discuss in greater detail.

This page is totally just HTML with no styles.

In other words, it is fully accessible and totally performance optimized.

So I work on the frontend infrastructure team at Etsy in Brooklyn, New York with these fine folks.

They're all here in spirit.

Etsy is a marketplace.

People can sell and buy handmade goods.

This is a lovely pattern for a cross stitch of Amsterdam that you can buy.

It's a peer to peer marketplace.

And we emphasize the interaction between real people, real people selling, and making things, and real people buying them.

So in addition to the marketplace itself, we have rich social features, such as a messaging platform, and forums.

And you can follow people and favorite shops.

And we have activity feeds and recommendation systems, and on and on.

So we have one million active sellers on the site, more than one million active sellers, and more than 26 million active listings.

We're pretty large.

And these numbers provide some context around that.

But more important than this kind of scale is the scale and speed at which we develop the site.

To that end, I'm going to give a brief tour of how Etsy engineering is organized and how engineers at Etsy do their work.

The first important concept is that we don't believe in silos.

Anyone can touch anything.

Every engineer works on a virtual machine that is set up with a complete development copy of etsy.com.

All engineers have root access to every other engineer's VM, to most of our infrastructure, and the majority of our production boxes.

For things that you don't get automatic access to, it's usually just a matter of asking ops.

So generally speaking, any engineer at Etsy can change any part of our production code base or our infrastructure at any time.

So when things go wrong, we have blameless postmortems.

We trust that engineers would never do something that they knew would bring the site down.

So when outages occur, we see them as exciting new data points about our complex systems.

They lead directly to remediations that often involve better alerting, better tooling, automated tests, monitoring improvements, cultural exchange, like lunch and learns, and other social topics.

So attending postmortems is a serious high point of being an engineer at Etsy.

So everyone deploys.

Everyone deploys their own code and pushes it through the whole process into production.

They set up monitoring, watching the production system once things are running to ensure stability and performance.

Everyone writing code is pushing code.

Everyone pushing code is monitoring our production systems.

Designers are included here.

So our designers who work on the website work directly as part of engineering teams.

And they write and push code to implement their designs.

And they also monitor the site for breakage.

So the way that we do all this is through a process called continuous deployment.

This is also known as the button.

So any engineer at Etsy can deploy the site to production at any time by pressing a button.

The philosophy behind continuous deployment is that making many small changes regularly is much less risky than making massive changes all at once.

We achieve this with feature flags and monitoring.

And let's look at an example.

So say I want to make a change to how our search page looks.

First, I'd add a new feature flag to fork the code at run time.

And it's defined in this configuration file.

Here, enabled is set to 0, so it's on for no one.

Then I would check for it in the search results code, and default to the existing behavior if the feature's not enabled.

So since this flag is turned off, no visitors will be able to trigger this empty code path.
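
As a rough illustration of the flag-and-check pattern just described, here is a minimal sketch in Python. It is not Etsy's configuration format or API: the flag name `new_search_layout`, the `FLAGS` dictionary, the 0 to 99 visitor bucket, and both function names are assumptions made up for this example.

```python
# Hypothetical flag configuration. 'enabled' is the percentage of visitors
# who should get the new code path; 0 means no one.
FLAGS = {
    "new_search_layout": {"enabled": 0},
}


def is_enabled(flag_name: str, visitor_bucket: int) -> bool:
    """visitor_bucket is a number from 0 to 99 assigned to each visitor."""
    percent = FLAGS.get(flag_name, {}).get("enabled", 0)
    return visitor_bucket < percent


def render_search_results(visitor_bucket: int) -> str:
    # Fork the code at run time and default to the existing behavior
    # whenever the flag is not enabled for this visitor.
    if is_enabled("new_search_layout", visitor_bucket):
        return "new search layout"  # the new, still empty, code path
    return "existing search layout"
```

With `enabled` at 0, every bucket falls through to the existing behavior, which is why the flag can ship to production before any new code exists.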

We coordinate pushes via a special IRC channel with some bots.

The bots don't enforce anything programmatically.

They just update the channel topic to reflect the current state of the push.

So the whole thing is a social construction.

You .join a push train.

If you're the first person in the train, then you're the driver, and you're responsible for actually running the push.

When it's your turn to deploy, everyone in your train pushes their change to master.

And they report that they're in.

And then you load up this web interface that we use to do deploys.

It is called Deployinator.

It is one of several tools in our suite that is an inator.

But this is the only one I'm going to talk about here.

Although, there's also Schemanator for making database changes, and some other inators floating around.

Also, I lied before.

There are two buttons here.

So the first one is "Get Saved by the Princess."

It builds the production version of the code base.

It performs translations, compiles our JavaScript and CSS, and deploys to a production server that is only accessible internally, which we call princess.

During this process, we also run a full suite of continuous integration tests, the results of which are shown in the IRC channel, and also here in the Deployinator UI.

During a deploy, there's tons of logs flying past here, although in this screenshot, they are empty.

So once everyone pushing has given the OK, and the CI tests have passed, we press the second button, which actually puts the changes into production on Etsy.com.

When pushing, we use a tool called supergrep to watch the production logs in real time.

This basically tails every production log from every box in our system.

And when things go wrong, you will know immediately, because this will flood with errors.

We have tons of graphs.

We have so many graphs that we even have graphs about how many graphs we have.

And there's a dashboard of particularly important ones to watch when you deploy.

And these little vertical lines you can see over here indicate deploys.

And so it's easy to see when an unexpected change follows your push, or to track down new issues to particular events that have occurred.

In addition to the deploy lines, we have other lines that show things like infrastructure rollouts and other events that affect the system as a whole.

So this entire process usually takes less than 15 minutes.

And it happens 50 times a day.

So now that flag that we added is on Etsy.com.

It's not doing anything.

There's not even any code in that path yet.

But now we're ready to iterate on the prototype.

So we can make small changes.

And we'll push those into production as often as possible.

The smallest possible change that can be pushed should be pushed.

If I'm working on this feature with a team, my team and I will coordinate the work that we're doing by pushing to production.

Work rarely happens on branches.

Once we've got a bit of code in place, I'll configure the feature flag to be on for me and my teammates in production.

Once we have a fully featured change, we'll turn it on for all of the Etsy employees.

They act sort of as canaries in the coal mine.

And eventually, we can put a segment of real users onto the new code path.

We originally used these flags to do staged rollouts of infrastructure, slowly ramping up new infrastructure, first to 1%, then 15%, then 20%, and eventually 50%, and so on.

So this is now set to be shown to 50% of the users that visit Etsy.
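
The rollout stages just described (teammates first, then all employees, then a percentage of real users) could be modelled roughly as follows. This is a sketch under assumptions: the rule names `users`, `staff`, and `percent`, the example usernames, and the hashing scheme are hypothetical and are not taken from Etsy's flag system.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99, independently per flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, rules: dict, user_id: str, is_staff: bool = False) -> bool:
    if user_id in rules.get("users", ()):       # stage 1: me and my teammates
        return True
    if is_staff and rules.get("staff", False):  # stage 2: all employees
        return True
    # Stage 3: a percentage of real users.
    return bucket(flag_name, user_id) < rules.get("percent", 0)


# The same flag at three points in its life (hypothetical values).
team_only = {"users": {"daniel", "teammate"}}
all_staff = {"staff": True}
half_of_users = {"percent": 50}
```

Hashing the flag name together with the user ID keeps each user's assignment stable between requests while bucketing them independently for every experiment.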

So pretty soon, we started leveraging this to do A/B testing.

Now, we've taken this approach very much to heart.

And the way we make changes to the site and add features mirrors the way that we push code.

We prefer to do it in as small increments as possible so we can evaluate their effect.

Dan McKinley, who's a former principal engineer at Etsy, and one of the architects of this model, calls it data driven products, or continuous experimentation.

This chart shows the evolution of this technique at Etsy.

So whereas before you might say, wouldn't it be cool if we had infinite scroll in our search results? Infinite scroll is great.

You don't have to reload the page.

You get all these results.

And then you go, and you build infinite scroll.

And it's this complicated feature.

And then when you're all done building it, you run this A/B test.

And maybe it fails.

And then you throw the whole thing away.

And everybody feels bad.

So now, instead of saying, wouldn't it be cool if, we say, would engagement increase if we had additional results on the search page.

When you frame the question like that, it becomes easy to test.

You just add an extra row of results to the search page.

You don't need continuously updating fetched assets from the server.

You can just add a row and run an experiment.

So adding experiments is easy.

And we try to make that as frictionless as possible.

We have lots of tooling around calculating statistical significance, and doing all kinds of different analytics.

And there's an analysis team.

And we do quite a lot of this.

It's now common to run dozens of experiments during the development of a new feature.

Something you learn when you start building products this way also is that most experiments fail.

You throw away the majority of the things that you think will work.

It's hard to learn how to not become attached to good ideas when it turns out they're bad.


So to recap, engineers are trusted and have massive access.

The code base is constantly changing.

The code base is filled with experiments in various states.

Let me be clear.

All of this works extremely well for us.

Experiment driven, continuously deployed code is amazing.

This experimental product workflow, it has been a revelation.

It also has some interesting downstream effects for our frontend code base.

So these graphs show the growth of our JavaScript and CSS over the last 12 months.

Our JavaScript has grown from about 2,000 files to more than 3,100.

And that's more than 800,000 lines of code.

And our CSS code has grown from 1,400 files to more than 2,000 at about 360,000 lines of code.

So this may be par for the course for young startups.

But Etsy was founded in 2005.

So we're nearly a decade old.

I don't know if this kind of growth is typical of companies our size and scale and age.

But it surprised me.

It continues to surprise me how quickly we add code.

Etsy is made up of 300 different pages.

I say pages here because things, as you know, can become a bit ambiguous.

But basically, there are 300 different end points that receive requests and load their own asset bundles.

So by and large, they're very traditional.

They're rendered on the server.

Interactivity is provided with JavaScript using jQuery.

And there's a large scale single page app that is powered by Backbone that is rolling out soon.

But that is very much the exception and not the rule.

As we've seen, at any given time we may be running dozens of experiments per page.

If we assume that these experiments are all binary in nature, which they're not, and they're either on or off, then with a dozen experiments that's 2 to the power of 12, or 4,096 variations of any given page.

So that's 4,096 different possible combinations of experiments because of the way that we bucket users randomly for each experiment.

If we multiply that number by the number of pages, we have 1.2 million possible combinations that can result in HTML being delivered to any single user.
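
The back-of-the-envelope estimate above works out as follows. The variable names are only for illustration; the figures (12 binary experiments per page, 300 pages) are the ones given in the talk.

```python
experiments_per_page = 12  # a dozen, each assumed to be simply on or off
pages = 300                # distinct endpoints with their own asset bundles

variants_per_page = 2 ** experiments_per_page
total_variants = variants_per_page * pages

print(variants_per_page)  # 4096
print(total_variants)     # 1228800, roughly 1.2 million
```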

This number is a totally hand waving estimate that I completely made up, but is, if anything, conservative.

This means that reasoning about the frontend can be really hard.

Our code base is constantly changing.

Our code is always branching due to experiments.

So static analysis is extremely difficult.

Tests that use things like headless browsers are almost impossible for us to automate in a time frame that makes them useful.

Deleting code is scary.

A few years ago, an engineer deleted an unused IE specific style sheet.

When he pushed out the commit that deleted the file and the reference to the file from the pages, the push went out to the web servers.

And the file deletion happened before the code that referenced it was updated.

And in that instant, requests that were being processed tried to load a page that references a now missing CSS file.

So they errored.

And that would be OK, except that the error page also included a reference to this file that had not yet been removed.

So it also errored.

And this went on to the point that it brought the site down for one hour.

And all of the web servers had to be power cycled in the data center.

So deleting a CSS file and bringing down the site is about as scary as it gets.

So testing changes is difficult.

A year ago, just tracking down all of the places where an individual CSS module might be bundled into other CSS files and ultimately shown on a page could be really complex.

It involved many steps of grepping different parts of the code base, and was faulty because we don't know what code paths are on or off, what experiments may have been thrown away but not yet properly cleaned up, and so on.

So enter frontend infrastructure.

The frontend infrastructure team is about six months old.

Before we existed, we were part of the performance team at Etsy.

Performance is a natural precursor to frontend infrastructure, because things that we take for granted as being part of our frontend infrastructure now, like bundling, compiling, and optimizing JavaScript and CSS code, linting it, and so on, many of these things are at their heart performance optimizations.

So the performance team has maintained the Etsy asset build pipeline for a few years.

And it's out of the work on this build pipeline that the frontend infrastructure team came to be formed.

Because the build system takes the entire frontend code base as input and outputs the production build, the people who are responsible for maintaining this are uniquely interested in the entire space of our frontend code base.

This is the only place in our system where anything depends on being able to process and operate on the whole scope of what we're doing in our JavaScript and CSS.

In some sense, the frontend infrastructure team just lives right here, right at this point.

It's the nexus of the whole frontend.

It turns out that this is a really interesting place to live.

And so what kind of things can we do here? So I'm going to explore this space by talking about a few projects that we've undertaken over the last year.

First is our in-house asset pipeline.

It's called Builda, which stands for build assets.

It has two operational modes.

The first is as a service on developer VMs, where it builds assets on the fly for development.

The second is in Deployinator, where it's used to do the full production build as part of the deployment process.

Years ago, Builda started life as a Python script.

This predated any standardized module system.

We had lots of home-rolled async loaders, and attempts at module systems, and a sort of half-baked PHP Sprockets implementation.

And there was a lot of things that happened in that time.

The Python one was eventually replaced with a PHP version.

And about a year ago, we rolled out this new Node.js-powered version of Builda.

So this version of Builda is essentially a wrapper around the RequireJS AMD compiler, which parallelizes the work being done across CPU cores.

It also layers in support for these legacy loaders, and generates file version numbers for cache busting, and so on.

Because of how often we deploy, this whole thing has to be really, really fast.

Because right now, we're at about two minutes, making it one of the slowest parts of the deployment process.

I get harassed about it constantly at work.

And it's second only to actually syncing all the files to the web servers.

So since our code base is growing so fast, it just gets slower and slower.

And we are still working to optimize this.

For development, our goal was to serve a full production build of the code opaquely on developer VMs.

So we wanted to run a persistent service that developers would not have to even know existed, that would allow them to save JS, pull the repo, and just always be able to have access to a production level build of their assets at any time.

So how do we do this? Here's our options.

We could build on the fly in response to HTTP requests for asset files, right? So we used to do it this way before we introduced AMD.

But now, individual files take too long to compile for this approach.

So we could watch the file system for changes, and do a complete rebuild when source files are saved.

There's like Grunt and Gulp plug-ins, and lots of module system compilers that allow you to do this.

However, because a full build takes several minutes to complete, this is not viable either.

So we could do the same thing, watch for changes from the file system, but then only rebuild the targets that include the changed code.

And this ended up being the only option that fulfilled our requirements.

But it is complicated.

It is complicated because it means Builda needed to get much smarter.

Suddenly, it had to understand how all of our source code relates to one another.

Here's how it works.

First, a source file is changed; it can be saved, or maybe fresh changes were just pulled in from git.

Builda is notified.

Next, it looks up the dependency information in this dependency graph it stores in memory.

It updates this data every time it builds a file.

Once it knows what output files depend on the one that is changed, it updates them.
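
The lookup step just described can be sketched like this. It is a minimal illustration, not Builda's code (which runs on Node.js): the file names, the `PARENTS` mapping, and the `TARGETS` set are hypothetical.

```python
from collections import deque

# Hypothetical in-memory graph: each source file maps to the files that
# include it directly (its parents).
PARENTS = {
    "util/dom.js": ["widgets/cart.js", "widgets/search.js"],
    "widgets/cart.js": ["bundles/checkout.js"],
    "widgets/search.js": ["bundles/search.js", "bundles/home.js"],
}
# Files that are built and served as output bundles.
TARGETS = {"bundles/checkout.js", "bundles/search.js", "bundles/home.js"}


def targets_to_rebuild(changed_file: str) -> set:
    """Walk up the parent links and collect every output that includes the file."""
    seen = {changed_file}
    queue = deque([changed_file])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen & TARGETS
```

Changing a widely shared utility rebuilds every bundle above it, while changing a leaf widget rebuilds only the one or two bundles that actually include it.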

This solution gives us something really valuable, which is the dependency graph.

Because we needed people to be able to refresh the pages on their VMs and get updated code quickly, we built this complex solution.

And what we didn't really fully appreciate was that this really valuable thing was going to fall out of it.

So we produce these every time we deploy, and every time an asset is saved on any VM.

The dependency graph is a simple structure.

Every file in the source code has a top level entry with some metadata, a list of child modules, a list of parent modules.

And these give us a complete representation of all of the JS assets in our systems and all of their interrelationships.
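
A structure of that shape might look like the sketch below, which builds one top-level entry per file and derives each parent list from the child lists. The field names `meta`, `children`, and `parents`, and the placeholder metadata, are assumptions for illustration; the talk does not show the real format.

```python
def build_graph(children_by_file: dict) -> dict:
    """Create one top-level entry per file, then fill in the parent lists."""
    graph = {}
    for path, children in children_by_file.items():
        graph[path] = {
            "meta": {"type": "js"},  # placeholder metadata
            "children": list(children),
            "parents": [],
        }
    for path, children in children_by_file.items():
        for child in children:
            # A child may not have been listed as a key of its own.
            entry = graph.setdefault(
                child, {"meta": {"type": "js"}, "children": [], "parents": []}
            )
            entry["parents"].append(path)
    return graph


graph = build_graph({
    "bundles/search.js": ["widgets/search.js"],
    "widgets/search.js": ["util/dom.js"],
})
```

Storing both directions means a lookup of "what does this file include" and "what includes this file" are both a single dictionary access.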

Pretty soon, we were using this to look up which files might depend on others, to track down bugs, enable better testing, and understand the space of complex bundles.

This proved so useful, we built it as a standalone tool, Ranger.

Ranger takes this type of graph one step further and allows us to explore all of the interconnections between our frontend source code and the pages that include them.

We achieve this by combining three of these dependency graphs, the one that we just looked at from JavaScript, one that we now generate explicitly for our CSS code base, and then one that we generate by augmenting our render tier with distributed tracing that effectively logs all of the unique assets seen for a given page across all of the loads that happen on the server.

So here's the Ranger homepage.

You're going to recognize these charts from earlier.

This is where they live in our system.

Down below this are some aggregated statistics.

If we look up the Etsy homepage, we get these results.

On the top are all of the CSS files that we've seen this page use.

And below is the same for JavaScript assets.

Each of these lists is totally comprehensive.

So the files that are actually included on the page are expanded using those dependency graphs to include all of their children in this view.

If we click one of those JavaScript files, we get this view.

And there's a search ability there at the top that I'll come back to.

Below that is some git information about the last commit, a message, a link to the commit on our internal GitHub, some JSHint warnings, the results of some code complexity algorithms that we run.

If we scroll the page down a bit, we have this back reference to all of the pages that include this JavaScript file, and which parent of this JavaScript file is actually being loaded by the page.

Below that is the list of files that this is a dependency of, and below that all the files that it itself includes.

Here are the results for CSS.

They're largely the same.

We have this summary at the top with the total number of selectors that we're defining in this file, the total number of imported selectors, the total number of files that we're importing, and so on.

If a CSS file includes a sprite image, we link to a special result page for that sprite.

We're trying to remove all use of all sprites across Etsy, and replace them with an icon font and SVG.

Creating these as a top level entity in Ranger has allowed us to really quickly establish what the state of our sprite removal process is right now, and where they're still being used.

Here's an example of the localized search functionality.

So here, we are searching for click in the gift card file.

And it gives us results that match in this file and all of its children.

So we've also integrated Ranger into Chrome dev tools.

We have a developer extension that we use for a handful of different things.

And we added this.

It gives you a menu here on the left of all of the resources that are loaded on this page.

And you click them, and get the full range of results right in the dev tools.

In addition to these UIs, Ranger has a full API for accessing the data.

And there's a command line interface installed in all the VMs that people can use to get this data in JSON format.

And that has been leveraged to build VIM plug-ins, and Sublime Text plug-ins, and so on.

So we got something else really valuable when we built Ranger, which is the set of files that aren't being used.

Before we had compiled all of this information, we had no way to detect which files simply weren't being loaded on the site at all.

So because we now have tracked every interconnection between pages and the files in our source code, we're able to surface several hundred unused files.
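
Conceptually, finding unused files is a set difference: everything in the repository minus everything reachable from an asset that some page was seen loading. A minimal sketch, with made-up file names and data standing in for Ranger's real graphs:

```python
def reachable(roots, children):
    """Expand a set of files to include everything they pull in, transitively."""
    seen = set()
    stack = list(roots)
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(children.get(path, []))
    return seen


# Hypothetical data: every file in the repository, the child modules of each
# file, and the assets that tracing saw each page load directly.
all_files = {"home.js", "carousel.js", "util.js", "old_promo.js", "old_helper.js"}
children = {
    "home.js": ["carousel.js"],
    "carousel.js": ["util.js"],
    "old_promo.js": ["old_helper.js"],
}
seen_on_pages = {"/": ["home.js"]}

used = reachable((f for assets in seen_on_pages.values() for f in assets), children)
unused = all_files - used
print(sorted(unused))  # ['old_helper.js', 'old_promo.js']
```

Note that `old_helper.js` is reported even though another file includes it, because nothing that a page loads ever reaches it.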

And removing these, of course, it reduces mental overhead for our developers, it speeds up deployments, it keeps our code base nice and tidy.

So this is great.

So once we got here, we had another question, which was, can we determine when just some of the file isn't being used? So to that end, we built a new tool called Shrinkray.

This is still very much in the early stages.

But we are optimistic that it will be extremely useful.

So Shrinkray is a script that we ship to a small subset of users that analyzes CSS usage in the browser.

Here's how it works.

First, it picks a style sheet on the page.

Then it randomly picks 50 selectors via the CSS object model.

Now, due to cross origin request restrictions, this is only possible if you serve your CSS from the same origin as the page itself.

Then we search for the DOM elements that match the selectors we've randomly chosen using document.querySelector.

We send the results of this test to the server.

And then we aggregate the data using a MapReduce job in our big data stack.
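
The sampling and aggregation steps above can be simulated in a few lines. In reality the sampling runs as JavaScript in the browser and the aggregation as a MapReduce job; this Python sketch only mirrors the logic. The report format and the `min_tests` threshold are assumptions, not details from the talk.

```python
import random
from collections import defaultdict

SAMPLE_SIZE = 50  # selectors tested per page view, as in the talk


def pick_selectors(selectors, rng=random):
    """Randomly choose up to SAMPLE_SIZE selectors from one style sheet."""
    pool = list(selectors)
    return rng.sample(pool, min(SAMPLE_SIZE, len(pool)))


def aggregate(reports, min_tests=2):
    """Combine many browser reports into a list of apparently unused selectors.

    Each report maps a selector to True (it matched an element) or False.
    """
    tested = defaultdict(int)
    matched = defaultdict(int)
    for report in reports:
        for selector, hit in report.items():
            tested[selector] += 1
            matched[selector] += int(hit)
    return sorted(s for s in tested if tested[s] >= min_tests and matched[s] == 0)


reports = [
    {".btn": True, ".old-banner": False},
    {".btn": False, ".old-banner": False, ".rare": False},
]
print(aggregate(reports))  # ['.old-banner']
```

Requiring a minimum number of tests before reporting a selector guards against flagging something that was simply sampled once on a page where it happened not to apply.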

Here's what those results look like surfaced in Ranger.

So for CSS files with apparently unused selectors, we show this summary at the top.

It includes each selector that does not appear to be used, the total number of bytes that this selector plus its rule makes up-- and they're prioritized by larger sizes-- and then the line number that it appears on.

So as I said before, this is still early days.

But I'm really excited to see how far we can get with this approach.

One thing that we've started doing in addition to this, before we get here as part of the analysis, is to parse all of our JavaScript code into ASTs and search them for strings that match apparently unused selectors, so that we can rule out dynamically applied classes.
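
The idea of ruling out dynamically applied classes can be illustrated with a much cruder stand-in. The talk describes parsing JavaScript into ASTs; the sketch below substitutes a regular expression over string literals, which is less accurate (it ignores comments, template literals, and concatenated strings). All names and the sample JavaScript line are hypothetical.

```python
import re

# Matches a single- or double-quoted string literal, allowing escaped quotes.
STRING_LITERAL = re.compile(r"""(['"])((?:\\.|(?!\1).)*)\1""")
# Matches a class name inside a selector, for example '.is-open'.
CLASS_NAME = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")


def classes_in_selector(selector):
    """Extract class names from a selector such as '.cart.is-open'."""
    return set(CLASS_NAME.findall(selector))


def words_in_js_strings(js_source):
    """Collect the whitespace-separated words inside every string literal."""
    words = set()
    for _quote, body in STRING_LITERAL.findall(js_source):
        words.update(body.split())
    return words


def maybe_applied_dynamically(selector, js_source):
    """True if any class in the selector also appears in a JavaScript string."""
    return bool(classes_in_selector(selector) & words_in_js_strings(js_source))


js = "$('#cart').addClass('is-open highlighted');"
print(maybe_applied_dynamically(".cart.is-open", js))  # True
print(maybe_applied_dynamically(".old-banner", js))    # False
```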

Finally, I'm going to discuss our most recent endeavor, which is introducing Sass, and migrating our code base to use it.

So first off, let's consider where we're at with our existing CSS code base.

We have an existing legacy in-house preprocessor as part of Builda.

It just inlines imports and adds version strings to things like image assets.

We've been linting for syntax errors.

But this was introduced fairly late in the game, before I arrived, but late enough.

So it only complains if you try to introduce new errors.

There are thousands of existing problems that were never remediated.

So we have this kind of sprawling mess of legacy CSS.

And our linter exists, but it's super tolerant of all of these existing failures.

So to give you some idea of the scope of this sprawl, this is every color defined in our CSS code base.

It is sized here based on how many times it is defined in our CSS code base.

So at the top, these rainbow rows are all colors that only appear once, exactly once.

And down here at the bottom, these gray and black bars extend several thousand pixels off the screen that way.
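
A count like the one behind that chart could be produced with something as simple as the following sketch. It only looks at hex colors (named colors and rgb() values are skipped, and an ID selector that happens to look like a hex color would be miscounted), and the sample CSS is invented.

```python
import re
from collections import Counter

HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def count_colors(css_text):
    """Count how often each hex color appears, ignoring letter case."""
    return Counter(match.lower() for match in HEX_COLOR.findall(css_text))


css = """
.title { color: #333; }
.price { color: #333; background: #FFFFFF; }
.sale  { color: #d5641c; }
"""
counts = count_colors(css)
print(counts.most_common(1))                           # [('#333', 2)]
print(sorted(c for c, n in counts.items() if n == 1))  # ['#d5641c', '#ffffff']
```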

So it's clear that we need something like SCSS.

So why haven't we made the switch before now? There are a couple reasons.

And one is that there wasn't a team to take this on and really do it justice.

The second is that SCSS can be pretty frightening.

It gives you a lot of rope to shoot yourself in the foot with.

So we decided to take the following steps.

First, we wanted to restrict the SCSS functionality, no extends whatsoever.

We will never allow extends in our code base.

The semantics around extends are complicated.

It can lead to lots of surprising effects when you output the code.

And it causes problems when you try to use it in conjunction with media queries.

No mixins to start with.

We want to tightly control the way that we roll out mixins.

And we want to start with a core library.

And that has not yet been built.

So for now, we're not going to allow it.

We have hard limits on nesting levels.

We decided to do all of this through lints.

We have lints to enforce these.
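
To make the restrictions above concrete, here is a toy lint that flags `@extend`, mixins, and excessive nesting. It is a sketch, not the linter Etsy uses: the `MAX_NESTING` value is an assumption (the talk does not give the real limit), and the brace counting ignores braces inside strings and comments.

```python
import re

MAX_NESTING = 3  # hypothetical limit; the talk does not state the real number
BANNED = re.compile(r"@(extend|mixin|include)\b")


def lint_scss(source):
    """Return (line_number, message) pairs for banned features and deep nesting."""
    problems = []
    depth = 0
    for number, line in enumerate(source.splitlines(), start=1):
        banned = BANNED.search(line)
        if banned:
            problems.append((number, f"@{banned.group(1)} is not allowed"))
        for char in line:
            if char == "{":
                depth += 1
                if depth > MAX_NESTING:
                    problems.append((number, f"nesting deeper than {MAX_NESTING}"))
            elif char == "}":
                depth -= 1
    return problems


scss = """\
.listing {
  @extend .card;
  .title { a { span { color: red; } } }
}
"""
for problem in lint_scss(scss):
    print(problem)
```

Enforcing the rules in a lint rather than by convention means a violation is caught before it is pushed, which matters when anyone can change any file.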

And we also wanted to have strict code style guidelines.

Frankly, I don't care what code style we use as long as it is consistent.

So we basically took what the linter had defined, and picked and chose some styles.

So we wrote a converter script to turn our CSS into normalized SCSS.

We had to converge on the SCSS import syntax.

We had a bunch of different ways of defining imports.

And additionally, we had to fix 171,000 existing SCSS lint errors.

This converter script does that.

And then it also has to resolve hundreds of invalid CSS rules and selectors.

This was easily the most time consuming part of building this conversion script.

So consider this.

Browsers are incredibly resilient to bad CSS code, and attempt to interpret it anyway.

If they can't make sense of it, it will be ignored.

However, the Sass compiler is brittle.

It's a compiler.

And it will error out for syntax errors like this.

So thus, this dilemma.

Our code base is full of things like this.

If we remove it, we might break something.

If we fix it, we might be introducing a style that's never actually been applied, and break something that way.

So does anybody know what Chrome does with this RGB? Jake? Does this get applied? Can anybody? Yeah.

We didn't know either.

And we came across a bunch of these and had to figure it out.

So the answer is yes.

This is red.

This is totally fine.

This one is nothing.

It's blank.

That will not be applied.

So ultimately, our converter has a ton of code to handle all of these edge cases.

And given all this, we really wanted to figure out a way to gain maximum confidence in making this change.

One of the hardest things about working with CSS as infrastructure is that it fails silently.

Its failures can be incredibly destructive and almost impossible to detect.

And this is where people will often say, well, why don't you use a perceptual differer, and try to do image regression tests? And the answer is, because of all the branching in our code base, we can't possibly do that.

So how do we gain confidence in making a change like this? So to this end, we built a diffing tool.

And here's how it works.

We take our existing CSS code base.

And we compile it using our existing tooling.

So we get the build that we have started with the whole time.

Then we also run it through the converter script, and then through our new SCSS build pipeline.

And we get this new CSS build.

Then we parse both of these to CSS ASTs.

And we diff the ASTs to try to surface any functional variation between our previous output and our new output.

We surfaced a handful of things.

And by iterating on this and driving the diff down to zero, we were able to gain a tremendous amount of confidence that we're not going to break something when we roll this out.
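
The comparison described above can be illustrated with a deliberately naive stand-in. The real tool parses full CSS builds into ASTs; this sketch uses a regular expression that only understands flat rules (no nested at-rules such as media queries), and every name in it is hypothetical.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(css_text):
    """Parse flat CSS into a list of (selector, declarations) pairs.

    Whitespace and letter case in property names are normalized so that only
    functional differences remain. Nested at-rules are not handled.
    """
    rules = []
    for selector, body in RULE.findall(css_text):
        declarations = []
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue
            prop, value = declaration.split(":", 1)
            declarations.append((prop.strip().lower(), " ".join(value.split())))
        rules.append((" ".join(selector.split()), tuple(declarations)))
    return rules


def diff_builds(old_css, new_css):
    """Return the rules that differ between the two builds, in order."""
    old_rules, new_rules = parse_css(old_css), parse_css(new_css)
    differences = [(o, n) for o, n in zip(old_rules, new_rules) if o != n]
    if len(old_rules) != len(new_rules):
        differences.append(("rule count", len(old_rules), len(new_rules)))
    return differences


old = ".a{color:red;margin:0}"
new = ".a {\n  color: red;\n  margin: 0;\n}\n"
print(diff_builds(old, new))  # []
print(diff_builds(old, ".a { color: blue; }"))
```

Comparing parsed structures rather than raw text is what lets formatting changes from the converter pass while a changed value or a dropped declaration still shows up.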

So how are we going to roll it out? So first, we added the converter and the new pipeline to our deploys.

We're shipping both the old build and the new one to all of our web servers right now.

But we're only actually serving the old one to our users.

Literally, later today the team is going to turn this code base on for admin, simultaneously enabling it for all of the development VMs.

So special thanks to Fronteers for inviting me to talk here and not be in New York handling this switch.

I just get to talk about it on stage.

It's much better.

I'd make that trade every time.

All right.

So if things look good, and everything ends up running smoothly later today, we'll do a staged rollout until we hit 50% share for both versions of the code.

Now remember, at this point, all of our engineers are still just writing CSS.

There is no SCSS in our code base at this time.

Eventually, we're going to turn this on for everyone.

And then we'll run the converter script, and commit the results.

We're going to throw out the old pipeline, and we'll have a clean lintable SCSS source.

And people will be able to start writing Sass code at Etsy, within reason, of course.

So that is what we've been up to this year.

So what have we learned from all these projects? So the first one is, disposability is way more important to us than modularity.

Optimizing for the ability to dispose of code is really hard.

Bill Scott at Paypal has done great writing about this.

He calls it throwaway ability.

We throw away a lot more code than we reuse.

We're throwing away code all the time.

We're writing code that isn't and shouldn't be intended for reuse to prove or disprove theories about the way that our site works.

So everything that I've talked about is specific to our architecture.

But the frontend is getting more complex everywhere, not just at Etsy.

And we need new strategies to manage it and understand this complexity.

Even if you don't have a monolithic application like us, at a certain scale, any system requires tooling to understand.

I guarantee you that there are things you're working on now that you take for granted as being knowable.

But that will become unknowable.

So we can only gain full insight into what's happening on the frontend from the frontend.

Simulations and static analysis can only get us so far.

The browser is part of a distributed system.

It is not just a client that we publish content to.

So to this end, we have a lot to learn from operations.

If you work in an organization that has an operations team, try to get on their mailing lists and read everything they send out, and then steal every good idea that you can.

When you build a new tool to gain an understanding of a problem domain, I guarantee you will uncover new questions to ask.

Being able to extend the tool that you use to raise those questions, to answer them, is a huge optimization.

Building tools that are flexible is a massive win.


So that's what we've been up to this year.

What's next? I don't have any fancy slides for this bit.

I'm going to muse a little bit.

One of the things that we're planning to do is try Shrinkray for JavaScript: serve a gently instrumented build of our JavaScript code to some subset of users, to try to get an understanding of which code paths they're actually executing, and beacon that data to our servers.

Our error handling situation could use work.

Something that I would love to see in the future from browsers is for window.onerror to respect the cross-origin request headers, so that we can get access to proper line numbers and column numbers from JavaScript that's served from a non-origin domain.

We're going to be open sourcing as much of this as possible.

A lot of the tools that I talked about at the beginning are already open sourced.

And we are planning to open source as much code as is applicable outside of our specific architecture related to the work that we've been doing on frontend infrastructure.

We're currently evaluating the way that we do rendering in the PHP side.

And that will have downstream effects in the way that we throw away code, and the way that we analyze it.

And we want to make more and better style guides.

I'm not exactly sure where that's going to go yet.

But we have yet to find a method of having a style guide that's componentable, but also allows us to do this type of experimentation that we've been talking about.

And that is a big problem that I think we're going to put a lot of resources into this year.

That's it.

That's all I got.

Thanks very much.

[APPLAUSE] Come and join me in the conversational lounge.

Oh, yes.



My first question, you were talking about Sass and how it gives you a lot of rope to shoot yourself in the foot with.

I'm excited about this new weapon.

Can you tell me more? [LAUGHS] Yeah.

So one of the reasons that we've been resistant to rolling out Sass up to now is because of the emphasis we place on the ability for anyone working on our code base to be able to operate in any of the domains.

So in general, a designer at Etsy is writing CSS, and HTML, and maybe some JavaScript, and they're pushing that code.

Meanwhile, engineers are also writing CSS, and JavaScript, and pushing that code.

And the people with these different domains of knowledge overlap in the parts of the code base that they touch.

So our concern with Sass-- and this was raised by designers who had had bad experiences with Sass code bases in the past-- was that our engineers would get in there, and engineer the crap out of the CSS.

And it would become too clever and very difficult to operate on as a designer who's not used to complex mixins, calling extends, calling functions that are loops, and generating things.

So another concern that we have is the ability, with only a few small changes to your Sass source code, to generate a hell of a lot more CSS.

And one of the things that we have implemented to try to keep an eye on that is a lint that looks at the total number of lines that are being added in a given patch, and then the total number of additional lines that the output will produce.

And if it's over some threshold, then we'll get an alert.
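A lint of that shape can be sketched as follows. This is an illustration only, not Etsy's lint: the function names and the max_ratio threshold are assumptions, and it compares source lines added in a unified diff against the growth in compiled CSS lines.

```python
def count_added_lines(unified_diff):
    """Count lines added by a unified diff, ignoring the '+++' file header."""
    return sum(
        1
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )

def check_output_growth(source_diff, old_output, new_output, max_ratio=10.0):
    """Flag patches whose compiled CSS grows far faster than the source.

    max_ratio is a hypothetical threshold: output lines added per source
    line added before an alert is raised.
    """
    source_added = count_added_lines(source_diff)
    output_added = max(
        0, len(new_output.splitlines()) - len(old_output.splitlines())
    )
    ratio = output_added / max(source_added, 1)
    return {
        "source_added": source_added,
        "output_added": output_added,
        "alert": ratio > max_ratio,
    }
```

A two-line loop in the source that expands into dozens of generated rules is exactly the kind of patch this would flag.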

I suppose that was part of your reasoning for saying no to extends? That's one of the big things that can blow CSS out.


So of the tools you mentioned, like Builda, Ranger, Shrinkray, which ones are open source now? Which ones are you planning to open source soon? Yeah.

So of those three, none are open source.

The ones that are open source that I discussed here are Deployinator and Supergrep, and a lot of the tools that we use to do those things.

They're a little bit more mature.

We are committed to open sourcing as much of those other pieces as possible.

Part of the reason that we haven't open sourced Ranger yet is because we've been iterating on it so quickly for our own use cases, that we haven't felt comfortable releasing it as a stable version yet.

And also, like I said, it's very tightly integrated.

The plan for Ranger is to try to move away from these in-house dependency graphs that we're generating, and instead build a system that you could just plug source maps into, and then include a script on your pages that will actually show the connection to where things appear on the frontend, and so on.

So that's going to happen in the next couple of months.


A Google trick I will give you is just release it.

Put the word "beta" on it.

[LAUGHS] Any problems, just say "beta."

Doesn't matter.

Suck it up.


So when you're releasing these experiments, you've got multiple people working on different experiments.

How would you coordinate the roll out of those so they don't clash with each other, because these are new behaviors that haven't maybe interacted with each other yet? Yeah.


Good question.

So one of the things that I have really learned working at Etsy is that all of the hard problems are social.

The easiest part of our jobs is writing text files on computers.

And the hardest part is coordinating and communicating.

I don't know of any examples of experiments that have collided in such a way that they went terribly wrong, although there are lots of examples of experiments gone terribly wrong in their own right.

But I think mostly the way we handle that is that anyone who's working on an area of the site that is going to be running multiple experiments is likely either on one team that's managing all of those experiments, or on teams that are already closely interrelated.

I've never heard of two groups at Etsy working on a similar part of the site who aren't meeting basically every day about it.

So you mentioned you don't do a whole lot of branching of the code base.

Because I know, working with GitHub, a really good way to do code reviews is pull requests from one branch to another.

How do you work around that if you're not using so much branching?


So we do exactly that.

So we branch the code base constantly.

We have, actually, a command line tool called Review.

So say you have some changes, and you're in master.

You just type Review, and you add the LDAP of the person you want to assign it to.

And it generates a branch, and pushes it, and assigns the person, and handles the whole thing.

And people get emails, and so on.

So we end up having several thousand branches.

This leads to other problems.

This automation means people don't clean up their branches.

And we had thousands of branches on GitHub.

And it would cause weird GitHub errors like, once recently, I pushed some code.

And when I pushed it, I got a notification from GitHub that said I'd closed a pull request that was from several years ago that I'd never seen before and had nothing to do with.

And I totally freaked out, and went through and looked at all the diffs.

And it turns out that there was a bug in GitHub where if you closed a bug, and it was some thousands of commits ago, 50,000 hashes previous, if there was an open request, it would close all of them.

So we've uncovered some tricks at GitHub with our crazy branching.

But we don't merge branches outside of pull requests for the purposes of pushing.

So I find that if our CSS gets a bit too big, or JavaScript gets too big, one of the excuses we make is, well, it's cacheable, right? We've got far-future caching on this stuff, so it'll be OK.

But in a frequent deployment environment, caching almost becomes useless, right? What do you do about that? Yeah.

So the way that we manage that is that we assign all of the files version numbers at build time.

And the version number is the max last modified date of the file and all of its dependencies.

I would love to be able to use MD5 hashes instead of modified dates.

But we actually inline version numbers into the builds, in case an async request were to occur. We use RequireJS on the frontend, and we fully expect that no async requests should happen.

We're bundling everything.

But if one were to occur, it needs to have a version number in order to get the right file.

So we have to do that as a safety thing.

And it makes MD5ing tricky because you have versions all the way down.

So we set 10 maximum cache headers on all of our assets.

And those are only bumped if a change is pushed that actually affects the build.
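The versioning rule described in this answer, the newest modification time across a file and everything it depends on, can be sketched like this. It is a minimal illustration, not Etsy's build code: asset_version, the deps mapping and the mtimes mapping are hypothetical names, and timestamps are plain integers.

```python
def asset_version(path, deps, mtimes):
    """Version of a file: the newest mtime among it and all its dependencies.

    deps maps a file to the files it depends on directly; mtimes maps each
    file to its last-modified timestamp. Both are assumed inputs.
    """
    seen, stack, newest = set(), [path], 0
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared or cyclic dependency already counted
        seen.add(current)
        newest = max(newest, mtimes[current])
        stack.extend(deps.get(current, ()))
    return newest
```

With this rule, touching a shared utility bumps the version of every bundle that pulls it in, while bundles that don't depend on it keep their version and stay cached.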

So when you're tracking which pages use which assets, how do you cater for assets which are loaded dynamically? Do you have JavaScript that's loaded on interaction, or CSS that's loaded on interaction? How do you catch those things? Yeah.

So code that loads things interactively still requires a version number.

So we have always had this code that operates on the HTML that's being sent to the browser.

And it looks for anywhere where some kind of load may be happening, and adds a version number.

And it's actually in that version numbering code that we do the instrumentation to determine what's been seen.

So if something doesn't get a version number, it can't be loaded anyway.

So we know that that's the choke point.
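The choke point idea can be sketched as a single pass over the outgoing HTML that both stamps versions onto asset URLs and records which page referenced which asset. This is an illustration under simplifying assumptions, not Etsy's implementation: it uses a regular expression over double-quoted src and href attributes, and version_assets, the "?v=" query parameter and the seen mapping are hypothetical.

```python
import re

# Matches src="..." or href="..." pointing at an unversioned .js or .css file.
ASSET_ATTR = re.compile(r'(?P<attr>src|href)="(?P<url>[^"?]+\.(?:js|css))"')

def version_assets(html, versions, page, seen):
    """Append a version to each known asset URL and record where it was seen.

    versions maps asset URL to version number; seen maps asset URL to the
    set of pages that referenced it, and is updated in place.
    """
    def add_version(match):
        url = match.group("url")
        if url not in versions:
            return match.group(0)  # unknown asset: leave untouched
        seen.setdefault(url, set()).add(page)
        return f'{match.group("attr")}="{url}?v={versions[url]}"'

    return ASSET_ATTR.sub(add_version, html)
```

Because every loadable asset has to pass through this one function to get its version, the same function is a natural place to collect the page-to-asset data.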

So when you're rolling out these experiments, how are you detecting issues? It's easy if it's throwing an error.

But what if it's a design change that's-- maybe it doesn't look quite right.

Or, maybe it's just not giving a good enough user experience.

How are you testing that? Yeah.

So basically, because we are pushing so often, it means that there are always people watching Supergrep, so seeing live logs from the system, and then also watching these dashboards.

And the dashboards that I showed, every team has hundreds of dashboards of their own.

And people watch all these different things all the time.

So a lot of the way that we detect problems is by seeing downstream effects.

So say we were to push a CSS change that actually caused some kind of button to disappear in the checkout flow or something, we would see checkouts plummet.

And that's a graph people always have eyes on.

People are constantly pushing-- "always be pushing" is a motto.

Always be pushing means that people are always monitoring the system.

We've actually got ourselves into this state now where we don't really know what will happen if we go for too long without pushing.

On the rare occasions when there are holidays, or other things, and it goes for some time without pushing, things in the system can get a little weird, because the deploys actually reset services on the web boxes and do other things.

So deploys are kind of the lifeblood of the system at this point.

And then one last question, this comes from Alex Sexton, who isn't here.

But he'd like to know what you think of the forthcoming Taylor Swift record.


Oh, that is a fantastic question.

I've really been waiting for someone to bring up Taylor Swift.

I'm optimistic.


I think that's the best we can end on.


It's been great.

Big hand, Daniel Espeset.

Thanks, Jake.

