The EMyth Incident Revisited
This is the story of how a web development company was really excited to launch an updated version of its own website, and it went wrong in just about the worst way possible.
So I was watching this torrent of matrix-like text just roll across my screen, thinking about rocketlift.com because we’re working on rocketlift.com, and my eyes caught emyth.com in there, and my brain went, “Oh, that’s wrong. That’s not the site that we’re working on right now! Something is terribly amiss.”
– Matt Pearson
Spoiler alert: this is the story of Rocket Lift’s very first makeover of its own website, and the debacle that followed. That’s right. This is a story about us, and how we accidentally deployed code onto our client’s server.
About Our Guests
Matt Pearson is an avid supporter of open source, open data, open standards, and the indie-web. He also has something of an obsession with information security and the tension between publicness and privacy in our ever more interconnected world. Matt manages Rocket Lift’s server hosting infrastructure — the hard work you typically won’t see — and he consults with our clients on information architecture and critical systems for information security, backups, and off-site redundancy. Before co-founding Rocket Lift, Matt spent seven years in IT at Whitman College. Matt lives in beautiful Seattle, where he and his wife are owned by a couple of cats.
For endless hilarity, follow Matt Pearson on Twitter: @matro_wtf.
Jed Bickford is Director of Product Development at EMyth. His team is focused on building simple tools that use the EMyth Point of View to enable business owners to create stability, growth, and freedom through their work with EMyth Coaches. He’s an entrepreneur at heart and lives in Ashland, Oregon.
Follow Jed on Twitter: @jedbickford.
Bryan Teoh is a composer, instrumentalist, and new media artist working out of Brooklyn, NY. His work often explores the relationship between acoustic/synthetic soundscapes, composition/improvisation, and popular/academic music utilizing custom software as well as instrumental work on the viola da gamba, cello, guitar, and piano. He holds a degree in theory/composition from The Lawrence Conservatory of Music where his studies of jazz and classical music were intermittently interrupted to independently venture into electronic music and contemporary studio technique. When not wading into increasingly esoteric areas of musical ephemera, Bryan enjoys cycling, posing as a coffee snob, referring to himself in the 3rd person, and attempting to make the perfect burrito.
Learn more about Bryan at sleepfacingwest.com. The songs we featured this episode can be found here and here.
- [Catherine] More and more, our lives and businesses depend on internet technology. These are stories about the people who pick up the pieces when it all falls apart. Welcome to Hard Refresh.
- [Matt] So I was watching this torrent of matrix-like text just roll across my screen, and my brain went, “Oh, that’s wrong. That’s not the site that we’re working on right now! Something is terribly amiss.”
- [Catherine] A web development company was really excited to launch an updated version of its own website, and it went wrong in just about the worst way possible. Here’s your host, Douglas Detrick.
- [Douglas] Hard Refresh is a production of Rocket Lift, a web development company based in Portland, Oregon. I’m your host, Douglas Detrick, and I’m glad you’re with me.
- [Doug] So who was this web development team that botched the release of their own website? Well, it was Rocket Lift. Soooo…that’s us. The story we have for you today is about us, for sure, but it’s also about how code that developers write gets from their own computers to working on the web for the whole world to see. The person who does that for our team is Matt Pearson.
- [Matt] My name is Matt Pearson and I am basically the IT department at Rocket Lift.
- [Doug] I use the internet every day, the same as all of you, but Matt is the person I depend on to know how the internet actually works. He supports all of us at Rocket Lift by keeping all our tools working.
- [Matt] …handling support tickets with other departments in the company, dealing with escalated things with clients, making all of our internal systems like email, our task management system, all the systems we built for our clients, that’s what I do. I make all that stuff go.
- [Doug] He’s the one who makes the code we write jump from our local computers to going live on the web. And he’s the one who jumps into action when a client’s site goes down.
Setting the Scene
- [Doug] We’ll get back to Matt soon, but to set the scene for this story, I need to bring in yet another Matthew. This is Matthew Eppelsheimer, Rocket Lift’s Managing Director, and Executive Producer of this podcast. And just to help keep things straight, we’ll always use “Matt” to refer to Matt Pearson, and “Matthew” to refer to Matthew Eppelsheimer.
- [Matthew] So, it was a festival, it was a party. We were enjoying ourselves. I invited people to bring their own beverage, which was kind of cheeky because it was a video chat hangout.
- [Douglas] A BYO website launching party over a Google Hangout?
- [Matthew] Yes.
- [Douglas] And tell me about the website you were replacing.
- [Matthew] The legacy website, our first website, was a very basic, one page, white background, black text. It just had a giant line drawing of a rocket on top. It was something I put together in Photoshop. It wasn’t very good, it wasn’t very professional, but it was good enough for my freelancing work.
- [Douglas] And who else was on that call?
- [Matthew] Tricia, our administrator, who really wasn’t involved in the technical work, but was involved in keeping us to our project schedule, just keeping the business running on time. She was an integral part of our company culture. And Tim, the designer, who was really responsible for the look of the site, and the look of our brand. He was the one who designed the logo.
- [Douglas] What would you say is important about that logo?
- [Matthew] The rocket itself went through several versions, at least 12 different versions where we were iterating on getting just the right feel for that rocket. The trail coming out of the rocket is actually connected to the “O” in the word Rocket. It’s coming off the circle at a tangent because they launch using the rotation of the Earth as a kind of a slingshot to get them going fast enough to be in orbit quickly. We deliberately did that… It’s kind of dog whistling to anyone who’s a space geek…this is designed by people who know stuff about space and care about not looking like idiots with their logo.
- [Douglas] So, we’re recreating Cape Canaveral, and we’re launching a rocket.
- [Matthew] So what was happening behind the scenes was we were playing The Final Countdown, that song from the band Europe.
- [Douglas] I don’t know it.
- [Matthew] Well, just so you are aware while doing this…
- [Douglas] I would love to hear it right now.
- [Final Countdown plays]
- [Matthew] You’ve almost definitely heard this before, right?
- [Douglas] Of course I have!…..Ok, I think that’ll do.
- [Final Countdown in the room fades out]
- [Matthew] So that gives you a sense of what was going on. And over the top of that, I was going down my checklist, and I said “Engineering?” and someone representing the developers on the team, probably Tim, said “Engineering is go.” And then probably laughed, because he was like “I can’t believe we’re doing this…” and then I asked “Design?” and Tim would have said “Go!” and so we tweeted “Design is go,” and on down the list.
- [Douglas] Would you say we were partying like it was 1999?
- [Matthew] I would say we were, yeah.
- [Douglas (VO)] This was a big moment for us. If you couldn’t tell, Matthew is and was very proud of this website, and we were and are a very nerdy bunch of people.
Don’t Drink and Deploy…
- [Doug] Instead of 1999, maybe we should say we were partying like it was 1969, the year of Apollo Eleven, and the moon landing. But I digress. All of the team but Matt Pearson, our IT specialist, was celebrating, with adult beverages in hand, because their part of the work was done. They had written the content, done the design, and shipped the code for Rocket Lift’s brand new website. But Matt still had to deploy all the new code to our server, so he was most definitely not celebrating, at least not with alcohol, not yet. Why wasn’t he celebrating? This story will explain. It was late at night back when Matt was a sophomore in college. A friend’s laptop was on the fritz…
- [Matt] …and she had school assignments on this thing, and a paper was due and all these terrible things, right? Like, the computer cannot be dead.
- [Doug] Since he knows more about how computers work than your average person, he’s the first call for anyone he knows who’s having computer trouble. If you add to this equation that Matt is an Olympic-level nice guy, it means he finds himself fixing computers quite a lot.
- [Matt] I had some tools to boot the machine up. As long as the hardware was fine, the machine would run with my special software, and I could copy her stuff across and then blow away her computer’s hard drive and reinstall Windows and bring it back to life.
- [Douglas] He copied her data to his drive, and checked to make sure it worked, and it had. Then as he was reinstalling Windows, it asked him if he wanted to wipe the hard drive clean. He clicked yes, then as the deleting and installing was happening, he thought…
- [Matt] Something doesn’t look right. Something doesn’t feel right. I think I did something terrible. Cancel. Cancel really quickly. Oh hells. This is not good. Did it work? Oh, it didn’t work. Everything’s gone.
- [Douglas] Matt was panicking. Hard.
- [Matt] I think I might have even yanked the cable out…hard stop on everything. Let’s try this again.
- [Douglas] But, the damage was done. He had deleted not just files on his friend’s laptop’s hard drive, but also his own external hard drive, with all of his friend’s data on it.
- [Matt] All the person’s data and a bunch of my data, everything that was on the laptop, was all blown away. So there was no copy of this person’s entire digital world anymore.
- [Douglas] It was like he had run 99 yards of a hundred-yard dash, and then fallen flat on his face at the finish line.
- [Matt] So first of all after unplugging my drive, I stepped back, sat down on the floor and just didn’t move for a bit.
- [Douglas] Thankfully, all was not lost.
- [Matt] Here’s an interesting thing about how this part of a computer works. When you delete a file, you don’t actually delete that data…
- [Douglas] Just a tiny bit of data that references that data. It’s like taking the street sign off the street—the street’s still there, it’s just harder to find.
- [Matt] All that data that makes up the file is still sitting on your hard drive. On the physical thing that stores your information. Your computer just doesn’t know where to look for it anymore.
- [Douglas] Using his special software tools he….
- [Matt] …ran a scan of my poor little hard drive that had all of my data and all of this person’s data on it, and luckily, it was all there. So I was able to copy it out to yet another machine, and managed to get all the stuff back.
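For readers who like to see ideas in code, here is a toy sketch of the point Matt and Douglas make above: deleting a file removes the "street sign," not the street. Everything in it (the `ToyDisk` class, its method names, the sample file) is invented for illustration, and real file systems are far more involved.

```python
# Toy model of a disk: an index (file name -> block number) plus raw blocks.
# Illustrative only; real file systems are far more complex.

class ToyDisk:
    def __init__(self):
        self.blocks = {}       # block number -> data sitting on the drive
        self.index = {}        # file name -> block number (the "street sign")
        self._next_block = 0

    def write(self, name, data):
        self.blocks[self._next_block] = data
        self.index[name] = self._next_block
        self._next_block += 1

    def delete(self, name):
        # Only the index entry is removed; the block itself is untouched.
        del self.index[name]

    def recovery_scan(self):
        # Return every block that no index entry points to anymore.
        referenced = set(self.index.values())
        return [data for num, data in sorted(self.blocks.items())
                if num not in referenced]


disk = ToyDisk()
disk.write("term_paper.doc", b"placeholder essay text")
disk.delete("term_paper.doc")

# The file is gone from the index, but a scan still finds its data.
recovered = disk.recovery_scan()
print("term_paper.doc" in disk.index, recovered)
```

A recovery tool like the one Matt describes works roughly like `recovery_scan` here, under the assumption that nothing has overwritten the orphaned blocks yet.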
- [Douglas] He won the gold after all! When he looks back on this now, he can see that he’d set himself up for failure.
- [Matt] I was very sleep deprived at the time I did this. It might have been finals or midterms. Sleep was not something anyone was getting a lot of.
- [Douglas] And that stress helped him learn something about how he should work…
- [Matt] I do have a pretty hard and fast rule about computing under the influence of either alcohol or lack of sleep.
- [Douglas] Which is… just don’t do it, kids. Bad things will happen.
- [Doug] Now we go back to our website launching party. It was time to hit the button that would send the code for the new rocketlift.com to our server, which was hosted by WP Engine. WP Engine is a website hosting service company, one of a handful who specialize in WordPress hosting. We outsource hosting to companies like them to let them focus on what they do best. That frees us to do creative work, with some occasional troubleshooting. When Matt sends code to WP Engine…
- [Matt] …there’s a whole lot of text that spews across the screen when it’s doing this. So I was watching this torrent of matrix-like text just scroll across my screen and thinking about RL.com ‘cause we’re working on RL.com and my eyes just caught emyth.com in there. And my brain went oh, that’s wrong. That’s not the site that we’re working on right now.
- [Douglas] EMyth was one of our clients. We were doing some development work for them, so we had access to their server as well as ours. So, when Matt saw this, he tried to cancel the command, almost like when he yanked the cables out of his friend’s computer back in college.
- [Matt] But it was too late. All the code had already gone through to the WPE git server. It was beyond my control at that point.
- [Douglas] And it all happened really fast.
- [Matt] Depending on the amount of data that it has to move around it can run really fast. So the actual push from my computer to WP Engine was maybe half a second.
- [Douglas] But that was more than enough time for some really intense emotions to set in.
- [Matt] That warmth spreading through the entire body, panic feeling. I remember having that.
- [Douglas] He didn’t know exactly what had happened, but he knew he’d deployed code for our own site, to a client’s live web server.
- [Matt] If our site melts down, that’s par for the course, that’s totally my responsibility, I can own that. It’s ok for us to break our own site. It’s not ok for us to break our client’s site.
- [Douglas] So, while Matt was panicking like it was 1999, everyone else was still partying. And none of the rest of his team knew anything was wrong.
- [Matt] I think I said something like, “Hang on. I think something went terribly wrong. I’m gonna check something. Nobody move.” Or something like that.
The Moment from Matthew’s perspective
- [Douglas] Matthew, do you remember that moment?
- [Matthew] What I remember is that he got quiet for a while and I sort of didn’t notice. I just figured he was busy doing things. And a number of minutes later he said “Well, the good news is that I fixed it, I think.” And that may not be exactly the way it happened, but that’s how I remember it. And then he told us what had happened.
- [Douglas] So, how did you feel when he did that?
- [Matthew] I think that because of what was going on socially with the team, with this launch party, more than anything I think we all thought “Oh my gosh, this just happened.” We were kind of laughing. Not that we weren’t taking it seriously, but it wasn’t a crushing feeling of doom for me. It was a feeling of “Oh, I can’t believe this just happened! We just did this! What am I going to have to tell our client, and how could we have done this?” Why did it happen? How did it happen? But I don’t think I was saying any of that out loud because it was clear that Matt was already stressed out and feeling bad enough. He didn’t need to hear that from anybody that this shouldn’t have happened.
- [Douglas] From what he tells me, he was having whole body stress and terror, that was completely enveloping him at that moment. He was sick to his stomach, literally, and trying to figure out what to do.
- [Matthew] I had the luxury of not feeling bad about it, because he was worrying about it. And there was nothing I could do anyway. Where I did pull my weight was having to own it with the client. Taking responsibility as the team leader, our processes broke down, and please don’t fire us!
- [Douglas] So you were already thinking ahead to that moment when you’d have to tell the client?
- [Matthew] The first thought was “let’s confirm that this is fixed.” And then the second thought was “we’re gonna need to explain this somehow.” Third, fourth, and fifth thoughts were to figure out what exactly had happened, and what the sequence of events was, and what would be useful to communicate to the client.
- [Two-Way Ends]
- [Douglas] This whole debacle centers around a computer program called Git. This system, and other systems like it, are a type of software that’s used for helping developers write new software. It’s called version control, and my co-producer Catherine Bridge is here to explain how it works.
Catherine Explains: Version Control
What’s a one-sentence description of what version control is?
- Version control is a system that allows for changes to be made to a code file or any kind of document in such a way that your past changes do not get lost, they can only be added to. Even deletions. This lets multiple people work on the same document and be able to try out multiple ways of writing the same thing, and always have a way to undo everything. They can work without always stepping on each other’s toes or irrevocably overwriting someone else’s changes.
- Let’s give an example that non-programmers can relate to. For example, my Grandmother is currently working on her memoirs, and to her, having different versions of her work is vitally important, so she has documents named “November of 2014 Draft” and “Semi Final Draft” and they’re all very similar, so she tends to get confused, naturally, and loses writing that she did in November, but hasn’t made it to the Semi Final Draft.
- Personally, I don’t know why she’s writing her autobiography this way. It would make me crazy.
- So, what my father occasionally has to do is read through these different versions and move all of the disparate sections into one document, what our own Matt Pearson would call the “source of truth.”
- In a way, my Grandmother and her editor, my dad, are doing version control for her memoirs.
- As you can guess, there is definitely a better way to do this, and that’s with an actual version control program, such as Git.
- Analogy of “kleenex” vs. “tissue”
- So what is Git? Git is one particular version control program, just as Kleenex is one particular brand of tissue.
- There are different programs you can use for version control. Git is the one that we happen to use the most frequently, but there are dozens of others, such as SVN, Mercurial and CVS. Each of these systems allows users to make commits, or tiny saves, and push them up to a repository.
What’s a repository?
- A repository is where all of these different versions get stored. It’s like the folder where my Grandmother stores all of her different files. She can go in and check out what the manuscript looked like back in November of 2014. Similarly, a developer can go in and check out a particular place in time in the code. This is very valuable when you’re looking for a bug, for example. You can look at a version of the code and go bug hunting, stepping along the line of commits until you see that bug present itself.
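To make commits, checkouts, and bug hunting a little more concrete, here is a minimal sketch of a repository as an append-only list of snapshots. The `ToyRepo` class and the draft text are made up for this example, and this is not how Git stores data internally; it only mirrors the behavior described above.

```python
# Toy version control: an append-only list of snapshots.
# Hypothetical names throughout; Git's real storage model is different.

class ToyRepo:
    def __init__(self):
        self.commits = []  # each entry: (message, full text at that point)

    def commit(self, message, text):
        # History is only ever added to, never rewritten.
        self.commits.append((message, text))
        return len(self.commits) - 1  # commit id = position in history

    def checkout(self, commit_id):
        # Look at the document exactly as it was at that commit.
        return self.commits[commit_id][1]

    def first_bad(self, is_buggy):
        # Step along the line of commits until the bug first shows up.
        for commit_id, (_, text) in enumerate(self.commits):
            if is_buggy(text):
                return commit_id
        return None


repo = ToyRepo()
repo.commit("November 2014 draft", "Chapter 1: the early years.")
repo.commit("Add chapter 2", "Chapter 1: the early years.\nChapter 2: school.")
repo.commit("Edit chapter 1", "Chapter 1: teh early years.\nChapter 2: school.")

november = repo.checkout(0)
culprit = repo.first_bad(lambda text: "teh" in text)
print(november)
print(culprit, repo.commits[culprit][0])
```

Because every commit keeps a full snapshot, the November draft is still retrievable after two later edits, and walking the history pinpoints the commit that introduced the typo.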
How do a developer’s changes get merged into the “source of truth?”
- Carefully! For our team, a developer’s changes get at least one more set of eyes on them, often more. We see how that new code plays along with the rest of the code or any other new code that might be getting worked on at the same time and if it passes our QA procedures, we merge it into the Source of Truth. The beauty of version control is that even once we merge the code into the Source of Truth, we can still see what changes were added and if need be we can remove or change those changes, if we find bugs down the road.
- Let’s say we teach my Grandmother to use git for her version control. Then, instead of having to copy her document and save it with a bunch of different unhelpful names like, “November of 2014 draft” and “Seriously, the final final draft, B”, she could, for example, change all the fonts for the titles of her chapters, commit that, then my dad (the QA guy in this analogy) could add those formatting changes to the Source of Truth. Those changes will be on the canonical version of her memoir and not just on version “Final Final Draft with Chapter font change Fall 2016”
Why use this system? What problem does it solve?
- Programmers and developers love safety nets. My Grandmother wants to make sure that she never loses her valuable life’s work. We all want as many safeguards in place to keep us from making an unrecoverable error. When I was first learning version control, my terror was that I was going to make a commit that was actually going to delete everything. Only as I’ve learned more about branching models and other safeguards did I realize that it was pretty much impossible for me to destroy everything. In fact, when I was first getting started, Matt Pearson didn’t even give me access to change the Source of Truth. I could only copy it. Matt is a very wise man.
- If you’re working on a team, not just you independently, you’re going to really want to use version control. As soon as you add even one more person to the mix, not only do you need a much stretchier safety net, but you’ve also got all the mechanics of sharing and collaborating to worry about. Version control systems are built to solve those problems, first and foremost. The safety net aspects are almost an extra bonus.
Perspective: Jed Bickford
- [Doug] Talk about data that needs to be protected! Catherine’s grandmother’s story is a powerful reminder that losing data can have profound consequences. Now that you’re up to speed on how version control works, you have a better idea of why this story happened the way it did. Now, I know you’re all dying to know what happened to EMyth’s website, so let’s cut to the chase.
- [Jed] My name is Jed Bickford, and I am the Director of Product Development at EMyth.
- [Douglas] Jed was ultimately responsible for maintaining emyth.com.
- [Jed] The first I knew about it was I got a call from Matthew, the owner of Rocket Lift, and I could hear the fear in his voice. And he told me he needed to talk about a mistake that they had made. And immediately my heart started pounding, because at this time, I’m responsible for our website, EMyth.com and Rocket Lift is our primary contractor, working on the website. And so, I pulled it up, to see if it was still alive. Right there on the phone. And it was!
- [Doug] Imagine if he’d seen rocketlift.com’s website there instead.
- [Jed] Then Matthew told me that his team had accidentally deployed their entire website, RL.com onto our server, and it was sitting there in another folder. And I was relieved. I was like, “Oh that’s all. Ok. Great!” No giant mess to clean up. Nothing to have to communicate to our employees and our customers. I was surprised that it was so easy.
How did it happen?
- [Douglas] So the worst of the worst didn’t happen. But we had still screwed up, and badly. Once the confusion subsided a little bit, Matt figured out what happened. Here’s Matt Pearson again.
- [Matt] So we were working on changes to RL.com and we had the RL.com code base checked out on various computers…
- [Douglas] The different developers working on the code sent their changes to Matt, and he merged them into a final version of rocketlift.com that was ready to go live. The next step was to send the code to the live web server.
- [Matt] So it’s super convenient. It’s the same workflow to put code into production as it is to share code with the rest of the team.
- [Douglas] After weeks of work, it was time to take the code live.
- [Matt] And it relied on setting up a second Git remote.
- [Douglas] Wait, Matt, what is a git remote?
- [Matt] A remote is basically a name and a url.
- [Douglas] The name and the url refer to another repository somewhere else where this same code is stored, usually on a web server. So, when Matt says a remote is “a name and a URL,” he’s saying that you tell your version control system to send code to that url, like sending a package to a street address. You give it a name so you don’t have to remember that whole URL. In this case, you might give the name “Nancy” to your live web server. Then, you just tell Git to send your code to “Nancy,” and Git already knows the address to send your package to.
- [Matt] What happened was, when we were setting up the RL.com codebase on one of our computers, we set up the remote that would talk to WP Engine in such a way that it pointed to the EMyth server, rather than the RL.com server.
- [Douglas] So, he switched the address, putting EMyth.com as the address to send the code to, instead of rocketlift.com, which is where it was supposed to go. It turned out that it was a copy and paste error. He had copied the settings for EMyth’s repo, but then forgot to change the url to point to Rocket Lift’s server. It was one of those tiny mistakes that has big consequences.
How did we create this situation?
- [Douglas] Now I struggled a lot with understanding what a “remote” is in this context. If it isn’t the kind of remote that’s helping me fast-forward to my favorite parts of Orange is the New Black then I’m lost.
- So, here’s Matthew again, to explain in a different way.
- [Douglas] Can you explain what that is, in dry technical language?
- [Matthew] Ok. If you’re doing a push to the production remote, it means that the changes that you have prepared locally, you’re sending them to the live web server. In order to set that up, you have to have the address for it locally, and you have to give it a name. There’s nothing magical about having the address, that Git knows “oh, this is production.” You have to tell it that this is production. It’s kind of like storing contacts for people on your phone. You can name them whatever you want. If you put your mom in your phone, and you give her the name “mom” your phone doesn’t care that that isn’t actually her first name. So you can call it whatever you want. And also, if the phone number for your mom is wrong, your phone doesn’t know or care. In this case, Matt had set up his local address book and said production is this particular phone number. And it just happened to be exactly wrong. And because it was named “production,” that masked the fact that it was actually… He was thinking “production, I’ve set this up, I’m pushing to production.” He even checked to make sure he was pushing to the right remote by name, but he hadn’t thought to check if the remote had the right address for this project.
- [Douglas] It’s like that one time when you were trying to call your best friend, Olympic gold medal winner Simone Biles. You tapped her name in your favorites, but you accidentally stored your brother Simon’s number there instead. Hey Simon! I meant to call Simone, but how’s it going? You can see how that could happen to you, right?
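The address book analogy maps neatly onto a small sketch. Here a set of remotes is modeled as a plain dictionary from name to URL. The URLs are made-up placeholders (not real WP Engine addresses), and the function names are hypothetical; the point is only that checking the name tells you nothing about the address stored under it.

```python
# Git remotes modeled as an address book: name -> URL.
# The URLs below are invented placeholders, not real hosting addresses.

remotes = {
    "origin": "git@example.com:rocketlift/rocketlift-site.git",
    # Copy-and-paste slip: the name says "production", the address says EMyth.
    "production": "git@hosting.example.com:production/emyth.git",
}


def push(remotes, name):
    # Like tapping a contact: only the name is looked up, and whatever
    # address is stored under it gets used without question.
    return f"pushing to {remotes[name]}"


def name_looks_right(name):
    # The check that was done: "am I pushing to the remote called production?"
    return name == "production"


def url_looks_right(remotes, name, project):
    # The check that was missing: does the address mention this project?
    return project in remotes[name]


print(name_looks_right("production"))
print(url_looks_right(remotes, "production", "rocketlift"))
print(push(remotes, "production"))
```

The name check passes while the address check fails, which is exactly the gap Matthew describes.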
Why we didn’t overwrite emyth.com
- [Douglas] Just like Jed had, Matt opened up his browser and went to emyth.com.
- [Matt] I saw the emyth.com site. I saw EMyth’s branding, EMyth’s posts. Everything was EMyth-y.
- [Doug] He thought he would see Rocket Lift’s website where EMyth’s was supposed to be, and fortunately, that’s not what happened. But, even though the worst of the worst hadn’t happened, Matt knew Rocket Lift had still really messed up. Because of the complexity of all these interlocking web technologies, it can sometimes be hard to understand the details of a problem right away. But knowing what happened? That’s Matt’s job.
- [Matt] The way WP Engine’s little deployer robot works is that it doesn’t delete things on the production site when they get deleted from the Git repo that you’re talking to.
- [Douglas] The “deployer robot” Matt’s talking about is an automated script on WP Engine’s servers that receives the code we send, and then sets it all up to run correctly, updating the live website. So, what happened is that we uploaded a bunch of files to the EMyth repository, but it didn’t delete EMyth’s code. Since both sites were running WordPress, most of what we added was the same as what was already there. Only files that were different from what was already there would actually be added. So, what was different? The newly designed theme for rocketlift.com we’d been working so hard on.
- [Matt] I remember spelunking in the EMyth codebase on the server at some point, and noticing there’s a suspicious looking theme in the themes directory, this isn’t an EMyth theme, this is rocketlift.com’s theme, what is this doing here?
- [Douglas] Whoopsy! So, we had essentially just added another theme to their code base, another design. But, unless someone had activated that theme to actually run at EMyth.com, nobody would actually see it. Fortunately, nobody did. The files for our new design for rocketlift.com were sitting harmlessly on EMyth’s server, not activated, and not visible. In other words, we dodged a bullet. All thanks to a clever safeguard in WP Engine’s deployment system.
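The behavior that saved the day can be sketched in a few lines: a deploy step that writes new or changed files but never deletes anything already on the server. This is a simplified model built only from the description above, not WP Engine's actual implementation, and the file paths and contents are placeholders.

```python
# Simplified model of a deploy step that copies pushed files onto the
# server but never deletes anything. A sketch under stated assumptions,
# not WP Engine's real deployment code.

def deploy(server_files, pushed_files):
    changed = []
    for path, content in pushed_files.items():
        if server_files.get(path) != content:
            server_files[path] = content  # new or different: write it
            changed.append(path)
    # Files on the server that are absent from the push are left alone.
    return sorted(changed)


# Hypothetical file listings; paths and contents are placeholders.
emyth_server = {
    "wp-includes/version.php": "wordpress core",
    "wp-content/themes/emyth/style.css": "emyth theme",
}
rocketlift_push = {
    "wp-includes/version.php": "wordpress core",
    "wp-content/themes/rocketlift/style.css": "rocket lift theme",
}

added = deploy(emyth_server, rocketlift_push)
print(added)
print(sorted(emyth_server))
```

In this model the shared WordPress file is identical and gets skipped, the Rocket Lift theme lands as an extra directory, and the EMyth theme is untouched. Note that a pushed file with the same path but different content would have been overwritten, so the outcome depended on the two sites sharing the same core files.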
Back to Jed: A deeper problem
- [Doug] But something that Jed Bickford said stuck with me. After he breathed a sigh of relief that EMyth.com was still up and running, he thought:
- [Jed] …how could you even do that? That takes quite a lot of extra key-strokes and commands to even create that kind of a situation.
- [Doug] To answer Jed’s question, how did this even happen, we’ve already dived deep into the mistaken setting in the git remote we used to deploy rocketlift.com. I can understand why Jed might have assumed it was a complicated error that created the issue, but it was actually just a simple copy and paste error. But there’s a deeper answer too. The mistake really was a symptom of a deeper problem.
How to pick oneself up and dust oneself off: automating tools
- [Douglas] Matt got lucky that this safeguard in WP Engine’s system saved the EMyth website from total annihilation. But, with Matt being the ambitious Systems Administrator that he is, it wasn’t good enough to just leave well enough alone.
- [Matt] So the first thing we did was to add an item to our checklist that was a sanity check of this particular little setting.
- [Douglas] The setting that tells git where to send the code.
- [Matt] Once we figured out what was wrong and where the error had happened, it was very obvious to figure out how you check this. You run one little command and you see the url and you do a quick sanity check that the urls actually correspond to the project you’re actually working in.
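The episode does not name the "one little command," but `git remote get-url <name>` is one Git command that prints the URL stored under a remote name. The sketch below wraps it and adds a comparison step. The rule that the URL must contain the project's slug is an assumption made for illustration, as are the function names and sample URLs.

```python
import subprocess


def remote_url(name, repo_dir="."):
    # Ask Git for the URL stored under a remote name.
    # Requires running inside a Git repository that has this remote.
    result = subprocess.run(
        ["git", "remote", "get-url", name],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def url_matches_project(url, project_slug):
    # Hypothetical rule: the remote URL must mention the project's slug.
    return project_slug.lower() in url.lower()


# Example with made-up URLs, so it runs without a Git repository.
ok = url_matches_project(
    "git@hosting.example.com:production/rocketlift.git", "rocketlift")
bad = url_matches_project(
    "git@hosting.example.com:production/emyth.git", "rocketlift")
print(ok, bad)
```

In a real checkout, the manual check would amount to something like `url_matches_project(remote_url("production"), "rocketlift")` before pushing.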
- [Douglas] Rocket Lift uses a lot of checklists to streamline repeated processes. They help us remember important but mundane details, so that we can focus instead on more creative details. This is actually something Matthew first learned from EMyth — more on that later. So double checking that the remote has the correct address will keep us from making this mistake again. But since he knew he’d be repeating this process many times in the future, he wanted to make it even more foolproof.
- [Matt] The next layer of things is to make that sanity check not a human thing anymore. It’s not something in our playbook that we manually copy paste, run with human eyes. We make it a piece of code in a program that actually sets up the repo for us.
- [Douglas] “Repo” is an abbreviation of “repository.” So he’s saying that he automated the process of getting the correct settings into the repo.
- [Matt] So as long as you have your initial configuration setup and that file setup correctly, you’ll know it’s gonna work as expected.
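Rocket Lift's actual setup program is not shown in the episode, so the following is only a guess at the general shape of such a tool: one configuration table is the single place a URL is derived from, and a planning function works out which Git commands would bring a checkout in line with it. The table layout, host name, and URL pattern are all invented for this sketch; `git remote add` and `git remote set-url` are real Git subcommands.

```python
# Sketch of a setup script that derives remotes from one config table,
# so nobody types or pastes a URL by hand. The config shape, host name,
# and URL pattern are assumptions for illustration.

PROJECTS = {
    "rocketlift": "rocketlift",  # project name -> hosting install name
    "emyth": "emyth",
}

URL_TEMPLATE = "git@hosting.example.com:production/{install}.git"


def expected_remotes(project):
    # The only place a production URL is ever constructed.
    return {"production": URL_TEMPLATE.format(install=PROJECTS[project])}


def plan(project, current_remotes):
    # Return the git commands that make the checkout match the config.
    commands = []
    for name, url in expected_remotes(project).items():
        if name not in current_remotes:
            commands.append(["git", "remote", "add", name, url])
        elif current_remotes[name] != url:
            commands.append(["git", "remote", "set-url", name, url])
    return commands


# A checkout of rocketlift.com whose remote was pasted from EMyth's setup.
misconfigured = {"production": "git@hosting.example.com:production/emyth.git"}
fix = plan("rocketlift", misconfigured)
print(fix)
print(plan("rocketlift", expected_remotes("rocketlift")))
```

With this shape, the misconfigured checkout yields a single `set-url` correction, and a correctly configured one yields no commands at all, so the script is safe to run as many times as needed.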
- [Douglas] I know that to the non-developer, someone like me, this sounds rather more complicated than running through a checklist on, say, a piece of paper. But if you have to do it as many times as Matt does, and you’re as good with computers as Matt is, then it’s a good choice.
- [Matt] So we take as much of the human factor out of this as possible and make it a mechanical thing. And since it’s mechanical it’s easy to do lots and lots of times.
- [Doug] This points to the deeper issue I mentioned before. Setting up the git remote, giving it the name and address, was a manual step. When there is manual work, done by humans, there is the opportunity for humans to make a mistake. The fact that a copy and paste error was even possible was only because this was a manual step. In the world of computers, you don’t need to take that as a given; you can automate routines, and have computers execute them flawlessly, time after time. Then we humans can focus on being creative…and partying. So, Matt wants to take this task out of human hands, because he’d like to spend his human time on other human things he’d like to do…like petting his kitty.
- [Cat meowing and purring]
Matthew on checklists
- [Douglas] When you make a mistake, especially a serious one, it’s easy to forget the panicked sensations you had as you realized what you’d done. It’s harder to turn those feelings into lasting change. I talked with Matthew again about how the business changed after this event.
- [Matthew] Yeah. So this is a classic example of what can be so frustrating about web development. Is that there are so many moving pieces. So many possible little points of failure where something can go wrong. But also, you could spend all day just thinking about all the possible things that could go wrong and trying to hedge against them, but that would be overkill. And you rely a lot on faith that human beings will work it out and not make lots of mistakes, but every once in a while you stumble on something because you make the mistake and you realize, yeah there’s this single point of failure, never would have thought I’d make that mistake, and huge impact if you do make it. That’s where checklists are perfect. So in surgery, doctors and nurses ask when they’re starting surgery, “what’s the name of the patient” and “is this the patient that we’re supposed to be operating on?” And that’s a far more tragic lesson embodied…
- [Douglas] So it sounds like there must have been an example of where someone got their pancreas taken out that shouldn’t have had their pancreas taken out.
- [Matthew] That’s why that question is on the list. You can have a checklist that covers every single possibility, but people don’t have the time or patience for that. And also there’s something, humans being emotional, not entirely rational creatures, we get a little offended. Like, “we’re professionals, we shouldn’t need to have all these questions asked.” So you try to limit the questions on the checklist to those that are really the most critically impactful. Simply making sure that where we’re about to send code is correct, that’s a good high impact question to be asking. There’s a lot of other questions we could be asking that wouldn’t be nearly as critical if they happened.
- [Douglas] Matthew and Matt can get seriously nerdy about this. It’s fun to listen to them do it. As long as you don’t listen for too long.
- [Douglas] To wrap things up, Matt cleaned up the extra files on EMyth’s server, and updated our checklists and automation. And we got rocketlift.com launched later on that night. The last thread was to alert Jed Bickford at EMyth about the mistake. In addition to the phone call to Jed, Matthew also wrote him this email:
- [Matthew] Jed, Unfortunately I need to let you know that there was a five minute period this evening where we accidentally overwrote production files in emyth.com. This happened during a… [faded out]
- [Douglas] Fortunately for us, Jed was very gracious. Here’s what he said about the post-debacle cleanup process.
- [Jed] It’s really ironic that we’re talking about a screw up like this one because it’s one that we coach business owners on every day. We specialize in helping small business owners create stability and create growth through a set of systems in their business.
- [Douglas] Systems like Rocket Lift’s checklist for deploying updates to a website.
- [Jed] So many business owners never manage to build the business of their dreams. One that really supports their life and expresses their values because they spend all their time putting out fires like this one.
- [Douglas] We did have to go into panic mode to put out this fire. But improving our checklist was even more important for the long-term success of our team.
- [Jed] Matthew told me about how he’d clean it up. He and his dev ops guy did that really quickly and he came back even and shared with me some tweaks that he made to their checklists for deployment to make sure that wouldn’t happen again. And I appreciated the gravity with which he held it. He really took it seriously, and I ended up respecting more how he handled it, and our relationship got better as a result of that.
- [Douglas] It’s almost like we were one of EMyth’s clients, even though in reality they were one of ours.
- [Catherine] Recording, editing, mixing, and sound design by Douglas Detrick. Production by Douglas and me, Catherine Bridge. Matthew Eppelsheimer is our Executive Producer. Interstitial music in this episode is by Bryan Teoh; there’s a link to purchase this music on hardrefresh.audio. The Hard Refresh theme was by the Brow. Be sure to subscribe on your podcast platform of choice, and please write us a review on iTunes. Your review helps others find us. Head on over to hardrefresh.audio to learn more about our guests.
- If there was something in this episode that you have questions about or didn’t understand, we want to hear about it. We’re making this podcast for you! Leave us a comment at hardrefresh.audio, send us an email, or find us on Facebook or Twitter.
- Coming up next time, on episode four of Hard Refresh: Danny Reeves and Bethany Soule of Beeminder were tidying up their app’s database, and…
- [Catherine] Want to be on the show? If you, or someone you know, has run into a serious problem while working on the web, we’d like to hear about it. We’re looking for stories where internet technology was part of the problem, and human creativity was vital to the solution. Send us a message at hardrefresh.audio. Thanks for listening. Now, perhaps it’s time to treat yourself to a hard refreshment?