Tales of a rewrite at Lyft

Making the switch from Obj-C to Swift: how Lyft rewrote their codebase in Swift


Transcript:

Hey everybody, thanks for coming to my talk this morning. My name's Keith Smiley, and today I want to talk to you about rewriting your app from scratch. I work at Lyft here in San Francisco, like Ida said. Over the summer, we shipped version 3.0 of the Lyft app. From a user's perspective it looked very similar, but in reality it was a ground-up rewrite in Swift that used none of the existing code. So if you saw Andy's talk yesterday, we kind of did the opposite: instead of building on the code we already had, we started from scratch. I think this experience has some interesting takeaways: the process of how to complete a full rewrite; some of the technical problems we ran into, because we started in the very early days of Swift and ended up shipping during Swift 1.2; and some of the cultural changes we've had to make at Lyft to make sure the whole thing has actually been worth it.

I want to start with why you would ever want to do a rewrite in the first place. When people talk about rewrites, it's normally about how dangerous they can be, how often they fail, and how much developer time they end up wasting, which is definitely something you should worry about. Often the correct answer is to incrementally refactor your existing code instead. That way you can keep adding features and clean up the parts of the codebase that no one wants to touch, without slowing down development on your product at all. I think this is probably the right answer pretty much all the time. But there are definitely some things that make it tough. It's often hard to get big refactors into big existing codebases because they can be super risky and you have to go through a ton of testing. And like I said, you can keep adding features, which is great, but adding those features might also accentuate the architectural problems your app already has.

02:23: I also think there's the case where it's too late to easily refactor your app incrementally. For example, this image is a distribution of files by file size in our old codebase. Each bubble represents a different file, and its size is the number of lines in that file. The largest one is a single 5,000-line file, and some of the others are around 2,000 lines. Just by looking at this you can immediately tell that something's wrong. And I'm sure tons of you have seen this before: of course that 5,000-line file is also a singleton that manages the state of the entire app, because why wouldn't it be? This case is super hard to refactor because you can't pull pieces out without affecting the entire system. That's the point we were at, where we felt it might be a good time to rethink things.
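
The talk doesn't name the tool behind this chart (the speaker says as much in the Q&A). As a rough, hypothetical stand-in, the per-file line counts behind a chart like this can be gathered with a short Swift script. This is a minimal sketch under assumptions: it counts physical lines (blank lines and comments included), only looks at `.swift`, `.m`, and `.h` files, and the function names `lineCount` and `sourceFileSizes` are illustrative.

```swift
import Foundation

/// Counts physical lines in a file's contents, including blank and comment lines.
/// Counting Unicode scalars (not Characters) means "\r\n" endings are counted too.
func lineCount(_ source: String) -> Int {
    guard !source.isEmpty else { return 0 }
    let newlines = source.unicodeScalars.filter { $0 == "\n" }.count
    // A final line without a trailing newline still counts as a line.
    return source.unicodeScalars.last == "\n" ? newlines : newlines + 1
}

/// Walks `root` recursively and returns (path, line count) for each source file,
/// largest first. Files that can't be read as UTF-8 are skipped.
func sourceFileSizes(under root: URL,
                     extensions: Set<String> = ["swift", "m", "h"]) -> [(path: String, lines: Int)] {
    guard let enumerator = FileManager.default.enumerator(at: root,
                                                          includingPropertiesForKeys: nil) else {
        return []
    }
    var sizes: [(path: String, lines: Int)] = []
    for case let url as URL in enumerator where extensions.contains(url.pathExtension) {
        guard let text = try? String(contentsOf: url, encoding: .utf8) else { continue }
        sizes.append((path: url.path, lines: lineCount(text)))
    }
    return sizes.sorted { $0.lines > $1.lines }
}
```

Sorting largest-first puts an outlier like a 5,000-line singleton at the top of the list; feeding the counts to any charting tool gives a bubble or histogram view similar to the one on the slide.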

Along with this case, I do think rewriting has some advantages, and the biggest one I want to mention is the ability to rethink the problem. We work on a wide variety of apps that solve tons of problems, and our apps evolve a lot over time. If you started something a few years ago and come back to it now, you're probably solving entirely different problems. For example, here are two versions of the Lyft app: a very old version on the left, and the current one on the right. The original app was written for iOS 4 and was built by a few developers over a few weeks, just to spike it out and see if it was a product at all. They definitely weren't thinking about the problems we'd be solving four years down the road. There was only one ride mode, and there wasn't nearly as much complexity in the app.

04:02: I think it's super important to recognize that the problems you're solving now, and the features you're writing to solve them, were probably not even on the radar of the developers who originally started the app. A full rewrite lets you rethink those problems and rethink your app's architecture in order to solve them. That said, it would be a big exaggeration to say that we thought through all of this, sat down, and decided a rewrite was a great idea.

Our rewrite actually started as an experimental side project by a single developer on the team, and I think this was super important to why it succeeded. It meant that at any point we could have stopped working on the rewrite without losing much time or money. The rest of the team was still working on the existing app, so we were still adding features and really weren't slowed down much. I think starting small is one of the most important things to think about, as opposed to throwing your whole team at it, because then if it fails you lose a lot more. All of this hinges on risk: how risky a project a rewrite can be. Depending on the size of your company or your app, a rewrite could be one of the largest development projects you ever undertake because of the amount of complexity, and if you don't complete it, you've wasted a huge amount of time.

05:17: Like I said in the beginning, our rewrite had some interesting technical challenges since we started in the early days of Swift. Anyone who uses Swift, even today, can tell that the tools aren't as mature, although they have gotten a lot better. I want to talk about some of the stuff we ran into. First off was compile times. Anyone who was working on a large Swift app in the early days can tell you that builds took a long time. It started with the lack of incremental compilation: once we had a few hundred classes in the app, touching any single file meant recompiling the whole thing, which could take up to ten minutes on a developer machine.

So you'd change one line, or even some whitespace, and have to wait 10 minutes for another build. We also ran into some crazy problems: at one point we had to build our app twice in order to get it to run on a device, and since each build took 10 minutes, that was a huge time sink. Today we're down to two or three minutes, but our old app, when it was entirely Objective-C, took only 15 seconds to compile, so we still have some room for improvement. And on top of a slow compiler, we didn't have a working lldb either. Until about three weeks ago, lldb didn't work at all; it definitely didn't before Xcode 7. That meant all of our developers were print debugging everything. Coupled with 10-minute compile times, every time they made a change they had to add a new print statement, rerun the app, wait 10 minutes, and see what happened. A lot of developers actually ended up cloning the repo multiple times so they could work in multiple places at once, which is crazy.

06:51: I think most people have forgotten about this by now, but in the early days of Swift there was really no good bridge between Objective-C and Swift. The system frameworks weren't perfect, but they were much better than third-party frameworks. Everything was imported as an implicitly unwrapped optional, and a lot of things were imported as AnyObject. We use a lot of third-party frameworks written in Objective-C, like Google Maps, and before there was a nullability API we had to be really careful to make sure nothing was going to be nil, or our app would crash. We still have problems with this today with third-party libraries that haven't been annotated by their developers. We treat everything from Objective-C that isn't annotated as optional to work around this, but it's still a huge pain. Another issue in the early days of Swift was the lack of continuous integration support, which we rely on a ton: we do all of our deploys through CI. Travis and similar services had a really hard time keeping up in the early days. Even once they got Swift support, they struggled to keep up with Xcode point releases. We always wanted to be on those updates, because the newest Swift features and bug fixes were more important to us at the time. But this meant we ended up doing release builds on developer machines. Coupled with long compile times, developers were occasionally waiting 45 minutes for a release build on their machine, and in the meantime they couldn't touch anything or they'd have to start over, which is crazy. Our CI builds today are much better, but they still take 15 or 20 minutes on the underpowered VMs we're given, so we're hoping for some improvement there too.
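
To make the "treat it as optional" workaround concrete: before nullability annotations, an Objective-C method such as `- (NSString *)placeName` came into Swift as returning `String!`. A minimal sketch of the idea is a thin wrapper that re-types every un-annotated result as a plain `Optional`, so the compiler forces callers to handle nil. `LegacyMapSDK`, `SafeMapSDK`, and `placeName` are hypothetical names; `LegacyMapSDK` is written in Swift here only to simulate the imported signature, and this is not Lyft's actual code.

```swift
// Hypothetical stand-in for an un-annotated Objective-C class. Without nullability
// annotations, Swift imports an object-returning method like this one as `String!`.
final class LegacyMapSDK {
    var storedName: String?
    func placeName() -> String! {
        return storedName   // may be nil, but the type doesn't force anyone to check
    }
}

// Thin wrapper: every un-annotated result is re-typed as a regular Optional,
// so callers can no longer use it without unwrapping.
struct SafeMapSDK {
    let sdk: LegacyMapSDK

    func placeName() -> String? {
        let name: String? = sdk.placeName()   // explicit Optional type: no implicit force-unwrap
        return name
    }
}

let maps = SafeMapSDK(sdk: LegacyMapSDK())
if let name = maps.placeName() {
    print("Pickup near \(name)")
} else {
    print("No place name yet")   // a nil that would crash if the `String!` were used directly
}
```

The wrapper costs one line per method, but it moves the "could this be nil?" question from runtime crashes to compile-time errors at every call site.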

I don't want to go on too much of a tangent, but I thought it was important to mention reverse engineering and how useful it can sometimes be. I think it's something we don't talk about enough in the iOS community. There's a great talk from Conrad Kramer, who works on Workflow, from a recent Realm meetup, where he reverse engineered the Lyft app to figure out how our URL scheme works. It's definitely worth watching if you want to learn more, since I'm not going to do a deep dive today.

08:47: During the development of our rewrite, we had some problems with closed-source third-party SDKs, and we didn't have any great solutions until we started figuring out how they worked under the hood. Here's an example. We updated the Google Maps SDK and started seeing this dialog in our app, which was strange because we don't use Bluetooth for anything. We were worried about this: we really needed the Google Maps SDK update for a new feature they'd added, but this dialog wasn't going to be okay for our users. So we dove into the Google Maps SDK. This is a screenshot of about half of the SDK initializer that we call, and we were trawling through it for anything that might point to this problem. What we found, if you can read this, is a class called Beacon manager. If you've ever messed with Bluetooth, you know the word "beacon" is a red flag for this problem. Finding this private class let us solve the problem. This is code that actually shipped in our app for months: we grab this class at runtime, override its initializer, and just return nil.

Of course, this solved the problem, which was great, until they updated the SDK. I think this is a really interesting tool: we never would have found that private class without digging into the SDK's internals like that. It's definitely something I think we should talk about more. Like I said, check out that talk if you want to learn more about how everything works.
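
The transcript doesn't show the code that shipped, and the exact name of the private class isn't given, so the following is only a sketch of the general technique using the Objective-C runtime from Swift: look a class up by name, then replace its `-init` with an implementation that returns nil. The function name `neuterInitializer(ofClassNamed:)` is hypothetical. As the talk points out, anything like this depends on private SDK details and can stop working when the SDK updates.

```swift
#if canImport(ObjectiveC)   // the Objective-C runtime only exists on Apple platforms
import Foundation
import ObjectiveC

/// Replaces `-init` on the named class with an implementation that returns nil,
/// so instances can no longer be created through `-init`.
/// Returns false if the class (or `-init`) can't be found at runtime.
@discardableResult
func neuterInitializer(ofClassNamed className: String) -> Bool {
    // Look the (possibly private) class up by name; nil if this build doesn't contain it.
    guard let cls: AnyClass = NSClassFromString(className) else { return false }
    let initSelector = NSSelectorFromString("init")

    // Only used to copy -init's type encoding for the replacement below.
    guard let original = class_getInstanceMethod(cls, initSelector) else { return false }

    // Objective-C block standing in for -init: it receives `self` and returns nil.
    // The object that +alloc already created is never released (a small leak).
    let returnNil: @convention(block) (AnyObject) -> AnyObject? = { _ in nil }
    let replacement = imp_implementationWithBlock(unsafeBitCast(returnNil, to: AnyObject.self))

    // class_replaceMethod adds the override to `cls` itself when -init is only
    // inherited, so NSObject's -init stays untouched for every other class.
    class_replaceMethod(cls, initSelector, replacement, method_getTypeEncoding(original))
    return true
}
#endif
```

You would call this once, early in launch (for example in `application(_:didFinishLaunchingWithOptions:)`), before the SDK creates the class, passing the private class's name as it appears in the binary. Using `class_replaceMethod` rather than swapping the implementation of the looked-up method matters: if the class only inherits `-init`, changing the inherited method would break `-init` for every `NSObject` subclass.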

So up to this point, like I said, we were trying to de-risk the rewrite as much as possible, so we only had one developer working on it. That meant we were waiting for a really solid foundation before adding more developers and ramping up feature development, so we could eventually catch up to our existing app in terms of features. At this point we felt it was worth adding a second developer, because the foundation was solid and it was like, "Okay, we need to actually catch up if we're ever going to ship this thing." We were really banking on the new codebase being so much better and so much easier to write features for that we'd be way faster, because with two developers on the rewrite versus about eight on the old codebase, all still adding features, we really needed to catch up.

11:02: Here's an example of where this really paid off. This is a screenshot (which you obviously won't be able to see all of) of the mock-ups for the onboarding flow in our app, which is hugely complicated. It was one of the last big features we added to the old codebase, and it took two developers a month and a half to build there. It was a huge project, partly because of the architecture I mentioned before, and partly because we didn't really have an onboarding flow at all before this, so there wasn't much to build off of; it was all from scratch. It was also one of the first big features we added to the rewrite after it was completed on the old app, and it was distilled down to this: a screenshot of Onboarding.storyboard, which has the entire flow in these 10 screens. This took one developer one week to complete on the rewrite, which was a huge time savings. That really proved how much more productive developers could be because the code was so much better architected.

It might seem like I'm breezing through this, but this was months of work. At some point we needed to ease in the rest of the team, so that we could actually ship this thing and everyone would be working on it, assuming we didn't cancel the project. At this point we were still worried it might not ship; we were still a few months out. But the team would eventually need to join, and to be productive they needed to be used to the tools and the codebase.

12:22: One way we did this, since our rewrite was going from Objective-C to Swift, was to introduce Swift into the old codebase as well. Almost all developers were writing new features in Swift on the old codebase, so they were totally used to the language. When they switched over, they didn't have to learn a language and a codebase; they just had to learn the new architecture, which was really great.

Towards the end of the rewrite, we also rotated developers on, one at a time, to write features in the new codebase that they had already written in the old one. They knew Swift and they knew the feature space, so they only needed to understand how everything fit together in the new codebase, which was a huge help in getting everyone up to speed. After these rotations, we reached feature parity with the old app, which meant it was time to bring over the whole team and get closer and closer to shipping. At this point we were a few weeks out. The entire team joined the project to fix bugs for a few weeks, which was great: the whole team working together, fixing tons of bugs that our QA team found. Luckily, nothing catastrophic happened. But at this point we were still really worried that we'd find some bug that was really difficult to fix and we'd still have to cancel the project or push it back months.

13:44: Amazingly, everything went really well: we shipped the project this summer and no one really noticed, which was the goal. Until we started tweeting about it, no one knew we had switched the codebase out from under the app. I really wish I could say that at this point we shut our laptops, went home, and haven't touched it since. That would be great. But in reality, a lot has happened since then. One of the biggest things I want to mention is the cultural changes we've had to go through at Lyft. One of the biggest things we still worry about is how to make sure that a year from now, our code isn't back at the quality that got us to this point. We really needed to go through some big changes for that to happen.

One of the biggest ones is code review in general. Our old code review process kind of looked like this: everyone was doing code review because it was commonplace, but no one was really worried about the large problems. It was mostly "Fix this whitespace" or "Here's our style guide, why don't you change this?", which never got to the meat of things. That stuff matters, but at the same time we were reviewing code from people adding functions to a 5,000-line singleton, so we were avoiding the real problem. To get around this in the new codebase, since we have a solid foundation, we've done a few things.

15:01: First, we introduced SwiftLint into our project. If you're not familiar with it, it's an open source linter for Swift, written by JP and some of the other folks at Realm. It lets us eliminate style comments from code review entirely, so all reviewers have to look for is important issues with interfaces, architecture, and the like, which has been a huge improvement. The second thing we introduced was a document we call “Gitiquette,” about how to write commits and pull requests so we're all on the same page. A lot of it is very intuitive, like "don't submit huge pull requests," but because we codified it, pull requests are all similarly formatted and reviewers can dive in and understand how things are supposed to work, which makes reviewing code way easier. The last and most important thing we've done is introduce a two-thumb system for pull requests: two different developers on the team have to approve every single pull request that goes through code review. I think this has helped immensely by getting people from different backgrounds and different technical skill levels to look at every piece of code going in. We all understand the codebase better than we would have before, and we catch more problems in code review, so we can fix them before they get in and become legacy code.
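
The two-thumb rule itself is simple enough to state as code. The sketch below is hypothetical: the `Review` type, the reviewer names, and the choice to count only each reviewer's most recent review are assumptions made for illustration, not how Lyft enforces the rule. It checks that at least two distinct teammates, other than the author, currently approve.

```swift
/// Hypothetical review record; reviews are assumed to be in submission order.
struct Review {
    enum State { case approved, changesRequested, commented }
    let reviewer: String
    let state: State
}

/// "Two-thumb" check: at least `requiredApprovals` distinct teammates, not counting
/// the author, currently approve. Assumption: only a reviewer's latest review counts.
func hasTwoThumbs(author: String, reviews: [Review], requiredApprovals: Int = 2) -> Bool {
    var latestState: [String: Review.State] = [:]
    for review in reviews where review.reviewer != author {
        latestState[review.reviewer] = review.state   // later reviews overwrite earlier ones
    }
    let approvers = latestState.filter { $0.value == .approved }.count
    return approvers >= requiredApprovals
}

let reviews = [
    Review(reviewer: "ana", state: .approved),
    Review(reviewer: "raj", state: .commented),
]
print(hasTwoThumbs(author: "kim", reviews: reviews))   // false: only one approval so far
```

Keying approvals by reviewer is what makes the thumbs come from two different people: one teammate approving twice, or the author approving their own change, never satisfies the rule.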

16:48: Of course, you can't ship a complete rewrite without problems. There were tons of small problems along the way, way more than we have time to talk about, but I want to mention two issues that could have been pretty catastrophic for us at Lyft. Both of them hinge on risk, which I've been talking about this whole time: if this project didn't ship, we'd go back to working on the old codebase. One of the biggest issues we noticed was that toward the end of the old codebase's life, the iOS team was very optimistic that the rewrite was going to ship, so most of the comments on PRs looked like this: "Well, this doesn't really matter, because in a month we're not going to use this codebase anymore, so we don't really need to fix this problem." We were very lucky that this worked out for us, but if we'd had to cancel the rewrite and go back to the old codebase, we would have been way worse off than before the rewrite started, which could have been a huge problem. The other problem, in the same vein, was timing. Like I said, we switched out the old codebase for the new one without a lot of people noticing, and it was really hard to figure out when to do that. Software estimation is extremely difficult. At Lyft specifically, a lot of employees outside the iOS team were very skeptical that we would ever ship, let alone on time, which was tough. We got very lucky, but it came down to a matter of days: people wanted to add features to the old codebase that we would have had to add to the new one as well, which could have pushed the whole thing back a long way.

I showed this graph at the beginning, and of course I'm going to compare it to the one from the new app. You can tell there's a huge difference on the surface. The complexity is spread out across more modular, reusable components, and no single thing owns everything anymore. The scale is kind of strange, but it's worth noting that no file in this image is larger than 300 lines, and the total line count went from 75,000 to 22,000 lines of code. That's a huge reduction in complexity.
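
The talk doesn't show what those smaller components look like. One common way to keep any single type from owning everything, sketched here under assumptions, is for each component to declare the narrow services it needs as protocols and receive them through its initializer instead of reaching into a global singleton. All names (`Coordinate`, `LocationProviding`, `RideRequesting`, `RequestRideViewModel`) are hypothetical and not taken from Lyft's codebase.

```swift
// Hypothetical model and service protocols; illustrative names only.
struct Coordinate {
    let latitude: Double
    let longitude: Double
}

protocol LocationProviding {
    func currentCoordinate() -> Coordinate?
}

protocol RideRequesting {
    func requestRide(from pickup: Coordinate) -> Bool
}

// A screen-level component that owns only what it needs, handed in through
// its initializer, instead of reading and writing one app-wide singleton.
final class RequestRideViewModel {
    private let location: LocationProviding
    private let rides: RideRequesting

    init(location: LocationProviding, rides: RideRequesting) {
        self.location = location
        self.rides = rides
    }

    /// Returns false without calling the ride service when no location is known.
    func requestTapped() -> Bool {
        guard let pickup = location.currentCoordinate() else { return false }
        return rides.requestRide(from: pickup)
    }
}
```

Because the dependencies are protocols, a unit test can hand the view model a stub location provider and a fake ride service, which is hard to do when every screen talks to the same 5,000-line singleton.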

18:32: The last thing I want to address is what I think is the most important question of this whole thing: was it worth it? Of course, I don't think there's a good answer to this, so it's kind of a cop-out, but I think it's really the only question that matters. I'll give you some reasons, because of course I think it was worth it. Short term, the code quality is so much higher. I showed that example of how much faster we can add features and how much easier the codebase is to work with, and I think that really matters. Long term, I think it should stand the test of time as well, because it would be very difficult to drift from the new app back to the kind of architecture we had in the old one. That should mean that years from now we can look back at this codebase and not have the kind of huge architectural problems that got us to this point.

Like I said, I think this is impossible to answer, but I think everyone at Lyft will look back fondly on this experience. I'm definitely happier working on this brand-new Swift codebase than I would be dealing with a lot of the crazy interop issues we've seen so far at this conference. I have a few minutes for questions; there are also five of us from the iOS team at our booth if you have questions that take too much time. Thanks.

Q&A: (19:10)

Q1: You did the rewrite of a mature product you'd been working on for a long time. There must have been a lot of bugs in the past that you fixed, and by the time of the rewrite, most people had probably forgotten the reasoning behind those fixes. Were there a lot of cases where old bugs came back in the new implementation? Was that a problem for you?

Keith: We definitely had some of those. We didn't really use the old codebase as a reference implementation for the majority of the work, which helped because we were looking at the problem with fresh eyes. But the big thing here was that we leaned heavily on the QA team to find problems like that, especially ones they had found before in regression testing, so we could fix them again.

Q2: Great talk, thanks for sharing. You changed two major variables at once: rewriting the whole app, and switching to Swift. How much of the rewrite's success do you attribute to each of those changes? What would have happened if you had rewritten it from scratch in Objective-C?

Keith: That's a great question. I still think we would have been way better off even if we had rewritten it in Objective-C. Being able to rewrite it in Swift was just the cherry on top: we're all much happier working in this new technology, even though it's much less mature, and we're hoping it'll do really well in the future. I definitely can't attribute the entire success to Swift; I think it was just a great side benefit. If this had been three years ago, I think we still would have ended up in the same place, but we probably would have rewritten it in Objective-C.

Q3: Could you list some of the tools you used to gather metrics about the before/after impact? You showed that visualisation, and it would be great to know what tools you used to measure that. And did you use any other metrics, like cyclomatic complexity, to measure the quality of the code before and after the rewrite?

Keith: I don't remember what the tool we used for the graphs was called, but I can find it and tell you. We didn't really use any other tools like that. You saw how different those two graphs were; it was pretty easy to look at how everything worked and tell there was a massive difference. The person who started the project gave a presentation after everyone moved over, along the lines of: this is the new architecture, this is how things work, we put views over here in this framework, and so on. So I think everyone got on the same page there, but we didn't use a lot of tools to measure that kind of thing.