Swift with a hundred engineers
Tuomas Artman shares the good and the bad from Uber's rewrite
All right, thanks for having me up here. I'm Tuomas Artman. I'm the tech lead for architecture and frameworks at Uber, which means that I have an awesome job. I get to work with probably the most talented people I've ever met in my life, on architecture, defining how Uber builds out its applications, used by millions of people.
Swift with a hundred engineers - motivation, architecture, learnings.
Today, I want to talk about what it means to be writing Swift code with a hundred engineers. You probably noticed that we released a new version of the rider main app, a week ago on Wednesday, and it has been fully rewritten in Swift. I want to talk about the motivation of choosing Swift, about the architecture- skimming over it very quickly, and most of the time I want to spend on the learnings that we got out of this big rewrite.
Uber's beginning- why a rewrite?
Starting four years ago this was the entire mobile team at Uber (*points to screen which displays a photo with three engineers*) and they built the foundation of what we use today, which is this application. This application has served us very well for the past four years. But as we've expanded and exponentially grown our mobile engineering team, we started seeing breakages of the architecture, we started seeing problems where feature development would become pretty hard. We had to test multiple code-paths because we shared a lot of view controllers amongst different teams. We really started to hurt with the old architecture because it was written by two engineers and we had grown the team to over one hundred, at that point. At the same time we saw that the UX of the product itself didn't scale either. We had launched in numerous cities and we started seeing problems where the product slider, down the bottom, would become pretty dense, because all the city teams wanted to launch new products in their city. So we wanted to at the same time do a full UX redesign on the rider application. Basically both of these problems, the architectural problems in the current application and the full redesign on the UX, led us to the decision that Ben told us yesterday (in his talk) we shouldn’t make, which is just: change everything. Start from scratch. Not try to look at the architecture and fix it going forward, but start from scratch.
We did, in 2015, do a lot of correction makeovers, we tried to salvage the architecture. But essentially with a full UX redesign we came to a place where it was just safer and more optimal to redesign our entire application from scratch.
Architectural goals for rewrite - reliability & supporting Uber's future.
So we started off with looking at our architecture goals. What do we want from this rewrite? And there were basically two that stood out. The first was four nines of reliability of Core Flows, which at the lowest level basically means crash-free rates. Obviously it means a lot more than that: if your application doesn't crash but the user is still stuck at some screen, that's not reliable.
We also wanted to support Uber’s growth for years to come. We wanted this architecture to be as long lived as the previous one looking forward four years.
The choice is Swift
So basically these two goals led us to choose Swift. We knew that Swift was more safe, or at least we assumed. Nobody had tried it out in production yet.
We thought that type safety in the compiler would help us catch problems early on and not through crashes in production.
And we knew that four years from now Swift would probably be ready for prime time and be the only language in which Apple moves forward.
So we set out a pretty aggressive time line. We started in the beginning of the year, in February. We wanted to do things right because we had engineers that had spent quite a bit of time doing rewrites in previous companies, and doing rewrites that had failed. We wanted to make sure that this one would succeed, so we took five months and we took the core engineers from the platform team and we started to look at the architecture, and we did nothing else for five months: architecture, frameworks, sort of getting the basics done, putting in linting. Writing out the frameworks that everybody would need and making sure that the base was perfect.
In June, we thought that we had a good architecture and we started on-boarding the Core Flow teams. Core Flow for us means taking a new uberX ride or an uberPOOL ride. So we added about twenty engineers to the bigger team and we spent two months with them trying to vet the architecture, trying to make sure that what we had come up with was actually suited to build a product upon. And it turned out that we had missed a few things. Like on the view side, once engineers had started doing transitions and complicated view manipulations we had to change the architecture a bit in order to accommodate their needs. But two months later we felt that we were in a place where we wouldn't have to do big migrations anymore on the codebase and we opened up the platform to everybody and we told everybody to port over their features if they wanted to.
Program teams at Uber work pretty individually, so we said: we want to launch in November, it's up to you whether you want to add your feature in or not, we will not tell you that you have to do it, we will not tell you when you have to stop supporting the old rider application. In the end some teams started immediately and used all the three months to get their feature over and enhance it, while others started in the last week before our launch and just barely made it in. But essentially we launched in November, a week ago, and it was a pretty successful launch which was exciting for everybody.
So I don't want to spend too much time on the architecture itself because I think that you will be best served with the learnings of what we learned using Swift with so many engineers. But I’ll give you a quick overview of our architecture.
So we call it “Riblets”, which stands for Router, Interactor and Builder, and possibly a Presenter and a View. Those are the core components of one piece of the application. It's sort of a take on VIPER. We looked at MVVM, we looked at VIPER, we looked at MVC and we came up with our own innovation on top of VIPER. The big thing that we wanted to do is obviously compartmentalize everything and make everything testable. So each of these components within a Riblet had protocol interfaces, so we could take one unit out and test it perfectly. All the Riblets themselves would be managed in a tree. So we didn't have a state machine, we had a state tree. Each of these boxes represents a Riblet, and the core piece in our architecture was that we didn't want it to be view-based, we wanted it to be business-logic based, and we wanted all the business logic decisions to be very local.
So in this tree if you look at, for example, the ‘signup’ Riblet. It doesn't know its parents, it knows that it has been injected with what it needs. Its dependencies have been fulfilled by its parents, there's probably a listener listening in on what the signup flow does, but it doesn't know where in the tree it is. So it is truly independent and every single piece makes local decisions. Starting, for example, with the ‘App’ component. The ‘App’ component is interested in only one business piece: do we have a session token or don't we have a session token? That's the only thing it listens in on. If the app component decides that we don't have a session token on the stream it routes to the ‘Welcome’ Riblet. If it, at some point, gets a session token it tears down that ‘Welcome’ component and goes to the ‘Bootstrap’ component.
After that, every single one on the right hand side of the tree knows that we are logged in. We have a token. They can use the token from the dependency injection and they don't have to care about a user logging out. If somewhere down the line a network call happens that invalidates that session token, the app component will know. It gets invoked via a stream, it knows that we don't have a session token anymore. It tears down the ‘Bootstrap’ tree and goes back into the ‘Welcome’ flow.
So this basically makes it possible for multiple teams to work on components individually without having to, sort of, talk to other teams. You can make your local decisions and you know that your dependencies are always being fulfilled.
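To make the routing decision above concrete, here is a minimal sketch of a root component that only cares about whether a session token exists. Every name in it (`Riblet`, `WelcomeRiblet`, `BootstrapRiblet`, `AppRouter`, `sessionTokenDidChange`) is hypothetical and chosen for illustration; the talk does not show Uber's actual types, and the real implementation reacts to a stream rather than a plain method call.

```swift
// Hypothetical stand-in for one node of the tree described in the talk.
protocol Riblet: AnyObject {
    func didAttach()
    func willDetach()
}

final class WelcomeRiblet: Riblet {
    func didAttach() {}
    func willDetach() {}
}

final class BootstrapRiblet: Riblet {
    // Injected by the parent, so everything below can assume "logged in".
    let sessionToken: String
    init(sessionToken: String) { self.sessionToken = sessionToken }
    func didAttach() {}
    func willDetach() {}
}

final class AppRouter {
    private(set) var child: Riblet?

    // Assumed to be called whenever the session-token stream emits a value.
    func sessionTokenDidChange(_ token: String?) {
        let next: Riblet
        if let token = token {
            // Already in the logged-in subtree with the same token: nothing to do.
            if let current = child as? BootstrapRiblet, current.sessionToken == token {
                return
            }
            next = BootstrapRiblet(sessionToken: token)
        } else {
            if child is WelcomeRiblet { return }
            next = WelcomeRiblet()
        }
        // Tear down the old subtree before attaching the new one.
        child?.willDetach()
        child = next
        next.didAttach()
    }
}
```

The point of the design is that only `AppRouter` knows about logging in and out; `BootstrapRiblet` and everything under it receive a non-optional token and never have to handle the logged-out case.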
A lot of files with a lot of lines of code
All of this created a lot of code. We had protocols between everything. We had the components that comprise a Riblet, which can mean five different files. So in our codebase we have over five thousand files and over half a million lines of Swift code, in addition to some Objective-C that we still have some of our core components in, and that's totally fine.
Lessons learned about Swift
Which brings us to sort of the lessons learned. Like; the good, the bad and the ugly things that we learned about Swift, that you should sort of look out for if you grow your team.
So starting with the good, obviously it's just a better language. You probably wouldn't be here if you didn't believe that. We pretty much use all of the language features that Swift gives us.
The first surprising thing was reliability. So it probably was like four months into the development of the architecture when I realized that; Hold on, I have not crashed my IDE or my application during the entire time. I asked my team and they said "Yeah, I haven't crashed my application either." And for five months even though we developed a fully new architecture, we just hadn't crashed, even in debug mode. The first one that we got was when we tried out 32 bit devices- we got an integer overflow when unpacking some JSON. That was the first crash that happened, during the entire development cycle.
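The 32-bit crash mentioned above is a common trap: Swift's `Int` is only 32 bits wide on 32-bit devices, and a conversion that does not fit traps at runtime. The talk does not say which JSON field was involved, so the following is only a sketch under that assumption, with `Int32` standing in for a 32-bit `Int` so the behaviour can be reproduced on any machine. `narrow` is a hypothetical helper name.

```swift
// Int32 stands in for Int on a 32-bit device.
func narrow(_ value: Int64) -> Int32? {
    // init(exactly:) returns nil instead of trapping when the value does not fit.
    return Int32(exactly: value)
}

// A millisecond timestamp (hypothetical JSON field) is far larger than Int32.max.
let timestampMillis: Int64 = 1_480_000_000_000

if let small = narrow(timestampMillis) {
    print("fits in 32 bits: \(small)")
} else {
    print("does not fit in 32 bits; keep it as Int64")
}
```

Decoding such fields into a fixed-width type like `Int64` avoids the problem regardless of the device.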
So we were pretty stoked about this and we launched a week ago. A fully new app. Obviously we had tested it internally and we had tried it out with our employees, but it was still sort of very… yeah, you know, I was scared when it went out, just to see how many crashes we would have. Our crash-free rate target is 99.99% and we are very close. I've never seen anything like this. First launch of an application and it's almost crash free.
The one thing that I have to take into account is you can't allow unconditional unwrapping of anything. Otherwise you won't have this crash-free rate. So we put linting in place to make sure that nobody unconditionally unwraps anything. You'll be caught and you can't submit that diff in, if you do it. That's probably the basis of having a decent application.
On the downside, you have to make sure that all these edge cases are somehow all caught. It doesn't help if you just put if-lets around everything and you don't handle the else case. Your application will probably just not work. So you have to use an assertion that hopefully fires in debug and enterprise builds, but not for the end user, and that will get you to a pretty good crash-free rate.
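As an illustration of handling the else case, here is a small sketch using the standard library's `assertionFailure`, which is checked in unoptimized (debug) builds and compiled out in optimized release builds. The `Rider` type and `greeting(for:)` function are hypothetical. Firing in enterprise builds as well, as described in the talk, would need a custom assertion helper, which is not shown here.

```swift
struct Rider {
    let name: String
}

func greeting(for rider: Rider?) -> String {
    guard let rider = rider else {
        // Loud in debug builds so the missing value gets noticed;
        // in optimized builds this line is a no-op and we fall back gracefully.
        assertionFailure("Expected a rider to be present here")
        return "Welcome"
    }
    return "Welcome, \(rider.name)"
}

print(greeting(for: Rider(name: "Ada")))
```

The fallback keeps the end user out of a crash, while the assertion keeps the unexpected state from going unnoticed during development.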
Android engineers are now more welcome
The other thing that we found was that; Android engineers are more welcome now. Especially if they write Kotlin. They just jump over and they continue writing code like nothing else. Our architecture was a multi platform architecture so we decided to do the same architecture for both Android and iOS. We named everything the same, everything worked in the same manner and Swift basically was the language that enabled us to do that. If we had done it in Objective-C, I don't think we could have come to such a close relationship with the Android engineers or even have the architecture be the same across both sides.
Now, to the bad things. These are the most interesting things, obviously: you learn through mistakes, you learn through adversity.
Testing is hard
The first thing that we found out is that testing is pretty hard. Swift is a static language and as such you can't really rely on the mocking frameworks that you had used in Objective-C. And because everything was protocol based on our side, we had to find a way to test these protocols. So for example here’s a protocol whose implementor provides a storing interface that lets you store data for a key and retrieve data for a key. Now if you have an interactor, some sort of business logic that you want to test, like if it gets a certain input I want to make sure that it stores something to disk. You have to have an implementation, you have to have a mock of this storing interface in order to test that one of these functions is called. We started with creating these mocks manually, started writing code and essentially decided that this is not scalable. We can’t support this for multiple engineers.
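The slide itself is not reproduced in this transcript, so the following is a reconstruction sketch rather than the code that was shown. The `Storing` name comes from the talk; the method signatures, the `StorageResult` cases, `ProfileInteractor` and `ManualStoringMock` are assumptions, and `Data` is used where the talk mentions `NSData`. It shows the manual approach: a hand-written mock that records calls so a test can check that the interactor stored something.

```swift
import Foundation

// Cases are assumed; the talk only says StorageResult is an enum.
enum StorageResult {
    case success
    case failure
}

protocol Storing {
    func storeData(_ data: Data, forKey key: String) -> StorageResult
    func dataForKey(_ key: String) -> Data?
}

// Hypothetical piece of business logic that depends only on the protocol.
final class ProfileInteractor {
    private let storage: Storing

    init(storage: Storing) {
        self.storage = storage
    }

    func didReceive(name: String) {
        _ = storage.storeData(Data(name.utf8), forKey: "name")
    }
}

// Hand-written mock: fine for one protocol, tedious for hundreds.
final class ManualStoringMock: Storing {
    private(set) var storedKeys: [String] = []

    func storeData(_ data: Data, forKey key: String) -> StorageResult {
        storedKeys.append(key)
        return .success
    }

    func dataForKey(_ key: String) -> Data? {
        return nil
    }
}
```

Every protocol needs a class like `ManualStoringMock`, and every change to the protocol has to be mirrored by hand, which is the scaling problem the talk describes.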
So what we did is we wrote a small script, which turned out to be a little bit of a bigger script. It had, you know, its own problems, but eventually I think we got it right, and now whenever you want to generate mocks for your protocols all you need to type is script/generate-mocks. It will go through your entire source code, search for these @CreateMock statements on top of your protocols, because we hope that Swift will at some point give us attributes, and it will create the mocks for you. So when you're running through the codebase this protocol becomes a StoringMock that implements storing. What it will do is implement all the functions that are public in that protocol. It will give you counters so you can count how many times these have been called. It will implement the actual functions for you and whenever possible it will return default types. So for example in dataForKey you've got an optional NSData and the mock just returns nil, because that's perfect. It conforms to the interface, and if you want to sort of test your input as well you can always call dataForKeyHandlers, set it with a closure, and you can in your test check that you get the right input from the piece that you’re testing.
Same thing for storageDataForKey: it returns a StorageResult, which is an enum, and we just, by default, return the first case in that enum. So that lets you very quickly start testing and create all those mocks. I think we have about 100,000 lines generated off these mocks. 100,000 lines that we didn't have to code by hand.
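A generated mock with the properties just described (call counters, optional handler closures, sensible defaults) might look roughly like the sketch below. This is not the output of Uber's script, which is not shown in the talk; the protocol signatures, the property names and the `StorageResult` cases are assumptions made for illustration.

```swift
import Foundation

enum StorageResult {
    case success   // assumed to be the first case, so it becomes the default
    case failure
}

protocol Storing {
    func storeData(_ data: Data, forKey key: String) -> StorageResult
    func dataForKey(_ key: String) -> Data?
}

final class StoringMock: Storing {
    // One counter and one optional handler per protocol function.
    var storeDataCallCount = 0
    var storeDataHandler: ((Data, String) -> StorageResult)?
    var dataForKeyCallCount = 0
    var dataForKeyHandler: ((String) -> Data?)?

    func storeData(_ data: Data, forKey key: String) -> StorageResult {
        storeDataCallCount += 1
        if let handler = storeDataHandler {
            return handler(data, key)
        }
        // Default: the first case of the enum.
        return .success
    }

    func dataForKey(_ key: String) -> Data? {
        dataForKeyCallCount += 1
        if let handler = dataForKeyHandler {
            return handler(key)
        }
        // Default for an optional return type.
        return nil
    }
}
```

In a test, the counters answer "was it called?", and a handler closure lets the test inspect the arguments or return a specific value.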
On the other bad side, tooling issues. We call it “infinity indexing”. I don't know, you've probably seen it. The indexer keeps on going. I tried to spend two days with it and it just doesn't complete on our project for some reason. At the same time as a bonus it gives you a high CPU usage at 328%, your laptop becomes hot and you can use your laptop for maybe one and a half hours without it being plugged in. So very strange things, and this became more of a problem the more our code grew. We didn't have these problems before, but once we went over 200,000 or 300,000 lines of code- this started to be a big problem.
Also, the IDE started doing this: (*screen displays video of Xcode, with a string slowly being typed in*). This is not me typing slow, this is me having typed the entire string already but the IDE is for every single key stroke checking with SourceKit whether I’m writing correct code, and typing just becomes impossible.
So what can you do about it?
Well, you can do this (*screen displays video where Xcode is deleted*), if you feel like it. You can switch over to other applications. You can use AppCode; a few of our teams switched over to AppCode. Some have a flow where they write the code in AppCode and then they copy and paste it back to Xcode and compile. Very weird things. You can contribute back to Nuclide, which is Facebook’s IDE. It doesn't support Swift yet, but you know, if you contribute maybe you can add it.
Basically what we did is; we added more frameworks. So we split up our application to multiple frameworks, which means each framework has less files, which means that everything becomes fast again. It seems that the more files you have in a framework the more tooling issues you will have.
So we initially had already the architecture defined, that we had multiple frameworks. I think we have 70 or 80 in total. So this became pretty easy for us to split it up even more. Obviously you can do this, you can just disable indexing, if you feel like just writing on grayscale text and not having code-completion. Which a few of our folks did as well.
The next bad thing; binary size. So any application’s budget is 100MB. After that you have to download it over WiFi. There are a few things that go into this. First, you have to be aware that structs can increase your binary size. If you have structs into lists they are created on the stack and they can increase your binary size. Initially we had all of our models as structs and the binary size implication was something like 80MB. Which wasn't really that good.
Optionals usage will increase your binary size as well. You will be using optionals, but the thing you don't know is that the compiler has to do a lot of things: it has to do checking, it has to do unwrapping. So even though it's just a one-liner for you with a question mark, you get a lot of size in your binary.
Generic specialization is another problem that we encountered. Whenever you use generics, if you want your generics to be fast, the compiler will specialize them and give you quite a bit of binary size increase, as well.
And the Swift runtime libraries need to be included in your application. Everybody says that it's like 12-20MB. The actual download size, for us at least, is 4.5MB because they compress well because they haven't been encrypted. So 4.5MB in the actual download, for all three architectures, including the Watch app, is what we measured.
So what can you do about it?
Well, you can play around with optimization settings. You can turn on whole module optimization; sometimes it leads to smaller binaries, often it leads to larger ones. The most important thing is that you need to know where you are spending all the budget, and we wrote a tool for that. So we went ahead and mapped every single symbol to a file, and then we combined all those files and created this nice tool that lets you browse into basically the folder structure of your application, look at every single Swift file, and get the size that it contributes to the application.
If you want to see this open-sourced, just scream out loud. I think we have the engineer here who wrote this, and we are looking into open-sourcing many of the findings that we found.
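The tool itself is not shown in the talk. As a rough sketch of the aggregation step only, assume the symbol-to-file mapping has already been extracted into a list of (file path, byte count) entries; the code below then rolls sizes up so every folder shows the total of what it contains. `SymbolEntry` and `sizesByPath` are hypothetical names, and the sample paths and numbers are made up.

```swift
struct SymbolEntry {
    let file: String   // path of the source file the symbol came from
    let bytes: Int     // size the symbol contributes to the binary
}

// Returns the total size for every file and every ancestor folder.
func sizesByPath(_ entries: [SymbolEntry]) -> [String: Int] {
    var totals: [String: Int] = [:]
    for entry in entries {
        var path = ""
        for part in entry.file.split(separator: "/") {
            path = path.isEmpty ? String(part) : path + "/" + String(part)
            // Credit the bytes to this folder (or, on the last part, the file itself).
            totals[path, default: 0] += entry.bytes
        }
    }
    return totals
}

let sample = [
    SymbolEntry(file: "Models/Trip.swift", bytes: 100),
    SymbolEntry(file: "Models/Rider.swift", bytes: 50),
    SymbolEntry(file: "Views/Map.swift", bytes: 30),
]

for (path, bytes) in sizesByPath(sample).sorted(by: { $0.key < $1.key }) {
    print("\(path): \(bytes) bytes")
}
```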
The next bad thing: startup speed. This is pretty interesting because if you saw the WWDC talk the take away from that was: if you want a fast startup, just use Swift. Well, there's sort of a reality distortion field going on there once again, which is awesome. The problem is that the number of dynamic libraries in your binary directly, linearly affects the startup time that you spend in pre-main. So startup time is pre-main and post-main. Pre-main is what happens before your main function is called, and a lot of stuff happens there when you have a lot of dynamic libraries, and a lot of time is spent.
For example, the Swift runtime libraries on an iPhone 6s take 250 milliseconds to do their thing and that means that’s 250 milliseconds that you just can't get back by using Swift. So that's kind of a bummer.
And we saw that the tooling issues that we faced are fixed by creating more frameworks and the more frameworks you have the slower your startup speed is.
So what can you really do about it?
Well you can relink everything back into your binary and that's what we did. So we built all these frameworks and then we have a post-build step that takes all the symbols out of these frameworks and links them together in your static binary and that's how we get away with the startup speed that we had.
Without this our application would probably launch in something like 4-5 seconds on iPhone 6's. Probably even slower on iPhone 4s’. With this trick we were able to take it down. You have to test all the time; if you care about startup speed you have to start early. You can’t really rely on much of the tooling that Xcode gives you, like when it tells you how long you're spending in pre-main. Those figures are just plain wrong. They have nothing to do with reality.
You also will have issues with enterprise provisioning profiles. If your device has enterprise provisioning profiles, your application might take ten seconds to load depending on how many provisioning profiles you have. And we had some interesting bad devices as well. We have two devices that for some reason just run very slowly. iPhone 6 devices that we weren't able to figure out why they run 10x slower than everybody else's.
While you are doing all this relinking you can actually do something to improve your post-main time as well.
So one thing that we are trying out right now is to use DTrace to probe which symbols are accessed in the startup sequence. And because we do this relinking, we make sure that we link them in the correct order, so that on older devices you don't have to load too many pages into memory. Instead you will sort of have the optimal set of pages that you read into memory during your startup. And the preliminary tests that we ran said we improved post-main time by as much as 20% on an iPhone 4s, just by putting this in.
Which brings us to the ugly. You saw this yesterday and you saw this a year ago if you were at Swift Summit. Compile speeds are just horrible and we started having real issues: our base application, clean build, was something like 15 to 20 minutes.
We were pretty concerned so we asked everybody on the team; how big of a problem is this?
And the way we said it is:
This is the figure, pretty half and half:
Half of the people would, with all of its faults, all the tooling issues, all the compile speeds, stick with Swift. The others would switch back.
We added another question there:
“If there was one or two things that you could change about Swift, would that make you change your mind?”
This is what happened:
So we were like "Well this is actually a problem that we can solve. If it's only about compilation speed we can do something.”
Solving the compilation speed problem
So we started figuring it out. The first thing is that we can contribute back to Swift and you can as well. We tried out to not use type inference in our code and we looked into building a post-build script that would use SourceKit to figure out all the types and then just change your code to have all the type information in them.
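To show what "not using type inference" means in practice, here is a small sketch of the same computation written twice: once leaning on inference and once with every type spelled out, which leaves the type checker less to work out. The function names and values are made up, and the talk gives no measurements for this technique, so no speedup is claimed here.

```swift
// Relies on inference for the array, the accumulator and the closure.
func totalInferred() -> Double {
    let fares = [12.5, 7.25, 3.0]
    return fares.reduce(0) { $0 + $1 }
}

// Same logic with explicit annotations everywhere.
func totalExplicit() -> Double {
    let fares: [Double] = [12.5, 7.25, 3.0]
    let total: Double = fares.reduce(0.0) { (sum: Double, fare: Double) -> Double in
        return sum + fare
    }
    return total
}

print(totalInferred(), totalExplicit())
```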
Lastly, we started combining files, and we found out that combining all of our 200 models into one file decreased the compilation time from 1min35sec to just 17sec. So we were like, "Hold on, this is interesting, combining everything into one makes it much faster." The reason for this, as far as I know, is that the compiler does type checking for every single file. So if you spawn 200 Swift compiler processes, each one needs to check all the other files to make sure that you're using the correct types, and that work gets done 200 times. So combining everything into one makes it much faster.
And we stumbled upon, well actually Airbnb stumbled upon, this one trick and shared it with us like 2 weeks ago. Which is this (*points to screen*), which is very interesting. Whole module optimization does exactly what we want. It compiles all the files in one go. The problem with whole module optimization is that it optimizes, so it's pretty slow. But if you add a user-defined custom flag, SWIFT_WHOLE_MODULE_OPTIMIZATION, and set it to YES, as well as set the optimization level to none, it will do whole module optimization without optimizing. And it'll be super fast.
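Written out as build settings, the trick described above would look roughly like this. The user-defined setting name is the one given in the talk; `SWIFT_OPTIMIZATION_LEVEL` with the value `-Onone` is Xcode's setting for "no optimization". Placing them in an .xcconfig file is an assumption for illustration (the file name is hypothetical); the same two values can be set in the target's build settings editor instead.

```
// Debug.xcconfig (hypothetical file name)
// Compile the whole module in one go...
SWIFT_WHOLE_MODULE_OPTIMIZATION = YES
// ...but skip the optimizer.
SWIFT_OPTIMIZATION_LEVEL = -Onone
```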
If your application is broken up into frameworks this is what you want to do, at least for now until Apple hopefully fixes the speed. So we implemented this; we actually landed it over the weekend this week. And we went from 20 minute build times to 6 minute build times. Our largest framework, which was basically the Core Flow, which had 900 files, previously compiled in four minutes and now it’s 23 seconds. You lose the ability to do incremental builds, but with 23 second build times on the biggest library, I don't really care. Most of the other targets have far fewer files and will really go much faster.
Uber is contributing to Facebook's 'Buck' and adding Swift support
But we can do something else as well, because once you do whole module optimization your CPU usage goes down to 30%. And we were like; "Oh all right, we can, you know, do something more now." We had used Buck on the Objective-C side. Buck gives you superior dependency management, reliable incremental builds and a remote build cache. It's a build system by Facebook. You should look into this if you have problems with build times. We did that previously for our Objective-C and Android builds and we had a 4x faster clean build rate. Our incremental builds were 20x faster because it used a remote build cache. So if you're compiling multiple targets and somebody else has compiled that code on some other machine it will be available in the remote build cache and it will just use that artefact. So it won't recompile anything. On Android it’s even faster, like 6x faster clean build time, and incremental builds are just blazingly fast.
It’s not for Swift, but we are working on it. So we have been contributing back to Facebook, with Swift support. We started by adding Swift support for Xcode project file generation. This is available, I think today, or about to land. We use it internally already so you can tell Buck to create the project file for you, based on your folder structure.
Next up we are working on adding Swift support for Buck builds, which means we can use Buck to build our application. Lastly we want to look into integrating our Swift support for Buck into Xcode as well. So when you hit cmd + B, it will not use Xcode build, but it will use Buck to build.
We think that we can go from the 6 minute build times that we have today to maybe 2 minutes or even less, with Buck. Which will essentially solve the problem of Swift compile times. This is all available just follow the Buck repo and you'll eventually see Swift support getting in.
So the take aways from this.
Look out for:
Figure out how to unit test.
And start using Buck if you feel that your team is growing.
Maybe the positive message here is that you will probably not run into problems if your team is small. If your team is large you will run into these problems but you will probably have some engineers to help you out, to figure these things out, because there are solutions to all of these problems.
So, that's it.
Follow us on uber.github.io for open source projects.
And eng.uber.com for some blog posts we will be doing about the architecture.
That's it- Thanks.