How to make an Objective-C API more Swift-like.
Hey everyone. I got a bit of a cough so I apologize in advance if you're hearing that. I heard it's Halloween so I wanted to do a Keynote presentation rather than Deckset. I thought of something scary, but actually monads are not scary at all. This is not what my talk is about.
Really I want to talk about ... Whoa, why does it not ... I don't even know how to operate. This is not scripted. I want to talk about retain cycles, because they are scary. This is also not what my talk is about. This is my book, I'm going to quickly note. We're going to do live coding, which is scary.
Let's go to Xcode and create a new project. We're going to call it Structs and Classes. Hopefully you can all read it if I open the main.swift file, this is readable right? Yeah, cool. So let's delete the comments and run this just to make sure that it's working. You can see “Hello, World” at the bottom ... Let me see if I can move this part, make it a little bit bigger. Awesome.
The title that you saw in the program was “Swift Interop”, and what I want to talk about is how we can interop with Objective-C. We've seen this yesterday in great talks, and specifically what I want to talk about is how we can write a struct wrapper around objects. I think this is very important. This is how to make an Objective-C API a little bit more Swift-like.
Let's have a look at some structs before we start doing this. Let's create an array and put 1, 3, 2 in there. We created this array using “let”. That means that this is an immutable value, because arrays in Swift are structs. If you create a struct using let, it's immutable. This is a big advantage because if you see a struct defined using let you know that it is never going to change again. This array will always be 1, 3, 2.
For example, if we want to write something like array.append, and append the number zero to that array, the compiler will not let us. It says, "Hey, you cannot do this. This is wrong." If you click on the red dot, it proposes a change. If I press enter, it changes the let to a “var”. This means that now we define the array as mutable, and now we can append things to the array. It's sort of magical.
There's other things that we can do. For example, we can say array.sortInPlace, and then let's print it. The sorting in place changes the array in place, so it's mutating the array. This is how we could sort, and if we define it using let again and I change this back, you'll see that the compiler warns us, and it says, "Hey, this is unacceptable, you cannot do this."
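As a minimal sketch of the let versus var behavior just described, here is the array example in current Swift syntax, where the talk's `sortInPlace` is spelled `sort()`. The variable names `fixed` and `numbers` are illustrative, not the ones typed on stage.

```swift
// An array bound with `let` is immutable: its contents can never change.
let fixed = [1, 3, 2]
// fixed.append(0)   // does not compile: `fixed` is a `let` constant

// Binding the same value with `var` makes it mutable.
var numbers = fixed
numbers.append(0)
numbers.sort()       // sorts in place, so it requires a `var`

print(numbers)       // [0, 1, 2, 3]
print(fixed)         // [1, 3, 2]
```

Note that `fixed` is untouched: assigning it to `numbers` produced an independent value.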
This feels maybe a little bit magical and to see how this works under the hood, we're going to create our own structs. Let's create a person struct. I delete all of this. Inside this person struct, we're going to have 2 properties. We're going to have a name, which we define using let, which means that it will never ever be changed. Then we're going to have an age, which is defined using var. This is something we can change.
Let's create a new person with a name and an age. We can print this. This is how you can create a struct. This is hopefully still very basic, and very simple. However, if you remember what I just said, a struct defined using let is not changeable, so it's immutable. What we, for example, would want to do is say gaga.age++, and increase the age by one. The compiler will not let us. We cannot change any of the fields of the struct. This is illegal. Because we define it this way, the compiler will again help us, and if you press enter, it changes it to a var. Now we can change the struct. This is very, very nice behavior. We can never change the name again, because it's defined using let.
Another really cool thing about structs is that we can copy them. They have value semantics. If we create another copy of this saying theRealGaga, it copies the struct field by field. Now what we can do is change the age of one, and then print both of them. What should happen, is because they get copied field by field, the Gaga variable will have an age of 30 because we increased it, and the real Gaga, because we copied it at this time, it will still have an age of 29. Let's run this, and hopefully this will work. Cool.
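The person example might look roughly like this in current Swift, where `age += 1` replaces the `age++` used in the talk. The exact spelling of the variable names is an assumption based on the transcript.

```swift
struct Person {
    let name: String   // can never change after initialization
    var age: Int       // can change, but only through a `var` binding
}

var gaga = Person(name: "Lady Gaga", age: 29)
let theRealGaga = gaga   // copies the struct field by field

gaga.age += 1            // only the `gaga` copy changes

print(gaga.age)          // 30
print(theRealGaga.age)   // 29
```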
This is very useful behavior, this copying. It works everywhere. It works when you define variables, and it also works when you call functions. Let's create a function, celebrateBirthday. This function gets one parameter, the person. What we want to do inside this function is increase the age, and then print it. We just say person.age++, and then print “Happy birthday”, and then print the age, that's the person.age.
Now we’ve got a problem. Again, the Swift compiler warns us, you cannot modify this person. This is really because when you write a function, what you're really writing implicitly for the parameter is this, let. Any parameter in a function is automatically a let parameter. If I delete this again we can look at the fix and Xcode just helps us again. Whoa, it should have helped us. It inserts the var. I think it was because some kind of selection. It inserts the var, and now inside the body of the function we can change this person.
If you're used to functional programming you might think this is very nasty. Why does it put var there and what does this var do? Does it really change the person? Let's see how we can call this. I'll delete this, the realGaga, and just call the function. If it's true, what I just said, then what happens if you call this function, a copy gets made. This gets called with the copy, and then inside of the function we can change the copy, but outside it still stays the same. It makes a copy of the struct. What I expect to happen is that it prints happy birthday 30, and then it prints the value of 29.
Let's run this. That's cool. It worked. If we look at the type of celebrateBirthday, I'll zoom in a little bit, you can see that the var is not there in the type. This var even though you define it inside the function definition, or sort of here, it's only internal. It only means that inside the function you can change it.
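A sketch of the same experiment in current Swift. The `var` parameter syntax shown in the talk was removed in Swift 3, so this version shadows the parameter with a local mutable copy, which has the same effect: the change is only visible inside the function.

```swift
struct Person {
    let name: String
    var age: Int
}

// Parameters are implicitly `let`. To change the value inside the
// function we take a local mutable copy (the modern replacement for
// the `var` parameter from the talk).
func celebrateBirthday(_ person: Person) {
    var person = person
    person.age += 1
    print("Happy birthday", person.age)
}

let gaga = Person(name: "Lady Gaga", age: 29)
celebrateBirthday(gaga)   // prints "Happy birthday 30"
print(gaga.age)           // 29, the caller's value is untouched
```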
As we all learned, having these top level functions is not something Swift designers like, so we need to create an extension. Let's do that. Let's move the stuff inside an extension on person. Indent it nicely. If we create an extension like this, we don't need to pass the person manually anymore. We can delete this part. Now instead of using person, we get this implicit value called self, so we can write self.age++, and print self.age, and because it's Swift, we don't really need to write the self., so we can remove this part, and remove this part.
Now there is again some kind of problem. The compiler complains, "I cannot change this." This is because we're dealing with structs. You cannot just change the struct, not even inside a method. We want some kind of mutating version of self, some kind of variable self. Again, we can use the compiler's help and it'll insert the word mutating. This is also how it works in arrays, so the array append methods and the sort in place methods are marked mutating. That's why you can only call them on variables and not on let.
Let's verify this. We can now call gaga.celebrateBirthday, and this all works. Now of course because it mutates the struct, Gaga is now 30, so the age is now 30. Just to make sure, if we write let here, the Swift compiler says, "Hey, I cannot do this." You're going to mutate the thing, you define it using let, not going to work. That's how we can write structs. I'm going to delete all of this once more, and we're going to work with objects now. We know structs, let's do objects.
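The extension version could be sketched like this in current Swift; it is a reconstruction from the description rather than the exact code typed on stage.

```swift
struct Person {
    let name: String
    var age: Int
}

extension Person {
    // `mutating` makes `self` writable inside the method, which is why
    // the method can only be called on a `var`, never on a `let`.
    mutating func celebrateBirthday() {
        age += 1
        print("Happy birthday", age)
    }
}

var gaga = Person(name: "Lady Gaga", age: 29)
gaga.celebrateBirthday()   // prints "Happy birthday 30"
print(gaga.age)            // 30

// let fixedGaga = Person(name: "Lady Gaga", age: 29)
// fixedGaga.celebrateBirthday()   // does not compile: `fixedGaga` is a `let`
```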
I want to create some sample data, and this is going to be some NSData. As we learned yesterday from JP's talk, this is a great way to store bytes, it's just not very Swift-like. Let's put “Hello, World” in there. Then we can write dataUsingEncoding. I don't know what happened to my auto-complete. Yeah, it's coming back slowly. Yeah, there we are. This dataUsingEncoding takes a string and creates NSData out of it. I use the exclamation mark because it's optional normally, but I know this is always going to work. Let's print it, and then hopefully we'll see the bytes on the screen.
Here we have some bytes. This is great. There's another class called NSMutableData. This is cool because in Objective-C and in Cocoa the designers already had the foresight that mutable data is bad, or at least dangerous, and if we can have immutable data that's great. That's why there's NSData and NSMutableData. We can create myData and we can just do it like this and say NSMutableData. This creates an empty NSMutableData. We can now append to it. For example we can say appendData and append some sampleData. Then we can print myData. This should print exactly the same thing.
In Cocoa this is very common, so there's NSArray and NSMutableArray, there is NSString, NSMutableString, NSDictionary, NSMutableDictionary, and the same for data. There are two main problems that I have with these objects. First of all, it doesn't have copy semantics, so if I write empty, create a new variable, and say this is myData, and now we print both of them. If it was a struct, I would expect that this empty is still empty even though we only append to myData. That's not how objects work. If we see this, they are both the same thing. What happens is if we define this let, it does sort of copy myData variable, but it doesn't copy the object, it copies the reference to the object. This is going to be important. We have a copy of the reference, but it's still pointing to the same object. When we call appendData on my data, we're modifying the same object for both variables.
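To make the missing copy semantics concrete, here is a small sketch using current Foundation spellings (`data(using:)` and `append(_:)` instead of the `dataUsingEncoding` and `appendData` seen in the talk).

```swift
import Foundation

let sampleData = "Hello, World".data(using: .utf8)!

let myData = NSMutableData()   // `let` fixes the reference, not the object
let empty = myData             // copies the reference, not the bytes

myData.append(sampleData)      // mutates the one shared object

print(myData.length)           // 12
print(empty.length)            // 12, "empty" is not empty any more
print(myData === empty)        // true: both names point at the same object
```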
That's not so cool. There is an even bigger problem which I think, at least, is a big problem. Let's say we want to write some function processData. This function takes ... Let's see where my cursor is. This takes some immutable data, so it takes an NSData variable. NSData, typing is hard today. It takes some NSData, and what we want to do is we want to implement this function and it's really important for this function to work correctly, that our data is not empty. We can write a guard statement, and we can say data.length > 0, otherwise we return and bail. Now here we start processing the data.
The problem with this function is that NSMutableData is a subclass of NSData. Wherever we need an NSData we can pass in an NSMutableData. Let me delete some of these lines and call processData with myData. This works. This is, in a way, very scary. What if we would have called this on a background thread? We do dispatch_async and we call processData, and then in the meantime we start changing the data, and append some sample data for example, or even replace some bytes. Maybe we delete all of the bytes that are in the data. If this processData function starts working on some kind of background thread, and it goes through this guard statement, and then all of a sudden another thread starts working on the same data, we might have a broken implementation. Everything might crash.
We get all of this implicit sharing when we're dealing with objects, and this is, in my opinion, very scary. It's a great source of bugs. A lot of Objective-C developers know about this and that's why, for example, in Objective-C we always have to define our properties using the copy attribute, and we always need to copy things.
Let's do that. What we want to do really, is let's rename this input parameter to something like input data, and now we're going to say let data = inputData.copy, and because it's some old API we need to force cast it. Now we have inside our function a unique copy of the data. This is the only way we can be completely sure that we have our own copy and that nobody will ever change it again. There's two ways to deal with it. Now what we did is we sort of moved this responsibility to our function and we make sure inside the function that we have our copy. You could also move that responsibility to the caller of the function, but it's easier to make mistakes.
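A sketch of the defensive copy inside the function. The return value is an assumption added so the effect is observable; the talk does not say what `processData` returns, and the actual processing step is left as a placeholder.

```swift
import Foundation

// Hypothetical version of processData: it returns its private copy so
// that we can inspect it afterwards.
func processData(_ inputData: NSData) -> NSData? {
    // Take our own immutable copy first, so later changes by the caller
    // (or by another thread) cannot affect what we work on.
    let data = inputData.copy() as! NSData
    guard data.length > 0 else { return nil }
    // ... real processing would happen here ...
    return data
}

let shared = NSMutableData()
shared.append("Hello, World".data(using: .utf8)!)

let processed = processData(shared)   // NSMutableData is accepted as NSData
shared.length = 0                     // the caller wipes its buffer afterwards

print(processed?.length ?? 0)         // 12, the copy is unaffected
print(shared.length)                  // 0
```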
This is how you would do it if you come from a Cocoa world. I think it's not so nice, because with the structs, as we saw, we got this copying and stuff all for free. That's really what I want to have. Especially for classes like NSData, it would be nice to get this copying for free, and not have this opportunity for making mistakes. Let's delete all of this again, and let's write a struct.
We're going to write a struct Data, and inside the struct we're going to store some mutable data. This is private because we only write some things that operate on it. We can just start with an empty mutable data, actually we don't need the type, and we need to write an initializer. This is also fairly simple. Maybe we call it otherData, and it's NSData, and because in the initializer nothing is shared yet, we can just say data… Let me fix the typo. So we can just say data.append, and append the otherData. Now we can provide the default value, this is all still very standard, simple Swift. Now we can create data structs. We can say something like, myData = Data, and we can have an empty data and print it. This works, and we can also put something in there, so this works. We can put sampleData in there, and that works. We have a struct, we store data, but we don't do anything interesting.
I want to write 2 methods. The first one is to get bytes out of the data, and the other one is to append something to the data. Let's do that. For the bytes we can actually define a variable, because this is much nicer in Swift. It's going to be a UInt8 array, as we've seen yesterday in JP's talk. There's a function on NSData called getBytes and it has this complicated type signature. It wants a buffer and a length. Well, the length is easy, we can just say data.length. For the buffer it needs some kind of array of UInt8's.
In Swift, actually, we can do some cool stuff with that. What we can do is, we can create an array of UInt8's. What we want to do is, we want to have the right size already, we want to have an array filled with zeroes so that NSData can copy the data in there. We want to have an array with the same length as the data, and we want to fill it with zeroes. Now, what we can do, and this is really cool, I think, about Swift, is that we can just say ampersand result, and it creates an unsafe mutable pointer out of that array. Then we can return it.
What we did is we wrapped this up in a nice simple property. You need to do this once if you're wrapping objects, but then from there on everybody in Swift can just say myData.bytes and print it. Let's run this. Yeah the bytes are there. That's cool, I think. This is in a way the easy part because it's only an accessor for reading bytes out.
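A rough sketch of the wrapper with its `bytes` property, in current Swift. The struct is named `MyData` here, which is an assumption made to avoid clashing with Foundation's own `Data` type (that type did not exist when the talk was given), and the initializer takes a `Data` rather than an `NSData` for the same reason.

```swift
import Foundation

struct MyData {
    // Private: only the wrapper's own methods touch the object.
    private let data = NSMutableData()

    init(_ otherData: Data = Data()) {
        // Nothing is shared yet inside the initializer, so appending
        // directly is safe.
        data.append(otherData)
    }

    var bytes: [UInt8] {
        // Pre-size the array with zeroes so NSData can copy into it.
        var result = [UInt8](repeating: 0, count: data.length)
        // `&result` hands the array's storage to getBytes as a pointer.
        data.getBytes(&result, length: data.length)
        return result
    }
}

let myData = MyData("Hello, World".data(using: .utf8)!)
print(myData.bytes)
print(MyData().bytes)   // []
```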
Let's write the append. We're going to create a function append, and it's going to get some data. It just appends to the data “other”. Okay, so now what we can do, let's start with an empty data, and then append something. Let's start with an empty data and then append sampleData. Let's run this. Oh no, it compiles, this is not good. We defined myData using let, it's a struct, why can we call append? This is not what we want, this is not the kind of semantics that we want. Again, here we need to think about how objects work in Swift, because this append function, it really needs to be mutating. The reason why the Swift compiler doesn't see it, is because we're not actually mutating the reference called data. We're mutating the object that the reference is pointing to. Swift will not notice. Objects are always mutable and Swift doesn't help you here.
What we really need to do is to write mutating. Now we get the error that we wanted and if we make it a var, it will work. The method was correct, it was just not mutating, because it was mutating objects, and not the reference. So far, so good. What about if we make a copy? Let's say we have myData, it's still empty at this point and we create a new empty data and we just copy, just like we did with structs. Then we append something to the myData, and then let's print both of them. What I expect is that the myData prints all these bytes, and then the empty one prints an empty array, because there are no bytes.
Oh no, we made a mistake again because what's happening here ... The moment Swift creates a copy of the structs, it copies it field by field. We've seen this with the person, it copied both fields. Here there is only one field and that's this data field. What Swift does, it copies it, and because we're dealing with objects it copies the reference and it doesn't copy the actual object. Swift doesn't know how to do that. Both of them have a different reference, but they're pointing to the same object. That's why when you're dealing with these objects inside structs you need to do some more work and take some extra steps.
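Both pitfalls can be reproduced in a few lines. The type name `SharedWrapper` and its members are hypothetical, chosen only for this illustration.

```swift
import Foundation

struct SharedWrapper {
    let storage = NSMutableData()

    // Not marked `mutating`, yet it changes the object behind `storage`.
    // The compiler accepts this because the reference itself never changes.
    func append(_ other: Data) {
        storage.append(other)
    }

    var length: Int { return storage.length }
}

let first = SharedWrapper()
let second = first   // copies the single field: a reference

first.append("Hello, World".data(using: .utf8)!)   // compiles despite `let`

print(first.length, second.length)   // 12 12, both see the appended bytes
```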
The way to make this work correctly is, inside the append we need to copy the object just before we append. What we need to do is something like data = data.mutableCopy and then force cast it. Sorry NSMutableData, and now hopefully we get the correct behavior. Swift will help us again and say, "Hey, I cannot change this data because you defined it using let. Make it a var." Then we make it a var and then this works. Now we also have this thing that if we would remove mutating, it wouldn't compile anymore, because now we're actually changing the variable. Let's see if this works. Let's run it. Hopefully now we get the correct behavior. Yay! We got this data, we made a copy, and then only once we start appending it makes a copy of the object underneath. Now we have this struct-like behavior. This also works when you're calling functions, and we get copies everywhere. We don't need to worry about this anymore.
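A sketch of this "copy before every append" version in current Swift. As before, `MyData` is an assumed name and the methods take Foundation's `Data` for convenience.

```swift
import Foundation

struct MyData {
    // Now a `var`, because append replaces the reference.
    private var data = NSMutableData()

    init(_ otherData: Data = Data()) {
        data.append(otherData)
    }

    var bytes: [UInt8] {
        var result = [UInt8](repeating: 0, count: data.length)
        data.getBytes(&result, length: data.length)
        return result
    }

    // Must be `mutating`: we assign a new object to `data`.
    mutating func append(_ other: Data) {
        // Copy the object first so no other struct value sees the change.
        data = data.mutableCopy() as! NSMutableData
        data.append(other)
    }
}

var myData = MyData()
let empty = myData   // shares the object until myData is written to
myData.append("Hello, World".data(using: .utf8)!)

print(myData.bytes.count)   // 12
print(empty.bytes.count)    // 0
```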
There's still one problem. Even though this is “functionally” correct, it's very expensive. To see why, let's make a for loop and append something 10 times. This is how you can loop over something 10 times, and then let's print, instrument this program a little bit and say “making a copy”, just to make sure that we know when a copy is made. If we run this, and append something 10 times, now we have a very large data string, and we have 10 times the printout making a copy. Really, this is not very efficient. The first time when we create this empty struct, we wanted to make a copy. Then in this for loop, we're not really sharing this NSMutableData with anyone. We would like this to be super efficient, and not make all these copies all the time. That's a bit painful.
What we can do there is use a function called isUniquelyReferenced, and this function is going to check if an object is uniquely referenced by something. What we want to do is, if it's uniquely referenced, so basically nobody else is using this NSMutableData, then we just want to append and not make a copy. The function, we can use it like this, isUniquelyReferencedNonObjC, and then we can call it with the data. If it's uniquely referenced, we only want to append. Otherwise we make the copy, because we're sharing it. This is not going to work I think. Let's run it.
It's almost correct. This is not going to work because of the function name. If we scroll up, it still makes 10 copies. The function name has non-Objective-C in it, so this only works in Swift classes, it's not going to work in Objective-C classes, it's just always going to return false. There's no nice way of doing this, so we just have to suffer through it and do it the hard way. What we need to do is, we need to somehow take this NSData and make it into a Swift class. In order to do that, like we've seen with Andy yesterday, we need to write this box class. We can just store anything that we really want in it. Then it becomes a Swift object.
This is how you can write the class, unbox = value, and now what we can do is change this NSMutableData to a box of NSMutableData. Then it's a Swift class, and now the compiler will help us and say, "Hey, everything is broken, so let's fix it." It's very easy actually. It's very mechanical, and you can do it in a much nicer way, but we don't have time for that today. We can say data.unbox.length, here also unbox ... It's very mechanical, we always just need to insert unbox. Let's see, I can hardly see my cursor, there we go. There you go to unbox, and whoa. Sorry about that. Here we go data.unbox, here we go data.unbox, and here we set it at unbox. Now we're almost done. We only need to do one more thing, and here we need to box it up again. Hopefully now if we run it, everything is correct.
At least it's semantically correct. We have one large array and one empty array, and we only have one copy. This is how you can do it in a very efficient way, and this is also how the Swift structs work under the hood in the standard library.
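The finished copy-on-write wrapper might be sketched as follows in current Swift. `isKnownUniquelyReferenced` is the present-day standard library name for the uniqueness check; the talk used the older `isUniquelyReferencedNonObjC`. The `copiesMade` counter is an assumption standing in for the "making a copy" printout.

```swift
import Foundation

// A pure Swift class wrapping the Objective-C object, so that the
// uniqueness check has a Swift reference to inspect.
final class Box<T> {
    var unbox: T
    init(_ value: T) { unbox = value }
}

struct MyData {
    private var data = Box(NSMutableData())
    private(set) var copiesMade = 0   // instrumentation only

    var bytes: [UInt8] {
        var result = [UInt8](repeating: 0, count: data.unbox.length)
        data.unbox.getBytes(&result, length: data.unbox.length)
        return result
    }

    mutating func append(_ other: Data) {
        // Copy only if some other struct value still shares our box.
        if !isKnownUniquelyReferenced(&data) {
            data = Box(data.unbox.mutableCopy() as! NSMutableData)
            copiesMade += 1
        }
        data.unbox.append(other)
    }
}

var myData = MyData()
let empty = myData   // shares the box until the first write
let sampleData = "Hello, World".data(using: .utf8)!
for _ in 0..<10 {
    myData.append(sampleData)
}

print(myData.bytes.count)   // 120
print(empty.bytes.count)    // 0
print(myData.copiesMade)    // 1
```

Only the first append finds the box shared with `empty`; the remaining nine appends write into the now unique object in place.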
With that, my time is up. I want to thank you and I'm happy to take any questions. Thanks.
Q1: Thanks Chris, a very beautiful, elegant talk. I have 2 questions. One is, you make a good point that with asynchrony the immutability is nice. You probably wouldn't want people to get the idea to use structs all the time instead of objects. You'd probably recommend sometimes to use objects. I want to give you a chance to say that. The other question is, in the name of efficiency, it would seem like every time I want to access a byte from this, at least logically, there's extra work going on to go through the structure access and go through the box. That's 2 more memory hits, and if you're blowing the d-cache on your machine, that could be severe. I'd like you to address that as well.
Chris: Yes. First of all, yeah, I'm not saying that we should use structs everywhere. Whenever you can, use them. Then the question is, when can you? To do that, I want to reference the talk Andy gave somewhere else about value types. It's been referenced a couple of times. It's really, really great. There he explains when you can use them and when it's better to use objects. There's also a post about this, saying that still 80% of the things will be objects, but when you can use value types, and when you can use structs, then they're actually really nice, and very helpful. It's only in certain cases when you're dealing with data mostly. When you're dealing with things like files, or things that are displaying on screen, probably you want to use objects. Yeah, and about the efficiency, this is not necessarily the most efficient way. It's more about showing how you can take an OO API and wrap it inside a struct, and yes, there will be some overhead compared to using NSData, but then you get safety for that. You can always optimize things that way I think.
Q2: Great talk. Thanks for that. I'm super curious about a few things with isUniquelyReferenced. It seemed super surprising that it just failed silently when you passed in an Objective-C type rather than say fatal erroring or something like that. Just curious if you have any idea why that's the case, and why this wouldn't work with Objective-C types given that it should really just be using ARC, albeit slightly different versions under the hood.
Chris: The short answer is: I have no idea. The longer answer is what I think is happening is that Objective-C objects are in the Objective-C runtime and Swift objects are in the Swift runtime. I don't know exactly how either one is implemented but I think that maybe they could make it work for Objective-C objects in the future. Now it doesn't work. There's another function called isUniquelyReferenced, without the non-Objective-C extension, and that one works only on Swift objects. Somehow I couldn't really get it to work correctly. These functions, they're a bit weird. I have to dive into it more. I looked into it a while ago, but I forgot the nitty gritty details. This is the one that I use now, and it works perfectly. I'm sorry, that's all I can say about that.
Q3: Hey there. You alluded in passing to a more elegant way of doing that unbox, unbox, unbox, but you didn't have time to look at it. I thought a question might give you some time.
Chris: Yeah, let's write it in one line, it's actually a one liner. I had to time it very tightly. Let's see where my cursor is, here. What we can do is we can name this guy internalData or something. What we can do is create a data var, and let's call it like this, and we can just return internalData.unbox. Now here we can just write data.appendData. Actually, this is the read only one, but here we still need to write data.unbox, but here we can write data.length, and data.length. This is actually very nice. This is how I would do the refactoring normally, because then you don't have to change any code.
Yeah, sorry, this is called internalData. This is how you could do that. There are some other things. You can also create a mutable variable, and do the copying in there, and all the isUniquelyReferenced. Then you can add more mutating functions that just use this mutable variable. This is a trick that was in one of the WWDC sessions. Yeah.
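One possible shape for this refactoring, sketched in current Swift. The name `dataForWriting` is hypothetical (the talk only says "a mutable variable"), and `isKnownUniquelyReferenced` is the present-day name of the uniqueness check.

```swift
import Foundation

final class Box<T> {
    var unbox: T
    init(_ value: T) { unbox = value }
}

struct MyData {
    private var internalData = Box(NSMutableData())

    // Read-only view: hides the unboxing from the rest of the struct.
    private var data: NSMutableData {
        return internalData.unbox
    }

    // Writable view: does the uniqueness check and the copy in one place,
    // so every mutating method can simply use it.
    private var dataForWriting: NSMutableData {
        mutating get {
            if !isKnownUniquelyReferenced(&internalData) {
                internalData = Box(internalData.unbox.mutableCopy() as! NSMutableData)
            }
            return internalData.unbox
        }
    }

    var length: Int { return data.length }

    mutating func append(_ other: Data) {
        dataForWriting.append(other)
    }
}

var myData = MyData()
let empty = myData
myData.append("Hello, World".data(using: .utf8)!)

print(myData.length)   // 12
print(empty.length)    // 0
```

Reading goes through `data` and never copies; only the mutating getter can trigger a copy, which keeps the copy-on-write logic out of the individual methods.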
Q4: My first question was going to be what you just referenced, the programming in Swift with value types like read and write vars that you just alluded to. My second question is, given all the talks about open sourcing Swift, and language features, do you think that there is maybe a Clang attribute that we could use to bridge these kind of mutating attributes across the Objective-C runtime? Say if the NSData API had Swift attributes for append data and those functions.
Chris: Maybe. Yeah, unfortunately I don't know anything about the internals. I was hoping that maybe because of the Clang meetup down south we would get open source Swift today, maybe. I don't know, probably not, because then we would have heard it. I don't know about the internals, sorry.
Audience: I was just mentioning that you did allude to it and that you would have one variable, which is read only and that would have a second computed variable that was writable so you could always use the writable variable in your mutating functions to kind of codify that contract.
Q5: In this case you're waiting until you call append to make the copy of the NSData. Do you think it would be possible to do the copy when you actually copy the struct and using either overriding copy method or operator?
Chris: I'm sorry, if I understand correctly, you want to do the copy up front when we create the new variable, right? Yeah, so this is what I wanted to do at first too. Actually it's really nice, the way they do it now. First of all, I think it's not possible to do that, but it's really nice because it's a technique called copy-on-write. Only when we write, we make the copy. This allows us to pass these structs around everywhere, and no copy is made. You can call functions, you can do whatever, and no copy is made. Only when you start changing the data and there is another copy somewhere, then the copy is made. The compiler can therefore be very smart about the sharing. You can have super efficient structs and of course arrays need to be extremely efficient. This is how arrays work. In practice, once you start thinking more about it ... It took a while for me to grasp it, but it's actually super nice, and way more efficient and predictable.