Everything you ever wanted to know about Sequence & Collection

Soroush Khanlou at Playgrounds Conference, 2017

playgroundscon


Transcript

Soroush

Hi, everyone. I'm Soroush. I have a blog, I have a podcast, but we have a lot of material to cover, so I'm just going to skip over that stuff. 

The Standard Library is super, super deep. I feel like I could teach like a semester long college class on it. There's a ton of stuff in it, and even just Sequence and Collection, 20 to 25 minutes is not quite enough. So while I'm going to try to teach you everything you every wanted to know, I'm probably going to actually teach you almost everything you want to know. So, when you are working with a series of elements and they need to be ordered and you need to work with them in your apps, usually you just reach for an array. And 99 times out of 100, that's absolutely the right answer, but array and all the other collection types that are built into Swift are built on a complex and well thought out series of protocols that add behavior incrementally, and the thing about these different protocols lets you hook in at the right level and work with your sequences, with your collections, in the most effective performance ... Like matching with the performance characteristics that you want in all those ways that you want that code to look.

So learning about these protocols is really helpful and lets you do a lot of really cool things which we're going to learn about today. So everything is built on the back of a protocol called Sequence. Everything sort of inherits from Sequence in the way that protocols can inherit from other protocols. On top of Sequence is built Collection, on top of Collection is built BidirectionalCollection, and on top of BidirectionalCollection is RandomAccessCollection. These are the ones we're going to focus on today. 

There are two more that we don't have time to go into. MutableCollection descends from regular Collection and has its own semantics, and RangeReplaceableCollection comes from RandomAccessCollection. I'm hoping that with the stuff we talk about today that you all can go home and take a look at these protocols and their semantics and their requirements and really be able to understand them just by looking at the documentation. So that's my goal.

We're going to start our conversation from the bottom of this ladder and we're going to start with Sequence. So what is a sequence? Sequence contains the bulk of all the stuff that you're used to; the filter’s, the map’s, the flatMap’s. All of that stuff. Checking if a sequence contains something- it's basically just a list of elements. There are two kind of caveats when you work with sequences. One is that they can actually be infinite sometimes, and so if you try to map over an infinite sequence, you're going to have a bad day. The other thing is that this is usually not the case, but there are very certain cases where you can iterate over the sequence, once, and then if you try to iterate it again you will actually get no elements. So you are only ever guaranteed to be able to iterate a sequence one time. I've only ever seen this once and I really had to contrive a situation to make it come out of the standard library, but there is a case where you can set up a sequence, iterate over it and then try to iterate again and get nothing out. So you can only ever iterate a sequence once.

We are going to take a look at what that protocol looks like. There's some other stuff associated with it, but these are the primary components of a sequence. You have a function that makes an iterator. The iterator has to conform to IteratorProtocol, which seems pretty simple. But what's an Iterator? We go one level deeper. So yeah, Iterator is this associated type. When you declare this function; makeIterator, whatever type you put here automatically gets set up as your associated type and you have your constructed function and that is all good. So if we take a look at the IteratorProtocol, that also has its own associated type, and that type is called Element. So Iterator.Element will come up a lot as we work through this code.

The other component of the IteratorProtocol is where all the magic of sequences and iterators actually happens. So this function; next, basically you call it repeatedly, and every time you call it you get the next element, and you keep calling it until you get nil. When you get nil, that means that you've hit the end of that sequence or of that iterator. Iterators act as a cursor, and they point to the element of the sequence as you work through it. So they kind of know where they are, they're stateful in that sense. So that's IteratorProtocol, and IteratorProtocol is what Sequence is built on.

So we're going to build a sequence today. We're going to build a LinkedList. If you haven't studied up for any interviews, recently, you might not know what a LinkedList is. It's basically a thing that's built up out of small components and each component has a value and a pointer to the next element, and that next element also has a value and a pointer to the next element until you hit the end, and that end kind of terminates the thing. And this is kind of a natural fit for the way that Sequence works, and so it'll be a really fun way to get into how to build an iterator and how to build a sequence.

This is what our LinkedList looks like. It's going to be an enum. There are a couple of different ways to write LinkedList’s, but we're going to do it this way, cause I think it's swifty and fun. There's a keyword here; indirect. It just means if I reference this enum within this enum, please don't freak out, let me do that, and it will handle some stuff for you. So we're going to need that, just sort of an implementation detail, and then our LinkedList has, basically, two different states it can be in. Either that it is a node that has an element and the next element, or it is the end of the sort of the list. So this basically is our whole LinkedList, but it doesn't conform to Sequence yet, so we can't iterate over it, we can't map over it, we can't check if it contains something, we can't do any of that stuff. So what we want to do is we want to conform it to Sequence.

Conforming it to Sequence involves making this function called makeIterator, and we need that iterator type, but we don't have that yet. So we're going to have to build that first. Her is our Iterator type. It conforms to IteratorProtocol. There are some cases in the standard library where the iterator and the sequence are actually the same thing. I don't recommend doing this, it makes things kind of complicated and weird, but it does exist and it can be done.

So yeah, so since the iterator has basically a piece of state that needs to know where in the sequence it is, it has to have a mutable variable, called current, and that's just going to point to the specific node that we're currently looking at, and every time we call next it's going to update that. So that's what you initialize this iterator with, and that node is going to be generic over T, which means, and we don't know what T is, T is just going to be some type. It might be a string, it might be a user, could be anything. So we're going to need to declare our type the LinkedListIterator as generic as well.

And then we get to the meat of it. We have our next function and it returns an optional T, and the fact that we've declared that the type that next returns is a T, Swift is just going to figure out that what that means is that our Iterator.Element is going to be a T. You could do it in an explicit typealias, but you don't have to. This is enough.

So with an enum, there's not a lot you can do with it. You can kind of just switch on it and see what's inside. So let's do that, let's switch on current. We'll handle the easy case first. If it's the end, we just return nil, sequence is over. The other case is a little more complicated. So if we have a value, we're going to get both an element and we're going to get the next thing in the list. So it's easy to know what to do with the element, we just want to return it. What do we do with that next? We can use next to update current so that the next time you call next, the function, on this type, you'll get the value that's inside the next element in that LinkedList.

Now that we have our LinkedListIterator type, we know what we can fill in up top, and that completes conforming our LinkedList to Sequence. So we get all those handy functions, is lexicographicallyPrecedes(_:by:), which is a great sequence function. We get map, we get filter, we get all that stuff. Totally for free. We get for-in loops, we get the forEach function, everything. This is all the code that you need to make a LinkedList into a Sequence.

So let's take a look back at our LinkedList and kind of walk through what that iterator is doing. We created this LinkedListIterator, and let's say it's of type string, cause all of our elements are strings, and we create that LinkedList and update it with the current, which is going to be the head of our LinkedList, and when we call next on it, it's going to return A and the pointer of current is going to move to the next element and we step through, we call next again, it returns B and updates itself to the next element in the list, returns C and then it returns nil when it hits the end. And every time you call next after this, it's just going to keep returning nil. That is more or less how an iterator represents a cursor as it steps through your Sequence. Cool.

So we have kind of discussed how to build a Sequence, but what if we want to extend all sequences? There's lots of useful functions in the standard library, but the standard library doesn't have all the functions. We might want to add our own. And so, let's take look at this, for example. You have a number of users and you want to see how many of them are admins. So you filter by which ones are admins and then you count that resulting array, but the problem is that there is a resulting array there, so we're creating an array just to get the count off it and then we're throwing that array away. It's not also not really super-expressive. I'm not really filtering and then counting. I just want to count. So while this is fine- this is great. So I just want a function called count, I can pass it a block and it tells me how many of the objects pass that test. And if you write Ruby, you're very familiar with this function. And it's a useful one, so lets go ahead an add it to all sequences.

So we're going to need to crack open the Sequence type and build an extension. We know we're going to need a function called count, and it's going to return an int, and the parameter that it's going to take is going to be a function that is a block, and I call this block shouldCount, cause it makes it pretty easy to read, later in the function. The other thing I want to call out here is we need a way when we're inside the sequence to refer to what the type of the object that the sequence is vending, and the way that you do that is you use Iterator.Element. Protocols aren't generic, so there's no T, but this sort of fits in as that T. That fits in as that type you're going to be using.

That function takes an Iterator.Element and returns a bool, and with that we'll be able to count through how many items we have in the sequence that pass our test. So we know we're going to need a for-loop. The backbone of Sequence is this for-in loop. I tend to use the forEach function, but just by conforming to Sequence, you get this for-in loop. So we know that self is going to work inside this for-in loop, so we know we're going to need to iterate through it. So we've got that. We also know we're going to need to keep track of this state as we go through, so we're going to need to keep a running counter of how many of our objects pass the test and we know that we're going to need to return that at the end.

A simple boolean check, and I think it reads quite nicely, if I should count the element then increment the count. And that's it. That's all you need to extend Sequence to add your own functions to it. There's a lot of useful ones that are in there, but there's some useful ones you can add, as well. One useful one that I like a lot, I call eachPair.

This is useful in a lot of different cases. Basically it takes every consecutive element and pairs them together and lets you operate on those pairs together. So for example if you have a bunch of integers and you want to know the difference between every two elements, you can use eachPair. If you have a bunch of views that are next to each other, and you want to install an auto layout constraint between each pair of views, you can do that. So it's a super useful function, but writing it is a little bit more complicated than that count function that we wrote above.

So let's take a look at how you can do this just built into the Standard Library. So this is how you can do this with a fresh install of Swift. I'll walk you through this code step by step. You have a sequence, and you're going to make a copy of that sequence, and then you're going to drop the first element off of that copied sequence. Then when you zip them together, any extra elements that don't have a matching pair are going to fall off, and then when you zip them, each pair kind of goes together, and then magic move! Then you end up with pairs. So this thing is super useful, but what I really want to write is ... I don't want to write zip(sequence, sequence.dropFirst()). It's just ... It's a lot. It's not really very expressive. I want to write this. This is nice because I can keep it in chain, so I can do map and then I can do eachPair and then I can loop over that, and it just kind of fits in a little bit better. So I want to write this.

We have this naïve solution which is what you would think to write first, which is a lot like what we wrote for count, and basically you have a function called eachPair. It returns a sequence of… to a tuple of two of our Iterator.Element’s, and we just call that zip(self, self.dropFirst()). You can't refer to sequences directly, and so we have to use these AnySequence type erasures. I don't have time to go into type erasures today, but Gwen Weston gave a really good talk at Try Swift Tokyo last year. I would recommend checking that out. They're a useful tool when you're working with this stuff. In this case we happen to need it just to make things work out. Because, again, this is not an array. We don't know what kind of sequence it is. It could be anything. So we want to refer to it just broadly as a sequence.

When you try to compile this, however, you get an error; “Argument type ‘Self.SubSequence’ does not conform to expected type ‘Sequence’”. There are two associated types with Sequence. One of them is the Iterator type and the other type is Self.SubSequence. Usually you don't have to worry about it, the Standard Library will figure it out for you, and it'll just use this AnySequence type, but one of the limitations of the Swift protocol system means that you can't say "I have a SubSequence and that SubSequence is going to be a sequence". The onus of that constraint is on us when we add this extension.

So we're just going to add a quick thing that just says "Hey, just so you know, this only works when Self.SubSequence is also a Sequence". This should always be true, but we just have to say it to sort of satisfy the compiler. So we try to compile again. Another error; “Cannot convert return expression of type…” It's boring, who cares? The main part is that this second thing; Self.SubSequence.Iterator.Element is not the same as Self.Iterator.Element, and that makes sense. We said that our SubSequence was a Sequence, but we didn't say what was inside that sequence. So we have to add another constraint for that: Self.SubSequence.Iterator.Element == Self.Iterator.Element and this will happily compile.

This dance that you have to do with these associated types is annoying, but it is one of the parts of writing this Sequence based code. You don't have generics, you can’t have these associated types. They can be annoying, but they do work in generic constraints, so you can kind of dance around all the weird issues and you end up with code like this, but you get to write the code that you want at the call site, which is Sequence.eachPair.

I highly recommend this extension, it's useful and you'll find yourself trying to mangle indexes and be like "Oh, I need this! I'm going to iterate to the last element minus one, then I'm going to do index plus one" ... Each Pair is way easier. I highly recommend this for everyone.

Cool. So that is basically ... That's Sequence. Sequence is a list of elements, it can be finite or infinite and it can be iterated at least once. Usually more than once, but not always. 

So the next thing in our ladder is Collection. Every Collection is a Sequence and it adds a bunch of ... It adds some new requirements for you to implement, but it also gives you more strict guarantees. So here's the skinny on Collections. Collection inherit from Sequence, so you get all the stuff from Sequence. If you add an account extension to Sequence, you'll get that. You get your map, your filters, all that stuff. Collections are always finite, so you never have to worry about "Ugh, could this be infinite?". Never happens. It's also repeatable, so you can iterate it as many times as you want.

This is more or less what the Collection conformance looks like. We have another associated type here; Index. Index you'll see pops up in pretty much every one of these functions and properties, and it has to conform to Comparable, so we'll see why it has to conform to Comparable in a moment here. So these two kind of go together. You need a startIndex and an endIndex. If you think of a collection as an array, Index is going to be an int. For other ones like Dictionary, there's no real index type, so there's like sort of an opaque built in type that you never really have to touch, but it does exist under the hood. The index of a dictionary is not the key, it's not the string. So you have your startIndex and your endIndex, and then you have to be able to get the element at an Index, and then the last bit is you have to be able to advance your Index by one. And that lets you build everything that Collection needs.

With that, you get all the Sequence stuff, and we'll see how these four functions can kind of jump you to having a sequence for free and ... Yeah, so we'll take a look at that in a second. 

The one note I do want to add about Index after, is that this is new in Swift 3. In Swift 2, you just told the index to advance itself, but because this meant that the index then had to hold onto the collection in certain weird cases where the index didn't know how to advance itself, then you ended up making copies and the Swift team hates making copies. They talk about it all the time, and so they added this function, so you ask the Collection to advance any specific index. So of course an int is easy and you just add one, but for something like a Dictionary, it's not as straightforward. So that's why this function exists.

So you can kind of see how you could build a Sequence from just these functions. So let's take a look at how you might implement forEach. You get forEach for free, but let's take a look anyway. So you start from the startIndex, you set that to a mutable variable. And this is, again, this is why you need Comparable, so you can iterate, so you can check whether you've hit the end index yet, and then you call the block with the current index and then you ask the Collection, which in this case is self to advance the index one. One interesting thing to note here is that like an array, the end index is not actually accessible, so if you have an array of ten items, if you try to access the item at index ten, you'll get a crash. And it works the same here, and the reason is because, if you think about an empty array, an empty array is when the start index and the end index are the same, and so a collection with one element has to be where there's a start index and then one after that is the end index. So the end index can never actually refer to a specific item. So this comes up sometimes, it's somewhat frustrating to deal with, but there are tools and there are ways to deal with it.

When I want to implement a collection, usually I don't want to implement it manually, the way I do with sequences. I usually have some internal type that already is a collection and I just want to get conformance to it so that, for example, here I have an array of API errors that I want to make into a richer type, but my errors property is fileprivate, so I can't access it from the outside world. So what I want to be able to write is “errorCollection.contains some error with this ID", but I can't do that cause the entire public API of ... There is no public API for APIErrorCollection. So I want to conform it to Collection, and the good news is it's not very hard. Because APIErrorCollection contains a collection internally, you can just kind of forward everything right over. So when they ask for a start index and you tell them it's error's start index, as for an end index, well, it's error's end index, and you keep going down the line and just forward everything straight over.

You'll also notice I didn't explicitly say what my index type will be, just by adding, you know, this here as my index type, just by writing it in there Swift figures it out for me. So conforming your own collection is pretty much as easy as just forwarding stuff along to some kind of internal property that you want to also be a collection. And now, once we have our Collection extension, this thing compiles and works great. Cool. So we're coming around the bend.

The next level is BidirectionalCollection. BidirectionalCollection builds on Collection and the one trick that it adds is not only can you go forward as you can with Collection, you can also go backwards. So you can iterate it both ways. This is useful for reversing an array, which we'll look at, or reversing a collection, which we'll look at, and it's useful for a couple of other things.

Everything in Collection is kind of ... you just have to step through it one by one. So if you want the fourth element of a collection, you have to step from the first element, second element, third element and so on until you get there. BidirectionalCollection lets you do that but also go backwards. So the same way that Collection has ... It has other stuff, but it has index after index, BidirectionalCollection gives you one more function which is index before index. So you have to implement this yourself and then it will allow users of a BidirectionalCollection to do a couple of things.

So one of those things that you can do is you can get the last element very easily. Before, if you wanted the last element, you would have to step through everything and remember what the one before you was until you get to the end index and then return the one before you. It's a mess, who needs it? Check if you're not empty, grab the index before the end index and then return the item at that index. Much simpler than having to step all the way through your collection.

Cool. Yeah, another great example of something that you can do with a BidirectionalCollection much more easily than you could do with a regular Collection is reversing that collection. So if we wanted to do that, we know we're going to need to keep track of the array of items that as we go forward we're going to want to append from behind to the front of this new array. We know we're going to need our current index, but it's going to start at our end index, this time, and we know we're going to return those elements at the end, but what happens in the middle? Same as our forEach method. While we're greater than our start index, we will decrement the index one and then append to the array. And again, you can't access self.endIndex, so you have to do the decrement before the append, but otherwise very similar to the forEach. The Standard Library's implementation of this is a little bit different. It doesn't actually make a copy of that array as it goes forward, it holds on to the array and accesses it in reverse order, but this is just an example of how you can use the APIs in BidirectionalCollection.

The last protocol that we're going to talk about is RandomAccessCollection. So RandomAccessCollection, so far we've seen any time you want to access some element, you have to step through your Collection until you get to that element. RandomAccessCollection takes that away. It gives you constant-time access, (O(1)), if you've been studying for your interviews, constant-time access to any element in the array, so you can just say "I want the fourth element" and it'll jump right to the fourth element and get that for you. And the interesting and weird part about it is you don't need to implement anything new. All of the functions that you already have, you just have to modify them to meet certain semantic requirements. So what do those look like? There's two functions that really drive the random access collection. So there's index offsetBy and index from:to.

These types will basically ... They have a default implementation for regular collections which steps through one by one in O(n) time to get to where it needs to be, but that implementation is too slow for random access collection. So all RandomAccessCollection requires you to do is update these two functions to be instantaneous. So if you have indexes of ints, you would just say like plus four, and it would get to that fourth element. Other indexes have different ways to implement this, but as long as these two functions are O(1), constant-time can instantly jump forward, you're good. You get random access collection.

The other way you can kind of sneak in through the back door is with a protocol called Strideable. So strideable means ... And if you look, they're kind of very similar to those two functions we just looked at. Strideable lets you take some thing and just jump forward by some amount. If your index conforms to Strideable, you just get random access collection for free. And you don't need to implement anything, you don't need to change any functions, just as long as your index type conforms to Strideable, you're done.

Uh, cool. Slide 100 of 100. These two are the protocols that I didn't go over today, RangeReplaceableCollection and MutableCollection. With the way that we've sort of analyzed these protocols today, your homework is to go home and take a look at SwiftDoc or the Swift Apple documentation and take a look at these protocols and see if they make sense, see if you think you could implement them yourself. That is everything that you ever wanted to know about Sequence and Collection.


Links:

If you enjoyed this talk, you can find more info: