Creating Refactoring Transformations for Swift

Xi Ge from Apple at Swift Summit 2017


Video transcript:

Hello, everyone. My name is Xi Ge. I'm a software engineer working in the Swift team at Apple. My daily work is mostly focused on Swift tooling support. Today I'm going to talk about how we can create refactoring transformations for Swift.

My talk has four parts. I will first introduce refactoring as a general concept. Next, I will talk about the Swift refactoring tool implementation in Xcode 9. After that, I will go through a couple of examples to show you how you can create a refactoring action for Swift. Finally, I will describe a new Swift tooling infrastructure under development. Let's get started.

First of all, what is refactoring? Software is alive. So just like any living creature, a software system is always evolving. With every feature we implement and every bug we fix, our software system gets more functional, but also more complex.

This evolution comes with a cost: an increasing maintenance burden. We've seen this pattern in a lot of software systems. In some extreme cases, the maintenance burden becomes so great that we have to abandon the software system altogether, or declare its extinction, and reimplement the exact same functionality from scratch.

To reverse this trend, and to ensure software is always maintainable, we need refactoring. Martin Fowler, who is the author of the first book on refactoring, defined refactoring as "a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior."

There are tons of studies showing the benefits of refactoring. For example, it has been shown to be a short-term investment of cost and time from which we can expect long-term benefits. Refactoring can improve software longevity and ensure its long-term relevance. It can also enhance nonfunctional attributes, such as readability, extensibility, and maintainability.

We know a piece of code needs refactoring if it smells bad. One example of a code smell is bad naming. In this case, the function name doesn't reflect what the function body does.

Or code duplication. In this case, we duplicated X multiplied by X three times. For every code piece we duplicate, we're duplicating bugs, and we are duplicating future maintenance burden. 

Some other examples are a long parameter list or a long function body. I'm sure you've all seen these smells in your own code base.

To help developers refactor their beloved Swift code, in Xcode 9 we shipped a new refactoring tool. The refactoring tool supports two broad categories of refactoring. One is global refactoring, where the code change spans multiple Swift files in your project. For instance, rename. The other category is local refactoring, where the code change is confined to a single Swift source file. For instance, extract expression to a variable, extract statement to a function, localize strings, and others. For this talk specifically, I will focus on the local refactoring part.

Before jumping into the implementation of the refactoring tool, I'd like to define one term: Swift Toolchain. The Swift Toolchain provides compiler and IDE tooling support to the high-level UI. The toolchain is built from the Swift code base, and it's very easy to build. All you have to do is run our ready-to-use build script. An existing build of the Swift Toolchain can also be downloaded from the project webpage.

Let's see how we implement this refactoring tool. We implement the refactoring logic in the Swift open source project, which is built into the Swift Toolchain. The Swift Toolchain talks with Xcode, and Xcode provides the UI and refactoring support to the end user. Let's see how these three entities interact with each other.

A user will select a piece of code in his or her codebase. Xcode will ask the Swift Toolchain about available refactorings at that position. The Swift Toolchain will run applicability checking for every supported refactoring and return a list of available refactorings at that position. Xcode will present that list to the user. The user can specify one refactoring to perform, and Xcode will delegate this to the Swift Toolchain. The Swift Toolchain will compute all the necessary code edits to implement this refactoring and serialize those edits to Xcode. Xcode will finally take responsibility for updating the user's code base.

From this diagram, we see that the list of available refactorings at a specific code position is dynamically generated by the Swift Toolchain. How each refactoring updates the user's codebase is entirely up to the Swift Toolchain. You remember, the Swift Toolchain is built from the Swift open source project. That means, yes, a Swift local refactoring can be implemented in the open. All you have to do is check out the Swift open source project and start hacking on it. Since the Swift compiler is mostly written in C++, so will be the implementation of the refactoring logic you will see later in the presentation.

Let's take a look at how we design refactoring in the Swift open source project. For a given refactoring, we will categorize it first. It can be a Cursor Refactoring, where users need to initiate it from a single cursor position in their source file. For instance, String Localization Refactoring. Another category is Range Refactoring, where users need to initiate it from a range selection in their source file; for instance, Extract Method Refactoring. We provide developers with two pre-computed analysis results to facilitate the development of these two kinds of refactorings. They are Cursor Info and Range Info, respectively.

From now on, I will talk about a Cursor Refactoring example. In our apps, we all have a lot of string literals specified in one natural language. In this case, "Hello World" is in English. However, we still want our users from different cultural backgrounds to view this string literal in their own language. For instance, we want to make this available in Chinese as well. To achieve this, we need to wrap the string literals that will be surfaced in the presentation layer with an API called NSLocalizedString. This code transformation is called string localization. It makes a very good candidate for refactoring support, so that developers don't need to remember the exact API name.

Let's see how we can implement this refactoring. We first need to declare this refactoring in the RefactoringKinds.def file in the Swift open source project. This C++ macro has four parts. It has a kind, indicating this is a Cursor Refactoring; it has an internal name, which will be used to generate the C++ implementation code for this refactoring; it has a description name that will be presented to the end user through the Xcode UI; and finally, it has a stable key that will be used to uniquely identify this refactoring.
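The declaration style described here is commonly known as an X-macro: one list of macro invocations is expanded several times to generate different pieces of code. The following is a minimal, self-contained sketch of that idea. The macro name `REFACTORING_LIST`, the four-argument layout, and the specific description and key strings are assumptions made for illustration, and the real RefactoringKinds.def may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a .def file: one entry per refactoring, listing
// category, internal name, user-facing description, and stable key.
// The layout and the strings are hypothetical.
#define REFACTORING_LIST(X)                                         \
  X(Cursor, LocalizeString, "Localize String", "localize.string")   \
  X(Range, ExtractFunction, "Extract Method", "extract.function")

enum class Category { Cursor, Range };

// Expansion 1: an enum with one case per declared refactoring.
enum class RefactoringKind {
#define X(CATEGORY, NAME, DESCRIPTION, KEY) NAME,
  REFACTORING_LIST(X)
#undef X
};

struct RefactoringEntry {
  Category category;
  const char *name;
  const char *description;
  const char *key;
};

// Expansion 2: a metadata table in the same order as the enum.
static const RefactoringEntry kTable[] = {
#define X(CATEGORY, NAME, DESCRIPTION, KEY) \
  {Category::CATEGORY, #NAME, DESCRIPTION, KEY},
  REFACTORING_LIST(X)
#undef X
};

// The text an IDE would show for a given refactoring.
const char *descriptionFor(RefactoringKind kind) {
  const std::size_t index = static_cast<std::size_t>(kind);
  assert(index < sizeof(kTable) / sizeof(kTable[0]));
  return kTable[index].description;
}

// Looks a refactoring up by its stable key; returns nullptr if unknown.
const RefactoringEntry *findByKey(const char *key) {
  for (const RefactoringEntry &entry : kTable)
    if (std::strcmp(entry.key, key) == 0)
      return &entry;
  return nullptr;
}
```

Adding a refactoring then means adding one line to the list; the enum case and the table row follow from it automatically.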

After declaring it, we need to implement the applicability checking logic. That is the logic to teach Xcode when it is a good time to show this refactoring as available. Let's see how we can implement this logic. All we need to do is implement the isApplicable function, with a given input of CursorInfo. We first need to check that this CursorInfo points to an expression start. Next, we check if the expression being pointed to is a string literal. We further check that this string literal doesn't have interpolation. If all these conditions are met, we have found a good case to offer this refactoring for.
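The three checks can be sketched with a simplified stand-in for CursorInfo. This is a self-contained model, not the compiler's code: `CursorInfoModel`, `ExprModel`, and the enum cases are hypothetical types that only capture the fields the checks need.

```cpp
#include <cassert>

// Simplified stand-ins for the compiler's data structures (hypothetical).
enum class CursorKind { ValueRef, ModuleRef, ExprStart, StmtStart };
enum class ExprKind { StringLiteral, IntegerLiteral, Call };

struct ExprModel {
  ExprKind kind;
  bool hasInterpolation; // only meaningful for string literals
};

struct CursorInfoModel {
  CursorKind kind;
  const ExprModel *expr; // set only when kind == ExprStart
};

// Mirrors the three checks described in the talk.
bool isLocalizeStringApplicable(const CursorInfoModel &cursor) {
  // 1. The cursor must point at the start of an expression.
  if (cursor.kind != CursorKind::ExprStart)
    return false;
  assert(cursor.expr != nullptr && "ExprStart must carry an expression");
  // 2. That expression must be a string literal.
  if (cursor.expr->kind != ExprKind::StringLiteral)
    return false;
  // 3. The literal must not contain interpolation.
  return !cursor.expr->hasInterpolation;
}
```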

Next, we need to implement the code transformation part. That is to teach Xcode how to transform the user's code if this refactoring is selected. It's also very straightforward. We need to implement the performChange function in this refactoring. We need to use EditConsumer to insert the API name before the start of the string literal. We need to insert the second argument after the end of the string literal, and that will finish this code transformation.
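As a rough text-level illustration of the two insertions, the sketch below models an edit as an offset plus text and applies the edits to a source string. The `Insertion` type and both function names are hypothetical, and the inserted second argument (`comment: ""`) is an assumption based on the usual `NSLocalizedString(_:comment:)` call shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One text insertion at a byte offset in the original source (hypothetical).
struct Insertion {
  std::size_t offset;
  std::string text;
};

// Models performChange for string localization. literalStart is the offset
// of the opening quote; literalEnd is one past the closing quote.
std::vector<Insertion> localizeString(std::size_t literalStart,
                                      std::size_t literalEnd) {
  assert(literalStart < literalEnd);
  return {{literalStart, "NSLocalizedString("},
          {literalEnd, ", comment: \"\")"}};
}

// Applies insertions from the highest offset down, so that earlier
// offsets remain valid while the string grows.
std::string applyInsertions(std::string source, std::vector<Insertion> edits) {
  std::sort(edits.begin(), edits.end(),
            [](const Insertion &a, const Insertion &b) {
              return a.offset > b.offset;
            });
  for (const Insertion &edit : edits) {
    assert(edit.offset <= source.size());
    source.insert(edit.offset, edit.text);
  }
  return source;
}
```

Applied to `print("Hello World")` with the literal's offsets, this yields `print(NSLocalizedString("Hello World", comment: ""))`.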

We have used CursorInfo in our examples. Let's take a closer look at it. It has a kind, which can be one of value reference, like a function name or type name; module reference; expression start; or statement start. It has a location that indicates where this cursor position is in the user's source file. If the cursor points to a reference, CursorInfo will contain the declarations of that reference, like value declarations, type declarations, or extension declarations. If the cursor points to an expression or statement start, it will contain the corresponding entities as well.

We have used SourceEditorConsumer to implement the performChange function. It contains several utility functions for us to manipulate the user's input source code, like replace, insert, and remove. It has two subclasses. One is called SourceEditorJsonConsumer, which is used to communicate with Xcode. The other is called SourceEditorOutputConsumer, which is used by our command line testing tool, called swift-refactor, to verify that the code transformation meets our expectations.
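The shape of such a consumer hierarchy can be sketched as an abstract base class with replace, insert, and remove helpers and two concrete subclasses. Everything below is a hypothetical model: the class names, the JSON field names, and the decision to apply edits immediately in the rewriting consumer are assumptions, not the compiler's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

struct Edit {
  std::size_t offset;
  std::size_t length;
  std::string text;
};

// Base class: three convenience operations funnel into one accept() hook.
class EditConsumerModel {
public:
  virtual ~EditConsumerModel() = default;
  virtual void accept(const Edit &edit) = 0;
  void replace(std::size_t offset, std::size_t length, const std::string &text) {
    accept({offset, length, text});
  }
  void insert(std::size_t offset, const std::string &text) {
    accept({offset, 0, text});
  }
  void remove(std::size_t offset, std::size_t length) {
    accept({offset, length, ""});
  }
};

// Serializes edits as JSON text, the way an IDE-facing consumer might.
class JsonConsumerModel : public EditConsumerModel {
  std::ostringstream out;
  bool first = true;
  // Escapes only quotes, backslashes, and newlines.
  static std::string escape(const std::string &s) {
    std::string r;
    for (char c : s) {
      if (c == '\n') { r += "\\n"; continue; }
      if (c == '"' || c == '\\') r += '\\';
      r += c;
    }
    return r;
  }
public:
  void accept(const Edit &e) override {
    out << (first ? "" : ",") << "{\"offset\":" << e.offset
        << ",\"length\":" << e.length << ",\"text\":\"" << escape(e.text) << "\"}";
    first = false;
  }
  std::string json() const { return "[" + out.str() + "]"; }
};

// Applies each edit to an in-memory buffer right away, which is handy in
// tests. Offsets therefore refer to the buffer as it is at that moment.
class RewritingConsumerModel : public EditConsumerModel {
  std::string buffer;
public:
  explicit RewritingConsumerModel(std::string source) : buffer(std::move(source)) {}
  void accept(const Edit &e) override {
    assert(e.offset + e.length <= buffer.size());
    buffer.replace(e.offset, e.length, e.text);
  }
  const std::string &text() const { return buffer; }
};
```

Because a refactoring only talks to the base class, the same performChange logic can feed an IDE or a command line test without modification.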

Yes. We have seen one cursor refactoring example. Let's jump to the other side of this diagram and see a range refactoring example. I will take extract method as the example. This Swift function calculates the average value of an integer array. Part of this logic is to sum up all values in this array to get a total. It's considered good practice to extract this summing part into its own function, so that we have a cleaner piece of code and a reusable function. This transformation is called extract method. Let's see how we can implement this transformation in the Swift open source project.
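The slide's Swift code is not part of this transcript. As a stand-in, here is the same before-and-after shape written in C++: a function that averages an integer array with the summation inline, and the result of extracting that summation into its own function. The function names are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Before: the summation is inlined in the averaging function.
double averageBefore(const std::vector<int> &values) {
  assert(!values.empty());
  long total = 0;
  for (int value : values)
    total += value;
  return static_cast<double>(total) / values.size();
}

// After: the summation has been extracted into a reusable function...
long sum(const std::vector<int> &values) {
  long total = 0;
  for (int value : values)
    total += value;
  return total;
}

// ...and the original function calls it. Behavior is unchanged.
double averageAfter(const std::vector<int> &values) {
  assert(!values.empty());
  return static_cast<double>(sum(values)) / values.size();
}
```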

Similarly, we need to declare this refactoring in the RefactoringKinds.def file, with the kind range refactoring. The next step is applicability checking, to teach Xcode when it is a good time to show this refactoring as available. All we have to do is implement the isApplicable function with a given input of a RangeInfo object. We first need to check that this range is of the right kind. It needs to be single statement, multiple statement, or single expression. We must check that this range has a single entry point. A counterexample of having a single entry point is when we select multiple case statements in a switch statement. We need to check that this range has a single exit point. A counterexample of this is when we select an if statement where the if branch returns but the else branch doesn't. We need to further check that this range does not have any orphan branch statements, like when we select continue or break keywords without selecting the surrounding loop. I skip the other conditions, but if all these conditions are met, we have found a good case to offer this refactoring for.
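These range checks can be sketched against a simplified stand-in for RangeInfo. The model below assumes the analysis has already been reduced to a kind plus three flags; `RangeInfoModel` and its field names are hypothetical, and the conditions the talk skips are skipped here too.

```cpp
#include <cassert>

// Kinds of selection, following the list given in the talk.
enum class RangeKind {
  SingleExpression,
  SingleDeclaration,
  SingleStatement,
  MultiStatement,
  PartOfExpression
};

// Simplified stand-in for the pre-computed range analysis (hypothetical).
struct RangeInfoModel {
  RangeKind kind;
  bool hasSingleEntry;  // false e.g. for several selected `case` blocks
  bool hasSingleExit;   // false e.g. when only one branch of an `if` returns
  bool hasOrphanBranch; // true for break/continue whose loop is not selected
};

bool isExtractFunctionApplicable(const RangeInfoModel &range) {
  // 1. Only statements or a whole expression can be extracted.
  switch (range.kind) {
  case RangeKind::SingleStatement:
  case RangeKind::MultiStatement:
  case RangeKind::SingleExpression:
    break;
  default:
    return false;
  }
  // 2-4. Control flow must enter and leave the selection in one place,
  // and must not jump to a loop outside the selection.
  if (!range.hasSingleEntry)
    return false;
  if (!range.hasSingleExit)
    return false;
  return !range.hasOrphanBranch;
}
```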

The next part is code transformation, to teach Xcode how to actually implement this refactoring. It's also easy. We need to fill in the body of the performChange function. First, we need to calculate the necessary parameters to pass down to the extracted function. Next, we need to generate the declaration of the extracted function by using a given configuration, the calculated parameters, and other pieces of information. Next, we need to calculate the insert position of this extracted function. When all these pieces of information are ready, we can insert the extracted function at the right position and replace the original selection with a call to the extracted function.
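A text-level sketch of these steps follows, restricted to the simplest case where the selection is a single expression. It assumes the parameters, return type, and insert position have already been computed and are passed in; the function `extractExpression`, the `Param` type, and the generated Swift text are all illustrative and far simpler than what the compiler actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Param {
  std::string name;
  std::string type;
};

// Extracts source[selStart, selEnd) into a new Swift function inserted at
// insertAt, and replaces the selection with a call to that function.
std::string extractExpression(const std::string &source,
                              std::size_t selStart, std::size_t selEnd,
                              std::size_t insertAt,
                              const std::string &fnName,
                              const std::vector<Param> &params,
                              const std::string &returnType) {
  assert(insertAt <= selStart && selStart < selEnd && selEnd <= source.size());
  const std::string selected = source.substr(selStart, selEnd - selStart);

  // Build the declaration and the matching call from the parameter list.
  std::string decl = "func " + fnName + "(";
  std::string call = fnName + "(";
  for (std::size_t i = 0; i < params.size(); ++i) {
    const std::string sep = (i == 0) ? "" : ", ";
    decl += sep + params[i].name + ": " + params[i].type;
    call += sep + params[i].name + ": " + params[i].name;
  }
  decl += ") -> " + returnType + " {\n  return " + selected + "\n}\n\n";
  call += ")";

  // Apply the edit with the higher offset first so insertAt stays valid.
  std::string result = source;
  result.replace(selStart, selEnd - selStart, call);
  result.insert(insertAt, decl);
  return result;
}
```

For `let area = width * height`, selecting `width * height` and inserting at the top of the file produces a `computeArea(width:height:)` function followed by `let area = computeArea(width: width, height: height)`.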

We have used RangeInfo in the implementation. Let's take a closer look at it. RangeInfo has a kind. It can be one of single expression, single declaration, single statement, multiple statement, or a part of an expression. It has the start and end positions of this range selection. It also contains the related entities under the selection, including top-level nodes, declared values, and referenced values. Now, we have implemented two refactorings already. It's a good time for us to show our refactorings in an existing version of Xcode. Let's see how we can do it.

We need to build a Swift Toolchain locally by running the ready-to-use build script, and we need to copy this built toolchain into the toolchain directory. Next, we need to specify this toolchain to use through the Xcode UI. It's under Xcode, Toolchains, where you will find the toolchain you just built and copied in the list, and you can select that toolchain. After selecting this toolchain, you can enjoy the newly implemented refactorings in your Swift code base.

As you can see, the refactoring infrastructure is very extensible. In fact, since we first open sourced this piece, we have already accepted multiple community-contributed refactorings. For instance, Will Ellis contributed the collapse nested if refactoring, and Kacper Harasim contributed the convert string concatenation to interpolation refactoring and the convert force try to error handling refactoring. These refactorings will very likely show up in our future releases of Xcode.

Let's briefly summarize what we have so far. Adding a Swift local refactoring can be very straightforward. All you have to do is check out the Swift open source project and start hacking on it. We provide you with a service layer abstraction and pre-computed analysis results, called CursorInfo and RangeInfo. All you have to do is fill in the function bodies of isApplicable and performChange. Integrating with Xcode is also very easy. You can immediately test your implemented refactorings in an existing version of Xcode, thanks to the dynamic nature of the Swift Toolchain. If you contribute your refactorings back to the Swift open source project, they will very likely show up in future releases of Xcode.

This all sounds great, right? This all sounds awesome, but for Swift, we always want to achieve something more. We do notice that some design glitches in the existing infrastructure may hinder users' contributions. For example, the existing Swift infrastructure is not perfect for source tooling. It suffers from source information loss: in the existing infrastructure, we don't maintain and preserve comments and whitespace. The existing infrastructure has implicit nodes that don't have corresponding source entities. To implement refactorings and code transformations in the existing infrastructure, we have to manipulate strings manually. It's tedious and error-prone. Also, it's not composable, especially when two edits overlap with each other. Lastly, we have to use C++, which has a long learning curve and may not be so friendly to novice developers.
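To make the composability problem concrete, the sketch below models edits as offset-based string replacements. Two edits whose ranges overlap cannot both be applied against the original text, so the helper refuses the whole batch. The `TextEdit` type and `applyAll` function are hypothetical and exist only to illustrate the limitation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace `length` bytes at `offset` in the original source with `text`.
struct TextEdit {
  std::size_t offset;
  std::size_t length;
  std::string text;
};

// Applies all edits, interpreting every offset against the original source.
// Returns false and leaves the source untouched if any two edits overlap
// or an edit runs past the end of the source.
bool applyAll(std::string &source, std::vector<TextEdit> edits) {
  std::stable_sort(edits.begin(), edits.end(),
                   [](const TextEdit &a, const TextEdit &b) {
                     return a.offset < b.offset;
                   });
  for (std::size_t i = 0; i < edits.size(); ++i) {
    if (edits[i].offset + edits[i].length > source.size())
      return false;
    if (i + 1 < edits.size() &&
        edits[i].offset + edits[i].length > edits[i + 1].offset)
      return false; // overlapping ranges: no well-defined combined result
  }
  // Apply back to front so earlier offsets are not shifted.
  for (auto it = edits.rbegin(); it != edits.rend(); ++it)
    source.replace(it->offset, it->length, it->text);
  return true;
}
```

Disjoint edits compose cleanly, but as soon as two transformations touch the same characters, someone has to merge them by hand. That is the kind of bookkeeping a tree-based edit API is meant to remove.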

So, we are working on something else. How about the idea that we allow you to implement Swift transformations by using Swift? Yes, that's exactly what I'm going to talk about next. Thank you.

This thing is called libSyntax. It's a new tooling infrastructure under development for source tools, like linters, formatters, migrators, or refactoring tools. It's designed specifically for source tools. It's part of the Swift open source project, and it provides us with convenient APIs to modify and to generate Swift code. As for the current status, we're working on providing syntactic information for Swift via libSyntax APIs. In the future, we plan to wire it up with semantic information to enable more complex transformations.

I'd like to talk about the design goals of libSyntax by comparing it with the existing infrastructure of the Swift compiler. The existing infrastructure, as I mentioned, has source information loss, whereas in the libSyntax world, we provide full trivia preservation, including comments and whitespace. Actually, you can round-trip: you can print the libSyntax nodes back to the original source file. The existing infrastructure has implicit nodes for other actions in the compiler pipeline, like IR generation, but in the libSyntax world, what you see is what you get. In the existing infrastructure, to implement code transformations, we have to manipulate strings manually, while libSyntax will provide us with structured edit APIs and tree manipulation APIs. The existing infrastructure requires us to work in C++, while libSyntax will ship equally functional APIs in both C++ and Swift.
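Trivia preservation and round-tripping can be illustrated with a toy lexer that attaches whitespace and line comments to the token that follows them. This is a deliberately tiny model written for this transcript; the `TriviaToken` type and the rule of attaching all trivia as leading trivia are simplifying assumptions, not libSyntax's actual rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A token plus the whitespace and comments that precede it.
struct TriviaToken {
  std::string leadingTrivia;
  std::string text;
};

// Splits source into tokens without discarding a single character.
// Words are runs of letters, digits, or '_'; anything else is a
// one-character token. A final empty token holds trailing trivia.
std::vector<TriviaToken> lex(const std::string &src) {
  std::vector<TriviaToken> tokens;
  std::string trivia;
  std::size_t i = 0;
  auto isWord = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
  while (i < src.size()) {
    const unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      trivia += src[i++];
    } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
      while (i < src.size() && src[i] != '\n')
        trivia += src[i++]; // line comment, up to but excluding the newline
    } else {
      const std::size_t start = i;
      if (isWord(c))
        while (i < src.size() && isWord(static_cast<unsigned char>(src[i])))
          ++i;
      else
        ++i;
      tokens.push_back({trivia, src.substr(start, i - start)});
      trivia.clear();
    }
  }
  tokens.push_back({trivia, ""}); // end-of-file token
  return tokens;
}

// Printing every token with its trivia reproduces the input exactly.
std::string printTokens(const std::vector<TriviaToken> &tokens) {
  std::string out;
  for (const TriviaToken &token : tokens)
    out += token.leadingTrivia + token.text;
  return out;
}
```

Because no character is ever dropped, printing the token list gives back the original file byte for byte, comments and odd spacing included.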

Even though libSyntax is still a work in progress, I'd like to show you two proof-of-concept code examples of how we can use its APIs. Let's do a metaprogramming exercise by creating a struct Foo in Swift. All we have to do is use SyntaxFactory to create a struct keyword; use SyntaxFactory to create an identifier; use SyntaxFactory to create left and right braces; and put all these elements together into the initializer of StructDeclSyntax, so we get one programmatically.
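The factory-then-assemble pattern can be sketched as follows. This is not the libSyntax API: the `factory` namespace, `SyntaxToken`, `StructDeclModel`, and the default trivia each factory function attaches are all hypothetical stand-ins that only mimic the shape of the exercise.

```cpp
#include <cassert>
#include <string>

// A token with the whitespace before and after it.
struct SyntaxToken {
  std::string leading;
  std::string text;
  std::string trailing;
  std::string str() const { return leading + text + trailing; }
};

// Hypothetical factory functions, one per kind of token.
namespace factory {
SyntaxToken structKeyword() { return {"", "struct", " "}; }
SyntaxToken identifier(const std::string &name) {
  assert(!name.empty());
  return {"", name, " "};
}
SyntaxToken leftBrace() { return {"", "{", ""}; }
SyntaxToken rightBrace() { return {"", "}", ""}; }
} // namespace factory

// A struct declaration node assembled from its four tokens.
struct StructDeclModel {
  SyntaxToken keyword;
  SyntaxToken name;
  SyntaxToken leftBrace;
  SyntaxToken rightBrace;
  // Prints the node by concatenating its tokens in order.
  std::string description() const {
    return keyword.str() + name.str() + leftBrace.str() + rightBrace.str();
  }
};
```

Assembling `structKeyword()`, `identifier("Foo")`, `leftBrace()`, and `rightBrace()` into a `StructDeclModel` and printing it gives `struct Foo {}`.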

Another example can be a linter. Swift uses colons in several different places, like protocol conformances, parameter declarations, or argument labels. If our convention is to disallow whitespace between a token and the following colon, we need the linter to automatically sanitize any violations of this convention.

Let's see how we can implement this linter via libSyntax APIs. To implement it, all we have to do is override a SyntaxRewriter. SyntaxRewriter will traverse a Swift code base and rewrite source entities according to the specified logic. It will visit every token. It will use the token's parent API to get the syntax node following this token, and it will check if the following node is a colon. If these conditions are met, we can use the withoutTrailingTrivia function on the visited token to remove the whitespace between the token and the following colon. This will sanitize all violations of our no-space-before-colon convention.
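The rewriting logic can be modeled over a flat token list instead of a syntax tree. In the sketch below, `TokenRewriterModel` plays the role of a rewriter base class and `NoSpaceBeforeColon` overrides its visit hook. All names are hypothetical, and unlike a plain withoutTrailingTrivia call, this version only drops trailing trivia that is pure whitespace, so a comment before a colon would survive.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct LintToken {
  std::string leading;
  std::string text;
  std::string trailing; // whitespace/comments after the token
};

// Visits every token and collects whatever visit() returns for it.
class TokenRewriterModel {
public:
  virtual ~TokenRewriterModel() = default;
  virtual LintToken visit(const LintToken &token, const LintToken *next) = 0;
  std::vector<LintToken> rewrite(const std::vector<LintToken> &tokens) {
    std::vector<LintToken> out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
      const LintToken *next = (i + 1 < tokens.size()) ? &tokens[i + 1] : nullptr;
      out.push_back(visit(tokens[i], next));
    }
    return out;
  }
};

// Removes whitespace between any token and a colon that follows it.
class NoSpaceBeforeColon : public TokenRewriterModel {
  static bool isBlank(const std::string &s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
  }
public:
  LintToken visit(const LintToken &token, const LintToken *next) override {
    if (next == nullptr || next->text != ":" || !isBlank(token.trailing))
      return token;
    LintToken fixed = token;
    fixed.trailing.clear();
    return fixed;
  }
};

std::string printTokens(const std::vector<LintToken> &tokens) {
  std::string out;
  for (const LintToken &t : tokens)
    out += t.leading + t.text + t.trailing;
  return out;
}
```

Run over the tokens of `func f(x : Int)`, the rewriter yields `func f(x: Int)` and leaves every other token untouched.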

Yes, that's basically it. I'd like to summarize this presentation with several meta points. First of all, refactoring's great. Refactoring's super useful. We should all refactor our code frequently and ruthlessly, because no matter how elegantly our software is written today, it will almost surely become insufficient tomorrow. Secondly, Swift refactoring support is an ever-growing list that is very friendly to open source contribution. If you have an idea for a refactoring action that would help you work with Swift, feel free to file a task in our open source bug tracking system, or you can try to implement the refactoring idea by following the online tutorial, which will provide you with more details, such as how you can add a unit test to the implementation. Even better, you can contribute the refactoring back to the open source project and share it with the rest of the world.

Thirdly, stay tuned for the latest development of libSyntax. When it's ready, we can use our favorite programming language to generate and to modify our favorite programming language. You can find the documentation of libSyntax API in the Swift repository. You can check out my colleague Harlan Haskins' presentation in try! Swift NYC, where he will give you more details about the implementation of libSyntax.

If you have any questions regarding either refactoring or libSyntax, don't hesitate to email us at [email protected]. Immediately after this talk, there will be a related lab session. My colleagues and I will be more than happy to answer any of your questions. Thank you very much.