Don Syme on F#, Microsoft’s functional programming language for .NET

Don Syme is a Principal Researcher at Microsoft Research, Cambridge and the designer of F#, Microsoft’s functional language for .NET. Visual Studio 2010 includes F# alongside C# and Visual Basic, making it widely available to Microsoft platform developers.

I met Syme at the QCon conference in London in March 2010, and interviewed him shortly afterwards. Some quotes from that interview have already been published in an article for The Register, but I am now posting nearly all of it here.


What was the genesis of F#?

I’ve been doing functional programming since 1992, ML-family functional programming. I got exposed to ML programming through my supervisor at the Australian National University, one of the people who worked on the first version of ML with Robin Milner, the father of the ML series of languages.

So during the 90s I had a lot of experience of the sheer effectiveness of these languages. I was doing mainly symbolic programming. Then I had the good fortune to arrive at Microsoft when the .NET platform was in its early genesis. I’d had a lot of experience with the Java language and contributed to the design of C# and C# 2.0.

There are two streams to programming language design, implementation and research. One is to take existing languages like C#, and to modify them and extend them with the kind of features that we think are productive for programmers, that we think are beautiful and elegant in their combination of the mathematics that lies behind them and the power they give you as a programmer. So we did that in .NET 2.0, with .NET generics and C#.

But the other thing in the work is to take programming languages that exemplify some other paradigm, such as a logic paradigm, or perhaps a data-parallel programming paradigm, and push that through as a practical and viable tool. F# started from the functional programming tools I had been using, and from wanting to see that paradigm be strong on the .NET platform.

We started F# as a research project, with many possible outcomes in mind. Those outcomes might have been as simple as making a tool which we would be comfortable using for our own personal work, through to having a research-quality language implementation similar to the other functional language implementations. But one of the great things about the early days of F# was taking the functional paradigm, having it in the context of .NET, and then engaging with real-world programming tasks, be those simple tasks that a tester might perform in writing test scripts, through to working with some of the financial institutions, and understanding the kind of work they did.

We opened up this ability to have a language implementation, a functional language implementation, and trial it very easily in real-world situations. We could experiment with the functional paradigm in practice, and understand where it was most beneficial, how it would fit in and what it should and shouldn’t be used for.

From 2003 to 2006 it was very hands-on, getting the earlier versions of the language out there. They were a bit scrappy in some ways, but we really got engaged with actual users doing interesting applied programming tasks. That took us through to 2006 and the development of the language.

How would you describe your own involvement?

We’re now a team of about 12. That grew from me alone in 2002/3. In 2005 I was joined by James Margetson who worked full time with me on the research versions of F#; and then in 2007 we broadened out to a much larger team for the productisation.

Why was F# integrated into Visual Studio ahead of say Ruby or Python?

[pause] It may not come down to the technical questions. I’ve never considered that comparison point before.

We often talk in Microsoft in terms of a language portfolio. The question comes up, have we got a balanced portfolio? When we look at languages like Ruby and Python, one of the main questions comes down to what sort of release cadence you want. Are you looking to do two-monthly releases, monthly releases, highly reactive fix-the-bugs-the-next-day kind of releases, or are you on six-month or one-year timescales? There are pros and cons to each of these approaches.

There is an expectation in the Ruby and Python communities of a fairly high release cadence, but I’m not familiar with the decision-making for those communities. In the case of F# there was a feeling that there was a strong thirst from users, especially in the financial institutions, for a functional language implementation on a longer release cycle, a supported, reliable, one to two year release cycle, and I think that was the logic of the decision.

How does F# differ from what people know in say Objective Caml?

The core language of F# is heavily inspired by OCaml. In fact, if you look back at the ML languages, the core of these languages has been surprisingly stable since the early seventies. It’s a question of what you do around that, and one of the major questions is about object-oriented programming, which is one facet of what we call programming in the large. Another question is what you would historically have called module system design, which is another aspect of programming large-scale systems.

F# differs on those design decisions from other ML languages because the aim is to build a language which integrates very nicely into the .NET component development model, and that means we embrace .NET object-oriented programming. That is a very important part of the design process with F#, especially as you evolve F# code from your exploration of a problem space into long-term maintainable code. So that’s one major design difference.

Another major design difference is with regard to parallel programming, where we embrace the idea of lightweight threads and what we call lightweight agents in the core language. We use techniques that come from Haskell for that purpose, which are some of the techniques I talked about at QCon.

If I’m a C# or C++ developer, what practical, business-benefit reason is there to look at F#?

When we look at the programming spectrum, a categorisation we find helpful is to look at data-oriented programming; parallel and control-oriented programming, which may involve message passing and building agents and parallel control structures; and presentational programming, doing user interface design work, games. You might also put algorithmic programming as a separate part of that. So if you look at a typical game setup, there’ll be a portion of user interface work, there’ll be a portion of algorithmic and data-oriented work, and there may also be some parallel or network programming, communication programming.

When we look at where F# brings real benefits, we’re definitely thinking about the data, the parallel and the algorithmic side of programming. Now we do see F# very much as additive to Visual Studio, providing something extra. If existing C# and VB developers are in their comfort zone with regard to, say, heavily user-interface oriented work, though of course that will involve a degree of data-oriented, parallel, algorithmic work, you wouldn’t immediately say that they should retool in F# or should change their work.

F# is attractive in places where the object oriented paradigm isn’t necessarily a good fit for the kind of work that’s being done, which would be in the work that is heavily oriented toward the data oriented, the parallel, the algorithmic side of programming.

That said, there are excellent VB and C# programmers who can work very comfortably in that domain, but often they’ll approach it in a very functional kind of way, in their use of those languages.

To give a hands-on example, in a typical group of quantitative finance analysts, you’ll have a mixture of highly mathematical-oriented developers and analysts, and people who are professional programmers. We find that F# is very attractive to the analysts and the quantitative experts in that situation.

So it’s less a matter of thinking how an existing C# or VB developer should adjust their work with regard to F#; it’s more thinking about who else is involved in the data, parallel and algorithmic side of programming: who else can be brought in and attracted to the .NET programming ecosystem because of the presence of F# in Visual Studio? It’s more about enabling people who perhaps don’t have the right tool for the job at the moment, than about changing the work of the existing family of programmers.

Are you suggesting that if someone is happily programming in C# that they needn’t investigate F#?

I certainly don’t think it is critical that people must look at F#. We do get consistent feedback from C# programmers that by learning F# they think about their C# programming differently, and they understand how to apply the functional programming methodology to their C# coding. And of course C# can be used in a fairly functional mode for certain kinds of problem solving, and that tends to result in improving people’s C# programming skills.

So we’d definitely encourage people to understand it and to look at it as a language. We do also encourage people to think about where the sweet spot is for the adoption of F# in their organisation. There may be some organisations where F# is not appropriate. You might think of a company building a set of web sites using ASP.NET; we wouldn’t immediately say that F# should be used in that kind of situation. But if you’re building a game and you have to build a physics engine, then F# is a very nice fit if you’re going to program that in managed code. If you’re building an AI engine, then F# is an excellent language for implementing those core computational components, which might fit as part of a larger C# or VB application.

You show some amazing comparisons of C# vs F# code in which the F# is very much more concise. What is it that makes it able to solve those problems in so many fewer lines of code?

One of the key design decisions running all the way through F# is that it’s a strongly-typed language with substantial use of type inference, so we like to think that where C# dipped its toes in the water with type inference in C# 3.0, F# takes that three or five steps further. It will do type inference over the entire scope of a file, for example, and at the component boundary it still gives you the facilities you need to pin down the types as needed.
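
[As a rough illustration of that kind of inference (my sketch, not code from the interview): nothing below carries a type annotation except the deliberately pinned-down function at the end, yet everything is fully statically typed.]

    // F# infers square : int -> int and sumOfSquares : int without annotations.
    let square x = x * x

    let sumOfSquares =
        [1; 2; 3; 4; 5]
        |> List.map square
        |> List.sum

    // At a component boundary you can still pin the types down explicitly.
    let describe (total: int) : string = sprintf "total = %d" total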

There are three or four other major language features, in addition to type inference, which contribute to that. There are tuples and discriminated unions, and object expressions, which allow you to implement interfaces in a succinct way in an expression context; there are technical features such as sequence expressions, which are related to C# iterators and allow you to express the process of generating tables of data very succinctly; and finally there are things like asynchronous and parallel programming in the Async feature.
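
[A hedged sketch of a few of those features in one place, my example rather than Syme’s: a discriminated union carrying tuple data, pattern matching over it, a sequence expression, and an object expression implementing an interface inline.]

    // A discriminated union; the Rectangle case carries a tuple of width * height.
    type Shape =
        | Circle of float
        | Rectangle of float * float

    // Pattern matching gives a succinct treatment of the cases.
    let area shape =
        match shape with
        | Circle r -> System.Math.PI * r * r
        | Rectangle (w, h) -> w * h

    // A sequence expression, related to C# iterators, generating data lazily.
    let squares = seq { for i in 1 .. 10 -> i * i }

    // An object expression: implement an interface without declaring a class.
    let comparer =
        { new System.Collections.Generic.IComparer<int> with
            member this.Compare(a, b) = compare a b }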

In a sense F# is just using what we know from functional research over the last 10-15 years and saying OK, here we have a set of ideas that we know is able to capture and express a very broad range of computations in a very succinct and clear way, and each of these features represents a distillation of thirty years of experience with making programming in an expression-based, data-oriented language simple, clear but strongly typed. It is also compositional, which plays a role: not only are things more succinct, but they actually compose very nicely as well. You can see that with the pipelining in F#, or the agents in parallel programming.
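
[A small sketch of that compositional style (mine, not from the interview): the same steps written as a pipeline with |> and then as a single composed function with >>.]

    let trimAndLower (s: string) = s.Trim().ToLowerInvariant()

    // Data flows left to right through a pipeline of small functions.
    let cleanWords (words: string list) =
        words
        |> List.map trimAndLower
        |> List.distinct
        |> List.sort

    // The same steps expressed as one composed function value.
    let cleanWords2 = List.map trimAndLower >> List.distinct >> List.sort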

So in a sense it is just taking this body of knowledge from the functional programming community and translating it to the .NET programming setting.

Not all programs end up as succinct as the examples I showed. There are plenty of programs where the code reduction factor won’t be so great, but yes, there are particular kinds of programming where things end up massively more succinct.

Can you in a few words say what makes F# so suitable for parallel and asynchronous programming?

There are three things that F# brings to parallel and asynchronous programming. The first is a real focus on reducing the amount of mutable state in your programming. This means that your mutable state is boiled down to the absolute essentials of what has to be mutable. It’s either localised – and I don’t just mean encapsulated inside a class, syntactically encapsulated, but logically isolated and used locally; in some sense it’s local – or, if you do have some mutable state, it’s local to the user interface thread or local to an agent. But in general you can often completely eliminate the mutable state through this consistent set of functional programming techniques, often by passing some data around explicitly, rather than propagating the data everywhere implicitly through sort-of global mutable tables. So a focus on immutability first is a major factor.
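
[A minimal sketch of what boiling away mutable state can look like (my example): the running total lives in a shared mutable variable in the first version, and is threaded through a fold in the second.]

    // Mutable style: every iteration touches shared state.
    let mutable total = 0
    for price in [10; 20; 30] do
        total <- total + price

    // Functional style: the accumulator is passed explicitly; nothing is mutated.
    let total' = List.fold (fun acc price -> acc + price) 0 [10; 20; 30]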

The second is the Async programming feature, which essentially allows you to add lightweight reactions to the system, so you can have many objects waiting to be activated by a callback of some kind, and you can program these objects without doing what’s called inversion of control. You can program a sequential series of executions, a series of web requests for example: go to one web site, go to the next web site, go to the next web site and so on. You can write what we call asynchronous workflows to express this logic, which would otherwise be encoded as a set of callbacks all the way through your code. This is extremely important when you’re talking about handling errors in a series of asynchronous calls, or perhaps accumulating a set of resources across the calls and making sure we clean up file connections and other things that happen during a computational process.
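
[A rough sketch of an asynchronous workflow (my code, with placeholder URLs): a sequence of web requests written as straight-line logic, with no hand-written callbacks; each let! suspends the workflow while the request is in flight.]

    open System
    open System.Net.Http

    let fetchAll (urls: string list) =
        async {
            use client = new HttpClient()
            let results = System.Collections.Generic.List<string>()
            for url in urls do
                // let! waits for the response without blocking a thread,
                // and exceptions propagate through ordinary try/with.
                let! page = client.GetStringAsync(Uri url) |> Async.AwaitTask
                results.Add page
            return List.ofSeq results
        }

    // Run it synchronously here just for illustration.
    let pages =
        fetchAll [ "http://example.com"; "http://example.org" ]
        |> Async.RunSynchronously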

The last thing we bring is an agent-based programming model built on top of the asynchronous model. This lets you define many hundreds of thousands of agents in memory, in a single process. And this is critical if you’re reacting to many different external events, such as a web crawler having many different I/O requests outstanding at the same time, or processing many different images in parallel.
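
[For a concrete flavour of the agent model (my sketch): F#’s MailboxProcessor gives each agent its own lightweight message loop, and a single process can host very large numbers of them.]

    // An agent whose only state, the message count, is local to its loop.
    let counter =
        MailboxProcessor<string>.Start(fun inbox ->
            let rec loop count =
                async {
                    let! msg = inbox.Receive()
                    printfn "message %d: %s" count msg
                    return! loop (count + 1)
                }
            loop 0)

    counter.Post "hello"
    counter.Post "world"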

I recall you talked about active vs reactive code at QCon – why has this become more of an issue today?

There are two kinds of parallel programming that really dominate applied parallel programming today. One is making the most of your multi-core CPUs, and the other is making the most of the I/O bandwidth that you have to the outside world, the ability to have multiple I/O connections at the same time. We talk about those as CPU parallelism versus I/O parallelism.

The key thing about building I/O parallel systems, the ones which scale very highly, is that you’re often in a situation of wanting to have many different agents executing on your machine at the same time. If you are implementing a chat server that has to serve 50,000 users simultaneously, one excellent way to model that is to think of it as 50,000 different agents, each handling a chat session. So this means you have to have lightweight threading of some kind, to be able to express this kind of problem in the clean way that you get if you have that many agents.

It’s helpful to distinguish between multi-threaded and CPU-intensive code as being active computation, and I/O parallel code as being reactive to events happening from the outside world. Reactive programming has always been important, but it is becoming more so, partly because of the rise of these server-side applications. We understand the importance of systems like Twitter, or systems like chat servers, heavily I/O intensive server-side applications. Although they are highly parallel applications, they spend all their time reacting to messages; they’re almost like telephone exchanges, really just getting packets of information coming in, reacting, and sending some results back.

We see the same thing in so much server-side programming with scalable web sites and other such systems. So the importance of reactive programming is that the machine itself now always has to be seen as part of a larger piece of networked hardware and software working together. Even in a typical client-server application, we’re beyond the days where we can think of a machine as an isolated entity. We have to think of it as part of a larger system.

This is the fundamental thing, the rise and rise in the importance of network systems and programming those network systems, that puts an extra emphasis on programming reactive systems.

There’s always been a bit of an emphasis on reactive systems because of user interface work in client-side programming, and F# also brings some added value there, but there’s a big overlap there with the designer space. You’ve got two tensions there: one is programming the logic of those client-side applications, and the other is making them look great, all the designer work that you get with Visual Studio and the other designer-based tools. So the balance is a bit different on that side.

What about unit testing, what’s the unit test framework for F#?

The approach we’d recommend is to create a C# application which hosts the unit tests, using the C#-based unit test framework that you get with Visual Studio 2010, or the Visual Basic testing framework, and use that to test the F# DLL. You can also use other unit test frameworks, such as NUnit, for example. A typical story would be to take a standard .NET-based unit testing framework and to follow the recipe for using that with F# code.

That said, F# also has F# Interactive, for interactive evaluation and testing. Things vary a bit methodologically, so the test-driven side of things is probably more along the lines of: build a prototype in F# code, write some tests immediately in F# and execute those, and then transition those across to a unit test framework such as VS test or NUnit. So there is actually a methodological difference, and I think many people coming from the C# background notice those methodological differences almost more than the actual language differences in some ways. People come to an F# talk and watch me develop a bit of F# code, experiment with some inputs, and go back and extend the F# code, very much in a kind of script development methodology, which is quite different from the overall programming methodology you apply to C#. So that was interesting feedback: it’s the interactive experimentation with your code that people notice, with some amazement.
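
[As an illustration of that workflow (my sketch, with a made-up function): a quick check evaluated in F# Interactive, then the same check captured as an NUnit test.]

    // Explored first in F# Interactive, script-style.
    let addVat rate price = price * (1.0m + rate)
    let quickCheck = addVat 0.2m 100.0m   // evaluate interactively; expect 120.0M

    // ...then moved across to a unit test framework such as NUnit.
    open NUnit.Framework

    [<TestFixture>]
    type PricingTests() =
        [<Test>]
        member this.``addVat applies the rate`` () =
            Assert.AreEqual(120.0m, addVat 0.2m 100.0m)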

Looking forward, how do you see F# evolving?

F#’s key role is as a productive language for typed, functional and object-oriented programming on .NET. I think we will always be keeping that focus. It’s about productivity in the functional and object-oriented programming paradigm. That will be our guiding light, making functional programming work in practice, and making people highly productive with the paradigm.

We have a huge opportunity with F# to see the functional programming paradigm break out of the data-oriented kind of work, the symbolic programming kind of work, through to the modern world of web programming. And I don’t just mean writing user interface applications, I mean writing these scalable server-side systems. I get a real sense of excitement as I show web programming with F#, that we’re tackling some of the fundamental complexity of building these scalable systems, and we’re providing language tools which allow you to tackle some of these fundamental complexities.

There are other ways to tackle those complexities as well: look at systems like MapReduce, or the Hadoop systems, or the cluster-based computing systems. These are all orthogonal to F#, so a huge part of the ongoing work with F# is to make F# work in conjunction with these systems, and to change people’s understanding of how functional programming applies in the context of building those kinds of systems.

I see a big opportunity for people to exploit the language features we have, to apply functional programming outside its traditional box. One great example of that is a platform called WebSharper, which is an F#-based framework for doing client-server programming, and AJAX-style programming, all in F#. It’s a little bit like the way Google Web Toolkit lets you do client-side programming all in Java; this lets you do it all in F#, and it doesn’t add to the language, it’s built on the language elements. It allows you to take part of your program, which is sitting on the server, and translate that dynamically to JavaScript and have it execute as JavaScript on the client. We see these very interesting uses of the language elements that F# gives you.

A similar thing is people taking F# code and doing the same sort of dynamic translation to run it on a GPU system, and also seeing F# running on the database through a LINQ-style query execution. So we’ve got these wonderful foundations in the language, and it’s often about connecting them to the actual applied programming world.

Another example is Azure programming. I’m absolutely amazed how beautiful and powerful the Azure SDK is. I was able to get the SDK and write a highly-scalable, cloud-hosted, TCP-based F# component in my first five hours of using it. That really is a remarkable moment in programming: we have this functional programming paradigm that is just so perfect for sitting on the end of a TCP socket connection and serving out scalable results from a single machine, and we can suddenly scale this up into a cloud host, which is a simple and easy transition using those kinds of tools. So a huge part of F# going ahead will be about connecting it to the tools that fit very nicely with the functional programming paradigm.

What’s ahead – there’s another set of things which are very close to my heart. One of the key things that makes F# work as a language is that once you have the data under the strongly typed discipline of functional programming, then working with the data is extremely fluent and sweet. One of the things we’d like to work on is the transition point between the world of data coming in from the outside and the world of strongly typed programming within F#. You’ll see in future versions of F# huge advances in how we acquire data from external sources, whether those be databases or whether those be internet-hosted databases, or whether they be XML files. Making it even easier to get your hands on the data that drives your work.

Have you got any tips for learning F#?

I’ve been very pleased with the quality of the F# books that are available. Ones I’ll particularly mention are Programming F# by Chris Smith, the O’Reilly book, and Real-World Functional Programming: With Examples in F# and C# by Tomas Petricek, which really captures the spirit of cooperation that has been a major part of the development of the first version of F#. The introduction by Mads Torgersen, a program manager and one of the architects on C#, shows the cooperation between the F# and the C# teams, and captures the way that they see F# as almost a turning point for the .NET platform in improving the viability of the functional programming model.

With F# programming, what we want is not that people get lost in thinking about programming; what we want is that they’re concentrating on their domain. That means if you are a financial engineer, a financial quant, you should be spending most of your time thinking about your domain problem: you should be thinking about quantitative finance and the data that you’re accessing. We want you to be worrying a little less about your class hierarchy, and a little more about solving the particular mathematical, data and parallel problems that you actually have to solve to deliver value to your customer.
