
1 Part I - Generalities

1.1 Background, Genesis and Aims

Here is an excerpt from \(\beta eta\), the fortnightly magazine of UNSW’s Computer Science and Engineering student society. It appeared the year after the first Informal Methods course:Footnote 1

Practical applications of formal methods for Computer Scientists

Last year one of my lecturers posed a very simple question to a class containing about fifteen 3rd and 4th year students: “Write a function that, given an array A containing n integers, computes (and returns) the maximum value contained in the array”. Essentially, write the “max” function. Naturally, I thought at the time that this was obvious and simple. I proceeded to write...

[figure a: the code the author wrote]

By using this simple example, the lecturer made a beautiful point: formal methods can be practical!

You can use formal methods in two ways. You can study formal methods from a theoretical perspective. You can rigorously and thoroughly apply formal methods to algorithms or systems in an isolated, academic exercise. Nice results can be obtained from this, such as the work done on [mechanically proved correctness of critical software]. This process, however, is too long and time-consuming for most developers working in industry.

Alternately, you can learn the techniques of formal methods and apply them in practical ways. This means you don’t have to prove every statement and every property to the \(n^{th}\) degree. Instead, by using simple reasoning during the coding process [...] you can help reduce the number of bugs introduced into a system. Additionally, it will make finding bugs easier by being able to eliminate from consideration parts of the codebase that you can quickly show are correct.

If you ever get an opportunity to learn the techniques of formal methods from a practical perspective, take it. The techniques will change how you write code, making you a more efficient and accurate developer.

What is remarkable about this (unsolicited) article is not so much its content, but that it was written at all. One is surprised that the author was surprised...

Of course formal methods can be practical. In fact they are basic, and essential to any kind of good programming style. As the writer points out, it’s a question of degree: it does not have to be “to the \(n^{th}\) ”.

We knew that already... didn’t we?

1.2 Who is This “We”? And Where Did Formal Methods Go?

Who We Are... and Who “They” Are. Imagine this situation, black-and-white simple in its significance: the “we” above is having a casual conversation with a fourth-year computer-science student who – it is already established – is a very good programmer, respected for that by both peers and teachers. We’re discussing a small loop.

“Ah, I see. So what’s the loop invariant?” I ask.

“What is a loop invariant?” is the reply.

“We” are talking to a “they”; and that we-they conversation, and others like it, is where the motivation for an “Informal Methods” course came from. It was not that these students couldn’t understand invariants etc. It was that they had never been given the chance.

We have to turn “them”, those students, into “us”.

Where Formal Methods Went. Formal Methods did not start in one place, with one person or one project. But in the early 1970’s there were a few significant publications, from a small number of (already, or soon to be) prominent authors, that set the stage for a decade of progress in reasoning carefully about how programs should be constructed and how their correctness could be assured. Amongst Computer Science academics generally the names, at least, of those authors are well known: not a few of them are Turing Award winners. But outside of Formal Methods itself, which is a relatively small part of Computer Science as a whole, few academics can say what those famous authors are actually famous for. And even fewer can say what some of them are still doing, right now, as this article is being written.

Why aren’t those authors’ works now the backbone of elementary programming, not a part of every student’s intellectual toolkit? Part of the reason is that although the ideas are simple, learning them is hard. Students’ brains have to be ready, or at least “readied” by careful conditioning: the ideas cannot be pushed in by teachers. They have to be pulled in by the students themselves. Teachers operate best when answering questions that students have been tricked into asking. Even so, many excellent teachers simply are not interested in those ideas: they have their own goals to pursue, and not enough time even for that.

Another problem with Formal Methods is evangelism: there was, and continues to be, an urge to say to others “You have to do this; you must follow these rules... otherwise you are not really programming. You are just hacking, playing around.” Formal Methods, like so many other movements, generated a spirit of epiphany, of “having seen the light” that encouraged its followers too much to try to spread the message to others, and too eagerly. And often that brought with it proposals for radical curriculum re-design that would “fix everything” by finally doing things right. Another “finally”, again. And again.

In general those efforts, so enthusiastically undertaken, never made it out of the departmental-subcommittee meeting room — except to be gently mocked in the corridors, by our colleagues’ shaking their heads with shrugged shoulders, wondering how we could be so naïve. And now, years later, those efforts are mostly gone altogether, forgotten.

So that is where Formal Methods went, and why it has not become a standard part of every first-year Computer-Science course. And “we” are the small group of Computer Scientists who are old enough to have had no way of avoiding Formal-Methods courses during those “decades of enthusiasm”. Or we might be younger people who are intellectually (genetically) pre-disposed to seek, and achieve that kind of rigour: after all, in every generation there are always some.

But now we leave the matter of “us”, and turn to the question of “they”.

1.3 Operating by Stealth: Catching “Them” Early

In my own view, the ideal place for an informal-methods course is the second half of first year. A typical first-half of first year, i.e. the first computing course, is summarised as follows:

“Here’s a programming language: learn its syntax. And here are some neat problems you never thought you could do; and here are some ideas about how you can program them on your very own. Experience for yourself the epic late-night struggle with the subtle bugs you have introduced into your own code; savour the feeling of triumph when (you believe that) your program finally works; gloat over the incredible intricacy of what you have created.” (1)

This is true exhilaration, especially for 18-year-olds, who are finally free to do whatever they want (in so many ways), in this case with “grown-up” tools to which earlier they had no access: most first-years have been exposed to computers’ effects for all of their conscious lives, and there is nothing like the thrill of discovering how all these mysterious and magical devices actually work and that, actually, you can do it yourself.

Become a Tenth-Level Cybermancer! Your younger siblings, non-CS friends and even parents will now have to cope with cryptic error messages and bizarre functionality that you have created.

Once we accept (or remember) the above emotions, we realise that the key thing about formal-methods teaching is that for most students there is no point in our trying to introduce it until after the phase (1) above has happened and has been fully digested: if we try, we will just be swept out of the way.

A typical second-half first-year course is elementary data-structures; and indeed second-half first-year is a good place for it. But informal methods is more deserving of that place, and it is far more urgent. Survivors of the first half-year will now understand something they didn’t have any inkling of before:

Programming is easy, but programming correctly is very hard.

This moment is crucial, and must be seized.

They have not yet mistakenly learned that making a mess of your initial design, and then gradually cleaning it up, is normal, just what “real programmers” do — and that lost nights and weekends, sprinting with pizza and Coke, are simply how you play the game. And it is not yet too late to stop them from ever learning that mistaken view. This is the moment we must teach them that programming should not be heroic; rather it should be smart.

We want programmers like Odysseus, not Achilles.Footnote 2

So the “stealth” aspect of informal methods, in second-half first year, is that it should be called something like

  • Programming 102, or

  • Taking control of the programming process, or

  • Practical approaches to writing programs that Really Work, or

  • Getting your weekends back.Footnote 3

Having just experienced the pain (and pleasure) of over-exuberant coding, and its consequences, they are now – just for this small moment of their lives – in a small mental window where their teen-aged energy can be channelled into forming the good habits that will help them for the rest of their professional careers. In short, they are vulnerable, at our mercy — and we should shamelessly take advantage of it.

By second year, it could be too late.

1.4 Making Contact

After six months’ exposure to university-level programming, many first-year students are on a trajectory to a place that you don’t want them to go: they think that their journey in Computer Science is going to be a series of more elaborate languages with steadily increasing functionality, a growing collection of “tricks of the trade” and more skill in dealing with the unavoidable quirks of the software tools we have. And it will be all that. But it should be more: how do we catch that energy, and deflect these would-be tradesmen onto a path towards proper engineering instead?

In my view, our first move has to be making contact: you must first “match their orbit”, and then gradually push them in the right direction. It’s like saving the Earth from an asteroid — first you land on it, and then a slow and steady rocket burn at right-angles does the job. The alternative, a once-off impulse from a huge space-trampoline, simply won’t work: the asteroid will just punch through and continue in its original direction.

In Sect. 2, below, some suggestions are made for doing this, the gradual push. For second-years,Footnote 4 it’s to develop in the first lectures a reasonably intricate (for them) program, on the blackboard in real-time, in the way they are by then used to: guesswork, hand-waving and even flowcharts. You indulge in all the bad habits they are by now developing for themselves, and in so doing establish your programming “street cred” — for now, you are one of them.Footnote 5

But this is where, secretly, you are matching their orbit. (For fourth years, paradoxically, you should use an even simpler program; but the principle is the same.Footnote 6)

In both cases, once you have caught their attention, have “landed on the asteroid”, by going through with them, sharing together, all the stages mentioned in Sect. 1.3 above – the epic struggle, the subtle bug, the savour of triumph, the gloat over intricacy – you “start the burn” at right angles, and gradually push them in the direction of seeing how they could have done it better.

Your aim is to train them to gloat over simplicity.

2 The Very First Lecture: Touchdown

2.1 Begin by Establishing a Baseline

Much of this section applies to any course, on any subject, and experienced lecturers will have their own strategies already. Still, as suggested just above, it’s especially important in this course to make a strong connection with the students and to maintain it: you are going to try to change the way they think; and there will be other courses and academics who, with no ill will, nevertheless will be working in the opposite direction, suggesting that this material is unimportant and is consuming resources that could be better used elsewhere. With that said...

First lectures usually begin with administrative details, and that is unavoidable. But it’s not wasted, even though most people will forget all that you said. It’s useful because:

  • Although no-one will remember what you said, to protect yourself later you will have to have said it: “Assignments are compulsory”, “Checking the website is your responsibility”, “Copying is not allowed”.

  • It will give the students a chance to get used to your accent and mannerisms. To reach them, you first have to let them see who you are.

  • You can say really important things like “No laptops, phones, newspapers, distracting conversations, reading of Game of Thrones are allowed in the lectures.” Really, not at all. (If you don’t say this at the beginning, you can’t say it later because, then, it will look like you are picking on the person who is doing that thing. Only at the very beginning is it impersonal.) This one is really important, because part of getting students to want to understand what you are presenting is showing them that you are interested in them personally. If you don’t care about students who are not paying attention, you will be perceived as not caring either about those who are.

  • You can warn them not to be late for the lectures. (Do it now, not later, for the same reasons as just above.) Being on-time is important, because if they are not interested enough to be punctual they won’t be interested enough to be responsive in the lecture itself, and so will form a kind of “dark matter” that will weigh down your attempts to build a collaborative atmosphere.

But do not spend too much time on this initial stage: remember that most of it won’t sink in and that, actually, that doesn’t matter. Your aim with all the above is simply to get their attention.

Once all that is done, continue immediately with a programming exercise — because by now they are beginning to get bored. Wake them up!

The details of the exercise I use are the subject of Sect. 4.1. The exercise is not used for making people feel stupid; rather it’s used instead for showing people, later, how much smarter they have become.

The exercise is handed out, very informally, on one single sheet of paper, handwritten (if you like); and then it’s collected 10 min later. It’s not marked at that point, and no comment is made about when it will be returned: just collect them up; store them in your bag; say nothing (except perhaps “thank you”).

Like the introductory remarks, this exercise will probably be forgotten; and that’s exactly what you want. When it returns, many weeks later, you want them to be (pleasantly) surprised.

2.2 Follow-Up by Cultivating a Dialogue with the Class: And Carry Out an Exercise in Triage

Delivering a course like this one can be seen as an exercise in triage, and the point of the initial dialogue is to carry this out. A (notional) third of the students, the very smart or very well-informed ones, will get value from the course no matter how badly you teach it, and you should be grateful that they are there. Amongst other things, they provide a useful validation function — for if they don’t complain about what you say, you can be reasonably sure that any problems you might have are to do with presentation and not with correctness. And they can be used as “dialogue guinea-pigs” (see below). In fact these students will listen to and attempt to understand your introduction. But it wouldn’t matter if they didn’t, since they have already decided to do the course, and for the right reasons.

Another third of the students might be there because they think the course is easy, and that they can get credit while doing a minimal amount of work. They are not a problem in themselves; but neither should they consume too much of your effort and resources. They will not listen to your introduction, and it will make no difference to them, or to you, that they do not. If you’re lucky, they will get bored and leave the course fairly early (so that they don’t have to pay fees for it).

It’s the last third, the “middle layer”, who are the students you are aiming for. It’s their behaviour you want to change, and it’s them you want to excite. They will listen to your introduction, but not so much because they will use it to help them in planning or preparation; instead they will be trying to figure out what kind of person you are, and whether the course is likely to be fun. You must convince them that it will be.

So your constraints are mainly focussed on the top- and the middle groups: for the first, tell the truth (because they will know if you do not); for the second, simply be enthusiastic about what you say and let them see your genuine pleasure that they have come to take part in your course. The introduction could therefore go something like the following:

(In-)Formal Methods are practical structuring and design techniques that encourage programs that are easy to understand and to maintain. They are a particular kind of good programming practice, “particular” because they are not just rules of thumb. We actually know the theory behind them, and why they work. In spite of that, few people use them. But – by the end of this course – you will use them, and you will see that they work.

Unusually, we do not take the traditional route of teaching the theory first, and only then trying to turn it into the practice of everyday programming methods. Instead, we teach the methods first, try them on examples; finally, once their effectiveness is demonstrated, we look behind the scenes to see where they come from.

Thus the aim of this course is to expose its students – you – to the large conceptual resource of essentially logical and mathematical material underlying the construction of correct, usable and reliable software. Much of this has been “lost” in the sense that it is taught either as hard-core theory (quite unpopular) or – worse – is not taught at all. So there will be these main threads:

  1. How to think about (correctness of) programs.

  2. Case studies of how others have done this.

  3. How to write your programs in a correctness-oriented way from the very start.

  4. Case studies of how we can do that.

  5. Why these techniques work, and where further study of them would lead.

For (1) the main theme will be the use of so-called static rather than operational reasoning, that is thinking about what is true at various points in a program rather than on what the program “does” as its control moves from one point to another. This is harder than it sounds, and it takes lots of practice: explaining “static reasoning” will be a major component of the course.

The principal static-reasoning tools, for conventional programs, are assertions, invariants and variants. If you have never heard of those, then you are in the right place. Usually they are presented with an emphasis on formal logic, and precise step-by-step calculational reasoning. Here however we will be using them informally, writing our invariants etc. in English, or even in pictures, and seeing how that affects the way we program and the confidence we have in the result.
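As a small illustration of this informal style – not taken from the course materials themselves – here is the “max” function from the \(\beta eta\) article above, with its assertions and invariant written as plain-English comments. Python is used purely for illustration; any language would do:

```python
def max_of(A):
    """Return the maximum value in a non-empty array A."""
    assert len(A) > 0           # Precondition, in English: "the array is non-empty".
    m = A[0]
    # Invariant, in English: "m is the largest value among A[0..i-1]".
    for i in range(1, len(A)):
        if A[i] > m:
            m = A[i]
        # The invariant is restored: m is the largest value among A[0..i].
    return m                    # Postcondition: m is the maximum of the whole of A.
```

Nothing here is formal logic; the point is only that writing the invariant down, even in English, forces you to decide exactly what the loop maintains – and makes the edge case (an empty array) impossible to overlook.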

For (2), the programs we study will be chosen to help us put the ideas of (1) into practice, as they do need lots of practice. Usually the general idea of what the program needs to do will be obvious, but making sure that it works in all cases will seem (at first) to be an almost impossible goal. One’s initial approach is, all too often, simply to try harder to examine every single case; and “success” might then be equated with exhaustion of the programmer rather than exhaustion of the cases.

Our alternative approach (3) to “impossible” programs will be to try harder to find the right way to think about the problem they are solving — often the obvious way is not the best way. But getting past the obvious can be painful, and tiring, though it is rewarding in the end: a crucial advantage of succeeding in this way is that the outcome – the correctness argument – is concrete, durable and can be communicated to others (e.g. to yourself in six months’ time). Further, the program is much more likely to be correct.

Doing things this way is fun, and extremely satisfying: with (4) we will experience that for ourselves. It’s much more satisfying than simply throwing programs together, hoping you have thought of every situation in which they might be deployed, and then dealing with the fallout when it turns out you didn’t think of them all.

Finally, in (5), we recognise that the above (intellectual) tools all have mathematical theories that underlie them. In a full course (rather than just this article), we would study those theories — but not for their own sake. Rather we would look into the theories “with a light touch” to see the way in which they influence the practical methods they support. Those theories include program semantics, structured domains, testing and compositionality, and finally refinement.

2.3 Making Contact with the Top Layer: The “Dialogue Guinea Pigs”

In this first lecture, you should begin figuring out who the smart, confident people are. You will use them as a resource to help the other students learn to take the risk of answering the questions you will ask throughout the course. Although the goal is to make the students feel that they can dare to answer a question even if they are not sure, in the beginning you have to show that this is not punished by embarrassment or ridicule.

So: find the smart people; but still, when you ask your first question, ask the whole room and not only them. Pause (since probably no-one will answer), and then pick a smart person, as if at random, who will probably give an answer that will be enough to work with even if it’s not exactly right. Make having given an answer a positive experience, and others will be keen to join in.

As the weeks pass, you will increase your pool of smart targets; and you will establish that answering the questions is not threatening. Furthermore, having established a question-and-answer style allows you to fine-tune the pace of a lecture as you go: a conversation is always easier to manage than a prepared speech.

2.4 Making Contact with the Middle Layer: The Importance of “Street Cred”

By now, the students have sat through your introduction, and they have probably completely forgotten that they did a small test at the beginning of the lecture (Sect. 2.1).Footnote 7 Do not remind them.

Instead, grab the middle layer of your audience by actually writing a program collaboratively, i.e. with their help on the board, right before their eyes — that is, after all, what the course is supposed to be about. Writing programs. But choose the program from material they know, ideally a program they have encountered before, and do it in a way they expect. Your aim here is not to dazzle them with new things which, you assure them, they will eventually master. Rather you are simply trying to make contact, and to reassure them that you really understand programming in exactly the way they do, and that you “can hack it” just as they can. Roll up your sleeves as you approach the board; literally, roll them up. This is the Street Cred.

A second aim of this first exercise is to establish a pattern of joint work, between you and them and among them. You aim to form a group spirit, where they will help each other; this is especially important later, when the students who understand the new ideas will enjoy helping the ones who have not yet “got it”. Again there is something peculiar about this course: the message sounds so simple that people, initially, will think they understand when in fact they do not understand it at all. A camaraderie operating outside the lecture room is the best approach to this. It is terribly important.

In the second-year version of this course, I chose a programming assignment from those very students’ first-year course and abstracted a small portion, a slightly intricate loop, and obfuscated it a bit so that they wouldn’t recognise it except perhaps subliminally. (It’s described in Sect. 5 below.) I then did the program with them, on the board, in exactly the same style I thought a second-year student might do it.Footnote 8 It was careful, operational and hand-waving... but the program worked,Footnote 9 and the class experienced a sense of satisfaction collectively when, after a hard but enjoyable struggle, we had “got it out”.

I then had an insight (simulated), suggested a simplification, and did the same program again, the whole thing all over, but this time using a flowchart. Really, an actual flowchart in a course on “formal methods”. The resulting program, the second version, was simpler in its data structures (the first version used an auxiliary array) but more complicated in its control structure: exactly the kind of gratuitous complexity a second-year student enjoys. When we finished that version, as well, we had begun to bond.

This program, and its two versions, are described in Sect. 5 below. What the students did not know at that stage was that the very same program would be the subject of their first assignment, where they would develop a program that was smaller, faster and easier to maintain than either of the two versions we had just done in class (and were, temporarily, so proud of). Having developed the earlier versions with them, all together, was an important emotional piece of this: the assignment shows them how to improve our earlier work, not theirs alone. It’s described in Appendix B below.

3 Follow-Up Lectures, “Mentoring”, and the Goal

A difference between the fourth-year version of the course and the second-year version was that “mentoring” was arranged for the latter. Neither course was formally examined: assessment was on the basis of assignments, and a subjective “participation” component of 10 %.

Mentoring (explained below) was used for second-years because of the unusual nature of this material and, in particular, its informal presentation. As mentioned just above (the “second aim”) the risk is that “young students” can convince themselves that they understand ideas and methods when, really, they do not. The point of the mentoring is to make sure every single student is encouraged to attempt to solve problems while an expert is there who can give immediate feedback and guidance. In that way, a student who does not yet understand the material will, first, find that out early and, second, will be able to take immediate action.

The 40+ students in the second-year course in 2014 were divided into groups of 8 students, and met once weekly for 30 min with one of three mentors who (already) understood the material thoroughly. (Two of the mentors were Ph.D. students; and all three were veterans of the fourth-year version of the course.) The student-to-mentor allocations were fixed, and attendance at the mentoring sessions was compulsory (and enforceable by adjusting the participation mark).

The mentors were instructed to (and did) ask questions of every student in every session, making sure no-one could avoid speaking up and “having a go”. In addition, the three mentors and I met for about 30 min each week so that I could get a bottom-up view of whether the message was getting through. The mentors kept a week-by-week log with, mostly, just a single A, B or C for each student: excellent, satisfactory or “needs watching”. (If you ask the mentors to write too much, they won’t write anything at all: this way, the more conscientious mentors will often put a comment next to some of the C results without your having to ask them to.)

With all that said, it is of course not established that the mentoring was effective or necessary: the course has not been run for second-years without it, and so there is no “control” group. And the course has no exam. So how do we know whether it was successful?

Success is measured of course against a goal; and the goal of this course is not the same as the goal for a more theoretical course on rigorous program-derivation. For the latter, it would be appropriate to use an examination-style assessment testing the ability (for example) to calculate weakest preconditions, to carry out propositional- and even predicate-logic proofs, to find/calculate programs that are correct by construction wrt. a given specification etc.

In fact the two goals for this course are as follows: the first targets the “upper layer” identified in Sect. 2.2 above. Those students will do very well in this course, and it will probably be extremely easy for them. Indeed, it will probably not teach them anything they could not have taught themselves. Are they therefore getting a free ride?

No, they are not: the point is that although they could have taught themselves this material, they probably would not have done so — simply because without this course they’d have been unaware that this material existed. As educators we strive to maintain standards, because the society that pays for us relies on that. In the case of these good students, the longer view is that it’s precisely they who will take a real “formal” Formal Methods course later in their curriculum, and then become the true experts that our increasing dependence on computers mandates we produce. Without this course, they might not have done so.

The second goal concerns the “middle layer”. These students would never have taken a real Formal Methods course and, even after having taken this course, they probably still won’t. But, as the comments in Appendix F show, their style and outlook for programming have been significantly changed for the better — at least temporarily. They will respect and encourage precision and care in program construction, for themselves and – just as importantly – for others; and if they ever become software-project managers or similar, they will be likely to appreciate and encourage the selective use of “real” formal methods in the projects they control.

These people will understand this material’s worth, and its benefits; and they will therefore be able to decide how much their company or client can afford to pay for it — they will be deciding how many top-layer experts to hire and, because they actually know what they are buying, they will be able to make an informed hard-cash case for doing so. Given the current penetration of rigour in Software Engineering generally, that is the only kind of case that will work.

References to Section 4.1 and beyond can be found in Part II; appendices are at the end of Part II.

4 Part II – Specifics

4.1 Binary Search: The Baseline Exercise in Lecture 1

We now look at some of the actual material used for supporting this informal approach to Formal Methods. We begin with the “baseline” exercise mentioned in Sect. 2.1.

Binary Search is a famous small program for showing people that they are not as good at coding as they think. But that is not what we use it for here. Instead, it’s used to establish a starting point against which the students later will be able to measure their progress objectively.

The exercise is deliberately made to look informal, “unofficial”, so that very little importance will be attached to it, and it’s done (almost) as the very first thing, to give people the greatest chance of forgetting that it happened at all. Thus it’s a single sheet, with instructions on the front and boxes to fill in on the back, almost like a questionnaire or survey. The actual version used was handwritten,Footnote 10 further increasing its casual appearance. Furthermore, doing it that way makes the point that the emphasis is on concepts and precision in thought, not on LaTeX- or other word-processed “neatness” of presentation. (For this article, however, I have typeset it to avoid problems with my handwriting: in class, they can just ask what any particular scrawl is supposed to be.)

A typical answer is in Fig. 1, on the reverse side of the handwritten version. (The front side, including the instructions, appears in Appendix A.) The typewriter-font text and ovals were added in marking. The hand-written comments and arrows were there in the handed-out version, in order to give very explicit help about what was expected and what the possible answers were. The light-grey text in the boxes of Fig. 1 is the student’s answers.

Fig. 1.
figure 1

Typical marked answer to the Binary-Search exercise of Sect. 4.1 (Color figure online).

The (typewriter-font) comments in the marking are important, even if there are only a few: if the program is wrong, for your credibility as a teacher you must identify at least one place where it clearly is wrong. That way the student won’t blame you; and it also makes the point that you actually bothered to read the work.Footnote 11

You don’t have to find all errors, however, since this test is not marked numerically for credit. The aim is merely to find enough errors so that the students will be pleased, later, when they realise that the techniques they have learned would have allowed them to write this program, first time, with no errors at all.

5 The Longest Good Subsegment

This problem forms the basis for the first assignment, whose role in the overall course was described in Sect. 2.4 above: in summary, the problem is solved using “traditional” methods, in class – in fact, it is done twice. Once that has been endured, this assignment does it a third time, but using instead the techniques that the course advocates. The hoped-for outcome is that the students will achieve a better result, by themselves but using these methods, than they did with the lecturer’s help without these methods.

5.1 Problem Statement

Assume we are given an array A[0,N) of N integers A[0], A[1],..., A[N-1] and a Boolean function Bad(n) that examines the three consecutive elements A[n], A[n+1], A[n+2] for some (unspecified) property of “badness”. We write A[n,n+3) for such a subsegment, using inclusive-exclusive style.

Just what badness actually is depends of course on the definition of Bad; but for concreteness we give here a few examples of what Bad might define. If subsegment A[n,n+3) turned out to be [abc] then that subsegment might be defined to be “bad” if a, b, c were

  • All equal: \(a=b=c\).

  • In a run, up or down: \(a{+}1=b \wedge b{+}1=c\) or \(a{-}1=b \wedge b{-}1=c\).

  • Able to make a triangle: \(a{+}b{+}c\ge 2(a\mathbin {\max }b\mathbin {\max }c)\).Footnote 12

  • The negation of any of the above.

Note that however Bad is defined, using Bad(n) is a programming error, subscript out of range, if \(\texttt {n}<0\) or \(\texttt {n}{+}3>\texttt {N}\).

Now a subsegment of A is any consecutive run A[i,j) of elements, that is \(\texttt {A[i],A[i+1],\ldots ,A[j-1]}\) with \(0\le \texttt {i}\le \texttt {j}\le \texttt {N}\). We say that such a subsegment of A is Good just when it has no Bad subsegments inside it. A Good subsegment of A can potentially have any length up to the length N of A itself, depending on A; and any subsegment of length two or less is Good, since no Bad subsegment can fit inside it. (Bad subsegments of A have length exactly 3; and since they can overlap, it’s clear that A can contain anywhere from 0 up to \(\texttt {N}{-}2\) of them.)
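As an illustration only — the text’s Bad(n) inspects a global array at index n, whereas these hypothetical functions take the three inspected values directly — the example badness tests and the definition of Good might be sketched in Python as:

```python
def bad_equal(a, b, c):                       # all equal
    return a == b == c

def bad_run(a, b, c):                         # in a run, up or down
    return (a + 1 == b and b + 1 == c) or (a - 1 == b and b - 1 == c)

def bad_triangle(a, b, c):                    # able to make a triangle
    return a + b + c >= 2 * max(a, b, c)

def is_good(A, bad, i, j):
    """A[i..j) is Good iff no Bad subsegment A[n..n+3) fits inside it."""
    return not any(bad(A[n], A[n + 1], A[n + 2]) for n in range(i, j - 2))
```

Note that `is_good` on any segment of length two or less is vacuously True, exactly as the text observes: the `range` is then empty.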

The specification of our program is given in Fig. 2.

Fig. 2.
figure 2

Specification of the first programming assignment

5.2 First Idea

An Operational Approach. A typical second-year student’s first thoughts about this problem might be along these lines. First go through all of A and store its “bad positions” b in an auxiliary array B. Then find the largest difference \( maxDiff \) between any two adjacent elements b of B. Then – after some careful thought – set l to be \( maxDiff {+}2\)... or something close to it.Footnote 13

Notice the operational phrasing “go through...”, and the slightly fuzzy “something close to it” (one more than that? one less? exactly that?). A typical student would code up the program at this point, using the guess above, and see whether the answer looked right. It’s only an assignment statement, after all, and if it contains an off-by-one error, well, it can be “tweaked”, based on trial runs, without affecting the structure of the rest of the program. This is how second-years think. (And they are not alone.)

We developed such a program (given below), interactively at the board, without hinting even for a moment that there might be a better way: indeed the whole point of this sneaky exercise is to get away with having the students agree that this is a reasonable solution, one indeed that they might have come up with themselves.

“Interactive”, by the way, is important. The ideas of “static reasoning”, invariants, assertions and so on are so simple when finally you understand, but so hard to grasp for the first time. (That’s why the mentoring described in Sect. 3 is important: the students who are not yet assimilating the approach must be helped to realise for themselves that they are not.) Building a “let’s all work together” atmosphere encourages the students to help each other; and so another contribution of this program-construction exercise carried out collectively is to build that collegiality in the context of something they understand: operational reasoning about a program. They understand operational reasoning.

Our aim is to change the reasoning style while maintaining the collegiality.

The Program Developed at the Board. The program we came up with is given in Fig. 3.Footnote 14 It uses an auxiliary array but, in doing so, does achieve a conceptual clarity: first find all the bad segments; then use that to find the (length of) the longest Good segment.

Fig. 3.
figure 3

First approach to the “bad segment” problem
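Fig. 3 itself is not reproduced here, but a hedged Python sketch (names ours) of that two-phase idea might read as follows; sentinel “bad positions” at −1 and N−2 stand in for the careful thought about \( maxDiff \):

```python
def longest_good_two_phase(A, bad):
    """First approach: collect every bad position n into an auxiliary
    list B, then derive the answer from the gaps between adjacent
    elements of B."""
    N = len(A)
    if N <= 2:
        return N                      # segments of length <= 2 are always Good
    B = [-1]                          # sentinel bad position on the left
    for n in range(N - 2):            # phase 1: find all bad positions
        if bad(A[n], A[n + 1], A[n + 2]):
            B.append(n)
    B.append(N - 2)                   # sentinel bad position on the right
    # phase 2: between adjacent bad positions b, b' the longest Good
    # segment is A[b+1 .. b'+2), of length b' - b + 1
    return max(B[k + 1] - B[k] + 1 for k in range(len(B) - 1))
```

The sentinels resolve the “maxDiff+2... or something close to it” fuzziness: with them in place, the answer is exactly one more than the largest gap.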

5.3 Second Idea

Use a “Conceptual” Extra Array. The more experienced programmers in the group will realise that, although having the auxiliary array B is useful for separating the problem into sub-problems, logically speaking it is also “wasteful” of space: a more efficient program can be developed by using a conceptual B whose “currently interesting” element is maintained as you go along. After all, that’s the only part of B you need.

This idea of a “conceptual” B is very much a step in the right direction. (It’s related to auxiliary variables, and to data-refinement.) This is an insight that can be “simulated” during the presentation of this problem.

A second more advanced technique that the better students might consider is to use “sentinels”, in order not to have to consider special cases.

But the resulting control structure turns out to be a bit complicated; and so, in order to see what’s going on, we use a flowchart. The flowchart in Fig. 4 is the one we developed at the board, all together interactively in class. Again, this was very familiar territory (at least for the better students).

The code shown in Fig. 5 is the result.

5.4 Static Reasoning and Flowcharts

Ironically, the flowchart of Fig. 4 gives us the opportunity to introduce static reasoning just as Floyd did in 1967:Footnote 15 one annotates the arcs of a flowchart. Experience seems to show that, initially, students grasp this more easily than the idea of something’s being true “at this point in the program text”.

Fig. 4.
figure 4

Flowchart for the second approach to the “bad segment” problem

Fig. 5.
figure 5

Code for the second approach to the “bad segment” problem

Fig. 6.
figure 6

Annotated flowchart for “bad segment” problem: outer code

Fig. 7.
figure 7

Annotated flowchart for “bad segment” problem: inner code

An example of that applied to this problem is given in Figs. 6 and 7: no matter that the assertions there are complicated, even ugly. The point is that the assertions are possible at all, a new idea for the students, and crucially that their validity is entirely local: each “action box” in the flowchart can be checked relative to its adjacent assertions alone. There is no need to consider the whole program at once; no need to guess what it might do “later”; no need to remember what it did “earlier”.

That’s a very hard lesson to learn; experience with these courses suggests that it is not appreciated by the course’s end, not even by the very good students. It is only “planted”, and grows later.

5.5 The First Assignment, Based on This Example

The assignment based on this example is handed out about a week (i.e. three hours of lectures) after the operational developments described above. Other material is presented in between. (The material of Sect. 6 works well for that.)

Handed out too soon, the whole exercise looks like trickery or even mockery: why should the lecturer go through all that stuff only to say, just afterwards, that it’s all wrong? Left with the students to “marinate”, however, it will come to be seen as a solution that they thought of themselves. And then they are ready to appreciate a better one.

The assignment is reproduced in Appendix B.

6 Tools for Development and Proof: Software Engineering

6.1 Proofs by Hand; Proofs by Machine

It’s tempting to base a computer-science course on a particular programming language, a particular IDE, a particular program-verifier. If the lecturer is already familiar with the tools, then the lectures and exercises can very easily be generated from the specifics of those tools; and indeed “not much energy required” lectures can easily be given by passing on information that you already know to people who just happen not to know it yet (but don’t realise they could teach themselves). The lecturer can simply sit in a chair in front of the class, legs crossed, and read aloud from the textbook.

There are, however, some disadvantages too. The first two of them apply to any tool presentation in class; and the third is more specific to formal methods. They are:

  • Even if it’s easy to describe programming-language syntax to the class, there is no guarantee that they will be listening or, if they are, that they will remember what you have said. In the end, they’ll have to look it up anyway. And then what were the lectures for?

  • From the fact that the lecturer seems to think the particular sequence of buttons to push in an IDE is important, the students will conclude that the sequence indeed is important. But is it really?

  • Since an automated prover can check your reasoning and find any mistakes, one could conclude that it’s not necessary to try to avoid making mistakes in the first place. In that case, why don’t we get machines to write the programs, instead of merely checking them?

In spite of all those caveats, there is an overwhelming advantage to using tools: without them, no serious (software) project of any size can be rigorously completed. It is simply not possible to master that complexity, either alone or in teams. How do we navigate between these two opposites?

What’s important is how a tool is presented: it’s just a tool. If you’re lecturing about building circuit boards, you don’t spend hours describing which end of a soldering iron to hold: students will figure that out pretty quickly for themselves. And in the end, a robot will do the soldering anyway. What’s crucial is that they know what solder is for, and what therefore the robot is doing. The details of a particular robot are uninteresting and unimportant.

In this course, the (experimental) principle was adopted that things should not be done using computer-based tools until those same things, perhaps in smaller instances, have been done by hand without them.Footnote 16

In the case of static reasoning over programs, then, do we explain what an invariant is informally, conceptually, and how you intuit your way into finding one? Or do we explain an invariant as something which, when typed into a particular tool, will allow it to say “loop verified”? (Or do we pretend that the invariant follows inescapably, inevitably by careful manipulation of the symbols in the predicate-calculus formulation of your loop’s postcondition?)

In this course the motto for finding invariants is “beg, borrow or steal”. It makes absolutely no difference where an invariant comes from; but it makes all the difference that you insist on having it, and then what you do with it from that point on.

6.2 Proofs by Hand

The very first example in the list below is used early in the course, informally:

“If you know that x equals 1 now, and the statement \(x{:=}x{+}1\) is executed, what can you say about the value of x afterwards?”Footnote 17

It’s easy, and indeed it looks too easy. With the further exercises in that list, we build our way from easy cases to hard ones, looking for the painful point where the problem is still small enough that you think you should be able to work it out in your head but, whenever you try you keep getting the wrong answer. By doing that, we are creating an irritation, a curiosity and, actually, a need.

That need should be satisfied by the “Hoare assignment rule”, but instead of presenting it as an axiom of a beautiful programming logic (while remembering that the very idea of having a programming logic is beautiful in itself), it is simply presented as an algorismFootnote 18 for assertions and programs. Presenting it as a definition is premature: remember that the students have not yet realised that there even is a thing that needs defining.Footnote 19

Thus what these exercises are designed to do is to take the students from simple assertional reasoning, where operational thinking works, to more complicated examples where they still understand what they are trying to do but find the details have become difficult to control. At that point, they are ready to be given a method — but still, crucially, a method to be used by hand.
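The method in question is the assignment rule just described: the precondition for x := E to achieve postcondition Q is Q with E substituted for x. As a hedged sketch (function names ours, with predicates modelled as Python functions of the state), the rule’s substitution can even be made executable:

```python
def wp_assign(E, Q):
    """Weakest precondition of x := E with respect to postcondition Q:
    Q with E substituted for x, here computed pointwise on states x."""
    return lambda x: Q(E(x))

# The opening example: if x equals 1 now and x := x+1 is executed, then
# x == 2 afterwards -- because wp(x := x+1, x == 2) is (x+1) == 2, i.e. x == 1.
post = lambda x: x == 2
pre = wp_assign(lambda x: x + 1, post)
assert pre(1) and not pre(0)
```

The point, as in the text, is the calculation by substitution; the code merely checks individual instances of it.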

figure b
figure c

6.3 Proofs by Machine

By the time the examples above have been worked through, using by-hand calculation based on the substitution definition of an assignment’s action on a postcondition, a second boundary has been reached. Even with that assignment rule, it’s become too easy to make simple mistakes in the calculations — not because they are difficult to understand, but because they have become larger and more encumbered with the trivial complexities common in programs generally. And, indeed, the amazing effectiveness and the novelty of the rule have, by then, worn off as well. Having grown used to calculation rather than guesswork, the students are ready for the transition to a tool that does those calculations for them. Crucially, however, they know what those calculations are and thus what the tool is doing (although not, perhaps, how it does it).

It’s the same two transitions that move, first, from counting on your fingers to using positional notation on paper and then, second, from paper to a pocket calculator. For us, the pocket calculator is a program verifier. Having done this in stages, though, we understand that the calculator/verifier is just doing what we did on paper, but faster and more reliably. And what we were doing on paper was just, in turn, what we were doing with our fingers/operationally, but with a method to control detail.

In Appendix C we give the same exercises as above, Figs. 12, 13, 14, 15, 16, and 17, written in the language of the program-verification tool Dafny: it will be our pocket-calculator for program correctness. Naturally the exercises should however be done by hand, in class, before the “pocket calculator” is used. Or even before it’s revealed that there is one.

6.4 Dafny

From the Dafny website:Footnote 20

Dafny is a programming language with a program verifier. As you type in your program, the verifier constantly looks over your shoulder and flags any errors.

One presentation of Dafny is embedded in a web page, where you can simply type in a program and press a button to check its correctness with respect to assertions you have included about what you want it to do.Footnote 21 In Fig. 8, the first part of the assertion exercises from p. 25 above has been (correctly) completed using the template of Fig. 12 (in Appendix C), and that has been pasted into the Dafny online-verification web page.

In Fig. 9, the same has been done but with a deliberate error introduced; and it shows that Dafny found the error. When explaining this to the students, it cannot be stressed too much what a remarkable facility this is. They will be used to syntax errors (annoying, but trivial); and they will be used to run-time errors (fascinating, unfortunately, but also time-consuming). What they will never have seen before is a computer-generated warning that means “Your program compiles correctly but, even if it does not actually crash at runtime, there are circumstances in which it will give the wrong answer.” It is – or should be – one of those moments we all remember from our undergraduate days when we were truly amazed, when mentally we “went up a level”.Footnote 22

Fig. 8.
figure 8

The Dafny on-line verifier applied to Fig. 12 completed correctly.

7 Return to Binary Search

In Sect. 2.1 the one-page handwritten Can you write Binary Search? questionnaire was described, given almost at the beginning of the very first lecture and then (we hope) forgotten. What happens when we return to it? What material should have been covered in the meantime?

7.1 Why Return to Binary Search at All?

Our aim is to convince students that the techniques we are presenting are worth their while. Most people, at least when they are young, are curious: curiosity can almost be defined as the urge to find out things irrespective of their expected utility. But university students have begun to lose that, at varying rates of course; and the effect that teachers must confront is that many students will learn things (from their teachers) only if they believe it will somehow repay the effort. So we have to convince them of that.

Once they are reminded of the Binary-Search exercise in Lecture One, they will probably also recall that they felt uncomfortable, a bit at a loss. That is exactly what we want, because we want them to feel comfortable now, a few weeks later, to notice that difference, and then to attribute their happiness to what they have learned between then and now.

Fig. 9.
figure 9

The Dafny on-line verifier applied to Fig. 12 with a deliberate error.

7.2 The Invariant: Begging, Borrowing, Stealing? or Maybe Donating?

In Sect. 6.1 we suggested that, for students at this level, it does not matter where an invariant comes from. In simplest terms, that is because techniques of invariant synthesis can’t effectively be taught unless the students are already convinced that they want an invariant at all; and that will take more than a few weeks, or even a few terms. We begin here.

For the moment, we will simply give invariants to the students, without pretending that they should be able to figure them out for themselves. What is the point of reproaching them for not being able to do a thing we know they cannot do (yet)? For the Binary-Search program, we donate the invariantFootnote 23

$$\begin{aligned} \texttt {A[0..l)}\quad <\quad \texttt {a}\quad \le \quad \texttt {A[h..N)}, \end{aligned}$$
(2)

a piece of sorcery, a Philosophers’ Stone and, without worrying (now) about where it came from, we simply show them what it can do. It converts their earlier unease into a feeling of confidence and even a little bit of power: it’s a magical item, as if gained in an RPG, that gives them an ability that can set them apart.

The key issue is whether we present an invariant as a friend or an enemy. If we say “You can’t develop loops properly without knowing first how to find the loop invariant”, then the invariant is an enemy, trying to stop them from doing something they know perfectly well they can do without it (even if, actually, they can’t).

The invariant is a friend if you say “Remember that program, Binary Search, that seemed so hard to get right? If only you’d had one of these, like (2) above, then it would have been easy.” And then you go on to show how easy indeed it now is for them. (Showing that it is easy for you would be missing the point: more about that immediately below.) Once they understand that, then they will be motivated to learn how to find “one of these”, by themselves, for other programs that they encounter later.
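For concreteness, here is a hedged Python rendering (ours, not the course’s actual code) of the Binary-Search loop assembled from invariant (2); the runtime assert merely restates the invariant that the by-hand reasoning guarantees:

```python
def binary_search(A, a):
    """Given sorted A, return l such that A[0..l) < a <= A[l..N).

    The loop maintains invariant (2):  A[0..l) < a <= A[h..N).
    """
    l, h = 0, len(A)                 # both segments empty: (2) holds trivially
    while l != h:                    # on exit l == h, giving the postcondition
        assert all(x < a for x in A[:l]) and all(a <= x for x in A[h:])
        m = (l + h) // 2             # l <= m < h, so the loop makes progress
        if A[m] < a:
            l = m + 1                # A is sorted, so A[0..m] < a: extend left
        else:
            h = m                    # a <= A[m..N): extend right
    return l
```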

There will be an exhilarating moment, for you, when finally one of the students asks “How do you figure out invariants in the first place?” The spark of pleasure you will get is that they want to know. That’s completely different from your having to convince them that they need to know.

7.3 Collaboration and Local Reasoning; Cardboard and Masking Tape

Now the aim is to use (2) to solve the problem posed in Sect. 2.1. Recall that the class has been divided into groups, for mentoring (Sect. 3).

Write the mentoring-group names on small pieces of paper, and bring them to class together with a small cup.Footnote 24 Pick one of the slips from the cup, thus picking a mentoring-group “randomly”. (In fact, you should rig this to make sure you get a strong group. The theatre here is not in order to seem clever — rather it’s to avoid any sense among the students that you might be picking on individuals. Standing in front of the class is a challenge, for them, and you defuse that by making them feel relaxed by the process and maybe even amused by the way you carry it out.)

The First Group: Writing the Loop Skeleton. Call the selected group to the front, and explain to them that they will write the loop initialisation, guard and finalisation. Write on the board at the left and explain that “we” (you and they) are going to fill in the ???’s.Footnote 25

The (something), you explain as you glance at the cup, will be another group’s problem. (Nervous laughter from the audience.)

figure d

Now you have to lead them through the steps of filling-in the missing pieces. You will have to say most of it; and it will be hard because, at first, either they will not speak at all, or they will give incorrect answers. The point of having them at the front, even if they say little, is that afterwards what they will remember is that they contributed collectively to finding the solution (just by standing there) even if actually they contributed nothing concrete. The alternative, of pointing to the group still sitting in the audience, does not work nearly so well.

A good piece to start with is ?L?,?H?, because it’s the simplest but also a little bit unexpected. What, you ask them, will establish \(\texttt {A[0..l)}{<}\texttt {a}{\le }\texttt {A[h..N)}\) trivially? Drag out of them (you will probably have to) that \(\texttt {A[0..l)}{<}\texttt {a}{\le }\texttt {A[h..N)}\) is trivially true when \(\texttt {l}=0\) and \(\texttt {h}=\texttt {N}\), because both array segments are empty. Watch them struggle to interpret what this might mean for the coming loop; discourage them from doing so; assure them that it is unnecessary.

Work similarly on ?G? — what is the simplest test you could pick so that \(\texttt {A[0..l)} < \texttt {a} \le \texttt {A[h..N)} \wedge \lnot \texttt {?G?}\) implies \(\texttt {A[0..l)} < \texttt {a} \le \texttt {A[l..N)}\)? Again, it’s “obviously” \(\texttt {l}\ne \texttt {h}\), since \(\lnot (\texttt {l}\ne \texttt {h})\) is of course \(\texttt {l}=\texttt {h}\), just what’s needed for that implication. Again you will have to drag; again they won’t really believe that you can program this way.

Finish off with ?E?, and then ask them to sit down.

The Second Group: Sketching the Loop Body. Now take a piece of cardboard big enough to cover the work just done, and tape it over the board so that the work is no longer visible. Pick a second group’s name from the cup, and invite them to come to the front. Write in the middle of the board

figure e

and go through the same process with this group. Of course the \(\mathtt {A[m]<a}\) after the then gives the text ?C? away; but they will hesitate because they think it can’t be that simple. You are teaching them, by doing this, that it is that simple.

Cover-up the results when it’s done.

The Third Group: Finishing Off. For the third group, write

figure f

and

figure g

and split the group in two. Get one half to do ?T? and the other half to do ?E?.

You might have to explain why it is allowed for the \(\mathtt {A[0..l) < a <= A[h..N)}\) suddenly to reappear. Explain (remember this is informal methods) that it carries through from the beginning of the loop body and you really should have written it for the previous group — but you left it out to reduce clutter: there was no assignment to any of its variables.

Once it’s done, just ask them to sit. No need to cover this up.

The Fourth Group: Finishing Up. Now remind the class what has been done: three independent groups, working effectively in separate parts of the program, from the outside-in (not from first-line to last-line), and “no idea” whether the three bits will fit together, however correct they might be with respect to their individual assertions.

Then invite a fourth group to come up and remove the cardboard, and to write a single program combining the pieces already there. Tell them that they don’t have to write the assertions, except for two: the invariant, just at the beginning of the loop body, and the overall postcondition, at the end of the whole program.Footnote 26

Be sure they write it neatly, so that the whole class can read the program (they believe) they collectively have just written. Then hand back their earlier answers, from the first lecture long ago, without commenting on them:

Really, no comment: say nothing. Nothing at all.Footnote 27

Just wait until you see that people are looking up, starting to pack their things away, ready to go. The minute or two (in my experience) before that is usually so quiet you can almost hear the neurones firing. Make it last, let it run: you won’t get many moments like it.

Then simply end the lecture at that point: no announcements or reminders. Just “See you next week.”

8 Using Dafny to do Top-Down Development

To get the most out of Binary Search, in Appendix D we hammer home the idea of program development “from the outside in” rather than from first statement to last. This approach, though known instinctively by many, was brought especially to prominence by Niklaus Wirth in the early 1970s.Footnote 28

Surprising to me personally – though I was taught this principle as an undergraduate – was that using Dafny on a program of any complexity actually forces you to do this “stepwise refinement”: it is not an option. In effect, Dafny makes a necessity out of a virtue. And here is why.

Novices with Dafny (as I still am) at first expect to be able to write the complete code of a small program (i.e. Binary Search in this case) and then to “do the right thing” by including the assertions and invariants that they have worked out for themselves, perhaps on paper. Then – one push of the magical button – Dafny will tell you whether those assertions verify. That’s the theory.

In practice, Dafny can sometimes give a third answer, and that in two forms. One is “Can’t verify.” (paraphrased) which means literally that: Dafny has not found your assertions to be wrong; but neither can it prove them to be right. The second form is the “whirring fan” where your laptop simply never returns from the proof effort; eventually you must interrupt the verification yourself.Footnote 29

Once you have experienced this often enough, you realise that a step-by-step approach is much better, where you check at each stage that Dafny can verify what you have done so far. And – crucially – the “so far” has to be from the outside-in if you are not to run the risk ultimately of finding e.g. that your very last assertion won’t verify, forcing you to start all over from the very beginning.

Appendix D shows this step-by-step approach with Binary Search. Although it looks like an awful lot of text for such a simple program, in fact each stage is made from the previous one by a cut-and-paste copy followed by alteration of a very small part. It’s much more efficient than it looks.

And, most important of all: it works.

9 Fast Forward: Meeting the Real World

9.1 Motivation and Principles

Finally, we must leave (in this presentation) the Binary Search: we “fast forward” from there to the very end of the course.

The material above describes a prefix of the course, perhaps the first third. The middle of the course (not described here) charts a course through examples and techniques that aim to end with a real-world example, briefly treated in this section. The precise trajectory taken through that middle part depends on the characteristics – strengths and weaknesses – of the students in that particular year; and being able to adapt in that way requires a store of ready-made topics that can be selected or not, on the fly. That store will naturally build up as the course is repeated year by year.

But however we pass through the middle, we want to end with the feeling “This stuff really works. It’s practical. And it makes a difference.” The topic chosen to finish off the course, and its associated assignment, is deliberately designed to be one that, as a side effect, dispels the mystery around some part of the undergraduate computing experience.Footnote 30

In this case the mystery was “What really happens when you type in a command to a terminal window?” As we all know (but some students do not), a program is run. But how does that program access the resources it needs? And how do you use the methods here in order to make sure it does that correctly?

9.2 Programming the cp Command

We chose the copy command “cp”, which must read and write files from... somewhere. To do that, it uses system calls, and it must use them correctly.

The assignment begins with an abstract model of the system calls open, creat, read and write that – crucially – are based on real operating-system calls, complete with the nondeterminism of how much is read or written (not necessarily as much as you asked for), and the operating-system convention for end-of-file, which is indicated by reading zero characters.Footnote 31
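As a hedged illustration of the conventions the model captures (function and parameter names ours; real cp does much more), the core copying loop over POSIX-style calls might look like this:

```python
import os

def cp(src_path, dst_path, chunk=4096):
    """Sketch of cp's core loop over raw system calls; a real cp must
    also handle errors, permissions, symlinks, and much else."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            buf = os.read(src, chunk)   # may return fewer bytes than asked for
            if not buf:                 # zero bytes read: end-of-file
                break
            while buf:                  # a write may be short too, so loop
                written = os.write(dst, buf)
                buf = buf[written:]
    finally:
        os.close(src)
        os.close(dst)
```

Note the two inner loops: the specification of read and write permits short transfers, so correctness depends on looping until everything asked for has actually been moved.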

Only this faithful modelling of what really is “out there” will give the students the conviction that they can and should apply these methods to what they will find when they really get “out there” themselves.

The assignment text is given in full in Appendix E, and is commented upon there. About one third of the students got high marks for this, more than expected. They used all these techniques that were new to them in February, but which after June they will never forget:

  • Abstraction of interfaces.

  • Refinement of code through increasing levels of detail.

  • Invariants and static reasoning.

  • Automated assistance with verifying program correctness.

  • Programs that work first time, even under the most bizarre scenarios.

Imagine what Software Engineering would be like today if all students experienced those exciting accomplishments in their very first year.

Can we make it that way tomorrow?

10 Conclusions... and Prospects?

It would be tempting to conclude from the remarks in Appendix F that this course has been an outstanding success. But we can’t. Whether the students enjoy a course, and whether they think they benefitted from it, is quite different from whether the course actually achieved its objectives.Footnote 32 And, so far, we have no real, objective evidence that these are better programmers than they were before or, more significantly, than they would have been without (In-)Formal Methods.Footnote 33

I think it’s fair, though, to say that this course has at least not been a failure. Why is that important? It’s because so many Formal Methods courses have failed. That is a critical problem: the skills the students never get a chance to learn, because of those failures, are precisely the skills they need. And they need them at an early stage, to bring about a significant improvement both in their own accomplishments and in the expectations they have of others, now and later — of their future colleagues and employers. It’s ridiculous, actually scandalous, that they do not have these elementary techniques at their disposal.

As for prospects: it depends on whether this approach is portable and durable. It has already been taught by two other lecturers, who report positively. But true progress will come only when it is integrated so inextricably into the matrix of conventional curricula that it cannot be undone when its patron moves on. Rather than being the icing on the cake, which can always be scraped off, Informal Methods must be the rising agent distributed throughout.

Invariants, assertions and static reasoning should be as self-evidently part of the introductory Computer Science curriculum as are types, variables, control structures and I/O in the students’ very first programming language.

Can you help to bring that about?