Extreme Programming in APL
Exploring XP principles in an APL project


Sunday, July 6  

Posted an article The Experience of Being Understood. Requirement specification looks a lot like language acquisition. Reflecting on what Wittgenstein had to say about language acquisition suggests an unusual development practice.

posted by Stephen Taylor | 8:09 PM


Friday, June 27  

Tonight I want to say something about the effect that communicating through the code is having on the project.

When Sarah & I started working together we quickly ran up against the limits of our abilities to reason verbally about the processes we are automating. There are very many variables and cases. The calculations she described were not complicated, but the conditions for applying them were. Very. I suspected that this reflected optimisation long ago to minimise calculation, after which it just got difficult to reason across the cases. An actuary would be better trained to abstract a general framework within which to reason, but Sarah isn't an actuary.

So we tackled the case examples one by one, using control structures (IF, THEN, ELSE, &c.) in the code to relate the rules to the conditions for applying them. Case by case the control structures proliferated.

At first I refactored the code only inside the control structure elements. Had I refactored the control structures as well, I should have destroyed the framework that anchored the work we were doing together. Sarah & I were already using the code as a visual representation of the conditions, so that we could both see what we were talking about.

I came to see that I could refactor the control structures only where Sarah could follow the refactoring. Sometimes I have been able to teach her this. She sees how multiplying a number by a condition is the same as saying it's zero unless the condition holds. I would eventually be able to refactor to my heart's content only when we had finished and no longer needed to refer to it!
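
Sarah's insight — that multiplying a value by a condition is the same as saying it's zero unless the condition holds — is easy to demonstrate. Here is a rough Python sketch of the idea (the names and numbers are invented for illustration, not from our code):

```python
# Multiplying by a condition: the value survives only where the condition holds.
values = [100.0, 250.0, 80.0]
applies = [True, False, True]   # 1 where the rule holds, 0 where it doesn't

adjusted = [v * c for v, c in zip(values, applies)]
print(adjusted)  # [100.0, 0.0, 80.0]
```

One multiplication replaces an IF/ELSE per case, which is exactly the refactoring step Sarah could follow.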

But that severely limited the benefits we could get from refactoring. Pity, because they included finding simpler abstractions from among the detail.

In retrospect it seems I did two things right at this point by luck. We both find the pair programming work as intensive as XP practitioners report it. So we break for a few minutes each hour to stretch. (At the moment, when we've achieved a milestone we've been aiming for, I pick and read aloud a poem from A.A. Milne's When We Were Very Young.) I also vary the rhythm a bit by spending a few minutes every now and again deconstructing some recent expression in immediate-execution mode. So besides the variables and functions she's named herself, Sarah also has some insight into what the primitives are doing. For example, I've shown her how a Boolean expansion maps a list of numbers into a longer list, and she has used her knowledge to spot and challenge a line where this was not done as we had agreed it should be. So the first thing I think I did right was to keep discussing and explaining the code as I wrote it.

The second thing was to build upon this learning to support aggressive refactoring. Sarah likes the XP prescription for “merciless refactoring”, and enjoys seeing simplicity emerge from complexity.

When early in the project we coded the rules for calculating Market Value Reductions, we wandered through a depressingly long sequence of nested control structures. When we came recently to the rules for calculating annuities and policy fees, we reached for an array structure that would allow us to represent all the cases. This entailed analysing and recreating rules embedded in an Excel workbook, rules that Sarah doesn't know — she just uses the workbook.

We started by refactoring the Excel spreadsheets in the workbook, displaying and naming new intermediate results until we could see what varies from case to case and what is the same. This allowed us to rewrite the rules in APL.

In my last post I compared the ephemeral views Excel's auditing tools provide of the information flow with the stable view from the APL code. Even after refactoring, the visual metaphor of the spreadsheet limits how much can be displayed on a monitor. Sarah made the same comparison today and said she finds the APL code considerably easier to follow than the Excel workbook.

I've posted a small Dyalog APL workspace with a working copy of the function we've been editing today. I think many programmers will find it remarkable that a non-programmer should prefer this description to an Excel workbook.

When I began this post about communicating through the code I intended to write separately about the effects on communication quality and on the code quality. But it turns out that they are closely entangled. Weak communication produced loose code which was hard to refactor. Tight communication produced dense code and lots of refactoring opportunities.

In fact Sarah is now contemplating using the insights she's gained from our work to review and refactor the manual processes we're automating.

Next post: thinking about requirements specification as a Wittgensteinian language-game suggests some interesting practices.

posted by Stephen Taylor | 1:42 AM


Tuesday, June 24  

Pair programming with the customer

Earlier I described how I was cutting code as part of my communication with the trainers. They tell me what the system is to do, I code that and we verify the results. In this application domain, my description in APL of what's wanted takes similar time to the trainer describing it in English. So it's like a translation exercise: translating English into APL.

At first I described this as "pair programming with the customer". Then after reflection, I said it wasn't. The aim of pair programming is to raise code quality. The aim of what the trainers and I are doing is to raise communication quality.

We've been doing this for 2 months now and it's time to look at the results. I started working with 2 or 3 trainers, for the last month with Sarah alone.

Tonight, some key programming practices and an unexpected result: communicating through the code with the customer. Tomorrow, some unexpected effects on communication and code quality.

Programming practices

When the code produces wrong answers Sarah and I step through it with the interpreter, examining the intermediate results. So I write the code to produce the same intermediate results that the manual procedures do.

I don't want any other intermediate results, so ideally the only variable names in the code are names for Sarah's intermediate results.

Ideally, to avoid creating any other variables, I use a single line to represent how one intermediate variable is calculated. I tend to write long lines.

I also create anonymous D-fns 'in line' to avoid creating other variables, even to avoid referring to the same variable twice in one line. (Leading me, I think, towards the verb trains in J.)

I want to support skim-reading the code. In skimming code, I need to see only the flow of information between variables. In skimming, dense chains of symbols between the curly braces around a D-fn register simply as 'do something here'.

I also use D-fns merely to highlight the flow of information. For example, where a new value is calculated from a few other variables, I often compose the line as one or a few D-fns so that in skimming the code it is clear that the new value is derived as some function of certain others.
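
The same two-level reading can be mimicked in Python with an inline anonymous function: the call site names the inputs and the result, while the mechanics sit inside the function body, where a skimming eye can pass over them. All names and numbers here are invented for illustration:

```python
# Skimming the line shows only the flow: surrender_value is derived from
# fund_value and mvr_factor. The arithmetic is boxed inside the lambda.
fund_value = 10_000.0
mvr_factor = 0.15

surrender_value = (lambda fund, mvr: round(fund * (1 - mvr), 2))(fund_value, mvr_factor)
print(surrender_value)  # 8500.0
```

The reader who wants the mechanics stops and reads inside the parentheses; the reader who wants the flow reads only the names.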

I cannot overemphasise the value of being able to read the code at two levels.

This is similar to the use of Excel's Auditing tools, which allow you to display the precedents or dependents of cells in a spreadsheet. The tools draw arrows over the spreadsheet to show where one number is used or obtained. (You have to read the cell formulas to see exactly how. This corresponds to reading the contents of my D-fns.) But it does not take many arrows to make a big mess, so you need to turn Excel's Auditing on and off for cell after cell to trace the flow of information. In contrast, everything in the APL code is available to a visual scan. The code can be skimmed to follow the flow of information, or the eye can stop to review the detail encapsulated in a D-fn.

A related practice is to name quite small fragments of code, either a D-fn or a derived function, and comment them, so that later they can be read in context by their semantics rather than their mechanics. For example, in code we were writing today, in two places we needed to convert 3-lists into 4-lists by duplicating the first element. That was the mechanics. The semantics was 'replicate the NPR to produce a Reduced'. So we declared a function

rnr ← 2 1 1∘/¨  ⍝ replicate NPR as Reduced

and used rnr in lines where the reader needs only its semantics, not its mechanics. Unsurprisingly, this semantics for the derived function works only in the context of a single function, so rnr got defined in and localised to that one function.
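
In Python terms the mechanics of rnr — converting each 3-list into a 4-list by duplicating its first element — might look like this sketch (the point is the localised naming, not the one-liner; the sample data is invented):

```python
def rnr(lists):
    """Replicate NPR as Reduced: duplicate the first element of each 3-list,
    yielding a 4-list per input."""
    return [[xs[0]] + xs for xs in lists]

print(rnr([[7, 2, 9]]))  # [[7, 7, 2, 9]]
```

A later reader of the calling code sees only 'replicate NPR as Reduced' and need not stop to work out the element-shuffling.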

Communicating through the code

I discuss the code with Sarah as if she were a novice programmer. I review the information flow, pointing to the variable and function names (which she chose) and saying in English what the primitive functions are doing. Occasionally I deconstruct a primitive expression in immediate-execution mode to illustrate or confirm a point, sometimes just to share a particularly elegant or powerful expression. Much of this is in the spirit of play: playing with a complex and fascinating toy.

Today Sarah challenged a line of code. Correctly. In a few related lines, I'd been using Boolean expansions to map calculated results into parts of a collection of arrays. Suddenly Sarah leaned forward and pointed at one of them. "Shouldn't they be for all the columns?" She was right, and had been able to see that the code didn't do it.

So Sarah skim-reads the code and through it we communicate about the rules we're automating.

This is fascinating. I had similar experiences in the 1970s with customers who actually were novice APL programmers. But Sarah doesn't write any APL.

It reminds me of my ability to read Italian. In limited contexts I'm a competent reader and hearer of Italian. Drawing on menu Italian and word roots shared with English, French and Latin, I've acquired a vocabulary that meets my needs on holiday and allows me to skim-read newspaper articles.

It also reminds me of the language games Wittgenstein imagined in Philosophical Investigations. Sarah & I play a language game. It doesn't involve her speaking or writing any APL; it's not the same language game that programming or learning programming is. What are the uses of code fragments in this language game?

Similarly, we couldn't play the game with just any APL code. Certain properties of my code are essential to this working. I suspect that the way it supports skimming is crucial, also the use of Sarah's nomenclature in recognisable spellings.

It would be interesting and useful to identify what those properties are and consider how they can be replicated in other languages.

I would like to pay tribute here to what Ken Iverson taught me in 1977 about teaching APL: his emphasis on exploration, hypothesis testing, Socratic dialogue and working in pairs.

posted by Stephen Taylor | 10:32 PM


Thursday, June 5  

Back from Italy and XP 2003.

How do you automate in three months a procedure it takes six months to train a new clerk to do?

No matter how smart you think you are, you've got an analysis challenge.

A pitfall to avoid is refactoring the manual procedures. These have been optimised to minimise human work. In this project's case, some of the complexity comes from shortcuts taken to minimise calculation. With the machine, we would prefer to optimise to minimise the number of rules. Calculation is mostly free.

So spending time in analysis to reduce the number of rules to follow -- refactoring the original process -- is tempting. But it takes time we don't have. So Plan A is always to code it exactly as the clerks do it.

I can still refactor the code without consulting the trainers. I've already done some of this. The very concise APL source code allows patterns to show up at a high level. Since we now have a useful library of test cases, I can refactor these and test. In fact, doing this allowed me to catch what I believe were misunderstandings on my part, which had not been caught by the application test.

But refactoring this way has a drawback. The first version of the application code follows the trainer's instructions precisely. So it's easy to resolve any discrepancies between our results: we use the interpreter to step through the calculation, examining the results as we go.

I can't overemphasise how useful this has been. The communication process has been dramatically shortened. Effectively I'm now pair programming with the customer, who is reading the APL code off the screen, seeing her instructions reflected in control structures. (So APL is 'unreadable'? Match that, Java programmers!)

Refactoring the original process to simplify the rules is a whole new ball game. Slavishly following the existing rules, no matter how baroque they might be, has a clear advantage: you know how it will turn out, that you will produce the same result after some predictable amount of work. In refactoring the procedures, you don't have the same assurance. It's almost the same as redesigning the procedures, except you've got a test suite to get you home. Given that the trainers understand the reality represented by their procedures, it will probably all work out in reasonable time.

These reflections were prompted by several days' effort spent redesigning a table in which intermediate results are presented to the users. Using our system to display this provoked the users to suggest a more orderly presentation than the one they use themselves. We're still working out the consequences.

I felt a twinge of unease when I heard we were doing it. The lesson learned: on a sprint project like this, refactoring the users' processes can be a significant cost.

That said, we are getting it straight and it looks like the calculation rules will come out significantly simpler, with good consequences for later work.

posted by Stephen Taylor | 6:10 PM


Friday, May 9  

Kevin wanted his picture on this site! Here you go, Big Guy.

posted by Stephen Taylor | 9:49 PM


Sunday, May 4  

Ted Codd died.

posted by Stephen Taylor | 10:30 AM


Friday, May 2  

We've mopped up most of the rules for the iteration's output documents and have coded the easy ones. We're way ahead of our estimated progress, so it's time for our business user to add stories.

Since we came so far by dropping out complicated low-volume quotes, we could drop them back in again. Instead, as we've recommended, we're to push on to the next types of quotes.

This will reveal more of the data we need for the mainframe link sooner.

posted by Stephen Taylor | 5:58 PM