for loop (thing) by haggai - Everything2.com

There are two really different things which go by the name of for in different programming languages: the counter for loop (e.g. in Pascal), and the general for loop (e.g. in C). There's also the foreach thing (as in csh), but we'll get to that later. To compare these, I'll try and explain why we have for. So, first off, what is the for loop for? Then, we'll see the two "for"s. And for dessert, we'll talk about encapsulating complex fors in iterators (as in Python or C++/STL). I'm starting with some "philosophy of programming languages" musings, but hopefully this will help explain why superficially similar things called for are different, and it will lead into some discussion of actual languages' for commands.

What is the for loop for?

A loop is a bit of code which we'll be executing over and over. It's the generic and ubiquitous concretisation of the concept of iteration. Almost always, we want to keep looping until some condition occurs. Many languages provide this simple capability through a while loop, and/or through a repeat until loop (or do-while) -- the difference between the two is whether you first perform the loop body (repeat), or first check for the exit condition (while). You might continually read input from the user, until she enters a specified delimiter, or iterate a numerical algorithm until your error bound is sufficiently small. You could do this with an if and a goto or jump (if your language is yucky enough to have those), but that doesn't emphasise the loop condition. The while (condition) loop underlines this important bit of your code's structure. (In fact, since goto is almost always used for this purpose, or where a separate function is called for, it is quite uncalled-for in any high-level programming language.)
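To make the while / repeat distinction concrete, here is a small sketch in Python. Python has no repeat until of its own, so the second function fakes one with while True and a break at the bottom; both function names and the tolerance value are my own inventions for illustration.

```python
def sqrt_newton(a, tolerance=1e-12):
    """Approximate the square root of a positive number a (Newton's iteration)."""
    x = max(a, 1.0)  # any positive starting guess will do
    # while-style: the exit condition is checked BEFORE each pass
    while abs(x * x - a) > tolerance * a:
        x = (x + a / x) / 2
    return x


def count_digits(n):
    """Count the decimal digits of a non-negative integer n."""
    count = 0
    # repeat-until style: the body runs at least once, the test comes LAST
    while True:
        count += 1
        n //= 10
        if n == 0:  # "until n == 0"
            break
    return count
```

The digit counter shows why the order matters: 0 still has one digit, so the body has to run once before the test is made.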

However, one very common type of loop performs the exact same operation for different values of one particular variable. The loop might have to increment all the values in an array, or calculate a function for a list of inputs, or just perform something a given number of times. For these examples, the important variable might contain (for each iteration) a pointer or index into the array, an index to or the actual value of the current input, or the number of the iteration we're on. In many cases the loop variable is actually a loop index (and is often called this, even when it isn't an index to anything). This is probably why the leading generic loop variable name is i.

Call it what you will, the loop variable is (conceptually) special in that the loop code always does "the same thing", in some sense, and then "advances" the loop index to the next value. You can do this with a while loop, but that doesn't emphasise the loop index. If the last line in your while block "just happens" to increment an index (or follow a pointer, or however you move to the next value), it doesn't stand out as much as when the syntax makes it occupy a different place. So, given it doesn't provide you with much extra functionality (or even save much typing), this syntactic encapsulation is really what for is for.
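A minimal sketch of the same loop written both ways, in Python (the list nums and the result names are made up for the example):

```python
nums = [3, 1, 4, 1, 5]

# while version: the index update is just another line in the body
incremented_while = []
i = 0
while i < len(nums):
    incremented_while.append(nums[i] + 1)
    i += 1  # easy to overlook, yet it drives the whole loop

# for version: the loop header names the index and says how it advances
incremented_for = []
for i in range(len(nums)):
    incremented_for.append(nums[i] + 1)
```

Both versions compute the same list; the only difference is where the reader has to look to find out how i moves.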

This matter of having syntax emphasise program structure is important in several different ways. It makes code more readable (which is not to be sneezed at), and develops common idioms. But it can also help compilers (maybe even interpreters) optimise the resulting machine code, by saying more about what's going to happen when the entire block is executed. This is most important by far where (for) loops are concerned, as the heaviest part (computationally) of almost any program or function is one or several particular loops, and computer architecture is designed with their optimisation in mind. But that's a whole other matter.

Counter for loop, general for loop

If you examine the examples above of loops and their indices, you'll notice they can all probably be implemented using a counter: an integer index, counting from 0 (or 1, in some languages) up to some given length or number. In a way, this is always true, but in some cases you'll be losing the syntactic encapsulation praised just now. If you're just counting your iterations, or performing an operation on values of i from 0 to 99, your encapsulation is perfect. If you actually have an array nums, and you never use i itself, but only access the ith value, nums[i], then syntactically you're not saying all you could about your special (loop) variable, but an intelligent reader (and an intelligently written compiler) can probably figure it out. But what if you're going through all the permutations of a given size, and the way you determine the next permutation has nothing to do with the number of the iteration you're on? You could do this with a (formal) loop index which is just a counter, but then you've obviously got some other "real" loop index hidden away inside the for block, so again it doesn't emphasise the loop index. You'd probably be holding a permutation in some other variable, with some bit of code responsible for generating the next value. This basically means you've gone back to the while loop structure, and the syntax hasn't been any help to you.

This brings us to the great divide between different languages' for loops. Pascal and its ilk (Fortran, Algol, Basic and others) concentrate on the counter loop cases: an explicitly named variable is assigned successive values in a given arithmetic sequence, usually on the lines of for i := 3 to 17 step 2, and may not be assigned to (changed) within the for block. Whether or not the parameters may be floats (non-integers), and how the for statement is constructed is of minor importance. This abstracts precisely the idea of a loop with a simple counter index, making things very clear to people or compilers reading the code. It's not very general, though.

C-style for abstracts some other properties of for loops, namely that they have some initialisation before the first step, some "while" condition, and some simple way of proceeding to the next step. If it is a counter loop, it will just come out as for (i = 3; i <= 17; i += 2), but no variable is assigned special significance or properties; all three parts of the statement could refer to different variables, or whatever. For the permutation example above, the for statement might be for (p = IdPerm; p != LastPerm; p = next_perm(p)) (assuming you had all the types and constants defined, and next_perm the function to choose the next permutation). This flavour of for is certainly very flexible. The values of the loop variable can be strings or integers or some user-defined type, they advance by whatever method is chosen, and there needn't even be one specific for variable. Yet the information on the basic framework of iteration is right there in the three-part for. Possibly, though, it's marginally easier to compile optimised code for a counter-style loop in Fortran than it is in C, because more restrictive syntax means more knowledge of what's about to happen (e.g. to the loop variable). I doubt anyone takes this very seriously nowadays.
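The three parts can be modelled quite literally. The sketch below is in Python (which has no three-part for), and c_style_for is a hypothetical helper of my own, not anything built in: it takes an initial value, a "while" condition and a step rule, and hands back the successive loop values.

```python
def c_style_for(init, condition, step):
    """Yield loop values as in: for (x = init; condition(x); x = step(x))."""
    x = init
    while condition(x):
        yield x
        x = step(x)


# counter loop: 3, 5, ..., 17, as in "for i := 3 to 17 step 2"
counter_values = list(c_style_for(3, lambda i: i <= 17, lambda i: i + 2))

# a loop that is not a counter at all: keep halving until we hit zero
halving_values = list(c_style_for(40, lambda n: n > 0, lambda n: n // 2))
```

The counter loop is just one choice of the three parts; the halving loop uses exactly the same framework with a different step rule.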

foreach

Many shells (e.g. csh) and some real languages (e.g. Perl) offer a foreach command. This is not like a for command at all, since there is no rule specified to advance from one loop variable value to the next. Instead, an entire list or array of values is given, and the loop variable is assigned each of them in turn. This is sometimes a useful shortcut when you're crunching text (and anyway you get your values as lists from globbing or grepping or other manipulations), but for long or hard-to-generate lists it's no good at all.

Encapsulating complex iterations -- OO iterators

Flexible as it is, the C-style for sort of assumes all three parts of the for statement are very brief. What if the loop "step" isn't ++i (increment i), but instead requires several lines of code? You could encapsulate that functionality in a function (as in the next_perm example), but then with complex data you might end up with an initialisation function, a condition function, and a step function; as usual, you lose the emphasis. Object oriented programming philosophy would deem this wasteful clutter.

C++ and the Standard Template Library that accompanies it use the C-style for, but define iterator objects which follow a given protocol; in particular, iterators have a ++ operator and a dereferencing operator, so you can do for ( p = start; p != end; p++ ), where p is of your very own class T, using whatever's defined in operator++ to advance. All of this will work for any iterator, so as long as p is declared of templated class T, the same for loop will handle all different iterators.

Python recently took this a step forward (mostly by exposing some internals). Iterators in Python are not just some things you can use in a for loop; they are the only objects acceptable. Python gives up the three-part for, but doesn't turn it in for a counter loop. Instead, the syntax is just

for i in values:
  ... whatever ...

Where values is any iterable object (basically, an object from which you can obtain an iterator whenever you like). The whole chore of encapsulating the iteration framework is removed from the for loop: the type or class of values is responsible for that. The loop variable is named explicitly (though Python itself makes no special use of this). A list, for example, is an iterable object; it iterates the way you'd expect, by returning the next item in the list every time. Other classes may return any values: aside from a small technicality, the only requirement of an iterator is that it must support the next() method; the for command above simply does (internally) i = values.next() whenever it loops round. Whenever the iterator is done (having checked whatever condition is dear to its heart), it raises an exception, which the for catches, causing it to exit the loop.
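Roughly, this is what the for command does behind the scenes. The sketch assumes Python 3, where the method is spelled __next__ and is normally reached through the built-in next(), and where the "small technicality" is that for first asks the iterable for an iterator with iter(). The variable names are made up.

```python
values = ["a", "b", "c"]

# the ordinary for loop
seen_by_for = []
for i in values:
    seen_by_for.append(i)

# approximately the same thing, written out by hand
seen_by_hand = []
iterator = iter(values)        # obtain an iterator from the iterable
while True:
    try:
        i = next(iterator)     # ask the iterator for its next value
    except StopIteration:      # the iterator's way of saying it is done
        break
    seen_by_hand.append(i)
```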

values could be just a list, given explicitly or generated by something like range(100) to do i=0, i=1, ..., i=99. For a true counter loop (without the wasteful pre-generated list), Python provides the xrange iterator; xrange(100) will supply the values 0, 1, ..., 99, but without ever generating the entire list. Other iterable objects, of built-in or user-defined classes, provide the for loop with the classical three parts of the for loop, but bundle them away together so the detailed workings don't have to be written out again and again. The loop over permutations which we've been considering might be written as for p in perm_range(perm1, perm2):. Note that here the perm_range object is a single object which can be passed around, and used by any function which uses the iterator protocol, whether or not it was designed with perm_range in mind. For instance, Python converts any iterable object into a list by looping over its iterator, and appending each value to the previous ones.
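Here is one way such a perm_range might look, as a sketch only. Both perm_range and next_perm are hypothetical names, and I'm assuming (by analogy with range) that permutations are tuples visited in lexicographic order, starting at the first argument and stopping just before the second. The code is Python 3.

```python
def next_perm(p):
    """Return the lexicographically next permutation of tuple p, or None if p is the last."""
    a = list(p)
    # find the rightmost position k with a[k] < a[k + 1]
    k = len(a) - 2
    while k >= 0 and a[k] >= a[k + 1]:
        k -= 1
    if k < 0:
        return None  # p was in descending order: no next permutation
    # find the rightmost element larger than a[k] and swap the two
    j = len(a) - 1
    while a[j] <= a[k]:
        j -= 1
    a[k], a[j] = a[j], a[k]
    # reverse the tail to get the smallest arrangement after position k
    a[k + 1:] = reversed(a[k + 1:])
    return tuple(a)


class perm_range:
    """Iterable over permutations from start up to (not including) stop."""

    def __init__(self, start, stop):
        self.start = tuple(start)
        self.stop = tuple(stop)

    def __iter__(self):
        # initialisation, condition and step all live here, out of sight
        p = self.start
        while p is not None and p != self.stop:
            yield p
            p = next_perm(p)


collected = []
for p in perm_range((1, 2, 3), (3, 1, 2)):
    collected.append(p)

# anything that speaks the iterator protocol can consume it, e.g. list()
as_list = list(perm_range((1, 2, 3), (3, 1, 2)))
```

The for statement itself stays a one-liner; the three classical parts are written once, inside __iter__.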

Incidentally, once you have iterators, it's easy to write operators to combine them into more complex iterators, or to transform them in various ways. Python offers the latter in map, filter, and list comprehension.
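A short sketch of these, assuming Python 3 (where map and filter themselves return lazy iterators, hence the list() calls to see the results). itertools.chain, from the standard library, is an example of the combining kind; the variable names are invented.

```python
from itertools import chain

values = range(10)

# transforming: apply a function to every value
squares = list(map(lambda n: n * n, values))

# transforming: keep only the values that pass a test
evens = list(filter(lambda n: n % 2 == 0, values))

# a list comprehension does both at once
even_squares = [n * n for n in values if n % 2 == 0]

# combining: run through one iterable, then the next
combined = list(chain(range(3), "ab"))
```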
