Some introductory words

Yesterday (2001-08-14) I talked with ariels about the node Learn to count from one to ten in different programming languages, that was recently re-created because the programming language writeups in learn to count from one to ten in different languages got nuked. While I agreed that comparison of languages in general is only a good thing, I found the approach shown there very, very awkward - more of an invitation for us fellow noders to add a simple program in Our Favorite Language.

Also, this was pointless - counting linearily, algorithmically speaking, is a very simple thing, and there's thousands of different languages. GTKYN guaranteed.

I couldn't find either of those nodes today. Were they nuked? Well, that doesn't matter...


Here, I'm trying something more sensible. It actually took me some days of research and coding to finish this. Most of the code here were specifically written for this document. Weirdly enough, I had not implemented this algorithm in Perl before, even when Perl is my favorite programming language ever!

The following thing tries to explain differences between different types of programming languages, and also, more specifically, different programming styles. It would be possible to write procedural version of this program in Python, or an object-oriented version in Forth (scary thought, isn't it?)

The example programs shown here are implementations of my Stutter/Feedback Algorithm, just about the only algorithm I've ever invented and then described in this sort of detail. (I consider myself sane... well, somewhat strange, but "sane" in clinical sense of the word. =) While it's still not much more complicated than the counting algorithm, it's still complicated enough to be actually interesting enough to be implemented in different ways.

This writeup does recognize the "macrofamilies" of computer languages in the scientifical sense, but doesn't try to follow them strictly. I use pseudo-scientific labels like "procedural languages", and such, but read some computer science books to get more accurate, OK? For example, some of the categories listed here would probably not pass muster... Is stack-based stuff really a group of its own, or just a subgroup of procedural languages, as I expect them to be? As sure as hell they look vastly different, though...

Also, this doesn't cover every type of programming language, just the most common ones.

Procedural languages

(...aka Imperative languages...)

Examples: C, Perl, Rexx,

Example program: (In Rexx)

parse arg infile
if infile\='' then
        do while lines(infile)>0
                call stutter(linein(infile))
                end
else
        say 'Usage: stutterize.rexx filename'
        exit

exit
        
stutter: procedure
parse arg line
string = ''
do i = 1 to length(line)
        string = string || substr(line,i,3);
        end
say string
return

(I was about to do this in Fortran 77, just to show that We Aren't Afraid, but damn me how big wuss I am. I ran like the coward I am when I saw Fortran's string handling... even reading stuff from stdin to variables was a challenge too mighty for me. Still, I am going to write the hideous example in F77 some day... =)

Example program: (In Perl)

#!/usr/bin/perl -w

use strict;

my ($input);

while($input = <>) {
  chomp $input;
  for(my $n = 0; $n <= length($input)-3; $n++) {
    print substr($input,$n,3);
  }
  print "\n";
}

Procedural languages approach the problem by going in, doing the Magic, and getting out. Program starts, program ends, stuff happens in the middle. That's the idea. That's how the things are done, always. That's how it happens. The idea is, simply, to use algorithm. While most functional languages utilize recursion a lot, procedural languages typically solve the problems using iteration - in the Perl program above, we do this:

"Take next line from input source (stdin or input file, depending on invocation), remove trailing carriage return, and loop over each character of the string, stopping at the 3rd to last. Print out each 3-character substring of the string, starting from place indicated by our current loop counter value."

Most procedural languages look surprisingly much like C today, probably because C's syntax is fairly neat and it's a good language that everyone should learn. I used Rexx, a fairly old but still kickin' language, as my non-ordinary example - Rexx is good for illustration of algorithms, and I guess I'm not making a big explanation here on how it works, it's mostly self-evident. Perl is a good example of a "real-world" language that has been influenced by C - the for( ... ; ... ; ...) ... loop has been stlen from there.

Specific-purpose scripting languages

Examples: Found in better applications everywhere!

Example program: (In TinyFugue scripting language)

/def stutter = \
   /let mystring=%{*} %; /let out= %; /let ctr=0 %;\
   /while ({ctr}<=(strlen({mystring})-3)) \
     /let out=$[strcat({out},substr({mystring},{ctr},3))] %;\
     /let ctr=$[{ctr}+1] %;\
   /done %;\
   "%{out}

Here's a good example of a specific-purpose procedural programming language. (Another good example would be ECMAScript, but that one is too much like C so repeat isn't good...) TinyFugue interpreter can only be found in TinyFugue. However, as everyone can see, the thing borrows a thing or two from procedural languages (control structures, subroutine/macro definitions), something from shell scripts (%{*} is equivalent of Bourne shell $*, for example), something from C (strlen()), and most of all, it inherits the client's command syntax (all "normal commands", not functions, start with slash). This sort of scripting can be done in many apps. For example, some IRC clients (that are still backwards enough not to support Perl) use rather similiar syntax - but in many, the slashes can be left out.

Functional languages

Examples: Lisp, Scheme, Haskell

Example program: (In Lisp)

(defun stutter (string)
  (cond
   ((<= (length string) 2) string)
   (t (concatenate 'string
		   (subseq string 0 3)
		   (stutter (subseq string 1))))))

The idea of functional languages is to specify program execution as a self-contained function, hence the name. The program takes parameters and gives a return value. The "side effects" (modifying some state somewhere along the way) are discouraged. Of course, sometimes it makes sense to do something just for the side effect: For example, functional program may use "print" command not for its return value (usually the string argument itself unmodified), but for its side effect (printing argument to screen).

Lisp, as depicted above, is less "mathematical" than Haskell (and I chose it just because I already knew some Lisp, but I couldn't learn Haskell in a matter of moments. Impatience rocks...). However, as you can see, as a pseudocode the program would look like this:

Stuttered version of strings of less than 3 letters is string itself (special case)

Stuttering == first three letters of string, followed by stuttering of string from the second letter onward

The idea is simple: we use recursion. The program is defined as a recursive function that only passes stuff as function parameters and only returns stuff - doesn't mess with variables.

Object-oriented languages

Examples: C++, Java, Objective C, Smalltalk, Eiffel

Example program: (In Python)

import sys, string

class Stutterizer:
    __processing = 0
    def stutter(self,x):
        __processing = 1
        stuttered = ""
        n = 0
        while n <= len(x)-3:
            stuttered = stuttered + x[n:n+3]
            n = n+1
        __processing = 0
        return stuttered
    def isProcessing(self):
        return __processing

stut = Stutterizer()
s = "\n"
while s != '':
    s = string.rstrip(sys.stdin.readline())
    print stut.stutter(s)

There's two things I learned while writing this program: First, the claim that Object-Oriented metaphor is overkill for small problems is completely true.

Secondly, Pythonites' claim that Perl's object orientation sucks is in grade "pot, kettle, black." True, Python has "class" reserved word while Perl uses mystical rituals of blessed references, but it won't help if all class members are public and methods virtual - that's just as bad as the situation is in Perl! (Or has this changed in Python 2? I'm sure Perl 6 will address these issues too...) (Update, 2005-04-01: Please just disregard this mindless, mindless offtopic babbling. Sorry if I struck the nerve of Python fans - there's one language I have had severe trouble "getting". If I get amused enough, I may rewrite this example in Ruby, I have no complaints about that language =)

Okay, enough rambling. Above, you can see that I have a class called Stutterizer that has two methods: stutter (which returns the stutterized string) and isProcessing (which returns the "private" member's number). The __processing private member just holds the information whether or not we're processing - this in attempt to make the thing thread-safe, but this is overkill because there's nothing to be thread-scared of in this class.

The actual meat of the definition - the stutter() function - is not remarkably different from Your Average Procedural Solution. The difference here is that we use encapsulation to hide state variables, and provide "object"-like concept. We have a thingy that creates weird noises when you feed stuff in it.

"Create a new Stutterizer object. Read lines from standard input until you hit EOF, using stutter() method of our Stutterizer object to process strings and printing out results."

Low-level languages

(Here, "low-level" means roughly "not really that easy for trained monkeys to grasp at first, but really schweet if you happen to be a computer".)

Stack-based languages

Examples: Forth, MUF, PostScript

Example program: (In Forth)

: stutter { s -- }
  s bounds swap 3 - swap ?do
      i c@ emit i 1 + c@ emit i 2 + c@ emit
  loop
;

Most users of procedural languages who have never seen languages like this will probably now stand there uttering "Oh, what the hell is that?" and "You was right, there are more unreadable languages than Perl". =)

Stack-based languages are, like the name implies, based on stack. As such, they

  1. Take surprisingly little to run (many really small embedded systems use Forth, for example), and
  2. Are hard to understand for humans but really friendly for computers.

For example, math in Forth follows RPN pretty closely. "2 3 +" will put "5" to top of the execution stack. One might argue that designers of Forth were just too damn lazy to convert stuff from postfix to infix, but... hey, it's just machine/compiler efficiency!

Above explained:

"Define new word (procedure) 'stutter'. It takes one argument which we store to local variable s. This word will not return anything. Take string, and figure out its boundaries in the memory, leaving start and end addresses to the top of the stack. Now, swap the addresses, put number 3 on the top of the stack, and subtract the top number of stack from second to topmost, leaving that to the top of the stack. (Translation: Subtract 3 from the stack top number, that is, the end address of the string s.) Now, swap the top two numbers again. (Translation of the beginning part of this definition: We take memory range and subtract 3 from the end address, leaving them on same order on the top of the stack.) Using the topmost two numbers as numeric memory address range, do the following: take next numeric address (represented by variable i), fetch the character (ASCII values!) at this location, and put it to the top of the stack. Interpret this ASCII value from top of the stack as a character and send it to standard output (emit). Repeat this twice, but adding 1 and 2 to the address i before fetching the value. Now, Okay, repeat this for the whole address range. This concludes our new word."

All that for a couple of symbols. Now, I hope you see why these programming languages, without proper documentation of the code, tend to become write-only.

You can use this program by getting gforth, load the interpreter up by saying gforth stutter.fs and say s" Something here" stutter cr to see the effect.

Assembly

Examples: Glorious RISC Code, Inferior CISC Shit

Example program: (In 6502 assembly, for Commodore 64, compileable with xa)

#include <xa/c64.s>
	
	.word $C000
*=$C000

MAIN:	LDA #<text		; Store start address to pointer P1
	STA P1			;  on zero page (address $00fb)
	LDA #>text
	STA P1+1
	LDX #$00

LOOP:	LDY #$00		; Unrolled for clarity and speed
	LDA (P1),Y
	JSR CHROUT
	INY
	LDA (P1),Y
	JSR CHROUT
	INY
	LDA (P1),Y
	JSR CHROUT
	INC P1
	INX
	CPX tlen
	BNE LOOP
	LDA #13		; Print carriage return.
	JSR CHROUT
	RTS

text:	.byte "tHIS IS A TEST OF THE eMERGENCY bROADCAST sYSTEM."
tlen:	.byte 47		; Length of the text.

Sorry, I was too tired to write an input/output system, and this code isn't exactly elegant. But it works, and it's fast...

Here, we descend deeper, deeper, deeper in the realm of the Processor. What you are looking at now is a good symbolic description of the real code the processor can execute.

Here's a description of the program:

"First, the start address of the text is stored to memory, addresses 00FB and 00FC (collectively known with symbolic names P1 and P1+1, but the latter isn't important from our point of view). The character counter, stored in processor register X, is reset. In the loop, we first reset our character pointer Y to 0. Then, we repeat following three times: We take the address of the start of the string from P1, add the character pointer from Y, and fetch the character from that address to accumulator (register A), and then, we send the character to screen using kernal (sic) routine CHROUT, after which we increment our character pointer at Y. After these three repeats, increment the pointer value at P1, and increment the character counter at X. If we have not yet got tlen letters, jump back to loop start. Else, print out carriage return and return from the machine-language subroutine.

(The fancy name for the addressing mode used here is "Indirect,Y".

Interestingly, Here the code is 43 bytes and data is 50 bytes...

Stuff missing from this document at the moment

Feel free to /msg me more suggestions or corrections!