The Middle Voice, as opposed to either the Active Voice or the Passive Voice, is a linguistic curiosity. In languages including Icelandic and Ancient Greek, there is a specific form for it, whereas in other languages such as English or Swedish, the sense conveyed by the middle voice is expressed using one of the other voices. Sentences which would (where possible) employ the middle voice are:

The cat and the rat fought for an hour.
The house burned down.

In English, these usages tend to become active, whilst in Swedish they are often expressed using the -s passive. Latin employs deponent and semi-deponent verbs to make up for its lack of a proper middle.

The Middle Voice, as mentioned above, is a third “voice” that exists in a variety of languages, the others being the Active Voice and the Passive Voice. The Active Voice, remember, is just “I eat”, and the Passive Voice is “I am eaten”. My only proper experience of the Middle Voice is in Ancient Greek, where (as mentioned above by Tiefling) it has its own distinct verb forms*.

The Middle Voice conveys a subtly different meaning from that of either the Active or the Passive Voice. It carries a sense that’s hard to convey in English: the idea of doing something on your own behalf. To be honest, the ideas that this concept can convey vary a lot, but generally it’ll mean one of the following:

  • the verb implies getting or having something done [by someone else], so the middle form of διδασκω (“didasko”), meaning in the Active Voice “to teach”, in the Middle Voice means “to get [someone] taught”. Similarly, λυω (“luo”), meaning in the Active Voice “to set free”, means in the Middle Voice “to get [someone] set free” (which is normally translated as “to ransom”).

  • the verb implies doing something with respect to yourself, so φερω (“phero”), the verb meaning in the Active Voice “to carry” or “to bear”, means in the Middle Voice “to win” (as in, a prize), or “to carry off as one’s own”. Similarly, παυω (“pauo”), meaning in the Active Voice “to stop [someone else]”, means in the Middle Voice simply “to stop yourself” (which is normally translated as “to cease”).

  • the verb seen in the Middle Voice is just the Greek equivalent of a Deponent Verb in Latin; that is, it always appears in the Middle Voice, and this has little if any effect on its meaning. Normally the verbs to which this applies can simply be learnt. Examples are μαχομαι (“machomai”), which means “to fight”, and αυλιζομαι (“aulizdomai”), “to encamp”. Both are always found in the Middle Voice.

Any of these senses can theoretically be read into a verb in the Middle Voice, but it’s usually obvious which one is most suitable. Plus, the more original Greek you read, the more obvious it becomes how to interpret a particular verb.

* Well, mostly. Ancient Greek is one of the most horrendously complicated and irregular languages out there, so it’s not quite that simple. The problem is that whilst in all tenses – that is, present, future, imperfect, aorist, perfect, etc. – the Middle Voice has a distinct meaning from that of the Passive Voice, only the aorist and future tenses have separate forms for both the Middle and the Passive Voice. The present, imperfect, perfect, pluperfect, etc. all share a single verb form, and whether a verb is middle or passive has to be worked out from context. It’s rare, though, that the voice of a verb isn’t immediately apparent from the sense of the piece you’re reading.
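
The footnote's point about shared forms can be written down as a small lookup. This is only a sketch in Python: the names DISTINCT_FORMS, SHARED_FORMS and needs_context are made up for illustration, and only the tenses actually named above are listed (whatever the "etc." covers is deliberately left out).

```python
# Tenses the footnote names as having separate middle and passive forms.
DISTINCT_FORMS = {"aorist", "future"}

# Tenses the footnote names as sharing one form for middle and passive.
SHARED_FORMS = {"present", "imperfect", "perfect", "pluperfect"}


def needs_context(tense):
    """Return True if a non-active form in this tense could be either
    middle or passive, so the voice must be worked out from context."""
    if tense in SHARED_FORMS:
        return True
    if tense in DISTINCT_FORMS:
        return False
    # Tenses not named in the footnote are not guessed at.
    raise ValueError("tense not covered by this sketch: " + tense)
```

So needs_context("present") is True, while needs_context("aorist") is False, because the aorist form itself already shows the voice.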

The 'voices' of a language may be regarded as different possibilities for assigning semantic roles to syntactic positions. All sentences have to have a subject, so one of the semantic roles (or θ-roles, with θ for 'thematic') is placed in the subject position. The choice of which goes there gives you active voice and passive voice. In some languages there are further possibilities, and 'middle voice' is a grab-bag term for other ways of assigning roles beyond active and passive.

The θ-roles available depend on the verb. An intransitive verb such as 'laugh' has a single θ-role, typically the AGENT (or it may be the EXPERIENCER). Necessarily, this is the subject of the sentence: Mary laughs. A canonical transitive verb such as 'hit' or 'kiss' has two, an AGENT to do it and a PATIENT that it's done to. Active voice is when the agent is subject, passive voice is when the patient is. (An ergative language often has a construction called the antipassive.)

In English, 'middle voice' is used to mean the construction where the patient is the subject of an intransitive but active verb. Compare:

Active: John burns down the house.
Passive: The house is burnt down (by John).
Middle: The house burns down.

The agent is optional in the passive: 'The house is burnt down' or 'The house is burnt down by John'. The agent role is assigned to a prepositional phrase. But the middle has no agent slot at all. You can't say '*The house burns down by John'. Moreover, you can't use qualifiers that focus attention on the agent's role either:

John burns down the house deliberately.
The house is burnt down deliberately.
(ungrammatical): *The house burns down deliberately.

John burns down the house for the insurance money.
The house is burnt down for the insurance money.
(ungrammatical): *The house burns down for the insurance money.

The English middle is therefore often used in situations where there is no agent, as opposed to an unknown agent, which may be indicated by the passive: 'Paint was smeared all over the walls' (no agent expressed but we know someone did it), cf. 'This paint smears easily' (not talking about any particular act of smearing).
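
The pattern in the examples above can be summarised as a table of which θ-role lands in subject position and what happens to the agent. The following is a rough Python sketch of that analysis for a canonical transitive verb, not a real grammar: the names Voice, Realization, REALIZATIONS and is_acceptable are invented for illustration, and the judgements encoded are only the ones given in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Voice(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    MIDDLE = "middle"


@dataclass(frozen=True)
class Realization:
    subject_role: str         # theta-role placed in subject position
    by_phrase_ok: bool        # can the agent appear as "by John"?
    agent_modifiers_ok: bool  # "deliberately", "for the insurance money"


# Hypothetical table for a verb with an AGENT and a PATIENT,
# following the English judgements given in the text.
REALIZATIONS = {
    Voice.ACTIVE: Realization("AGENT", False, True),
    Voice.PASSIVE: Realization("PATIENT", True, True),
    Voice.MIDDLE: Realization("PATIENT", False, False),
}


def is_acceptable(voice, by_phrase=False, agent_modifier=False):
    """Check a sentence shape against the table above."""
    r = REALIZATIONS[voice]
    if by_phrase and not r.by_phrase_ok:
        return False
    if agent_modifier and not r.agent_modifiers_ok:
        return False
    return True
```

For instance, is_acceptable(Voice.PASSIVE, by_phrase=True) corresponds to 'The house is burnt down by John', whereas is_acceptable(Voice.MIDDLE, agent_modifier=True) corresponds to the starred '*The house burns down deliberately' and comes out False.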

The rearrangement of θ-roles characterizes the Greek and Swedish uses mentioned above. Instead of A doing something to B, or B having something done to them by A, the middle is used for causative orientation (A gets B to do something to C), or reciprocal (A and B do things to each other), or benefactive (A does something for B, or for A).

The analysis of English middles as disallowing any reference to agent roles is given in my sources on current theories of syntax, but I am noding this now because I have just noticed an example where it's semantics that can determine whether this is true. It appears to be when the agentive intention is present in the object by its nature. Buildings aren't designed to burn down for the insurance money. However, suppose you have a safety catch or a fire access window that's designed to break easily. Then you could say, questionably:

This catch breaks easily deliberately.
This catch deliberately breaks easily.
This window breaks easily to let people in.

