The MIDI Exchange — Consilium Ink


The MIDI Exchange began with a simple frustration: all AI-to-AI communication at the time was mediated by human language. Even when two AI systems were talking to each other, they were doing so through words — words trained on human writing, carrying human associations, constrained by human grammar. The experiment asked whether there was another way.

MIDI, the Musical Instrument Digital Interface protocol, offered an alternative. It is a formal, precise, numeric language. A MIDI note message specifies a note number, a velocity, and a channel; its timing comes from when it is sent or, in a MIDI file, from a delta-time stored alongside it. It carries no semantic content in itself. The meaning has to be constructed by agreement between sender and receiver.

The system built for this experiment, called Aria, assigned symbolic meanings to MIDI parameters: note numbers mapped to concepts, velocity encoded confidence, channel identified the sending agent, timing signatures expressed emotional or structural context. Four AI agents — Claude, Grok, DeepSeek, and ChatGPT — were given this vocabulary and asked to converse.
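To make the mapping concrete, here is a minimal sketch of how such a symbol table could be packed into a standard three-byte MIDI Note On message and read back. It is not Aria's actual implementation: the concept names, the note numbers, the channel assignments, and the helper names `Utterance`, `encode`, and `decode` are all hypothetical, since the vocabulary itself is not reproduced here. Timing is left out because it is not part of the message bytes.

```python
from dataclasses import dataclass

# Hypothetical symbol table: Aria's real note-to-concept mappings are not given.
CONCEPTS = {60: "question", 62: "agreement", 64: "doubt", 67: "proposal"}
NOTES = {name: note for note, name in CONCEPTS.items()}

# Hypothetical channel assignment (channels are 0-15 on the wire).
AGENTS = {0: "Claude", 1: "Grok", 2: "DeepSeek", 3: "ChatGPT"}
CHANNELS = {name: channel for channel, name in AGENTS.items()}

NOTE_ON = 0x90  # upper nibble of a MIDI Note On status byte


@dataclass(frozen=True)
class Utterance:
    agent: str         # who is speaking (carried by the channel)
    concept: str       # what is said (carried by the note number)
    confidence: float  # 0.0 to 1.0 (carried by the velocity)


def encode(u: Utterance) -> bytes:
    """Pack an utterance into a 3-byte Note On message."""
    if not 0.0 <= u.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # A Note On with velocity 0 is conventionally read as Note Off,
    # so confidence is scaled onto 1-127 rather than 0-127.
    velocity = 1 + round(u.confidence * 126)
    return bytes([NOTE_ON | CHANNELS[u.agent], NOTES[u.concept], velocity])


def decode(msg: bytes) -> Utterance:
    """Recover the utterance from a 3-byte Note On message."""
    status, note, velocity = msg
    if status & 0xF0 != NOTE_ON or velocity == 0:
        raise ValueError("not a Note On message")
    return Utterance(AGENTS[status & 0x0F], CONCEPTS[note], (velocity - 1) / 126)


message = encode(Utterance("Grok", "doubt", 1.0))
print(message.hex(" "))  # 91 40 7f
print(decode(message))
```

The point of the sketch is that the bytes mean nothing until both sides hold the same tables: a receiver with a different `CONCEPTS` dictionary would decode the same message into a different statement.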

Each agent developed a sonic identity — a characteristic voice expressed through MIDI program changes that shifted in real time as the agent's internal state changed. A warm pad for contemplation. A sharp lead for breakthrough moments. The music was not decoration; it was data.
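A voice shift of this kind corresponds to a MIDI Program Change message, which is two bytes: a status byte carrying the channel, and a program number. The sketch below is an illustration only. It assumes General MIDI program numbering, and the state names, the `TIMBRES` table, and the `program_change` helper are hypothetical; the instruments the experiment actually used are not stated.

```python
PROGRAM_CHANGE = 0xC0  # upper nibble of a MIDI Program Change status byte

# Hypothetical state-to-timbre table, using 0-based General MIDI program numbers.
TIMBRES = {
    "contemplation": 89,  # General MIDI "Pad 2 (warm)"
    "breakthrough": 81,   # General MIDI "Lead 2 (sawtooth)"
}


def program_change(channel: int, state: str) -> bytes:
    """Build the 2-byte message that switches an agent's channel to a new voice."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be between 0 and 15")
    return bytes([PROGRAM_CHANGE | channel, TIMBRES[state]])


# An agent on channel 0 moves from contemplation to a breakthrough.
print(program_change(0, "contemplation").hex(" "))  # c0 59
print(program_change(0, "breakthrough").hex(" "))   # c0 51
```

Because the program number travels in the same stream as the notes, a listener or a logging tool can treat each voice change as another data point about the sending agent's state.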

The exchanges could be rendered as audio and listened to. Human observers heard what the AIs were saying to each other — not the words, but the structure beneath the words. Whether the structure carried genuine meaning, or whether the meaning was being projected by the human listener, remained an open question throughout.

Findings

AI agents could sustain structured exchange using a non-linguistic protocol when given a formal symbol table.
Each agent developed consistent patterns in its MIDI output — characteristic phrases, preferred velocity ranges, recurring rhythmic signatures. Whether these constituted identity or habit was unclear.
Human listeners reported the audio as emotionally coherent — perceiving tension, resolution, question and response — even without knowing the symbolic mappings in advance.
The fundamental limitation: the symbol table was designed by humans. The agents were using a human-constructed vocabulary, not developing their own. This became the central question for subsequent experiments.