“You Are My Friend”: Early Androids and Artificial Speech
Centuries before audio deepfakes and text-to-speech software, inventors in the eighteenth century constructed androids with swelling lungs, flexible lips, and moving tongues to simulate human speech. Jessica Riskin explores the history of such talking heads, from their origins in musical automata to inventors’ quixotic attempts to make machines pronounce words, converse, and declare their love.
May 29, 2024
The word “android”, derived from Greek roots meaning “manlike”, was the coinage of Gabriel Naudé, French physician and librarian, personal doctor to Louis XIII, and later architect of the forty-thousand-volume library of Cardinal Jules Mazarin. Naudé was a rationalist and an enemy of superstition. In 1625 he published a defense of Scholastic philosophers to whom tradition had ascribed works of magic. He included the thirteenth-century Dominican friar, theologian, and philosopher Albertus Magnus (Albert the Great), who, according to legend, had built an artificial man made of bronze.1
This story seems to have originated long after Albert’s death with Alfonso de Madrigal (also known as El Tostado), a voluminous commentator of the fifteenth century, who adapted and embellished the tales of moving statues and talking brazen heads in medieval lore.2 El Tostado said that Albert had worked for thirty years to compose a whole man out of metal. The automaton supplied Albert with the answers to all of his most vexing questions and problems and even, in some versions of the tale, obligingly dictated a large part of Albert’s voluminous writings. The machine had met its fate, according to El Tostado, when Albert’s student, Thomas Aquinas, smashed it to bits in frustration, having grown tired of “its great babbling and chattering”.3
Naudé did not believe in Albert’s talkative statue. He rejected it and other tales of talking automaton heads as “false, absurd and erroneous”.4 The reason Naudé cited was the statues’ lack of equipment: being altogether without “muscles, lungs, epiglottis, and all that is necessary for a perfect articulation of the voice”, they simply did not have the necessary “parts and instruments” to speak reasonably.5 Naudé concluded, in light of all the reports, that Albert the Great probably had built an automaton, but never one that could give him intelligible and articulate responses to questions. Instead, Albert’s machine must have been similar to the Egyptian statue of Memnon, much discussed by ancient authors, which murmured agreeably when the sun shone upon it: the heat caused the air inside the statue to “rarefy” so that it was forced out through little pipes, making a murmuring sound.6
Despite disbelieving in Albert the Great’s talking head, Naudé gave it a powerful new name, referring to it as the “android”.7 Thus deftly, he smuggled a new term into the language, for according to the 1695 dictionary by the French philosopher and writer Pierre Bayle, “android” had been “an absolutely unknown word, & purely an invention of Naudé, who used it boldly as though it were established.”8 It was a propitious moment for neologisms: Naudé’s term quickly infiltrated the emerging genre of dictionaries and encyclopedias. Bayle repeated it in the article on “Albert le Grand” in his dictionary.9 Thence, “android” secured its immortality as the headword of an article — citing Naudé and Bayle — in the first volume of the supplement to the English encyclopedist Ephraim Chambers’ Cyclopaedia.10 In denying the existence of Albert’s android, Naudé had given life to the android as a category of machine.
But the first actual android of the new, experimental-philosophical variety for which the historical record contains rich information (“android” in Naudé’s root sense, a working human-shaped assemblage of “necessary parts” and instruments) went on display on February 3, 1738. The venue was the opening of the annual Saint-Germain fair on Paris’ Left Bank. This android differed crucially from earlier musical automata, the figures on hydraulic organs and musical clocks, in that it really performed the complex task it appeared to perform, in this case, playing a flute, rather than merely making some suggestive motions. The device was, in this sense, a novelty, but it must have looked familiar to many of the fairgoers, being modeled on a well-known statue that stood in the entrance to the Tuileries Gardens and that is now at the Louvre Museum: Antoine Coysevox’s Shepherd Playing the Flute.
Like the statue, the android represented a faun, half man and half goat. The mechanical faun, like the marble one at the Tuileries, held a flute. The second faun, though, became suddenly animate and began to play its instrument, executing twelve tunes in succession. At first, skeptical spectators were persuaded this must be a music box, with an autonomous mechanism inside to produce the sound, while the external figure merely pretended to play. But no, the android actually did play a real flute, blowing air from its lungs (three sets of bellows) and exercising flexible lips, a supple tongue, and soft, padded fingers with a skin of leather. It was even reported that one could bring one’s own flute, and the machine would oblige by playing that one too.11
The flute-playing android was the work of an ambitious young engineer named Jacques Vaucanson. The last of ten children of a Grenoble glove maker, Vaucanson had been born in the bitterly cold winter of 1709, at the waning of Louis XIV’s long reign, in the midst of a terrible famine and the bloodiest year of a war that France was losing. Emerging from this dark moment, Vaucanson’s life and the Enlightenment would take shape in tandem, and his work would become a point of reference for the world of letters.
As a child, he had liked to build clocks and repair watches. While a schoolboy, he had begun designing automata. After a brief stint as a novice in Lyon, ending when a church dignitary ordered Vaucanson’s workshop destroyed, he had come to Paris at the age of nineteen to seek his fortune. Thinking he might train as a doctor, he had attended some courses in anatomy and medicine, but had soon decided to apply these studies to a new area of research: re-creating living processes in machinery. The Flutist was the result of five years’ labor.12 When it was finished, Vaucanson submitted a memoir explaining its mechanism to the Paris Academy of Sciences. This memoir contains the first known experimental and theoretical study of the acoustics of the flute.13
Following an eight-day debut at the Saint-Germain fair, Vaucanson moved his android to the Hôtel de Longueville, a gilded hall in a grand sixteenth-century mansion at the center of the city. There it attracted about seventy-five people a day, each paying a hefty entrance fee of three livres (roughly an average week’s wages for a Parisian worker). Among its audience were the members of the Paris Academy of Sciences, who traveled as a body to the Hôtel de Longueville to witness the android Flutist.14 Greeting his public in groups of ten or fifteen, Vaucanson explained the Flutist’s mechanism and then set it to play its concert.
The reviews were effusive. “All of Paris is going to admire . . . the most singular and agreeable mechanical phenomenon perhaps ever seen”, wrote one reviewer, emphasizing that the android “really and physically plays the flute”.15 The music-making statue, another agreed, was “the most marvelous piece of mechanics” that had ever been.16 The abbé Pierre Desfontaines, a journalist and popular writer, advertising Vaucanson’s show to readers of his literary journal, described the insides of the Flutist as containing “an infinity of wires and steel chains . . . [which] form the movement of the fingers, in the same way as in living man, by the dilation and contraction of the muscles. It is doubtless the knowledge of the anatomy of man . . . that guided the author in his mechanics.”17 In the article “Androïde” in the monumental Encyclopédie, a universal compilation of knowledge edited by the philosopher and writer Denis Diderot and the mathematician and philosopher Jean d’Alembert, Vaucanson’s mechanical Flutist became the paradigm of an android. The article, written by d’Alembert, defines an android as a human figure performing human functions, and virtually the whole piece is devoted to the Flutist.18
Soon after the members of the Academy of Sciences came to the Hôtel de Longueville, Vaucanson returned the visit to read a memoir on the design and function of his Flutist.19 The android’s mechanism was moved by weights attached to two sets of gears. The bottom set turned an axle with cranks that powered three sets of bellows, leading into three windpipes, giving the Flutist’s lungs three different blowing pressures. The upper set of gears turned a cylinder with cams, triggering a frame of levers that controlled the Flutist’s fingers, windpipe, tongue, and lips. To design a machine that played a flute, Vaucanson had studied human flute players in minute detail. He had devised various ways of transmitting aspects of their playing into the design of his android. For example, to mark out measures he had had a flutist play a tune while another person beat time with a sharp stylus onto the rotating cylinder.20
The following winter, Vaucanson added two more machines to the show. One was a second android musician, a life-size Provençal shepherd that played twenty minuets and other dance tunes on a pipe grasped in its left hand while accompanying itself with its right on a drum slung over its shoulder.21 The pipe had only three holes, which meant that the notes were produced almost entirely by the player’s variations of blowing pressure and tongue stops. Working to reproduce these subtleties in his automaton, Vaucanson found that human pipers employed a much greater range of blowing pressures than they themselves realized. The Piper also yielded another surprising discovery. Vaucanson had assumed that each note would be the product of a given finger position combined with a particular blowing pressure, but he discovered that the blowing pressure for a given note depended upon the preceding note, so that, for example, it required more pressure to produce a D after an E than after a C, obliging him to have twice as many blowing pressures as notes.22 The higher overtones of the higher note resonate more strongly in the pipe than the lower overtones of the lower note; but pipers themselves were not aware of compensating for this effect, and the physics of overtones was explained only in the 1860s by Hermann von Helmholtz.23
The android musicians did not just make music, a feat that music boxes had achieved for more than two centuries, but they did so using flexible lips, moving tongues, soft fingers, and swelling lungs. They were simulations of the human process of making music, and as the century wore on, the designers of such simulations turned toward the even more complex task of making machines that could mimic human speech.
In 1739, a year after Vaucanson’s duck made its public debut, a surgeon named Claude-Nicolas le Cat published a description, now lost, of an “automaton man in which one sees executed the principal functions of the animal economy”, circulation, respiration, and “the secretions”.24 It is not clear what became of this early project, but Le Cat returned to the idea in 1744 when, according to the proceedings of the Académie de Rouen, he read a sensational memoir there. A great crowd was assembled to hear it, and one witness reported, “Monsieur Le Cat told us of his plan for an artificial man . . . . His automaton will have respiration, circulation, quasi-digestion, secretion and chyle, heart, lungs, liver and bladder, and God forgive us, all that follows from it.”25
Le Cat’s automaton man was to have “all the operations of a living man”, including not only “the circulation of the blood, the movement of the heart, the play of the lungs, the swallowing of food, its digestion, the evacuations, the filling of the blood vessels and their depletion by bleeding”, but also — apparently crossing the Cartesian boundary between mechanical body and rational soul — “even speech and the articulation of words”.26
This idea, the possibility of simulating articulate speech, had generated a tradition of philosophical discussion over the preceding century. If some continued to find it a quixotic notion, it was in fact literally so: when Don Quixote himself encounters a talking bronze head (connected to a hidden human being), he is fully captivated by it, though his less suggestible squire, Sancho Panza, is unimpressed by its conversation.27 Cervantes’s contemporary, the Spanish writer on magic, Martín del Río, also found it unreasonable to suppose “that an inanimate thing should produce the human voice and give answers to questions. For this requires life, and breath, and a perfect cooperation of the vital organs, and some discursive ability in the speaker.”28
Several decades later, some if not all the items on del Río’s list seemed possibly achievable in an artificial machine. Athanasius Kircher wrote in 1673, with regard to the legends of Albert the Great’s talking head and the ancient Egyptian speaking statues, that while certain skeptics believed these devices must have been “either non-existent or fraudulent or constructed with the help of the devil”, many others believed it was possible to build such a statue having throat, tongue and other organs of speech that would emit an articulated voice when it was activated by wind. Kircher included a sketch of a design for a talking figure.29 His student, Gaspar Schott, also a prolific natural philosopher and engineer, adopted the same attitude, even alluding to a question-answering statue that Kircher was building for Queen Christina of Sweden.30 No doubt the queen’s previous philosophy teacher, Descartes, had interested her in the relations between rational speech and a mechanical body.
Although the idea of simulated speech was not new, around the middle of the eighteenth century, experimental philosophers and mechanicians took a renewed interest in it. They assumed that speech was a bodily function akin to respiration or digestion — they did not explicitly distinguish the rational from the physiological aspects of speaking — and even the skeptics expressed their skepticism in connection with physiological details rather than principled objections. In his effusive review of Vaucanson’s Flutist in 1738, for example, the abbé Desfontaines predicted that articulate speech could never be produced in artificial machinery because the bodily process of speaking would remain impenetrably mysterious: one could never know precisely “what goes on in the larynx and glottis . . . [and] the action of the tongue, its folds, its movements, its varied and imperceptible rubbings, all the modifications of the jaw and the lips.”31 Speaking was an essentially organic process, Desfontaines reckoned, and could only take place in a living throat.
Desfontaines was not alone in this belief: in this period, skeptics about the possibility of artificial speech generally argued that the human larynx, vocal tract, and mouth were too soft, supple, and malleable to be simulated mechanically. Around 1700, Denys Dodart, personal physician to Louis XIV, presented several memoirs to the Paris Academy of Sciences on the subject of the human voice, in which he argued that the voice and its modulations were caused by constrictions of the glottis, and that these were “inimitable by art”.32 The writer and academician Bernard le Bovier de Fontenelle, who was then Perpetual Secretary of the Academy, commented that no wind instrument produced its sound by such a mechanism (the variation of a single opening) and that it seemed “altogether outside the realm of imitation. . . . Nature can use materials that are not at all at our disposal, and she knows how to use them in ways that we are not at all permitted to know.”33
A last skeptic citing material difficulties was the philosopher and writer Antoine Court de Gébelin, who observed that “the trembling that spreads to all the parts of the glottis, the jigging of its muscles, their shock against the hyoid bone that raises and lowers itself, the repercussions that the air undergoes against the sides of the mouth . . . these phenomena” could only take place in living bodies.34 On the other hand, there were plenty who disagreed. For example, the polemical materialist Julien Offray de La Mettrie took a look at Vaucanson’s Flutist and concluded that a speaking machine “could no longer be regarded as impossible”.35
During the last three decades of the century, several people took up the project of artificial speech. All of them assumed that the sounds of spoken language required a structure as similar as possible to the throat and mouth. This assumption, that a talking machine required simulated speaking organs, had not always dominated thinking about artificial speech. In 1648, John Wilkins, the first secretary of the Royal Society of London, had described plans for a speaking statue that would synthesize, rather than simulate, speech by making use of “inarticulate sounds”. He wrote, “We may note the trembling of water to be like the letter L, the quenching of hot things to the letter Z, the sound of strings, to the letter Ng [sic], the jirking of a switch to the letter Q, etc.”36 But in the 1770s and 80s, builders of speaking machines mostly assumed that it would be impossible to create artificial speech without building a talking head: reproducing the speech organs and simulating the process of speaking.
The first to attempt such a machine was the English poet and naturalist Erasmus Darwin (grandfather of Charles Darwin) who in 1771 reported that he had “contrived a wooden mouth with lips of soft leather, and with a valve over the back part of it for nostrils.” Darwin’s talking head had a larynx made of “a silk ribbon . . . stretched between two bits of smooth wood a little hollowed.” It said “mama, papa, map and pam” in “a most plaintive tone”.37
The next to simulate speech was a Frenchman, the abbé Mical, who presented a pair of talking heads to the Paris Academy of Sciences in 1778. The heads contained “several artificial glottises of different forms [arranged] over taut membranes”. By means of these glottises, the heads performed a dialogue in praise of Louis XVI: “The King gives peace to Europe”, intoned the first head; “Peace crowns the King with Glory”, replied the second; “and Peace makes the Happiness of the People”, added the first; “O King Adorable Father of your People their Happiness shows Europe the Glory of your Throne”, concluded the second head.38
The Paris gossip and memoirist Louis Petit de Bachaumont noted that the heads were life-size, but covered tastelessly in gold. They mumbled some words and swallowed certain letters; moreover, their voices were hoarse and their diction slow (and their conversation, he might have added, uninspiring).
Yet despite all this, they undeniably had “the gift of speech”. The academicians appointed to examine Mical’s talking heads agreed that their enunciation was “very imperfect” but granted their approval to the work anyhow because it was done in imitation of nature and contained “the same results that we admire in dissecting . . . the organ of the voice.” Bachaumont recorded that the academicians were so impressed with the abbé Mical that, on the occasion of the Montgolfière balloon demonstration at Versailles on September 19, 1783, in which a sheep, a rooster, and a duck became the world’s first aviation passengers, the six delegates from the Académie des sciences invited Mical to accompany their delegation and presented him to the king as the author of the celebrated talking heads.39
The following year, probably at the instigation of the mathematician Leonhard Euler, the Saint Petersburg Academy of Sciences sponsored a prize competition to determine the nature of the vowels and to construct an instrument like vox humana organ pipes to express them. C. G. Kratzenstein, a member of the Academy, won the prize. He used an artificial glottis (a reed) and organ pipes shaped according to the situation of the tongue, lips, and mouth in the pronunciation of the vowels.40
Several more people built talking heads before the turn of the century. Among them was a Hungarian engineer named Wolfgang von Kempelen who had been hired at the age of twenty-one by the Empress Maria Theresa to serve at the court of the Holy Roman Empire in Vienna. He had achieved fame in 1769 when, for the amusement of his patroness, he built an android Turk that played an expert game of chess (by virtue of the expert human chess player cleverly hidden inside). A couple of decades later, Kempelen set out to uncover the secret of articulate speech. In 1791, he published “a description of a speaking machine” in which he reported having attached bellows and resonators to musical instruments that resembled the human voice, such as oboes and clarinets; he had also tried, like Kratzenstein, modifying vox humana organ pipes.41 Through twenty years of such attempts, he had been sustained, he said, by the conviction that “speech must be imitable”. The resulting apparatus had bellows for lungs, a glottis of ivory, a leather vocal tract with a hinged tongue, a rubber oral cavity, a mouth whose resonance could be altered by opening and closing valves, and a nose with two little pipes as nostrils. Two levers on the device connected with whistles and a third with a wire that could be dropped onto the reed. These enabled the machine to pronounce liquids and fricatives: Ss, Zs, and Rs.42
This machine produced an empirical finding reminiscent of Vaucanson’s discovery that the blowing pressure for a given note depended upon the preceding note. Kempelen reported that he had first tried to produce each sound in a given word or phrase independently but failed because the successive sounds needed to take their shape from one another: “The sounds of speech become distinct only by the proportion that exists among them, and in the linking of whole words and phrases.” Listening to his machine’s blurred speech, Kempelen perceived a further constraint upon the mechanization of language: the reliance of comprehension upon context.43
Kempelen’s machine was only moderately successful. It reportedly prattled in a childish voice, reciting vowels and consonants. It pronounced words such as “Mama” and “Papa”, and uttered some phrases, such as “you are my friend—I love you with all my heart”, “my wife is my friend”, and “come with me to Paris”, but indistinctly.44 Today the machine resides at the Deutsches Museum in Munich, Germany. Kempelen and his supporters emphasized that the device was imperfect and explained that it was not so much a speaking machine in itself as a machine that demonstrated the possibility of constructing a speaking machine.45
After this flurry of activity in the 1770s, 80s, and 90s, there was a decline in interest in speech simulation. A few people over the course of the nineteenth century, including the inventors Charles Wheatstone and Alexander Graham Bell, built their own versions of Kempelen’s and Mical’s speaking machines and of other talking heads from an earlier period.46 But for the most part, designers of artificial speech turned their attention once again to speech synthesis rather than simulation: reproducing the sounds of human speech by other means rather than trying to reproduce the actual organs and physiological processes of speech.47
In 1828, Robert Willis — a professor of applied mechanics at Cambridge who had earlier rejected the possibility of the Chessplayer’s intelligence — wrote disparagingly that most people who had investigated the nature of the vowel sounds “appear never to have looked beyond the vocal organs for their origin”, apparently assuming that the vowel sounds could not exist without being produced by the vocal organs. In other words, they had treated the vowels as “physiological functions of the human body” rather than as “a branch of acoustics”. In fact, Willis argued, vowel sounds could perfectly well be produced by other means.48 Whether or not the vocal organs themselves could be simulated artificially became a separate question from whether the sounds of speech could be reproduced. As late as 1850, the French physiologist Claude Bernard wrote in his notebook: “The larynx is a larynx and the crystalline lens is a crystalline lens, that is to say their mechanical or physical conditions are realized nowhere but in the living organism.”49
Disenchantment with speech simulation was so deep that when a German immigrant to America named Joseph Faber designed quite an impressive talking head in the 1840s, he could not get anyone to take any notice of it. Faber’s talking head was modeled on Kempelen’s and Mical’s, but was far more elaborate. It had the head and torso of a man once again dressed like a Turk, and inside were bellows, an ivory glottis and tongue, a variable resonance chamber, and a mouth cavity with a rubber palate, lower jaw, and cheeks. The machine could pronounce all the vowels and consonants, and was connected by way of levers to a keyboard of seventeen keys, so that Faber could play it like a piano. He first exhibited the machine in New York City in 1844, where it aroused very little interest. He then took it to Philadelphia where he had no better luck. P. T. Barnum found Faber and his talking head there, renamed the machine the “Euphonia”, and took them on tour to London, but even Barnum could not make a success of it. Finally the Euphonia was exhibited in Paris in the late 1870s, where it was mostly ignored, and soon thereafter all traces of it disappear.50
The moment for talking heads had passed. In the early part of the twentieth century, designers of artificial speech moved on from mechanical to electrical speech synthesis.51 The simulation of the organs and process of speaking — of the trembling glottis, the malleable vocal tract, the supple tongue and mouth — was specific to the last decades of the eighteenth century, when philosophers and mechanicians and paying audiences were briefly preoccupied with the idea that articulate language was a bodily function: that Descartes’ divide between mind and body might be bridged in the organs of speech.
Jessica Riskin is Frances and Charles Field Professor of History at Stanford University. Her teaching, research and writing focus on the history of modern science, ideas, culture and politics. She is the author most recently of The Restless Clock: A History of the Centuries-Long Argument Over What Makes Living Things Tick and is currently writing a book about Jean-Baptiste Lamarck, the French naturalist who coined the term “biology” around 1800 and developed the first theory of evolution.
Excerpted and adapted from “The First Android”, a chapter from The Restless Clock by Jessica Riskin. © 2016 by Jessica Riskin. Reprinted with permission of The University of Chicago Press. All rights reserved.