Linguistics, the study of language, has attracted a lot of interest at Proof School this year.

This comes as no surprise to me, since linguistics engages some of the same mental faculties as math and computer science. Linguists use evidence from natural speech and writing to infer the rules of how languages work—rules that native speakers often follow without knowing that they know them. And, as parents in our community surely know, many Proofniks are squarely in the intersection of "highly verbal" and "fascinated by rule systems"! There has been (as yet) no course on the subject, but it has popped up in clubs, in competitions, and in students' independent projects.

This program was begun last year by Mira Bernstein, who ran a club preparing students for the North American Computational Linguistics Olympiad (NACLO). An annual contest for both high schoolers and college students, NACLO is designed to generate interest in linguistics through puzzles. No formal background in the subject is required; the puzzles can be solved through logic. A typical puzzle from this year presents a sample of sentences in Tshiluba, a language spoken in the Congo, with their English translations. By analyzing these examples, students must figure out how to translate other sentences. NACLO puzzles also introduce students to linguistics-driven technologies such as parsers, OCR, and machine translation. (These puzzles often focus on the amusing ways such technologies can go wrong!)

When I took the reins from Mira this year, I envisioned a club that would split its time between solving NACLO puzzles and learning more broadly about linguistics. But it became obvious by the second meeting that the broad lessons were generating much greater excitement—the puzzles had done their job, and now students were hungry to learn "real" linguistics. A complication arose: I have no real training in the subject, just a lot of amateur interest. But through a combination of homework and bravado, I managed to lead the club, even if I wasn’t able to answer some of the students' deeper questions. We mined our own language for hidden rules governing the morphology of wh- words, vestigial verb prefixes (like the "be-" in "befriend" or the "for-" in "forgive"), and phonological assimilation. Working our way through the IPA consonant chart, we learned how sounds are produced—and laughed a lot while figuring out how to produce sounds that don’t exist in English. (That "brr" sound that signals "I'm cold"? Yeah, that’s a consonant in some languages.)

Even though we did not practice for NACLO specifically, Proofniks had a good time with the contest, and one did well enough to qualify for the difficult second round.

Meanwhile, two students took their interest in linguistics to another level by creating their own languages. Constructed languages, or conlangs, are an ingredient in many fantasy universes: think of Klingon in Star Trek, Dothraki in Game of Thrones, or the Elvish tongues invented by J. R. R. Tolkien, who was a philologist as well as a writer. An interesting conlang is not just an existing language with replacements for the words; it has unique features that shape and are shaped by the culture of its imagined speakers. In this vein, one student created a language whose grammar obligates speakers to distinguish whether an action is "on purpose" or "by accident", and which has different cases for alienable and inalienable possession. He also designed a beautiful script for writing this language, a sample of which is below:

The other student designed a language where a small number of basic root words can be combined into innumerable compounds, in ways that reflect the thinking of its speakers. Thus, the word for "nature" joins the roots for "ocean, river, wind, and trees", while the name she gave herself in her language can be translated as "happy song".

Linguistics also made an appearance in this year’s Math Burst, where four middle schoolers explored the possibilities of formal grammars—recursive rule sets that produce strings of letters or words. Here is a simple example of a formal grammar:

S → na na S
S → Batman

To form a string using this grammar, we begin with the start symbol S, then apply the rules in any order until all S’s have been eliminated. For instance, we might apply the first rule three times, then the second rule:

S → na na S → na na na na S → na na na na na na S → na na na na na na Batman

This grammar is only able to produce strings of the form "(na)^(2k) Batman", where the exponent 2k indicates an even number of repetitions of "na". Not a very deep language! However, with more complex rule sets, we can try to mimic the grammar of natural speech. If you diagrammed sentences in high school, you will recognize how non-trivial this problem is, even at the level of syntax (or form) ... not to mention semantics. (The distinction between syntax and semantics is indelibly illustrated by Noam Chomsky’s perfectly grammatical sentence, "Colorless green ideas sleep furiously.")
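To make the rewriting process concrete, here is a minimal sketch in Python (my own illustration, not code from the Math Burst group) that stores the two rules above and expands the start symbol S by choosing a rule at random until only words remain:

import random

# The toy grammar from above: S -> na na S, and S -> Batman.
RULES = {
    "S": [["na", "na", "S"], ["Batman"]],
}

def expand(symbols):
    # Repeatedly replace the first symbol that has rules with one of its
    # expansions, chosen at random, until no such symbol remains.
    symbols = list(symbols)
    while True:
        for i, sym in enumerate(symbols):
            if sym in RULES:
                symbols[i:i + 1] = random.choice(RULES[sym])
                break
        else:
            return " ".join(symbols)

print(expand(["S"]))   # e.g. "na na na na na na Batman"

Because the first rule adds two "na"s each time it fires, every string this sketch prints has an even number of "na"s before "Batman", exactly as described above.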

Native speakers of a language know their grammar at a practical level, but can have trouble describing it. The rules are hard to formulate precisely, and exceptions abound. Formal grammars are in some ways a straitjacket—only certain kinds of rules are admissible at all, so trying to capture natural speech with a formal grammar disciplines the mind. Just to get started, the Math Burst group had to overcome such basic problems as allowing noun phrases with stacked adjectives and assigning the correct article ("a" or "an") to these phrases. Soon, however, the group had a working computer program that could spit out plausible (if slightly skewed) sentences like “The big big aardvark chews an orange dog.” One member of the group stocked this program with a custom rule set designed to write automatic poetry.
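As a sketch of how such a generator can work (Python again, with my own illustrative word lists and rules rather than the group's actual ones), here is a tiny grammar that stacks adjectives recursively and picks "a" or "an" by looking at the first letter of the word that follows the article:

import random

# Illustrative vocabulary, not the Math Burst group's actual word lists.
NOUNS = ["aardvark", "dog", "ostrich", "airplane"]
ADJECTIVES = ["big", "orange", "swift", "deafening"]
VERBS = ["chews", "ponders", "wraps around"]

def adjective_stack():
    # ADJS -> adjective ADJS | (nothing): adjectives can stack recursively.
    if random.random() < 0.5:
        return [random.choice(ADJECTIVES)] + adjective_stack()
    return []

def noun_phrase():
    # NP -> article ADJS noun, with "a"/"an" chosen by the first letter of
    # the next word (a spelling-based approximation of the real rule).
    words = adjective_stack() + [random.choice(NOUNS)]
    article = "an" if words[0][0] in "aeiou" else "a"
    return [article] + words

def sentence():
    # S -> NP verb NP.
    return " ".join(noun_phrase() + [random.choice(VERBS)] + noun_phrase()) + "."

print(sentence())   # e.g. "a big big aardvark chews an orange dog."

I will close with a sample of her work, or perhaps it is her work’s work: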

the group of ostriches ponders an ostrich .
an ostrich wraps around a swift olive while a group of cats dances around the
airplanes; meanwhile, the deafening flowers fight .

-- Austin Shapiro