Regex Anglorum

As I was walking the dogs this morning along our usual route, Halley and Harry stopped to sniff a greasy spot in the road which had been occupied the day before by an unlucky squirrel. “Hmm… some other scavenger must have cleaned it up,” I thought, then corrected myself: “scavenger, or scavengers.”

English has no specific form to indicate “one or more”: nouns can be singular or plural, not (usually) inclusive of both. The more advanced syntax for Regular Expressions uses ‘+’ to indicate this set. I propose that we adopt this into English, as in “some other scavenger+ must have cleaned it up.” That would naturally cause some consternation for present-tense verbs when one of these nouns is the subject, but we could use the same ending there: “The scavenger+ eat+ all the dead squirrels.” (I never understood why the ‘s’ occurs on singular verbs instead of on the plural.)

Of course, in a regex /scavenger+/ would mean “scavenge” followed by one or more “r”s. We’d need to say /(scavenger)+/ instead.
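For the skeptical, here is a quick sketch in Ruby of how far the ‘+’ reaches with and without the parentheses. The variable names and sample strings are mine, made up purely for illustration, and the patterns are anchored so that each must match the whole string.

```ruby
# Quantifier scope: + applies to the single preceding element unless grouped.
last_letter = /\Ascavenger+\z/     # "scavenge" followed by one or more "r"
whole_word  = /\A(scavenger)+\z/   # one or more copies of "scavenger"

# Hypothetical sample strings, chosen to tell the two patterns apart.
samples = ["scavenger", "scavengerrr", "scavengerscavenger"]

results = samples.map do |s|
  [s, last_letter.match?(s), whole_word.match?(s)]
end

results.each do |s, a, b|
  puts "#{s}: last_letter=#{a} whole_word=#{b}"
end
```

Only the plain word satisfies both patterns; the trailing “rrr” satisfies just the first, and the doubled word just the second.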

Wait, that isn’t right either — that just indicates one or more occurrences of the word scavenger. Not one or more scavengers. Abstract symbols are a bit foreign to Regexen, but they do exist in the more advanced syntaxen. Take Ruby’s, for example:


Now that says, “one or more of what the word ‘scavenger’ represents.”

The English language is not nearly so rigorous. Take the plural, for instance. You might think that a regex equivalent for “scavengers” would be:


That says, “two or more of what the word ‘scavenger’ represents.”

In English, though, we also use the plural form for zero. “Yes, we have no bananas.” Let’s add that ambiguity to our expression:


The star says, “zero or more of the previous pattern” — so this accurately represents the idea of “two or more bananas, or none at all”.
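The counting claim is easy to check with a rough stand-in. The sketch below is an assumption on my part: it counts literal copies of the text “banana ” rather than what the word represents, and the pattern name is invented for the occasion. It wraps a “two or more” quantifier in a star and asks which quantities get through.

```ruby
# Stand-in for "two or more, or none": {2,} wrapped in a star.
# Assumption: we count literal repetitions of "banana " (with trailing space).
plural_or_none = /\A(?:(?:banana ){2,})*\z/

# Try zero through four bananas and keep the counts that match.
counts = (0..4).select { |n| plural_or_none.match?("banana " * n) }

puts counts.inspect
```

Of the quantities tried, a single banana is the only one turned away.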

Of course, in English I could use the singular form for zero as well: “Yes, we have no banana.” That would carry the connotation that we could have exactly one banana, but we don’t. We’re going to need a smarter squirrel.

Tags: english, language, regexen, ruby

