ZIPF'S LAW

The Quark and the Jaguar
by Murray Gell-Mann

This is an excerpt from The Quark and the Jaguar by Murray Gell-Mann, Freeman & Co, 1994.

... Often, however, we encounter less than ideal cases. We may find regularities, predict that similar regularities will occur elsewhere, discover that the prediction is confirmed, and thus identify a robust pattern: however, it may be a pattern for which the explanation continues to elude us. In such a case we speak of an "empirical" or "phenomenological" theory, using fancy words to mean basically that we see what is going on but do not yet understand it. There are many such empirical theories that connect together facts encountered in everyday life.

Suppose we pick up a book of statistical facts, like the World Almanac. Looking inside, we find a list of U.S. metropolitan areas in order of decreasing population, together with the population figures. There may also be corresponding lists for the cities in individual states and in other countries. In each list every city can be assigned a rank, equal to 1 for the most populous city, 2 for the next most populous, and so on. Is there a general rule for all these lists that describes how the population decreases as the rank increases? Roughly speaking, yes. With fair accuracy, the population is inversely proportional to the rank; in other words, the successive populations are roughly proportional to 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9, 1/10, 1/11, and so on.

Now let us look at the list of the largest business firms in decreasing order of volume of business (say the monetary value of sales during a given year). Is there an approximate rule that describes how the sales figures of the firms vary with their ranks? Yes, and it is the same rule as for populations. The volume of business is approximately in inverse proportion to the rank of the firm.

How about the exports from a given country in a given year in decreasing order of monetary value? Again, we find the same rule is a fair approximation.

An interesting consequence of that rule is easily verified by perusing any of the lists mentioned, for example a list of cities with their populations. First let us look at, say, the third digit of each population figure. As expected, the third digit is randomly distributed; the numbers of 0s, 1s, 2s, 3s, etc. in the third place are all roughly equal. A totally different situation obtains for the distribution of first digits, however. There is an overwhelming preponderance of 1s, followed by 2s, and so forth. The percentage of population figures with initial 9s is extremely small. That behavior of the first digit is predicted by the rule, which, if exactly obeyed, would give a proportion of initial 1s to initial 9s of 45 to 1.
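The 45-to-1 figure can be checked with a few lines of arithmetic. The sketch below assumes an idealized list in which the entry of rank n is exactly C/n; the function name `rank_share` is introduced here for illustration and does not come from the book.

```python
from fractions import Fraction

def rank_share(digit):
    """Relative number of ranks whose value C/n starts with `digit`.

    C/n has leading digit d when d*10^k <= C/n < (d+1)*10^k, that is, when
    n lies between C/((d+1)*10^k) and C/(d*10^k). The width of that range
    of ranks is proportional to 1/d - 1/(d+1) for every power of ten k.
    """
    return Fraction(1, digit) - Fraction(1, digit + 1)

# Ratio of initial 1s to initial 9s under an exactly obeyed 1/n rule.
ratio = rank_share(1) / rank_share(9)
print(ratio)  # 45

# Shares of all nine leading digits, normalized to sum to 1.
total = sum(rank_share(d) for d in range(1, 10))
for d in range(1, 10):
    print(d, float(rank_share(d) / total))
```

Under this idealization a little over half of all entries begin with 1 and only about one in eighty begins with 9, which matches the "overwhelming preponderance of 1s" described above.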

Rank  City                  Population   Unmodified Zipf's law:  Modified Zipf's law:
 n                            (1990)     10,000,000 / n          5,000,000 / (n - 2/5)^(3/4)

   1  New York               7,322,564        10,000,000              7,334,265
   7  Detroit                1,027,974         1,428,571              1,214,261
  13  Baltimore                736,014           769,231                747,639
  19  Washington, D.C.         606,900           526,316                558,258
  25  New Orleans              496,938           400,000                452,656
  31  Kansas City, Mo.         434,829           322,581                384,308
  37  Virginia Beach, Va.      393,089           270,270                336,015
  49  Toledo                   332,943           204,082                271,639
  61  Arlington, Texas         261,721           163,934                230,205
  73  Baton Rouge, La.         219,531           136,986                201,033
  85  Hialeah, Fla.            188,008           117,647                179,243
  97  Bakersfield, Calif.      174,820           103,093                162,270

Populations of U.S. cities from the 1994 World Almanac compared with Zipf's original law and a modified version of it.
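The two prediction columns can be recomputed from the formulas in the table header. The following is a minimal sketch; the function names `zipf` and `modified_zipf` are illustrative, and the ranks and populations are copied from the table above.

```python
# Ranks and 1990 populations copied from the table above.
CITIES = [
    (1, "New York", 7_322_564),
    (7, "Detroit", 1_027_974),
    (13, "Baltimore", 736_014),
    (19, "Washington, D.C.", 606_900),
    (25, "New Orleans", 496_938),
    (31, "Kansas City, Mo.", 434_829),
    (37, "Virginia Beach, Va.", 393_089),
    (49, "Toledo", 332_943),
    (61, "Arlington, Texas", 261_721),
    (73, "Baton Rouge, La.", 219_531),
    (85, "Hialeah, Fla.", 188_008),
    (97, "Bakersfield, Calif.", 174_820),
]

def zipf(rank):
    """Unmodified Zipf's law: 10,000,000 divided by the rank."""
    return 10_000_000 / rank

def modified_zipf(rank):
    """Modified law from the table: 5,000,000 / (rank - 2/5)^(3/4)."""
    return 5_000_000 / (rank - 2 / 5) ** (3 / 4)

# Print rank, city, actual population, and both predictions.
for rank, city, population in CITIES:
    print(f"{rank:>3}  {city:<20} {population:>10,} "
          f"{zipf(rank):>12,.0f} {modified_zipf(rank):>12,.0f}")
```

The printed predictions should land close to the table's last two columns; small differences in the final digit may arise from rounding.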


What if we put down the World Almanac and pick up a book on secret codes, containing a list of the most common words in a certain kind of English text arranged in decreasing order of frequency of occurrence? What is the approximate rule for the frequency of occurrence of each word as a function of its rank? Again, we encounter the same rule, which works for other languages as well.

Many of these relationships were noticed in the early 1930s by a certain George Kingsley Zipf, who taught German at Harvard, and they are all aspects of what is now called Zipf's law. Today, we would say that Zipf's law is one of many examples of so-called scaling laws or power laws, encountered in many places in the physical, biological, and behavioral sciences. But in the 1930s such laws were still something of a novelty.

In Zipf's law the quantity under study is inversely proportional to the rank, that is, proportional to 1, 1/2, 1/3, 1/4, etc. Benoit Mandelbrot has shown that a more general power law (nearly the most general) is obtained by subjecting this sequence successively to two kinds of modification. The first alteration is to add a constant to the rank, giving 1/(1 + constant), 1/(2 + constant), 1/(3 + constant), 1/(4 + constant), etc. The further change allows, instead of these fractions, their squares or their cubes or their square roots or any other powers of them. The choice of the squares, for instance, would yield the sequence 1/(1 + constant)^2, 1/(2 + constant)^2, 1/(3 + constant)^2, 1/(4 + constant)^2, etc. The power in the more general power law is 1 for Zipf's law, 2 for the squares, 3 for the cubes, 1/2 for the square roots, and so on. Mathematics gives a meaning to intermediate values of the power as well, such as 3/4 or 1.0237. In general, we can think of the power as 1 plus a second constant. Just as the first constant was added to the rank, so the second one is added to the power. Zipf's law is then the special case in which those two constants are zero.
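Written as a single formula, the generalization described above might look as follows. The symbols K (overall scale), a (constant added to the rank), and b (constant added to the power 1) are labels chosen here for illustration; the excerpt itself does not name them.

```latex
f(n) = \frac{K}{(n + a)^{1 + b}}, \qquad n = 1, 2, 3, \dots
```

With a = 0 and b = 0 this reduces to Zipf's law, K/n. The squares correspond to b = 1 and the square roots to b = -1/2. The modified law in the population table, 5,000,000 / (n - 2/5)^(3/4), corresponds to K = 5,000,000, a = -2/5, and b = -1/4.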

Mandelbrot's generalization of Zipf's law is still very simple: the additional complexity lies only in the introduction of the two new adjustable constants, a number added to the rank and a number added to the power 1. (An adjustable constant, by the way, is called a "parameter," a word that has been widely misused lately, perhaps under the influence of the somewhat similar word "perimeter." The modified power law has two additional parameters.) In any given case, instead of comparing data with Zipf's original law, one can introduce those two constants and adjust them for an optimal fit to the data. We can see in the chart on page 94 how a slightly modified version of Zipf's law fits some population data significantly better than Zipf's original rule (with both constants set equal to zero), which already works fairly well. "Slightly modified" means that the new constants have rather small values in the altered power law used for the comparison. (The constants in the chart were chosen by mere inspection of the data. An optimal fit would have yielded even better agreement with the actual populations.)
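One way to "adjust them for an optimal fit" is a simple grid search over the two constants. The sketch below is an assumption-laden illustration, not the book's procedure: it measures fit by the sum of squared differences between the logarithms of actual and predicted populations, searches an arbitrary grid, and uses the twelve cities from the chart as data. All function names are hypothetical.

```python
import math

# Ranks and 1990 populations from the chart in this excerpt.
DATA = [(1, 7_322_564), (7, 1_027_974), (13, 736_014), (19, 606_900),
        (25, 496_938), (31, 434_829), (37, 393_089), (49, 332_943),
        (61, 261_721), (73, 219_531), (85, 188_008), (97, 174_820)]

def log_error(scale, shift, power):
    """Sum of squared log-residuals for scale / (n + shift)**power."""
    return sum(
        (math.log(pop) - math.log(scale) + power * math.log(n + shift)) ** 2
        for n, pop in DATA
    )

def best_scale(shift, power):
    """For fixed shift and power, the scale minimizing log_error.

    In log space the scale is just an additive offset, so the
    least-squares choice is the mean of the remaining log terms.
    """
    logs = [math.log(pop) + power * math.log(n + shift) for n, pop in DATA]
    return math.exp(sum(logs) / len(logs))

def fit(shifts, powers):
    """Try every (shift, power) pair; return (error, scale, shift, power)."""
    best = None
    for shift in shifts:
        for power in powers:
            scale = best_scale(shift, power)
            err = log_error(scale, shift, power)
            if best is None or err < best[0]:
                best = (err, scale, shift, power)
    return best

# Assumed search grid: shift from -0.9 to 2.0, power from 0.50 to 1.50.
shifts = [i / 20 for i in range(-18, 41)]
powers = [i / 100 for i in range(50, 151)]
err, scale, shift, power = fit(shifts, powers)

print("fitted:", round(scale), shift, power, err)
print("chart constants:", log_error(5_000_000, -0.4, 0.75))
print("original Zipf:", log_error(10_000_000, 0.0, 1.0))
```

Because the grid contains both the chart's constants and the original Zipf values, the fitted error cannot exceed either of the two comparison errors. A different error measure, such as squared differences in the raw populations, would generally pick somewhat different constants.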

....

 

dickran.net - Copyright 2004- In association with Amazon.com
