Sunday, December 10, 2006

Need for Reporting Community Gas Consumption Statistics

Supply and demand will never fix the oil crisis. Miles per gallon is a pacifier. This narrow focus treats consumers like babies. A broader, adult perspective is warranted: thinking in terms of reducing our per capita oil consumption on a community-wide scale. Until we do that, government and oil companies will dictate the agenda, while transportation policies falter.

Commuter gas consumption is proportional to time and distance traveled, which aren't directly observable. SOV and SUV drivers are merely scapegoats. With built-in time delays, public transit penalizes long-distance commuters. So they opt out. But we need commuters to opt in, with time-saving alternatives. Time is oil!

Theoretically, great gains in energy conservation are achievable by self-organizing commuters. If those who work in the same vicinity were neighbors, then direct point-to-point bus, van, or carpool service would reduce gas consumption while relieving traffic congestion, when measured at the community level.

Seeing how self-organization works at a job site is simple with this low-tech technique: display a street map with pins marking commuter home locations, so people can organize themselves. Getting people to see problems in a different light is the first hurdle. Moving the pins into clusters means moving residences. The obstacle to implementing this idea is changing behavior in the real estate market. Raising consumer awareness to new perspectives is just the beginning.

Commuting is akin to a sorting problem. In computer science, the fastest sorting algorithm for repetitive sorts is a presort. Let me explain. Take a deck of cards and try to rearrange it into numerical order by suit. If the deck is presorted, then it's already in goal order. Anything out of order takes time, and the greater the disorder, the longer the time penalty. Commuters are the largest transportation segment of repetitive sorting, traveling the same route every workday for months or years. As a group, over time, work and home locations change little.
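
For the programmers in the audience, here's a rough Python sketch (mine, for illustration) that makes the presort effect concrete; exact timings vary by machine, but the ordering of the two results won't:

import random
import timeit

presorted = list(range(100000))             # already in goal order
shuffled = list(presorted)
random.shuffle(shuffled)                    # maximum disorder

t_ordered = timeit.timeit(lambda: sorted(presorted), number=10)
t_shuffled = timeit.timeit(lambda: sorted(shuffled), number=10)
print("presorted:", t_ordered, "shuffled:", t_shuffled)

Sorting the presorted list finishes almost immediately; the shuffled copy pays the full time penalty, every time.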

The best theoretical solution is for commuters to live within walking distance of work, but cities aren't designed to support this option. Why not? Efforts to rehabilitate disaster areas, like New Orleans after Katrina, should seriously consider this game plan. The next best solution is for commuters who work in the same vicinity to live within the same neighborhood. This is the presort solution. A computer disk defragmenting program is an analogy. Buses line up at park-and-ride lots to deliver fans to sports games. This is more of the same idea.

Neighborhoods could organize themselves to shop "downtown" or at Wal-Mart en masse, supported by business sponsored point-to-point transportation services. Local governments could tax businesses on the number of parking spots they maintain, offset by the number of pooled transportation patrons they serve. This would get corporate America involved in finding oil smart solutions and aid efforts to modify consumer behavior. Reinforcing policy changes like these are needed at all levels of government.

Using analogies, the weakness of existing public transit alternatives is easy to expose. Scatter a deck of cards face down. Pick four company locations, one for each suit. Design a set of transit lines with stations along the route to serve this mock community. Turn the cards face up and check the results. Repeat this exercise with sports trading cards, which have many more teams to represent companies.

Experiment! Analogies allow us to fathom the incomprehensible better than any other method. We're missing the big picture on oil policy because the problem space is beyond our grasp. We need to open our mind's eye to better solutions than the current batch have hatched. Education leads the way.

How did we get into this mess? The government's tax-and-spend strategy favors mass transit over citizen mobilization efforts. Supply side economics favors disorganized consumers. Effective organization requires better ideas with tailored information, so individuals can best serve their own long-range interests, not just short-term whims. At its core, we need a market-forces strategy, with better-informed producers and consumers.

One of my colleagues pioneered the simple rules for simulating flocking behavior, used in the computer graphics and artificial intelligence fields. But flocking behavior rules, like estimated gas mileage and oil prices, are proving inadequate. We need to model migration behavior, which encompasses a longer range perspective. We need to introduce new attitudes to help visualize how migration is possible, since migration is necessary.

Oil is a non-renewable resource. We need to instill survivable and sustainable habits to better judge our heading in our collective migration forward. Measuring and reporting energy consumption is a start. Successfully avoiding a catastrophic collapse of civilization may depend upon it. A rapid return to normal is impossible. Higher oil prices are the new norm.

Calculating seasonally adjusted per capita gas consumption statistics and reporting them at the local level would help promote a mature, migratory mindset. The data should be available, but it's not in this format or widely publicized. For each location, the statistic is the volume of all oil deliveries over time divided by the number of local residents.

Factors like heating and tourism will make spatial comparisons unreliable. Temporal comparisons will be more instructive. Like a report card tracking progress, temporal trends should encourage much needed hope, incentive, innovation, and action. The competition between regions could prove interesting. Just imagine. What would Los Angeles be like if commuters were self-organized?

Saturday, August 5, 2006

Machine Translation: Library format

The Gistmass ELF (Extensible Linguistic Framework) Library contains converted text files of well-known prose including nursery rhymes, famous quotes, essays, articles, books, and the like. If somebody says, ‘Four score and seven years ago,’ it registers an exact hit. But if somebody says, ‘Five score and eleven years ago,’ it senses a similarity but not an exact hit. It sounds a little odd both to us and to the Gistmass ELF Machine Translation program.

The format of Library files is encoded. Library files are split in two, with a data fork and a format fork. The data fork is filled with sequences of root word indexes. For instance, a verb comes in different tenses and forms, like ‘ran,’ ‘run,’ and ‘running,’ but they’re all the same root word.

Root word index numbers help speed up searches. Two groups of words can be converted into bit arrays: one bit per index. A bitwise AND operation compares two bit arrays for coinciding bits. A zero result means total independence. A non-zero result means coincidences exist, because the same index bit is ‘on’ in both arrays, which indicates that the same word belongs to both groups. The index of every resultant ‘on’ bit reveals the shared words. Here’s a truth table, where ‘1’ means ‘on’ and ‘0’ means ‘off.’

bit   0 1 0 1
wise  0 0 1 1
AND   0 0 0 1

This is the bitwise AND operation; only when both bits are ‘on’ does the result bit turn ‘on,’ as in the last column. Bring up a calculator program, enter 2, and click the square root function; copy that number, and do the same for the number 3. Here is what we get.

‘√2 = 1.4142135623730950488016887242097’
‘√3 = 1.7320508075688772935274463415059’

Let’s use these numbers to generate two random sequences of bits, changing every digit into a single bit. Let’s set every digit from zero to four ‘off’ and every digit from five to nine ‘on.’ Using this rubric, we’ll convert the two square roots into two aligned bit arrays, and draw a line under the second array. Now let's perform the bitwise AND operation. Just mark the ‘on’ bits using the table as a guide.

00000001100100110011001111000011
01000101011111110101010010001011
________________________________
00000001000100110001000010000011
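
Here’s a small Python sketch of the exercise so far; the digit strings and the zero-to-four/five-to-nine rubric come straight from the text above:

def digits_to_bits(digit_string):
    """Convert the first thirty-two digits into a 32-bit array."""
    digits = [int(c) for c in digit_string if c.isdigit()]
    return int(''.join('1' if d >= 5 else '0' for d in digits[:32]), 2)

root2 = digits_to_bits('1.4142135623730950488016887242097')
root3 = digits_to_bits('1.7320508075688772935274463415059')

both = root2 & root3                        # the bitwise AND
print(format(both, '032b'))                 # 00000001000100110001000010000011

# Indexes of the 'on' bits, counting 1..32 from the left:
print([i + 1 for i, b in enumerate(format(both, '032b')) if b == '1'])
# [8, 12, 15, 16, 20, 25, 31, 32]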

The bitwise AND is one of the fastest operations in a microprocessor. Let’s convert these bit arrays into index numbers, with one on the left and thirty-two on the right. Here are two sequences of numbers, which represent the bit indexes of the ‘on’ bits:

‘8, 9, 12, 15, 16, 19, 20, 23, 24, 25, 26, 31, 32’
‘2, 6, 8, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 25, 29, 31, 32’

Now, let’s write down the numbers that are common to both lists: ‘8, 12, 15, 16, 20, 25, 31, 32.’ Was that faster or slower? It was slower. What if I shuffle the numbers out of sequence? Will that be faster or slower? It will be even slower, and the same is true for computers. Do you have any appreciation for the speed of various algorithms? Does the term ‘order of complexity’ ring a bell?

Order of Complexity

C    N        log N    N log N    N^2
C    10       1        10         100
C    100      2        200        10,000
C    1,000    3        3,000      1,000,000
C    10,000   4        40,000     100,000,000

These are common ‘orders of complexity,’ where ‘N’ is the number of items under consideration. The first column is called ‘constant’ because the length of time it takes to perform the algorithm is independent of the number ‘N.’ For example, a thermostat decides whether or not to turn on the air conditioning in a fixed amount of time. Temperatures cover a wide range, and heat sources vary, but the thermostat only determines whether the temperature is above or below some preset reference temperature, and then acts accordingly.

The second column is called ‘linear’ because the length of time it takes to perform an algorithm is directly proportional to the number ‘N.’ For example, contractors get paid by the hour. The hourly rates for different people may vary, but pay is directly proportional to hours worked. The third column, ‘log N,’ increases much more slowly than ‘N.’ For example, this is the approach used to look up words in an alphabetized listing, like a dictionary or a telephone book. The fourth column, ‘N log N,’ gets progressively slower as ‘N’ increases. For example, counting out loud: the larger the number, the longer it takes to say all the digits. Compare saying one million, forty-eight thousand, five hundred and seventy-six to saying six.

The last column, ‘N’ squared, slows down in a hurry as ‘N’ increases. Let’s consider a monitor configuration utility. The application adjusts the resolution of the display. If we double the resolution, then the number of pixels goes up by a factor of four, or two squared. If we triple the resolution, then the number of pixels goes up by a factor of nine, or three squared.
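
A few lines of Python regenerate the table above; note that it uses base ten logarithms (log 10 = 1, log 100 = 2, and so on):

import math

print('C\tN\tlog N\tN log N\tN^2')
for n in (10, 100, 1000, 10000):
    log_n = round(math.log10(n))            # 1, 2, 3, 4
    print(f'C\t{n}\t{log_n}\t{n * log_n}\t{n * n}')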

Now, suppose we want to find phrases in our Library. Each search phrase is independent of all the others, and the size of the Library can increase. This is inherently an ‘N’ squared problem, but we want to speed it up as much as possible. Let’s imagine we’re searching for the phrase ‘Four score and seven years ago.’ This string has thirty characters, including spaces, but only six words. The Library contains thousands of files. Just one pass through every file takes minutes. Instead, we need to quickly identify those files with the greatest potential of containing this phrase. If we convert this phrase into root word indexes, then the capital ‘F’ is redundant and so is the plural of ‘year.’ We then have six root word indexes that offset into a dictionary or into a bit array to detect coinciding bits. Our maximum index is sixty-four K, which fits into sixteen bits, or half of a thirty-two bit word.
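
Here’s a hypothetical Python sketch of that screening step. The index numbers and the toy plural stripping are invented for illustration; only the technique, converting the phrase to root word indexes and setting one bit per index, comes from the text:

ROOT_INDEX = {'four': 412, 'score': 2781, 'and': 3,
              'seven': 598, 'year': 96, 'ago': 1204}   # made-up index numbers

def phrase_to_bits(phrase):
    bits = 0                                # one bit per root word index
    for word in phrase.lower().split():
        root = word.rstrip('s')             # toy root word reduction
        bits |= 1 << ROOT_INDEX[root]
    return bits

query = phrase_to_bits('Four score and seven years ago')
# A file whose precompiled bit array satisfies
# file_bits & query == query contains all six root words.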

Each index corresponds to a dictionary entry. Separate dictionaries exist for different languages, people, places, animals, plants, fictional characters, product brands, medical terms, acronyms, chemicals, and other specialized disciplines. The most common words appearing in the Library files are assigned the lowest index numbers, while the least common words are assigned the highest index numbers. Offline, we run a histogram of the entire Library to find out how often each word is used. As I said before, each Library file is encoded, and the data fork contains sequences of root word indexes. The format fork contains punctuation, capitalization, special characters, which verb tense is used, and which dictionary is in use. In addition, each disk has a Master file that includes the precompiled bit arrays of every Library file on the disk. For every root word in every Dictionary, a bit is set ‘on’ for all the Library files containing that root word. A bitwise OR operation sets bits ‘on’ in a bit array without erasing any bits already set. Here’s another truth table, where ‘1’ means ‘on’ and ‘0’ means ‘off.’

bit   0 1 0 1
wise  0 0 1 1
OR    0 1 1 1

Let’s redo the exercise we did before, replacing the bitwise AND with the bitwise OR. Mark the ‘on’ bits using this table, and perform the bitwise OR operation on the same two arrays.

00000001100100110011001111000011
01000101011111110101010010001011
________________________________
01000101111111110111011111001011
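
Continuing the Python sketch from before, the merge is a single operation:

merged = root2 | root3                      # the bitwise OR
print(format(merged, '032b'))               # 01000101111111110111011111001011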

So our phrase has six root word indexes, and we want to find all the Library files that contain all six words. The Master file contains a precompiled list, for every root word in every Dictionary, of every Library file containing that root word. We can use it to compare the Library file lists for the six root words in our phrase. Would it be faster to compare the files as numbers or as bit arrays? Bit arrays. Let’s extend our example to six rows. Here are the results of four more square roots, and the corresponding bit arrays.

‘√5 = 2.2360679774997896964091736687313’
‘√6 = 2.4494897427831780981972840747059’
‘√7 = 2.6457513110645905905016157536393’
‘√8 = 2.8284271247461900976033774484194’

00000001100100110011001111000011
01000101011111110101010010001011
00010111110111111110010101111000
00010111001100110110110100101011
01011100000101101101001011101010
01010010001010100111000110010010
________________________________
00000000000000100000000000000000
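
Extending the Python sketch, and reusing digits_to_bits from before, five AND operations boil the six arrays down to their common bits:

roots = [digits_to_bits(s) for s in (
    '1.4142135623730950488016887242097',    # square root of 2
    '1.7320508075688772935274463415059',    # square root of 3
    '2.2360679774997896964091736687313',    # square root of 5
    '2.4494897427831780981972840747059',    # square root of 6
    '2.6457513110645905905016157536393',    # square root of 7
    '2.8284271247461900976033774484194',    # square root of 8
)]
common = roots[0]
for r in roots[1:]:
    common &= r                             # five ANDs in all
print(format(common, '032b'))               # only one bit survives
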
In the computer this involves fetching six words of memory and five AND operations. So now we have a result with a bit set ‘on’ for every file that contains all six words. We need to get from that bit array to the Library file. We have thirty-two bits representing thirty-two files. If we had a corresponding table of thirty-two file references, then we could use the bit index as an index into the file reference table. In this case it’s only one file, but an outcome of many files is possible. Testing every bit is thirty-two operations, but table lookups are faster, just as counting fingers is slower than memorizing ‘five plus five equals ten.’ Let’s group the columns together in groups of eight.

If the least significant bit is bit zero, then that bit is bit seventeen. A byte is eight bits. Eight bits can address two hundred and fifty-six unique locations. We can precompute a table offline that translates bits into indexes. Every combination of eight bits is a unique index into the table. Each table entry points to a null terminated string containing numbers from one to eight, indicating which bits are ‘on’ for that arrangement of bits. Thirty-two bits comprise four bytes. The base pointer into the file reference table needs to be adjusted for every byte of the bit array so that a number from one to eight indexes the correct file reference. Then we open a Library file with a file reference, search for the exact sequence of words in the phrase, and report a hit if an exact match is detected. It’s all accomplished much faster than opening every file on a disk and searching for the phrase anywhere in the file.
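
Here’s one way to sketch that precomputed table in Python. The original stores null terminated strings; Python lists stand in for them here:

# For each of the 256 byte values, which bit positions (1..8) are 'on'?
BITS_IN_BYTE = [[i + 1 for i in range(8) if (byte >> (7 - i)) & 1]
                for byte in range(256)]

def on_bit_indexes(bits32):
    """Translate a 32-bit result into 1-based bit indexes, byte by byte."""
    indexes = []
    for byte_number in range(4):            # four bytes, left to right
        byte = (bits32 >> (8 * (3 - byte_number))) & 0xFF
        base = 8 * byte_number              # adjust the base for every byte
        indexes.extend(base + b for b in BITS_IN_BYTE[byte])
    return indexes

print(on_bit_indexes(common))               # [15]: bit seventeen, counting from bit zero on the right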

But that only covers thirty-two files. If we double the number of files, we double the number of operations. That’s a linear ‘order of complexity.’ Thirty-two times thirty-two is one thousand twenty-four. A thousand files is not a lot for our application. Unless we have a way to rule out files, that’s thirty-two times the number of operations.


Master File Record
1  ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef
2  ABCDEFGHIJKLMNOP | QRSTUVWXYZabcdef
3  ABCDEFGH | IJKLMNOP | QRSTUVWX | YZabcdef
4  ABCD | EFGH | IJKL | MNOP | QRST | UVWX | YZab | cdef
5  AB | CD | EF | GH | IJ | KL | MN | OP | QR | ST | UV | WX | YZ | ab | cd | ef
6  A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|a|b|c|d|e|f

What you see here is the format of one record in the Master file. This example shows thirty-two unique letters: capital ‘A’ through ‘Z’ and lower case ‘a’ through ‘f.’ The cells along the bottom row, row ‘6,’ each contain a bit array representing thirty-two files. That’s one thousand twenty-four files in all. Every cell with multiple letters combines those bit arrays using a bitwise OR. Notice the top row. Row ‘1’ contains all the letters in one cell. That means the bit arrays referencing every Library file are combined using a bitwise OR. Each cell represents one word in the Master File Record. The total number of cells, and words in a Master File Record, is one less than double the number of cells in the bottom row. A Master File Record exists for every root word in every Dictionary.

Notice that the number of cells in consecutive rows doubles. The next row would contain sixty-four cells, the row after that one hundred twenty-eight, and so forth, depending on the number of Library files on the disk. But let’s stick with this example of thirty-two cells. Suppose one of the words we’re interested in only occurs in cells with lower case letters. How many cells is that? It’s fourteen.

That’s less than thirty-two. That means only cells with lower case letters have bits set ‘on’ and all other cells are zero. Let’s walk through the steps. To make it easier, let’s call a cell by its row number and the first letter it contains. Cell ‘1A’ is non-zero. From that we know at least one file contains the word we seek. So we move down to row ‘2’ where cell ‘2A’ is zero. We ignore that branch. Half of the Library files are eliminated. We know that cell ‘2Q’ has bits set, without testing it. So we move down to row ‘3’ where cell ‘3Q’ is zero. We ignore that branch. Three quarters of the Library files are now eliminated. We know that cell ‘3Y’ has bits set, without testing it. So we move down to row ‘4’ where both cell ‘4Y’ and cell ‘4c’ are non-zero.

At this point we have two branches to consider, so we need additional help. We perform a bitwise AND test of cell ‘4Y’ using all root words in our phrase. If the result is zero we eliminate that branch, otherwise we push cell ‘4c’ on a stack to consider later, and move down to row ‘5’ where cell ‘5Y’ is zero. We ignore that branch. We know that cell ‘5a’ has the bits set, without testing it. So we move down to row ‘6’ where both cell ‘6a’ and cell ‘6b’ are non-zero. We perform a bitwise AND test of cell ‘6a’ using all root words in our phrase. If the result is zero we eliminate that branch, otherwise we push cell ‘6b’ on the stack. Any non-zero, bitwise AND result on cell ‘6a’ represents one or more Library files that contain all the root words in our phrase. So we go through the process of searching any of those files for an exact match. If we find one, then we post it.

On the way down the rows, we need to keep track of all the branches we pass that still need testing. We put those cell addresses on a push down stack, which is just a LIFO buffer. We know how deep the stack is in advance: one less than the number of rows, which equals the base two logarithm of the number of cells in the last row. When we finish searching a file, we pop the stack and continue the process until the stack is empty. Then we’re done.
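
The walkthrough condenses into a short Python sketch. For illustration it assumes each cell of the record has already been combined (bitwise AND) across all the root words in our phrase, so a cell can be tested directly:

def find_candidate_cells(record):
    """record[row][cell] holds a cell's combined bit array; row 0 has
    one cell, and every row below doubles the cell count."""
    bottom = len(record) - 1
    matches = []
    stack = [(0, 0)]                        # branches still to test (LIFO)
    while stack:
        row, cell = stack.pop()
        if record[row][cell] == 0:
            continue                        # zero result: eliminate this branch
        if row == bottom:
            matches.append(cell)            # candidate Library files live here
        else:                               # push both child branches
            stack.append((row + 1, 2 * cell + 1))
            stack.append((row + 1, 2 * cell))
    return matches

Each match identifies a word of the bottom row whose ‘on’ bits name the Library files worth opening for an exact phrase search.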

This same binary search mechanism helps determine context, by searching the Associative database files for coincidences, like between the words ‘coconut’ and ‘automobile’, which might occur together in a hierarchy of physical objects.
_________________________
K = 1024 in computer terminology, which is 2^10; and 64K = 2^16, which fits in 16 bits.

Monday, July 31, 2006

Primordial Construct: Space-Time Continuum

The space-time continuum represents familiar territory. Four dimensions, three spatial and one temporal, define any given point at any given time in the universe. Comprehensive for the purpose it serves, a paradigm describes a limited, isolated realm in its entirety. However, aspects of the universe exist outside the paradigm of the space-time continuum, e.g. color.

Past «        Present Δ        Future »

                 North N
                   Up
West      Origin/Location      East +
                  Down
                 South $

Within the context of space-time, the compass rose is a tool used to specify the location of any place on earth, measuring latitude North or South, and measuring longitude East or West. In addition, a few common terms represent height Up or Down. Elevation measures distance of the surface above sea level. Depth measures distance in water below the surface. And altitude measures distance in the air above the ground or water surface.

Without a fixed frame of reference, the four dimensions of space-time remain ambiguous. The standard terms used to designate an absolute point of reference or origin on earth include the following: equator, prime meridian, sea level, Greenwich Mean Time (GMT), and calendar.

Whereas the transitions between North and South, between Up and Down, and between Past and Future seem continuous, our notions of East and West don’t mix, like oil and water. This characteristic is no accident. Between some extremes lie shades of gray, but other extremes are black and white.

Time operates in peculiar ways. Only the Present actually ever exists, and it is eternal. The Past can’t be changed, and the Future never arrives. Despite appearances, space and time are inseparable. Time, as we experience it, is measured by the rotation, orbit, and precession of our planet Earth against the backdrop of the stars, including Sol, our sun.

Metric dimensions of distance, mass, and volume come in the units: meters, grams, and liters. Roughly equivalent English values come in the units: inches or feet or yards or miles, ounces or pounds or tons, and teaspoons or tablespoons or cups or quarts or gallons.

Time is measured in the units: seconds, minutes, hours, days, weeks, months, years, centuries, and millennia.

On vastly different scales are the special purpose units: angstroms and light-years.

Primordial Construct: Template

The Primordial Construct Template designates indiscriminate place holders for labeling the Primordial Construct. After many attempts, ten distinct aspects proved adequate for a comprehensive chart of existence. The ten labels are linked into four pairs, comprising eight directions, plus two additional nodes.

Past «        Current Δ        Future »

                 North N
                  Out ↑
Minus −         Home ≈         Plus +
                  In ↓
                 South $


A slight modification of the space-time continuum model suffices to produce ten unique labels that are used as Primordial Construct references. Four of the eight directions remain unchanged: North, South, Past, and Future. Up and Down are relics of a narrow, flat earth mentality. In and Out will prove more useful, because they substitute well for Down and Up respectively, relative to the center of the earth, while also encompassing the concept of scale. East and West are replaced with Plus and Minus respectively, since they not only convey right and left directions but increase and decrease as well. The word Current, with its double meanings comprising both now and flow, substitutes for the node Present. For the final node, anchoring the spatial dimensions, the label Home conveys a sense of origin, balance, stability, and change in discrete stages, as in the word homeostasis.

Symbols

A symbol identifies each of the eight directions and two nodes, while at the same time tagging terms related to Primordial Construct attributes.

« Past, resembles a rewind button symbol
Δ Current, in mathematics delta represents a small slice of time or space, and somewhat resembles the triangle on a play button
» Future, resembles a fast forward button symbol
N North, magnetic north
↑ Out, up arrow
− Minus, represents subtraction or decrease
≈ Home, approximately equal to symbol represents homeostasis
+ Plus, represents addition or increase
↓ In, down arrow
$ South, magnetic south, dollar sign represents materialism

Beyond the coordinates in four dimensional space-time lie additional properties of a place. Ambiance provides richer texture, and that is the point of starting out with abstract dimensional labels of an indiscriminate nature. In progressing from denotation to connotation, resolving the unique properties of each aspect of the Primordial Construct is the intent of this exercise. In hindsight, ten elements prove a necessary and sufficient number of candidate categories for representing the Primordial Construct.