When to cheat

3 Jan 2009, 10:31 AM

Sometimes, when trying to optimize a computer system, you get to a point whereby on one hand the next x% of optimization will take orders of magnitude more time than the previous x%. And on the other hand, you can completely change your marginal optimization cost if you take a different approach by approximation. Good systems can have hardware thrown at the problem to scale. As has been mentioned elsewhere, hardware is often cheaper than programmers, so we tend to go as far as we can by taking this approach - until it no longer works. Perhaps a more efficient algorithm can be implemented to lower the marginal cost of scaling. Unfortunately there comes a point whereby traditional algorithmic optimizations fail to change the equation enough to manage your costs. At this point you may need to cheat to scale.

Knowing when to approximate requires an understanding of the costs of doing so. If your system is responsible for proprioception in an automated x-ray system, then getting it wrong is to be avoided at all possible costs. Everyone would like to be as accurate as possible, but this is not always cost efficient. If you are running an analytics system being wrong costs less than you might think. Any system that produces statistics based on sampled observations has room for approximate solutions. Programmers tend to forget that such statistics contain error. Programmers tend to think that if there code is free of logical flaws, then the output must be error free. If you observe 12 users clicking on an ad that was displayed 1,000 times then the click through rate must be 1.2%. Click through rate, in this case is a statistic - a way of summarizing raw data. However, click through rate is also a measure of some innate clickability that is driven by the ad, its context and its viewers. Note that we can't possibly measure these abstract quantities, so we use our observed behavior of 12 clicks from 1,000 ads as guide to navigating the underlying complex abstract system of interactions that determine the clickability of an ad.

As soon as we produce a report that states that click through rates are 1.2% on all Fridays then we are implicitly giving the caveat of 'based on all Fridays we have seen.' But we must admit some error as soon as anyone tries to generalize about all future Fridays from that one statistic. A statistic used as an estimate must come with an acknowledgement that without seeing the entire population there will always be the chance of being wrong. A correct software system coupled with perfect data capture ensures that the sample statistic will be correct but there will always be a chance for error in its estimates. In evaluating the cost of deviating to algorithms that are only probabilistically correct or involve some form of approximation, we must concentrate on the increased cost caused by a possible increase in the rate or nature of errors.

There are a number of things to consider when determining the amount of error introduced when producing estimates, but all methods involve two numbers, a confidence interval and confidence level. These will depend on what you are estimating and how you go about it. Estimating a population maximum behaves differently in the presence of approximation than does the estimation of the mean or median. Approximating by taking every second sample may have a different impact than only looking at the first half of a sample. To begin one needs to look at the baseline error in your estimate as produced by an algorithm that uses no approximation. We may find that our estimate of Friday click through rate of 1.2% has a confidence interval of +/- 0.2% at a confidence level of 95%. This tells us that 95% of the time, when we try to estimate click through rate it will fall between 1.0% and 1.4%. We then examine how changing the method of measurement by introducing approximations will change the confidence interval for a given confidence level.

From this point we compare the marginal cost of error against the difference in cost of optimization by approximation and the cost of scaling using the current cost to scale. Approximation may chance the confidence interval from +/- 0.2% to 0.3%. How much this costs you is a question of risk management and depends on the economics of the business that you are in. Click through rates might be used to tweak an advertising budget. If you are buying ads on a cost per click basis, then the cost of the increased uncertainty in click through rate will be determined by the cost of each click. The cost of implementing an approximation versus traditional scaling will be determined by how analytically optimal your current system is and the costs associated with scaling be it by hardware, software or approximation. If the increase in cost caused by approximation is lower than the cost of buying more hardware or implementing more complex algorithms, then cheat.


Comments (1)

What is i2pi?

2 Jan 2009, 19:58 PM

Apologies about the formatting here, LaTeX doesn't look so pretty as HTML. For a pdf version, click on the words 'pdf version'.

People often ask me what i2π means. To me, i2π represents the pleasure of stringing together simple principles to arrive at a beautiful understanding of the nature of something. Here I am going to share with you the way I think about i2π in a mathematical context. This doesn’t represent the full history or complete derivation, but it is how I like to think about things. If you want to learn more, Wikipedia is a good place to start. If you want to see things through my eyes, for just a moment, stick around. If you scroll ahead it may look a little scary to the mathematically uninitiated, but we won’t be using anything more complicated than mid-range high school math to get there. And I think the journey is worthwhile.

To understand what i2π is all about we really have to start with Euler’s constant, e. Euler’s constant is a special number in mathematics and it appears in many equations across mathematics, physics, statistics and economics because it has a number of unique properties. My favourite property is a variation on what is know as Euler’s identity:

ei2π = 1.

And that is why we are starting our i2π journey with e. My next favourite property of e is the relationship between ex and its derivative dex-
dx, namely

dex    x
dx- = e .

Those who have taken high-school calculus take this for granted. Those who didn’t or who have forgotten it are probably scratching their heads. For their sake, and my own, lets quickly review derivatives.

Derivatives are a convenient way of describe the slope of a line. Take the equation for the line y = 2x, then for each unit increase in x, we get 2 units of increase in y. The slope of a line is the ratio between the change in the output of the function that describes it with respect to the change in the input. In this case, x is the input and y is the output, and increasing x by 1 increases y by 2, so the slope is 2:1 or simply 2. The notation df(dxx) describes the slope of some function f(x) as x changes. The equation y = 2x has the same slope for all values of x, so we say the slope is a constant. More complicated functions, like f(x) = x2 (in the figure below) are curved, so the slope changes as we change x.



Without going too deep into calculus, it is known that   2
dxdx- = 2x. In other words, for any point x, the slope is 2x. If you look in the figure above, when x = 0, the slope is 0. As x increases to the right of zero, the slope gets steeper and steeper. To the left, as x becomes increasingly smaller than zero, the slope becomes increasingly negative. In fact, for any k
  k
dx--= kxk-1.
 dx

For example, dx7-
dx = 7x6.

When you take the derivative (find the slope) of most functions, the answer is usually some modified form of the function you started with. However, the exponential function, ex is the simplest example of the case where the derivative is equal to the function. Up to this point, we haven’t even worked out what the numerical value of e is, but let us try to define e by starting with the fact that dex
dx = ex. To do this, we will need to take a detour into the world of factorials.

Imagine that we have 5 books that we want to place on a shelf. How many different ways can we arrange them? Working methodically, from left to right, there are 5 possible books we could put in the leftmost spot on the shelf. Once we choose the first book to place there, we are left with 4 possible books to place along side it. And once we choose that book, 3 books remain. After placing the third book, only 2 more remain, and so on. This means there must be 5 × 4 × 3 × 2 × 1 = 120 ways of arranging those 5 books. If we had 100 books, it would take up too much paper to write down 100 × 99 × 98 × × 1, so we use the shorthand 100!, which we pronounce ’100 factorial’. One hundred factorial is a big number. If we built a book arranging robot that could do one billion arrangements per second, and had one billion of them running in parallel since the big bang, they still wouldn’t be finished trying all the possible ways to arrange the books. Thank god for the Dewey Decimal system, eh?

Ok. So factorials are just a shorthand way of writing down a special type of successive multiplication. We can use the formula for the slope of xk to find the slope of

        k
f(x) = x-.
       k!

We can re-arrange this to be

      xk    1 k     k
f(x) =-k! = k!x = Cx .

If you multiply some function by a constant, C, then the slope is also multiplied by C. From the rule we saw 2 paragraphs ago, we now that

df(x-)= -d-Cxk = Ckxk-1
 dx    dx

We also know that k! = k × (k - 1)!. For example, 6! = 6 × 5! = 6 × 5 × 4 × × 1. So

-d-xk-  -kxk-1--
dx k! = k(k- 1)!

Cancel out the k in the numerator and denominator, and we get

df(x)-= -xk--1-.
 dx    (k- 1)!

Now lets tackle a slightly more complicated function,

        0   1    2    3
g(x) = x-+ x- + x- + x-+ ...
       0!   1!  2!   3!

The first 2 terms of this function are pretty simple. Recall that factorials count the number of ways of arranging objects. There is only one way to arrange zero books, so 0! = 1. Also recall from math class that x0 = 1, so

x0
0! = 1.

There is only one way of arranging one book and x1 = x, so

 1
x- = x.
1!

Therefore

             x2  x3
g(x ) = 1+ x + 2! + 3! + ...

I know this comes out of nowhere, but just go with flow. What is the derivative of this function? Derivatives are additive so we can just do each bit individually and add them together. The derivative of 1 is 0, as it is a flat line - hence no slope. The derivative of x is 1, as it is a line with a 45 degree slope. And we just worked out the rest in the previous paragraph, so

                  2    3
dg(x)-= 0+ 1 +x + x- + x-+ ...
 dx              2!   3!

Hey, wait up. If we drop the leading 0, which we can, then that’s just g(x) again.

dg(x)-
 dx  = 0+ g(x) = g(x).

To see this requires some majorly deep and majorly simple insight. When things are deep AND simple, they are beautiful. What just happened was that even though the first term of the sum disappeared by turning into zero, the rest of the sum remained. Because we defined g(x) to go on from  3
x3! + to infinity, for each term that drops off the front, there are an infinite number of terms to make up for it. Infinity less one is still infinity.

Recall that we defined ex to have the property that

  x
de- = ex
 dx

and we now have a function where

dg(x)
  dx  = g(x).

These properties are the same. The function has itself as its derivative. And it just so happens that g(x) is ex. To understand why, we need to look at Taylor’s Theorem. So point your browser to Wikipedia if you are so inclined and join us in the next paragraph to continue.

Great. Welcome back. The point of this post is to explain i2π and so far we have only covered e, so let’s get a move on and have a look at i and π. I assume everyone is cool with 2.

What is i? i is the square root of negative one.

√ ---
  - 1 = i

So, i2 = -1. When I first encountered i I asked a family friend / math professor to explain it to me. All the books I had read just talked about ’complex numbers’ and I wanted to understand what made them ’complex.’ She explained to me that they aren’t complex, in the sense that complex means difficult. They are just different to the normal numbers we usually encounter. In school you would have learned that the square root of negative numbers is undefined. But they turn up so frequently that man invented a new class of numbers to allow us to define them. Numbers are just symbols for the abstract concept of quantity. And i is just a symbol for the square root of negative one. While it is not possible to have i books or bananas, we can still do mathematics with i and end up with real world numbers. For example,

 2        √ --- √ ---
i = i× i =  - 1 × - 1 = - 1.

And i4 = i2 ×i2 = -1 ×-1 = 1. So while we can’t buy i bananas, we can buy i4 bananas, because i4 is 1. As you keep on raising i to higher and higher powers, a pattern emerges. i1 = i, i2 = -1, i3 = i2 ×i = -i and i4 = 1. When we look at i5 we find i5 = i4 × i = 1 × i = i, and the pattern repeats. For no apparant reason, lets sum up all the powers of i:

i+ i2 + i3 +i4 + i5 + i6...

= i+ (- 1) +(- 1× i)+ (- 1× - 1)+ (- 1× - 1× i)+ (- 1× - 1× - 1)...

= i - 1 - i+ 1+ i- 1...

To see the pattern more clearly, lets split up the odd an even terms

i2 + i4 + i6 + i8 + ...= - 1+ 1 - 1+ 1- ...

i1 + i3 +i5 + i9 + ...= i- i+ i- i+ ...

Out of sheer curiosity, lets find out what pattern would we get if we expanded eix using the formula we found before.

             i2x2   i3x3  i4x4
eix = 1 + ix +----+ ----+ ----+ ...
              2!    3!    4!

We know that the pattern in the i’s comes out nicely if we split out the evens and odds, so lets call the even part of the right hand side of the equation a(ix) and b(ix) for the odds:

           2 2    4 4   6 6
a(ix) = 1 + ix-+ ix--+ i-x-+ ...
            2!    4!    6!

Now using the -1 + 1 - 1 + 1 - pattern, we get

          x2   x4  x6
a(ix) = 1- 2! + 4! - 6! + ...

Likewise for the odd terms,

       (      3   5    7     )
b(ix) = i x- x- + x- - x- +...
             3!  5!   7!

Now when you went across to Wikipedia to check out Taylor’s Theorem, you will have seen that

           x2   x4   x6
cos(x) = 1--2! + 4! - 6! +...

Which is totally the same as our a(ix). I may not be hip and fresh with the rock'n'roll and skateboarding, but I know cool when I see it, and that is cool. We also know that

sin(x) = x- x3 + x5 - x7+ ...
            3!  5!   7!

Which means that b(ix) = i × sin(x). Putting a and b together, we find that

 ix
e  = a(ix)+ b(ix) = cos(x)+ i× sin(x)

Now that we have gone from summing polynomials to trigonometry, it may be coming clear where the π fits in. π is a special number that defines the ratio between a circle’s diameter and its circumfrence. If a circle has a diameter of one furlong, then its circumfrence will be π furlongs. π is also used to measure angles, the same way as degrees are. In a circle there are 360 degrees, but mathematicians like to say that a circle has 2π radians. That is, if you have a circle with a radius of 1 foot, then the circumfrence will be 2π feet. If you were to walk around this circle through 1/8ths of its circumfrence, you will have moved 45 degrees, or π
4 radians.

The sine and cosine take an angle in radians and tell you the x and y coordiants of that point on a circle. If you walk 45 degrees anti-clockwise around a circle starting from the point (1,0), then you will end up at position

(   (  )    (  ))
 cos  π-,sin π-  .
      4      4

If you walk 2π radians around the circle, you will have gone 360 degrees and end up where you started. So (cos(2π),sin(2π)) = (1,0)

So the function e tells us the coordinates of where one ends up after walking ω radians around a circle of radius 1. We have been writing down our coordiates as (x,y), where x = cosω and y = sin(ω), but as we found out earlier,

eiω = cos(ω)+ isin(ω) = x+ iy.

If we think of i as being a symbol to represent our distance in the y direction, then we can convert from x + iy to (x,y). And if we walk around a full circle and end up at the beginning point of (1,0), we can convert back to 1 + i × 0 = 1. Therefore

 i2π
e   = 1.

I like i2π as this term comes up across a range of mathematical equations and you can go a long way in learning about mathematics by looking in the origins of this famous identity. Simplicity, depth & beauty.
Comments (4)

Twit or Love

30 Dec 2008, 20:44 PM

I recently rejoined twitter and was completely blown away by the amount of discussion on twitter that is about twitter. I am reminded of a friend who once told me that the only thing you learn by taking drugs is how to take drugs. Conscious of this, I have refrained from mentioning twitter on twitter, but given that I have talked about it before on this blog, I feel OK bringing it up here.

This morning I saw a post from my friend Aaron:

neonarcade I wish Twitter was more about what people are doing again, and less a generally boring conversation about Twitter and social media.
I was compelled to act. My initial thought was to reskin twitter, or make a firefox plugin that removes all mentions of twitter from the feed. However, by the time I reached the office I settled on a single service site, as is the style of the time. So I now present Twit Or Love. Right now, the word 'twitter' has been mentioned 1.54x more frequently than the word 'love.' I use a highly accurate Markov-Chain Monte-Carlo technique to arrive as this unbiased estimate. My Gibbs sampler (unrelated to melting snow with salt) is coded in Erlang with the front end dynamically generated by a Scala program that writes out Ruby on Rails code. It is hosted on EC2. And uses map reduce. Enjoy


Comments (2)

Gibbs Free Economics

22 Dec 2008, 16:56 PM

This weekend marked the first time I shoveled snow. To cope with the odd sensation of being very cold while being very hot & sweaty, I thought about the mechanism by which salt melts ice. It has been a while since I studied physics and chemistry, but I had a basic notion of what was going on. I don't want to give it away early on in this post, so I'll share with you the chronology of my process of trying to confirm my understanding of why salt melts ice.

I started off with a simple Google search for "salt snow." The top link was to a page on About.com (as often is the case.) And the article was totally useless (as often is the case.) The page gave me some information, specifically it told me

Freezing point depression is a colligative property of water. A colligative property is one which depends on the number of particles in a substance. All liquid solvents with dissolved particles (solutes) demonstrate colligative properties. Other colligative properties include boiling point elevation, vapor pressure lowering, and osmotic pressure.
All true. But no knowledge is contained within that paragraph. I followed some links and came across another page trying to explain the mechanism. Coupled with the animation, the crux of the argument is presented in the paragraph:
Adding salt to the system will also disrupt the equilibrium. Consider replacing some of the water molecules with molecules of some other substance. The foreign molecules dissolve in the water, but do not pack easily into the array of molecules in the solid. Try hitting the "Add Solute" button in the animation above. Notice that there are fewer water molecules on the liquid side because the some of the water has been replaced by salt. The total number of waters captured by the ice per second goes down, so the rate of freezing goes down. The rate of melting is unchanged by the presence of the foreign material, so melting occurs faster than freezing.
Here we get a better sense of why - namely it has something to do with equilibrium between the solid and liquid phases of water. They don't mention it in the paragraph, nor does the animation give any indication, but the solid phase of water (ice) is a crystal. Water molecules share electrons between oxygen and hyrdogen forming H20. In crystalline form, these molecules align to form an organized structure. Wikipedia starts heading us along an interesting route as a full third of the discussion of Ice Ih (common ice) is dedicated to Proton Disorder:
The protons (hydrogen atoms) in the crystal lattice lie very nearly along the hydrogen bonds, and in such a way that each water molecule is preserved. This means that each oxygen atom in the lattice has two protons adjacent to it, and about 101 pm along the 275 pm length of the bond. The crystal lattice allows a substantial amount of disorder in the positions of the protons frozen into the structure as it cools to absolute zero. As a result, the crystal structure contains some residual entropy inherent to the lattice and determined by the number of possible configurations of proton positions which can be formed while still maintaining the requirement for each oxygen atom to have only two protons in closest proximity, and each H-bond joining two oxygen atoms having only one proton. This residual entropy S0 is equal to 3.5 J mol−1 K−1. There are various ways of approximating this number from first principles. Assuming a given N water molecules each has 6 possible arrangements this yields 6N possible combinations. Given random orientations of molecules, a given bond will have only a ½ chance that it has exactly one proton, or in other words, each molecule has a ¼ chance that its protons lie on bonds containing exactly one proton, leaving a total number of (3 / 2)N possible valid combinations. Using Boltzmann's principle, we find that S0 = Nkln(3 / 2), where k is Boltzmann's Constant, which yields a value of 3.37 J mol−1 K−1, a value very close to the measured value. More complex methods can be employed to better approximate the exact number of possible configurations, and achieve results closer to measured values.
Hmmm. Entropy.. Equilibrium.. Sounds like Thermodynamics. Some more clicking gets us to an explanation in terms of Gibbs Free energy. The answer given on the page basically states that by the first and second laws of thermodynamics, salt melts ice. Why? Well, by the second law, entropy wants to increase and salty water has greater entropy than ice so it is a lower energy state, and for energy to be conserved (the first law), ice will turn into water to maintain the total energy of the system. This is certainly a more satisfactory answer than 'because water has colligative properties.'

But why does salty water have greater entropy than pure water? What is entropy?

To understand this, we have to think at the level of statistical mechanics. Entropy is a convenient probablistic measure of complicated stochastic processes. Entropy (in the thermodynamic sense) is a measure of disorder in a system. As we saw from the description of proton disorder in ice, entropy measures the number of different ways the components of a system can be arranged to firm different structures. Without diverging too far into a discussion of urns filled with different numbers of red and blue balls, it is easy to see that for ice to be ice there are only a relatively small number of ways by which the hydrogen and oxygen atoms of water can be arranged with respect to eachother. However, in liquid water, there are many many, but not an uncountably infinite number of, ways by which you can arrange the water molecules and still have liquid water.

The key thing about the role of entropy in melting snow is that each water molecule is indistinguishable from another. Thus, when considering the number of states (or arrangements) that would result in liquid water, swapping two molecules of water would not count as a new state. As you add a solute such as salt to the liquid the number of distinguishable states increase even as the total number of molecules in the system remain constant.

We began this search with a proof by definition, namely that salt melts ice because the fact that salt melts ice is a colligative property of water. We then saw found that salt melts ice because it minimizes the Gibbs free energy of the system. We then found out that this was due to increasing entropy. Along the way, I may have inadvertantly introduced a proof by obfuscation.

Of course, nothing I have said during my hand-waving explains why salt melts ice. But we know that it does. The thermodynamic explanation relies on a model of the underlying statistical mechanics - entropy being a descriptive statistic of an underlying complex system. And the underlying statistical mechanics is just a model of the statistical quantum dynamics, and so forth until turtles. Salt doesn't always melt ice, it is just very likely to do so. Much of microeconomics follows a similar path of evolving explanations, but the predictive power of the results depends on the quality of the model of the individual agents. Water 'wants' to maximize entropy and conserve energy, but not as much as people want to maximize a simple model of utility.


It would be remiss of me to end this Wikipedia link fest without mentioning Shannon entropy from information theory.
Comments (0)

Proof

22 Dec 2008, 15:10 PM

Via God Plays Dice I came across a great list of mathematical proof techniques, of which my favorite is probably Proof by Induction:

Proof by Induction

Proof by Induction claims that

math

where math is the number of pages used to contain the proof and math is the time required to prove something, relative to the trivial case.

For the common, but special case of generalising the proof,

math

where math is the number of pages used to contain the proof, math is the number of things which are being proved and math is the time required to prove something, relative to the trivial case.

The actual method of constructing the proof is irrelevant.

Great jokes don't need explanation, but I just can't help myself. The equation listed above is actually Faraday's law of induction, which describes how a changing magnetic field induces an electrical field. C.f. mathematical induction, which is a method of mathematical proof typically used to establish that a given statement is true of all natural numbers. It is done by proving that the first statement in the infinite sequence of statements is true, and then proving that if any one statement in the infinite sequence of statements is true, then so is the next one.

Mathematical induction is often misunderstood by those trying to apply it, often resulting in what Uncyclopedia calls Engineer's Induction:

Suppose P(n) is a statement.

  1. Prove true for P(1).
  2. Prove true for P(2).
  3. Prove true for P(3).
  4. Therefore P(n) is true for all math.

And to snuff out any life remaining in the joke, the technique above only proves P(n) for n = 1, 2 & 3. To make it a real proof you would need to show that if P(n) is true for some n then it must be true for n+1, which coupled with the proof for P(1) would show that it was then true for all natural n.


Comments (0)

Hold your horses

17 Dec 2008, 14:09 PM

I like Felix Salmon. He's a pretty awesome finance and economics blogger and on the few occasions I've spoken with him, he seems like a genuinely nice guy. But a recent post left me scratching my head:

John Gapper worries that taxing non-diet sodas would be regressive. I worry that it wouldn't even do what it is designed to do, which is reduce obesity: all it would do is increase the amount of diet sodas consumed, and there's a strong link between diet-soda consumption and increased obesity.

Unless and until there's some empirical evidence that switching from non-diet to diet sodas helps people lose weight, this should be filed under "very bad ideas".

If you follow that strong link link, you'll see:
Fowler is quick to note that a study of this kind does not prove that diet soda causes obesity. More likely, she says, it shows that something linked to diet soda drinking is also linked to obesity.

"One possible part of the explanation is that people who see they are beginning to gain weight may be more likely to switch from regular to diet soda," Fowler suggests. "But despite their switching, their weight may continue to grow for other reasons. So diet soft-drink use is a marker for overweight and obesity."

Right. Correlation is not causation. In fact, if the elasticity of demand for soda is not correlated with propensity for obesity, then a Pigovian tax on soda would make for a great instrumental variable to see whether diet soda has any causal relationship with obesity.


Today's xkcd comic is fantastic. Check it out.
Comments (4)

Wesabe + Perl + PostgreSQL

16 Dec 2008, 22:56 PM

I was looking for some code that would allow me to automagically populate my database with transactions from Wesabe. It needed to work with Postgres, and Perl would be a decent language for the job. I found it here. Enjoy.


Comments (0)

Morons

16 Dec 2008, 20:41 PM

It is good to know that $1,793 power cables are useful even without replacing the miles of standard copper between the power station and your house.

George,

Shortly after receiving two 5ft Quadlink power cables, I dragged them to my office. I am employed in a 24/7 engineering support environment and thought my "always on" computer would be a good starting point to burn in the cables. Working 12 hour night shifts I have grown accustomed to eye fatigue while working with a computer. Several busy days later (currently on my 9th straight 12 hour night) other priorities and a busy work schedule had caused me to forget about the cables burning-in beneath my desk. Yesterday I noted how uncharacteristically good my eyes felt. Today, just an hour ago, a coincidental look at my feet put me face to face with the Quadlink.

I have tried to identify another cause for this remarkable change but am unable to attribute the effect to anything but the Quadlink power cable. I trust you'll find that satisfying.

Needless to say I intend to purchase a replacement cable for my home audio use as soon as possible - I don't think this one will be leaving my computer monitor anytime soon. I am anxious to loan my cables to others in the office who have complained of similar eye fatigue problems.

Best Regards, Wade
See here. And then here.


WTF: Where does a night-shift support desk worker get the $ to spend on such crap?


Comments (0)

Benford's Law

14 Dec 2008, 19:28 PM

A few weeks ago at lunch we were discussing forensic accounting and Benford's law came up. I've always been a little uncomfortable with the law, which stipulates that the distribution of leading digits in 'natural' measurements follow a non-uniform distribution. I believed the law but never had a good grasp of why. In undergrad I recall proving the existence of the law by starting with the assumption that if such a law exists, then it must be scale invariant. That is, if there is some law that describes the way leading digits of measurements are distributed, then it shouldn't matter whether you measure the underlying quantity in Celsius or Farenheit; that is, it should remain after any linear transformation. From that point its pretty simple to derive the distribution. But it was a fairly weasely way to answer the question as I started with the assumption that the law must exist. Mathematical proofs frequently start with the assumption that the converse is true and then show that if the converse were true it would be non-sensical, hence the converse of the converse must be true - this is called proof by contradiction. Proof by assumption of truth is not really proof. Either way, my 'proof' was accepted and my discomfort with the law remained until today.

Last night I was browsing through Hacker's Delight, a compendium of neat coding tricks that would get you fired at most workplaces with the charge of being too clever and abstruse. And in my reading I came across Bedford's law and decided that today I would finally take the time to grok it. As has become my default browsing behavior, I started at Wikipedia and came across a description and picture that finally let the pieces click. I also got to feel a little bit better about the shame that I have carried from my undergrad 'proof', as they include my same argument as evidence of the law:

The law can alternatively be explained by the fact that, if it is indeed true that the first digits have a particular distribution, it must be independent of the measuring units used. For example, this means that if one converts from e.g. feet to yards (multiplication by a constant), the distribution must be unchanged -- it is scale invariant, and the only distribution that fits this is one whose logarithm is uniformly distributed.

To neatly close off the issue, I logged into a database at the office and grabbed 123 million financial values taken from around 65 thousand companies from around the world. It is nice having access to this kinda data. And lo and behold the distribution of leading digits follows the law perfectly. The black line in the picture at the top of this post is the observed frequency and the red dotted line is what the law would have one expect. Cool.

 digit   observed   expected 
     1 0.30200305 0.30103000
     2 0.17712325 0.17609126
     3 0.12545546 0.12493874
     4 0.09689768 0.09691001
     5 0.07977275 0.07918125
     6 0.06637933 0.06694679
     7 0.05717501 0.05799195
     8 0.05021505 0.05115252
     9 0.04497842 0.04575749

Comments (1)

Financial session tracking

6 Dec 2008, 17:35 PM

I have a love/hate relationship with Mint and Wesabe, two Web 2.0 online financial tracking applications. They do a few things well and have great potential but still manage to leave a bad taste in my mouth. The strength of both services is that they consolidate all of your transactions from your various banks and credit providers and present the information in a unified format. This alone is huge for me. Even within Chase's own website, I can't get a single consolidated view of all my accounts. However, the sizzle on the steak is that they take advantage of network effects in tagging your transactions. For example, when one user tags a cryptic line item such as '238927 12/04SOU EQUIN' as 'Gym Membership', all future transactions across all user accounts will benefit. That said, I still find most of my time using these sites (I have accounts on both) spent tweaking their automatic suggestions for tags. However I understand that me spending 5 minutes correcting their 10% classification error rate is better than spending an hour hand classifying every transaction [1]. I also take issue with some of the interface design choices, especially on Mint, whereby quick searching and sorting is prevented by their excessive use of 'pretty' transitions and AJAX style effects. I usually keep my mouth shut about UI issues, but given that I've spent the last week designing a visualization app, I feel suitably qualified.

But my biggest grief is one I raised both with the founder of Wesabe and on this blog at one time. I can't seem to find my original post on the topic, as it was lost when I went through the Blogger to Wordpress switch - I'll try and extract it later. But anyway, my point was that I was disheartened by the lack of double entry book-keeping. This problem manifested itself on Wesabe when it congratulated me for earning $5k when in reality I had actually just paid off a $5k balance on my credit card - net economic impact of far less than $5k. This annoys me.

This morning I was doing my semi-regular perusal of my accounts and came across a balance advance from a line of credit and I was curious as to where the money went. It is a royal pain to do arbitrary searches with either of the sites, so I opened up my SQL clone of their site that I scrape on a regular basis. From there it was easy to find all transactions within a time window that matched the dollar amount of the balance advance. This is a non-GAAP technique, but even for the 8 accounts my wife and I have, joining by transaction timestamp, amount and debit/credit is a pretty accurate way to build a financial journal - in the accounting 101 sense of the word 'journal.'

I don't know why these services don't offer this feature. It is easy to see useful extensions that bring much more color to our financial activity. Just in the same way that session management gives you oodles more insight into online marketing transactions than simply looking at individual requests in a web server log file. Now maybe I'm the only one nerdy enough to specifically request double entry accounting, but in these times everyone wants to know more about their financial data. Couple this data with logs from my cell phone bill, cable service, Netflix and we'd soon be able to fairly accurately recognize the rationale being each transaction with our banks. If only there was some kind of online Vault service that could own this data for me...


[1]: It is also really difficult to determine their false positive rate. It's easy enough to work out when they can't work out a tag, but sometimes it takes quite a while to realize a $20 purchase tagged as 'Pet Food' was really 'Movie Tickets'.

PS: Note to Mint.com, stop calling your inane pie-charts 'Trends', you are starting to piss me off.


Comments (1)
All previous posts

consulting

personal projects

etc