Statistics question

SLAM: debunk creationism, pseudoscience, and superstitions. Discuss logic and morality.

Moderator: Alyrium Denryle

mr friendly guy
The Doctor
Posts: 11235
Joined: 2004-12-12 10:55pm
Location: In a 1960s police telephone box somewhere in Australia

Statistics question

Post by mr friendly guy »

I am in another debate in which my opponent has sidetracked me from the main thrust, on which he is currently getting hammered (which may explain why he focuses on this sidetrack). I suspect he is trying to bury me in jargon, because he made quite a few mistakes earlier, such as misreading what I wrote several times and failing to count properly. But going on:

It concerns statistics and the term bias as it's used. Basically we have two estimates.

The first method underestimates what it purports to measure. We both agree on that.
The second method can both underestimate and overestimate what it purports to measure. We both agree on this.

Being conservative in nature (in regards to estimation), I choose the first method. That is, if subject x shows a higher value than subject y as measured by the first method, I can definitely say it's higher, because this method underestimates (subject x). Whereas with the second method, if subject x shows a higher value than subject y, hell if I know whether it really is higher, because it could have overestimated (subject x).

He prefers the second method and argues for it because it's unbiased. That is, biased in the statistical sense, based on how it's calculated, not the pejorative sense the term has in everyday language. This is actually dubious, because if you follow the sources, they mention how they correct for bias but can't do so completely. However, that isn't my main question.

Now I thought bias in statistics simply refers to the fact that the estimated value is different from the real value. In which case, any estimation which overestimates or underestimates would be biased by definition. His reply was (and I have snipped stuff off):
Bias in statistics is when the Expected Value of the error term is positive. Or the Expected Value of the error term is negative. Which is not the same thing as saying any single sample could have either a positive or negative error term.
Am I missing something here?

PS - This guy's argument style is eerily similar to people who bluff, i.e. using jargon without explaining what it means, talking about how smart he is for knowing all these other terms, and implying he has taken some mathematics course. Funny thing is he fails to understand basic algebraic concepts, like if you multiply a positive constant by a higher number, the product will end up bigger than if you multiplied the same constant by a lower number. This is why I suspect he is just trying to bury me with jargon.
Never apologise for being a geek, because they won't apologise to you for being an arsehole. John Barrowman - 22 June 2014 Perth Supernova.

Countries I have been to - 14.
Australia, Canada, China, Colombia, Denmark, Ecuador, Finland, Germany, Malaysia, Netherlands, Norway, Singapore, Sweden, USA.
Always on the lookout for more nice places to visit.
The Grim Squeaker
Emperor's Hand
Posts: 10319
Joined: 2005-06-01 01:44am
Location: A different time-space Continuum

Re: Statistics question

Post by The Grim Squeaker »

Random Q: Are you talking about the bias-variance tradeoff in the context of machine learning/prediction? Bayesian methods?
(i.e. out of curiosity, what's the background, and which exact methods are being discussed?)
Photography
Genius is always allowed some leeway, once the hammer has been pried from its hands and the blood has been cleaned up.
To improve is to change; to be perfect is to change often.
mr friendly guy
The Doctor
Posts: 11235
Joined: 2004-12-12 10:55pm
Location: In a 1960s police telephone box somewhere in Australia

Re: Statistics question

Post by mr friendly guy »

No, I am not talking in the context of machine learning.

It was GDP/capita nominal (first method) vs GDP/capita PPP (second method). Essentially, a country with a lower cost of living will have a higher GDP in PPP terms than in nominal terms. As mentioned, because PPP can overestimate, one could make a case that if a figure shows subject x having a higher GDP/capita in PPP terms, it might have overestimated, so we aren't really sure if the "hypothetical real" value is higher. In nominal terms this isn't a problem, although you run the risk that it might take longer to notice when a country's GDP/capita has surpassed another's.

Going on, I am then also obliged to compare any two countries using the same method, i.e. I can't pick PPP when it suits my argument that one has a higher GDP/capita and then use GDP/capita nominal when a different pair of countries is being compared.
Magis
Padawan Learner
Posts: 226
Joined: 2010-06-17 02:50pm

Re: Statistics question

Post by Magis »

mr friendly guy wrote:It concerns statistics and the term bias as its used. Basically we have two estimates.
Minor nitpick: I think you mean to say that you have two estimators, not two estimates.
mr friendly guy wrote:He prefers the second method and argues because its unbias. That is bias in a statistical sense based on how its calculated and not the perjorative as the term is used in everyday language.
Your opponent is potentially correct on this point. The first estimator will be biased if it systematically underpredicts the parameter being estimated. However, just because the second estimator can both underpredict and overpredict the parameter doesn't mean that it's unbiased.
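A toy simulation makes this point concrete. It is only a sketch: the estimator below is hypothetical (true value plus Gaussian noise with mean 0.5 and standard deviation 2), chosen so that individual estimates land on both sides of the truth while the average error is clearly not zero.

```python
import random

def simulate_errors(bias, spread, trials=100_000, seed=1):
    """Errors (estimate minus true value) of a hypothetical estimator whose
    output is the true value plus Gaussian noise with mean `bias`."""
    rng = random.Random(seed)
    return [rng.gauss(bias, spread) for _ in range(trials)]

errors = simulate_errors(bias=0.5, spread=2.0)
share_over = sum(e > 0 for e in errors) / len(errors)
share_under = sum(e < 0 for e in errors) / len(errors)
mean_error = sum(errors) / len(errors)

print(f"overestimates:  {share_over:.0%}")   # roughly 60%
print(f"underestimates: {share_under:.0%}")  # roughly 40%
print(f"average error:  {mean_error:.2f}")   # about 0.5, so the estimator is biased
```

Both over- and underestimates occur, yet the expected error is positive, which is exactly the opponent's definition of bias.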
mr friendly guy wrote:Now I thought bias in statistics simply refers to the fact that the estimated value is different from the real value. In which case, any estimation which overestimates or underestimate would be bias by definition. His reply was (and I have sniped snuff off)
Bias in statistics is when the Expected Value of the error term is positive. Or the Expected Value of the error term is negative. Which is not the same thing as saying any single sample could have either a positive or negative error term.
Your opponent is correct. A biased estimator is one whose expected value differs from the true value of the parameter.
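For reference, here are the standard textbook definitions, writing \hat\theta for an estimator of a parameter \theta (this notation is assumed here; the thread itself does not use it). The second identity shows that the overall error of an estimator depends on both its bias and its variance.

```latex
\begin{aligned}
\operatorname{Bias}(\hat\theta) &= \mathbb{E}[\hat\theta] - \theta = \mathbb{E}[\hat\theta - \theta] \\
\operatorname{MSE}(\hat\theta)  &= \mathbb{E}\big[(\hat\theta - \theta)^2\big]
                                 = \operatorname{Var}(\hat\theta) + \operatorname{Bias}(\hat\theta)^2
\end{aligned}
```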
mr friendly guy wrote:Am I missing something here?
Maybe?

Look at it this way. Let's say you're interested in the mean of some population, so you take some N samples of that population. The mean of the samples is not likely to be the mean of the population (example: the mean of 100 samples of human bodyweights is not likely to be the same as the mean human bodyweight of the entire human population). In fact, if you take N samples and calculate the sample mean, and then take another N samples and calculate their sample mean, the two sample means will be different both from each other and from the population mean, except in very convenient situations. However, the sample mean is an unbiased estimator of the population mean because if you were to take an infinite number of samples of size N, and then take the average of the infinite number of sample means, it would be the same as the population mean. This is equivalent to saying that the expected value of the estimator is the same as the true population parameter.
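A rough Python simulation of this example, assuming a made-up population of 100,000 body weights (normal, mean 75 kg, sd 15 kg). Individual sample means differ from each other and from the population mean, but their average lands very close to it. A finite number of repeats stands in for the "infinite number" above.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 body weights in kg (made-up numbers).
population = [rng.gauss(75, 15) for _ in range(100_000)]
true_mean = statistics.fmean(population)

N = 100          # size of each sample
repeats = 5_000  # number of independent samples of size N

sample_means = [statistics.fmean(rng.sample(population, N)) for _ in range(repeats)]

print("first two sample means:", round(sample_means[0], 2), round(sample_means[1], 2))
print("population mean:       ", round(true_mean, 2))
print("mean of sample means:  ", round(statistics.fmean(sample_means), 2))
```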

That should answer your question about unbiased estimators.

But the question I think you should be asking is, "Why is an unbiased estimator necessary for your discussion"? In fact, I'm not sure why the two of you are considering GDP/capita as estimators in the first place. They are estimating what parameter of a distribution? Estimators are functions to estimate population parameters using sample statistics. I'm confused by the entire statistical context of your debate, so maybe I'm the one missing something here.
mr friendly guy
The Doctor
Posts: 11235
Joined: 2004-12-12 10:55pm
Location: In a 1960s police telephone box somewhere in Australia

Re: Statistics question

Post by mr friendly guy »

Magis wrote: Your opponent is potentially correct on this point. The first estimator will be biased if it systematically under predicts the parameter being estimated. However, just because the second estimator can both under predict and over predict the parameter doesn't mean that it's unbiased.
Do you mean that just because the second estimator can both underpredict and overpredict the parameter doesn't mean that it's biased, rather than unbiased?
Magis wrote: Your opponent is correct. A biased estimator is one whose expected value differs from the true value of the parameter.
No, that was what I thought bias meant as well. It's just that if an estimator overestimates or underestimates, why should it not be considered biased? When I queried that, he just restated what I defined bias as, but in the manner you see above. In other words, he is not actually trying to explain it, just restating a definition, which could be because he himself doesn't understand it and is bluffing.
Magis wrote: Look at it this way. Let's say you're interested in the mean of some population, so you take some N samples of that population. [...] That should answer your question about unbiased estimators.
So an unbiased estimator is simply one that should match the parameter we are measuring if we take enough samples, but can be inaccurate if we take too few?
Magis wrote: But the question I think you should be asking is, "Why is an unbiased estimator necessary for your discussion"?
That's a good point. He was the one who brought it up when he was losing, presumably because it's something he feels more comfortable discussing. I had simplistically assumed that bias would refer to any overestimate or underestimate, so I was puzzled why he kept bringing it up.
Magis wrote: In fact, I'm not sure why the two of you are considering GDP/capita as estimators in the first place. They are estimating what parameter of a distribution? Estimators are functions to estimate population parameters using sample statistics. I'm confused by the entire statistical context of your debate, so maybe I'm the one missing something here.
Would you like the long story or the short version? :D Essentially, the GDP/capita issue was a sidetrack from the main thrust, but it has been hilarious for me: he misreads several things, quotes from sources and then rejects them when they don't give the answer he wants (no joke, he says wiki sucks in the same post where he uses a wiki link to justify his position, and ignores the fact that I used the wiki page purely because it conveniently summarised the results from various sources like the World Bank and IMF on one page), believes he knows more than the IMF, and is now essentially saying the World Bank methodology sucks (even though he used the World Bank figures when it suited him).

Back to the original topic. He feels that the success of a country's cultural products, in this case television shows/movies (i.e. who can we get to buy these), is predominantly dependent on that country having a high GDP/capita, because people would like to copy the lifestyle of such a rich country. This is strange, of course, when you consider that the most successful movie, James Cameron's Avatar, had the American lifestyle of polluting your world and killing natives, and I am not sure how many people want to copy that lifestyle. :D In other words, not all successful films show a glamourous American lifestyle. To put his view another way, people in a country with a lower GDP/capita would tend to gravitate towards and buy films from one with a higher GDP/capita purely because they are poorer than the rich country and want to ape its lifestyle. To be fair, he later concedes that's not the only reason, but he feels it's a very important one.

I feel the success of a film is mainly dependent on production values (i.e. special effects, script, etc.), and that being rich, i.e. having a high GDP/capita, is less important than market size. So I feel having a high GDP/capita can contribute, but not in the way he thinks. Whereas he feels just having one makes people want to copy the lifestyle of the rich country and buy its stuff, I feel it matters mainly in a secondary way, by allowing them to put high production values into their shows/movies.

Going on, as I said, having a high GDP/capita is actually secondary to having a large market. For example, China has a "middle income" GDP/capita while Australia has a high-income GDP/capita. Yet Chinese films (if you watch the higher-budget ones) have much higher production values than what we churn out, because of their market size. To go in another direction, Norway has a higher GDP per capita than us, yet in Australia we can't actually watch Norwegian films in our cinemas. We show Chinese and Indian films in selected major cinemas, even though those countries have a much lower GDP/capita than us.

Now that I have set the stage, let's explain where this sidetrack came from. I pointed out that Australia bought predominantly American shows when our GDP/capita was lower, and we still continue to do so even though in the last few years ours has been higher. If my hypothesis is correct, we should continue to buy their shows as long as their shows have higher production values than ours, despite us having a higher GDP/capita. If his hypothesis is correct, we should stop buying (or buy less), because we now have a higher GDP/capita and no longer need to ape their inferior lifestyle.

So it becomes necessary for him to argue that Australia has a lower GDP/capita. The problem is that most major sources, from the World Bank, IMF, UN and CIA Factbook, all list us as having a higher GDP/capita than the US, at least in nominal terms. He tries to dispute this in two ways:

1. All those sources are bullshit - because they don't show a drop in Australian GDP/capita with the fall in the Australian dollar. They do (he simply can't count), but then he comes up with, "ah, but they don't fall enough". This comes about because those sources use an average exchange rate to smooth out fluctuations, whereas he prefers whichever exchange rate gives him victory. At this point he has disputed World Bank methodology even though earlier he used World Bank figures. Go figure. :D This is where he is losing quite badly, so we get to the second point.

2. He will at the same time argue that purchasing power parity is a better measure of GDP/capita, because it shows the US having a higher GDP/capita in PPP terms.

Going on, unlike your bodyweight example, it's harder to get an objective number for the size of an economy. Let me explain. If we have two countries, we can simply take their GDP in their local currency and, using an average exchange rate, convert it to a third currency, usually US dollars. We can then compare who has the larger economy. We can also do that with GDP/capita. This assumes, of course, that both countries trade internationally, unlike, say, the old USSR. This is GDP nominal. Now some countries have a lower cost of living, so one dollar in China buys more than one dollar in the US. So economists try to take this into account and come up with what's called purchasing power parity. PPP has the risk of overestimating a country's economy (famously, the World Bank cut China's economy by 40% in PPP terms in 2008, and then increased it again in 2014, so the 2008 figure had in turn underestimated it). So unlike body weight, where a kilogram in China is the same as a kilogram in the US, a dollar in China isn't the same as a dollar in the US, in the sense that it can buy more.
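A minimal sketch of the two conversions described above. All figures are hypothetical, and the PPP conversion factor is treated as a single given number, whereas in practice it comes out of large price surveys. The only difference between the two measures in this sketch is the divisor: the market exchange rate for nominal, the PPP factor for PPP.

```python
def per_capita(gdp_local, population, lcu_per_unit):
    """GDP per person in a target unit, given how many local currency
    units (LCU) buy one unit of that target."""
    return gdp_local / population / lcu_per_unit

# Hypothetical low-cost-of-living country (all numbers made up).
gdp_local = 1.4e12       # GDP in local currency units
population = 10_000_000
market_rate = 7.0        # LCU per US dollar (e.g. an annual average rate)
ppp_factor = 4.0         # LCU per international dollar (the same basket costs less locally)

nominal = per_capita(gdp_local, population, market_rate)
ppp = per_capita(gdp_local, population, ppp_factor)

print(f"nominal: ${nominal:,.0f} per person")                       # $20,000
print(f"PPP:     ${ppp:,.0f} per person (international dollars)")   # $35,000
```

Because the PPP factor is smaller than the market rate for a cheap country, its PPP figure comes out higher than its nominal one; any revision to the PPP factor moves the PPP figure directly.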

Now obviously this "real value" of the economy is hard to pin down. Basically, if PPP shows country x with a larger GDP/capita for a given year, is it really larger? Could be. But we could have overestimated it, as witnessed by those revisions I talked about. If nominal measurements show country x with the larger GDP/capita, I am more certain this is correct, because nominal underestimates it. For this reason, and being a conservative fellow myself, I simply use the nominal figure. Obviously his entire argument now depends on Australia not having a higher GDP/capita, and he is essentially saying that despite all this, PPP is an unbiased measure, so it should be used regardless of whether it overestimates or underestimates. As I said, I had simply assumed bias means that the value of GDP/capita given in either PPP or nominal terms is not the same as the "hypothetical real value", so I was puzzled as to why he kept insisting it was unbiased. When I asked, he simply restated the definition in a different manner.

I suspect based on what you are saying, he is using estimators in an incorrect fashion.
mr friendly guy
The Doctor
Posts: 11235
Joined: 2004-12-12 10:55pm
Location: In a 1960s police telephone box somewhere in Australia

Re: Statistics question

Post by mr friendly guy »

mr friendly guy wrote: 1. All those sources are bullshit - because they don't show a drop in Australian GDP/capita with the fall in the Australian dollar. [...]
It was too late to edit at this point, but I feel I haven't done justice to the stupidity displayed.

He also quotes from a "real" source, which he defines as one "which moves markets". Unfortunately, his source shows the Australian GDP/capita even higher than the IMF figures, which doesn't support his position. Hilarity ensues when he denies the figure is higher. Let that sink in for a moment.

He also argues that with his exchange rates, the Australian GDP/capita should drop by a certain amount. Unfortunately for him, when you plug in his figures, it still comes out higher than US GDP/capita.
This he ignores when pointed out, and pretends I am refusing to give my own exchange rate (why should I, when I am using his?).

This might explain why he is now focussing on PPP being an unbiased measure, because all his other moves look pathetic.

Edit - and before someone asks why I am still debating this guy, well, it's because for now it's still a lot of fun.
Ziggy Stardust
Sith Devotee
Posts: 3114
Joined: 2006-09-10 10:16pm
Location: Research Triangle, NC

Re: Statistics question

Post by Ziggy Stardust »

mr friendly guy wrote: Do you mean just because the second estimator can both under predict and over predict the parameter doesn't mean that its biased, rather than unbiased?
Remember what an estimator is. It is a decision rule for calculating an estimate of some unknown population parameter based on observed data. Very rarely will the realized value of your estimator (that is, your estimate) actually coincide with the true value of the population parameter. An estimator is a RANDOM variable (i.e. you will get a different value every time you calculate it from a new sample). When you say that an estimator "can both under predict and over predict" the parameter, you are abusing terminology; all you are saying is that the interval over which the realized values of the estimator can vary straddles the true value of the parameter. That is compatible with unbiasedness, but does not by itself imply it.
mr friendly guy wrote: No that was what I thought bias meant as well. Its just that if a estimator overestimates or underestimates, why should it not be considered bias? When I queried that he just restated what I defined bias as, but in the manner which you see above. In other words he is not actually trying to explain it, but just restating a definition, which could be because he himself doesn't understand it and is bluffing.
I think you are conflating the idea of bias with that of precision. Try and think about it a different way.

Think about a dart board. Say the bulls-eye of the dartboard is the true parameter value, and each player is a different estimator. If one player (i.e. estimator) hits the bulls-eye (or very close to it) on every shot, then they are both accurate and precise. If the second player never hits the bulls-eye, but their shots tend to cluster around the bulls-eye, you would say they are accurate but not precise. Both of these players are unbiased with respect to the bulls-eye, but one is simply more precise (i.e. less variability). Another player may always hit the same spot on the dartboard about 6 inches away from the bulls-eye; we would say this estimator is precise but not accurate. This would be a biased estimator. If the last player is neither accurate nor precise, their shots would hit the dartboard completely randomly and that estimator would have little or no power at estimating the parameter value. Here is a little picture that illustrates this.

In your case, you have two estimators. One of them is biased, i.e. inaccurate, but precise: it always hits a spot on the dartboard that 'underestimates' the true parameter value. The other one, so far as I can tell from what you describe, is accurate but imprecise: it doesn't hit the bulls-eye directly, but its shots cluster around it.
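The dartboard picture can be turned into numbers. This is a one-dimensional sketch with hypothetical players: each throw is the bulls-eye plus a fixed offset (the bias) plus random scatter (the imprecision). The offsets and scatters are made-up values, and for the fourth player we simply assume both an offset and a wide scatter.

```python
import random
import statistics

rng = random.Random(0)
TRUE_VALUE = 0.0  # the bulls-eye

# Hypothetical players: (systematic offset, scatter). Illustrative numbers only.
players = {
    "accurate and precise":         (0.0, 0.5),
    "accurate, imprecise":          (0.0, 4.0),
    "precise, inaccurate":          (6.0, 0.5),
    "neither accurate nor precise": (6.0, 4.0),
}

results = {}
for name, (offset, scatter) in players.items():
    throws = [TRUE_VALUE + offset + rng.gauss(0, scatter) for _ in range(10_000)]
    bias = statistics.fmean(throws) - TRUE_VALUE  # accuracy: average miss
    spread = statistics.stdev(throws)             # precision: lower is better
    results[name] = (bias, spread)
    print(f"{name:28s} bias={bias:5.2f}  spread={spread:4.2f}")
```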

Now, depending on the circumstances and what exactly you are measuring, either estimator may be more useful. The accurate/imprecise one is useful because it is unbiased; however, unbiasedness is a statement about the average over many repetitions, not about any single estimate. If the estimator is also consistent, then as the size of your measurement set increases, the estimate will converge towards the true value of the parameter. But with smaller sample sizes (or even a single realized estimate), your ability to make strong inferences is limited by the variability of that estimator. The other estimator may be biased, but it is precise and its bias is stable. That is, in a small-sample situation, it will likely be more useful for inference than the unbiased estimator; since the variability is lower, and the bias measurable and stable, you may be better able to find the true value of the parameter using the biased statistic. If the estimate always comes out, for a meaningless example, 5 too low, you know that no matter what you get for that estimate you can just add 5 and have a reasonable guess for the parameter value.
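A rough illustration of this trade-off, with hypothetical numbers: one estimator is centred on the true value but has a standard deviation of 20; the other always comes out about 5 too low but has a standard deviation of 2. Mean squared error (the average squared distance from the truth) is used as the yardstick, and the "+5 correction" only works because the offset is assumed to be known exactly.

```python
import random
import statistics

rng = random.Random(7)
THETA = 100.0  # hypothetical true parameter value

def unbiased_noisy():
    # Centred on THETA but widely scattered.
    return rng.gauss(THETA, 20.0)

def biased_precise():
    # Always about 5 too low, but tightly clustered.
    return rng.gauss(THETA - 5.0, 2.0)

def mse(estimates):
    """Average squared distance from the true value."""
    return statistics.fmean([(e - THETA) ** 2 for e in estimates])

a = [unbiased_noisy() for _ in range(20_000)]
b = [biased_precise() for _ in range(20_000)]
b_corrected = [e + 5.0 for e in b]  # valid only if the offset really is known

print(f"unbiased, imprecise: MSE = {mse(a):7.1f}")           # about 400
print(f"biased, precise:     MSE = {mse(b):7.1f}")           # about 29
print(f"biased + 5:          MSE = {mse(b_corrected):7.1f}") # about 4
```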

For a more rigorous mathematical example, look at the Poisson distribution. If you are trying to find the probability of no events occurring over some given number of time units (minutes, days, whatever), the maximum likelihood estimator for a Poisson process is biased. But the variance of the unbiased estimator is so high (i.e. it is so imprecise) that the biased estimator actually has a smaller mean squared error, making it more useful for inference.
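One common textbook version of this example (possibly not exactly the setup intended above): you observe a single Poisson count X over one time unit and want to estimate e^(-2λ), the probability of no events in the next two time units. The unbiased estimator is (-1)^X, which only ever outputs +1 or -1; the maximum likelihood estimator is e^(-2X). The sketch computes bias and mean squared error exactly by summing over the Poisson probabilities (truncated at a large count) rather than by simulation.

```python
import math

def poisson_pmf(lam, kmax=200):
    """Yield (k, P(X = k)) for X ~ Poisson(lam), k = 0..kmax, computed iteratively."""
    p = math.exp(-lam)
    for k in range(kmax + 1):
        yield k, p
        p *= lam / (k + 1)

def bias_and_mse(estimator, lam):
    """Bias and mean squared error of estimator(X) as an estimate of exp(-2 * lam)."""
    target = math.exp(-2 * lam)  # P(no events in the next two time units)
    bias = mse = 0.0
    for k, p in poisson_pmf(lam):
        err = estimator(k) - target
        bias += p * err
        mse += p * err * err
    return bias, mse

def unbiased(k):
    return (-1) ** k         # E[(-1)^X] = exp(-2 * lam), yet only ever +1 or -1

def mle(k):
    return math.exp(-2 * k)  # maximum likelihood: plug X in for lam

for lam in (0.5, 1.0, 3.0):
    b_u, mse_u = bias_and_mse(unbiased, lam)
    b_m, mse_m = bias_and_mse(mle, lam)
    print(f"lambda={lam}: unbiased bias={b_u:+.4f} MSE={mse_u:.3f} | "
          f"MLE bias={b_m:+.4f} MSE={mse_m:.3f}")
```

In this setup the unbiased estimator's MSE is 1 - e^(-4λ), far larger than the biased MLE's.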
mr friendly guy wrote: So an unbiased estimator is simply one where it should match the parameter we are measuring if we take sufficient samples, but it can be inaccurate if we take insufficient samples?
Kind of. This is where we need to differentiate between the concepts of bias and consistency. An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value. Note that this does not imply that any given realized value of the estimator will be equal to the parameter value. On the other hand, an estimator is consistent if, as the sample size increases, the estimator converges (in probability) to the true value of the parameter being estimated. In typical cases this also means the bias shrinks towards 0 as the sample grows, i.e. the estimator is asymptotically unbiased, even if it is biased at any finite sample size. An estimator can be unbiased but not consistent, and an estimator can be biased but consistent. Here is a short article with a Python script that works through an illustrative example of this difference.
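A small simulation of both cases, assuming normal data with a hypothetical mean of 10 and standard deviation of 5 (this is not the script from the linked article). Using only the first observation is unbiased but not consistent: its spread never shrinks. The sample mean plus 1/n (an artificial bias chosen purely for illustration) is biased at every finite n, but consistent: both its bias and its spread shrink as n grows.

```python
import random
import statistics

rng = random.Random(3)
MU = 10.0  # hypothetical true mean

def draw(n):
    """n independent observations from Normal(MU, 5)."""
    return [rng.gauss(MU, 5.0) for _ in range(n)]

def first_observation(xs):
    # Unbiased (E[X1] = MU) but not consistent: ignores everything after xs[0].
    return xs[0]

def shifted_mean(xs):
    # Biased by +1/n, but consistent: bias and spread both vanish as n grows.
    return statistics.fmean(xs) + 1.0 / len(xs)

def summarize(estimator, n, repeats=500):
    """Average and standard deviation of the estimator over repeated samples of size n."""
    estimates = [estimator(draw(n)) for _ in range(repeats)]
    return statistics.fmean(estimates), statistics.stdev(estimates)

for n in (10, 100, 1_000):
    m1, s1 = summarize(first_observation, n)
    m2, s2 = summarize(shifted_mean, n)
    print(f"n={n:>5}  first obs: mean={m1:6.2f} sd={s1:4.2f}   "
          f"shifted mean: mean={m2:6.3f} sd={s2:5.3f}")
```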
mr friendly guy
The Doctor
Posts: 11235
Joined: 2004-12-12 10:55pm
Location: In a 1960s police telephone box somewhere in Australia

Re: Statistics question

Post by mr friendly guy »

I am starting to think my opponent is using the term bias and estimators in a context they weren't suited for. Which I suppose fits with my original assumption that he was going off on a tangent and throwing jargon around. I had made the mistake of conflating bias with precision, and initially thinking he was making a reasonable argument at face value.

It sounds like these concepts are used when we are measuring things a) in the same units and b) derived from a sample, to put it simply. For example, comparing two estimators to work out the average mass of a population. I don't think it works the same way in the economic example, for two reasons:

a. We aren't measuring the average person's wealth in a country from a sample and then extrapolating it to the whole country. We are measuring the total wealth of a country and then dividing it by the population. In other words, we have the ENTIRE population already.

b. The two methods, GDP nominal and GDP PPP, are in different units, unlike, say, the bodyweight example. Here is where I think the main confusion comes from: $1 in country A may buy more than $1 in country B, unlike bodyweight, where a kg in country A is the same as a kg in country B. The reason we do GDP PPP is this problem; we acknowledge that our primary unit in GDP nominal, e.g. the US dollar, isn't a true reflection. In fact, using the above terminology, it would have problems with accuracy and precision, because in a country like China $1 would buy more, but in a country like an EU one $1 would buy less.

So basically, if we don't take cost of living into account, we could underestimate the true value of an economy with a lower cost of living relative to one with a higher cost of living. For example, China has a lower cost of living than the US. If we compare economy sizes without taking lower living costs into account, China comes out at around 50% of the US. If we take living costs into account, we can say that relative to the US, China would actually be more than 50% of the US size. How much more? Maybe 70 or 80 percent of the US? Maybe it comes out even bigger than the US. This is especially problematic when the measurement used to correct for lower living costs has been revised several times. For example, by this measurement China became 40% smaller in 2008 (because they had overestimated its economy), and was then revised upwards by 20% for the 2011 figures (so they had in turn overcorrected and underestimated). So by their own standards, they can overshoot and undershoot their mark. Therein lies the crux of the problem, and where my argument stems from. Going on.

What PPP is trying to do is find a hypothetical "true value" of the economy, taking into account lower or higher costs of living. The unit is not the US dollar; it's the international dollar. Now to get the data for this conversion, they may well have to take samples of the prices of goods, and depending on how they do it, those estimators may very well be unbiased. However, those estimators are only as good as the basket of products being priced, and I don't think that carries over cleanly to working out economic size. An obvious reason is that you run into problems with the fact that different countries prefer to consume different goods.

So although we are trying to estimate the "true value" of an economy by taking costs of living into account, we aren't doing it from a sample of people's wealth, nor are we even measuring things in the same unit.
Ziggy Stardust wrote: Now, depending on the circumstances and what exactly you are measuring, either estimator may be more useful. [...]
I did use a similar argument for why I preferred GDP nominal over PPP. If we are purely in the business of saying which economy is bigger (in this case GDP/capita), the measurement which underestimates (because it fails to take living costs into account) will definitely tell me if one economy is bigger than the other (assuming I am comparing a lower-cost-of-living country to a high-cost one). If the lower-cost-of-living country has a higher GDP nominal, it will definitely have the larger economy (assuming no errors in our measurements). Whereas if a country has a higher GDP PPP, I am not sure if it's necessarily bigger or smaller (witness the revaluation of China's economy as an example). My opponent and I both accept the properties of these measurements, but we obviously disagree on their implications.
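Under some strong simplifying assumptions, the logic above can be checked with a toy model. Everything here is hypothetical: nominal GDP/capita is assumed to understate the low-cost country x by a random factor between 0.4 and 1 while measuring the high-cost country y exactly, and PPP is assumed to measure both with an independent, zero-mean error of up to ±20%. The model only captures the one-directional error argument; it says nothing about the size of real-world PPP or exchange-rate errors.

```python
import random

rng = random.Random(11)

def compare(trials=100_000):
    """Count wrong or missed 'x is ahead' calls under the hypothetical model."""
    nominal_false = ppp_false = nominal_missed = 0
    for _ in range(trials):
        true_x = rng.uniform(20_000, 60_000)   # "true" cost-adjusted GDP/capita
        true_y = rng.uniform(20_000, 60_000)
        nominal_x = true_x * rng.uniform(0.4, 1.0)  # understates low-cost x
        nominal_y = true_y                          # high-cost y measured exactly
        ppp_x = true_x * (1 + rng.uniform(-0.2, 0.2))
        ppp_y = true_y * (1 + rng.uniform(-0.2, 0.2))
        if nominal_x > nominal_y and true_x <= true_y:
            nominal_false += 1
        if ppp_x > ppp_y and true_x <= true_y:
            ppp_false += 1
        if true_x > true_y and nominal_x <= nominal_y:
            nominal_missed += 1
    return nominal_false, ppp_false, nominal_missed

false_nom, false_ppp, missed = compare()
print("nominal wrongly says x > y:", false_nom)   # always 0 under these assumptions
print("PPP wrongly says x > y:    ", false_ppp)
print("nominal misses a real lead:", missed)
```

Under these assumptions nominal can never wrongly put x ahead, since nominal_x ≤ true_x ≤ true_y = nominal_y whenever y truly leads; the price is the third counter, the cases where x really is ahead but nominal does not show it yet.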
Never apologise for being a geek, because they won't apologise to you for being an arsehole. John Barrowman - 22 June 2014 Perth Supernova.

Countries I have been to - 14.
Australia, Canada, China, Colombia, Denmark, Ecuador, Finland, Germany, Malaysia, Netherlands, Norway, Singapore, Sweden, USA.
Always on the lookout for more nice places to visit.
Ziggy Stardust
Sith Devotee
Posts: 3114
Joined: 2006-09-10 10:16pm
Location: Research Triangle, NC

Re: Statistics question

Post by Ziggy Stardust »

mr friendly guy wrote: a. We aren't measuring the average person's wealth in a country from a sample size, and then extrapolating it for the whole country. We are measuring total wealth of a country and then dividing it by the population. In other words we have the ENTIRE population already.
This still requires knowing the population size as a parameter. In order to get the wealth/pop quotient, you are necessarily removing a degree of freedom from your data, as you need to estimate the population size. Thus, standard notions of bias still apply.
mr friendly guy wrote: b. The two methods, GDP nominal and GDP PPP, are in different units, unlike, say, the bodyweight example. This is where I think the main confusion comes from: $1 in country A may buy more than $1 in country B, whereas a kg of bodyweight in country A is the same as a kg in country B. We compute GDP PPP because of this problem, acknowledging that the primary unit of GDP nominal (e.g. the US dollar) isn't a true reflection. In fact, using the above terminology, it would have problems with both accuracy and precision, because in a country like China $1 would buy more, but in an EU country $1 would buy less.
The easiest statistical solution to this (if I am understanding you correctly) would simply be to convert GDP nominal and GDP PPP into an ordinal/rank scale. While $1 may not be worth precisely the same in the US or China, you can compare the variable worth of the dollar by means of relative rankings (i.e. in what country is $1 worth the most? the second most? etc.).
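A minimal sketch of the rank conversion suggested above, using made-up GDP figures for three hypothetical countries A, B and C (not real data). Each measure is turned into an ordinal ranking, so the two can be compared without assuming $1 is worth the same everywhere.

```python
def to_ranks(values):
    """Map each key to its rank (1 = largest value). Ties are not handled."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {key: rank for rank, key in enumerate(ordered, start=1)}

# Hypothetical GDP figures (arbitrary units), for illustration only.
gdp_nominal = {"A": 17.0, "B": 9.0, "C": 5.0}
gdp_ppp = {"A": 17.0, "B": 18.0, "C": 6.0}

nominal_rank = to_ranks(gdp_nominal)
ppp_rank = to_ranks(gdp_ppp)

for country in sorted(gdp_nominal):
    print(country, "nominal rank:", nominal_rank[country], "PPP rank:", ppp_rank[country])
```

Here A and B swap places between the two measures while C stays third, which is the kind of disagreement the ranking makes explicit.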
mr friendly guy wrote: For example, China's economy by this measurement became 40% smaller in 2008 (because it had been overestimated), and was then revised upwards by 20% for the 2011 figures (so they had in turn overcorrected and underestimated). So by their own standards, they can overshoot and undershoot their mark. There lies the crux of the problem and where my argument stems from. Going on.
That has nothing to do with bias. Overshooting or undershooting the target value is a function of variance, not bias. A completely unbiased predictor is still capable of both overshooting and undershooting. Bias refers to a systematic deviation. That is, say the true mean in your example is 30%. Variance describes the degree to which your unbiased predictor varies around 30% (from 20% to 40%). However, if your predictor is biased (by, say, +10%), then this predictor varies around 40% (from 30% to 50%). Get it?
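A small simulation sketch of the distinction drawn above, using the same illustrative numbers: a true value of 30, one unbiased estimator and one with a +10 bias. The shared standard deviation of 5 is an assumption. Bias shows up as the average error; variance shows up as the spread.

```python
import random
import statistics

TRUE_VALUE = 30.0
SPREAD = 5.0  # assumed standard deviation, the same for both estimators

def draws(bias, n=20_000, seed=3):
    """Simulate n estimates centred on TRUE_VALUE + bias."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_VALUE + bias, SPREAD) for _ in range(n)]

results = {}
for label, bias in [("unbiased", 0.0), ("biased +10", 10.0)]:
    xs = draws(bias)
    mean_error = statistics.mean([x - TRUE_VALUE for x in xs])  # estimates the bias
    sd = statistics.stdev(xs)                                  # estimates the spread
    share_above = sum(x > TRUE_VALUE for x in xs) / len(xs)   # overshoot rate
    results[label] = (mean_error, sd, share_above)
    print(f"{label}: mean error {mean_error:.2f}, SD {sd:.2f}, "
          f"share above true value {share_above:.3f}")
```

Both estimators have the same spread. The unbiased one lands above 30 about half the time, while the biased one still lands below 30 roughly 2% of the time, so a single undershoot cannot by itself tell you which kind of estimator you have.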

Re: Statistics question

Post by mr friendly guy »

Ziggy Stardust wrote:
This still requires knowing the population size as a parameter. In order to get the wealth/pop quotient, you are necessarily removing a degree of freedom from your data, as you need to estimate the population size. Thus, standard notions of bias still apply.
This is a good point, although both measurements most probably use the same population figure; it's the size of the economy that differs. So the population term introduces the same bias into both, and it would be silly to argue that one is less biased than the other on that basis. But it does mean that his claim that only one measurement is unbiased is dubious at best.
Ziggy Stardust wrote:

That has nothing to do with bias. Overshooting or undershooting the target value is a function of variance, not bias. A completely unbiased predictor is still capable of both overshooting and undershooting. Bias refers to a systematic deviation. That is, say the true mean in your example is 30%. Variance describes the degree to which your unbiased predictor varies around 30% (from 20% to 40%). However, if your predictor is biased (by, say, +10%), then this predictor varies around 40% (from 30% to 50%). Get it?
I think I am getting it. In that case I would argue that variance is the bigger problem for what we are arguing about (which includes criteria my opponent has set). He obviously doesn't think so, but damn, I am pretty sure he is using the word bias in a pejorative, everyday English sense rather than in the statistical sense he claims.
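A standard textbook identity (not something stated in the thread) that puts "which problem is bigger" on a common footing is the mean squared error decomposition, with $\hat\theta$ an estimator of a true value $\theta$:

```latex
\operatorname{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]}_{\text{variance}}
```

For the +10 example, the bias term contributes $10^2 = 100$, so a biased estimator with variance $\sigma_b^2$ has an MSE of $100 + \sigma_b^2$, and an unbiased estimator is worse by this criterion whenever its variance exceeds that.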
madd0ct0r
Sith Acolyte
Posts: 6259
Joined: 2008-03-14 07:47am

Re: Statistics question

Post by madd0ct0r »

Since neither of you is an economist, I'd recommend taking World Bank GDP PPP data, and if that is unavailable, look at http://search.stlouisfed.org/search?&cl ... =GDP%20ppp

This is one of the world-standard economic data resources, used for all sorts of models (like how many billion-dollar power plants we need to build).
"Aid, trade, green technology and peace." - Hans Rosling.
"Welcome to SDN, where we can't see the forest because walking into trees repeatedly feels good, bro." - Mr Coffee

Re: Statistics question

Post by mr friendly guy »

Long answer: from what I understand, it's debated among economists whether PPP or nominal is the better measure. Certainly China uses nominal (it didn't call itself the 2nd largest economy until it surpassed Japan in nominal terms, whereas it would have done so years earlier in PPP terms). Strangely enough, Japan also didn't concede until it fell behind in nominal terms. Funnily enough, I don't see many sources saying India is the third largest economy either (it would be in PPP terms, but in nominal terms India isn't even in the hunt, sitting at number 10).

I bet that when China surpasses the US in PPP terms (possibly at the end of this year), both the US and China will still say "No, the USA is still number one." For the reasons stated, I feel it's safer to go with nominal. My opponent, of course, was quite happy to debate in nominal terms until they no longer gave him the answer he wanted, and now insists on PPP. It's clear it's just a ploy to pick whichever number suits him.

Secondly, there is no point in using the World Bank figures, because he already quoted them when they suited him and now rejects their methodology when it doesn't give him the answer he wants. Every time I rub this little fact in, he just runs away.

Re: Statistics question

Post by madd0ct0r »

If he rejects World Bank methodology and has a better method, he shouldn't be pissing around on the internet when he could be working at Goldman Sachs.

PPP has its uses - it's what the models for appliance ownership (like fridges) use, and it seems to be especially relevant when the product in question can be produced domestically AND imported. That does make it the relevant measure for your argument, although I personally think you're both completely wrong. Films are sold where there's marketing AND a market for them. It's not as simple as 'American films'. There are summer blockbusters; effects-heavy, plot-light extravaganzas like Gravity; thrillers (universally understood); indie comedies (which struggle to escape their home country because of cultural jokes); etc. etc. etc.

If the film industry in a country is small, they tend to produce stuff that's cheaper to make so more plot or comedy or drama driven. Actors are cheap, explosions less so. Some of those types of films do well internationally, some don't. I've been mistaken for Mr Bean in every fucking country I've ever been to, and that's really not because they want to be like him.

Re: Statistics question

Post by mr friendly guy »

madd0ct0r wrote: If he rejects World Bank methodology and has a better method, he shouldn't be pissing around on the internet when he could be working at Goldman Sachs.
Buddy, he makes simple mathematical mistakes so of course he won't be working at Goldman Sachs.

He rejects the World Bank methodology regarding how we convert currencies. That is, the World Bank averages out exchange rates for its conversion. He prefers to use whichever recent exchange rate (over a brief period of time) gives him an advantage, although when pressed he will say "ah, but I will compromise and use an average exchange rate over a longer period" (obviously not as long as the World Bank's).
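A rough sketch of why the choice of exchange rate matters, using hypothetical local-currency GDP and exchange rates for a single country. It compares conversion at the latest year's rate with conversion at a plain three-year average. The World Bank's actual procedures (for example its Atlas method, which also adjusts for price differences) are more involved, so the simple average here is an illustrative simplification, not its method.

```python
# Hypothetical figures for one country, for illustration only.
gdp_local = 60_000.0                        # GDP in local currency units
rates = {2011: 6.5, 2012: 6.3, 2013: 6.1}   # local units per US dollar (made up)

latest_year = max(rates)
spot_rate = rates[latest_year]
avg_rate = sum(rates.values()) / len(rates)  # plain three-year average

gdp_usd_spot = gdp_local / spot_rate
gdp_usd_avg = gdp_local / avg_rate

print(f"converted at {latest_year} rate:   {gdp_usd_spot:,.1f} USD")
print(f"converted at 3-year average: {gdp_usd_avg:,.1f} USD")
```

A shorter window makes the converted figure track recent currency movements more closely; a longer one smooths them out, which is why the window length alone can change the comparison.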
PPP has its uses - it's what the models for appliance ownership (like fridges) use, and seems to be especially relevant when the product in question can be produced domestically AND imported.
Sure. I will note, however, that the World Bank's definitions of what makes a country "low income", "middle income" or "high income" (i.e. rich) are not set in PPP terms; they are based on GNI per capita converted with the World Bank's Atlas method.

That does make it the relevant measure for your argument, although I personally think you're both completely wrong. Films are sold where there's marketing AND a market for them. It's not as simple as 'American films'. There are summer blockbusters; effects-heavy, plot-light extravaganzas like Gravity; thrillers (universally understood); indie comedies (which struggle to escape their home country because of cultural jokes); etc. etc. etc.
American films were only used as an example because they are sold throughout the world. The same principle we are arguing about could be applied to other countries' films. I just don't believe people watch American films because "America is rich" or because the films predominantly show a "rich American lifestyle" which poorer people would want to emulate. Heck, if people wanted to become enamoured of a rich lifestyle, they could go watch "Lifestyles of the Rich and Famous" rather than Godzilla.

I still think production values (script, special effects etc.) win out. If I were to compare an American film to, say, a film in a language other than English, the American film automatically has an advantage because of the widespread knowledge of English in countries which don't speak it as a native language. A way to get around this is simply to compare an American film with high production values to an American film with poor production values, i.e. a B-grade film. We both know the B-grade movie loses out.

I would go further and argue that if we were to compare an American film with a foreign film in that market (and let's assume that market has people with limited English, who have to watch it with subtitles), the American film would still do very well despite the disadvantage of the language barrier. This appears to be the case if we look at US films in the next two largest movie markets after North America, that is Japan (number 3 market as I write this) and China (number 2 market as of 2013). Arguably both these countries have a bit of a language barrier, yet the top-grossing films in those nations are American ones. For Japan (a high income country) American films hold spots 1, 3, 4, 6, 10, 11-15, 17 and 19-20 of the top 20 (http://en.wikipedia.org/wiki/List_of_hi ... s_in_Japan), and for China (a middle income country) American films hold spots 1, 3, 5 and 9 of the top 10 (http://www.hollywoodreporter.com/galler ... 19-million). Now my opponent could say that the strength of American films in these markets is due to America being rich, with a high GDP/capita, which allows it to overcome the language barrier, cultural jokes, the norms of that culture etc. But he might then have trouble explaining why American films do better in richer Japan, since you would expect a richer audience to be less eager to ape the American lifestyle.
If the film industry in a country is small, they tend to produce stuff that's cheaper to make so more plot or comedy or drama driven. Actors are cheap, explosions less so. Some of those types of films do well internationally, some don't. I've been mistaken for Mr Bean in every fucking country I've ever been to, and that's really not because they want to be like him.
Of course. But if you have the means to produce high production values, you also have the means to hire more writers, and you would not need to compromise much on the writer's vision simply because you couldn't create the effects required etc. This would arguably make the final product not only better, but also generate more revenue (whether it's profitable is another matter).

Re: Statistics question

Post by madd0ct0r »

But if you have the means to produce high production values, you also have the means to hire more writers, and you would not need to compromise much on the writer's vision simply because you couldn't create the effects required etc.
I disagree with your model that domestic GDP (whether PPP, per capita or absolute) is proportional to the size of the film financing sector, which in turn is proportional to how well those films do internationally.

The latter link is probably true. The more money there is in the sector, the more money the publisher can afford to spend on advertising (on average). Heck, international advertising might even be financed off the back of better-than-expected domestic profits.
One caveat: if two countries both had $2 billion film sectors, but one has 2 publishers and the other 2000, I'm going to bet that the first one has a MUCH bigger advertising budget per film. That is what I would argue drives bums on seats, because looking at those lists, it sure isn't quality writing or acting.

The first link is problematic. You'd have to demonstrate a correlation between a country's GDP (of whatever type) and the size of its film industry.

Re: Statistics question

Post by mr friendly guy »

Actually he has to show the link, since he is the one postulating that higher GDP/capita means more people buy your films, whereas I think more money in film production leads to films doing better. One doesn't necessarily need a large GDP/capita to churn out high-budget films. If you have a large market, say China, you can make a few big-budget films yourself (and China has done so) even though it is a middle income country.

Re: Statistics question

Post by madd0ct0r »

OK, so you'd have to show stronger correlations between film sector size and results than he can between GDP/capita and results.
That's going to be a fun data set to build :)
You'd need to avoid selection bias (simply limiting it to the top 30 would give you false correlations)
You'd also need the size of those industries when each film came out - many of the top grossers have been pulling in money for years, so you'd need to account for that.
Finally, you'll probably need to do a multivariate regression to account for the possibility that you are both correct. A sensitivity check might give one of you bragging rights if their factor is significantly better at predicting behaviour than the other.
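A minimal sketch of two of the points above, on simulated data only: `gdp_pc`, `sector` and `box_office` are hypothetical variables, not real film or economic figures. By construction the two predictors are independent and the outcome depends on both. Keeping only the top 30 by outcome (like a "top grossing" list) induces a spurious negative correlation between the predictors, while a two-predictor least-squares fit on the full sample recovers both effects.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def corr(xs, ys):
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the centred normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

rng = random.Random(42)
n = 200  # hypothetical number of observations
gdp_pc = [rng.gauss(0, 1) for _ in range(n)]   # independent of sector by construction
sector = [rng.gauss(0, 1) for _ in range(n)]
box_office = [g + s + rng.gauss(0, 0.5) for g, s in zip(gdp_pc, sector)]

# Selection bias: keep only the 30 highest outcomes.
top = sorted(range(n), key=lambda i: box_office[i])[-30:]
corr_full = corr(gdp_pc, sector)
corr_top = corr([gdp_pc[i] for i in top], [sector[i] for i in top])
print(f"predictor correlation, all {n}: {corr_full:+.2f}")
print(f"predictor correlation, top 30:  {corr_top:+.2f}")

a, b_gdp, b_sector = ols_two_predictors(gdp_pc, sector, box_office)
print(f"intercept {a:+.2f}, gdp coefficient {b_gdp:.2f}, sector coefficient {b_sector:.2f}")
```

Both true coefficients are 1 here, so the fit should report values close to 1; on real data the coefficients, their standard errors and a sensitivity check would be what decides the argument.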

Re: Statistics question

Post by Ziggy Stardust »

You could also frame it as an analysis of variance type question, though doing so would require some sort of coarse grouping (i.e. countries with "big" film sectors and ones with "small" film sectors, based on some pre-determined cut-off). If you have some reasonable line you can draw to clearly differentiate countries into groups like that, an ANOVA might give you useful information, in conjunction with a fairly standard linear regression model.
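A minimal one-way ANOVA sketch for the grouping described above, assuming a cut-off has already been chosen and using hypothetical outcome values (made-up international box-office shares, not real data) for "big" and "small" film-sector countries. It computes the F statistic as between-group mean square over within-group mean square.

```python
def one_way_anova(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical outcome values (percent), for illustration only.
big_sector = [12.0, 15.0, 11.0, 14.0]
small_sector = [3.0, 5.0, 4.0, 6.0]

f_stat, df1, df2 = one_way_anova([big_sector, small_sector])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # F(1, 6) = 57.80
```

In practice a library routine such as SciPy's `scipy.stats.f_oneway` returns the same F statistic together with a p-value.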