This TARSK (Things All Researchers Should Know) looks at the use of the word proof in market research.

The first point, and one widely agreed since the writings of Karl Popper, is that empirical observations can never prove that a universal rule is true; all observations can do is provide evidence. As the evidence for something grows, we start to assume that ‘for all practical purposes’ it is true. The scientific method, post Popper, is to work on the principle of falsifiability, i.e. a theory should be expressed in a way that allows it to be proved wrong.

Quite often the everyday language we use can cloud issues related to truth and proof. For example, in 1954 Roger Bannister became the first man to be officially recorded as running a mile in under four minutes. This was widely reported as proving that man could indeed run a mile in under four minutes, something which had previously been doubted. However, it is probably clearer to see Bannister’s run as disproving the theory “All men take four or more minutes to run one mile”.

To put this in a market research context, let’s look at two possible proof-related questions.

“Prove that some of XXXX’s customers are satisfied”. To do this, all we have to do is find two people (one, if we accept that one counts as some) who are genuine customers and who are satisfied. This is a task that research can do, although we might want to caveat it by saying that we have proved that some XXXX customers say they are satisfied. In a case reported in the media, Ryanair recently tried to claim the flat of a journalist who had bet her flat that no Ryanair customer had ever said they were satisfied.

“Prove that most XXXX customers are dissatisfied”. This would only be possible if we could conduct a census of XXXX customers or, failing that, ask more than 50% of them, if the dissatisfaction turned out to be very high (so that the dissatisfied customers we had spoken to were, on their own, more than half of the total). This census would have to take place at a single moment in time, so it is not possible for most non-trivial cases. To test this sort of proposition we typically conduct research on a sample. If we interview a sample of, say, 1000 customers out of a total of 2 million customers, and we find that ALL 1000 are dissatisfied, we have not proved that a majority of the 2 million are dissatisfied; we have a finding that makes it highly probable that a majority of the 2 million are dissatisfied.
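To give a feel for what "highly probable" means here, the following is a rough sketch in Python rather than a definitive calculation. It assumes the 1000 interviews are a simple random sample drawn without replacement from the 2 million customers (real samples rarely are), and the function name is made up for illustration. It asks: if only half of all customers were dissatisfied, how likely is it that every one of the 1000 sampled would be dissatisfied?

```python
import math


def log10_prob_all_dissatisfied(population, dissatisfied, sample_size):
    """log10 of the chance that every sampled customer is dissatisfied.

    Assumes a simple random sample drawn without replacement
    (an assumption of this sketch, not something the sample guarantees).
    """
    if sample_size > dissatisfied:
        # There are not enough dissatisfied customers to fill the sample.
        return float("-inf")
    log_p = 0.0
    for i in range(sample_size):
        # Chance that the next person drawn is dissatisfied, given that
        # all i people drawn so far were dissatisfied.
        log_p += math.log10((dissatisfied - i) / (population - i))
    return log_p


# Borderline case: exactly half of the 2 million are dissatisfied.
borderline = log10_prob_all_dissatisfied(2_000_000, 1_000_000, 1000)
print(f"log10 of the probability: {borderline:.1f}")
```

Under these assumptions the probability is in the region of 10 to the power of minus 301, which is vanishingly small but still not zero. That gap between "vanishingly unlikely to be wrong" and "proved" is the point being made above.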

Quite often, in market research, we are interested in comparing two methods, for example two methods of delivering a product, two types of advertising, or even two methods of market research. We can explore the concept of proof further by looking at this issue of two research methods.

If we have two methods for research (let’s call them A and B) and we test them to see if they give the same result, we might find A=B (A is equal to B) or A≠B (A is not equal to B). These two results give two different inferences:

A=B implies that sometimes A gives the same result as B, but it does not imply that we know that A sometimes gives a different result to B. It could be that A always gives the same result as B, or it could be that A only sometimes does.

A≠B implies that sometimes A gives a different result to B, but it does not imply that we know that A sometimes gives the same result as B. It could be that A always gives a different result to B, but it could be that A sometimes gives the same result as B.

To put this into the context of yesterday’s post, where I pointed out that Gongos Research had NOT proved the validity of smartphone research, we could see A as a traditional method and B as smartphone research. If the results from a test (or some tests) show that A is sufficiently similar to B for us to say A=B (in a practical or commercial sense), then we have shown that sometimes A gives the same results as B. The more tests that are conducted, the more confident we will be that A=B, if the results keep going the right way. However, no amount of testing will ever prove that A=B; it will simply make it more likely to be true. By contrast, a single result where A≠B will mean that we are sure that A does not always give the same result as B.
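The asymmetry between the two outcomes can be sketched in a few lines of Python. This is only an illustration: the function names, the wording of the verdicts, and the assumption that each test is independent with a fixed chance of agreeing are all mine, not part of any published method.

```python
def verdict(test_agreements):
    """Summarise a series of A-versus-B tests (True means the results agreed)."""
    if not test_agreements:
        return "no evidence"
    if all(test_agreements):
        # Any number of agreements supports A=B but cannot prove it.
        return "supported, not proved"
    # A single disagreement is enough to refute 'A always matches B'.
    return "refuted: A does not always match B"


def chance_of_clean_record(agreement_rate, n_tests):
    """Chance that n independent tests all agree, for a given true agreement rate."""
    return agreement_rate ** n_tests


print(verdict([True] * 10))
print(verdict([True] * 10 + [False]))
# Two methods that really disagree one time in ten would still pass
# ten tests in a row roughly a third of the time.
print(round(chance_of_clean_record(0.9, 10), 3))
```

Ten agreeing tests are therefore quite compatible with two methods that disagree fairly often, which is why a run of agreements makes A=B more plausible without ever proving it.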

So, if we do an ad test with 100 people and all 100 score it in the top box, we have not proved that people like it; we have proved that the 100 people in the test ticked the top box, and we think it is very, very likely that a large proportion of the wider population will like it.
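One way to put a number on "a large proportion" is a one-sided exact (Clopper-Pearson) lower bound, which has a simple closed form when every respondent gives the same answer. The sketch below is illustrative only: it assumes the 100 people are a simple random sample of the wider population, which ad-test samples often are not, and the function name is hypothetical.

```python
def lower_bound_when_all_agree(n, alpha=0.05):
    """One-sided exact lower confidence bound on a proportion when n out of n agree.

    If the true proportion were p, the chance of n out of n agreeing is p ** n.
    The bound is the value of p at which that chance falls to alpha.
    Assumes a simple random sample (an assumption of this sketch).
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return alpha ** (1.0 / n)


bound = lower_bound_when_all_agree(100)
print(f"95% lower bound on the top-box proportion: {bound:.3f}")
```

With 100 out of 100 the bound comes out at about 0.97, so even under ideal sampling the test is consistent with roughly 3% of the wider population not ticking the top box. The result is strong evidence, not proof.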

If you are interested in tests that compare methods you might also be interested in TARSK 16, which looks at why showing that two results are not significantly different does not necessarily imply they are similar.