The Quality of Evidence

Mintaka · 4 November 2009 07:24

How much does it take for you to change your mind? How easily are you convinced of a new idea, concept or proposition? For most of us, it depends on how highly we rate the available evidence in support of such a proposal. In one corner, there are those who are willing to accept notions like ESP based on anecdotes of abandoned family pets that mysteriously cross hundreds of kilometers to reunite with their masters. At the other extreme, some continue to deny the lunar landing in the face of some pretty sound evidence supporting the event.

So it looks like an individual’s acceptance of evidence is largely subjective. The very same tidbit of evidence can be considered by one person as overwhelming, while another pooh-poohs it altogether. Why is this?

Let us define the Quality of Evidence (Q_E) as the capability of the evidence to convince a given individual: the “weight” that the evidence carries, in legal jargon.

Note that the same piece of evidence may well have a different Q_E for different individuals.

Can the Q_E be quantified? What I would like to develop through this thread is a fun-but-sane expression to try and evaluate the Q_E for a given person. I’m hoping to learn from this why someone would take the Bible as gospel, but shun the Qur’an or the Grimm brothers. Or why evolution is still denied by some, and widely accepted by others.

As a starting point, and based solely on intuition, I thumbsucked the following list:

Things that may influence the quality Q_E of evidence E that supports proposition P:

Authority of the source of E - large effect, direct
Novelty of P - medium effect, inverse
Age of individual - small effect, inverse
Cultural support of P - large effect, direct
Agenda - huge effect, either way

Please add to the list more stuff that you reckon could modify one’s acceptance of evidence, as well as the magnitude (small, medium, large etc.) and the direction of its effect.

Mintaka

Mefiante · 4 November 2009 09:14

Studies in cognitive science that examine how beliefs come to be formed and held by people reveal that perhaps the single most significant factor is direct personal experience. (You may have meant something like this with the first item on that list – “Authority of the source of E”.) Evolutionarily, this makes good sense: not trusting your sensory input and/or deliberating for too long on how it was produced is a dangerous thing in many instances. Snap judgements on imprecise information, even if wrong in their details, will mostly serve the purpose of avoiding dangers. Thus, we simply can’t afford not to trust the accuracy of our perceptions, and this is why one’s own direct experience is such a biggie in belief.

Now not to confuse the issue with unnecessary pedantry, but “weight of evidence” has both a legal and a statistical definition, and these definitions aren’t the same. Nevertheless, both definitions serve the purpose of evaluating the plausibility of a given claim (hypothesis).

One headache that occurs in statistical hypothesis testing is the practically unavoidable occurrence of Type I and Type II errors. A Type I error (a false positive) is made when we accept a claim that is in fact false, and a Type II error (a false negative) happens when we reject a claim that is in fact true. But it’s not hard to see that these errors are just formal definitions of the errors we are prone to in ordinary evaluations of propositions we are faced with day-to-day: accepting nonsense or rejecting truth, as the case may be.

In a more rigorous setting, these errors can have serious consequences. For example, most epidemiological studies (looking for risk factors in a given clinical condition, e.g. the role of butter vs. margarine in heart disease) are done at a significance level α = 0.05. This means that, where no real effect exists, about one in every 20 studies will on average still report one, i.e. commit a Type I error; the rate of Type II errors is governed separately by the study’s statistical power. (Needless to say, this does not apply where a well-established causal chain between risk factor and clinical condition has been identified, i.e. where the mechanism itself is understood by which the risk factor contributes to the clinical condition.)
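
A quick way to see what α = 0.05 means in practice is to simulate many studies of an effect that does not exist and count how often they come up “significant” anyway. The sketch below is only an illustration under assumed conditions: the study count, the sample size of 30, the normally distributed data with known spread and the simple two-sided z-test are all my own choices, not something taken from any real epidemiological study.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(n_studies=20000, n_per_study=30, alpha=0.05, seed=1):
    """Fraction of simulated studies that report an effect when none exists."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    std_normal = NormalDist()          # standard normal, used for the p-value
    hits = 0
    for _ in range(n_studies):
        # Every sample is drawn from a population whose true mean is 0,
        # so any "significant" result is a false positive (Type I error).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
        z = sum(sample) / math.sqrt(n_per_study)    # z-statistic, sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided test
        if p_value < alpha:
            hits += 1
    return hits / n_studies

rate = false_positive_rate()
print(f"false-positive rate: {rate:.3f}")
```

The printed rate should land close to 0.05, i.e. roughly one spurious “finding” in every 20 studies of a non-existent effect. Note that this says nothing about Type II errors, which depend on how large the real effect is and how big the study is.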

These errors are the reason that repeatability is so important: Assuming comparable qualities, if ten studies show a positive effect, and one shows a negative, it is then reasonable to accept provisionally that a positive effect is indeed established. In contrast, if one has just two studies that tell conflicting stories (again assuming comparable worthiness), no conclusion should be drawn.
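
To put rough numbers on why replication matters, one can ask how likely it is that the studies are all flukes. The sketch below assumes, as an idealisation, that the studies are independent and that each has a chance α = 0.05 of reporting a spurious positive when there is no real effect. It ignores publication bias and differences in study quality, and the function name is just something I made up for the illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance that at least k of n independent studies are false positives,
    when each one has probability p of being a false positive."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

alpha = 0.05

# Ten or more spurious positives out of eleven studies of a non-existent effect.
ten_of_eleven = prob_at_least(10, 11, alpha)

# At least one spurious positive out of just two studies.
one_of_two = prob_at_least(1, 2, alpha)

print(f"10+ of 11 by chance: {ten_of_eleven:.2e}")
print(f"1+ of 2 by chance:   {one_of_two:.4f}")
```

Under these assumptions, ten flukes out of eleven is a vanishingly small probability, whereas one fluke among two studies happens almost one time in ten. That is the arithmetic behind accepting the first case provisionally and drawing no conclusion from the second.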

All of the above is a quite roundabout introduction to saying that an individual’s Q_E will be influenced by how easily he or she is swayed by a few observations. If one or two observations of something unusual (e.g. seeing a UFO) are sufficient to cement belief in some otherwise unproven explanatory hypothesis (e.g. visiting aliens), then the individual’s Q_E demand is low and he or she is likely to make Type I errors. But there’s a flipside to this. Certain beliefs are adamantly maintained even in the face of the most overwhelming evidence that the belief is erroneous. Therefore, there comes a point beyond which unreasonable conservatism effectively starts eroding a person’s Q_E demand again because the evidence’s actual quality is simply ignored or denied, making Type II errors likely. There is thus a region of optimal Q_E demand between the extremes of gullibility and unshakeable conviction.

'Luthon64

Mintaka · 4 November 2009 14:01

Thanks 'Luthon. Have included “personal experience”, and assigned some arbitrary numbers as indicative of supposed importance and direction of each term.

Second draft:

List of things that may influence the quality Q_E of evidence E supporting proposition P:

Reliability of the Source of E

Own experience = 10000
Trusted second party = 100
Neutral (unknown) second party = 10
Distrusted second party = 1

Agenda

Defensive (preconceivably agrees with P) = 1000
Neutral = 1
Offensive (preconceivably disagrees with P) = 0.001

Compatibility of P with current beliefs or body of knowledge

Current beliefs incompatible with P are cherished = 0.001
Current beliefs incompatible with P are provisionally held = 0.01
Current beliefs neither contradict nor resonate with P = 1
Current beliefs compatible with P are held = 10

Knowledge of the subject matter of P (modifier to compatibility)
Poor = Compatibility score x 0.01
So-so = Compatibility score x 0.1
OK = Compatibility score x 1
Told you! = Compatibility score x 10
We’re not worthy… = Compatibility score x 100

Individualism = 1(culturally apologetic) to 10(very individualistic)

Conservatism = 0.01(very conservative) to 1(neophile)

Age of individual = (100-Age)

“There is thus a region of optimal Q_E demand between the extremes of gullibility and unshakeable conviction.”

Yes! Hopefully such an optimum score will coincide with the profile of a healthy skeptic. ;D

Mintaka

Irreverend · 5 November 2009 06:37

Hmm, log scales mostly. You mean to multiply partial scores together to get a composite score? If so the dynamic range would be huge!
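
To put a rough number on that: here is a minimal sketch, assuming the partial scores from the second draft really are meant to be multiplied together (the draft doesn’t actually say so) and assuming ages run from 0 to 99 so that (100-Age) stays between 1 and 100. The dictionary name and layout are purely my own.

```python
from math import log10, prod

# (lowest, highest) value of each factor, read off the second draft.
FACTOR_RANGES = {
    "source":        (1, 10_000),    # distrusted second party .. own experience
    "agenda":        (0.001, 1_000), # offensive .. defensive
    "compatibility": (0.001, 10),    # cherished incompatible .. compatible
    "knowledge":     (0.01, 100),    # modifier applied to compatibility
    "individualism": (1, 10),
    "conservatism":  (0.01, 1),
    "age":           (1, 100),       # (100 - Age), assuming ages 0 to 99
}

# Smallest and largest composite score if every factor is simply multiplied.
lowest = prod(low for low, _ in FACTOR_RANGES.values())
highest = prod(high for _, high in FACTOR_RANGES.values())

# Dynamic range expressed in orders of magnitude (decades).
decades = log10(highest / lowest)

print(f"lowest  = {lowest:.0e}")
print(f"highest = {highest:.0e}")
print(f"range   = about {decades:.0f} orders of magnitude")
```

On those assumptions the composite spans roughly 23 orders of magnitude, so summing the logs of the partial scores would probably be easier to handle than the raw product.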

Mintaka · 5 November 2009 19:06

You are right, Irreverend. It won’t evaluate very well at this stage. For now the numbers are just indicative of proposed relative importance, and are of course up for debate.

Well done and congratulations on the full membership!

Mintaka