Living in a Bayesian world: scientific deduction through induction

Agrawal, Anurag (2012) Living in a Bayesian world: scientific deduction through induction. Current Science, 102 (5). pp. 676-677. ISSN 0011-3891

Full text not available from this repository.

Official URL: https://www.jstor.org/stable/24084451

Abstract

Every day we wake up with an existing model of the world in our head, incorporating any new data into the model, with a nip here and a tuck there, permitting us to seamlessly (and subconsciously) estimate the likelihood of a variety of events. This system has served us well in understanding the world we live in, but it is difficult to apply towards objective, unbiased assessment of scientific problems. Almost by definition, the models we use to understand the world are subjective, emerging inferentially from our cumulative experiences. For example, that the sun rises every day from the east permits us to infer that it should rise every day from the east, and every such observation further strengthens the inference. While this approach, referred to as induction, i.e. where a set of repeated observations allows us to infer or induce a larger relationship, is powerful and natural, it is not suited to making sense of solitary sets of experimental data.

Experimental science is supposed to be objective, with rigorous testing of hypotheses through well-designed experiments. In this approach, we do not prove the hypothesis to be true, but rather, through a series of falsification tests, try to reject it. For example, we could take as our hypothesis that the sun may rise from any direction. Within a few days of observation, we could reject this hypothesis, since the observations would be extremely unlikely under it. Similarly, any hypothesis other than the sun rising in the east could be rejected. The obvious limitation of this approach is that while it does exceedingly well at identifying anomalies between the data and the hypothesis, it does not necessarily provide a resolution. Blind application of such methods may lead us to reject a hypothesis (typically the null hypothesis) because it appears unlikely, without much consideration of the likelihood of the alternative, which unfortunately often remains unstated.

Sherlock Holmes is supposed to have said that once you eliminate the impossible, whatever remains, however improbable, must be true. He sees the world in black and white – impossible and possible. Shades of grey corresponding to degrees of probability are ignored. That corresponds to deductive logic, where hypotheses are tested and eliminated. An example of a different, more inductive, approach is a physician examining a patient. A number of competing hypotheses (diagnoses) emerge at every step of the encounter, from watching the patient walk in, to eliciting a medical history, performing the examination, and ordering and interpreting laboratory tests. Each piece of information changes the model and dictates the next piece of data required, and all conclusions are nuanced by probabilities. In this approach, assessment of the prior probabilities of different diseases is important, corresponding to medical wisdom like ‘common things are common’ and ‘when you hear hoof beats, think horses, not zebras’. Thus a strongly positive syphilis test in a nun could be ignored, even if only 5% of uninfected people have a positive test, if there appears to be no good reason to pursue it further. It would be valid to say that because the disease is extremely unlikely in a nun, it is more likely that the test is a false positive, even though a false-positive test may be unlikely when considered as an independent hypothesis.
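To make the arithmetic behind this example concrete, a minimal sketch of the Bayesian update is given below. The 5% false-positive rate is taken from the text; the prior prevalence of disease in this patient (1 in 10,000) and the test sensitivity (95%) are illustrative assumptions chosen for the sketch, not figures from the article.

# Bayesian update for the 'strongly positive test, very unlikely disease' example.
# Assumed numbers (illustrative only): prior prevalence 1 in 10,000, sensitivity 95%.
# The 5% false-positive rate is the figure mentioned in the text.

prior = 1 / 10_000        # P(disease) before seeing the test result (assumption)
sensitivity = 0.95        # P(positive test | disease) (assumption)
false_positive = 0.05     # P(positive test | no disease), as stated in the text

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / p_positive

print(f"P(disease | positive test) = {posterior:.4f}")   # ~0.0019, i.e. about 0.2%

Despite a test that is positive in only 5% of uninfected people, the posterior probability of disease stays below 1%, because the prior probability was so low to begin with.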
A different observation in the patient described above, like a tattoo in a private area, may again shift the balance and lead us towards different conclusions, even if the patient claimed to be a nun. Each piece of data acts upon a prior probability distribution to yield a posterior probability distribution, and it takes strong new information to significantly shift a strong prior. Mathematical structures that capture the essence of this relatively more complex reasoning system exist, generally referred to as Bayesian methods. Application of these methods towards scientific deduction is challenging but possible, and will be briefly discussed in the context of common problems in the biological sciences.

Thomas Bayes, after whom Bayesian methods are named, lived more than two and a half centuries ago. For two events A and B, he addressed the relationship between the probability of event A given that event B has occurred [P(A|B)] and the probability of event B given that A has occurred [P(B|A)]. This is a common problem in science, where we would like to know the unknown probability of model A given data B, but instead rely upon calculation of the exact probability of data B given model A. It should be obvious that the P value, on which most scientific decisions are based, is a special case of P(B|A), where B is the data distribution and A the null hypothesis. Yet it does not allow us to understand P(A|B), i.e. the probability of the null hypothesis being true. To illustrate, P = 0.05 only means that, assuming the null hypothesis (model A) to be true, the probability of a data distribution as extreme as or more extreme than the one seen (B) is 5%. To think that this implies that the probability of the null hypothesis being true is 5% is a common error. The probability of the null hypothesis being true, given any dataset, cannot be calculated by objective Fisherian statistics. It can only be estimated using Bayesian methods that include a subjective component of prior probability distributions.
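In symbols, and as a standard statement rather than a quotation from the article, the relationship Bayes addressed can be written as

\[
P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A}),
\]

so that recovering P(A|B), the probability of the model given the data, from P(B|A), the direction in which a P value reasons, requires the prior probability P(A).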

Item Type: Article
Source: Copyright of this article belongs to Current Science Association.
ID Code: 120989
Deposited On: 08 Jul 2021 07:29
Last Modified: 08 Jul 2021 07:29
