Statistics for HCI. Alan Dix


      Figure 2.3: Monty Hall problem—Should you swap doors? (source: https://en.wikipedia.org/wiki/Monty_Hall_problem#/media/File:Monty_open_door.svg).

      Is it fair?—Has the way you have selected people made one outcome more likely? For example, if you do an election opinion poll of your Facebook friends, this may not be indicative of the country at large!

      For surveys, has there been self-selection?—Maybe you asked a representative sample, but who actually answered? Often you get more responses from those who have strong feelings about the issue. For usability of software, this probably means those who have had a problem with it.

      Have you phrased the question fairly?—For example, people are far more likely to answer “Yes” to a question, so if you ask “do you want to leave?” you might get 60% saying “yes” and 40% saying “no,” but if you asked the question in the opposite way “do you want to stay?,” you might still get 60% saying “yes.”

      We will discuss these kinds of issue in greater detail in Chapter 11.

      Simple techniques can help, but even mathematicians can get it wrong.

      It would be nice if there was a magic bullet to make all of probability and statistics easy. I hope this book will help you make more sense of statistics, but there will always be difficult cases—our brains are just not built for complex probabilities. However, it may help to know that even experts can get it wrong!

      We’ll look now at two complex issues in probability that even mathematicians sometimes find hard: the Monty Hall problem and DNA evidence. We’ll also see how a simple technique can help you tune your common sense for this kind of problem. This is not the magic bullet, but it may sometimes help.

      There was a quiz show in the 1960s where the star prize was a car. After battling their way through previous rounds, the winning contestant had one final challenge. There were three doors, behind one of which was the prize car, but behind each of the other two was a goat.

      The contestant chose a door, but to increase the drama of the moment, the quizmaster did not immediately open the chosen door. Instead, they opened one of the others. The quizmaster, who knew which was the winning door, would always open a door with a goat behind. The contestant was then given the chance to change their mind. Imagine you are the contestant. What do you think you should do?

      • Should you stick with the original choice?

      • Should you change to the remaining unopened door?

      • Or, doesn’t it make any difference?

      Although there is a correct answer, there are several apparently compelling arguments in either direction:

      One argument is that, as there were originally three closed doors, the chance of the car being behind the door you chose first was 1 in 3, whereas now that there are only two closed doors to choose from, the chance of it being behind the one you didn’t choose originally is 1 in 2, so you should change. However, the astute may have noticed that this is a slightly flawed probabilistic argument, as the probabilities don’t add up to one.
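      Concretely, those two figures sum to 1/3 + 1/2 = 5/6, not 1, so at least one of them must be wrong.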

      A counter argument is that at the end there are two closed doors, so the chances are even as to which has the car behind it, and hence there is no advantage to changing.

      An information theoretic argument is similar—the remaining closed doors hide the car equally before and after the other door has been opened: you have no more knowledge, so why change your mind?

      Even mathematicians and statisticians can argue about this, and when they work it out by enumerating the cases, they do not always believe the answer. It is one of those cases where common sense simply does not help … even for a mathematician!

      Before revealing the correct answer, let’s have a thought experiment.

      Imagine if instead of three doors there were a million doors. Behind 999,999 doors are goats, but behind the one lucky door there is a car.

      I am the quizmaster and ask you to choose a door. Let’s say you choose door number 42. I now open 999,998 of the remaining doors, being careful to open only doors that hide goats. You are left with two doors: your original choice and the one door I have not opened. Do you want to change your mind?


      Figure 2.4: Monty Hall with a million doors?

      This time it is pretty obvious that you should change. There was virtually no chance of your having chosen the right door to start with, so the car is almost certainly (999,999 times out of a million) behind one of the others—I have helpfully discarded all the rest, so the remaining door I didn’t open is almost certainly the correct one.

      It is as if, before I opened the 999,998 ‘goat’ doors, I’d asked you, “do you think the car is precisely behind door 42, or any of the others?”

      In fact, exactly the same reasoning holds for three doors. In that case there was a 2/3 chance that the car was behind one of the two doors you did not choose, and as the quizmaster I discarded one of those, the one that hid a goat. So it is twice as likely as your original choice that the car is behind the door I did not open. Regarding the information theoretic argument: the act of opening the goat door does add information because the quizmaster knows which door hides the car, and only opens a goat door. However, it still feels a bit like smoke and mirrors with three doors, even though the million-door version is obvious.
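      If the argument still feels like smoke and mirrors, a short simulation makes the answer hard to deny. Below is a minimal Monte Carlo sketch in Python (my own illustration, not from the book); it plays the game repeatedly for any number of doors, so it covers both the three-door and the million-door versions.

      import random

      def play(n_doors=3, switch=True):
          """Play one round; return True if the contestant wins the car."""
          car = random.randrange(n_doors)
          choice = random.randrange(n_doors)
          if choice == car:
              # The host opens every other door but one, all goats; the
              # door he leaves closed is a randomly chosen goat door.
              left_closed = random.choice(
                  [d for d in range(n_doors) if d != choice])
          else:
              # The host must leave the car's door closed.
              left_closed = car
          final = left_closed if switch else choice
          return final == car

      def win_rate(n_doors, switch, trials=100_000):
          return sum(play(n_doors, switch) for _ in range(trials)) / trials

      print(win_rate(3, switch=False))          # ~0.333 (stick)
      print(win_rate(3, switch=True))           # ~0.667 (switch)
      print(win_rate(1_000_000, switch=True))   # ~1.0 (million doors)

      Running it shows sticking winning about a third of the time and switching about two thirds, while with a million doors switching wins almost always, exactly as the reasoning above predicts.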

      Using the extreme case helps tune your common sense, often allowing you to see flaws in mistaken arguments, or work out the true explanation. It is not an infallible heuristic (sometimes arguments do change with scale), but it is often helpful.

      The Monty Hall problem has always been a bit of fun, albeit disappointing if you were the contestant who got it wrong. However, there are similar kinds of problem where the outcomes are deadly serious. DNA evidence is just such an example. Although each person’s DNA is almost unique, DNA testing is imperfect and has the possibility of error.

      Suppose there has been a murder, and traces of DNA have been found at the scene. The lab’s DNA matching has a false-match rate of one in 100,000; that is, the chance that two different people’s DNA will match by accident is 1 in 100,000.

      Imagine two scenarios.

      Case 1: Shortly before the body was found, the victim was known to have had a violent argument with a friend. The police match the friend’s DNA with DNA found at the murder scene. The friend is arrested and taken to court.

      Case 2: The police look up the DNA in the national DNA database and find a positive match. The matched person is arrested and taken to court.

      Similar cases have occurred and led to convictions based heavily on the DNA evidence. However, while in case 1 the DNA is strong corroborating evidence, in case 2 it is not. Yet courts, guided by ‘expert’ witnesses, have not understood the distinction and convicted people in situations like case 2. Belatedly, the problem has been recognised and in the UK there have been a number of appeals where longstanding cases have been overturned, sadly not before people have spent considerable periods behind bars for crimes they did not commit. One can only hope that similar evidence has not been crucial in jurisdictions with a death penalty.

      If you were the judge or jury in such a case would the difference be obvious to you?

      If not, we can use a similar trick to the one we used in the Monty Hall problem. There, we made the numbers a lot bigger; here we will make the numbers less extreme. Instead of a 1 in 100,000 chance of a false DNA match, let’s make it 1 in 100. While this is still useful, though not overwhelming, corroborative evidence in case 1, it is pretty obvious that, in case 2, a match found by trawling a large database could easily have arisen by chance alone.
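      To put the difference in numbers, here is a back-of-envelope sketch in Python (again my own illustration; the 1 in 100 rate is the simplified figure above, and the database size of 100,000 is a purely hypothetical assumption):

      false_match_rate = 1 / 100   # simplified figure from the text
      database_size = 100_000      # hypothetical size, for illustration

      # Case 1: a single, independently identified suspect is tested.
      # A chance (false) match is a 1-in-100 event: useful corroboration.
      print(f"Case 1: P(innocent suspect matches) = {false_match_rate:.0%}")

      # Case 2: the whole database is trawled for any match. Even if the
      # culprit is not in it at all, we expect many matches by chance.
      print(f"Case 2: expected chance matches = "
            f"{false_match_rate * database_size:,.0f}")

      With these numbers, trawling the database would be expected to throw up around a thousand matches by chance alone, so a bare database hit says almost nothing by itself.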