
AP Statistics
Sunday, October 23, 2016
Conditional distributions and relationships
EXAMPLE:
A small private college was curious about what levels of students were getting straight A grades. College officials collected data on the straight A status from the most recent semester for all of their undergraduate and graduate students. The data is shown in the two-way table below:
1.What type of distribution is this?
This is the marginal distribution of student level.
A conditional distribution turns each count in the table into a percent of individuals who fit a specific value of one of the variables. (In this case, a conditional distribution would give, for example, the percent of graduate students who got straight A's and the percent who did not.)
A marginal distribution shows the totals (in counts or percents) for all the values of just one of the variables. (In this case, the marginal distribution of student level shows only how many students are undergraduates and how many are graduate students; it ignores straight A status entirely.)
2.Based on these conditional distributions, what can we say about the association between student level and straight A status?
A.Graduate students were more likely to have straight A's than undergraduate students.
B.Straight A students were more likely to be graduate students than undergraduate students.
These two options have different subjects: the subject of option A is graduate students, while the subject of option B is straight A students. The subject of the sentence determines which row or column we should look at. Start with option A, whose subject is graduate students. "Graduate" is in column two, so we look at column two. Among the 500 graduate students, 60 got straight A's, so the percentage of straight A students among graduate students is 60/500 = 12%. Option A compares this with the percentage of undergraduate students who got straight A's. Using the same method, the percentage of straight A students among undergraduate students is 6%. Since 12% is greater than 6%, the statement "Graduate students were more likely to have straight A's than undergraduate students" is true.
Use the same method for option B. Its subject is straight A students, so we look at row one. Among the 300 students who got straight A's, 60 (20%) are graduate students and 240 (80%) are undergraduate students, so straight A students were clearly more likely to be undergraduate students than graduate students. Option B is therefore false.
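The two-way table itself did not survive in these notes, so the sketch below rebuilds it from the figures quoted above: 60 of 500 graduate students with straight A's, and 240 straight A undergraduates making up 6% of undergraduates, which implies 4000 undergraduates. These reconstructed counts and the helper name `conditional` are assumptions for illustration, not the original table.

```python
# Counts reconstructed from the numbers quoted in the notes (an assumption,
# since the original two-way table is missing).
table = {
    "Undergraduate": {"Straight A": 240, "Not straight A": 3760},
    "Graduate": {"Straight A": 60, "Not straight A": 440},
}


def conditional(counts):
    """Turn a dict of counts into a dict of percents of the dict's total."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}


# Option A: condition on student level (look within one column).
for level, counts in table.items():
    print(level, conditional(counts))

# Option B: condition on straight A status (look within one row).
straight_a = {level: counts["Straight A"] for level, counts in table.items()}
print("Straight A", conditional(straight_a))

# Marginal distribution of student level: the column totals only.
level_totals = {level: sum(counts.values()) for level, counts in table.items()}
print("Marginal", level_totals)
```

The same counts give different percents depending on which total they are divided by, which is why the subject of each option matters.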

Saturday, October 22, 2016
Gathering data--Experiment
Observational studies and experiments
In an observational study, we measure or survey members of a sample without trying to affect them. For example, a study took a random sample of adults and asked them about their bedtime habits. The data showed that people who drank a cup of tea before bedtime were more likely to go to sleep earlier than those who didn't drink tea. This is an observational study. In general, we can't draw cause-and-effect conclusions from an observational study alone: we can't say for sure that tea is the factor that changes when people fall asleep, because the food people eat, or something else, might cause the difference. In short, an observational study can show an association, but it cannot prove that one thing causes another.
In a controlled experiment, we randomly assign people or things to groups. One group receives a treatment and the other group does not. For example, another study took a group of adults and randomly divided them into two groups. One group was told to drink tea every night for a week, while the other group was told not to drink tea that week. Researchers then compared when each group fell asleep. This is an experiment, because the researchers set up two groups, an experimental group and a control group, which lets us actually test whether or not tea is the factor that causes the difference in when people fall asleep.
Language of experiments
EXAMPLE Karina wants to determine if kale consumption has an effect on blood pressure. She recruits 100 households and randomly assigns each household to either a kale-free diet plan or a kale-based diet. At the end of two months, she plans to compare the original and final blood pressures for members of each household.
1.What is the explanatory variable?
An explanatory variable explains changes in another variable. Karina is curious if a kale-based diet will cause changes in blood pressure. Therefore, the explanatory variable is kale consumption.
2.What is the response variable?
A response variable measures the result of a study. Karina is measuring the change in blood pressure at the end of the study. Therefore, the response variable is the change in blood pressure.
3.What are the treatments?
A treatment is the specific thing given to individuals in an experiment. Karina is giving some households a kale-based diet and other households a kale-free diet. Therefore, the treatments are the kale-based and kale-free diets.
4.Who or what are the experimental units?
An experimental unit is who or what we are assigning to a treatment. Karina is randomly assigning each household to a treatment, not individual members. Therefore, the households are the experimental units.
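As a minimal sketch of what "randomly assigns each household" could look like, the code below shuffles 100 household labels and splits them between the two treatments. The labels, the fixed seed, and the even 50/50 split are all assumptions; the example above does not say how Karina carried out the assignment.

```python
import random

# Hypothetical household labels; the example only says there are 100 households.
households = [f"household_{i}" for i in range(1, 101)]

rng = random.Random(2016)  # arbitrary fixed seed so the split is repeatable
shuffled = households[:]
rng.shuffle(shuffled)

# Assumption: an even 50/50 split between the two treatments.
assignment = {
    "kale-based": sorted(shuffled[:50]),
    "kale-free": sorted(shuffled[50:]),
}

for treatment, units in assignment.items():
    print(treatment, len(units), units[:3])
```

Note that the list being shuffled contains households, not people: every member of a household ends up on the same diet, which is what makes the household the experimental unit.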
Monday, October 10, 2016
Identifying bias in samples and surveys
Terms to know
-Biased wording--Wording that can lead people to favor certain responses over others.
e.g. A high school wanted to know what percent of its students smoke cigarettes. Counselors selected a random sample of students to take a survey on drug use. One of the questions reads, "If you are under the age of 18, do you illegally smoke cigarettes?"
Suggesting that smoking is illegal might make it less likely for students who smoke to admit they do.
-Response bias--The tendency of a person to answer questions untruthfully or misleadingly.
e.g. A high school wanted to know what percent of its students smoke cigarettes. During the week when students visited the counselors to schedule classes, they asked every student in person if they smoked cigarettes or not.
High school students who smoke aren't likely to admit it to their counselor. At the same time, it's doubtful that students would lie in the other direction—students who don't smoke probably wouldn't say that they do.
-Undercoverage--When researchers exclude members of the population from being in the sample.
e.g. A senator wanted to know about how people in her state felt about internet privacy issues. She conducted a poll by calling 100 people whose names were randomly sampled from the phone book (note that mobile phones and unlisted numbers aren't in phone books). The senator's office called those numbers until they got a response from all 100 people chosen.
Since the senator used the phone book, people who only use mobile phones, have unlisted numbers, or don't have a phone at all can't possibly be in the sample.
-Convenience sampling--Choosing a sample that is readily available without using any randomization.
e.g. David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the 100 listeners who send him fan emails.
Polling the 100 listeners who send him fan emails means that David simply chose a sample that was available to him without using any randomization. This is a convenience sample, which almost always produces biased results.
-Voluntary response sampling--Letting members of the population choose whether or not they will be in the sample.
e.g. David hosts a podcast and he is curious how much his listeners like his show. He decides to start an online poll, and he asks his listeners to visit his website and participate in the poll.
Asking all of his listeners to respond to an online poll means that David let members of the population choose whether or not they would be in the sample. This is a voluntary response sample, which almost always produces biased results.
FAQ
1.What is the most concerning source of bias in this scenario?
This kind of question simply asks you to identify which of the terms above best describes the scenario.
2.Which direction of bias is more likely in this scenario?
(Underestimate/ overestimate/ unbiased estimate)
**Consider the feelings and circumstances of the people who are asked (or left out), and which way they would push the result.
e.g.1 David hosts a podcast and he is curious how much his listeners like his show. He decides to poll the 100 listeners who send him fan emails.(Overestimate)
The results are probably an overestimate of the percentage of all listeners that love the show, because listeners who send fan email to David's show are probably more likely to love his show compared to a typical listener.
e.g.2 A senator wanted to know about how people in her state felt about internet privacy issues. She conducted a poll by calling people using random digit dialing, where computers randomly generate phone numbers so unlisted and mobile numbers can still be reached. They called over 1000 random phone numbers—most people didn't answer—until they had reached 1000 respondents.(Underestimate)
People who didn't answer their phone probably feel more strongly about privacy issues than the typical person. Having them in the sample probably would have changed the results to show more people concerned about privacy, so the poll likely underestimates that concern.
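To see why the fan email poll in e.g.1 points toward an overestimate, here is a small expected-value sketch. Every number in it is made up purely to illustrate the direction of the bias; none of them comes from the examples above.

```python
# All numbers are hypothetical, chosen only to illustrate the direction of bias.
listeners = 10_000
share_who_love_show = 0.30     # assumed true proportion in the whole audience
email_rate_if_love = 0.05      # fans are assumed more likely to send email
email_rate_otherwise = 0.005   # everyone else rarely writes in

# Expected number of emails coming from each kind of listener.
emails_from_fans = listeners * share_who_love_show * email_rate_if_love
emails_from_others = listeners * (1 - share_who_love_show) * email_rate_otherwise

# Share of the convenience sample (the emailers) who love the show.
share_in_sample = emails_from_fans / (emails_from_fans + emails_from_others)

print(f"true share: {share_who_love_show:.0%}")
print(f"share among emailers: {share_in_sample:.0%}")
```

Under these made-up rates, the emailers are far more enthusiastic than the audience as a whole, so a poll of emailers overestimates how many listeners love the show. Swapping the two email rates would flip the direction, which is the reasoning the FAQ asks for.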
Sampling and observational studies
Correlation and Causality
e.g.
Passage:
-Eating breakfast may beat teen obesity;
-Regular breakfast eaters seemed more physically active than breakfast skippers;
-Regular breakfast eaters tended to gain less weight and had a lower body mass index than breakfast skippers.
Conclusions:
-Eating breakfast--Not obese
-Eating breakfast--Active
-Skipping breakfast--Obese
**We need to consider the relationship between these behaviors and the outcomes, so think about the following questions:
1.Does the passage really show that eating breakfast leads to activity?
2.Or does it only show that being more active is associated with eating breakfast?
Causality--A causes B, or B causes A
-Eating breakfast causes not being obese
-Eating breakfast causes activity
-Skipping breakfast causes obesity
Correlation--Whenever B is happening, A tends to be happening as well
-Breakfast eating correlates with not being obese
-Breakfast eating correlates with activity
-Breakfast skipping correlates with obesity
*These relationships are all mutual: a correlation does not say which one causes the other.
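A small simulation can show how two things end up correlated without either one causing the other. In this sketch, a hypothetical lurking variable (called `routine` here) drives both a breakfast score and an activity score; the variable names, the seed, and the noise model are all assumptions for illustration and do not come from the passage.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # arbitrary fixed seed

# Hypothetical lurking variable, e.g. how regular a teen's daily routine is.
routine = [rng.gauss(0, 1) for _ in range(2000)]

# Both scores depend on the routine plus independent noise; neither one
# is computed from the other, so there is no causal link between them.
breakfast_score = [z + rng.gauss(0, 1) for z in routine]
activity_score = [z + rng.gauss(0, 1) for z in routine]

r = pearson(breakfast_score, activity_score)
print(f"correlation between breakfast and activity scores: {r:.2f}")
```

The two scores come out clearly correlated even though the code never lets one influence the other, and `pearson` gives the same value with its arguments swapped, which matches the note that correlation is mutual.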