Lecture: Chi-Square Presentation Transcript
Okay, welcome to our next lecture. This lecture marks our final test of significance, so we will have completed an introduction to some of the basic ones. Pat yourself on the back; we are almost done! In this lecture, we are going to cover that final test of significance, called chi-square.
Up until now, our dependent variable has always been quantitative. It’s been some quantifiable variable that you could directly measure. Now we’re going to move to a categorical dependent variable, so both the independent and dependent variables will be categorical. That means we’ll be working with items that you can count rather than items that you can directly measure.
If you remember, in the ANOVA study we looked at ACT scores from 3 types of high schools: rural, urban, and suburban. In that study, the dependent variable was quantitative. We took the mean ACT scores and compared them for those 3 types of high schools. What if we took the ACT scores and said, “If you got a 20 or lower, we’re going to call it a low ACT score and maybe code that with the number 1. If you got higher than 20, we’re going to call that a high ACT score, and we’re going to give that a code of 2.” Then we can simply count how many low and high ACTs there are for rural high schools, suburban high schools, and urban high schools. Chi-square can tell us whether those counts significantly differ from one another.
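To make the coding step concrete, here is a minimal Python sketch of turning ACT scores into low/high codes and counting them by school type. The records are invented for illustration, and the function name `act_code` and the choice to treat exactly 20 as low are assumptions, not anything taken from the course materials.

```python
from collections import Counter

# Hypothetical (school type, ACT score) records, invented for illustration.
records = [
    ("rural", 18), ("rural", 23), ("rural", 20),
    ("suburban", 25), ("suburban", 19), ("suburban", 27),
    ("urban", 17), ("urban", 22), ("urban", 20),
]

def act_code(score, cutoff=20):
    """Return 1 (low) for scores at or below the cutoff, 2 (high) above it."""
    return 1 if score <= cutoff else 2

# Tally how many students land in each (school type, code) cell.
counts = Counter((school, act_code(score)) for school, score in records)

for school in ("rural", "suburban", "urban"):
    print(school, "low:", counts[(school, 1)], "high:", counts[(school, 2)])
```

The resulting table of counts, not the original scores, is what a chi-square test would work with.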
Chi-square, as a test of significance, is often called a ‘goodness of fit’ test because we’re comparing counts. What are we comparing? We’re comparing the actual number to what the expected number would be, so we’re taking those two counts and comparing them to one another.
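The comparison of actual to expected counts is usually summarized with the standard chi-square statistic, the sum over categories of (observed - expected)^2 / expected. The lecture does not spell the formula out, so treat the sketch below as an illustration with made-up counts; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for three categories, with equal expected counts.
observed = [30, 50, 40]
expected = [40, 40, 40]

stat = chi_square_statistic(observed, expected)
print(stat)  # 5.0
```

A larger value means the actual counts sit farther from the expected ones; whether it is large enough to be significant depends on the degrees of freedom and the chosen significance level.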
In chi-square, like I said, we’re comparing actual to expected, and establishing expected can be a little tricky, so we need to talk about that a little bit. First, the easy way to do it is just to consider all categories equal. In the rural, suburban, and urban high school ACT example, we can just make the assumption that the number would be the same in each. We can also compare the actual number to an expected number that we get from some type of outside source; one example might be some type of national norm. If our variable was blood type and we know nationally how many people have type O blood, we could compare our number to that. Or we can use some type of past experience. I think that’s done a lot in smoking studies, where the dependent variable is categorical: you’re either a smoker or you’re a non-smoker. Take entering college freshmen, let’s say, the number that smoke versus the number that don’t smoke. You can compare the number in your current entering freshman class to the number ten years ago or five years ago. Or you can compare a state school to a private school, or students in Illinois to students in Iowa. So we’ve got to spend some time thinking about how we’re going to define “expected.”
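Two of the ways of defining "expected" described here can be sketched in Python: treating all categories as equal, and scaling proportions taken from an outside source or an earlier year. The totals and proportions below are invented for illustration, and both function names are hypothetical.

```python
def expected_equal(total, n_categories):
    """Expected counts when every category is assumed equally likely."""
    return [total / n_categories] * n_categories

def expected_from_proportions(total, proportions):
    """Expected counts from outside proportions (a norm or a past year)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]

# 120 students spread evenly over rural, suburban, and urban schools.
print(expected_equal(120, 3))  # [40.0, 40.0, 40.0]

# Invented past proportions: 25% smokers, 75% non-smokers,
# applied to a current entering class of 200 freshmen.
print(expected_from_proportions(200, [0.25, 0.75]))  # [50.0, 150.0]
```

Either list can then serve as the expected counts that the actual counts are compared against.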
There are three important assumptions with chi-square that we need to review. The first (I’ve already really talked about this) is that you’ve got to deal with frequency data. If we get quantitative scores, we need some way to convert them into categories, like the ACT example I gave you, where we created a cutoff and said this person has a high ACT, this person has a low ACT. You have to create cutoffs, and hopefully there’s some type of logical cutoff that makes sense. I think in the book they use an example where the dependent variable was age, and the logical cutoffs they used were grade school, junior high, and high school. They made three categories and placed each person by whether their age would make them a grade school student, a junior high student, or a high school student. So sometimes you have to convert quantitative data to categorical data so that you can count frequencies.
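As a rough sketch of converting a quantitative variable into frequency data with more than one cutoff, the Python below bins ages into the three school levels. The ages and the cutoff ages (11 and 14) are assumptions made up for this example; the book's actual cutoffs are not given in the lecture.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs: under 11 is grade school, 11 to 13 is junior high,
# and 14 or older is high school.
CUTOFFS = [11, 14]
LABELS = ["grade school", "junior high", "high school"]

def school_level(age):
    """Map an age to its category by counting how many cutoffs it has reached."""
    return LABELS[bisect_right(CUTOFFS, age)]

ages = [7, 9, 12, 13, 15, 17, 11]  # invented sample
frequencies = Counter(school_level(age) for age in ages)

for label in LABELS:
    print(label, frequencies[label])
```

Once every person is placed in a category, the per-category counts are the frequency data that chi-square needs.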
The second assumption is that you have an adequate sample size. The picture here is a picture of a chi-square distribution. It is a skewed distribution. If you think about it, that kind of makes sense. You can’t have negative observations, so the left side of the normal curve really doesn’t exist. The book talks about establishing a set sample size. Some books will say you have to have a minimum of 10. I’m not too worried about having you guys memorize a number. All the cells need to be filled, so if you’ve created three categories, you have to have members in each, and small cell rep