Term
| _____ is the science of designing studies and analyzing the data that those studies produce. _____ is the science of learning from data. |
|
Definition
|
|
Term
| _____is the entire set of subjects in which we are interested. |
|
Definition
|
|
Term
| What is an example of a population? |
|
Definition
|
|
Term
| A_____is a subset of the population for whom we have data. |
|
Definition
|
|
Term
| What is an example of a sample? |
|
Definition
| 200 randomly selected voters |
|
|
Term
| A_____is an entity that we measure in a study. |
|
Definition
|
|
Term
| What is an example of a subject? |
|
Definition
|
|
Term
| A_____is the numerical value summarizing the population data. |
|
Definition
|
|
Term
| What is an example of a parameter? |
|
Definition
| the proportion of voters voting for candidate A in the entire population |
|
|
Term
| A_____is the numerical value summarizing the sample data. |
|
Definition
|
|
Term
| What is an example of a statistic? |
|
Definition
| the proportion of voters voting for candidate A in our sample (the 200 randomly selected voters) |
|
|
Term
| A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The average age of all faculty members at the college is our _____. |
|
Definition
|
|
Term
| A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The 30 randomly selected faculty members at the college are our _____. |
|
Definition
|
|
Term
| A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. A single faculty member from the sample is a _____. |
|
Definition
|
|
Term
| A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. All faculty members at the college are our _____. |
|
Definition
|
|
Term
| A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The average age of the 30 randomly selected faculty members at the college is our _____. |
|
Definition
|
|
Term
| Whenever we are interested in an average for a full population, what symbol do we use? (population mean) |
|
Definition
| the symbol for population mean; μ (mu) |
|
|
Term
| Whenever we have an average calculated from a sample, this sample mean is denoted as ____. |
|
Definition
|
|
Term
| "mu" represents an average calculated from _____and x bar represents an average calculated from _____. |
|
Definition
| a full population; a sample |
|
|
Term
| If we have the proportion for an entire population, like the proportion of voters voting for candidate A in the entire population, we use the letter ___to denote this population proportion. |
|
Definition
|
|
Term
| If we have a proportion for just a sample, like the proportion of voters voting for candidate A in our sample (the 200 randomly selected voters), we use the symbol ___ to denote the sample proportion. |
|
Definition
|
|
Term
| p represents a proportion calculated from _____. p hat represents a proportion calculated from ____. |
|
Definition
| a full population; a sample |
|
|
Term
| _____is the act of obtaining subjects from a population to participate in a certain study. |
|
Definition
|
|
Term
| A_____is a sample in which every subject has some chance of being selected for the sample. |
|
Definition
|
|
Term
| A _____is a sample in which every subject has an equally likely chance of being selected for the sample. |
|
Definition
|
|
Term
| _____is when the population is divided into non-overlapping groups and a simple random sample is then obtained from each group. |
|
Definition
|
|
Term
| _____is when the population is divided into non-overlapping groups and all individuals within a randomly selected group or groups are sampled. |
|
Definition
|
|
Term
| _____is when you select every kth subject from the population. |
|
Definition
|
|
Term
| _____is sampling where the individuals are easily obtained. |
|
Definition
|
|
Term
| What is an example of convenience sampling? |
|
Definition
|
|
Term
| What type of sampling is generally flawed? |
|
Definition
|
|
Term
| What is the difference between stratified and cluster sampling? |
|
Definition
| stratified sampling samples some individuals from all groups where cluster sampling samples all individuals from some groups |
|
|
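The contrast between stratified and cluster sampling can be sketched in a few lines of Python. This is only an illustration: the population (six airplane rows of five seats each), the function names, and the fixed seed are assumptions made up for the example, not part of the original cards.

```python
import random

def stratified_sample(groups, per_group, rng):
    """Simple random sample from EVERY group: some individuals, all groups."""
    sample = []
    for name, members in groups.items():
        sample.extend(rng.sample(members, per_group[name]))
    return sample

def cluster_sample(groups, n_groups, rng):
    """Randomly pick whole groups and keep EVERYONE in them: all individuals, some groups."""
    chosen = rng.sample(sorted(groups), n_groups)
    return [person for name in chosen for person in groups[name]]

# Hypothetical population: 6 rows with 5 passengers each.
rows = {f"row{r}": [f"row{r}-seat{s}" for s in range(1, 6)] for r in range(1, 7)}
rng = random.Random(0)  # fixed seed so the run is repeatable

strat = stratified_sample(rows, {name: 2 for name in rows}, rng)  # 2 people from each row
clust = cluster_sample(rows, 2, rng)                              # everyone in 2 rows
print(len(strat), "people from", len({p.split("-")[0] for p in strat}), "rows")
print(len(clust), "people from", len({p.split("-")[0] for p in clust}), "rows")
```

The stratified sample touches every row but only part of each; the cluster sample touches only a few rows but takes them whole.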
Term
| There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Pick every 10th passenger as people board the plane. |
|
Definition
|
|
Term
| There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. From the boarding list, randomly choose 5 people flying first class and 25 people flying coach. |
|
Definition
|
|
Term
| There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Randomly generate 30 seat numbers and survey the passengers sitting in those seats. |
|
Definition
|
|
Term
| There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Select the first 30 passengers that enter the plane. |
|
Definition
|
|
Term
| There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Randomly select several rows and survey all of the passengers sitting on those rows. |
|
Definition
|
|
Term
| A_____is a characteristic or property of an individual population unit. (e.g., height, weight, score on a die) |
|
Definition
|
|
Term
| All variables will be one of the following 2 types: ____or_____. |
|
Definition
| categorical or quantitative |
|
|
Term
| _____data classifies subjects based on some attribute or characteristic. Each observation belongs to a set of categories (car color, voting preference, etc.) |
|
Definition
|
|
Term
| _____data takes on numeric values (height, weight, SAT score) |
|
Definition
|
|
Term
| Any quantitative variable can be further categorized into one of the following types:_____or _____. |
|
Definition
|
|
Term
| A_____variable is one where there is a countable number of distinct possible values that the variable can equal. These variables jump from one possible value to the next. |
|
Definition
| discrete |
|
|
Term
| A ____variable is one where, for any two values of that variable there are an infinite number of other possible values in between. |
|
Definition
|
|
Term
| Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The length of time in minutes until a pain reliever begins to work. |
|
Definition
|
|
Term
| Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The brand of a refrigerator found in a home |
|
Definition
|
|
Term
| Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The number of files on a hard drive |
|
Definition
|
|
Term
| A____lists the number of occurrences for each category in the data. |
|
Definition
|
|
Term
| Categorical data can be represented graphically using any of the following displays (3) |
|
Definition
| bar graph, Pareto chart, pie chart |
|
|
Term
| A____is a graph constructed by putting the categories on the horizontal axis and the frequency or proportion on the vertical axis. The height of the rectangle for each category is equal to the category's frequency or proportion. |
|
Definition
|
|
Term
| A ____is a bar graph whose bars are drawn in decreasing order of frequency or proportion. |
|
Definition
|
|
Term
| A _____is a circle divided into sectors. Each sector represents a category of data with the size of each sector corresponding to the proportion of responses falling in that category. |
|
Definition
|
|
Term
| Quantitative data can be represented graphically using what two displays? |
|
Definition
| histogram or stem and leaf plot |
|
|
Term
| A____is a display that looks similar to a bar graph, but it is used for quantitative data. |
|
Definition
|
|
Term
| What does a graph that is skewed left look like? |
|
Definition
| left tail is stretched out longer than the right tail |
|
|
Term
| What does the graph look like that is skewed right? |
|
Definition
| right tail is stretched out longer than the left tail |
|
|
Term
| Measures of the _____of a data set describe the tendency of the data to cluster about certain numerical values. |
|
Definition
|
|
Term
| Which is sensitive to extreme values in the dataset, either very large or very small numbers? the mean or median? |
|
Definition
| the mean. The median is NOT sensitive to extreme values |
|
|
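A quick way to see this sensitivity is to add one extreme value to a small data set and compare the two measures before and after. The ages below are hypothetical numbers chosen only for illustration.

```python
from statistics import mean, median

ages = [30, 32, 35, 38, 40]    # hypothetical data
with_extreme = ages + [95]     # same data plus one very large value

# The mean is pulled toward the extreme value; the median barely moves.
print("mean:  ", mean(ages), "->", mean(with_extreme))
print("median:", median(ages), "->", median(with_extreme))
```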
Term
| The mean is ____to extreme values. The median is _____to extreme values. |
|
Definition
|
|
Term
| If the mean is smaller than the median, then what does the graph look like? |
|
Definition
|
|
Term
| If the mean is greater than the median, then what does the graph look like? |
|
Definition
|
|
Term
| If the mean is equal to the median then what does the graph look like? |
|
Definition
|
|
Term
| Measures of the _____are used to measure the spread, or volatility, contained in the data set. |
|
Definition
|
|
Term
| What are three commonly used measures of variability? |
|
Definition
| range, variance, standard deviation |
|
|
Term
| What is the range? how do you find it? |
|
Definition
| the range of the data set is the difference between the largest and the smallest values in the data. Range = largest value - smallest value |
|
|
Term
| What does the term "deviation from the mean" mean? How do you find the "deviation from the mean"? |
|
Definition
| the deviation of a value from the mean is its signed distance from the mean, found by subtracting the mean from the value: x - x̄ |
|
|
Term
| The _____of a data set is the average of the squared deviations from the mean, calculated using n-1 as the divisor. |
|
Definition
|
|
Term
| How do you find the variance? |
|
Definition
|
|
Term
| How do you find the standard deviation? |
|
Definition
| standard deviation is the positive square root of the variance |
|
|
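The chain from deviations to variance to standard deviation can be sketched in Python as follows. The function names and the five data values are assumptions made up for illustration; the n-1 divisor follows the cards above.

```python
import math

def sample_variance(data):
    """Average of the squared deviations from the mean, using n - 1 as the divisor."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

def sample_std(data):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 6, 9]  # hypothetical values; their mean is 5
print(sample_variance(data))  # squared deviations 9 + 1 + 1 + 1 + 16 = 28, divided by 4
print(sample_std(data))
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n-1 divisor, so they can be used to check a hand calculation.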
Term
| What do variance and standard deviation measure? The higher the variance and standard deviation, the more______. |
|
Definition
| variance and standard deviation measure how spread apart your data values are; the higher the variance and standard deviation, the more spread apart the data values will be |
|
|
Term
| What is the symbol to represent population standard deviation? |
|
Definition
|
|
Term
| What is the symbol for sample standard deviation? |
|
Definition
|
|
Term
| What does the Empirical Rule say? |
|
Definition
| for bell-shaped (approximately normal) data: about 68% of the values lie within 1 standard deviation of the mean; about 95% lie within 2 standard deviations of the mean; about 99.7% (nearly all) lie within 3 standard deviations of the mean |
|
|
Term
| If a value lies at the 30th percentile, then approximately _____percent of the values are less than that value and approximately ____percent are higher than that value. |
|
Definition
|
|
Term
| If John graduated at the 78th percentile in a class of 876, approximately how many students ranked below John? |
|
Definition
|
|
Term
| What are quartiles? |
|
Definition
| specific percentiles that split the data into quarters |
|
|
Term
| Each set of data has how many quartiles? |
|
Definition
|
|
Term
| The first quartile is a value such that ____percent of the data values are smaller than Q1 and ____percent are larger. This is also known as the _____. |
|
Definition
|
|
Term
| The second quartile (Q2) is a value such that ____percent of the data values are smaller than Q2 and ___percent are larger. This is also known as the ____and the_____. |
|
Definition
| 50; 50; median; 50th percentile |
|
|
Term
| The third quartile (Q3) is a value such that ____percent of the data values are smaller than Q3 and ____percent are larger. This is also known as the _____. |
|
Definition
|
|
Term
| What is the best way to find quartiles in a data set? |
|
Definition
| arrange the data in order; first find the median for all of the values--this will be the second quartile (Q2); then find the median for the "lower half" of the values---this will be the first quartile (Q1); then find the median for the upper half of the values---this will be the third quartile (Q3) |
|
|
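The median-of-halves procedure above could be sketched like this. One detail is an assumption, since the card does not say: when the number of values is odd, this sketch leaves the middle value out of both halves (a common textbook convention, though software packages differ). The data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    values = sorted(data)               # step 1: arrange the data in order
    n = len(values)
    half = n // 2
    lower = values[:half]               # lower half of the values
    # Assumption: for an odd count, the middle value is excluded from both halves.
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles([7, 1, 10, 3, 12, 4, 8, 6])  # hypothetical data
print(q1, q2, q3)
```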
Term
| ____are extreme observations in the data that often occur because of errors in the measurement of the variable, during data entry, or in sampling. |
|
Definition
|
|
Term
| How do you check for the presence of outliers in data? |
|
Definition
| determine the first and third quartiles; compute the interquartile range (IQR), which is the difference between the third and first quartiles (IQR = Q3 - Q1); if a data value is less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR, it is considered an outlier |
|
|
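The 1.5 × IQR check can be written as a short Python function. The function name and the data are hypothetical; Q1 and Q3 are passed in directly (here 3.5 and 9, the medians of the lower and upper halves of the example data) so the sketch does not depend on any particular quartile convention.

```python
def iqr_outliers(data, q1, q3):
    """Return the values that fall outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [1, 3, 4, 6, 7, 8, 10, 30]   # hypothetical data
# IQR = 9 - 3.5 = 5.5, so the fences are -4.75 and 17.25.
print(iqr_outliers(data, q1=3.5, q3=9))
```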
Term
| What does the five number summary represent? |
|
Definition
| the five number summary represents the five values that split the data into quarters. It includes the minimum, Q1, Q2 (median), Q3, and the maximum number |
|
|
Term
| A_____is a graphical representation of the five number summary. |
|
Definition
|
|
Term
| What are the steps involved in drawing a boxplot? |
|
Definition
| determine Q1, Q2, Q3; draw vertical lines at Q1, the median (Q2), and Q3. Enclose these vertical lines in a box. Draw a line from Q1 to the smallest data value that is not an outlier--the minimum. Draw a line from Q3 to the largest data value that is not an outlier--maximum. Any data values that are outliers are marked with an asterisk. |
|
|
Term
| For quantitative data, you can use what two types of graphs? For categorical data, you can use what three types of graphs? |
|
Definition
| QUANTITATIVE--histogram and stem and leaf plot; CATEGORICAL--bar graph, Pareto chart, pie chart |
|
|
Term
| In a box plot, if the median is near the center of the box and each horizontal line is approximately equal length...what will the graph look like? |
|
Definition
|
|
Term
| In a box plot, if the median is to the left of the center of the box and/or the right horizontal line is much longer than the left line, what will the graph look like? |
|
Definition
|
|
Term
| In a box plot, if the median is to the right of the center of the box and/or the left line is much longer than the right line, what will the graph look like? |
|
Definition
|
|
Term
| A ____measures the position a value has in the data set, relative to the mean. It is measured in _____. |
|
Definition
| z-score; standard deviations |
|
|
Term
| What is the formula for calculating z-score? |
|
Definition
| z-score = (value - mean) / standard deviation |
|
|
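As a small sketch, the z-score formula and its rearrangement (recovering a value from a z-score) might look like this in Python. The function names are made up for illustration; the mean of 68.2 inches and standard deviation of 2.8 inches are the 15-year-old height figures used elsewhere in this set, and 65.4 is a hypothetical height.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

def value_from_z(z, mean, std_dev):
    """Rearranged formula: value = mean + z * standard deviation."""
    return mean + z * std_dev

# A height of 65.4 inches is one standard deviation (2.8 inches) below 68.2.
print(round(z_score(65.4, mean=68.2, std_dev=2.8), 3))
# The height that sits 2.64 standard deviations below the mean.
print(round(value_from_z(-2.64, mean=68.2, std_dev=2.8), 3))
```

Note the parentheses around `value - mean`: without them, only the mean would be divided by the standard deviation.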
Term
| When calculating the z-score, what would the z score be if the value is equal to the mean? |
|
Definition
|
|
Term
| Interpret a z score of -1. |
|
Definition
| 1 standard deviation below the mean. |
|
|
Term
| An outlier is more than ___ standard deviations from the mean. (above or below) |
|
Definition
|
|
Term
| If a value has a z-score that is less than -3 or a z score greater than +3, then it is a ______. |
|
Definition
|
|
Term
| If measuring height, and the z score comes out negative...are the people tall or short? |
|
Definition
|
|
Term
| The _____is a variable that can be explained by, or is determined by, another variable. |
|
Definition
|
|
Term
| Which variable will be the "y variable" (the variable that goes on the vertical axis when graphing data)? response or explanatory |
|
Definition
|
|
Term
| The ____variable explains or affects the response variable. |
|
Definition
|
|
Term
| When the two variables are quantitative, which variable will be the x variable (the variable that goes on the horizontal axis when graphing data)? response variable or explanatory variable |
|
Definition
|
|
Term
| The amount you eat affects how much weight you gain. What is the explanatory variable and what is the response variable? |
|
Definition
| the amount you eat = explanatory variable; weight gain = response variable |
|
|
Term
| A/an _____exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable. |
|
Definition
|
|
Term
| If the amount we eat is small, then we probably won't see much gain in weight. However, if the amount we eat is large, then we probably will see some gain in weight. So there is a/an______between the amount eaten and weight gain. |
|
Definition
|
|
Term
| A _____is a variable that is related to the response or explanatory variable (or both), but is not the variable being studied. |
|
Definition
|
|
Term
| A _____would be the frequency of exercise. The amount of exercising can also affect weight gain, the response variable. |
|
Definition
|
|
Term
| To explore the association between two categorical variables, we use _____. |
|
Definition
|
|
Term
| What is another word for a contingency table? |
|
Definition
|
|
Term
| A contingency table is a table that relates two ______. Each box inside the table is referred to as a ____. |
|
Definition
| categorical variables; cell |
|
|
Term
| In a contingency table, the ____will always be on the side and the ____will always be on the top. |
|
Definition
| explanatory variable; response variable |
|
|
Term
| A_____is the proportion for a value of a variable, given a specific value of the other variable. |
|
Definition
|
|
Term
| How do you calculate the relative risk? |
|
Definition
| relative risk = conditional proportion for one group / conditional proportion for another group
When we calculate relative risk, the higher conditional proportion goes in the numerator. |
|
|
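A minimal Python sketch of the calculation, assuming hypothetical counts from a contingency table (30 of 200 subjects in one group and 10 of 200 in the other have the outcome). The function names are illustrative only. Following the card above, the higher conditional proportion is placed in the numerator.

```python
def conditional_proportion(count_with_outcome, group_total):
    """Proportion of one group that has the outcome (a row proportion in the table)."""
    return count_with_outcome / group_total

def relative_risk(p_group_a, p_group_b):
    """Ratio of two conditional proportions, higher one in the numerator."""
    higher = max(p_group_a, p_group_b)
    lower = min(p_group_a, p_group_b)
    return higher / lower   # undefined if the lower proportion is 0

# Hypothetical counts: convert to proportions first, never divide raw counts.
p_a = conditional_proportion(30, 200)
p_b = conditional_proportion(10, 200)
rr = relative_risk(p_a, p_b)
print(round(rr, 2))  # the outcome is about this many times as likely in the first group
```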
Term
| What can relative risk be used for? |
|
Definition
| calculating how many times more likely the outcome for one group is than the other group |
|
|
Term
| What does it mean if the relative risk is close to one? |
|
Definition
| it will be about the same likelihood for both groups |
|
|
Term
| Before you calculate the relative risk, what must you do? |
|
Definition
| make sure you have the conditional proportions for each group; you cannot just use the raw counts |
|
|
Term
| A _____is a graphical display for two quantitative variables. |
|
Definition
|
|
Term
| On a scatter plot, what variable should be on the horizontal axis and vertical axis? |
|
Definition
| horizontal axis-explanatory variable; vertical axis-response variable |
|
|
Term
| Are the points on a scatter plot connected? |
|
Definition
|
|
Term
| What are the three types of association? |
|
Definition
|
|
Term
| A____exists between two variables if as x increases, y also increases. |
|
Definition
|
|
Term
| A____exists between two variables if as x increases, y actually decreases. |
|
Definition
|
|
Term
| We say there is _____between two variables if as x increases, there is no definite shift in the values of y. |
|
Definition
|
|
Term
| Estimate the type of association for the following pairs of variables. Weight of a car and miles per gallon. |
|
Definition
|
|
Term
| Estimate the type of association for the following pairs of variables. Speed of a car and distance required to come to a complete stop. |
|
Definition
|
|
Term
| Estimate the type of association for the following pairs of variables. Weight on a bar and number of repetitions a weightlifter can achieve |
|
Definition
|
|
Term
| Estimate the type of association for the following pairs of variables. The temperature outside and my grade on a test |
|
Definition
|
|
Term
| If you want to figure out if there is a linear association between two variables, you calculate the _____. |
|
Definition
|
|
Term
| A ____exists when the data tend to follow a straight-line path. |
|
Definition
|
|
Term
| If as x increases, y also increases it is a ____correlation; if as x increases, y decreases, it is a ____correlation |
|
Definition
|
|
Term
| ____means that as x increases there is no definite shift in the values of y. In other words, there is no linear relationship between x and y. |
|
Definition
|
|
Term
| Correlation can be ____, ___, ____, ____, or _____. |
|
Definition
| positive, negative, none, strong, weak |
|
|
Term
| The closer the correlation is to 1 or -1, then the ____the link is between x and y. |
|
Definition
|
|
Term
| The closer the correlation is to 0, then the ____ the link is between x and y. |
|
Definition
|
|
Term
| What are the seven properties of the linear correlation coefficient, r? |
|
Definition
| r must always be between -1 and 1; if r is greater than 0, then there is a positive linear relationship; if r=+1 then there is a perfect positive correlation; if r is less than 0, then there is a negative linear relationship; if r is equal to -1, there is a perfect negative correlation; if r is equal to 0 then there is no linear relation between the 2 variables; a value of r close to 1 or -1 indicates a strong linear relationship while a value of r close to zero represents a weak linear relationship |
|
|
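The cards do not give a formula for r, so the sketch below uses the standard Pearson formula (sum of the products of deviations, divided by the square root of the product of the sums of squared deviations) as an assumption. The function name and data are hypothetical. It illustrates the properties above: r stays between -1 and 1, and it hits the endpoints when the points fall exactly on a line.

```python
import math

def correlation(xs, ys):
    """Pearson linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
print(round(correlation(xs, [2, 4, 6, 8]), 6))   # points on a rising line
print(round(correlation(xs, [8, 6, 4, 2]), 6))   # points on a falling line
print(round(correlation(xs, [2, 3, 5, 6]), 6))   # strong but not perfect
```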
Term
| Which of the following is the strongest correlation? .8, .67, -.34, 0, -.92 |
|
Definition
|
|
Term
| How do you calculate r using StatCrunch? |
|
Definition
| stat--summary stat---correlation |
|
|
Term
| To predict the response variable using the explanatory variable, we create what is called a ____. |
|
Definition
|
|
Term
| A ____predicts the value for the response variable (y) as a straight line function of the value of the explanatory variable (x) |
|
Definition
|
|
Term
| The predicted value of y using the regression line is denoted as ____. |
|
Definition
|
|
Term
| What is the equation for the regression line? |
|
Definition
|
|
Term
| ŷ = a + bx... in this formula, what is the y intercept and what is the slope? |
|
Definition
| a is the y intercept and b is the slope |
|
|
Term
| The ____for a value is the difference between the actual value and the predicted value of y. |
|
Definition
|
|
Term
| How do you calculate the residual? |
|
Definition
| residual=actual y-predicted y (y -ŷ) |
|
|
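To make ŷ = a + bx and the residual concrete, here is a small sketch that fits a line by least squares and then computes y - ŷ for each point. The least-squares formulas for a and b are the standard ones and are an assumption here, since the cards rely on StatCrunch for the fit; the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def residual(x, y, a, b):
    """Actual y minus predicted y (y - y-hat)."""
    return y - (a + b * x)

xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 3, 5, 6]
a, b = fit_line(xs, ys)
for x, y in zip(xs, ys):
    r = residual(x, y, a, b)
    side = "above the line (underpredicted)" if r > 0 else "below the line (overpredicted)"
    print(x, y, round(r, 3), side)
```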
Term
| How do you find the regression equation using StatCrunch? |
|
Definition
| stat--regression---simple linear |
|
|
Term
| What does the y intercept represent? |
|
Definition
| the predicted value of y when x=0 |
|
|
Term
| What does it mean if you have a positive residual? negative residual? |
|
Definition
| you underpredicted; overpredicted |
|
|
Term
| When we use our regression line to predict the costs for other properties, this is called _____. |
|
Definition
|
|
Term
| We need to be careful that when we extrapolate, it is only for observations that have _____. |
|
Definition
| similar x values as our data |
|
|
Term
| If our values for number of carats are 0.5,0.75, 1, and 2...would it be acceptable to use our regression line to predict the price of a 10 carat diamond ring? Or a 1.5 carat diamond ring? |
|
Definition
| unacceptable for 10 carat; acceptable for 1.5 |
|
|
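One rough way to encode "similar x values as our data" is to check whether a new x falls between the smallest and largest observed x. That rule is a simplifying assumption for illustration, as is the function name; the carat values come from the card above.

```python
def within_observed_range(x_new, observed_xs):
    """True if x_new lies between the smallest and largest observed x values."""
    return min(observed_xs) <= x_new <= max(observed_xs)

carats = [0.5, 0.75, 1, 2]
print(within_observed_range(10, carats))    # far outside the data
print(within_observed_range(1.5, carats))   # inside the data's range
```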
Term
| If you have a negative residual where does the point fall in relation to the line? positive residual? |
|
Definition
| point falls below the line with negative residual; point falls above the line with positive residual |
|
|
Term
| When interpreting the slope of a regression line...how should we do that? |
|
Definition
| for every 1 unit increase in x, we predict y will change by the slope |
|
|
Term
| When dealing with the regression line, how would you interpret the y intercept? |
|
Definition
| when x is zero, the predicted y value equals the y intercept |
|
|
Term
| If the height of a 15 year old male is 2.64 standard deviations below the mean, what is the corresponding z-score for that male? |
|
Definition
|
|
Term
| The average 15 year old male is 68.2 inches tall, with a standard deviation of 2.8 inches. What height for a 15 year old male is 2.64 standard deviations below the mean? |
|
Definition
| 68.2-2.64(2.8)=60.808 inches |
|
|