level: Week 2 & 3

Questions and Answers List

Exploratory Data Analysis

Question	Answer
Explain the difference between “exclude cases listwise” and “exclude cases pairwise” in dealing with missing data points.	The Exclude cases listwise option includes a case in the analysis only if it has complete data on all of the variables listed in the variables box. The Exclude cases pairwise option excludes a case (person) only from the specific analyses for which it is missing the required data; the case is still included in any analysis for which it has the necessary data
Define the mean, the median, and the mode.	Mean: The average of the data set (the sum of the scores divided by the number of scores). Median: The middle value when the scores are placed in order (the average of the two middle values when there is an even number of scores). Mode: The most common value in the data set
Why is the 5% trimmed mean a useful statistic?	It is the mean recalculated after removing the top and bottom 5% of scores. If you compare the original mean and the new trimmed mean, you can see whether extreme scores are having a strong influence on the mean
If you have 2 data sets containing IQ measures, and the second data set has a larger standard deviation, then what does this suggest?	A larger standard deviation means that the values in the data set are further away from the mean, on average, so the IQ scores in the second data set are more spread out
Define a 95% confidence interval	An interval constructed such that the true population mean will fall within this interval in 95% of samples
What is the interquartile range?	A measure of statistical dispersion, being equal to the difference between the 75th and 25th percentiles, or upper and lower quartiles
If a score is at the 90th percentile, what does that mean?	If you know that a score is at the 90th percentile, that means you scored better than 90% of the people who took the test
Why do we do exploratory data analysis?	To check for data entry errors. To gather information on descriptive statistics. To identify patterns. To identify any missing data points and devise a strategy for dealing with them. To identify sources of bias.
What are the upper and lower bounds of a confidence interval?	The upper and lower limits of the interval within which the population mean is expected to fall. For a 95% interval, if the study were replicated many times, about 95% of the intervals constructed this way would contain the population mean.
What is the purpose of a histogram?	Gives information about normality, which helps tell us whether we should use parametric or nonparametric tests for continuous data
What is the purpose of bar graphs?	Visually represent differences between groups on a continuous variable (for example, group means)
What is the purpose of box plots?	Gives information about central tendency (the median) and variability (the interquartile range and the range), and flags outliers - what your data is 'looking' like
Where is the independent variable placed on a graph?	X-Axis
Where is the dependent variable placed on a graph?	Y-Axis
What does a positive skew look like?	Scores bunched at low values with the tail pointing to the high values
What does a negative skew look like?	Scores bunched at high values with the tail pointing to the low values
What does positive kurtosis look like?	The distribution has heavier tails than the normal distribution. Usually looks more peaked
What does negative kurtosis look like?	The distribution has lighter tails than the normal distribution. Usually looks flatter
When should you delete (or deal with) outliers?	If it is obvious that the outlier is due to incorrectly entered or measured data, you should drop the outlier. If the outlier does not change the results but does affect assumptions, you may drop the outlier. More commonly, the outlier affects both results and assumptions. If the outlier creates a significant association, you should drop the outlier and should not report any significance from your analysis
What is the assumption of normality and why is it important?	It means that your data should roughly fit a bell curve shape before you run certain statistical tests or regression. Deviations from normality can render these tests inaccurate, so it is important to know whether your data are normal. Tests that rely upon the assumption of normality are called parametric tests. Small sample sizes <20 causes
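What are different methods for testing normality?	Kolmogorov-Smirnov and Shapiro-Wilk tests (each tests whether the distribution of scores differs significantly from a normal distribution). Visual checks: histograms and Q-Q plots
What are some common transformations used to deal with non-normal data?	Log10: Reduces positive skew. Square root: Reduces positive skew. Inverse: Reduces positive skew. Reverse scoring: Helps deal with negative skew by turning it into positive skew, so that one of the other transformations can then be applied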
Why is normality the least important assumption?	Does not contribute to bias or inefficiency in regression models. It is only important for the calculation of p values for significance testing, but this is only a consideration when the sample size is very small
What is homogeneity of variance and how is it tested?	The variability of scores in each of the groups needs to be similar. Levene's test uses an F-test to test the null hypothesis that the variance is equal across groups. A p value less than .05 indicates a violation of the assumption
What is the relationship between sample size and the objective measures that are used to test normality?	Kolmogorov-Smirnov is best used for larger sample sizes whereas Shapiro-Wilk is best used for small sample sizes of 25 and under as it is incredibly sensitive.
What should a histogram and Q-Q plot look like when data are normal?	A histogram should look like a bell curve, and the dots on a Q-Q plot should fall closely along the diagonal line
What is a common transformation used to deal with violation of the homogeneity of variance assumption?	Power transformation: You raise the data to some power (e.g., squared, or a power below 1 such as the square root, which compresses the larger values). Once you transform the data, you re-run Levene's test for equality of variances on the transformed data, hoping that it is now not significant
What are some of the problems with transformations?	They are not a magic bullet - they will not fix everything. They make interpreting the results difficult. Some statistical tests are robust against small violations of assumptions, e.g., ANOVA with respect to normality. So you need to decide whether a transformation is necessary
Should you state what you do with outliers in a study?	Ethically, you should state how you handled outlier data in any notes and write-up of the analysed work
What is an Outlier?	A score very different from the rest of the data. Sometimes the score is an error but sometimes it is legitimate but it biases our data
What would you do with an extreme score?	Remove from the data set
What are the 3 different ways we can bias our analysis?	Parameter Estimates (mean being compromised), Spread (confidence intervals and spread of the numbers), Test statistics and P-values
What is the little circle and number that's plotted in a box plot?	An outlier - the number is the row (case) number in the data file where the outlying score is found
What is the Central Limit Theorem?	If you sample parameter estimates from a population, then as the sample size increases, the distribution of those estimates becomes increasingly normal
If you see Greek symbols, what is being spoken about? Population or Sample?	Population
If you see X or HO symbols, what is being spoken about? Population or sample?	Sample
If the p value of a normality test is less than .05, what can you determine about the data?	The data are not normally distributed (the distribution differs significantly from a normal distribution)
What does the output of Levene's test mean?	You are hoping to find that the test is NOT significant (p > .05). If you obtain a significance value of less than .05, the variances of the groups are not equal
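
Worked example (not part of the original cards): several of the descriptive statistics defined above can be checked on a small data set. The sketch below uses only the Python standard library. The scores are made up for illustration, and the helper names trimmed_mean and interquartile_range are hypothetical. The trimmed mean here simply drops whole scores from each end, which may differ slightly from the value a statistics package reports.

```python
import statistics


def trimmed_mean(scores, percent=5):
    """Mean after dropping the lowest and highest `percent`% of scores.

    Simplified sketch: drops whole scores only, using integer arithmetic.
    """
    ordered = sorted(scores)
    k = len(ordered) * percent // 100  # number of scores cut from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def interquartile_range(scores):
    """Difference between the 75th and 25th percentiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data: 19 typical scores plus one extreme score (200).
scores = [10, 11, 12, 12, 13, 13, 13, 14, 14, 15,
          15, 16, 16, 17, 17, 18, 18, 19, 20, 200]

print("mean:", statistics.mean(scores))
print("5% trimmed mean:", trimmed_mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))
print("IQR:", interquartile_range(scores))
```

With these scores the single extreme value pulls the ordinary mean well above the trimmed mean, while the median, mode, and interquartile range are barely affected. That gap is the signal the trimmed-mean card describes.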