# Statistical Tests

## Normality of Data

Normality of data can be assessed by using either graph (i.e. frequency distribution, Q-Q plot) or normality tests (i.e. Shapiro-Wilk, Kolmogorov- Smiranov). Of note, Shapiro–Wilk test is used for small sample sizes (<50 samples); while Kolmogorov–Smirnov test is for n ≥50.

For normally distributed continuous data, use parametric tests such as t-test (for 2 groups) or ANOVA (for 3 groups or more). While for non-normally distributed continuous data (skewed), use non-parametric tests such as Mann-Whitney Test (for 2 groups) or Kruskal-wallis test (for 3 groups or more).

## t-TEST

**t-test**requires a categorical independent variable (binary) with a continuous dependent variable (outcomes).**t-test**determines if there is a statistically significant difference in the mean between the two groups. Type of t-test: One sample and two samples. Two samples t-test can be either dependent (paired) samples t-test or independent samples t-test.For the one-sample

**t-test**, the data should have**independent**observations (**Exposure**) and the**dependent**variable should be continuous and normally distributed.For dependent (paired) samples t-test, the

**independent**variable should consist of two**related or matched**groups (e.g. same patients pre and post-intervention), as well the**dependent**variable should be continuous and normally distributed. For example, a researcher wants to investigate whether an exercise or weight loss intervention is more effective in lowering cholesterol levels.For the independent samples t-test, the

**independent**variable should consist of two independent (**unrelated**) categorical groups, as well the**dependent**variable should be continuous and normally distributed. In addition, there needs to be homogeneity of variances (Levene’s test for homogeneity of variance). For example, a researcher wants to investigate in students whether a 12-week training program improves their standing long jump performance (measured in meter). Here the investigation will be in the same participants, measured pre-intervention, after applying for the 12-week training program, another measurement post-intervention to assess the differences in**normally**distributed data.

## Mann Whiteny U Test

Mann Whiteny U Test is used when the

**independent**variable consists of two independent (**unrelated**) categorical groups. As same as independent samples t-test, with the exception that Mann Whiteny U Test is used when**dependent**variables are**NOT**normally distributed (two distributions have a different shape). For example, a researcher wants to investigate in students whether a 12-week training program improves their standing long jump performance (measured in meter). Here the investigation will be in the same participants, measured pre-intervention, after applying for the 12-week training program, another measurement post-intervention to assess the differences in**non-normally**distributed data.

## Analysis of Variance (**ANOVA**)

**ANOVA**)

Used to compare the mean of one independent variable (

**with three or more groups**) with one dependent variable (**one outcome**). Should note that the independent variable should consist of ≥ 3**categorical**groups; while the dependent variable (outcome) should be a continuous variable.There should be

**NO**significant outliers. Outliers are data points within your data that do not follow the usual pattern.The dependent variable should be

**normally**distributed.There needs to be homogeneity of variances (

**Levene’s test for homogeneity of variance**).For example, a researcher wants to evaluate 3 regimens of statin (high, moderate, and low-intensity statin) on the effect of LDL, and want to compare the degree of LDL reduction between the 3 regimens.

Limitation of ANOVA test, the test will tell you that at least two groups were different from each other. But it **won’t **tell you which groups were different. Therefore, if your test returns a significant f statistic, you may need to run a **post hoc** test to tell you exactly which groups had a difference in mean. On the other hand, if you do not have a statistically significant result, you would not perform any further follow-up tests (post hoc test) because there is no difference between groups.

## Multivariate Analysis of Variance (**MANOVA**)

**MANOVA**)

Used to compare the mean of one independent variable (

**with three or more groups**) with**more**than one dependent continuous variable (**two or more outcomes**). Should note that the independent variable should consist of ≥ 3**categorical**groups; while the dependent variable (outcome) should be a**≥ 2**continuous variable. In this regard, it differs from a one-way ANOVA, in which one-way ANOVA**only**measures one dependent variable.There should be an adequate sample size (more than the # of the dependent variables in each group).

There should be

**NO**univariate or multivariate outliers. Outliers are data points within your data that do not follow the usual pattern.There should be

**NO**multicollinearity (e.g. as assessed by**Pearson**correlation).There should be multivariate normality (e.g. as assessed by

**Mahalanobis**distance).There is a

**linear relationship**between each pair of dependent variables for each group of the independent variable.There needs to be homogeneity of variances (

**Levene’s test for homogeneity of variance**).For example, a researcher wants to evaluate 3 regimens of statin (high, moderate, and low-intensity statin) on the effect of LDL and triglycerides levels, and want to compare the degree of LDL and triglycerides reduction between the 3 regimens.

Limitation of MANOVA test, the test will tell you that at least two groups were different from each other. But it **won’t **tell you which groups were different. Therefore, if your test returns a significant f statistic, you may need to run a **post hoc** test to tell you exactly which groups had a difference in mean. On the other hand, if you do not have a statistically significant result, you would not perform any further follow-up tests (post hoc test) because there is no difference between groups.

## Multivariate Analysis Of Covariance (**MANCOVA**)

**MANCOVA**)

Used to compare the

**adjusted**mean of one independent variable (**with three or more groups**) with**two or more**dependent continuous variables (**two or more outcomes**); having**controlled**for a continuous**c**ovariate. MAN**C**OVA analysis eliminates the**c**ovariates' effect on the relationship between the independent grouping variable and the continuous dependent variables; therefore, reducing error terms.There should be

**NO**univariate or multivariate outliers. Outliers are data points within your data that do not follow the usual pattern.There should be one or more

**covariates**which are all**continuous**variables. In addition, homogeneity of variances and covariances are required.There should be a

**linear relationship**between the covariate and each dependent variable within each group of the independent variable.There should be

**homogeneity**of regression slopesThe residuals should be normally distributed (

**Shapiro Wilk test**).For example, a researcher wants to evaluate 3 regimens of statin (high, moderate, and low-intensity statin) on the effect of LDL and triglycerides levels, and want to compare the degree of LDL and triglycerides reduction between the 3 regimens. The patient's age and weight were considered

**covariates**, that need to be controlled.

Limitation of MAN**C**OVA test, the test will tell you that at least two groups were different from each other. But it **won’t **tell you which groups were different. Therefore, if your test returns a significant f statistic, you may need to run a **post hoc** test to tell you exactly which groups had a difference in mean. On the other hand, if you do not have a statistically significant result, you would not perform any further follow-up tests (post hoc test) because there is no difference between groups.

## strength of association for continuous outcomes

Pearson Correlation test is used to measure the strength of association between two scale variables that were normally distributed (Parametric). On the other hand, for non-normally distributed (Non-parametric) data, Spearman’s rank correlation is used.

## Regression analysis

The type of regression analysis depends on the type of dependent variables (outcomes). If the dependent variables were categorical, logistic regression is used. While, if the dependent variables were continuous, simple, or multiple linear regression is used. Of note, multiple linear regression is used when there are two or more predictors (confounders).

Regression coefficients (β) with standard error (SE) can be calculated for

**both**categorical or continuous variables. The negative value of the regression coefficient indicates that the covariate has negative relation with the outcome. The positive value of the regression coefficient indicates that the effect on outcome increases with an increase in covariate values. For example, if the regression coefficient (β) for age is positive, this means that increases in age will lead the patients to be more prone to get more outcomes of interest. Another example is, if the regression coefficient (β) for vitamin D is negative, meaning that reduction or low vitamin D will lead the patients to be more prone to get more outcomes of interest.The coefficient of determination (R

^{2}), represented the degree of dispersion between individual data and the regression line. The R2 value is always between 0% and 100%, and the higher the R2 value, the lower the discrepancies between data. An R2 value close to 1 represents a reliable fitted regression line.