# GWAS

## What is GWAS?

Genome-wide association studies (GWAS) investigate how much of the variation in a measured trait can be attributed to sequence variation at a genetic marker. To perform a GWAS, a set of individuals or accessions is selected, each is genotyped at a set of markers, and the phenotypic traits of interest are measured. Markers can take many forms, including other phenotypic traits, but modern studies typically use genetic markers, and more specifically SNPs rather than RFLPs or the various kinds of repeats. Phenotypic traits can also take many forms, including visual measurements or the results of metabolic analyses. The association between each marker and each trait is then tested statistically, typically yielding a p-value.

## GWAS algorithms

Associations can be tested with many different algorithms. Common examples include the Fisher Exact test, the Wilcoxon Rank Sum test, General Linear Models, and Mixed Linear Models. Each algorithm is applied to the most basic unit of GWAS, a trait-marker pair, and essentially estimates how well the trait measurements are split by the marker alleles.
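To make the idea of a trait-marker pair concrete, the following minimal sketch groups trait measurements by marker allele and compares the group means. The function name `split_by_allele` and the genotype and trait values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_by_allele(genotypes, trait_values):
    """Group trait measurements by the allele each individual carries."""
    groups = defaultdict(list)
    for allele, value in zip(genotypes, trait_values):
        groups[allele].append(value)
    return dict(groups)


# Hypothetical data: one marker, one trait, six individuals.
genotypes = ["A", "A", "G", "G", "A", "G"]
heights = [10.1, 9.8, 12.3, 12.9, 10.4, 11.8]

groups = split_by_allele(genotypes, heights)
group_means = {allele: mean(values) for allele, values in groups.items()}
```

A large difference between the group means relative to the spread within each group suggests that the marker splits the trait well. The tests described in the following sections quantify this.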

### Fisher Exact Test

The Fisher Exact test can be applied when both the trait and the marker are binary (i.e. each can take only one of two possible values). SNPs are usually biallelic and so meet this condition. Traits of this kind are usually Mendelian in nature. The p-value is not an approximation: it is computed exactly from the hypergeometric distribution.
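As an illustration, here is a minimal standard-library sketch of a two-sided Fisher Exact test on a 2x2 table of allele by trait-state counts. The function name `fisher_exact_two_sided` and the counts are assumptions made for this example. The two-sided p-value follows one common convention: it sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two marker alleles, columns are the two trait states.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Number of ways to obtain a table with x in the top-left cell,
        # given fixed row and column totals (hypergeometric numerator).
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties between tables.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Hypothetical counts: allele 1 -> 8 affected, 2 unaffected;
#                      allele 2 -> 1 affected, 5 unaffected.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

For these counts the p-value is 400/11440, or about 0.035.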

### Wilcoxon Rank Sum Test

The Wilcoxon Rank Sum test can be applied when the marker is biallelic. If the trait measurements follow a Gaussian (normal) distribution, a Student's t-test is more appropriate. However, this is often not the case, and the Wilcoxon Rank Sum test performs a similar comparison without assuming a Gaussian distribution, at the cost of some statistical power.
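The sketch below shows one way to compute a two-sided Wilcoxon Rank Sum test with the standard library, using the large-sample normal approximation. It is a simplified illustration: tied values receive average ranks, but no tie or continuity correction is applied to the variance, so dedicated statistics packages may give slightly different p-values. The names `midranks` and `rank_sum_test` and the trait values are hypothetical.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (z, two-sided p) for the rank sum of group_a."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum = sum(ranks[:n_a])
    expected = n_a * (n_a + n_b + 1) / 2
    std_dev = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (rank_sum - expected) / std_dev
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical trait measurements, split by the two alleles of one marker.
allele_1 = [1, 2, 3, 4, 5]
allele_2 = [6, 7, 8, 9, 10]
z, p = rank_sum_test(allele_1, allele_2)
```

Because only ranks enter the calculation, the result is unchanged by any monotonic transformation of the trait values, which is why no Gaussian assumption is needed.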

### Mixed Linear Models

Mixed Linear Models are generally the most appropriate choice for GWAS because they model both fixed effects (variation due to the marker) and random effects (variation due to other sources such as population structure). These models can also estimate the strength of association between a trait and several markers.
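One common way to write such a model is shown below. The notation is an assumption for illustration and is not taken from a specific tool: $y$ is the vector of trait measurements, $X$ holds the marker genotypes (and any covariates) with fixed effects $\beta$, $Z$ maps individuals to random effects $u$, $K$ is a kinship (relatedness) matrix that captures population structure, and $\varepsilon$ is the residual error.

```latex
y = X\beta + Zu + \varepsilon,
\qquad u \sim \mathcal{N}(0,\ \sigma_g^2 K),
\qquad \varepsilon \sim \mathcal{N}(0,\ \sigma_e^2 I)
```

Testing a marker then amounts to testing whether its entry in $\beta$ differs from zero, after the random term has absorbed the variation explained by relatedness among individuals.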