Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes

Research output: Contribution to journal › Journal article › Research › peer-review

Christian Theil Have
Emil Vincent Rosenbaum Appel
Niels Grarup
Torben Hansen
Jette Bork-Jensen

Abstract—Undetected mislabeled samples may affect the
results of genotype studies, particularly when rare genetic
variants are investigated. Mislabeled samples are often not
detected during quality control, and when they are detected they
are normally discarded for lack of a reliable method to
recover the correct labels.
Here we describe a statistical method that, given a few extra
independent genotypes (barcode genotypes), detects mislabeled
samples and recovers the correct labels for sample mix-ups. We
have implemented the method in a program named Wunderbar
and evaluate its reliability on simulated data. We find that, with
only a small number of barcode genotypes, Wunderbar identifies
mislabeled samples and sample mix-ups with high sensitivity
and specificity, even at a high genotyping error rate and
in the presence of dependency between the individual barcode
genotypes.
To detect mislabeled samples, we calculate the probability
that the discordance between the genotypes in the data and the
independent barcode genotypes can be attributed to random
(non-mislabeling) genotyping errors. To identify mix-ups, we
calculate the probability that the set of identical genotypes
shared by sample x and sample y arises by chance. From this we
calculate a mix-up confidence score that is penalized for
mismatches introduced by the proposed new label and adjusted for
dependency among the genotypes. This
confidence score is used to identify probable mix-ups.
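
The two probabilities described above can be illustrated with a minimal Python sketch. It is not the Wunderbar implementation, and it rests on assumptions that the abstract does not state: discordance is modeled with a binomial tail, markers are treated as independent, and a single table of genotype frequencies is shared by all markers. The penalization and dependency adjustment of the confidence score are omitted. All function names, the threshold alpha, and the toy data are hypothetical.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def mislabel_pvalue(data_gt, barcode_gt, error_rate):
    """Probability of at least the observed number of discordant genotypes
    if random genotyping error alone were responsible (assumed binomial)."""
    discordant = sum(d != b for d, b in zip(data_gt, barcode_gt))
    return binom_tail(discordant, len(data_gt), error_rate)

def chance_match_probability(data_gt, barcode_gt, genotype_freqs):
    """Probability that the genotypes shared by two samples agree by chance,
    assuming independent markers (an assumption of this sketch)."""
    prob = 1.0
    for d, b in zip(data_gt, barcode_gt):
        if d == b:
            prob *= genotype_freqs[d]
    return prob

def propose_relabels(data, barcodes, genotype_freqs, error_rate=0.01, alpha=1e-3):
    """Flag samples whose discordance is unlikely under genotyping error,
    then suggest the barcode label least likely to match by chance."""
    proposals = {}
    for label, gt in data.items():
        if mislabel_pvalue(gt, barcodes[label], error_rate) >= alpha:
            continue  # discordance is explainable by genotyping error
        candidates = {
            other: chance_match_probability(gt, bc, genotype_freqs)
            for other, bc in barcodes.items()
            if other != label
        }
        best = min(candidates, key=candidates.get)
        if candidates[best] < alpha:
            proposals[label] = best
    return proposals

# Toy data: genotypes coded 0/1/2; labels A and B are swapped, C is correct.
genotype_freqs = {0: 0.25, 1: 0.5, 2: 0.25}  # hypothetical frequencies
data = {
    "A": [0, 1, 2, 1, 0, 2, 1, 0],
    "B": [2, 0, 1, 1, 2, 0, 0, 1],
    "C": [1, 1, 0, 2, 0, 1, 2, 2],
}
barcodes = {
    "A": [2, 0, 1, 1, 2, 0, 0, 1],
    "B": [0, 1, 2, 1, 0, 2, 1, 0],
    "C": [1, 1, 0, 2, 0, 1, 2, 2],
}

print(propose_relabels(data, barcodes, genotype_freqs))  # {'A': 'B', 'B': 'A'}
```

In this toy case, samples A and B each disagree with their own barcode at seven of eight markers, which is very unlikely at a 1% error rate, and each matches the other's barcode at all eight. Sample C matches its own barcode and is left alone.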

Original language	English
Article number	370
Journal	International Journal of Bioscience, Biochemistry and Bioinformatics
Volume	4
Issue number	5
Pages (from-to)	355-360
Number of pages	5
ISSN	2010-3638
DOIs	https://doi.org/10.7763/IJBBB.2014.V4.370
Publication status	Published - 2014

ID: 120736068
