Assessing the effects of subpopulations on the application of forensic DNA profiling
[Thesis]
Clark, Dan
University of Central Lancashire
2013
Thesis (Ph.D.)
Currently, UK forensic service providers (FSPs) tend to employ three geographically broad databases when estimating profile frequencies from a standard SGM Plus® DNA profile. These estimates typically include correction factors to account for issues such as population substructure and sampling inefficiencies. It has been shown previously that regional genetic variation within the UK 'Caucasian' population is negligible, but consideration has to be given to profiles that may originate from an individual of a more genetically isolated population. Samples were collected from Indian, Pakistani and UK (white British) donors, as well as from Kalash individuals, a small population from the Khyber Pakhtunkhwa region in the north-west of Pakistan. These were profiled using the SGM Plus® and Identifiler® kits, and a database was compiled for each population. The greatest pairwise FST, 2.9 %, was seen between the Kalash and Indian populations. Allele frequency data were collected for each population, and each sample's profile frequency was estimated against all other databases to see whether samples reported a more conservative profile frequency (higher match probability) in their cognate database or in that of another population. A combined South Asian database comprising the Indian, Pakistani and previously published Bangladeshi data was also formed and used to calculate the level of correction required to make all samples of a population report a more conservative profile frequency in this combined database than in their cognate database. At the standard FST correction of 3 %, the minimum correction used by some FSPs, 94 % of the UK samples reported a more conservative profile frequency in the South Asian database, the lowest proportion of the four populations. The Kalash dataset required the highest correction factor, FST = 12 %, for 100 % of samples to report more conservative match probabilities when measured against the combined database.
It was established that the current levels of correction applied to profile frequency calculations were more than sufficient, with random match probabilities remaining lower than one in one billion for all samples in all databases with a correction of FST = 5 %. Although significant pairwise FST differences were observed, as well as significant differentiation between populations across all SGM Plus® loci, no evidence of substructuring was detected using STRUCTURE, a program that employs a Bayesian probabilistic clustering approach, likely because of the insufficient number of samples and loci tested. Marked differences were seen in the allele frequencies of the Kalash population, which also exhibited the highest affiliation to its cognate database, at least 80 %, with or without correction. AMOVA also confirmed that the greatest variance between groups was seen when the Kalash were kept as a separate entity from the other South Asian populations. Although current UK practice for applying FST correction prior to estimating STR match probabilities seems generous, there will be occasions when an estimate may appear less conservative when based on a broad database. Nevertheless, in this study, the one in one billion match probability ceiling was not exceeded for any sample compared against any database. Therefore, although consideration should be given to a suspect's reference population prior to frequency estimation, the current correction factors should be sufficient in the vast majority of cases. Where only partial profiles were obtained, this had little effect on the estimation of geographic origin compared with full profiles, for the populations used in this study.
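
The abstract does not state which formula was used to apply the FST correction. As an illustration only, the sketch below assumes the widely used Balding-Nichols conditional match probabilities, with FST written as `theta`. The function names and the allele frequencies are hypothetical and are not taken from the thesis data.

```python
from math import prod


def locus_match_probability(p_i, p_j, theta, homozygous):
    """FST-corrected genotype match probability at one locus.

    Assumes the Balding-Nichols formulas; p_i and p_j are database
    allele frequencies and theta is the FST correction (e.g. 0.03).
    For a homozygous genotype only p_i is used.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if homozygous:
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


def profile_match_probability(loci, theta):
    """Multiply per-locus probabilities, assuming independence between loci.

    `loci` is a list of (p_i, p_j, homozygous) tuples.
    """
    return prod(locus_match_probability(p_i, p_j, theta, hom) for p_i, p_j, hom in loci)


# Hypothetical three-locus profile (illustrative frequencies only).
example_profile = [(0.10, 0.20, False), (0.15, 0.15, True), (0.05, 0.30, False)]

for theta in (0.0, 0.03, 0.05):
    print(theta, profile_match_probability(example_profile, theta))
```

With `theta = 0` the formulas reduce to the uncorrected products `p_i ** 2` and `2 * p_i * p_j`. For the rare-allele frequencies used here, increasing `theta` raises the match probability, which is the sense in which a larger correction is more conservative.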