NOAA Technical Memorandum NMFS-AFSC-283

A Bayesian cross-validation approach to evaluate genetic baselines and forecast the necessary number of informative single nucleotide polymorphisms

Abstract

The determination of the origin of individuals from a mixture composed of multiple populations is becoming a routine tool for the management and study of an increasing number of taxa. It is accomplished by applying statistical methods and a reference genetic baseline whose accuracy and precision must be evaluated to determine its utility. Earlier evaluation methods that used simulated mixtures from a baseline with standard maximum likelihood (ML) methods for mixed-stock analysis (MSA) provided optimistic evaluations of baselines. More recent methods address the optimism but are based solely on ML methods and either do not accommodate potentially informative haploid data or require larger datasets than are available or possible.

We used data from a developing baseline for chum salmon (Oncorhynchus keta) that includes single nucleotide polymorphisms (SNPs) and microsatellites to produce a method that we call ‘leave-ten-percent-out cross-validation’ (LTO). This method avoids optimism in baseline evaluation, uses only observed multi-locus genotypes, accepts haploid and diploid data, applies Bayesian methods of MSA, and is less dependent on large baseline sample sizes. To further guide the development of genetic baselines, we also simulated increasing numbers of SNP loci and used LTO and logistic regression to estimate the number of informative SNP loci that would be necessary to achieve a specified rate of correct individual assignment.
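
The abstract does not spell out the mechanics of LTO, so the following is only an illustrative sketch of the two ideas it names, under stated assumptions. It assumes that roughly ten percent of the individuals in each baseline population are held out as a test mixture of known origin while the remainder forms a reduced baseline, and that a fitted logistic curve of the form logit(p) = intercept + slope × (number of loci) is inverted to find the locus count for a target assignment rate. The function names, the dictionary layout, and the coefficients are hypothetical and are not taken from the memorandum; the Bayesian MSA step itself is not shown.

```python
import math
import random
from collections import defaultdict


def leave_ten_percent_out(baseline, holdout_fraction=0.10, seed=0):
    """Split a baseline into a reduced baseline and a held-out test mixture.

    `baseline` maps a population name to a list of individual records
    (a hypothetical layout). Holdout is stratified by population.
    """
    rng = random.Random(seed)
    reduced = defaultdict(list)
    mixture = []
    for pop, individuals in baseline.items():
        shuffled = individuals[:]
        rng.shuffle(shuffled)
        # Hold out at least one individual per population.
        n_out = max(1, round(holdout_fraction * len(shuffled)))
        # Held-out individuals keep their true origin so that later
        # assignments can be scored as correct or incorrect.
        mixture.extend((pop, ind) for ind in shuffled[:n_out])
        reduced[pop] = shuffled[n_out:]
    return dict(reduced), mixture


def loci_needed(intercept, slope, target_rate):
    """Invert logit(p) = intercept + slope * n_loci, rounding n_loci up."""
    logit = math.log(target_rate / (1.0 - target_rate))
    return math.ceil((logit - intercept) / slope)


if __name__ == "__main__":
    # Toy baseline: integers stand in for multi-locus genotypes.
    toy = {"A": list(range(20)), "B": list(range(100, 130))}
    reduced, mixture = leave_ten_percent_out(toy)
    print(len(reduced["A"]), len(reduced["B"]), len(mixture))
    # Made-up regression coefficients, for illustration only.
    print(loci_needed(intercept=-2.0, slope=0.05, target_rate=0.9))
```

In a full evaluation the held-out mixture would be assigned against the reduced baseline with a Bayesian MSA method, and the split would be repeated so that the correct-assignment rate is averaged over many holdouts.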


View Online  (.pdf, 4.6 MB).
 

