Overview

DNA padlock probes are synthetic DNA molecules that offer high specificity and low cross-interaction between probes. These properties make them well suited to multiplexed amplification of bisulfite-converted DNA sequence targets for characterizing epigenetic abnormalities [2]. The Integrative Genomics Lab at UCSD, which studies the methylation patterns of human stem cells, uses padlock probes extensively. DNA methylation is a crucial cellular mechanism for controlling gene expression, and successfully characterizing the methylome of stem cells is an important step toward their future clinical application.
Despite these strengths, the probe sets previously used by the lab have shown large variability in capture efficiency: some probes produce sequencing read counts in the tens of thousands, while others produce fewer than ten. A more uniform distribution of reads was needed to ensure accurate results and to eliminate the time-consuming and costly measures currently required to use these probes [3]. The main goal of this project was to improve the probe design algorithm used by the Integrative Genomics Lab by creating a better scoring function.
The final design was achieved by identifying parameters that are statistically correlated with the capture efficiency of each padlock probe and then using machine learning techniques to derive a scoring function from them. The resulting scoring function explained 57.0% of the total variation in bisulfite probe capture efficiency.
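To make "percentage of variation explained" concrete, the following is a minimal sketch of how a candidate parameter might be checked for correlation with capture efficiency and how the explained variation (the coefficient of determination, R^2) of a fitted score could be computed. It is not the lab's actual method: the single feature (GC fraction of a probe arm), the log10 transform of read counts, the one-variable least-squares fit, and all numeric values are assumptions chosen only for illustration.

```python
from math import log10, sqrt


def pearson_r(xs, ys):
    """Pearson correlation between a candidate parameter and efficiency."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys, preds):
    """Fraction of the total variation in ys explained by preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy values, invented for illustration only (not data from the project).
gc_fraction = [0.30, 0.38, 0.45, 0.52, 0.60, 0.68]  # hypothetical feature
read_counts = [8, 35, 120, 900, 4200, 15000]        # hypothetical counts

# Assumption: model log10 of read counts, since counts span several
# orders of magnitude.
efficiency = [log10(c) for c in read_counts]

r = pearson_r(gc_fraction, efficiency)
slope, intercept = fit_line(gc_fraction, efficiency)
scores = [slope * x + intercept for x in gc_fraction]
r2 = r_squared(efficiency, scores)

print(f"Pearson r = {r:.3f}, R^2 = {r2:.3f}")
```

In this framing, a scoring function that explains 57.0% of the variation corresponds to R^2 = 0.570 between predicted and observed efficiency. A real scoring function would presumably combine several parameters, in which case the one-variable fit here would be replaced by a multivariate model.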