Closing the information gap in the pig genome

Pig breeders around the world have made substantial gains in desirable traits thanks to knowledge gained from sequencing the pig genome, the first draft of which was published in 2012. These gains include genomics-enabled breeding, which has increased the rate of genetic gain in some programs by up to 35%. Yet in spite of these improvements, about 10% of the pig genome was missing from that draft.

“The IGF2 gene, whose impact on muscling I and others reported 17 years ago, was missing,” says Alan Archibald, Personal Chair of Mammalian Molecular Genetics at The Roslin Institute. “So was the CD163 gene, which encodes a molecule essential for infection by PRRSV [porcine reproductive and respiratory syndrome virus]. In one of our projects, we edited that gene and rendered pigs completely resistant to the virus. So, a number of key genes of interest to people in the breeding sector were absent from the genome sequence or only partially represented.”

While 10% may not seem like much to an outsider, and some remarkable discoveries were clearly made without it, some projects lacked the information (annotations) needed to make the best possible decisions, for example in gene editing. And although the long-range information available was good, unresolved redundancies, short-range ordering and orientation errors, and the misassembled genes associated with them could lead to information loss.

The research paper describing the new sequence presents two annotated, highly contiguous, chromosome-level genome assemblies created with new long-read technologies and a whole-genome shotgun strategy. Both assemblies are of substantially higher (>90-fold) continuity and accuracy than the previous genome sequence. Together with the annotation of another 11 short-read assemblies, the new sequence provides a much-needed foundation for genomic research in pigs.

For example, Aniek C. Bouwman et al. at Wageningen University in the Netherlands reported at the World Congress on Genetics Applied to Livestock Production that the new genome improved the accuracy of inferring (imputing) genomic sequence from marker genotypes, and thus improved genomic predictions.

Is this now the complete pig genome?

“No,” says Archibald. “Small bits are still missing, but this is a substantial improvement. It’s 400-700 times more continuous. In genomes made up of strings of bases (letters), the technology we used for the first draft could only read 900-1,000 bases at a time: short bursts of information. Assembling the ‘jigsaw puzzle’ was a challenge. For the new genome, we read 10,000-20,000 bases/letters at a time, so the pieces of the puzzle just got much bigger.”

Nonetheless, 120 gaps still remain in the sequence. Archibald believes some of the missing parts may be important to how the chromosomes function, but not in terms of information content. In other words, they are not interesting, unique or useful to the geneticist, and they are highly repetitive and therefore difficult to sequence, like assembling an all-blue sky in the jigsaw puzzle.


Gentec CEO Graham Plastow is a member of the Stakeholder Advisory Group for GENE-SWitCH, a European project related to BovReg (another Gentec collaboration). Archibald contributed to designing the GENE-SWitCH project proposal and is a member of the project team.

“The pig genome sequence is not a GENE-SWitCH outcome,” says Plastow. “But it’s highly relevant, as the primary aims of GENE-SWitCH include adding value to the pig and chicken genomes through enhanced functional annotation, i.e., identifying which parts of the genome have key functions, such as encoding proteins or regulating when and where each gene is expressed.”


“Pork is the most popular of all meats and, with a growing global population, we need to improve the sustainability of food production. The improved knowledge of pigs’ genetic make-up will help farmers breed healthier and more productive animals,” says Archibald. “The sequence has been available for two years, so consumers may already have benefited without knowing it. Improvements in and of themselves are modest, but if you apply them across thousands of animals, the benefits add up.”