A database for understanding the evolutionary and medical implications of human SNVs at the personal and population levels

Aims and Scope

PGG.SNV is a user-friendly database for understanding the medical and evolutionary impacts of human SNVs at the personal and population levels. It currently documents 71,080,900 human SNVs distributed across 200,000 human genomes representing 500 global populations, and provides comprehensive evolutionary annotation and interpretation for each genetic variant, such as population prevalence, population differentiation and natural selection. Notably, the large number of human genomes from diverse ethnic groups makes it feasible to assess the prevalence of candidate disease-associated variants with little ancestral bias, thus providing insight into health disparities between human populations. Besides contemporary human variants, the database includes genetic variants of ancient humans and archaic hominins, with ages ranging from 400 to 40,000 years before present. This feature helps track the spatiotemporal trajectories of genetic variants associated with human adaptations to novel and changing environments, agricultural lifestyles, and pathogens. Moreover, PGG.SNV provides a dynamic feature that lets users visualize their own data. The long-term aim of PGG.SNV is to bridge evolutionary genetic studies and future precision medicine.

Data processing framework

The frame diagram below illustrates the data processing framework in PGG.SNV. The detailed description will appear in the paper associated with the database, which is coming out soon.

PGG.SNV data sets

The table below lists the data sets included in the PGG.SNV database.

Data set	Abbr.Dataset	Type	No.genomes	No.Populations Ancestries	Populations Ancestries	Description	Link	Ref.

PGG.SNV populations and ancestries

In the context of PGG.SNV, ethnicity or population affinity refers to a kind of “inherited” status of shared ancestry, language, history, society, culture or nation. Although it is often correlated with the genetic affinity of a group of people, ethnicity or population affinity is not defined based on genetic information. Typically, we assigned individuals to one of the following 8 geographical groups, with ancestry derived from the continent where the individual resides: African, American, Central Asian and Siberian, East Asian, Oceanian, South Asian, Southeast Asian, and West Eurasian. However, a population of known ancestry is classified by its major ancestry even if the population lives on another continent. For example, African Americans were treated as a population of African ancestry.

Population	Description	Ancestry	Data set

How to obtain population information

This section describes how to obtain population information. First, go to https://www.pggsnv.org/statistics.html. Then go to the “Distribution of included global ethnic groups” section of that page and click the “data view” button at the top corner of the figure. Copy and paste the text. The populations included in the database are grouped by region. Each line represents one population and provides the population name, population description, data set name, ancestry, latitude and longitude.
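Once the text has been copied, it can be loaded into structured records for further analysis. The following is a minimal sketch, not an official PGG.SNV tool. It assumes that the fields of each line are tab-separated and appear in the order listed above (population name, description, data set name, ancestry, latitude, longitude); the actual delimiter and column order of the “data view” text may differ, so adjust `delimiter` and the field order to match what you see. The names `Population` and `parse_populations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Population:
    name: str
    description: str
    dataset: str
    ancestry: str
    latitude: float
    longitude: float


def parse_populations(text: str, delimiter: str = "\t") -> list[Population]:
    """Parse text copied from the "data view" panel into Population records.

    Assumption: one population per line, six fields per line, in the order
    name, description, data set, ancestry, latitude, longitude.
    Lines that do not fit this shape (blank lines, region headings,
    header rows) are skipped rather than raising an error.
    """
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(delimiter)]
        if len(fields) != 6:
            continue  # not a population line under the assumed layout
        name, description, dataset, ancestry, lat, lon = fields
        try:
            records.append(
                Population(name, description, dataset, ancestry,
                           float(lat), float(lon))
            )
        except ValueError:
            continue  # coordinates are not numeric, e.g. a header row
    return records
```

Skipping lines that do not match the assumed layout keeps the parser tolerant of region headings, at the cost of silently dropping malformed rows; compare `len(records)` with the number of populations you expect.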

How to query a variant or gene or region

Currently, the database supports the following five types of search keys to query a variant, gene, or region:
(1) Variant. The format is Chromosome-Position-ReferenceAllele-AlternativeAllele, for example, 1-231557623-G-C.
(2) RSID. If you search with an RSID, the database returns one or more variants, in Chromosome-Position-ReferenceAllele-AlternativeAllele format, that are denoted by that ID. Click the exact variant you want in order to read its annotation information in detail.
(3) Region. The format is Chromosome:StartPosition-EndPosition, for example, 1:231499497-231560790.
(4) Gene. The format is an official gene name.
(5) Ensembl gene or transcript ID.
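To check a search key before submitting it, the formats listed above can be expressed as patterns. The sketch below is illustrative only and is not the database's own validation logic. The regular expressions are assumptions inferred from the examples in this section: the chromosome names (1 to 22, X, Y, MT), the restriction of alleles to A/C/G/T, the `rs` prefix for RSIDs, and the `ENSG`/`ENST` prefixes for Ensembl IDs are not stated by PGG.SNV. The region pattern follows the colon form used in the example (`1:231499497-231560790`). The function name `classify_query` is hypothetical.

```python
import re

# All patterns below are assumptions inferred from the documented examples.
CHROM = r"(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT)"
VARIANT_RE = re.compile(rf"^{CHROM}-\d+-[ACGT]+-[ACGT]+$")
RSID_RE = re.compile(r"^rs\d+$")
REGION_RE = re.compile(rf"^{CHROM}:(\d+)-(\d+)$")
ENSEMBL_RE = re.compile(r"^ENS[GT]\d{11}(?:\.\d+)?$")


def classify_query(key: str) -> str:
    """Return which of the five search-key types a string looks like."""
    key = key.strip()
    if VARIANT_RE.match(key):
        return "variant"
    if RSID_RE.match(key):
        return "rsid"
    region = REGION_RE.match(key)
    if region:
        start, end = int(region.group(1)), int(region.group(2))
        if start > end:
            raise ValueError("region start must not exceed region end")
        return "region"
    if ENSEMBL_RE.match(key):
        return "ensembl"
    # Anything else is treated as a gene name; the sketch cannot verify
    # that it is an official gene symbol.
    return "gene"
```

With these patterns, `classify_query("1-231557623-G-C")` returns `"variant"` and `classify_query("1:231499497-231560790")` returns `"region"`. Gene names are the fallback case, so a mistyped key of another type will be reported as `"gene"`.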

How to query a variant via WeChat (Mobile APP)

This section provides the method to query a variant via WeChat (https://www.wechat.com/en/).