Aims and Scope
PGG.SNV is a user-friendly database for understanding the medical and evolutionary impacts of human SNVs at the personal and population levels. It currently documents 71,080,900 human SNVs distributed across 200,000 human genomes representing 500 global populations, and it provides comprehensive evolutionary annotation and interpretation for each genetic variant, such as population prevalence, population differentiation, and natural selection. Notably, the large number of human genomes from diverse ethnic groups makes it feasible to estimate the prevalence of candidate disease-associated variants with little ancestry bias, thereby providing insight into health disparities between human populations. Besides contemporary human variants, the database includes genetic variants of ancient humans and archaic hominins with ages ranging from 400 to 40,000 years before present. This feature helps track the spatiotemporal trajectories of genetic variants associated with human adaptation to novel and changing environments, agricultural lifestyles, and pathogens. In addition, PGG.SNV provides a dynamic feature that lets users visualize their own data. The long-term aim of PGG.SNV is to bridge evolutionary genetic studies and future precision medicine.
Data processing framework
The diagram below illustrates the data processing framework of PGG.SNV. A detailed description will appear in the forthcoming paper associated with the database.
PGG.SNV data sets
The table below lists the data sets included in the PGG.SNV database.
Data set | Abbr. | Data set type | No. genomes | No. populations | Ancestries | Populations | Ancestries | Description | Link | Ref.
PGG.SNV populations and ancestries
In the context of PGG.SNV, ethnicity or population affinity refers to an “inherited” status of shared ancestry, language, history, society, culture, or nation. Although it is often correlated with the genetic affinity of a group of people, ethnicity or population affinity is not defined on the basis of genetic information. Typically, we assigned individuals to one of the following eight geographical groups, with ancestry derived from the continent where the individual resides: African, American, Central Asian and Siberian, East Asian, Oceanian, South Asian, Southeast Asian, and West Eurasian. However, a population of known ancestry is classified by its major ancestry even if it lives on another continent. For example, the African American population is treated as a population of African ancestry.
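The precedence rule above (known major ancestry first, otherwise the group of the place of residence) can be sketched as a small function. This is a hypothetical illustration, not PGG.SNV's actual code: the function name `assign_ancestry` and its inputs are assumptions, and only the eight group names come from the text.

```python
from typing import Optional

# The eight ancestry groups listed in the text above.
ANCESTRY_GROUPS = {
    "African", "American", "Central Asian and Siberian", "East Asian",
    "Oceanian", "South Asian", "Southeast Asian", "West Eurasian",
}


def assign_ancestry(residence_group: str, major_ancestry: Optional[str] = None) -> str:
    """Hypothetical sketch of the ancestry-assignment rule.

    residence_group: group corresponding to where the population lives (assumed input).
    major_ancestry: the population's known major ancestry, if any; it takes precedence.
    """
    group = major_ancestry if major_ancestry is not None else residence_group
    if group not in ANCESTRY_GROUPS:
        raise ValueError(f"unknown ancestry group: {group!r}")
    return group
```

For instance, `assign_ancestry("American", major_ancestry="African")` returns `"African"`, mirroring the African American example, while a population with no known major ancestry falls back to its residence group.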
Population | Description | Ancestry | Data set
How to obtain population information
To obtain population information, go to https://www.pggsnv.org/statistics.html, find the “Distribution of included global ethnic groups” section, and click the “data view” button at the top corner of the figure. Then copy and paste the text. Populations included in the database are grouped by region, and each line represents one population, giving its population name, population description, data set name, ancestry, latitude, and longitude.
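Once the text is copied, it can be turned into structured records for further analysis. The sketch below is only an illustration under stated assumptions: it assumes the fields on each line are tab-separated in the order listed above, and that any line without all six fields is a region heading. Neither the delimiter nor the heading layout is documented here, so adjust `parse_population_text` (a hypothetical helper) to the text you actually receive.

```python
from typing import Dict, List, Optional

# Field order as described in the text; the tab delimiter is an assumption.
FIELDS = ["name", "description", "data_set", "ancestry", "latitude", "longitude"]


def parse_population_text(text: str) -> List[Dict[str, object]]:
    """Parse copied population text into one dict per population (hypothetical helper)."""
    records: List[Dict[str, object]] = []
    region: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split("\t")]
        if len(parts) < len(FIELDS):
            # Assumption: a line without all six fields is a region heading.
            region = line
            continue
        record: Dict[str, object] = dict(zip(FIELDS, parts))
        record["latitude"] = float(parts[4])
        record["longitude"] = float(parts[5])
        record["region"] = region  # region heading seen most recently
        records.append(record)
    return records
```

Keeping the region on each record makes it easy to count populations per region or per ancestry afterwards, for example with `collections.Counter`.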
How to query a variant, gene, or region
Currently, the database supports the following five types of search keys for querying a variant, gene, or region:
(1) Variant. The format is Chromosome-Position-ReferenceAllele-AlternativeAllele, for example, 1-231557623-G-C.
(2) RSID. Searching by RSID returns one or more variants denoted by that ID, each in Chromosome-Position-ReferenceAllele-AlternativeAllele format. Click the variant of interest to view its detailed annotation information.
(3) Region. The format is Chromosome:StartPosition-EndPosition, for example, 1:231499497-231560790.
(4) Gene. The format is an official gene name.
(5) Ensembl gene or transcript ID.
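Before submitting a batch of queries, it can help to check which of the five key types each string matches. The sketch below is a hypothetical client-side check, not the database's own parser: the function `classify_query` is illustrative, and it assumes chromosome labels 1 to 22, X, Y, and MT without a "chr" prefix (as in the examples above), rsIDs of the form `rs` followed by digits, Ensembl IDs of the form `ENSG`/`ENST` plus 11 digits with an optional version, and gene names made of letters, digits, dots, and hyphens.

```python
import re

# Assumed chromosome labels: 1-22, X, Y, MT, without a "chr" prefix.
_CHROM = r"(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT)"

# Checked in order; the permissive gene pattern comes last.
_PATTERNS = [
    ("variant", re.compile(rf"^{_CHROM}-\d+-[ACGT]+-[ACGT]+$", re.IGNORECASE)),
    ("rsid", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("region", re.compile(rf"^{_CHROM}:(\d+)-(\d+)$", re.IGNORECASE)),
    ("ensembl", re.compile(r"^ENS[GT]\d{11}(?:\.\d+)?$")),
    ("gene", re.compile(r"^[A-Za-z][A-Za-z0-9.\-]*$")),
]


def classify_query(key: str) -> str:
    """Return the assumed search-key type of `key`, or raise ValueError."""
    key = key.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(key)
        if match:
            # A region must not start after it ends.
            if kind == "region" and int(match.group(1)) > int(match.group(2)):
                raise ValueError("region start must not exceed region end")
            return kind
    raise ValueError(f"unrecognized search key: {key!r}")
```

With these assumptions, `classify_query("1-231557623-G-C")` returns `"variant"` and `classify_query("1:231499497-231560790")` returns `"region"`. Because the gene pattern is the most permissive, it is tried last so that rsIDs and Ensembl IDs are not mistaken for gene names.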
How to query a variant via WeChat (mobile app)
This section describes how to query a variant via WeChat (https://www.wechat.com/en/).