
Hereditas (Beijing), 2025, Vol. 47, Issue (10): 1156-1168. doi: 10.16288/j.yczz.25-009

Research Article

Geographical inference of dust from typical Chinese cities based on metagenomic shotgun sequencing

Qi Yang1,2, Kelai Kang2, Bo Zhao3,4, Kai Feng3,4, Yaosen Feng2, Jian Ye1,2, Ye Deng3,4, Le Wang2

1. People’s Public Security University of China, Beijing 100038, China
2. Key Laboratory of Forensic Genetics, Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
3. Key Laboratory of Environmental Biotechnology of CAS, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
4. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-02-12; Revised: 2025-03-20; Online: 2025-10-20; Published: 2025-03-21
  • Contact: Jian Ye, Ye Deng, Le Wang. E-mail: 458425932@qq.com; yejian77@126.com; yedeng@rcees.ac.cn; wangle_02@163.com
  • Supported by:
    Institute of Forensic Science, Ministry of Public Security of China (2022JB022); Ministry of Public Security of China (2023JC14)

Abstract:

Microbial profiles in dust are closely correlated with geographical location and can provide valuable clues for criminal investigations, showing considerable potential for forensic applications. However, the feasibility of inferring geographical location from microbial profiles in metagenomic datasets remains underexplored. In this study, we collected 170 dust samples from residential communities in four cities across northern, eastern, southwestern, and northwestern China. All samples were subjected to shotgun metagenomic sequencing to characterize variation in microbial composition. In total, 41,029 species were annotated, comprising 93.39% bacteria, 6.37% eukaryotes, 0.21% viruses, and 0.03% archaea. Samples clustered clearly by city (R² = 0.870, P < 0.001), and removing species detected in fewer than 10% of all samples further strengthened city-level clustering (R² = 0.948, P < 0.001). Additionally, 127 biomarkers distinguishing the four cities were identified using linear discriminant analysis effect size (LEfSe). Each city harbored a distinct microbial community, with unique species and relatively abundant taxa that together shaped its characteristic microbial profile. All samples were randomly split into training and testing sets at a 7:3 ratio. Five source-tracking and machine learning methods, namely SourceTracker, FEAST, LightGBM, Random Forest and Support Vector Machine, were applied to the 51 randomly selected test samples and achieved average accuracies of 88.89%, 92.16%, 98.04%, 99.35% and 69.28%, respectively. These results constitute a microbial genetic map of four cities in China that highlights distinct microbial taxonomic signatures and provides an approach for city-scale source tracking of dust samples.

Key words: Dust, metagenomic shotgun sequencing, microbial composition, geographical inference
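Two data-preparation steps in the abstract, dropping species detected in fewer than 10% of samples and randomly splitting the 170 samples 7:3, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline (which the abstract does not describe): the abundance table is assumed to be a Python dict mapping each species to its per-sample abundances, and the function names `prevalence_filter`, `split_samples` and `accuracy` are hypothetical. The sketch also shows where the figure of 51 test samples comes from, since 30% of 170 is 51.

```python
import random
from collections.abc import Sequence


def prevalence_filter(abundance: dict[str, list[float]],
                      min_prevalence: float = 0.10) -> dict[str, list[float]]:
    """Drop species detected (abundance > 0) in fewer than min_prevalence of samples.

    Hypothetical layout: species name -> list of abundances, one value per sample,
    in the same sample order for every species.
    """
    kept = {}
    for species, values in abundance.items():
        detected = sum(1 for v in values if v > 0)
        if detected / len(values) >= min_prevalence:
            kept[species] = values
    return kept


def split_samples(sample_ids, train_fraction: float = 0.7, seed: int = 0):
    """Randomly split sample IDs into (training, testing) lists, e.g. at 7:3."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded RNG so the split is reproducible
    # round(), not int(): 170 * 0.7 evaluates to 118.99999999999999 in floating point
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def accuracy(true_labels: Sequence, predicted_labels: Sequence) -> float:
    """Fraction of test samples whose predicted city matches the true city."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    train_ids, test_ids = split_samples(range(170))
    print(len(train_ids), len(test_ids))  # prints: 119 51
```

Because the split is random, a different seed yields a different test set. Averaging accuracy over several such splits is one way to arrive at average accuracies like those reported above, although the abstract does not say how many splits were used or whether the split was stratified by city.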