How Open Science is Unveiling Our Chemical World
Imagine a detective with a database of every known criminal, tasked with finding suspects in a crowd. But what if the most intriguing individuals aren't in the system? This is the daily challenge for scientists identifying environmental chemicals.
Every day, we encounter a complex chemical world far beyond what meets the eye. From the water we drink to the food we eat, countless chemical compounds exist in our environment—some beneficial, some harmful, and many whose identities and effects remain mysterious. Scientists call these mysterious substances "known unknowns"—chemicals we know might exist but cannot yet specifically identify 1 .
Suspect screening is revolutionizing environmental testing by allowing researchers to cast a much wider net for chemical identification 1 .
Open science, in turn, is transforming how we identify chemical mysteries and assess their potential impact on human and environmental health.
Known unknowns: substances in databases but unconfirmed in samples 1
Known knowns: chemicals we can target and confidently identify
Unknown unknowns: compounds completely absent from any database 2
The primary tool for finding these known unknowns is suspect screening. Unlike traditional targeted analysis that looks for specific chemicals, suspect screening allows researchers to screen for hundreds to thousands of chemicals of interest simultaneously 1 .
The workflow has three steps:
1. Collect molecular formulas and exact masses from chemical databases
2. Use high-resolution mass spectrometry to detect compounds with matching masses
3. Analyze fragmentation patterns to confirm chemical structures
Exact mass alone is insufficient for positive identification, as many chemicals can share the same mass but have different structures and toxicities.
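To make the mass-matching step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the workflow of any particular lab or software: the three-entry suspect list, the 5 ppm tolerance, the restriction to protonated ions, and all function names are hypothetical choices made for this example. It also shows why exact mass alone is not enough: caffeine and enprofylline share the formula C8H10N4O2, so a single measured mass returns both.

```python
# Monoisotopic masses in Da, rounded to six decimals.
ELEMENT_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}
PROTON_MASS = 1.007276  # added to model the [M+H]+ ion

# Hypothetical suspect list: name -> element counts of the molecular formula.
SUSPECTS = {
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "enprofylline": {"C": 8, "H": 10, "N": 4, "O": 2},  # isomer of caffeine
    "carbamazepine": {"C": 15, "H": 12, "N": 2, "O": 1},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def protonated_mz(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON_MASS


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def screen(measured_mz, suspects=SUSPECTS, tol_ppm=5.0):
    """Return the sorted names of all suspects within tol_ppm of the measured m/z."""
    hits = []
    for name, formula in suspects.items():
        if abs(ppm_error(measured_mz, protonated_mz(formula))) <= tol_ppm:
            hits.append(name)
    return sorted(hits)


# One measured mass, two candidate structures: fragmentation data is
# needed to tell them apart.
candidates = screen(195.0877)
print(candidates)
```

Both isomers come back as candidates for the same peak, which is exactly the ambiguity that the third step, fragmentation analysis, is meant to resolve.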
| Category | Description | Identification Approach | Example |
|---|---|---|---|
| Known Knowns | Well-characterized, target chemicals | Targeted analysis | Regulated pesticides |
| Known Unknowns | Suspected or listed but unconfirmed | Suspect screening | Chemicals in databases without reference standards |
| Unknown Unknowns | Completely unexpected compounds | Non-targeted screening | Novel transformation products |
| UVCBs | Complex substances of variable composition | Representative structures | Surfactant mixtures, chlorinated paraffins |
One of the most exciting recent developments in identifying known unknowns comes from the laboratory of Dr. Tomáš Pluskal at IOCB Prague. Their team has created an extensive open library called MSnLib containing several million records showing how small molecules "break apart" when measured by mass spectrometry 5 .
Until this breakthrough, comparable databases expanded only very slowly, making it difficult to identify new or unusual compounds. The team's new approach allows data on unknown molecules to be obtained in a matter of minutes rather than years, opening potential for faster drug discovery, better environmental monitoring, and advances in artificial intelligence for biomedicine 5 .
Mass spectrometry works by breaking compounds into smaller fragments, creating a unique "fingerprint" for each substance. Traditional mass spectrometry typically breaks molecules once, but Pluskal's team uses multistage fragmentation (MSn)—repeatedly breaking molecules to obtain a more detailed view of their internal structure 5 .
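One common way to compare such fragmentation "fingerprints" is a cosine similarity between binned spectra. The sketch below is a simplified illustration and is not presented as the method used by Pluskal's team or by MSnLib. The peak lists are invented for the example, and the compound names, the 0.01 Da bin width, and the function names are all assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Group (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented reference spectra for two hypothetical compounds.
LIBRARY = {
    "compound_A": [(195.09, 100), (138.07, 60), (110.07, 25)],
    "compound_B": [(195.09, 100), (151.06, 40), (123.04, 30)],
}

# Invented query spectrum with small mass and intensity deviations.
query = [(195.091, 90), (138.068, 55), (110.072, 20)]

scores = {name: cosine_similarity(query, ref) for name, ref in LIBRARY.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Fixed-width binning is the simplest choice but can split a peak that falls near a bin edge; tolerance-based peak matching avoids that at the cost of more code.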
"We can measure ten compounds at once, and the entire process takes only a minute and a half" - Dr. Corinna Brungs 5
| Aspect | Traditional Libraries | MSnLib Achievement | Future Goal |
|---|---|---|---|
| Timeframe | 20 years of slow expansion | Rapid generation in minutes | Continuous expansion |
| Number of Compounds | Limited | 70,000 processed by 2025 | 200,000 by end of 2025 |
| Spectral Data | Limited MS2 data | 2 million high-quality MSn spectra | 10x expansion over traditional libraries |
| Access | Often commercial or restricted | Openly available to global community | Maintained as open resource |
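The throughput quoted above can be turned into a rough estimate of instrument time. This is back-of-the-envelope arithmetic, assuming runs are performed back to back at ten compounds per 1.5-minute run and ignoring sample preparation, calibration, and data processing, none of which the source quantifies. The function name is hypothetical.

```python
import math

COMPOUNDS_PER_RUN = 10   # "ten compounds at once"
MINUTES_PER_RUN = 1.5    # "a minute and a half"


def acquisition_hours(n_compounds):
    """Pure acquisition time in hours, assuming back-to-back runs."""
    runs = math.ceil(n_compounds / COMPOUNDS_PER_RUN)
    return runs * MINUTES_PER_RUN / 60


# Compound counts taken from the table above.
hours_processed = acquisition_hours(70_000)
hours_goal = acquisition_hours(200_000)
print(hours_processed, hours_goal)
```

Under these assumptions, the compounds processed so far correspond to about 175 hours of acquisition and the stated goal to about 500 hours, which is days to weeks of instrument time rather than years.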
Identifying known unknown chemicals requires a sophisticated array of tools and technologies. These instruments and databases form the foundation of modern chemical detective work.
| Function | Role |
|---|---|
| Precisely measures molecular masses | Distinguishes between compounds with similar masses |
| Repeatedly breaks molecules into fragments | Provides detailed structural information through fragmentation patterns |
| Separates complex mixtures before analysis | Isolates individual compounds from environmental samples |
| Open database of chemical properties | Provides suspect lists and predicted properties for identification |
| Community-wide mass spectrometry repository | Enables data sharing and collaborative identification |
| Pattern recognition in complex data | Predicts chemical structures from spectral data |
The journey to identify "known unknown" chemicals represents one of the most exciting frontiers in environmental science and public health. As high-resolution mass spectrometry becomes increasingly sophisticated and open science resources like MSnLib continue to expand, our ability to pinpoint these chemical mysteries keeps improving.
What makes this field particularly promising is the collaborative spirit driving it forward. When scientists share data in open resources, the entire community benefits—enabling smarter suspect screening, higher confidence identifications, and ultimately better protection of both human and ecological health 1 . As Dr. Pluskal notes, the spectral library his team created is openly available to the global scientific community, breaking with previous practices of limited access 5 .
The challenges are still significant—from the complexity of UVCBs to the emergence of completely "unknown unknown" chemicals—but the tools and approaches are evolving at an unprecedented pace. The systematic, open approach to identifying known unknowns not only helps us understand our chemical environment but also paves the way for managing it more effectively for future generations.
As we continue to unveil the identities of these mysterious chemical entities, we move closer to a comprehensive understanding of our chemical environment—and how to safeguard it.