The Hunt for the "Known Unknowns"

How Open Science is Unveiling Our Chemical World


Imagine a detective with a database of every known criminal, tasked with finding suspects in a crowd. But what if the most intriguing individuals aren't in the system? This is the daily challenge for scientists identifying environmental chemicals.

The Invisible World Around Us

Every day, we encounter a complex chemical world far beyond what meets the eye. From the water we drink to the food we eat, countless chemical compounds exist in our environment: some beneficial, some harmful, and many whose identities and effects remain mysterious. Scientists call these mysterious substances "known unknowns": chemicals that are listed in databases or suspected to be present, but that have not yet been confirmed in a given sample 1.

High-Resolution Mass Spectrometry

Revolutionizing environmental testing by allowing researchers to cast a much wider net for chemical identification 1 .

Open Science Movement

Transforming how we identify chemical mysteries and assess their potential impact on human and environmental health.

The Known Unknowns: Chemicals in Hiding

Known Unknowns

Substances in databases but unconfirmed in samples 1

Known Knowns

Chemicals we can target and confidently identify

Unknown Unknowns

Compounds completely absent from any database 2

Suspect Screening: The Detective's Magnifying Glass

The primary tool for finding these known unknowns is suspect screening. Unlike traditional targeted analysis that looks for specific chemicals, suspect screening allows researchers to screen for hundreds to thousands of chemicals of interest simultaneously 1 .

Database Compilation

Collect molecular formulas and exact masses from chemical databases

Mass Analysis

Use high-resolution mass spectrometry to detect compounds with matching masses

Structural Confirmation

Analyze fragmentation patterns to confirm chemical structures
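
The first two steps can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow of any particular tool: the three-compound suspect list, the 5 ppm tolerance, the restriction to protonated [M+H]+ ions, and the function names are all assumptions made for the example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "Cl": 34.96885268,
}
PROTON = 1.007276466812  # added to the neutral mass for an [M+H]+ ion


def monoisotopic_mass(formula: str) -> float:
    """Sum the element masses in a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def screen(observed_mz: float, suspects: dict, tol_ppm: float = 5.0) -> list:
    """Return the suspects whose [M+H]+ m/z lies within tol_ppm of the peak."""
    hits = []
    for name, formula in suspects.items():
        theoretical_mz = monoisotopic_mass(formula) + PROTON
        if abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm:
            hits.append(name)
    return hits


# Hypothetical suspect list: name -> molecular formula.
SUSPECTS = {
    "caffeine": "C8H10N4O2",
    "atrazine": "C8H14ClN5",
    "carbamazepine": "C15H12N2O",
}

if __name__ == "__main__":
    for peak in (195.0877, 216.1011, 300.0000):
        print(peak, screen(peak, SUSPECTS))
```

A match at this stage is only a candidate. The third step, checking the fragmentation pattern, is what turns a candidate into a confirmed identification.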

The Challenge

Exact mass alone is insufficient for positive identification, as many chemicals can share the same mass but have different structures and toxicities.
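
A toy example shows why. Leucine, isoleucine, and norleucine all share the molecular formula C6H13NO2, so their exact masses are identical and a mass match alone cannot tell them apart. The sketch below uses a made-up suspect list and a hypothetical helper name to flag every formula that maps to more than one structure.

```python
from collections import defaultdict

# Illustrative suspect list: (name, molecular formula).
SUSPECT_LIST = [
    ("leucine", "C6H13NO2"),
    ("isoleucine", "C6H13NO2"),
    ("norleucine", "C6H13NO2"),
    ("caffeine", "C8H10N4O2"),
]


def ambiguous_by_mass(entries):
    """Group suspects by formula and keep formulas shared by several names.

    Compounds with the same formula have the same exact mass, so each
    group returned here is indistinguishable by mass alone.
    """
    by_formula = defaultdict(list)
    for name, formula in entries:
        by_formula[formula].append(name)
    return {f: names for f, names in by_formula.items() if len(names) > 1}


if __name__ == "__main__":
    for formula, names in ambiguous_by_mass(SUSPECT_LIST).items():
        print(f"{formula}: {', '.join(names)}")
```

This check only catches true isomers. In practice, compounds with different formulas can also fall within the same mass tolerance window, which widens the ambiguity further.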

Chemical Categories Comparison

Category | Description | Identification Approach | Example
Known Knowns | Well-characterized, target chemicals | Targeted analysis | Regulated pesticides
Known Unknowns | Suspected or listed but unconfirmed | Suspect screening | Chemicals in databases without reference standards
Unknown Unknowns | Completely unexpected compounds | Non-targeted screening | Novel transformation products
UVCBs | Complex substances of variable composition | Representative structures | Surfactant mixtures, chlorinated paraffins

A Closer Look: The MSnLib Breakthrough

The Library of Chemical Fingerprints

One of the most exciting recent developments in identifying known unknowns comes from the laboratory of Dr. Tomáš Pluskal at IOCB Prague. Their team has created an extensive open library called MSnLib containing several million records showing how small molecules "break apart" when measured by mass spectrometry 5 .

Until this breakthrough, comparable databases expanded only very slowly, making it difficult to identify new or unusual compounds. The team's new approach allows data on unknown molecules to be obtained in a matter of minutes rather than years, opening potential for faster drug discovery, better environmental monitoring, and advances in artificial intelligence for biomedicine 5 .


How the Experiment Works: Tracking the Chemical Fragments

In tandem mass spectrometry, molecules are ionized and then broken into smaller fragments, and the resulting pattern of fragment masses serves as a characteristic "fingerprint" for each substance. Conventional methods typically fragment a molecule only once (MS2), but Pluskal's team uses multistage fragmentation (MSn), in which the fragments themselves are isolated and broken apart again, to obtain a more detailed view of a molecule's internal structure 5.
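
One way to picture multistage data is as a tree: each spectrum records the fragments of one precursor ion, and any fragment that is broken apart again becomes the precursor of a child spectrum one level deeper. The sketch below is only an illustration of that idea. The class, its field names, and the m/z values are assumptions made for the example, not MSnLib's actual data format or measured spectra.

```python
from dataclasses import dataclass, field


@dataclass
class SpectrumNode:
    """One fragmentation spectrum in a multistage (MSn) tree."""
    precursor_mz: float          # ion that was isolated and fragmented
    ms_level: int                # 2 for MS2, 3 for MS3, and so on
    peaks: list                  # (m/z, relative intensity) pairs
    children: list = field(default_factory=list)

    def add_child(self, precursor_mz: float, peaks: list) -> "SpectrumNode":
        """Fragment one of this spectrum's ions again, one level deeper."""
        child = SpectrumNode(precursor_mz, self.ms_level + 1, peaks)
        self.children.append(child)
        return child


def count_spectra(node: SpectrumNode) -> int:
    """Total number of spectra in the tree rooted at node."""
    return 1 + sum(count_spectra(child) for child in node.children)


def max_level(node: SpectrumNode) -> int:
    """Deepest fragmentation stage reached in the tree."""
    return max([node.ms_level] + [max_level(child) for child in node.children])


def build_example_tree() -> SpectrumNode:
    # Illustrative values only: 195.088 -> 138.066 -> 110.071.
    ms2 = SpectrumNode(195.088, 2, [(138.066, 100.0), (110.071, 35.0)])
    ms3 = ms2.add_child(138.066, [(110.071, 100.0), (83.060, 20.0)])
    ms3.add_child(110.071, [(83.060, 100.0)])
    return ms2


if __name__ == "__main__":
    tree = build_example_tree()
    print(count_spectra(tree), "spectra, deepest level MS", max_level(tree))
```

A single-stage experiment would stop at the root of this tree. The deeper levels are what give multistage libraries their extra structural detail.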

MSnLib Methodology
  1. Compound Selection
    Curated selection of 70,000+ chemical compounds
  2. Multi-stage Fragmentation
    Repeated breaking of molecules using MSn technology
  3. Automated Processing
    Using open-source "mzmine" software for rapid analysis
  4. Data Integration
    Compiling 2+ million high-quality MSn spectra into an open library

Time Efficiency

"We can measure ten compounds at once, and the entire process takes only a minute and a half" - Dr. Corinna Brungs 5

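Taken at face value, the figures in that quote allow a rough back-of-the-envelope estimate of pure measurement time. The calculation below assumes every injection carries ten compounds and takes 1.5 minutes, and it ignores sample preparation, blanks, quality control, and repeat measurements, so it is a lower bound rather than a description of the team's actual schedule. The function name and its parameters are hypothetical.

```python
import math


def instrument_hours(n_compounds: int,
                     per_injection: int = 10,
                     minutes_per_injection: float = 1.5) -> float:
    """Pure acquisition time in hours under the stated assumptions."""
    injections = math.ceil(n_compounds / per_injection)
    return injections * minutes_per_injection / 60


if __name__ == "__main__":
    hours = instrument_hours(70_000)
    print(f"{hours:.0f} h, about {hours / 24:.1f} days")  # 175 h, about 7.3 days
```

On those assumptions, 70,000 compounds correspond to roughly a week of continuous acquisition, which gives a sense of why the approach scales so much faster than earlier library-building efforts.
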
MSnLib Project Scale Comparison

Aspect | Traditional Libraries | MSnLib Achievement | Future Goal
Timeframe | 20 years of slow expansion | Rapid generation in minutes | Continuous expansion
Number of Compounds | Limited | 70,000 processed by 2025 | 200,000 by end of 2025
Spectral Data | Limited MS2 data | 2 million high-quality MSn spectra | 10x expansion over traditional libraries
Access | Often commercial or restricted | Openly available to global community | Maintained as open resource

The Scientist's Toolkit: Essential Tools for Chemical Detection

Identifying known unknown chemicals requires a sophisticated array of tools and technologies. These instruments and databases form the foundation of modern chemical detective work.

HR-MS

Function: Precisely measures molecular masses

Role: Distinguishes between compounds with similar masses

Multi-stage Fragmentation

Function: Repeatedly breaks molecules into fragments

Role: Provides detailed structural information through fragmentation patterns

Liquid Chromatography

Function: Separates complex mixtures before analysis

Role: Isolates individual compounds from environmental samples

CompTox Dashboard

Function: Open database of chemical properties

Role: Provides suspect lists and predicted properties for identification

GNPS

Function: Community-wide mass spectrometry repository

Role: Enables data sharing and collaborative identification

Machine Learning

Function: Pattern recognition in complex data

Role: Predicts chemical structures from spectral data
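
To see how a fragmentation fingerprint can be compared against a library, consider cosine similarity, a widely used family of spectral matching scores. The version below is deliberately simplified: it bins peaks to 0.01 m/z and ignores refinements such as intensity weighting or precursor mass shifts that real platforms apply. The library entries, peak values, and function names are invented for the example.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of peaks that fall into the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = none, 1 = same)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return (name, score) of the library spectrum most similar to query."""
    scored = [(cosine_similarity(query, peaks), name)
              for name, peaks in library.items()]
    score, name = max(scored)
    return name, score


# Hypothetical library: name -> list of (m/z, intensity) peaks.
LIBRARY = {
    "compound_A": [(138.066, 100.0), (110.071, 35.0)],
    "compound_B": [(91.054, 100.0), (65.039, 40.0)],
}
QUERY = [(138.066, 90.0), (110.071, 40.0)]

if __name__ == "__main__":
    name, score = best_match(QUERY, LIBRARY)
    print(f"best match: {name} (cosine {score:.3f})")
```

The larger and more diverse the open library, the more likely it is that an unknown spectrum finds a close neighbor, which is why shared resources matter so much for this kind of search.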

Conclusion: The Future of Chemical Discovery

The journey to identify "known unknown" chemicals represents one of the most exciting frontiers in environmental science and public health. As high-resolution mass spectrometry becomes increasingly sophisticated and open science resources like MSnLib continue to expand, our ability to pinpoint these chemical mysteries is growing rapidly.

Current Progress
  • High-resolution mass spectrometry enables wider chemical screening
  • Open science resources like MSnLib accelerate discovery
  • Collaborative approaches improve identification confidence
  • Machine learning enhances pattern recognition in complex data

Future Directions
  • Expanding open databases with more chemical fingerprints
  • Improving AI algorithms for autonomous chemical identification
  • Developing better approaches for UVCB characterization
  • Enhancing global collaboration through open data sharing

What makes this field particularly promising is the collaborative spirit driving it forward. When scientists share data in open resources, the entire community benefits—enabling smarter suspect screening, higher confidence identifications, and ultimately better protection of both human and ecological health 1 . As Dr. Pluskal notes, the spectral library his team created is openly available to the global scientific community, breaking with previous practices of limited access 5 .

The challenges are still significant—from the complexity of UVCBs to the emergence of completely "unknown unknown" chemicals—but the tools and approaches are evolving at an unprecedented pace. The systematic, open approach to identifying known unknowns not only helps us understand our chemical environment but also paves the way for managing it more effectively for future generations.

As we continue to unveil the identities of these mysterious chemical entities, we move closer to a comprehensive understanding of our chemical environment—and how to safeguard it.

References