DataFrame Integration

CNotebook provides seamless integration with Pandas and Polars DataFrames through the chem accessor, enabling chemistry-aware operations on molecular data.

Pandas Integration

Prerequisites

pip install "cnotebook[pandas]"

Creating Molecule Columns

Convert SMILES strings to molecule objects:

import cnotebook
import oepandas as oepd
import pandas as pd

df = pd.DataFrame({
    "Name": ["Benzene", "Pyridine", "Pyrimidine"],
    "Molecule": ["c1ccccc1", "c1cnccc1", "n1cnccc1"]
})

# Convert SMILES column to molecules (in place)
df.chem.as_molecule("Molecule", inplace=True)

# Display the DataFrame
df

You should see:

	Name	Molecule
0	Benzene
1	Pyridine
2	Pyrimidine

Note

Molecule columns automatically render as chemical structures when displaying the DataFrame. This also works for anything that returns a DataFrame, such as df.head().

Creating Query Columns

OEPandas query columns use QueryDtype and render through the same CNotebook depiction path as molecule columns:

import cnotebook
import oepandas as oepd
import pandas as pd

queries = pd.DataFrame({
    "Name": ["Alcohol query", "Nitrogen query"],
    "Query": ["[#6]-[#8]", "[#7]"],
})

queries.chem.as_query("Query", inplace=True)
queries["Query"].chem.set_render_options(image_format="svg", title=False)
queries

Substructure Highlighting

You can highlight substructures within a DataFrame column using SMARTS. By default this uses ball-and-stick-style highlighting that supports overlapping matches, using the oechem.OEGetLightColors() scheme.

# Highlight a single pattern using the DataFrame above
df.Molecule.chem.highlight("c1ccccc1")  # Highlight benzene rings

# Display the DataFrame
df

Outputs:

	Name	Molecule
0	Benzene
1	Pyridine
2	Pyrimidine

Note

Highlighting persists until you remove it with df.chem.clear_formatting_rules() for a whole DataFrame or df["column_name"].chem.clear_formatting_rules() for a single column.

You can also use any of the normal highlighting capabilities:

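# The style and color constants come from the OpenEye toolkits
from openeye import oechem, oedepict

# Highlight using normal stick highlighting
df.Molecule.chem.highlight("c1ccccc1", style=oedepict.OEHighlightStyle_Stick, color=oechem.OELightBlue)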

# Display the DataFrame
df

	Name	Molecule
0	Benzene
1	Pyridine
2	Pyrimidine

Finally, you can highlight a substructure based on the value in another column, rather than a fixed pattern:

# Add a different pattern for each row
df["Pattern"] = ["cc", "cnc", "ncn"]

# Highlight using that pattern with a slightly different style
df.chem.highlight_using_column("Molecule", "Pattern")

	Name	Pattern
0	Benzene	cc
1	Pyridine	cnc
2	Pyrimidine	ncn

Note that this creates a new column with display objects instead of molecules.

Molecular Alignment

Align molecule depictions for visual comparison:

import cnotebook
import oepandas as oepd

# Read the example unaligned molecules
df = oepd.read_sdf("examples/assets/rotations.sdf", no_title=True)

# Rename the "Molecule" column to "Original" so that we can
# see the original unaligned molecules
df = df.rename(columns={"Molecule": "Original"})

# Create a new molecule column called "Aligned" so that we can
# see the aligned molecules
df["Aligned"] = df.Original.chem.copy_molecules()

# Align the depictions based on the first molecule
# By default this does a path fingerprint-based alignment
df.Aligned.chem.align_depictions("first")

# Show the structures
df.head()

Outputs a DataFrame with "Original" and "Aligned" molecule columns.

The align_depictions method accepts the following parameters:

Parameter	Description	Default
ref	Alignment reference (accepted values listed below)	Required
method	Alignment method (accepted values listed below)	None (auto)

Accepted values for ref:

"first" - Use the first molecule as reference
oechem.OEMolBase - A molecule to use as reference
oechem.OESubSearch - A substructure search pattern
oechem.OEMCSSearch - A maximum common substructure search
str - A SMARTS pattern

Accepted values for method:

"substructure" / "ss" - Substructure-based alignment
"mcss" - Maximum common substructure alignment
"fingerprint" / "fp" - Fingerprint-based alignment
None - Auto-detect based on ref type
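
The auto-detection of method from ref can be pictured as a small dispatch on the type of ref. The following is a minimal sketch, not CNotebook's implementation: FakeMol, FakeSubSearch, and FakeMCSSearch are hypothetical stand-ins for the OpenEye types so that the example runs without the toolkits, and resolve_method is an illustrative name. Only the "first" case (fingerprint alignment by default) is stated in this document; mapping a molecule reference to fingerprint alignment and a SMARTS string to substructure alignment is an assumption.

```python
from typing import Optional


# Hypothetical stand-ins for oechem.OEMolBase, oechem.OESubSearch,
# and oechem.OEMCSSearch (assumptions for illustration only).
class FakeMol:
    pass


class FakeSubSearch:
    pass


class FakeMCSSearch:
    pass


# Short aliases listed in the parameter table
ALIASES = {"ss": "substructure", "fp": "fingerprint"}
VALID_METHODS = {"substructure", "mcss", "fingerprint"}


def resolve_method(ref, method: Optional[str] = None) -> str:
    """Return a canonical method name for a (ref, method) pair."""
    if method is not None:
        # An explicit method wins; normalize the short aliases
        name = ALIASES.get(method, method)
        if name not in VALID_METHODS:
            raise ValueError(f"Unknown alignment method: {method!r}")
        return name

    # Assumed auto-detection rules, keyed on the type of ref
    if isinstance(ref, FakeSubSearch):
        return "substructure"
    if isinstance(ref, FakeMCSSearch):
        return "mcss"
    if isinstance(ref, FakeMol) or ref == "first":
        return "fingerprint"
    if isinstance(ref, str):
        # Any other string is treated as a SMARTS pattern (assumption)
        return "substructure"
    raise TypeError(f"Unsupported reference type: {type(ref).__name__}")


print(resolve_method("first"))
print(resolve_method(FakeMCSSearch()))
print(resolve_method("first", method="ss"))
```

Passing method explicitly skips the type check entirely, which is why an alias such as "ss" can be combined with any supported ref in this sketch.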

Fingerprint Similarity

Color molecules by similarity to a reference:

import cnotebook
import oepandas as oepd
from openeye import oechem, oedepict

# Read the example EGFR molecule file
df = oepd.read_smi("examples/assets/egfr.smi")

# Use Gefitinib as a reference
# Call oedepict.OEPrepareDepiction to give the molecule parsed from SMILES clean 2D coordinates
gefitinib = oechem.OEGraphMol()
oechem.OESmilesToMol(gefitinib, "COc1cc2c(cc1OCCCN3CCOCC3)c(ncn2)Nc4ccc(c(c4)Cl)F")
oedepict.OEPrepareDepiction(gefitinib)

# Align all molecules to Gefitinib
df.Molecule.chem.align_depictions(gefitinib)

# Show similarity to Gefitinib
df.chem.fingerprint_similarity("Molecule", gefitinib, inplace=True)

# Show just the fingerprint similarity
df[["reference_similarity", "target_similarity"]]

This outputs the "reference_similarity" and "target_similarity" columns, with aligned structures colored by fingerprint similarity for both the target and the reference.
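
For intuition about what a fingerprint similarity score measures, here is a minimal sketch using the Tanimoto coefficient on sets of "on" bit positions. This document does not state which fingerprint type or similarity measure CNotebook uses, so both the Tanimoto choice and the toy bit sets below are assumptions made for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / len(union)


# Toy fingerprints: positions of the bits that are switched on
reference_bits = {1, 2, 3, 4}
target_bits = {3, 4, 5, 6}

score = tanimoto(reference_bits, target_bits)
shared = reference_bits & target_bits

print(f"similarity = {score:.3f}")
print(f"shared bits = {sorted(shared)}")
```

The shared bits are the ones common to both molecules; a depiction colored by similarity can be thought of as mapping such shared and unshared features back onto atoms and bonds.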

Polars Integration

Prerequisites

pip install "cnotebook[polars]"

Creating Molecule Columns

Convert SMILES strings to molecule objects:

import cnotebook
import oepolars as oeplr
import polars as pl

df = pl.DataFrame({
    "Name": ["Benzene", "Pyridine", "Pyrimidine"],
    "smiles": ["c1ccccc1", "c1cnccc1", "n1cnccc1"]
}).chem.as_molecule("smiles")

# Display the DataFrame
df

This outputs the same styled DataFrame as the Pandas example; the exact appearance depends on whether you are using Jupyter or Marimo.

DataFrames display automatically in both Jupyter and Marimo when left as a bare statement at the end of a cell.

Reading Molecule Files

Read molecules directly from files:

# Read from SMILES file
df = oeplr.read_smi("molecules.smi")

# Read from SDF file
df = oeplr.read_sdf("molecules.sdf")

Substructure Highlighting

Substructure highlighting must be done at the DataFrame level in Polars due to its architecture. Otherwise, it works exactly the same way as in Pandas. By default this uses ball-and-stick-style highlighting that supports overlapping matches, using the oechem.OEGetLightColors() scheme.

# Highlight a single pattern using the DataFrame above
df.chem.highlight("smiles", "c1ccccc1")  # Highlight benzene rings

# Display the DataFrame
df

Outputs:

Note

Highlighting persists until you remove it with df.chem.clear_formatting_rules().

Finally, you can highlight a substructure based on the value in another column, rather than a fixed pattern:

# Add a column with patterns to highlight
df = df.with_columns(
    pl.Series("Pattern", ["cc", "cnc", "ncn"])
)

# Highlight each row using the Pattern column
df = df.chem.highlight_using_column("smiles", "Pattern")

# Display the DataFrame
df

Molecular Alignment

Align molecule depictions:

import cnotebook
import oepolars as oepl

# Read the example unaligned molecules
df = oepl.read_sdf("examples/assets/rotations.sdf", no_title=True)

# Rename the "Molecule" column to "Original" so that we can
# see the original unaligned molecules
df = df.rename({"Molecule": "Original"})

# Create a new molecule column called "Aligned" so that we can
# see the aligned molecules
df = df.chem.copy_molecules("Original", "Aligned")

# Align the depictions based on the first molecule
# By default this does a path fingerprint-based alignment
df["Aligned"].chem.align_depictions("first")

# Show the structures
df.head()

This outputs a DataFrame with "Original" and "Aligned" molecule columns.

The align_depictions method accepts the following parameters:

Parameter	Description	Default
ref	Alignment reference (accepted values listed below)	Required
method	Alignment method (accepted values listed below)	None (auto)

Accepted values for ref:

"first" - Use the first molecule as reference
oechem.OEMolBase - A molecule to use as reference
oechem.OESubSearch - A substructure search pattern
oechem.OEMCSSearch - A maximum common substructure search
str - A SMARTS pattern

Accepted values for method:

"substructure" / "ss" - Substructure-based alignment
"mcss" - Maximum common substructure alignment
"fingerprint" / "fp" - Fingerprint-based alignment
None - Auto-detect based on ref type

Fingerprint Similarity

Color molecules by similarity:

import cnotebook
import oepolars as oepl
from openeye import oechem, oedepict

# Read the example EGFR molecule file
df = oepl.read_smi("examples/assets/egfr.smi")

# Use Gefitinib as a reference
# Call oedepict.OEPrepareDepiction to give the molecule parsed from SMILES clean 2D coordinates
gefitinib = oechem.OEGraphMol()
oechem.OESmilesToMol(gefitinib, "COc1cc2c(cc1OCCCN3CCOCC3)c(ncn2)Nc4ccc(c(c4)Cl)F")
oedepict.OEPrepareDepiction(gefitinib)

# Align all molecules to Gefitinib
df["Molecule"].chem.align_depictions(gefitinib)

# Show similarity to Gefitinib
df = df.chem.fingerprint_similarity("Molecule", gefitinib, inplace=True)

# Show just the fingerprint similarity
df[["reference_similarity", "target_similarity"]]

This outputs the "reference_similarity" and "target_similarity" columns.

MolGrid from DataFrames

Create interactive molecule grids from DataFrames:

Pandas:

from cnotebook import MolGrid
import oepandas as oepd

# Read the example EGFR molecule file
df = oepd.read_smi("examples/assets/egfr.smi")

# 1. Create a molecule grid with all data
grid = df.chem.molgrid("Molecule")

# 2. Create a molecule grid with only the molecule series (no data)
# grid = df["Molecule"].chem.molgrid()

# Display the grid
grid.display()

This outputs:

Polars:

The same code works with Polars; just swap oepandas for oepolars:

from cnotebook import MolGrid
import oepolars as oepl

# Read the example EGFR molecule file
df = oepl.read_smi("examples/assets/egfr.smi")

# 1. Create a molecule grid with all data
grid = df.chem.molgrid("Molecule")

# 2. Create a molecule grid with only the molecule series (no data)
# grid = df["Molecule"].chem.molgrid()

# Display the grid
grid.display()

You should get the exact same molecule grid as with Pandas.

See the molgrid-class documentation for more details on MolGrid features.

Best Practices

Memory Management: For large datasets, consider using molecule indices rather than storing full molecule objects in memory.
Performance: Use PNG format for faster rendering of large DataFrames. SVG provides better quality but may be slower for many molecules.
Column Naming: Use descriptive column names and avoid conflicts with reserved names like “Molecule” when possible.
Lazy Evaluation: When using Polars, take advantage of lazy evaluation for complex operations on large datasets.
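
One possible reading of the memory management advice is to keep only compact identifiers (here, SMILES strings) in memory and build full molecule objects on demand by index. The sketch below illustrates that idea with the standard library only. The parse_molecule function is a hypothetical stand-in for an expensive molecule constructor, and molecule_at and the cache size are illustrative choices, not part of CNotebook.

```python
from functools import lru_cache

# Compact representation kept in memory for every row
SMILES = ("c1ccccc1", "c1cnccc1", "n1cnccc1")


def parse_molecule(smiles: str) -> dict:
    """Hypothetical stand-in for building a heavyweight molecule object."""
    return {"smiles": smiles, "length": len(smiles)}


@lru_cache(maxsize=2)
def molecule_at(index: int) -> dict:
    """Build the molecule for a row on demand, caching at most two."""
    return parse_molecule(SMILES[index])


# Only the rows that are actually requested are ever materialized
for i in range(len(SMILES)):
    print(i, molecule_at(i)["smiles"])

# The cache never holds more than maxsize objects at once
print(molecule_at.cache_info().currsize)
```

The trade-off is repeated parsing for rows that fall out of the cache, so the cache size should roughly match how many molecules are viewed at a time.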
