Scaffold Tree | SARvision | High Throughput Screening | Hierarchical Molecule Tree

Analyzing High Throughput Data Using Scaffold Trees and Data Grids

by Mark Hansen, Ph.D.

A hierarchical scaffold tree intuitively organizes molecules based on structure.

Scaffold trees are an effective way to analyze large libraries of molecules, such as the screening libraries used in high throughput screening or virtual screening. The scaffold tree building algorithm computes maximum common substructures for groups of molecules and recursively arranges the resulting scaffolds in hierarchical trees. Simpler scaffold structures form at the top of the scaffold tree, while more complex scaffolds are added at each successive level. Each scaffold in the tree represents a group of molecules, or chemotype, that share the scaffold substructure. Simpler scaffolds at the top of the tree represent large families of diverse molecules; traversing down the tree adds complexity at each level, so increasingly complex scaffolds cluster smaller subfamilies of more closely related molecules. This makes the tree well suited to finding the minimum common scaffolds that best represent an active group of molecules (chemotype).
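The hierarchy described above can be pictured as a plain tree data structure. The Python sketch below is only an illustration under stated assumptions: it does not compute maximum common substructures, the scaffold names and molecule IDs are made up by hand, and `ScaffoldNode` and `most_specific_scaffold` are hypothetical names that are not part of SARvision. It shows how walking down the tree narrows a large family to the most specific scaffold that still covers a chosen group of molecules.

```python
from dataclasses import dataclass, field


@dataclass
class ScaffoldNode:
    """One scaffold in a hand-built toy tree (hypothetical structure)."""
    name: str
    members: frozenset          # IDs of molecules containing this scaffold
    children: list = field(default_factory=list)


def most_specific_scaffold(node, actives):
    """Return the deepest node whose members include every active, or None."""
    if not actives <= node.members:
        return None
    # Children are more complex scaffolds with fewer members; prefer them.
    for child in node.children:
        hit = most_specific_scaffold(child, actives)
        if hit is not None:
            return hit
    return node


# Toy data: labels and memberships are invented for illustration only.
methoxyindole = ScaffoldNode("5-methoxyindole", frozenset({"m1", "m2"}))
indole = ScaffoldNode("indole", frozenset({"m1", "m2", "m3"}), [methoxyindole])
quinoline = ScaffoldNode("quinoline", frozenset({"m4", "m5"}))
root = ScaffoldNode(
    "benzene",
    frozenset({"m1", "m2", "m3", "m4", "m5", "m6"}),
    [indole, quinoline],
)

print(most_specific_scaffold(root, {"m1", "m3"}).name)
```

In this toy tree, each child holds a subset of its parent's members, which mirrors the way subfamilies shrink as scaffolds become more complex.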

Double clicking on any scaffold rebuilds the views on the right, such as the molecule spreadsheet and the molecule data grid. By default, the views are filtered to display only the molecules that belong to the currently selected scaffold. The number under each scaffold is the count of member molecules that belong to that scaffold in the current data set. In addition to this membership number, activity data can be mapped onto the scaffold tree using a subset filter. When mapped onto the scaffold tree, activity data is shown as a fraction: the number of actives (as defined in the subset filter) over the total number of molecules in that scaffold family.
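The fraction shown under each scaffold is simple arithmetic, sketched here in Python. The function name `scaffold_labels`, the scaffold names, and the molecule IDs are all hypothetical; the sketch only assumes that each scaffold has a known list of member molecules and that the subset filter yields a set of active IDs.

```python
def scaffold_labels(members_by_scaffold, active_ids):
    """Build an 'actives/total' label for each scaffold."""
    labels = {}
    for scaffold, members in members_by_scaffold.items():
        n_active = sum(1 for mol_id in members if mol_id in active_ids)
        labels[scaffold] = f"{n_active}/{len(members)}"
    return labels


# Invented example data: a molecule can belong to more than one scaffold.
members_by_scaffold = {
    "benzene": ["m1", "m2", "m3", "m4"],
    "indole": ["m2", "m3"],
}
active_ids = {"m2", "m4"}

labels = scaffold_labels(members_by_scaffold, active_ids)
print(labels)  # {'benzene': '2/4', 'indole': '1/2'}
```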

To start a high throughput data analysis, load a molecule file with screening data (main menu->File->Import molecules) into a molecule spreadsheet. A molecule data grid (main menu->Insert->Data grid) is the best table for looking at many molecules at once. In the data grid control, the activity data of interest can be selected for display under the molecules. Right clicking on the column in the checkbox control adds a heat-map icon next to each molecule, color coded by activity. Molecules can be sorted from most active to least active, from left to right and then top to bottom. The most active molecules group at the top of the table, making it easy to browse for the most compelling molecules.
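The sorting and heat-map behavior can be approximated in a few lines of Python. This is a sketch, not the application's implementation: it assumes that a higher value means a more active molecule (for example, percent inhibition), and the blue-to-red color ramp, the function names `grid_layout` and `heat_color`, and the sample values are illustrative choices.

```python
def grid_layout(activity, columns):
    """Sort molecule IDs from most to least active, then fill rows
    left to right and top to bottom."""
    ordered = sorted(activity, key=lambda mol_id: activity[mol_id], reverse=True)
    return [ordered[i:i + columns] for i in range(0, len(ordered), columns)]


def heat_color(value, low, high):
    """Map a value linearly onto an RGB ramp from blue (low) to red (high)."""
    if high == low:
        t = 0.5
    else:
        t = min(1.0, max(0.0, (value - low) / (high - low)))
    return (round(255 * t), 0, round(255 * (1 - t)))


# Invented activity values, assumed to be percent inhibition.
activity = {"a": 10, "b": 95, "c": 50, "d": 70, "e": 30}

grid = grid_layout(activity, columns=2)
print(grid)  # [['b', 'd'], ['c', 'e'], ['a']]
print(heat_color(95, 10, 95))  # (255, 0, 0)
```

For data where a lower value means a more active molecule, such as IC50, the sort order and the color ramp would be reversed.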

A molecule grid displaying heat-mapped HTS data.

When browsing molecules like these, it is often hard to see chemical patterns, which makes the scaffold tree an extremely useful add-on for analyzing HTS data. In the scaffold pane (left), a scaffold tree can be generated automatically (scaffold pane: right click->Identify scaffolds) to create scaffold substructures that group the current molecules by chemotype. Selecting a scaffold (double clicking) chooses the subset of molecules that belong to that scaffold and displays them in the data grid, re-oriented and color coded relative to the parent scaffold.

A molecule grid of heat-mapped HTS data filtered by scaffold structure or chemotype.

Activity data can be mapped onto the scaffold tree by creating a subset in the lower left Subsets panel. Add an activity column to the subset and set a value range that defines active compounds. The subset can now be selected in the scaffold tree filter (top), and the data grid can be filtered (control bar, upper right) by the scaffold tree + subset filter. Each scaffold in the tree now displays, below its structure, the ratio of compounds defined as active (subset 1) to the total number of compounds in the scaffold group. Navigating the hierarchical scaffold tree and double clicking any scaffold that contains active molecules resets the table to display the scaffold+activity filtered set of molecules.
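The combination of a value-range subset with a scaffold filter can be sketched as follows. This is a minimal Python illustration under assumptions: the `Subset` class, the `filter_rows` function, the `pct_inhibition` column, and the sample rows are all hypothetical and do not reflect how the application stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """An activity column plus the value range that defines 'active'."""
    column: str
    low: float
    high: float

    def contains(self, row):
        value = row.get(self.column)
        return value is not None and self.low <= value <= self.high


def filter_rows(rows, scaffold_members, subset=None):
    """Keep rows in the selected scaffold; optionally also require
    membership in the activity subset."""
    selected = [row for row in rows if row["id"] in scaffold_members]
    if subset is not None:
        selected = [row for row in selected if subset.contains(row)]
    return selected


# Invented screening rows; the column name is an assumption.
rows = [
    {"id": "m1", "pct_inhibition": 92.0},
    {"id": "m2", "pct_inhibition": 15.0},
    {"id": "m3", "pct_inhibition": 78.0},
    {"id": "m4", "pct_inhibition": 88.0},
]
indole_members = {"m1", "m2", "m3"}
actives = Subset(column="pct_inhibition", low=75.0, high=100.0)

scaffold_only = filter_rows(rows, indole_members)
scaffold_and_active = filter_rows(rows, indole_members, actives)
print([row["id"] for row in scaffold_and_active])  # ['m1', 'm3']
```

Passing `subset=None` corresponds to viewing every molecule in the scaffold family, while passing a subset corresponds to the scaffold+activity filtered view.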

HTS or activity data can be mapped onto the hierarchical scaffold tree to help isolate active molecules.

Interesting scaffolds can be ‘cherry picked’ by copying the scaffold (scaffold: right click->Copy) and pasting it outside the folder on the top level of the scaffold tree (scaffold: right click->Paste). Scaffolds can be edited (scaffold: right click->Edit scaffold) and reoriented to display the molecules in a more appealing orientation for presentation. Scaffolds can be repeatedly copied from the folder, or even drawn from scratch at the user's discretion, until scaffolds for all active molecules are represented. Toggling the filter control in the table panel in the upper right switches between showing all molecules that belong to the scaffold and showing only those that belong to both the scaffold and the activity subset. By expanding to all molecules in the scaffold family, early structure-activity relationships (SAR) can be observed. A demonstrated SAR trend in the data for a set of lead molecules gives confidence that the series is a well-behaved set of molecules. This suggests that the compounds can be further refined, as opposed to a dead-end series that does not reproduce activity well.
