I/O, file reading and writing, core formats (thunor.io)

The thunor.io module defines the package’s core dataset container, thunor.io.HtsPandas, together with readers and writers for the HDF5 and Vanderbilt HTS formats.

The most important implementation detail is that experiment wells and control wells are stored separately. Experiment wells live in doses plus assays; control wells live in controls plus assays. This makes it possible to keep control data available for viability normalisation and plotting without forcing fake drug annotations onto untreated wells.

Use thunor.io.read_hdf() when you want to preserve the full internal data model, and thunor.io.read_vanderbilt_hts() when importing tabular plate exports. For a full description of the Vanderbilt HTS file format, including column definitions and an example, see Vanderbilt HTS file format.

class thunor.io.HtsPandas(doses, assays, controls)

High throughput screen dataset

Internally, the dataset is stored as three aligned pandas DataFrames:

  • doses — one row per experiment well, indexed by (drug, cell_line, dose). The well_id column links to assays.

  • assays — one row per (well, timepoint) measurement, indexed by (assay, well_id, timepoint).

  • controls — one row per (control well, timepoint) measurement, indexed by (assay, cell_line, plate, well_id, timepoint).
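
The way the three tables link together can be sketched without pandas. In the minimal sketch below, plain dictionaries keyed by tuples stand in for the indexed DataFrames. All drug, cell line and plate names, the numeric values, the use of hours for timepoints and the helper time_course() are hypothetical, and storing a single value per controls row is an assumption.

```python
# Plain dicts keyed by tuples stand in for the indexed DataFrames.
# Every name and number here is made up for illustration.

# doses: (drug, cell_line, dose) -> columns; drug and dose are tuples
doses = {
    (("drugA",), "cellX", (1e-6,)): {"well_id": 101},
    (("drugA",), "cellX", (1e-7,)): {"well_id": 102},
}

# assays: (assay, well_id, timepoint) -> measured value
assays = {
    ("Cell count", 101, 0): 500,
    ("Cell count", 101, 24): 650,
    ("Cell count", 102, 0): 510,
    ("Cell count", 102, 24): 900,
}

# controls: (assay, cell_line, plate, well_id, timepoint) -> value (assumed)
# Control wells carry no drug or dose annotation at all.
controls = {
    ("Cell count", "cellX", "P1", 103, 0): 505,
    ("Cell count", "cellX", "P1", 103, 24): 1010,
}


def time_course(well_id, assay="Cell count"):
    """Collect sorted (timepoint, value) pairs for one experiment well."""
    return sorted(
        (t, v) for (a, w, t), v in assays.items() if a == assay and w == well_id
    )


# Follow the well_id link from a doses row to its measurements.
key = (("drugA",), "cellX", (1e-6,))
series = time_course(doses[key]["well_id"])
```

The point of the layout is that a doses row only names the condition, while the measurements are reached through well_id.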

Multi-dataset mode. When multiple datasets are combined (e.g. by Thunor Web), a 'dataset' level is prepended to the index of each DataFrame. All pipeline functions (dip_rates(), viability(), fit_params()) detect this level automatically and partition their work by dataset. HDF5 files that were saved with a 'dataset' index level preserve it on load.

Parameters:
  • doses (pd.DataFrame) – DataFrame of doses

  • assays (pd.DataFrame) – DataFrame of assays

  • controls (pd.DataFrame) – DataFrame of controls

cell_lines

List of cell lines in the dataset

Type:

list

drugs

List of drugs in the dataset

Type:

list

assay_names

List of assay names in the dataset

Type:

list

dip_assay_name

The assay name used for DIP rate calculations, e.g. “Cell count”

Type:

str

doses_unstacked()

Return the doses DataFrame with drug/dose tuples split into numbered columns

Converts the internal stacked representation (drug, dose tuple index levels) into the flat drug1, dose1, drug2, dose2, … column layout used in HDF5 files and required by external tools such as the synergy package for combination-dose matrices.

Returns:

Doses DataFrame indexed by (drug1, cell_line, dose1) for single-drug datasets, or by (drug1, drug2, cell_line, dose1, dose2) for combination datasets.

Return type:

pd.DataFrame
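
The stacked-to-flat conversion can be illustrated on a single row. This is a minimal sketch in plain Python, not the library's implementation; the helper unstack_row() and the drug and cell line names are hypothetical.

```python
def unstack_row(drugs, cell_line, doses):
    """Flatten drug/dose tuples into numbered drug1, dose1, ... fields."""
    if len(drugs) != len(doses):
        raise ValueError("each drug needs exactly one dose")
    row = {"cell_line": cell_line}
    # Number the drugs from 1 so the keys read drug1, dose1, drug2, dose2, ...
    for n, (drug, dose) in enumerate(zip(drugs, doses), start=1):
        row[f"drug{n}"] = drug
        row[f"dose{n}"] = dose
    return row


single = unstack_row(("drugA",), "cellX", (1e-6,))
combo = unstack_row(("drugA", "drugB"), "cellX", (1e-6, 5e-7))
```

A single-drug row gains only drug1 and dose1, while a two-drug combination also gains drug2 and dose2.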

filter(cell_lines=None, drugs=None, plate=None)

Return a filtered copy of this dataset

None means no filter on that dimension.

Parameters:
  • cell_lines (Iterable, optional) – Cell line names to keep

  • drugs (Iterable, optional) – Drug names (strings) or drug tuples to keep

  • plate (str or Iterable, optional) – Plate identifier(s) to keep

Returns:

A new dataset filtered using the supplied arguments

Return type:

HtsPandas
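
The "None means no filter" rule can be sketched as a per-row predicate. The helper keep_row() and the row layout are hypothetical, and treating a bare drug name as shorthand for a single-drug tuple is an assumption rather than documented behaviour.

```python
def keep_row(row, cell_lines=None, drugs=None, plate=None):
    """Return True if a row passes every filter that is not None."""
    if cell_lines is not None and row["cell_line"] not in set(cell_lines):
        return False
    if drugs is not None:
        # Assumption: a bare string is shorthand for a single-drug tuple.
        wanted = {(d,) if isinstance(d, str) else tuple(d) for d in drugs}
        if row["drug"] not in wanted:
            return False
    if plate is not None:
        # Accept either one plate name or an iterable of names.
        plates = {plate} if isinstance(plate, str) else set(plate)
        if row["plate"] not in plates:
            return False
    return True


rows = [
    {"cell_line": "cellX", "drug": ("drugA",), "plate": "P1"},
    {"cell_line": "cellY", "drug": ("drugA", "drugB"), "plate": "P2"},
]
all_rows = [r for r in rows if keep_row(r)]
only_p2 = [r for r in rows if keep_row(r, plate="P2")]
only_a = [r for r in rows if keep_row(r, drugs=["drugA"])]
```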

plate(plate_name, plate_size=384, include_dip_rates=False)

Return a single plate in PlateData format

Parameters:
  • plate_name (str) – The name of a plate in the dataset

  • plate_size (int) – The number of wells on the plate (default: 384)

  • include_dip_rates (bool) – Calculate and include DIP rates for each well if True

Returns:

The plate data for the requested plate name

Return type:

PlateData

class thunor.io.PlateData(width=24, height=16, dataset_name=None, plate_name=None, cell_lines=[], drugs=[], doses=[], dip_rates=[])

A High Throughput Screening Plate with Data

exception thunor.io.PlateFileParseException

Raised when a plate data file cannot be parsed

class thunor.io.PlateMap(**kwargs)

Representation of a High Throughput Screening plate

Parameters:

kwargs (dict, optional) – “width” and “height” values for the plate

col_iterator()

Iterate over the column numbers in the plate

Returns:

Iterator over the column numbers (1, 2, 3, etc.)

Return type:

Iterator of int

property num_wells

Number of wells in the plate

classmethod plate_size_from_num_wells(num_wells)

Calculate plate size from number of wells, assuming a 3:2 width-to-height ratio

Parameters:

num_wells (int) – Number of wells in a plate

Returns:

Width and height of plate (numbers of wells)

Return type:

tuple
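
The arithmetic behind a 3:2 plate can be sketched as follows. If the width is 3k and the height is 2k, the plate holds 6k² wells, so k is recovered from an integer square root. The function plate_size() is a hypothetical stand-in, not the library's code, and raising ValueError for unsupported well counts is an assumption.

```python
import math


def plate_size(num_wells):
    """Return (width, height) for a plate with a 3:2 width-to-height ratio."""
    # width = 3k and height = 2k, so num_wells = 6 * k**2
    k_squared, remainder = divmod(num_wells, 6)
    k = math.isqrt(k_squared)
    if remainder or k * k != k_squared:
        raise ValueError(f"{num_wells} wells does not fit a 3:2 plate")
    return 3 * k, 2 * k
```

Under these assumptions, plate_size(384) gives (24, 16), matching the default plate_width and plate_height used by the readers in this module.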

row_iterator()

Iterate over the row letters in the plate

Returns:

Iterator over the row letters (A, B, C, etc.)

Return type:

Iterator of str

well_id_to_name(well_id)

Convert a Well ID into a well name

Well IDs use a numerical counter from left to right, top to bottom, and are zero based.

Parameters:

well_id (int) – Well ID on this plate

Returns:

Name for this well, e.g. A1

Return type:

str
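
The numbering scheme can be sketched with a single divmod. The function id_to_name() is a hypothetical stand-in rather than the library's method; it assumes single-letter rows (at most 26) and raises ValueError for out-of-range IDs, which is also an assumption.

```python
import string


def id_to_name(well_id, width=24, height=16):
    """Convert a zero-based well ID into a name such as 'A1'."""
    if not 0 <= well_id < width * height:
        raise ValueError(f"well ID {well_id} is outside a {width}x{height} plate")
    if height > len(string.ascii_uppercase):
        raise ValueError("this sketch only handles single-letter rows")
    # IDs count left to right, then top to bottom.
    row, col = divmod(well_id, width)
    return f"{string.ascii_uppercase[row]}{col + 1}"
```

On a 24-column plate, ID 0 maps to A1, ID 23 to A24 and ID 24 wraps to B1.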

well_iterator()

Iterator over the plate’s wells

Returns:

Iterator over the wells in the plate. Each well is given as a dict of ‘well’ (well ID), ‘row’ (row character) and ‘col’ (column number)

Return type:

Iterator of dict
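
The shape of the yielded dicts can be sketched with a small generator. The name iter_wells() is hypothetical, the iteration order is assumed to follow the well ID numbering (left to right, then top to bottom), and the sketch only handles single-letter rows.

```python
import string


def iter_wells(width=24, height=16):
    """Yield one dict per well, left to right, then top to bottom."""
    if height > len(string.ascii_uppercase):
        raise ValueError("this sketch only handles single-letter rows")
    well_id = 0
    for row in string.ascii_uppercase[:height]:
        for col in range(1, width + 1):
            yield {"well": well_id, "row": row, "col": col}
            well_id += 1


# A 96-well plate is 12 columns wide and 8 rows high.
wells = list(iter_wells(width=12, height=8))
```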

well_list()

List of the plate’s wells

Returns:

The return value of well_iterator() as a list

Return type:

list

well_name_to_id(well_name, raise_error=True)

Convert a well name to a Well ID

Parameters:
  • well_name (str) – A well name, e.g. A1

  • raise_error (bool) – If True (default), raise an error when the well name is invalid; otherwise return -1 for invalid well names

Returns:

Well ID for this well. See also well_id_to_name()

Return type:

int
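
The reverse mapping, including the raise_error switch, can be sketched as below. The function name_to_id() is hypothetical; the exception type (ValueError), the acceptance of lower-case and zero-padded names, and the single-letter row limit are all assumptions of this sketch.

```python
import re
import string


def name_to_id(well_name, width=24, height=16, raise_error=True):
    """Convert a name such as 'B3' into a zero-based well ID."""
    match = re.fullmatch(r"([A-Z])([0-9]+)", well_name.strip().upper())
    if match:
        row = string.ascii_uppercase.index(match.group(1))
        col = int(match.group(2)) - 1
        # Only accept positions that actually exist on this plate.
        if row < height and 0 <= col < width:
            return row * width + col
    if raise_error:
        raise ValueError(f"invalid well name: {well_name!r}")
    return -1
```

Returning -1 instead of raising is convenient when scanning a whole file, since invalid names can be collected and reported together.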

thunor.io.read_hdf(filename_or_buffer)

Read an HtsPandas dataset from a Thunor HDF5 format file

Parameters:

filename_or_buffer (str or object) – Filename or buffer from which to read the data

Returns:

Thunor HTS dataset

Return type:

HtsPandas

thunor.io.read_incucyte(filename_or_buffer, plate_width=24, plate_height=16)

Read an Incucyte Zoom exported file

Parameters:
  • filename_or_buffer (str or file-like object) – Path to an Incucyte Zoom TSV export file, or a file-like object

  • plate_width (int) – Width of the microtiter plate in wells (default: 24, for 384-well plate)

  • plate_height (int) – Height of the microtiter plate in wells (default: 16, for 384-well plate)

Returns:

HTS Dataset containing the data read from the file

Return type:

HtsPandas

thunor.io.read_vanderbilt_hts(file_or_source, *, plate_width=24, plate_height=16, sep=None, _unstacked=False)

Read a Vanderbilt HTS format file

See Vanderbilt HTS file format for a full description of the file format, including column definitions and an example.

Parameters:
  • file_or_source (str or object) – Source for CSV data

  • plate_width (int) – Width of the microtiter plates (default: 24, for 384 well plate)

  • plate_height (int) – Height of the microtiter plates (default: 16, for 384 well plate)

  • sep (str) – Source file delimiter (default: detect from file extension)

Returns:

HTS Dataset containing the data read from the CSV

Return type:

HtsPandas
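
Delimiter detection from a file extension might look like the sketch below. The helper guess_separator() and the extension-to-delimiter mapping are assumptions for illustration only; the extensions the library actually recognises may differ.

```python
import os

# Assumed mapping; the extensions the library recognises may differ.
SEPARATORS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def guess_separator(filename, sep=None):
    """Return sep if given, else pick a delimiter from the file extension."""
    if sep is not None:
        # An explicit delimiter always wins over detection.
        return sep
    extension = os.path.splitext(filename)[1].lower()
    try:
        return SEPARATORS[extension]
    except KeyError:
        raise ValueError(f"cannot infer delimiter from {filename!r}") from None
```

Passing sep explicitly is the safer choice for buffers or for files whose extension does not reflect their delimiter.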

thunor.io.write_hdf(df_data, filename, dataset_format='fixed')

Save a dataset to Thunor HDF5 format

Parameters:
  • df_data (HtsPandas) – HTS dataset

  • filename (str or io.BytesIO) – Output filename, or io.BytesIO instance for in-memory use

  • dataset_format (str) – One of ‘fixed’ or ‘table’. See pandas HDFStore docs for details

thunor.io.write_vanderbilt_hts(df_data, filename, plate_width=24, plate_height=16, sep=None)

Write a Vanderbilt HTS format file

See Vanderbilt HTS file format for a full description of the file format, including column definitions and an example.

Parameters:
  • df_data (HtsPandas) – HTS dataset

  • filename (str or object) – Filename or buffer to write into

  • plate_width (int) – Plate width (number of wells)

  • plate_height (int) – Plate height (number of wells)

  • sep (str) – Output file delimiter (default: detect from file extension)