I/O, file reading and writing, core formats (thunor.io)
The thunor.io module defines the package’s core dataset container,
thunor.io.HtsPandas, together with readers and writers for the HDF5
and Vanderbilt HTS formats.
The most important implementation detail is that experiment wells and control
wells are stored separately. Experiment wells live in doses plus assays;
control wells live in controls plus assays. This makes it possible to
keep control data available for viability normalisation and plotting without
forcing fake drug annotations onto untreated wells.
Use thunor.io.read_hdf() when you want to preserve the full internal data
model, and thunor.io.read_vanderbilt_hts() when importing tabular plate
exports. For a full description of the Vanderbilt HTS file format, including
column definitions and an example, see Vanderbilt HTS file format.
- class thunor.io.HtsPandas(doses, assays, controls)
High throughput screen dataset
Internally, the dataset is stored as three aligned pandas DataFrames:
- doses: one row per experiment well, indexed by (drug, cell_line, dose). The well_id column links to assays.
- assays: one row per (well, timepoint) measurement, indexed by (assay, well_id, timepoint).
- controls: one row per (control well, timepoint) measurement, indexed by (assay, cell_line, plate, well_id, timepoint).
Multi-dataset mode. When multiple datasets are combined (e.g. by Thunor Web), a 'dataset' level is prepended to the index of each DataFrame. All pipeline functions (dip_rates(), viability(), fit_params()) detect this level automatically and partition their work by dataset. HDF5 files that were saved with a 'dataset' index level preserve it on load.
- Parameters:
doses (pd.DataFrame) – DataFrame of doses
assays (pd.DataFrame) – DataFrame of assays
controls (pd.DataFrame) – DataFrame of controls
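The multi-dataset behaviour described above can be pictured with a small, library-free sketch. It is not thunor's implementation: it uses plain tuples in place of a pandas MultiIndex, and the function name partition_by_dataset, the sample labels and the integer values are all hypothetical. It only illustrates the idea of detecting a 'dataset' level and partitioning rows by it.

```python
from collections import defaultdict


def partition_by_dataset(level_names, rows):
    """Group (index_tuple, value) rows by their 'dataset' level, if present."""
    if "dataset" not in level_names:
        # Single-dataset mode: everything belongs to one unnamed group.
        return {None: list(rows)}
    pos = level_names.index("dataset")
    groups = defaultdict(list)
    for index, value in rows:
        # Drop the dataset level so each group looks like a single dataset.
        groups[index[pos]].append((index[:pos] + index[pos + 1:], value))
    return dict(groups)


# Hypothetical rows: index tuples with a prepended dataset label.
levels = ["dataset", "drug", "cell_line", "dose"]
rows = [
    (("screen1", "drugA", "cellX", 1e-6), 101),
    (("screen2", "drugA", "cellX", 1e-6), 202),
    (("screen1", "drugB", "cellX", 1e-7), 103),
]
for name, group in partition_by_dataset(levels, rows).items():
    print(name, group)
```

When no 'dataset' level is present, the sketch returns a single group keyed by None, mirroring the single-dataset case.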
- cell_lines
List of cell lines in the dataset
- Type:
list
- drugs
List of drugs in the dataset
- Type:
list
- assay_names
List of assay names in the dataset
- Type:
list
- dip_assay_name
The assay name used for DIP rate calculations, e.g. “Cell count”
- Type:
str
- doses_unstacked()
Return the doses DataFrame with drug/dose tuples split into numbered columns
Converts the internal stacked representation (drug, dose tuple index levels) into the flat drug1, dose1, drug2, dose2, … column layout used in HDF5 files and required by external tools such as the synergy package for combination-dose matrices.
- Returns:
Doses DataFrame indexed by (drug1, cell_line, dose1) for single-drug datasets, or by (drug1, drug2, cell_line, dose1, dose2) for combination datasets.
- Return type:
pd.DataFrame
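To make the stacked-to-flat naming concrete, here is a minimal sketch that uses plain tuples and dictionaries rather than pandas. The helper name sketch_unstack and the sample drug names and doses are hypothetical; the real method operates on the doses DataFrame index, which this sketch does not attempt to reproduce.

```python
def sketch_unstack(drugs, doses):
    """Flatten parallel drug/dose tuples into drug1, dose1, drug2, dose2, ... keys."""
    if len(drugs) != len(doses):
        raise ValueError("drugs and doses must have the same length")
    flat = {}
    # Number the drug/dose pairs from 1, matching the flat column names.
    for n, (drug, dose) in enumerate(zip(drugs, doses), start=1):
        flat[f"drug{n}"] = drug
        flat[f"dose{n}"] = dose
    return flat


# One drug per well gives drug1/dose1 only.
single = sketch_unstack(("drugA",), (1e-6,))
# A two-drug combination adds drug2/dose2.
combo = sketch_unstack(("drugA", "drugB"), (1e-6, 5e-7))
print(single)
print(combo)
```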
- filter(cell_lines=None, drugs=None, plate=None)
Return a filtered copy of this dataset
None means no filter on that dimension.
- Parameters:
cell_lines (Iterable, optional) – Cell line names to keep
drugs (Iterable, optional) – Drug names (strings) or drug tuples to keep
plate (str or Iterable, optional) – Plate identifier(s) to keep
- Returns:
A new dataset filtered using the supplied arguments
- Return type:
HtsPandas
- plate(plate_name, plate_size=384, include_dip_rates=False)
Return a single plate in PlateData format
- Parameters:
plate_name (str) – The name of a plate in the dataset
plate_size (int) – The number of wells on the plate (default: 384)
include_dip_rates (bool) – Calculate and include DIP rates for each well if True
- Returns:
The plate data for the requested plate name
- Return type:
PlateData
- class thunor.io.PlateData(width=24, height=16, dataset_name=None, plate_name=None, cell_lines=[], drugs=[], doses=[], dip_rates=[])
A High Throughput Screening Plate with Data
- exception thunor.io.PlateFileParseException
Raised when a plate data file cannot be parsed
- class thunor.io.PlateMap(**kwargs)
Representation of a High Throughput Screening plate
- Parameters:
kwargs (dict, optional) – Optionally supply “width” and “height” values for the plate
- col_iterator()
Iterate over the column numbers in the plate
- Returns:
Iterator over the column numbers (1, 2, 3, etc.)
- Return type:
Iterator of int
- property num_wells
Number of wells in the plate
- classmethod plate_size_from_num_wells(num_wells)
Calculate plate size from the number of wells, assuming a 3:2 width-to-height ratio
- Parameters:
num_wells (int) – Number of wells in a plate
- Returns:
Width and height of plate (numbers of wells)
- Return type:
tuple
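The arithmetic behind a 3:2 plate can be sketched as follows. With width = 3k and height = 2k, the well count is 6k², so k is recovered with an integer square root. The function name sketch_plate_size and the choice to raise ValueError for well counts that do not fit are assumptions made for illustration, not the library's documented behaviour.

```python
import math


def sketch_plate_size(num_wells):
    """Return (width, height) for a plate with a 3:2 width-to-height ratio."""
    # width = 3k and height = 2k, so num_wells = 6 * k**2.
    k = math.isqrt(num_wells // 6)
    width, height = 3 * k, 2 * k
    if k == 0 or width * height != num_wells:
        raise ValueError(f"{num_wells} wells does not fit a 3:2 plate")
    return width, height


for wells in (96, 384, 1536):
    print(wells, sketch_plate_size(wells))
```

For the common plate formats this yields 12 by 8 (96 wells), 24 by 16 (384 wells) and 48 by 32 (1536 wells).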
- row_iterator()
Iterate over the row letters in the plate
- Returns:
Iterator over the row letters (A, B, C, etc.)
- Return type:
Iterator of str
- well_id_to_name(well_id)
Convert a Well ID into a well name
Well IDs are zero-based and count from left to right, then top to bottom.
- Parameters:
well_id (int) – Well ID on this plate
- Returns:
Name for this well, e.g. A1
- Return type:
str
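The numbering scheme described above can be illustrated with a short sketch. It assumes a plate of at most 26 rows, so that each row is a single letter, and the function name sketch_well_id_to_name is hypothetical; how the library names rows beyond Z is not covered here.

```python
def sketch_well_id_to_name(well_id, width=24, height=16):
    """Convert a zero-based well ID into a name such as 'A1'."""
    if not 0 <= well_id < width * height:
        raise ValueError(f"well_id {well_id} is outside a {width}x{height} plate")
    # IDs run left to right, then top to bottom, so the quotient is the
    # row index and the remainder is the zero-based column index.
    row, col = divmod(well_id, width)
    return f"{chr(ord('A') + row)}{col + 1}"


for well_id in (0, 23, 24, 383):
    print(well_id, sketch_well_id_to_name(well_id))
```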
- well_iterator()
Iterator over the plate’s wells
- Returns:
Iterator over the wells in the plate. Each well is given as a dict of ‘well’ (well ID), ‘row’ (row character) and ‘col’ (column number)
- Return type:
Iterator of dict
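As an illustration of the dictionaries described above, the following sketch generates one dict per well with the 'well', 'row' and 'col' keys. It is a stand-alone approximation with a hypothetical name (sketch_well_iterator), and it assumes single-letter rows, i.e. at most 26 rows.

```python
import string


def sketch_well_iterator(width=24, height=16):
    """Yield one dict per well, left to right and then top to bottom."""
    well_id = 0
    for row in string.ascii_uppercase[:height]:
        for col in range(1, width + 1):
            yield {"well": well_id, "row": row, "col": col}
            well_id += 1


wells = list(sketch_well_iterator())
print(len(wells))
print(wells[0])
print(wells[-1])
```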
- well_list()
List of the plate’s wells
- Returns:
The return value of well_iterator() as a list
- Return type:
list
- well_name_to_id(well_name, raise_error=True)
Convert a well name to a Well ID
- Parameters:
well_name (str) – A well name, e.g. A1
raise_error (bool) – If True (default), raise an error when the well name is invalid; otherwise return -1 for invalid well names
- Returns:
Well ID for this well. See also well_id_to_name()
- Return type:
int
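The reverse conversion, including the raise_error switch, might look like the sketch below. The function name sketch_well_name_to_id, the use of ValueError as the error type, and the restriction to a single uppercase row letter are all assumptions made for illustration.

```python
import re

# Assumed name format: one uppercase row letter followed by a column number.
_WELL_NAME = re.compile(r"^([A-Z])(\d+)$")


def sketch_well_name_to_id(well_name, width=24, height=16, raise_error=True):
    """Convert a name such as 'A1' into a zero-based well ID."""
    match = _WELL_NAME.match(well_name)
    if match:
        row = ord(match.group(1)) - ord("A")
        col = int(match.group(2)) - 1
        if row < height and 0 <= col < width:
            return row * width + col
    if raise_error:
        raise ValueError(f"Invalid well name: {well_name!r}")
    return -1


print(sketch_well_name_to_id("A1"))
print(sketch_well_name_to_id("P24"))
print(sketch_well_name_to_id("Z99", raise_error=False))
```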
- thunor.io.read_hdf(filename_or_buffer)
Read an HtsPandas dataset from a Thunor HDF5 format file
- Parameters:
filename_or_buffer (str or object) – Filename or buffer from which to read the data
- Returns:
Thunor HTS dataset
- Return type:
HtsPandas
- thunor.io.read_incucyte(filename_or_buffer, plate_width=24, plate_height=16)
Read an Incucyte Zoom exported file
- Parameters:
filename_or_buffer (str or file-like object) – Path to an Incucyte Zoom TSV export file, or a file-like object
plate_width (int) – Width of the microtiter plate in wells (default: 24, for 384-well plate)
plate_height (int) – Height of the microtiter plate in wells (default: 16, for 384-well plate)
- Returns:
HTS Dataset containing the data read from the file
- Return type:
HtsPandas
- thunor.io.read_vanderbilt_hts(file_or_source, *, plate_width=24, plate_height=16, sep=None, _unstacked=False)
Read a Vanderbilt HTS format file
See Vanderbilt HTS file format for a full description of the file format, including column definitions and an example.
- Parameters:
file_or_source (str or object) – Source for CSV data
plate_width (int) – Width of the microtiter plates (default: 24, for 384 well plate)
plate_height (int) – Height of the microtiter plates (default: 16, for 384 well plate)
sep (str) – Source file delimiter (default: detect from file extension)
- Returns:
HTS Dataset containing the data read from the CSV
- Return type:
HtsPandas
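The sep parameter defaults to detecting the delimiter from the file extension. The exact rules are not given here, so the sketch below is only a guess at the general shape: the extension-to-delimiter mapping, the function name sketch_detect_sep and the ValueError for unknown extensions are all hypothetical.

```python
import os

# Hypothetical mapping; the extensions thunor actually recognises may differ.
_SEPARATORS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def sketch_detect_sep(filename, sep=None):
    """Return an explicit separator if given, else guess from the extension."""
    if sep is not None:
        return sep
    ext = os.path.splitext(filename)[1].lower()
    if ext not in _SEPARATORS:
        raise ValueError(f"Cannot guess a delimiter for extension {ext!r}")
    return _SEPARATORS[ext]


print(repr(sketch_detect_sep("plate.csv")))
print(repr(sketch_detect_sep("plate.tsv")))
print(repr(sketch_detect_sep("plate.dat", sep=";")))
```

Passing sep explicitly sidesteps any guessing, which is the safer choice when a file has an unusual extension.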
- thunor.io.write_hdf(df_data, filename, dataset_format='fixed')
Save a dataset to Thunor HDF5 format
- Parameters:
df_data (HtsPandas) – HTS dataset
filename (str or io.BytesIO) – Output filename, or io.BytesIO instance for in-memory use
dataset_format (str) – One of ‘fixed’ or ‘table’. See pandas HDFStore docs for details
- thunor.io.write_vanderbilt_hts(df_data, filename, plate_width=24, plate_height=16, sep=None)
Write a Vanderbilt HTS format file
See Vanderbilt HTS file format for a full description of the file format, including column definitions and an example.
- Parameters:
df_data (HtsPandas) – HTS dataset
filename (str or object) – Filename or buffer to write into
plate_width (int) – Width of the microtiter plates in wells (default: 24, for 384 well plate)
plate_height (int) – Height of the microtiter plates in wells (default: 16, for 384 well plate)
sep (str) – Output file delimiter (default: detect from file extension)