Dose Response Curve Fitting (thunor.curve_fit)

This module fits Hill/log-logistic dose-response curves and derives summary parameters from them. The usual high-level entry point is thunor.curve_fit.fit_params().

Internally, fitting code tries to remain robust in the face of sparse or noisy screening data. Fits may be performed in an alternative parameterisation for better numerical stability, doses can be rescaled before optimisation, and underdetermined fits return None rather than exposing partially fitted results.

exception thunor.curve_fit.AAFitWarning

Warning issued when an activity area value may be unreliable

exception thunor.curve_fit.AUCFitWarning

Warning issued when an AUC value may be unreliable

exception thunor.curve_fit.DrugCombosWarning

Warning issued when drug combination wells are skipped during fitting

fit_params_minimal() currently fits single-drug dose-response curves only. Combination wells (where the drug tuple has length > 1) are filtered out and this warning is issued. Future versions will support combination fitting via a dedicated code path; the skip-and-warn behaviour is intentional and will not change when that support lands.

class thunor.curve_fit.HillCurve(popt)

Base class defining Hill/log-logistic curve functionality

null_response_fn(axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)

Compute the arithmetic mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

Parameters:
  • a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

  • axis (None or int or tuple of ints, optional) –

    Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

    If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.

  • out (ndarray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details. See ufuncs-output-type for more details.

  • keepdims (bool, optional) –

    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

    If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

  • where (array_like of bool, optional) –

    Elements to include in the mean. See ~numpy.ufunc.reduce for details.

    Added in version 1.20.0.

Returns:

m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.

Return type:

ndarray, see dtype parameter above

See also

average

Weighted average

std, var, nanmean, nanstd, nanvar

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

By default, float16 results are computed using float32 intermediates for extra precision.

Examples

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([2., 3.])
>>> np.mean(a, axis=1)
array([1.5, 3.5])

In single precision, mean can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
np.float32(0.54999924)

Computing the mean in float64 is more accurate:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806 # may vary

Computing the mean in timedelta64 is available:

>>> b = np.array([1, 3], dtype="timedelta64[D]")
>>> np.mean(b)
np.timedelta64(2,'D')

Specifying a where argument:

>>> a = np.array([[5, 9, 13], [14, 10, 12], [11, 15, 19]])
>>> np.mean(a)
12.0
>>> np.mean(a, where=[[True], [False], [False]])
9.0
class thunor.curve_fit.HillCurveLL2(popt)

Two-parameter log-logistic Hill curve; E0 fixed at 1, Emax fixed at 0

classmethod fit_fn(x, b, e)

Two parameter log-logistic function (“Hill curve”)

Parameters:
  • x (np.ndarray) – One-dimensional array of “x” values

  • b (float) – Hill slope

  • e (float) – EC50 value

Returns:

Array of “y” values using the supplied curve fit parameters on “x”

Return type:

np.ndarray

classmethod fit_fn_log(x, b, log_e)

LL2 Hill curve with log-transformed EC50.

classmethod initial_guess(x, y)

Heuristic function for initial fit values

Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html

Parameters:
  • x (np.ndarray) – Array of “x” (dose) values

  • y (np.ndarray) – Array of “y” (response) values

Returns:

Four-valued list corresponding to initial estimates of the parameters defined in the ll4() function.

Return type:

list

classmethod initial_guess_log(x, y, dose_scale=1.0)

Initial guess in log-EC50 space, accounting for dose scaling.

classmethod jac_fn_log(x, b, log_e)

Analytic Jacobian of LL2 w.r.t. (b, log_e).

classmethod transform_popt_from_log(popt, dose_scale=1.0)

Back-transform popt from log-EC50 space to linear EC50.

classmethod undo_dose_scale(popt, dose_scale)

Undo dose scaling on linear-space popt (EC50 is at index 1).

class thunor.curve_fit.HillCurveLL3u(popt)

Three parameter log logistic curve, for viability data

classmethod fit_fn(x, b, c, e)

Three parameter log-logistic function (“Hill curve”)

Parameters:
  • x (np.ndarray) – One-dimensional array of “x” values

  • b (float) – Hill slope

  • c (float) – Maximum response (lower plateau)

  • e (float) – EC50 value

Returns:

Array of “y” values using the supplied curve fit parameters on “x”

Return type:

np.ndarray

classmethod fit_fn_log(x, b, c, log_e)

LL3u Hill curve with log-transformed EC50.

classmethod initial_guess(x, y)

Heuristic function for initial fit values

Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html

Parameters:
  • x (np.ndarray) – Array of “x” (dose) values

  • y (np.ndarray) – Array of “y” (response) values

Returns:

Four-valued list corresponding to initial estimates of the parameters defined in the ll4() function.

Return type:

list

classmethod initial_guess_log(x, y, dose_scale=1.0)

Initial guess in log-EC50 space, accounting for dose scaling.

classmethod jac_fn_log(x, b, c, log_e)

Analytic Jacobian of LL3u w.r.t. (b, c, log_e).

max_fit_evals = None
static null_response_fn(_)

Compute the arithmetic mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

Parameters:
  • a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

  • axis (None or int or tuple of ints, optional) –

    Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

    If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.

  • out (ndarray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details. See ufuncs-output-type for more details.

  • keepdims (bool, optional) –

    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

    If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

  • where (array_like of bool, optional) –

    Elements to include in the mean. See ~numpy.ufunc.reduce for details.

    Added in version 1.20.0.

Returns:

m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.

Return type:

ndarray, see dtype parameter above

See also

average

Weighted average

std, var, nanmean, nanstd, nanvar

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

By default, float16 results are computed using float32 intermediates for extra precision.

Examples

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([2., 3.])
>>> np.mean(a, axis=1)
array([1.5, 3.5])

In single precision, mean can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
np.float32(0.54999924)

Computing the mean in float64 is more accurate:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806 # may vary

Computing the mean in timedelta64 is available:

>>> b = np.array([1, 3], dtype="timedelta64[D]")
>>> np.mean(b)
np.timedelta64(2,'D')

Specifying a where argument:

>>> a = np.array([[5, 9, 13], [14, 10, 12], [11, 15, 19]])
>>> np.mean(a)
12.0
>>> np.mean(a, where=[[True], [False], [False]])
9.0
classmethod transform_popt_from_log(popt, dose_scale=1.0)

Back-transform popt from log-EC50 space to linear EC50.

classmethod undo_dose_scale(popt, dose_scale)

Undo dose scaling on linear-space popt (EC50 is at index 2).

class thunor.curve_fit.HillCurveLL4(popt)

Four-parameter log-logistic Hill curve (E0, Emax, EC50, hill slope)

aa(min_conc, max_conc)

Find the activity area (area over the curve)

Parameters:
  • min_conc (float) – Minimum concentration to consider for fitting the curve

  • max_conc (float) – Maximum concentration to consider for fitting the curve

Returns:

Activity area value, or None for stimulatory responses (Emax > E0) which are not yet supported.

Return type:

float or None

auc(min_conc)

Find the area under the curve

Parameters:

min_conc (float) – Minimum concentration to consider for fitting the curve

Returns:

Area under the curve (AUC) value, or None for stimulatory responses (Emax > E0) which are not yet supported.

Return type:

float or None

ec(ec_num=50)

Find the effective concentration value (e.g. IC50)

Parameters:

ec_num (int) – EC number between 0 and 100 (response level)

Returns:

Effective concentration value for requested response value

Return type:

float

classmethod fit_fn(x, b, c, d, e)

Four parameter log-logistic function (“Hill curve”)

Parameters:
  • x (np.ndarray) – One-dimensional array of “x” values

  • b (float) – Hill slope

  • c (float) – Maximum response (lower plateau)

  • d (float) – Minimum response (upper plateau)

  • e (float) – EC50 value

Returns:

Array of “y” values using the supplied curve fit parameters on “x”

Return type:

np.ndarray

classmethod fit_fn_log(x, b, c, d, log_e)

LL4 Hill curve with log-transformed EC50 (log_e = log(e)).

Using log(EC50) as the optimisation parameter removes the positivity constraint on e, enabling the faster Levenberg-Marquardt solver.

ic(ic_num=50)

Find the inhibitory concentration value (e.g. IC50)

Parameters:

ic_num (int) – IC number between 0 and 100 (response level)

Returns:

Inhibitory concentration value for requested response value

Return type:

float

classmethod initial_guess(x, y)

Heuristic function for initial fit values

Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html

Parameters:
  • x (np.ndarray) – Array of “x” (dose) values

  • y (np.ndarray) – Array of “y” (response) values

Returns:

Four-valued list corresponding to initial estimates of the parameters defined in the ll4() function.

Return type:

list

classmethod initial_guess_log(x, y, dose_scale=1.0)

Initial guess in log-EC50 space, accounting for dose scaling.

classmethod jac_fn_log(x, b, c, d, log_e)

Analytic Jacobian of LL4 w.r.t. (b, c, d, log_e).

Avoids ~4 finite-difference evaluations per optimiser step.

classmethod transform_popt_from_log(popt, dose_scale=1.0)

Back-transform popt from log-EC50 space to linear EC50.

classmethod undo_dose_scale(popt, dose_scale)

Undo dose scaling on linear-space popt (EC50 is at index 3).

class thunor.curve_fit.HillCurveNull(popt)

Flat (no-effect) curve returned when the null hypothesis cannot be rejected

exception thunor.curve_fit.ValueWarning

Base warning for computed values that may be unreliable

thunor.curve_fit.aa_obs(responses, doses=None)

Activity Area (observed)

Parameters:
  • responses (np.array or pd.Series) – Response values, with dose values in the Index if a Series is supplied

  • doses (np.array or None) – Dose values - only required if responses is not a pd.Series

Returns:

Activity area (observed)

Return type:

float

thunor.curve_fit.fit_drc(doses, responses, response_std_errs=None, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, null_rejection_threshold=0.05, ctrl_dose_test=False)

Fit a dose response curve

Parameters:
  • doses (np.ndarray) – Array of dose values

  • responses (np.ndarray) – Array of response values, e.g. viability, DIP rates

  • response_std_errs (np.ndarray, optional) – Array of fit standard errors for the response values

  • fit_cls (Class) – Class to use for fitting (default: 4 parameter log logistic “Hill” curve)

  • null_rejection_threshold (float, optional) – p-value for rejecting curve fit against no effect “flat” response model by F-test (default: 0.05). Set to None to skip test.

  • ctrl_dose_test (boolean) – If True, the minimum dose is assumed to represent control values (in DIP rate curves), and will reject fits where E0 is greater than a standard deviation higher than the mean of the control response values. Leave as False to skip the test.

Returns:

A HillCurve object containing the fit parameters

Return type:

HillCurve

thunor.curve_fit.fit_params(ctrl_data, expt_data, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, ctrl_dose_fn=<function <lambda>>)

Fit dose-response curves and return the full set of statistics

Convenience wrapper that runs fit_params_minimal() followed by fit_params_from_base() with IC50, EC50, AUC, activity area, Hill slope, Emax, and per-well response values all enabled. The result is suitable for passing directly to any thunor.plots function.

Use fit_params_minimal() + fit_params_from_base() directly when you need a different or smaller subset of statistics.

Parameters:
  • ctrl_data (pd.DataFrame or None) – Control DIP rates from dip_rates() or ctrl_dip_rates(). Set to None to not use control data.

  • expt_data (pd.DataFrame) – Experiment (non-control) DIP rates from dip_rates() or expt_dip_rates(), or viability data from viability()

  • fit_cls (Class) – Class to use for curve fitting (default: HillCurveLL4)

  • ctrl_dose_fn (function) – Function to use to set an effective “dose” (non-zero) for controls. Takes the array of experiment doses as an argument.

Returns:

DataFrame with one row per drug/cell-line combination containing IC50, EC50, AUC, AA, Hill slope, Emax, and per-well response values.

Return type:

pd.DataFrame

thunor.curve_fit.fit_params_from_base(base_params, ctrl_data=None, expt_data=None, ctrl_dose_fn=<function <lambda>>, custom_ic_concentrations=frozenset({}), custom_ec_concentrations=frozenset({}), custom_e_values=frozenset({}), custom_e_rel_values=frozenset({}), include_aa=False, include_auc=False, include_hill=False, include_emax=False, include_einf=False, include_response_values=True)

Attach derived statistics to core fit parameters

Second stage of the two-stage fitting pipeline; see fit_params_minimal() for an overview. Call this function to select exactly which statistics to compute, avoiding unnecessary work when only a subset is required.

Parameters:
  • base_params (pd.DataFrame) – Output of fit_params_minimal().

  • ctrl_resp_data (pd.DataFrame or None) – Control DIP rates or viability values (used to attach per-well control response columns). Pass None to omit control columns.

  • expt_resp_data (pd.DataFrame or None) – Experiment DIP rates or viability values (used to attach per-well experiment response columns). Pass None to omit.

  • ctrl_dose_fn (function) – Function mapping experiment dose array to an effective control dose for visualisation (default: min dose / 10).

  • custom_ic_concentrations (set of int) – Effect levels at which to compute IC values, e.g. {50} for IC50.

  • custom_ec_concentrations (set of int) – Effect levels at which to compute EC values, e.g. {50} for EC50.

  • custom_e_values (set of float) – Doses at which to evaluate the fitted curve (absolute response).

  • custom_e_rel_values (set of float) – Doses at which to evaluate the fitted curve (relative response).

  • include_aa (bool) – Include activity area (default: False).

  • include_auc (bool) – Include area under the curve (default: False).

  • include_hill (bool) – Include Hill slope (default: False).

  • include_emax (bool) – Include Emax at the highest measured dose (default: False).

  • include_einf (bool) – Include Einf (asymptotic Emax from the fitted curve; default: False).

  • include_response_values (bool) – Attach per-well DIP rate or viability columns (default: True).

Returns:

base_params with additional columns for the requested statistics.

Return type:

pd.DataFrame

thunor.curve_fit.fit_params_minimal(ctrl_data, expt_data, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, ctrl_dose_fn=<function <lambda>>)

Fit dose-response curves and return core fit parameters

This is the first stage of a two-stage fitting pipeline:

  1. fit_params_minimal() — fits the curve and returns one row per drug/cell-line combination with the fitted curve object (fit_obj), E0, Emax, and observed response range. Fast; suitable when only a subset of derived statistics is needed.

  2. fit_params_from_base() — takes the output of stage 1 and attaches any requested derived statistics (IC50, EC50, AUC, AA, Hill slope, Emax, per-well response values).

fit_params() is a convenience wrapper that runs both stages with the standard set of statistics (IC50, EC50, AUC, AA, Hill, Emax). Use fit_params_minimal() + fit_params_from_base() directly when you need a different subset of statistics, or when the full set is not required for every call (e.g. interactive dashboards).

Parameters:
  • ctrl_data (pd.DataFrame or None) – Control DIP rates from dip_rates() or ctrl_dip_rates(). Set to None to not use control data.

  • expt_data (pd.DataFrame) – Experiment (non-control) DIP rates from dip_rates() or expt_dip_rates(), or viability data from viability()

  • fit_cls (Class) – Class to use for curve fitting (default: HillCurveLL4)

  • ctrl_dose_fn (function) – Function to use to set an effective “dose” (non-zero) for controls. Takes the array of experiment doses as an argument.

Returns:

DataFrame with one row per drug/cell-line combination, containing fit_obj, e0, emax, min_dose_measured, max_dose_measured, and emax_obs.

Return type:

pd.DataFrame

thunor.curve_fit.is_param_truncated(df_params, param_name)

Checks if parameter values are truncated at boundaries of measured range

Parameters:
  • df_params (pd.DataFrame) – DataFrame of DIP curve fits with parameters from fit_params()

  • param_name (str) – Name of a parameter, e.g. ‘ic50’

Returns:

Array of booleans showing whether each entry in the DataFrame is truncated

Return type:

np.ndarray