Dose Response Curve Fitting (thunor.curve_fit)
This module fits Hill/log-logistic dose-response curves and derives summary parameters
from them. The usual high-level entry point is thunor.curve_fit.fit_params().
Internally, fitting code tries to remain robust in the face of sparse or noisy
screening data. Fits may be performed in an alternative parameterisation for
better numerical stability, doses can be rescaled before optimisation, and
underdetermined fits return None rather than exposing partially fitted
results.
- exception thunor.curve_fit.AAFitWarning
Warning issued when an activity area value may be unreliable
- exception thunor.curve_fit.AUCFitWarning
Warning issued when an AUC value may be unreliable
- exception thunor.curve_fit.DrugCombosWarning
Warning issued when drug combination wells are skipped during fitting
fit_params_minimal()currently fits single-drug dose-response curves only. Combination wells (where thedrugtuple has length > 1) are filtered out and this warning is issued. Future versions will support combination fitting via a dedicated code path; the skip-and-warn behaviour is intentional and will not change when that support lands.
- class thunor.curve_fit.HillCurve(popt)
Base class defining Hill/log-logistic curve functionality
- null_response_fn(axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
Compute the arithmetic mean along the specified axis.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
- Parameters:
a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) –
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
out (ndarray, optional) – Alternate output array in which to place the result. The default is
None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details. See ufuncs-output-type for more details.keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
where (array_like of bool, optional) –
Elements to include in the mean. See ~numpy.ufunc.reduce for details.
Added in version 1.20.0.
- Returns:
m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.
- Return type:
ndarray, see dtype parameter above
See also
averageWeighted average
std,var,nanmean,nanstd,nanvarNotes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
By default, float16 results are computed using float32 intermediates for extra precision.
Examples
>>> import numpy as np >>> a = np.array([[1, 2], [3, 4]]) >>> np.mean(a) 2.5 >>> np.mean(a, axis=0) array([2., 3.]) >>> np.mean(a, axis=1) array([1.5, 3.5])
In single precision, mean can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32) >>> a[0, :] = 1.0 >>> a[1, :] = 0.1 >>> np.mean(a) np.float32(0.54999924)
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64) 0.55000000074505806 # may vary
Computing the mean in timedelta64 is available:
>>> b = np.array([1, 3], dtype="timedelta64[D]") >>> np.mean(b) np.timedelta64(2,'D')
Specifying a where argument:
>>> a = np.array([[5, 9, 13], [14, 10, 12], [11, 15, 19]]) >>> np.mean(a) 12.0 >>> np.mean(a, where=[[True], [False], [False]]) 9.0
- class thunor.curve_fit.HillCurveLL2(popt)
Two-parameter log-logistic Hill curve; E0 fixed at 1, Emax fixed at 0
- classmethod fit_fn(x, b, e)
Two parameter log-logistic function (“Hill curve”)
- Parameters:
x (np.ndarray) – One-dimensional array of “x” values
b (float) – Hill slope
e (float) – EC50 value
- Returns:
Array of “y” values using the supplied curve fit parameters on “x”
- Return type:
np.ndarray
- classmethod fit_fn_log(x, b, log_e)
LL2 Hill curve with log-transformed EC50.
- classmethod initial_guess(x, y)
Heuristic function for initial fit values
Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html
- Parameters:
x (np.ndarray) – Array of “x” (dose) values
y (np.ndarray) – Array of “y” (response) values
- Returns:
Four-valued list corresponding to initial estimates of the parameters defined in the
ll4()function.- Return type:
list
- classmethod initial_guess_log(x, y, dose_scale=1.0)
Initial guess in log-EC50 space, accounting for dose scaling.
- classmethod jac_fn_log(x, b, log_e)
Analytic Jacobian of LL2 w.r.t. (b, log_e).
- classmethod transform_popt_from_log(popt, dose_scale=1.0)
Back-transform popt from log-EC50 space to linear EC50.
- classmethod undo_dose_scale(popt, dose_scale)
Undo dose scaling on linear-space popt (EC50 is at index 1).
- class thunor.curve_fit.HillCurveLL3u(popt)
Three parameter log logistic curve, for viability data
- classmethod fit_fn(x, b, c, e)
Three parameter log-logistic function (“Hill curve”)
- Parameters:
x (np.ndarray) – One-dimensional array of “x” values
b (float) – Hill slope
c (float) – Maximum response (lower plateau)
e (float) – EC50 value
- Returns:
Array of “y” values using the supplied curve fit parameters on “x”
- Return type:
np.ndarray
- classmethod fit_fn_log(x, b, c, log_e)
LL3u Hill curve with log-transformed EC50.
- classmethod initial_guess(x, y)
Heuristic function for initial fit values
Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html
- Parameters:
x (np.ndarray) – Array of “x” (dose) values
y (np.ndarray) – Array of “y” (response) values
- Returns:
Four-valued list corresponding to initial estimates of the parameters defined in the
ll4()function.- Return type:
list
- classmethod initial_guess_log(x, y, dose_scale=1.0)
Initial guess in log-EC50 space, accounting for dose scaling.
- classmethod jac_fn_log(x, b, c, log_e)
Analytic Jacobian of LL3u w.r.t. (b, c, log_e).
- max_fit_evals = None
- static null_response_fn(_)
Compute the arithmetic mean along the specified axis.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
- Parameters:
a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) –
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
out (ndarray, optional) – Alternate output array in which to place the result. The default is
None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See ufuncs-output-type for more details. See ufuncs-output-type for more details.keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
where (array_like of bool, optional) –
Elements to include in the mean. See ~numpy.ufunc.reduce for details.
Added in version 1.20.0.
- Returns:
m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.
- Return type:
ndarray, see dtype parameter above
See also
averageWeighted average
std,var,nanmean,nanstd,nanvarNotes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
By default, float16 results are computed using float32 intermediates for extra precision.
Examples
>>> import numpy as np >>> a = np.array([[1, 2], [3, 4]]) >>> np.mean(a) 2.5 >>> np.mean(a, axis=0) array([2., 3.]) >>> np.mean(a, axis=1) array([1.5, 3.5])
In single precision, mean can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32) >>> a[0, :] = 1.0 >>> a[1, :] = 0.1 >>> np.mean(a) np.float32(0.54999924)
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64) 0.55000000074505806 # may vary
Computing the mean in timedelta64 is available:
>>> b = np.array([1, 3], dtype="timedelta64[D]") >>> np.mean(b) np.timedelta64(2,'D')
Specifying a where argument:
>>> a = np.array([[5, 9, 13], [14, 10, 12], [11, 15, 19]]) >>> np.mean(a) 12.0 >>> np.mean(a, where=[[True], [False], [False]]) 9.0
- classmethod transform_popt_from_log(popt, dose_scale=1.0)
Back-transform popt from log-EC50 space to linear EC50.
- classmethod undo_dose_scale(popt, dose_scale)
Undo dose scaling on linear-space popt (EC50 is at index 2).
- class thunor.curve_fit.HillCurveLL4(popt)
Four-parameter log-logistic Hill curve (E0, Emax, EC50, hill slope)
- aa(min_conc, max_conc)
Find the activity area (area over the curve)
- Parameters:
min_conc (float) – Minimum concentration to consider for fitting the curve
max_conc (float) – Maximum concentration to consider for fitting the curve
- Returns:
Activity area value, or
Nonefor stimulatory responses (Emax > E0) which are not yet supported.- Return type:
float or None
- auc(min_conc)
Find the area under the curve
- Parameters:
min_conc (float) – Minimum concentration to consider for fitting the curve
- Returns:
Area under the curve (AUC) value, or
Nonefor stimulatory responses (Emax > E0) which are not yet supported.- Return type:
float or None
- ec(ec_num=50)
Find the effective concentration value (e.g. IC50)
- Parameters:
ec_num (int) – EC number between 0 and 100 (response level)
- Returns:
Effective concentration value for requested response value
- Return type:
float
- classmethod fit_fn(x, b, c, d, e)
Four parameter log-logistic function (“Hill curve”)
- Parameters:
x (np.ndarray) – One-dimensional array of “x” values
b (float) – Hill slope
c (float) – Maximum response (lower plateau)
d (float) – Minimum response (upper plateau)
e (float) – EC50 value
- Returns:
Array of “y” values using the supplied curve fit parameters on “x”
- Return type:
np.ndarray
- classmethod fit_fn_log(x, b, c, d, log_e)
LL4 Hill curve with log-transformed EC50 (log_e = log(e)).
Using log(EC50) as the optimisation parameter removes the positivity constraint on e, enabling the faster Levenberg-Marquardt solver.
- ic(ic_num=50)
Find the inhibitory concentration value (e.g. IC50)
- Parameters:
ic_num (int) – IC number between 0 and 100 (response level)
- Returns:
Inhibitory concentration value for requested response value
- Return type:
float
- classmethod initial_guess(x, y)
Heuristic function for initial fit values
Uses the approach followed by R’s drc library: https://cran.r-project.org/web/packages/drc/index.html
- Parameters:
x (np.ndarray) – Array of “x” (dose) values
y (np.ndarray) – Array of “y” (response) values
- Returns:
Four-valued list corresponding to initial estimates of the parameters defined in the
ll4()function.- Return type:
list
- classmethod initial_guess_log(x, y, dose_scale=1.0)
Initial guess in log-EC50 space, accounting for dose scaling.
- classmethod jac_fn_log(x, b, c, d, log_e)
Analytic Jacobian of LL4 w.r.t. (b, c, d, log_e).
Avoids ~4 finite-difference evaluations per optimiser step.
- classmethod transform_popt_from_log(popt, dose_scale=1.0)
Back-transform popt from log-EC50 space to linear EC50.
- classmethod undo_dose_scale(popt, dose_scale)
Undo dose scaling on linear-space popt (EC50 is at index 3).
- class thunor.curve_fit.HillCurveNull(popt)
Flat (no-effect) curve returned when the null hypothesis cannot be rejected
- exception thunor.curve_fit.ValueWarning
Base warning for computed values that may be unreliable
- thunor.curve_fit.aa_obs(responses, doses=None)
Activity Area (observed)
- Parameters:
responses (np.array or pd.Series) – Response values, with dose values in the Index if a Series is supplied
doses (np.array or None) – Dose values - only required if responses is not a pd.Series
- Returns:
Activity area (observed)
- Return type:
float
- thunor.curve_fit.fit_drc(doses, responses, response_std_errs=None, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, null_rejection_threshold=0.05, ctrl_dose_test=False)
Fit a dose response curve
- Parameters:
doses (np.ndarray) – Array of dose values
responses (np.ndarray) – Array of response values, e.g. viability, DIP rates
response_std_errs (np.ndarray, optional) – Array of fit standard errors for the response values
fit_cls (Class) – Class to use for fitting (default: 4 parameter log logistic “Hill” curve)
null_rejection_threshold (float, optional) – p-value for rejecting curve fit against no effect “flat” response model by F-test (default: 0.05). Set to None to skip test.
ctrl_dose_test (boolean) – If True, the minimum dose is assumed to represent control values (in DIP rate curves), and will reject fits where E0 is greater than a standard deviation higher than the mean of the control response values. Leave as False to skip the test.
- Returns:
A HillCurve object containing the fit parameters
- Return type:
- thunor.curve_fit.fit_params(ctrl_data, expt_data, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, ctrl_dose_fn=<function <lambda>>)
Fit dose-response curves and return the full set of statistics
Convenience wrapper that runs
fit_params_minimal()followed byfit_params_from_base()with IC50, EC50, AUC, activity area, Hill slope, Emax, and per-well response values all enabled. The result is suitable for passing directly to anythunor.plotsfunction.Use
fit_params_minimal()+fit_params_from_base()directly when you need a different or smaller subset of statistics.- Parameters:
ctrl_data (pd.DataFrame or None) – Control DIP rates from
dip_rates()orctrl_dip_rates(). Set to None to not use control data.expt_data (pd.DataFrame) – Experiment (non-control) DIP rates from
dip_rates()orexpt_dip_rates(), or viability data fromviability()fit_cls (Class) – Class to use for curve fitting (default:
HillCurveLL4)ctrl_dose_fn (function) – Function to use to set an effective “dose” (non-zero) for controls. Takes the array of experiment doses as an argument.
- Returns:
DataFrame with one row per drug/cell-line combination containing IC50, EC50, AUC, AA, Hill slope, Emax, and per-well response values.
- Return type:
pd.DataFrame
- thunor.curve_fit.fit_params_from_base(base_params, ctrl_data=None, expt_data=None, ctrl_dose_fn=<function <lambda>>, custom_ic_concentrations=frozenset({}), custom_ec_concentrations=frozenset({}), custom_e_values=frozenset({}), custom_e_rel_values=frozenset({}), include_aa=False, include_auc=False, include_hill=False, include_emax=False, include_einf=False, include_response_values=True)
Attach derived statistics to core fit parameters
Second stage of the two-stage fitting pipeline; see
fit_params_minimal()for an overview. Call this function to select exactly which statistics to compute, avoiding unnecessary work when only a subset is required.- Parameters:
base_params (pd.DataFrame) – Output of
fit_params_minimal().ctrl_resp_data (pd.DataFrame or None) – Control DIP rates or viability values (used to attach per-well control response columns). Pass
Noneto omit control columns.expt_resp_data (pd.DataFrame or None) – Experiment DIP rates or viability values (used to attach per-well experiment response columns). Pass
Noneto omit.ctrl_dose_fn (function) – Function mapping experiment dose array to an effective control dose for visualisation (default: min dose / 10).
custom_ic_concentrations (set of int) – Effect levels at which to compute IC values, e.g.
{50}for IC50.custom_ec_concentrations (set of int) – Effect levels at which to compute EC values, e.g.
{50}for EC50.custom_e_values (set of float) – Doses at which to evaluate the fitted curve (absolute response).
custom_e_rel_values (set of float) – Doses at which to evaluate the fitted curve (relative response).
include_aa (bool) – Include activity area (default: False).
include_auc (bool) – Include area under the curve (default: False).
include_hill (bool) – Include Hill slope (default: False).
include_emax (bool) – Include Emax at the highest measured dose (default: False).
include_einf (bool) – Include Einf (asymptotic Emax from the fitted curve; default: False).
include_response_values (bool) – Attach per-well DIP rate or viability columns (default: True).
- Returns:
base_paramswith additional columns for the requested statistics.- Return type:
pd.DataFrame
- thunor.curve_fit.fit_params_minimal(ctrl_data, expt_data, fit_cls=<class 'thunor.curve_fit.HillCurveLL4'>, ctrl_dose_fn=<function <lambda>>)
Fit dose-response curves and return core fit parameters
This is the first stage of a two-stage fitting pipeline:
fit_params_minimal()— fits the curve and returns one row per drug/cell-line combination with the fitted curve object (fit_obj), E0, Emax, and observed response range. Fast; suitable when only a subset of derived statistics is needed.fit_params_from_base()— takes the output of stage 1 and attaches any requested derived statistics (IC50, EC50, AUC, AA, Hill slope, Emax, per-well response values).
fit_params()is a convenience wrapper that runs both stages with the standard set of statistics (IC50, EC50, AUC, AA, Hill, Emax). Usefit_params_minimal()+fit_params_from_base()directly when you need a different subset of statistics, or when the full set is not required for every call (e.g. interactive dashboards).- Parameters:
ctrl_data (pd.DataFrame or None) – Control DIP rates from
dip_rates()orctrl_dip_rates(). Set to None to not use control data.expt_data (pd.DataFrame) – Experiment (non-control) DIP rates from
dip_rates()orexpt_dip_rates(), or viability data fromviability()fit_cls (Class) – Class to use for curve fitting (default:
HillCurveLL4)ctrl_dose_fn (function) – Function to use to set an effective “dose” (non-zero) for controls. Takes the array of experiment doses as an argument.
- Returns:
DataFrame with one row per drug/cell-line combination, containing
fit_obj,e0,emax,min_dose_measured,max_dose_measured, andemax_obs.- Return type:
pd.DataFrame
- thunor.curve_fit.is_param_truncated(df_params, param_name)
Checks if parameter values are truncated at boundaries of measured range
- Parameters:
df_params (pd.DataFrame) – DataFrame of DIP curve fits with parameters from
fit_params()param_name (str) – Name of a parameter, e.g. ‘ic50’
- Returns:
Array of booleans showing whether each entry in the DataFrame is truncated
- Return type:
np.ndarray