Conversion tools for external formats and databases (thunor.converters)

thunor.converters.convert_ctrp(directory='.', output_file='ctrp_v2.h5')

Convert CTRP v2.0 data to Thunor format

CTRP is the Cancer Therapeutics Response Portal, a project which has generated a large quantity of viability data.

The data are freely available from the CTD2 Data Portal:

https://ocg.cancer.gov/programs/ctd2/data-portal

The required files can be downloaded from their FTP server:

ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset/

You’ll need to download and extract the following file:

  • “CTRPv2.0_2015_ctd2_ExpandedDataset.zip”

Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.

Please make sure you have the “tables” Python package installed, in addition to the standard Thunor Core requirements.

You can run this function at the command line to convert the files; assuming the extracted files are in the current directory, simply run:

python -c "from thunor.converters import convert_ctrp; convert_ctrp()"

This script will take several minutes to run; please be patient. It is also resource-intensive, due to the size of the dataset. We recommend you use the highest-spec machine that you have available.

This will output a file called (by default) ctrp_v2.h5, which can be opened with thunor.io.read_hdf(), or used with Thunor Web.

Parameters
  • directory (str) – Directory containing the extracted CTRP v2.0 dataset

  • output_file (str) – Filename of output file (Thunor HDF5 format)
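As an alternative to the one-line command, the conversion can be called from a Python script with explicit arguments. The following is a minimal sketch, not part of Thunor: convert_ctrp_checked is a hypothetical wrapper name, and it only adds a check that the input directory exists before handing over to convert_ctrp with the two documented parameters.

```python
from pathlib import Path


def convert_ctrp_checked(directory=".", output_file="ctrp_v2.h5"):
    """Run convert_ctrp after confirming the input directory exists.

    Hypothetical helper for illustration; it is not part of Thunor.
    """
    source = Path(directory)
    if not source.is_dir():
        raise FileNotFoundError(f"CTRP directory not found: {source}")

    # Imported here so the directory check runs before Thunor is loaded.
    from thunor.converters import convert_ctrp

    # Only the two documented parameters are passed through.
    convert_ctrp(directory=str(source), output_file=str(output_file))
    return Path(output_file)
```

Calling convert_ctrp_checked("path/to/extracted/ctrp") would then write ctrp_v2.h5 in the current directory, assuming the directory holds the extracted dataset.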

thunor.converters.convert_gdsc(drug_list_file='Screened_Compounds.xlsx', screen_data_file='v17a_public_raw_data.xlsx', output_file='gdsc-v17a.h5')

Convert GDSC data to Thunor format

GDSC is the Genomics of Drug Sensitivity in Cancer, a project which has generated a large quantity of viability data.

The data are freely available under the license agreement described on their website:

https://www.cancerrxgene.org/downloads

The required files can be downloaded from here:

ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/

You’ll need to download two files to convert to Thunor format:

  • The list of drugs, “Screened_Compounds.xlsx”

  • Sensitivity data, “v17a_public_raw_data.xlsx”

Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.

Please make sure you have the “tables” and “xlrd” Python packages installed, in addition to the standard Thunor Core requirements.

You can run this function at the command line to convert the files; assuming the two files are in the current directory, simply run:

python -c "from thunor.converters import convert_gdsc; convert_gdsc()"

This script will take several minutes to run; please be patient. It is also resource-intensive, due to the size of the dataset. We recommend you use the highest-spec machine that you have available.

This will output a file called (by default) gdsc-v17a.h5, which can be opened with thunor.io.read_hdf(), or used with Thunor Web.

Parameters
  • drug_list_file (str) – Filename of GDSC list of drugs, to convert drug IDs to names

  • screen_data_file (str) – Filename of GDSC sensitivity data

  • output_file (str) – Filename of output file (Thunor HDF5 format)
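Because the conversion takes several minutes, it can be worth checking the prerequisites described above before starting it. The sketch below uses only the standard library; the function names (missing_packages, missing_files, gdsc_preflight) are hypothetical helpers, not part of Thunor, and the file names are the documented defaults of convert_gdsc.

```python
import importlib.util
from pathlib import Path


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(filenames, directory="."):
    """Return the filenames that do not exist in the given directory."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


def gdsc_preflight(directory="."):
    """List problems that would stop convert_gdsc() running with its defaults."""
    problems = []
    # Extra packages named in the documentation for this converter.
    for pkg in missing_packages(["tables", "xlrd"]):
        problems.append(f"missing Python package: {pkg}")
    # Default input file names from the convert_gdsc signature.
    inputs = ["Screened_Compounds.xlsx", "v17a_public_raw_data.xlsx"]
    for name in missing_files(inputs, directory):
        problems.append(f"missing input file: {name}")
    return problems
```

An empty list from gdsc_preflight() suggests the default call convert_gdsc() has what it needs in the current directory; it does not validate the contents of the spreadsheets.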

thunor.converters.convert_gdsc_tags(cell_line_file='Cell_Lines_Details.xlsx', output_file='gdsc_cell_line_primary_site_tags.txt')

Convert GDSC cell line tissue descriptors to Thunor tags

GDSC is the Genomics of Drug Sensitivity in Cancer, a project which has generated a large quantity of viability data.

The data are freely available under the license agreement described on their website:

https://www.cancerrxgene.org/downloads

The required files can be downloaded from here:

ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/

You’ll need to download one file:

  • Cell line details, “Cell_Lines_Details.xlsx”

You can run this function at the command line to convert the file; assuming the downloaded file is in the current directory, simply run:

python -c "from thunor.converters import convert_gdsc_tags; convert_gdsc_tags()"

This will output a file called (by default) gdsc_cell_line_primary_site_tags.txt, which can be loaded into Thunor Web using the “Upload cell line tags” function.

Parameters
  • cell_line_file (str) – Filename of GDSC cell line details (Excel .xlsx format)

  • output_file (str) – Filename of output file (tab separated values format)
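Before uploading the tags file to Thunor Web, it may help to look at its first few rows. The column layout is not described here, so this sketch makes no assumption about it: it reads the file as generic tab-separated values with the standard csv module. preview_tsv is a hypothetical helper name, and the UTF-8 encoding is an assumption.

```python
import csv
from itertools import islice


def preview_tsv(path, max_rows=5):
    """Return up to max_rows rows of a tab-separated file as lists of strings."""
    # UTF-8 is assumed; adjust the encoding if the file was written differently.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return list(islice(reader, max_rows))
```

For example, preview_tsv("gdsc_cell_line_primary_site_tags.txt") returns the first five rows of the default output file, each as a list of column values.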

thunor.converters.convert_teicher(directory='.', output_file='teicher.h5')

Convert Teicher data to Thunor format

The “Teicher” dataset contains dose-response data for a panel of small cell lung cancer (SCLC) cell lines. It can be downloaded from the following link (select the Compound Concentration/Response Data link):

https://sclccelllines.cancer.gov/sclc/downloads.xhtml

Unzip the downloaded file. Assuming the extracted files are in the current directory, the dataset can then be converted on the command line:

python -c "from thunor.converters import convert_teicher; convert_teicher()"

Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.

This will output a file called (by default) teicher.h5, which can be opened with thunor.io.read_hdf(), or used with Thunor Web.

Parameters
  • directory (str) – Directory containing the Teicher dataset

  • output_file (str) – Filename of output file (Thunor HDF5 format)
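Once a conversion has finished, the resulting HDF5 file can be opened with thunor.io.read_hdf(), as noted above. The following sketch assumes that read_hdf accepts the filename as its first argument; load_thunor_hdf is a hypothetical wrapper, not part of Thunor, which reports a missing output file with a clearer error.

```python
from pathlib import Path


def load_thunor_hdf(path="teicher.h5"):
    """Open a converted Thunor HDF5 file, failing early if it is missing.

    Hypothetical helper for illustration; it is not part of Thunor.
    """
    source = Path(path)
    if not source.is_file():
        raise FileNotFoundError(f"Converted file not found: {source}")

    # Imported here so the file check runs before Thunor is loaded.
    from thunor.io import read_hdf

    # Assumption: read_hdf takes the filename as its first argument.
    return read_hdf(str(source))
```

The same helper could be pointed at ctrp_v2.h5 or gdsc-v17a.h5, since all three converters write the Thunor HDF5 format.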