Conversion tools for external formats and databases (thunor.converters)
thunor.converters.convert_ctrp(directory='.', output_file='ctrp_v2.h5')
Convert CTRP v2.0 data to Thunor format
CTRP is the Cancer Therapeutics Response Portal, a project which has generated a large quantity of viability data.
The data are freely available from the CTD2 Data Portal:
https://ocg.cancer.gov/programs/ctd2/data-portal
The required files can be downloaded from their FTP server:
ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset/
You’ll need to download and extract the following file:
“CTRPv2.0_2015_ctd2_ExpandedDataset.zip”
Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.
Please make sure you have the “tables” Python package installed, in addition to the standard Thunor Core requirements.
You can run this function at the command line to convert the files; assuming the extracted files are in the current directory, simply run:
python -c "from thunor.converters import convert_ctrp; convert_ctrp()"
This script will take several minutes to run; please be patient. It is also resource-intensive due to the size of the dataset, so we recommend using the highest-spec machine you have available.
By default, this will output a file called ctrp_v2.h5, which can be opened with thunor.io.read_hdf() or used with Thunor Web.
Parameters
directory (str) – Directory containing the extracted CTRP v2.0 dataset
output_file (str) – Filename of output file (Thunor HDF5 format)
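The command-line one-liner can also be run from a Python script. Below is a minimal sketch, assuming you want the script to fail early when the “tables” package or the input directory is missing. The helper names `missing_packages` and `run_ctrp_conversion` are hypothetical; only `convert_ctrp` and `thunor.io.read_hdf` come from Thunor itself.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def run_ctrp_conversion(directory=".", output_file="ctrp_v2.h5"):
    """Hypothetical wrapper: check prerequisites, convert CTRP v2.0, reload the result."""
    # "tables" (PyTables) is required in addition to the Thunor Core requirements.
    missing = missing_packages(["tables"])
    if missing:
        raise ImportError("Missing required packages: " + ", ".join(missing))

    # The directory should hold the files extracted from the CTRP zip archive.
    if not os.path.isdir(directory):
        raise FileNotFoundError("No such directory: " + directory)

    # Imported lazily so this module can be loaded without Thunor installed.
    from thunor.converters import convert_ctrp
    from thunor.io import read_hdf

    # Slow and memory-hungry on the full dataset.
    convert_ctrp(directory=directory, output_file=output_file)
    return read_hdf(output_file)
```

For example, `run_ctrp_conversion("path/to/extracted/files")` (a placeholder path) would convert the dataset and return whatever `read_hdf()` loads from the new file.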
thunor.converters.convert_gdsc(drug_list_file='Screened_Compounds.xlsx', screen_data_file='v17a_public_raw_data.xlsx', output_file='gdsc-v17a.h5')
Convert GDSC data to Thunor format
GDSC is the Genomics of Drug Sensitivity in Cancer, a project which has generated a large quantity of viability data.
The data are freely available under the license agreement described on their website:
https://www.cancerrxgene.org/downloads
The required files can be downloaded from here:
ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/
You’ll need to download two files to convert to Thunor format:
The list of drugs, “Screened_Compounds.xlsx”
Sensitivity data, “v17a_public_raw_data.xlsx”
Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.
Please make sure you have the “tables” and “xlrd” Python packages installed, in addition to the standard Thunor Core requirements.
You can run this function at the command line to convert the files; assuming the two files are in the current directory, simply run:
python -c "from thunor.converters import convert_gdsc; convert_gdsc()"
This script will take several minutes to run; please be patient. It is also resource-intensive due to the size of the dataset, so we recommend using the highest-spec machine you have available.
By default, this will output a file called gdsc-v17a.h5, which can be opened with thunor.io.read_hdf() or used with Thunor Web.
Parameters
drug_list_file (str) – Filename of GDSC list of drugs, to convert drug IDs to names
screen_data_file (str) – Filename of GDSC sensitivity data
output_file (str) – Filename of output file (Thunor HDF5 format)
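When the GDSC files live somewhere other than the current directory, their paths can be passed explicitly. The sketch below assumes both files keep the names given above and resolves them inside a download folder of your choice. `GDSC_INPUT_FILES`, `gdsc_input_paths` and `run_gdsc_conversion` are hypothetical names, not part of Thunor; only `convert_gdsc` and its documented parameters are.

```python
from pathlib import Path

# Default GDSC filenames, keyed by the convert_gdsc() parameter they feed.
GDSC_INPUT_FILES = {
    "drug_list_file": "Screened_Compounds.xlsx",
    "screen_data_file": "v17a_public_raw_data.xlsx",
}


def gdsc_input_paths(download_dir):
    """Map convert_gdsc() parameter names to full paths, failing early if a file is absent."""
    download_dir = Path(download_dir)
    paths = {param: download_dir / name for param, name in GDSC_INPUT_FILES.items()}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing GDSC input files: " + ", ".join(missing))
    return {param: str(path) for param, path in paths.items()}


def run_gdsc_conversion(download_dir, output_file="gdsc-v17a.h5"):
    """Hypothetical wrapper around convert_gdsc() using files from `download_dir`."""
    inputs = gdsc_input_paths(download_dir)

    # Imported lazily; requires Thunor plus the "tables" and "xlrd" packages.
    from thunor.converters import convert_gdsc

    convert_gdsc(output_file=output_file, **inputs)
    return output_file
```

Checking for both files before calling the converter avoids discovering a missing input only after a long run has started.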
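thunor.converters.convert_gdsc_tags(cell_line_file='Cell_Lines_Details.xlsx', output_file='gdsc_cell_line_primary_site_tags.txt')
Convert GDSC cell line tissue descriptors to Thunor tags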
GDSC is the Genomics of Drug Sensitivity in Cancer, a project which has generated a large quantity of viability data.
The data are freely available under the license agreement described on their website:
https://www.cancerrxgene.org/downloads
The required files can be downloaded from here:
ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/
You’ll need to download one file:
Cell line details, “Cell_Lines_Details.xlsx”
You can run this function at the command line to convert the file; assuming the downloaded file is in the current directory, simply run:
python -c "from thunor.converters import convert_gdsc_tags; convert_gdsc_tags()"
By default, this will output a file called gdsc_cell_line_primary_site_tags.txt, which can be loaded into Thunor Web using the “Upload cell line tags” function.
Parameters
cell_line_file (str) – Filename of GDSC cell line details (Excel .xlsx format)
output_file (str) – Filename of output file (tab separated values format)
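Before uploading, it can help to take a quick look at the generated tags file. The sketch below uses only the standard library and assumes nothing about the column names, only that the file is tab-separated, UTF-8 encoded and starts with a header row (assumptions, not documented behaviour). `summarize_tsv` is a hypothetical helper.

```python
import csv


def summarize_tsv(path):
    """Return (header, data_row_count) for a tab-separated file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        # Blank lines come back as empty lists; don't count them as data rows.
        row_count = sum(1 for row in reader if row)
    return header, row_count
```

For example, `summarize_tsv("gdsc_cell_line_primary_site_tags.txt")` returns the header columns and the number of tag rows.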
thunor.converters.convert_teicher(directory='.', output_file='teicher.h5')
Convert Teicher data to Thunor format
The “Teicher” dataset contains dose-response data on a panel of small cell lung cancer (SCLC) cell lines. It can be downloaded from the following link (select the Compound Concentration/Response Data link):
https://sclccelllines.cancer.gov/sclc/downloads.xhtml
Unzip the downloaded file. Then, from the directory containing the extracted files, convert the dataset on the command line:
python -c "from thunor.converters import convert_teicher; convert_teicher()"
Please note that the layout of wells in each plate after conversion is arbitrary, since this information is not in the original files.
By default, this will output a file called teicher.h5, which can be opened with thunor.io.read_hdf() or used with Thunor Web.
Parameters
directory (str) – Directory containing the Teicher dataset
output_file (str) – Filename of output file (Thunor HDF5 format)
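The unzip-then-convert steps can also be scripted. This is a minimal sketch, assuming the download is a standard .zip archive. The archive path and the helper names `extract_archive` and `convert_teicher_archive` are hypothetical; only `convert_teicher` comes from Thunor.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, directory):
    """Extract every member of `zip_path` into `directory`; return the member names."""
    Path(directory).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(directory)
        return archive.namelist()


def convert_teicher_archive(zip_path, directory="teicher_data", output_file="teicher.h5"):
    """Hypothetical wrapper: unzip the Teicher download, then convert it."""
    extract_archive(zip_path, directory)

    # Imported lazily so the extraction helper works without Thunor installed.
    from thunor.converters import convert_teicher

    convert_teicher(directory=directory, output_file=output_file)
    return output_file
```

If the archive unpacks into its own subfolder, pass that subfolder as `directory` to `convert_teicher()` instead, since the converter expects the dataset files directly inside the directory it is given.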