  • Many proteins are secreted despite not having a

    2020-03-17

    Many proteins are secreted despite not having a signal peptide. To account for these unconventionally secreted proteins, we defined an “experimentally-derived” secretome consisting of proteins that had been detected within the extracellular environment in any one of the 35 secretome studies included in the Human Cancer Secretome Database (HCSD). We first retrieved the label-free proteomic data from HCSD and extracted a list of all proteins that had been detected in at least one of the studies. For the label-based studies, proteins were retrieved if they had been measured to decrease or increase in concentration in any of the studies, as both cases imply detection. These lists were combined and mapped to the set of genes present in TCGA RNA-Seq data, resulting in a secretome consisting of 6,543 genes.
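The combination step described above can be sketched as a set union followed by an intersection. This is only an illustration under stated assumptions: the input structures, the study names, the gene symbols, and the "increase"/"decrease"/"unchanged" labels are hypothetical and do not reflect the actual HCSD file format.

```python
def build_secretome(label_free, label_based, tcga_genes):
    """Combine detected proteins across studies and keep those present in TCGA.

    label_free:  dict mapping study name -> set of detected gene symbols
    label_based: dict mapping study name -> dict of gene symbol -> change label
    tcga_genes:  set of gene symbols present in the RNA-Seq data
    (All three structures are hypothetical stand-ins for the real inputs.)
    """
    detected = set()
    # Label-free studies: any detection in at least one study counts.
    for proteins in label_free.values():
        detected |= set(proteins)
    # Label-based studies: an increase or a decrease both imply detection.
    for changes in label_based.values():
        detected |= {gene for gene, change in changes.items()
                     if change in ("increase", "decrease")}
    # Map onto the genes that are present in the transcriptomic data.
    return detected & set(tcga_genes)


# Toy inputs, for illustration only.
label_free = {"study_A": {"ALB", "TIMP1"}, "study_B": {"FGF2"}}
label_based = {"study_C": {"IL1B": "increase", "GAPDH": "unchanged", "SPP1": "decrease"}}
tcga_genes = {"ALB", "FGF2", "IL1B", "SPP1", "GAPDH", "TP53"}

secretome = build_secretome(label_free, label_based, tcga_genes)
print(sorted(secretome))
```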
    Retrieval of human plasma proteome data
    Given the RNA-based nature of the analysis, we sought to enrich the results through the integration of protein-level data. We therefore retrieved a list of proteins that have been experimentally detected in plasma, as compiled by the Human Plasma Proteome Project (HPPP) (Schwenk et al., 2017). This protein evidence information was integrated with the consensus score results summarized in Figure 1D and Table S2.
    The human plasma proteome was retrieved from PeptideAtlas (Farrah et al., 2013) (http://www.peptideatlas.org/hupo/c-hppp/). Only entries with a neXtProt protein evidence (PE) level of 1 (evidence at the protein level) were considered. This yielded four sets of proteins with categories of “canonical,” “uncertain,” “redundant,” or “not observed” (see Tables S2 or S3 for category definitions). Non-unique protein entries were combined, where the category of greater evidence was used if multiple categories were assigned to the same entry. Genes in the present study that did not have a corresponding entry in the plasma proteome dataset were categorized as “NA.”
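A minimal sketch of the entry-merging rule described above follows. The text does not state how the four categories are ranked, so the ordering in `EVIDENCE_RANK` is an assumption made for illustration, as are the tuple layout of the input and the example gene symbols.

```python
# ASSUMPTION: this ranking is illustrative; the source only says that the
# "category of greater evidence" wins, without giving the order.
EVIDENCE_RANK = {"not observed": 0, "redundant": 1, "uncertain": 2, "canonical": 3}


def merge_plasma_categories(entries):
    """Collapse duplicate entries, keeping the highest-evidence category.

    entries: iterable of (gene, pe_level, category) tuples (hypothetical layout).
    Only entries with protein evidence (PE) level 1 are considered.
    """
    best = {}
    for gene, pe_level, category in entries:
        if pe_level != 1:
            continue
        current = best.get(gene)
        if current is None or EVIDENCE_RANK[category] > EVIDENCE_RANK[current]:
            best[gene] = category
    return best


def annotate_genes(genes, best):
    """Assign each study gene its plasma category, or "NA" if it has no entry."""
    return {gene: best.get(gene, "NA") for gene in genes}


# Toy inputs, for illustration only.
entries = [
    ("ALB", 1, "canonical"),
    ("ALB", 1, "redundant"),
    ("FGF2", 1, "not observed"),
    ("FGF2", 1, "uncertain"),
    ("XYZ1", 2, "canonical"),  # dropped: PE level is not 1
]
merged = merge_plasma_categories(entries)
annotated = annotate_genes(["ALB", "FGF2", "XYZ1", "TP53"], merged)
print(annotated)
```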
    Transcriptomic data retrieval
    RNA-Seq data (FPKM and raw gene counts) were retrieved from TCGA on May 4, 2017 using the TCGAbiolinks (Colaprico et al., 2016) package in R (Gentleman et al., 2004; R Development Core Team, 2018), for all 33 cancer types available at that time. One cancer type, acute myeloid leukemia (LAML), did not have any associated primary tumor RNA-Seq data, and was thus excluded from all analyses, resulting in a total of 32 cancer types. GTEx RNA-Seq data (V7, TPM and raw gene counts) were retrieved directly from the site (http://www.gtexportal.org/home/datasets) on October 18, 2017.
    Primary tumor and paired-normal transcriptomic (RNA-Seq) data were retrieved for 32 cancer types from TCGA, for a total of 9,760 primary tumor and 730 paired-normal samples, where both sample types were available for 697 patients. Healthy tissue RNA-Seq data were retrieved from the GTEx database, for a total of 11,688 samples spanning 714 donors and 30 tissue/organ types (or 53 sub-tissue types).
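One way to identify patients with both sample types is to parse TCGA sample barcodes, where the first three dash-separated fields identify the patient and the first two characters of the fourth field give the sample type (01 for primary solid tumor, 11 for solid tissue normal). The sketch below is not taken from the study's code, and the barcodes in it are made up for illustration.

```python
def paired_patients(barcodes):
    """Return patient IDs that have both a primary tumor and a normal sample."""
    tumor, normal = set(), set()
    for barcode in barcodes:
        fields = barcode.split("-")
        patient = "-".join(fields[:3])   # e.g. TCGA-AA-0001
        sample_type = fields[3][:2]      # two-digit sample type code
        if sample_type == "01":          # primary solid tumor
            tumor.add(patient)
        elif sample_type == "11":        # solid tissue normal
            normal.add(patient)
    return tumor & normal


# Hypothetical barcodes, for illustration only.
barcodes = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
    "TCGA-AB-0003-03A-11R-0000-07",  # blood-derived sample, neither 01 nor 11
]
pairs = paired_patients(barcodes)
print(sorted(pairs))
```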
    Mutation burden quantification
    Mutation annotation files (MAFs) derived from whole-exome sequencing data were retrieved for all available cancer types from TCGA using the TCGAbiolinks R package. The somatic mutation events (insertion, deletion, or single nucleotide polymorphism) for each primary tumor sample were summed to yield a total mutation burden for that sample.
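The per-sample count described above can be sketched as a tally over MAF rows. The example assumes the standard MAF columns `Variant_Type` (with values such as SNP, INS, and DEL) and `Tumor_Sample_Barcode`; the rows and sample names are toy data, and the study itself performed this step in R rather than Python.

```python
import csv
import io
from collections import Counter

COUNTED_TYPES = {"SNP", "INS", "DEL"}


def mutation_burden(maf_lines):
    """Count insertions, deletions, and SNPs per tumor sample.

    maf_lines: iterable of text lines from a tab-delimited MAF file.
    Lines starting with '#' (file-level comments) are skipped.
    """
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    burden = Counter()
    for row in csv.DictReader(data_lines, delimiter="\t"):
        if row["Variant_Type"] in COUNTED_TYPES:
            burden[row["Tumor_Sample_Barcode"]] += 1
    return burden


# Toy MAF content, for illustration only.
maf_text = "\n".join([
    "#version 2.4",
    "\t".join(["Hugo_Symbol", "Variant_Type", "Tumor_Sample_Barcode"]),
    "\t".join(["TP53", "SNP", "S1"]),
    "\t".join(["KRAS", "DEL", "S1"]),
    "\t".join(["EGFR", "DNP", "S1"]),  # not one of the counted types
    "\t".join(["PTEN", "INS", "S2"]),
])
burden = mutation_burden(io.StringIO(maf_text))
print(dict(burden))
```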
    Analysis of high-purity tumor samples
    Consensus purity estimate (CPE) scores for TCGA primary solid tumor samples were obtained from a previous study (Aran et al., 2015), which calculated and combined purity scores using four different methods (ESTIMATE (Yoshihara et al., 2013), ABSOLUTE (Carter et al., 2012), LUMP and IHC (Aran et al., 2015)). All tumor samples with a CPE of less than 80% (0.80), or those that did not have a score available, were removed from the high-purity analysis. Cancer types that did not have any scores available (CHOL, ESCA, PAAD, PCPG, STAD), or had 3 or fewer tumor–normal sample pairs after removing low-purity tumor samples (BLCA, HNSC) were also excluded.
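The filtering rules above can be sketched as follows. The input layout (a list of cancer type and tumor sample pairs, plus a dictionary of CPE scores) and all labels in the toy data are hypothetical. Only the 0.80 threshold and the "3 or fewer pairs" exclusion come from the text.

```python
def high_purity_samples(pairs, cpe, threshold=0.80, max_excluded_pairs=3):
    """Keep high-purity tumor samples and drop cancer types with too few pairs.

    pairs: list of (cancer_type, tumor_sample_id), one per tumor-normal pair
    cpe:   dict mapping tumor_sample_id -> purity score (missing if unavailable)
    """
    kept = {}
    for cancer_type, sample in pairs:
        score = cpe.get(sample)
        # Remove samples with no score or with a CPE below the threshold.
        if score is None or score < threshold:
            continue
        kept.setdefault(cancer_type, []).append(sample)
    # Exclude cancer types left with 3 or fewer pairs; types with no scores
    # at all never enter `kept` and are therefore excluded as well.
    return {c: s for c, s in kept.items() if len(s) > max_excluded_pairs}


# Toy inputs with made-up cancer type labels, for illustration only.
pairs = (
    [("AAA", f"a{i}") for i in range(4)]
    + [("BBB", f"b{i}") for i in range(4)]
    + [("CCC", f"c{i}") for i in range(5)]
)
cpe = {"a0": 0.80, "a1": 0.91, "a2": 0.85, "a3": 0.99,
       "b0": 0.95, "b1": 0.90, "b2": 0.88, "b3": 0.79}  # no scores for CCC

filtered = high_purity_samples(pairs, cpe)
print(filtered)
```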