This dataset comes from the UCI Machine Learning Repository. It is part of the RNA-Seq (HiSeq) PANCAN data set and is a random extraction of gene expression levels from patients with different tumor types: BRCA, KIRC, COAD, LUAD and PRAD.
The dataset is split into two files: data.csv (the gene expression values) and labels.csv (the tumor type of each sample). Both files are needed to analyse the dataset correctly.
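
Since the two files have to be used together, a minimal loading sketch with pandas might look like the following. It assumes that the first column of each file holds the sample identifier and that the first remaining column of labels.csv holds the tumor type. The function name `load_dataset` is hypothetical and not part of this repository.

```python
import pandas as pd


def load_dataset(data_path="data.csv", labels_path="labels.csv"):
    """Load features and labels and align them on the sample identifier."""
    # Assumption: the first column of each file is the sample identifier.
    X = pd.read_csv(data_path, index_col=0)
    labels = pd.read_csv(labels_path, index_col=0)

    # Keep only the samples that appear in both files, in one shared order.
    common = X.index.intersection(labels.index)
    X = X.loc[common]

    # Assumption: the first remaining column of labels.csv is the tumor type.
    y = labels.loc[common].iloc[:, 0]
    return X, y


# Example usage, once both files are downloaded next to this script:
# X, y = load_dataset()
# print(X.shape, y.value_counts())
```

Aligning on the identifier rather than on row position guards against the two files being ordered differently.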
Our goal is to preprocess this high-dimensional dataset with three dimensionality reduction methods (PCA, t-SNE and UMAP) and then apply different classification models to the reduced data.
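
As a rough illustration of the reduce-then-classify idea, the sketch below chains standardization, PCA and a classifier with scikit-learn. The choice of logistic regression, the number of components, the 75/25 split and the helper name `reduce_and_classify` are all assumptions made for this example, not the settings used in this project. The demo runs on synthetic data as a stand-in for the real expression matrix.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def reduce_and_classify(X, y, n_components=10, seed=0):
    """Fit scaler -> PCA -> classifier on a train split; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # The pipeline fits the scaler and PCA on the training split only,
    # so no information from the test split leaks into the projection.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        LogisticRegression(max_iter=1000),  # assumed classifier
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Synthetic stand-in: 200 samples, 200 features, 5 classes (not the real data).
X_demo, y_demo = make_blobs(
    n_samples=200, n_features=200, centers=5, random_state=0
)
accuracy = reduce_and_classify(X_demo, y_demo)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

The same pattern does not carry over directly to t-SNE: scikit-learn's `TSNE` only offers `fit_transform` and cannot project unseen samples, so it cannot be placed in a pipeline in this way.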
Dataset: https://archive.ics.uci.edu/dataset/401/gene+expression+cancer+rna+seq
Contributors: https://github.com/EmiljaB https://github.com/kleagjoshi https://github.com/sindiziu1