Colton Blackwell

Personal Portfolio and Projects




March 2024

scRNA-seq Analysis

This project implements the K-Means clustering algorithm and applies it to a single-cell RNA sequencing dataset of human pancreas tissue after reducing the data's dimensionality with PCA. Using both random and K-Means++ initialization, it tests cluster counts from k = 2 to k = 10 and selects the best configuration by silhouette coefficient. The resulting clusters are then plotted in two dimensions as scatter plots to aid interpretation.
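The PCA step described above can be sketched as follows. This is a minimal illustration, assuming a NumPy-only implementation via the singular value decomposition. The function name `pca_reduce`, the choice of 10 components, and the synthetic matrix are hypothetical stand-ins, not the project's actual code or the pancreas data.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    # Center each feature (gene) so components capture variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance explained by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]

# Synthetic stand-in for a cells x genes matrix (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X_reduced, explained = pca_reduce(X, n_components=10)
print(X_reduced.shape)
```

Clustering on the reduced matrix rather than the full expression matrix keeps the Euclidean distance computations cheap and discards low-variance directions that are mostly noise.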

Responsibilities

  • Implemented the K-Means clustering algorithm from scratch in Python, incorporating methods for centroid initialization, updating, and silhouette coefficient computation.
  • Processed and reduced the dimensionality of a single-cell RNA sequencing (scRNA-seq) dataset using PCA before applying the clustering algorithm.
  • Utilized K-Means++ initialization to improve clustering quality and evaluated clustering results using silhouette coefficients for different values of k.
  • Visualized clustering outcomes with scatter plots of the clusters at the best k found under random initialization and under K-Means++ initialization.
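The steps listed above can be sketched in a few dozen lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the project's actual code: the function names (`init_centroids`, `kmeans`, `silhouette_coefficient`), the iteration cap, the empty-cluster handling, and the synthetic three-blob data are all hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def init_centroids(X, k, rng, method="kmeans++"):
    if method == "random":
        # Random initialization: k distinct data points chosen uniformly.
        return X[rng.choice(len(X), size=k, replace=False)]
    # K-Means++: first centroid uniform; each later one is sampled with
    # probability proportional to squared distance to the nearest chosen centroid.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = pairwise_sq_dists(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, rng, method="kmeans++", max_iter=100):
    centroids = init_centroids(X, k, rng, method)
    for _ in range(max_iter):
        labels = pairwise_sq_dists(X, centroids).argmin(axis=1)
        # Update step: mean of assigned points (assumption: an empty cluster
        # simply keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute labels so they always match the returned centroids.
    return pairwise_sq_dists(X, centroids).argmin(axis=1), centroids

def silhouette_coefficient(X, labels):
    """Mean silhouette over all points; singleton-cluster points score 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.sqrt(pairwise_sq_dists(X, X))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] is 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic stand-in data: three well-separated 2-D blobs (NOT the pancreas data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])
true_labels = np.repeat(np.arange(3), 40)

results = {}
for method in ("random", "kmeans++"):
    scores = {k: silhouette_coefficient(X, kmeans(X, k, rng, method)[0])
              for k in range(2, 11)}
    best_k = max(scores, key=scores.get)
    results[method] = (best_k, scores[best_k])
    print(method, best_k, round(scores[best_k], 3))
```

Because a single run can land in a poor local optimum, especially with random initialization, a more careful comparison would repeat each (k, method) pair over several seeds and keep the best or the average silhouette.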

Skills Developed

  • Enhanced Python proficiency covering syntax, data structures, and NumPy for efficient numerical operations.
  • Acquired hands-on experience implementing K-Means clustering, focusing on centroid initialization, centroid updates, and Euclidean distance computation.
  • Developed data preprocessing skills for scRNA-seq datasets.
  • Learned to apply PCA for dimensionality reduction before clustering.
  • Used Matplotlib for visualization and learned to interpret silhouette coefficients.
  • Improved code documentation and report writing.

Technologies

  • Python
    • Matplotlib
    • NumPy
    • Scanpy
    • scikit-learn