Advanced Seaborn (Optional)

This mini-course on advanced Seaborn has two learning objectives: first, producing data visualizations with Seaborn, and second, applying the graphical techniques of exploratory data analysis (EDA) with Seaborn. It is a semi-hands-on project in which we work together on real-life questions about the Breast Cancer Wisconsin dataset using visualization toolkits.

By the end of this course, you will be able to generate publication-quality graphs using Seaborn and Python to analyze data.

Tumor Diagnosis Exploratory Data Analysis in Seaborn

The Breast Cancer Wisconsin (Diagnostic) dataset comes from the UCI Machine Learning Repository, where it can be downloaded. The data describe characteristics of the cell nuclei in a digitized image of a fine needle aspirate (FNA) of a breast mass. The attributes of the dataset are listed below.

Attribute Information:

Ten real-valued features are computed for each cell nucleus:

  1. radius (mean of distances from center to points on the perimeter)
  2. texture (standard deviation of gray-scale values)
  3. perimeter
  4. area
  5. smoothness (local variation in radius lengths)
  6. compactness (perimeter^2 / area - 1.0)
  7. concavity (severity of concave portions of the contour)
  8. concave points (number of concave portions of the contour)
  9. symmetry
  10. fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
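The field numbering above can be checked with a few lines of plain Python. This is only a sketch: the column names such as `radius_mean` are hypothetical labels chosen here, and the layout assumes that fields 1 and 2 hold the sample ID and the diagnosis label, which is what makes field 3 the first feature.

```python
# The ten base measurements, in the order listed above.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# Three summary statistics per measurement, in field order.
STATISTICS = ["mean", "se", "worst"]

# Assumption: fields 1 and 2 are the sample ID and the diagnosis,
# so the 30 features occupy fields 3 through 32.
FIRST_FEATURE_FIELD = 3


def field_names():
    """Map each 1-based field number to a hypothetical column name."""
    names = {}
    field = FIRST_FEATURE_FIELD
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names[field] = f"{feature}_{stat}"
            field += 1
    return names


FIELDS = field_names()
print(len(FIELDS))                        # number of feature fields
print(FIELDS[3], FIELDS[13], FIELDS[23])  # the three radius fields
```

Grouping by statistic first and by measurement second is what places the three radius columns exactly ten fields apart.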

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
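As a quick illustration of why the class distribution matters, the sketch below rebuilds just the label column from the counts stated above and draws it with Seaborn's `countplot`. The column name `diagnosis` and the codes `B` and `M` are assumptions made for this example; in the real analysis the labels would come from the downloaded file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Rebuild only the label column from the stated class counts.
# "diagnosis", "B" and "M" are assumed names for illustration.
labels = pd.DataFrame({"diagnosis": ["B"] * 357 + ["M"] * 212})

# Absolute counts and the share of each class.
counts = labels["diagnosis"].value_counts()
share = (counts / counts.sum()).round(3)
print(counts.to_dict())
print(share.to_dict())

# One bar per class, benign first.
ax = sns.countplot(data=labels, x="diagnosis", order=["B", "M"])
ax.set(xlabel="Diagnosis", ylabel="Number of samples")
```

The classes are moderately imbalanced, with roughly five benign samples for every three malignant ones, which is worth keeping in mind when comparing feature distributions between the two groups.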

In this hands-on exploratory data analysis in Seaborn, we gained insight into how to interpret the data statistically with countplot, violinplot, boxplot, pairplot, swarmplot, and heatmap. We also covered techniques for avoiding clutter in visualizations, and finally the storytelling technique for analyzing and clearly communicating features of the breast cancer tumor data in simple language. We hope you enjoy exploring data visualization further in the remaining weeks of this journey into Bayesian analysis.
