Understanding Your Data: The Essentials of Exploratory Data Analysis

kennedy keli - Aug 15 - Dev Community

What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA) is an approach to analyzing datasets that uses summary statistics and graphical tools to gain insight into the data. It can help you find anomalies such as outliers, identify patterns, understand possible relationships between variables, and generate hypotheses that can later be tested with statistical methods.

EDA is a crucial step for data scientists before making any data-based prediction. It contributes towards:

  1. Spotting obvious errors in the data sets
  2. Exposing trends, patterns, and relationships that are not immediately evident
  3. Ensuring that the results obtained are valid and applicable to the desired business outcomes
  4. Visualizing data through charts and graphs to present the underlying information accurately
  5. Getting reasonably accurate estimates of quantities such as standard deviations and confidence intervals, along with summaries of categorical variables
  6. Facilitating more sophisticated and accurate data analysis

Steps Involved in EDA

  1. Understanding the data
  2. Cleaning the data - This includes checking for missing/null values, duplicates, incorrect data types and outliers.
  3. Analyzing relationships between variables - This can be done with a correlation matrix, heat maps, box plots, pair plots, scatter plots and distribution plots.
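
The three steps above can be sketched in Pandas roughly as follows. This is a minimal illustration, not a fixed recipe: the toy dataset, its column names, and the choice of the 1.5 x IQR rule for flagging outliers are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 32, np.nan, 51, 120],
    "income": ["40000", "52000", "81000", "52000", "61000", "90000", "75000"],
    "city": ["Nairobi", "Mombasa", "Nairobi", "Mombasa", "Kisumu", "Nairobi", "Kisumu"],
})

# Step 1: understand the data (size, column types, a first look at the rows).
print(df.shape)
print(df.dtypes)
print(df.head())

# Step 2: clean the data.
missing_per_column = df.isna().sum()        # missing/null values per column
n_duplicates = int(df.duplicated().sum())   # fully duplicated rows

clean = df.drop_duplicates().copy()
clean["income"] = pd.to_numeric(clean["income"])  # income was stored as text
clean = clean.dropna(subset=["age"])              # drop rows with no age

# Flag outliers in "age" with the 1.5 * IQR rule (one common convention,
# chosen here as an assumption; other rules are equally valid).
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outlier_mask = (clean["age"] < q1 - 1.5 * iqr) | (clean["age"] > q3 + 1.5 * iqr)
outliers = clean[outlier_mask]

# Step 3: analyze relationships between the numeric variables.
corr = clean[["age", "income"]].corr()

print(missing_per_column)
print(n_duplicates)
print(outliers)
print(corr)
```

Whether a flagged outlier should be dropped, corrected, or kept depends on the dataset; the sketch only identifies candidates for review.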

Types of EDA

  1. Univariate Non-Graphical
  2. Univariate Graphical
  3. Multivariate Non-Graphical
  4. Multivariate Graphical
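
To make the distinction concrete, here is a small sketch of the two non-graphical types using Pandas. The data and column names are hypothetical and exist only for the example.

```python
import pandas as pd

# Hypothetical measurements, invented purely for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 70, 74, 83],
    "passed": ["no", "no", "yes", "yes", "yes", "yes"],
})

# Univariate non-graphical: summarize one variable at a time.
score_summary = df["exam_score"].describe()   # count, mean, std, quartiles
passed_counts = df["passed"].value_counts()   # frequency table for a category

# Multivariate non-graphical: summarize how variables relate to each other.
corr = df["hours_studied"].corr(df["exam_score"])            # Pearson correlation
mean_score_by_outcome = df.groupby("passed")["exam_score"].mean()

print(score_summary)
print(passed_counts)
print(corr)
print(mean_score_by_outcome)
```

The graphical types cover the same ground with plots instead of numbers: for example, a histogram or box plot for a single variable, and a scatter plot or pair plot for several variables at once.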

Tools Used in EDA

  1. Python - NumPy, Pandas, Matplotlib, Seaborn
  2. R - ggplot2

Exploratory Data Analysis is a crucial way to understand the data you will be working with and is highly recommended as part of a sound research methodology. EDA helps you explore, describe, summarize and visualize the variables collected for a project or study by applying simple summary techniques and graphical methods, without imposing prior assumptions on their interpretation.
