FEATURE ENGINEERING: THE ULTIMATE GUIDE

Samuel Kamuli - Aug 18 - Dev Community

INTRODUCTION
Feature engineering is a preprocessing step in supervised machine learning and statistical modeling that transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features.
A feature, also known as a variable or attribute, is an individual measurable property or characteristic of a data point that is used as input to a machine learning algorithm.
The core purpose of feature engineering is to improve machine learning model performance by transforming existing features and selecting the most relevant ones.

FEATURE ENGINEERING
The process of feature engineering involves feature creation, transformation, extraction, selection, exploratory data analysis, and finally benchmarking. Each of these stages is geared towards engineering variables that make a machine learning model as accurate as possible. Below is an in-depth look at each stage:

  1. Feature Creation: Creating features involves identifying the variables that will be most useful to the predictive model. This is a subjective process that requires human judgment and creativity. Existing features are combined via addition, subtraction, multiplication, and ratios to derive new features with greater predictive power (see the first sketch after this list).

  2. Transformation: Transformation involves manipulating the predictor (independent) variables to improve model performance: making the model flexible in the variety of data it can ingest, putting variables on the same scale so the model is easier to interpret and more accurate, and avoiding computational errors by keeping all features within a range the model can handle (see the scaling sketch after this list).

  3. Feature Extraction: Feature extraction is the automatic creation of new variables by extracting them from raw data. The purpose of this step is to reduce the volume of data to a more manageable set for modeling. Feature extraction methods include cluster analysis, text analytics, edge detection algorithms, and principal component analysis (PCA; see the sketch after this list).

  4. Feature Selection: Feature selection algorithms analyze and rank candidate features to determine which are irrelevant and should be removed, which are redundant and should be removed, and which are most useful for the model and should be prioritized (see the selection sketch after this list).

  5. Exploratory Data Analysis: Exploratory data analysis (EDA) is a simple but powerful way to improve your understanding of your data by exploring its properties. It is typically applied to generate new hypotheses or find patterns, often on large amounts of qualitative or quantitative data that have not been analyzed before (see the EDA sketch after this list).

  6. Benchmark: A benchmark model is a simple, dependable, transparent, and interpretable model against which you measure your own. It is good practice to evaluate your new model on a held-out test set and check that it outperforms a recognized benchmark. Benchmarks are also used to compare different machine learning models, such as neural networks and support vector machines, linear and non-linear classifiers, or different approaches like bagging and boosting (see the baseline sketch after this list).
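
To make these stages concrete, the sketches below use Python. First, feature creation: a minimal pandas example that derives new columns by combining existing ones. The column names (price, quantity, income, debt) are hypothetical.

```python
import pandas as pd

# Hypothetical raw features.
df = pd.DataFrame({
    "price": [10.0, 20.0, 15.0],
    "quantity": [3, 1, 4],
    "income": [50_000, 82_000, 61_000],
    "debt": [12_000, 40_000, 5_000],
})

# Derived features built from existing ones.
df["revenue"] = df["price"] * df["quantity"]      # multiplication
df["debt_to_income"] = df["debt"] / df["income"]  # ratio
df["disposable"] = df["income"] - df["debt"]      # subtraction
print(df)
```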
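
For transformation, a sketch of two common operations, assuming scikit-learn is available: a log transform to compress heavy-tailed values, followed by standardization so both features end up on the same scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 40_000.0],
              [3.0, 1_500.0]])

X_log = np.log1p(X)                               # compress skewed values
X_scaled = StandardScaler().fit_transform(X_log)  # zero mean, unit variance
print(X_scaled)
```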
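
For feature extraction, a sketch using principal component analysis to compress a hypothetical 10-feature matrix into 3 extracted components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples, 10 raw features

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)      # 3 extracted features per sample
print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # variance captured by each component
```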
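
For feature selection, a sketch of a simple filter-style method: score each feature with a univariate ANOVA F-test and keep only the top k. The built-in iris dataset is used purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # per-feature relevance scores
print(X_selected.shape)   # only the 2 highest-scoring features remain
```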
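
For exploratory data analysis, a sketch of three typical first-pass checks with pandas, run here on a hypothetical toy DataFrame: summary statistics, missing-value counts, and pairwise correlations.

```python
import pandas as pd

# Hypothetical toy data; in practice this comes from your dataset.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 52],
    "income": [40_000, 60_000, 82_000, 55_000, 91_000],
    "churned": [0, 0, 1, 0, 1],
})

print(df.describe())               # central tendency and spread
print(df.isna().sum())             # missing values per column
print(df.corr(numeric_only=True))  # pairwise linear correlations
```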
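
Finally, for benchmarking, a sketch that compares a candidate model against a deliberately simple, transparent baseline (scikit-learn's DummyClassifier) on held-out test data. If the model cannot beat the baseline, the engineered features are not adding value.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Trivial baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("baseline accuracy:", baseline.score(X_te, y_te))
print("model accuracy:   ", model.score(X_te, y_te))
```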

Some of the best-known tools that automate parts of the feature engineering process include FeatureTools, AutoFeat, TsFresh, OneBM, and ExploreKit.
