Jul 28, 1:30pm
EDA Tips & Tricks: From Feature Engineering to Model Training (HYBRID MEETUP)
By Data Science Connect
Feature engineering is the process of selecting and building the features used to train your ML models. Your base data must be turned into features and attributes before it is useful in predictive modeling. Proper feature selection ultimately determines your model’s predictive accuracy, resilience, and output scale. Building the features needed to train and operate your ML models can take 70-80% of the overall time and effort. Exploratory data analysis (EDA) is one of the most challenging and important steps in this process. If you don’t have a deep understanding of your data, you can waste resources creating sub-optimal features, which are often the biggest cause of low-quality models.
This session will dig into the key activities in EDA and discuss new ideas and tools for improving the process and its outcomes. We will also hear from industry leader SymphonyAI about its strategy for EDA and automated feature generation.
Attendees will learn why extensive EDA is important for meaningful feature creation.
We will explain in detail ways to perform an extensive data assessment by looking at the following:
- Descriptive Statistics
  - Measures of Counts (fill rate, missing count, nonzero count)
  - Measures of Central Tendency (mean, median, mode, mode rows, mode percentage)
  - Measures of Cardinality (unique value count & IDness)
  - Measures of Percentiles
  - Measures of Dispersion (variance, std-dev, CoV, IQR, range)
  - Measures of Shape
- Data Quality Check
  - Null detection, IDness detection, Biasedness detection, Invalid entries detection, Outlier detection, etc.
- Attribute Associations
  - Correlation Analysis, Information Value & Information Gain analysis, and Variable Clustering
- Data Drift and Data Stability Analysis
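To make the descriptive-statistics items above more concrete, here is a minimal sketch of how they might be computed for a single numeric column with pandas. It is an illustration only, not the speakers' tooling. The function names `profile_numeric` and `iqr_outlier_mask` are hypothetical, IDness is assumed to mean unique values divided by non-null rows, and the 1.5 × IQR rule is just one common choice for flagging outliers.

```python
import numpy as np
import pandas as pd


def profile_numeric(series: pd.Series) -> dict:
    """Summarize one numeric column (hypothetical helper, not a library API)."""
    total = len(series)
    values = series.dropna()
    n = len(values)
    if n == 0:
        return {"fill_rate": 0.0, "missing_count": total}

    mode_value = values.mode().iloc[0]        # smallest value if there are ties
    mode_rows = int((values == mode_value).sum())
    q1, q3 = values.quantile([0.25, 0.75])
    mean = values.mean()
    std = values.std()                        # sample std-dev (ddof=1)

    return {
        # Measures of counts
        "fill_rate": n / total,
        "missing_count": total - n,
        "nonzero_count": int((values != 0).sum()),
        # Measures of central tendency
        "mean": mean,
        "median": values.median(),
        "mode": mode_value,
        "mode_rows": mode_rows,
        "mode_pct": mode_rows / n,
        # Measures of cardinality (IDness assumed = unique / non-null rows)
        "unique_count": int(values.nunique()),
        "idness": values.nunique() / n,
        # Measures of percentiles
        "p25": q1,
        "p75": q3,
        # Measures of dispersion
        "variance": values.var(),
        "std_dev": std,
        "cov": std / mean if mean != 0 else np.nan,
        "iqr": q3 - q1,
        "range": values.max() - values.min(),
        # Measures of shape
        "skewness": values.skew(),
        "kurtosis": values.kurt(),
    }


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values are not flagged."""
    q1, q3 = series.dropna().quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


if __name__ == "__main__":
    # Toy column with one missing value and one extreme value.
    column = pd.Series([1, 2, 2, 0, np.nan, 10], name="example")
    for measure, value in profile_numeric(column).items():
        print(f"{measure}: {value}")
    print("outlier rows:", int(iqr_outlier_mask(column).sum()))
```

Running a profile like this on every column before building features is one way to spot sparse, constant, or ID-like attributes early. The same idea extends to categorical columns by dropping the dispersion and shape measures.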
Speakers:
Neha Makhija
Sr Product Manager - Enterprise AI - Machine Learning, SymphonyAI
Anindya Datta, PhD
Founder & CEO, Mobilewalla