What exactly is Feature Engineering?

The goal is to turn data into information, and information into insight.

– Carly Fiorina, former CEO of Hewlett-Packard

Feature Engineering (FE) combines data analysis techniques with domain knowledge to distill data into information and actionable insights. Interpreting data in an unconventional way can prove to be the secret sauce for developing data-driven technology.

Ideally, one should be able to describe a process using First Principle Models (FPM). In practice, however, processes are complex and cannot be wholly described by FPM alone. Analytical models built from operating data measured around the process, combined with synthesized data from FPM, can be used to describe the process more completely. That is why it is important to define Process Metrics around the assets of interest when developing realistic insights.

The next question is: what are Process Metrics?

Process Metrics (PM) are characteristics of the processing ‘micro-environment’ which need to be maintained nearly constant across all scales to achieve target product quality and process performance.
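To make the idea of a metric held nearly constant across scales concrete, here is a minimal sketch for a stirred vessel. The choice of metric (power per unit volume), the assumption of geometrically similar vessels with a constant power number, and all function names and numbers are illustrative assumptions, not something this post prescribes.

```python
import math


def impeller_reynolds(density, speed_rps, impeller_diameter, viscosity):
    """Impeller Reynolds number, Re = rho * N * D^2 / mu (dimensionless)."""
    return density * speed_rps * impeller_diameter ** 2 / viscosity


def power_per_volume(power_number, density, speed_rps, impeller_diameter, volume):
    """Power input per unit volume, P/V = Np * rho * N^3 * D^5 / V, in W/m^3."""
    power = power_number * density * speed_rps ** 3 * impeller_diameter ** 5
    return power / volume


def speed_for_constant_power_per_volume(lab_speed_rps, lab_diameter, plant_diameter):
    """Plant impeller speed that keeps P/V equal to the lab value.

    Assumes geometric similarity (V proportional to D^3) and a constant
    power number, so P/V is proportional to N^3 * D^2.
    """
    return lab_speed_rps * (lab_diameter / plant_diameter) ** (2.0 / 3.0)


# Hypothetical numbers: water-like fluid, 1 L lab vessel, 1000 L plant vessel.
POWER_NUMBER = 5.0   # illustrative placeholder value
DENSITY = 1000.0     # kg/m^3
VISCOSITY = 1e-3     # Pa*s

lab_speed = 5.0      # revolutions per second
plant_speed = speed_for_constant_power_per_volume(lab_speed, 0.05, 0.5)

lab_metric = power_per_volume(POWER_NUMBER, DENSITY, lab_speed, 0.05, 1e-3)
plant_metric = power_per_volume(POWER_NUMBER, DENSITY, plant_speed, 0.5, 1.0)

print("Lab P/V (W/m^3):", lab_metric)
print("Plant P/V (W/m^3):", plant_metric)
print("Lab Re:", impeller_reynolds(DENSITY, lab_speed, 0.05, VISCOSITY))
print("Plant Re:", impeller_reynolds(DENSITY, plant_speed, 0.5, VISCOSITY))
```

In this sketch the engineered feature (P/V) stays the same at both scales while the raw variables (speed, diameter, volume) and the Reynolds number all change. Which metric is the right one to hold constant depends on the process and has to come from domain knowledge.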

A realistic use case for FE in the pharmaceutical industry is the development of commercial-scale technology, progressing from lab to pilot to plant. Scale-up factors for “Assets” from lab to plant level can be defined using FPM and the data generated during lab and pilot operations.

The conventional way of scaling up a process is laborious and time-consuming because process and quality attributes have to be studied and analyzed at every scale, from lab to plant, to ensure that target results are achieved. Feature Engineering of Process Metrics, on the other hand, enables the development of scale-independent metrics and models while meeting target results.

For example, a typical Computational Fluid Dynamics (CFD) analysis of a process can take anywhere from a few hours (lab scale) to a few weeks (plant scale), directly increasing the execution time of a project.

Converting the CFD data of lab assets (the Process Metrics in this case) into data-driven models leads to faster execution than running the actual simulation, saving time and resources.

This can be a game changer, significantly reducing “time to market” for a new technology.
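The idea of replacing repeated simulation runs with a data-driven model can be sketched as follows. This is only an illustration under stated assumptions: `placeholder_simulation` is a hypothetical stand-in for an expensive CFD run, its output is synthetic rather than real CFD data, and the power-law model form is assumed for simplicity. Real CFD responses usually need richer models and careful validation.

```python
import math


def placeholder_simulation(speed_rps):
    """Hypothetical stand-in for an expensive CFD run (synthetic response)."""
    return 12.0 * speed_rps ** -0.8


def fit_power_law(inputs, outputs):
    """Least-squares fit of y = c * x^k, done as a line fit in log-log space."""
    log_x = [math.log(x) for x in inputs]
    log_y = [math.log(y) for y in outputs]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    exponent = sxy / sxx
    coefficient = math.exp(mean_y - exponent * mean_x)
    return coefficient, exponent


def make_surrogate(coefficient, exponent):
    """Return a cheap function that mimics the simulation it was fitted to."""
    def surrogate(x):
        return coefficient * x ** exponent
    return surrogate


# Run the "expensive" simulation only at a handful of design points ...
train_speeds = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
train_outputs = [placeholder_simulation(s) for s in train_speeds]

# ... then answer new queries from the fitted model instead of re-simulating.
coefficient, exponent = fit_power_law(train_speeds, train_outputs)
surrogate = make_surrogate(coefficient, exponent)

print("Surrogate prediction at 5 rps:", surrogate(5.0))
print("Simulation value at 5 rps:", placeholder_simulation(5.0))
```

The time saving comes from the last step: once fitted, the surrogate is evaluated in microseconds, whereas each new simulation run would cost hours or more. The trade-off is that the surrogate is only trustworthy inside the range of conditions it was trained on.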

Stay tuned for more blogs in this series, as we explore the data science approach to process scale-up in the pharmaceutical industry!

– Prerak Dongaonkar