Table of Contents

Predictors and FeatureUnion in ML

Predictors & FeatureUnion in ML

Scikit-learn refers to machine learning algorithms as estimators. There are three different types of estimators: classifiers, regressors, and transformers. The inheritance of the second class determines what kind of estimator the model represents. We’ll divide the estimators into two groups based on their interface. These two groups are predictors and transformers, and in this blog, we’ll discuss Predictors.  

Predictors: classifiers and regressors

As the name suggests, predictors are models that make predictions. There are two main methods: Fit (X, y): trains/fit the object to the feature matrix X and label vector y and Predict (X): makes predictions on the passed data set X.

Code :

					from sklearn.linear_model import LinearRegression 

# create model and train/fit
model = LinearRegression(), y) 

# predict label values on X 
y_pred = model.predict(X) 
print("shape of the  prediction array: {}".format(y_pred.shape)) 
print("shape of the training set: {}".format(X.shape)) 

There is one thing to note that the output of predict(X) is a NumPy array of one dimension. 


Before discussing feature unions, let’s first have a brief knowledge about Pipelines. As our analysis and workflow become more complicated, we need a tool that helps with scaling up. For example, suppose you need to apply multiple transformations to your data before using a supervised machine learning model. In that case, you can apply the transformations explicitly, creating intermediate variables of the transformed data. But this work is a little bit tiresome. 

Here, the Pipelines approach helps prevent keeping track of intermediate transformations and scales up our code for more complicated analysis. Essentially, a pipeline is an estimator object that holds a series of transformers with a final estimator. 

Code Example: 

					from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

# construct pipeline 
scaler = StandardScaler() 
poly_features = PolynomialFeatures(degree=2) 
lin_reg = LinearRegression() 

pipe = Pipeline([ 
  ('scaler', scaler), 
  ('poly', poly_features), 
  ('regressor', lin_reg) 

Feature Union

A FeatureUnion is another tool for dealing with situations where your data requires different transformation processes for various features. Because in ColumnTransformer, processes feature separately and combine the results into a single feature matrix, but in feature union, it can handle more complex workflows where you need to use distinct transformers and estimators together before passing the complete feature matrix to a final estimator. 

We already know that Pipeline objects arrange estimators in a series, but FeatureUnion objects arrange transformers in parallel. A FeatureUnion object combines the output of each of the transformers in parallel to generate one output matrix. Using a combination of Pipeline and FeatureUnion objects, we can construct complicated machine learning workflows within a single scikit-learn estimator object. So, the ultimate aim of feature union is to combine several feature extraction mechanisms into a single transformer. 


					from sklearn.pipeline import FeatureUnion 
from sklearn.decomposition import PCA, TruncatedSVD 

union = FeatureUnion([("pca", PCA(n_components=1)), 
           ("svd", TruncatedSVD(n_components=2))]) 

X = [[0., 1., 3], [2., 2., 5]] 
array([[ 1.5   ,  3.0...,  0.8...], 
      [-1.5    ,  5.7..., -0.4...]]) 

Liked what you read !

Please leave a Feedback

Laisser un commentaire

Votre adresse e-mail ne sera pas publiée. Les champs obligatoires sont indiqués avec *

Join the sustainability movement

Is your carbon footprint leaving a heavy mark? Learn how to lighten it! ➡️

Register Now

Calculate Your DataOps ROI with Ease!

Simplify your decision-making process with the DataOps ROI Calculator, optimize your data management and analytics capabilities.

Calculator ROI Now!

Related articles you may would like to read

The Transformative Power of Artificial Intelligence in Healthcare
How To Setup An AI Center of Excellence (COE) With Use Cases And Process 

Know the specific resource requirement for completing a specific project with us.


Keep yourself updated with the latest updates about Cloud technology, our latest offerings, security trends and much more.


Gain insights into latest aspects of cloud productivity, security, advanced technologies and more via our Virtual events.

ISmile Technologies delivers business-specific Cloud Solutions and Managed IT Services across all major platforms maximizing your competitive advantage at an unparalleled value.

Request a Consultation