#7 Data Engineering — TRANSFORM DATA (ETL Pipelines)

Sakshi Agarwal
2 min read · Oct 28, 2021

This is the seventh post in my series on Data Engineering. I have been writing up the important things I learn as part of Udacity's Data Scientist Nanodegree Program. I have found it is the best way to test my understanding of the course material and to maintain the discipline to study. Please check out the other posts as well.

If you are here after following my previous posts, then congratulations! Now that we have learnt how to extract data, we will try our hand at transforming it. This post is an introduction to data transformation.

What is Data Transformation?

Data transformation usually means getting the data ready for a machine learning algorithm. For example, we might be working with data that comes from multiple sources, and to run any algorithm on it we first need to bring it into a common format. We also transform data to clean it and to engineer new features. In each case, we transform the existing dataset to create a new one.
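To make the "common format" idea concrete, here is a minimal sketch in plain Python. The two sources, their field names, and the values are all made up for illustration; the point is only that each source gets its own small function that maps its rows onto one shared schema.

```python
# Hypothetical rows from two sources that describe the same kind of record
# but use different field names and value formats (illustrative data only).
source_a = [{"country": "India", "gdp_usd": "2,650,000"}]
source_b = [{"Country Name": "brazil ", "GDP": 2060000.0}]


def from_source_a(row):
    """Map a source A row onto the shared schema: country (str), gdp (float)."""
    return {
        "country": row["country"].strip().title(),
        # Source A stores numbers as strings with thousands separators.
        "gdp": float(row["gdp_usd"].replace(",", "")),
    }


def from_source_b(row):
    """Map a source B row onto the same shared schema."""
    return {
        # Source B has inconsistent casing and stray whitespace.
        "country": row["Country Name"].strip().title(),
        "gdp": float(row["GDP"]),
    }


# Once every row follows the same schema, the sources can be combined.
combined = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
print(combined)
```

After this step every record has the same keys and types, so any later cleaning or modelling code only has to understand one format.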

In this series, we will work with the following things:

  1. Combining data
  2. Cleaning data
  3. Working with encodings
  4. Removing duplicate rows
  5. Dummy variables
  6. Removing outliers
  7. Normalizing data
  8. Engineering new features
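As a small preview of three items from this list (removing duplicate rows, dummy variables, and normalizing data), here is a rough sketch using pandas. The DataFrame and its column names are invented for illustration, and min-max scaling is just one possible way to normalize; later posts may well use different data and methods.

```python
import pandas as pd

# Illustrative data only: the second row duplicates the first.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Mumbai"],
    "segment": ["a", "a", "b", "a"],
    "sales": [10.0, 10.0, 20.0, 30.0],
})

# Removing duplicate rows: keep the first occurrence of each identical row.
deduped = df.drop_duplicates().reset_index(drop=True)

# Dummy variables: turn the categorical "segment" column into indicator columns.
encoded = pd.get_dummies(deduped, columns=["segment"])

# Normalizing data: min-max scale "sales" into the range [0, 1].
sales = encoded["sales"]
encoded["sales_scaled"] = (sales - sales.min()) / (sales.max() - sales.min())

print(encoded)
```

Each step returns a new DataFrame or column instead of overwriting the raw input, which matches the idea above of transforming a dataset to create a new one.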

These terms might look overwhelming, but stick with me and we will learn them all together. Preparing data is a very important part of a BI Analyst’s life. Analysts usually spend anywhere between 50% and 90% of their time on it.

Conclusion

From here on, we will cover each of the transformation techniques in turn. Please refer to the previous posts if you are new to this and want to learn about data extraction first. In the next post, we will cover the techniques used to combine data.

Feel free to write to me at agarwalsak10@gmail.com if you have any questions or suggestions.

Written by Sakshi Agarwal

Computer Science Engineering graduate. Specialisation-Python Programming, Javascript, React, three.js