#3 Data Engineering — EXTRACT DATA from CSV Files

Sakshi Agarwal
7 min read · Mar 6, 2024

This is the third post in my series on Data Engineering. I have been writing up the important things I learn in Udacity's Data Scientist Nanodegree Program. I have found it is the best way to test my understanding of the course material and to stay disciplined about studying. Please check out the other posts as well.

In this article, we focus on the first stage of the ETL (extract, transform, load) pipeline: extracting the data. Extraction means pulling data out of its various sources. As data engineers, we may work with any kind of data source, such as CSV, JSON, XML, or web pages. In this post, we will extract data from CSV files. The GitHub link for the code is here.
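To make the idea of extraction concrete, here is a minimal sketch of reading a CSV file into Python using only the standard library. The function name `extract_csv`, the `skip_rows` parameter, and the sample file are my own illustrative assumptions, not the code from the course or the repository; the sample rows are made up so that the snippet runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def extract_csv(path, skip_rows=0):
    """Read a CSV file into a list of dicts, one dict per data row.

    skip_rows (a hypothetical convenience) drops leading lines that some
    exports place above the real header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        for _ in range(skip_rows):
            next(handle, None)  # discard one leading line, if present
        # DictReader treats the first remaining line as the header
        return list(csv.DictReader(handle))


# Made-up sample file so the sketch is self-contained; not real World Bank data.
sample = "country,year,value\nAland,2000,10\nBorduria,2000,20\n"
sample_path = Path(tempfile.mkdtemp()) / "sample.csv"
sample_path.write_text(sample, encoding="utf-8")

rows = extract_csv(sample_path)
print(rows[0]["country"], len(rows))
```

Every value comes back as a string here, so converting types would be a job for the transform stage. For larger files a library such as pandas is probably the more convenient choice.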

I use data from the World Bank. The data comes from two sources:

  1. World Bank Indicator Data — This data contains socio-economic indicators for countries around the world. A few example indicators include population, arable land, and central government debt.
  2. World Bank Project Data — This data set contains information about World Bank project lending since 1947.

Both data sets are available in several formats, including CSV, JSON, and XML. You can download the CSV directly or you can use the World Bank APIs to extract data from the World Bank’s…
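As a rough illustration of the API route mentioned above, the sketch below builds a request URL for one indicator and fetches it as JSON. It assumes the World Bank v2 REST layout (`/v2/country/<code>/indicator/<indicator>`) and uses `SP.POP.TOTL`, the total population indicator, as an example. The function names and the `per_page` default are my own choices, and the handling of the response shape is an assumption worth checking against the official API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the World Bank v2 REST API.
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, start_year, end_year, per_page=1000):
    """Build the request URL for one indicator over a range of years."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}", "per_page": per_page}
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def fetch_indicator(country, indicator, start_year, end_year):
    """Fetch the records for one indicator (needs network access)."""
    url = build_indicator_url(country, indicator, start_year, end_year)
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumption: a successful response is a two-element list,
    # [paging metadata, list of records]; anything else yields no records.
    if isinstance(payload, list) and len(payload) > 1 and payload[1]:
        return payload[1]
    return []


url = build_indicator_url("br", "SP.POP.TOTL", 2000, 2010)
print(url)
```

Calling `fetch_indicator("br", "SP.POP.TOTL", 2000, 2010)` would then return a list of records for Brazil, provided the endpoint behaves as assumed.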

