Introduction to Data Engineering: Setting up Python for ETL

Hiswill Thompson - Nov 6 - Dev Community


Hey there!

If you missed my other articles, "Introduction to Data Engineering" and "Understanding ETL Pipelines", I recommend checking them out on my profile. As an aspiring data engineer, you will find them handy in your data engineering journey.

In this article, we will talk about how to set up Python for ETL. Before that, let's have an overview of what data engineering and the ETL process are all about.

Data engineering is the process of building, maintaining and optimizing data systems. It involves gathering information from various sources (APIs, flat files, websites, etc.), processing and streamlining it into useful, meaningful information, and then making it available to users such as data scientists or data analysts.

The ETL process, on the other hand, is an integral part of data engineering. It stands for Extract, Transform and Load. The EXTRACT stage is basically "fetching" data from various sources or platforms. The TRANSFORM stage is the most intensive part of ETL: the gathered data is cleaned and streamlined into useful information. That information is then stored in data repositories where users can access it; this is the LOAD stage.
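To make the three stages concrete, here is a minimal sketch using only Python's standard library. The sample CSV data, the `orders` table and the function names `extract`, `transform` and `load` are hypothetical, chosen for illustration; a real pipeline would read from actual files, APIs or websites and load into a proper data repository rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a flat file or API response.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,
3,carol,89.99
"""


def extract(raw_text):
    """EXTRACT: read rows from a CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """TRANSFORM: drop rows with missing amounts, normalize names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """LOAD: store the cleaned records in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load(transform(extract(RAW_CSV)), conn)
    for row in conn.execute("SELECT * FROM orders ORDER BY order_id"):
        print(row)
```

Running the script prints the rows for Alice and Carol; Bob's row is dropped during the transform step because its amount is missing.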

Now that we have an idea of what data engineering and ETL are, let's look at Python and its significance in the ETL process before we delve into setting it up. That's fair enough, isn't it?

Python is one of the most popular programming languages. It is an open-source, high-level, object-oriented programming language.
It is simple, readable and easy to learn, which is one reason many IT professionals choose it over other programming languages.
Python has versatile and powerful features that help data engineers in the ETL process. It has various libraries suited to data engineering needs.
Examples include Pandas, NumPy, Apache Airflow, scikit-learn, Beautiful Soup, etc.

These libraries play a big role in the ETL process. Their uses range from collecting data from various sources to streamlining data, merging datasets, classifying data, etc.
For instance, Pandas helps in extracting, processing and even loading datasets, PySpark helps in working with large datasets, and SQLAlchemy, with its flexibility, helps with database interaction.
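As a rough illustration of how Pandas can cover all three stages, the sketch below assumes Pandas is installed (for example with `python -m pip install pandas`) and uses hypothetical order and customer data. It extracts two CSV sources, merges them, aggregates per customer, and loads the result into an in-memory SQLite database. PySpark is not shown here; for SQLAlchemy, `DataFrame.to_sql` also accepts an SQLAlchemy engine in place of the plain `sqlite3` connection used below.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample sources; in practice these could be files, API responses, etc.
ORDERS_CSV = """order_id,customer_id,amount
1,10,120.50
2,11,
3,10,89.99
4,11,40.00
"""
CUSTOMERS_CSV = """customer_id,name
10,Alice
11,Bob
"""


def run_pipeline(conn):
    # Extract: read both sources into DataFrames.
    orders = pd.read_csv(io.StringIO(ORDERS_CSV))
    customers = pd.read_csv(io.StringIO(CUSTOMERS_CSV))

    # Transform: drop incomplete rows, merge the datasets, total the amounts per customer.
    orders = orders.dropna(subset=["amount"])
    merged = orders.merge(customers, on="customer_id", how="left")
    totals = merged.groupby("name", as_index=False)["amount"].sum()

    # Load: write the result to a SQL table (pandas accepts a sqlite3 connection here).
    totals.to_sql("customer_totals", conn, if_exists="replace", index=False)
    return totals


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn))
    print(conn.execute("SELECT * FROM customer_totals").fetchall())
```

The same extract, transform and load functions would stay largely unchanged if the sources were real files or the target were a production database; only the read and write calls would point somewhere else.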

With this, let's delve into how to set up Python on your operating system so you can use it for your ETL operations.
Below is the Python installation guide:

1. Open your favorite browser and search "Python download".
2. The official Python website, python.org, will be displayed.
3. Download the version of your choice, preferably the latest version.
4. Choose the installer for your operating system (the OS options will be displayed).
5. Click Download.
6. Run the installer; you can choose to customize the installation.
7. On Windows, tick the two boxes displayed at the bottom of the installer window: "Use admin privileges when installing py.exe" and "Add python.exe to PATH".
8. If you chose to customize, the Optional Features screen will be displayed.
9. Click Next.
10. The Advanced Options screen will show, including the installation location.
11. Click Install and wait for the installation to complete successfully.
12. Click Close. Python is now set up on your machine.
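After installation, a quick way to confirm the setup is a small script like the one below. It is a sketch under a few assumptions: the minimum version of 3.8 is an arbitrary example threshold, and the package list simply mirrors libraries named in this article (Apache Airflow is left out because its installation usually needs extra steps). Save it under any name, for example the hypothetical `check_setup.py`, and run it with `python check_setup.py`; on Windows, `py check_setup.py` should also work if the py launcher was installed.

```python
import importlib.util
import sys

# Import name -> pip package name for libraries mentioned in this article.
# Some differ: Beautiful Soup is imported as "bs4", scikit-learn as "sklearn".
ETL_LIBRARIES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sqlalchemy": "sqlalchemy",
    "pyspark": "pyspark",
    "bs4": "beautifulsoup4",
    "sklearn": "scikit-learn",
}


def check_setup(min_version=(3, 8)):
    """Return (python_ok, missing_packages) for the current interpreter.

    min_version is an assumed example threshold, not an official requirement.
    """
    python_ok = sys.version_info >= min_version
    missing = [
        package
        for module, package in ETL_LIBRARIES.items()
        if importlib.util.find_spec(module) is None  # None means not importable
    ]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_setup()
    print(f"Python {sys.version.split()[0]} ({'OK' if ok else 'too old'})")
    if missing:
        print("Install missing packages with:")
        print("  python -m pip install " + " ".join(missing))
    else:
        print("All listed ETL libraries are installed.")
```

Checking with `importlib.util.find_spec` only looks up whether each module can be found, so the script runs quickly and does not fail when a library is missing.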

In conclusion, Python plays an important role in data engineering and will continue to be valuable for data engineering tasks like ETL. It is worth learning and making it part of your data engineering journey.
Recommended reading:
https://www.astera.com/type/blog/etl-using-python/

https://medium.com/@godswillthompson16/understanding-etl-pipelines-extract-transform-load-in-data-engineering-814472d71646?source=user_profile_page---------1-------------d1624a597f9d---------------
