Python 101: Introduction to Python as a Data Analytics Tool

michael kibia - Oct 13 - Dev Community

In this article, we will explore some of the core concepts you need to know about Python for data analytics. No matter what you're doing, whether managing memory, performing numerical computations, or working with data structures, Python has a lot of powerful features that make your work easier. Let's look at five important questions that every data analyst should be able to answer when working with Python.

1. What is Garbage Collection in Python and Why is it Important?
Python has a garbage collection process, so you don't have to write code to manage memory yourself. When a Python program runs, memory is allocated to store variables and objects. Once those objects are no longer needed, that memory has to be released; otherwise memory leaks can bog down your system or cause it to crash.

Python manages memory using reference counting and a cyclic garbage collector. Each object has a reference count, which tracks how many references point to it. When the count hits zero, the object is deleted and its memory is freed. Reference counting alone cannot handle objects that point to each other (reference cycles), so Python's cyclic garbage collector finds and cleans those up for us. This automatic memory management frees programmers from having to worry about allocating and freeing memory.
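
The two mechanisms can be observed with a small sketch. It assumes CPython (other interpreters may free objects at different times), and the Node class and variable names are hypothetical helpers made up for this illustration. Weak references are used only to check whether an object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


# Pause automatic collection so the demo is deterministic;
# manual gc.collect() calls still work while it is disabled.
gc.disable()

# Case 1: reference counting alone.
a = Node("a")
a_ref = weakref.ref(a)   # a weak reference does not increase the count
del a                    # the last strong reference is gone
freed_by_refcount = a_ref() is None

# Case 2: a reference cycle (x points to y, y points back to x).
x = Node("x")
y = Node("y")
x.partner = y
y.partner = x
x_ref = weakref.ref(x)
del x, y                 # the names are gone, but the cycle keeps both counts above zero
alive_before_collect = x_ref() is not None

gc.collect()             # run the cyclic garbage collector by hand
freed_by_gc = x_ref() is None

gc.enable()

print(freed_by_refcount, alive_before_collect, freed_by_gc)
```

In everyday code you rarely need to call gc.collect() yourself; it is done here only to make the moment of cleanup visible.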

2. How are NumPy Arrays Different From Python Lists?
NumPy arrays are the preferred option when you are working with large datasets or doing numerical calculations. There are a few important differences between NumPy arrays and Python lists:
Homogeneity vs. Heterogeneity: All elements of a NumPy array have the same type, while a Python list can mix elements of different types (integers, floats, strings, etc.).
Memory Efficiency: NumPy arrays store their data in a contiguous block of memory, so they take up less space. Python lists are less efficient because they are collections of pointers to objects stored in different places.
Performance: For numerical tasks, Python lists are not as well optimized as NumPy arrays, which are built for mathematical operations. NumPy arrays also support element-wise operations, which makes them very fast.
Because of their speed and efficiency, NumPy arrays are important analysis tools for handling large amounts of data, applying mathematical functions, and performing matrix operations.
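
The differences listed above can be seen in a short sketch. It assumes NumPy is installed and imported as np; the variable names and sample values are made up for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a plain list, arithmetic on every element needs an explicit loop.
doubled_list = [p * 2 for p in prices]

# With a NumPy array, the same operation is applied element-wise.
price_array = np.array(prices)
doubled_array = price_array * 2

# Every element of an array shares a single dtype.
print(price_array.dtype)    # float64
print(price_array.nbytes)   # 24 (3 elements x 8 bytes each, stored contiguously)

# Mixing ints and floats promotes everything to one common type.
promoted = np.array([1, 2.5])
print(promoted.dtype)       # float64

print(doubled_list)
print(doubled_array.tolist())
```

Note that multiplying a list by 2 (prices * 2) would repeat the list rather than double each value, which is one reason arrays are more convenient for numerical work.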

3. What is List Comprehension in Python?
A list comprehension is an elegant and efficient way of creating a new list by looping over an existing iterable (such as a list or a range) and evaluating an expression for each element. The resulting code is cleaner and more readable than the equivalent loop.

For example, if you want to create a list of squares of numbers from 0 to 9, you could do it using list comprehension like this:

squares = [x**2 for x in range(10)]
print(squares) # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You can also use a list comprehension to filter elements based on a condition. For example, to keep only the even numbers:

even_numbers = [x for x in range(10) if x % 2 == 0]
print(even_numbers) # Output: [0, 2, 4, 6, 8]

List comprehensions are not only easier to read but often faster than traditional for-loops, making them a valuable tool for Python developers.
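
To see what a comprehension replaces, here is a small sketch that combines a transformation and a filter, next to the equivalent for-loop. The variable names are made up for illustration.

```python
# Traditional loop: build the list step by step.
squares_of_evens_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_of_evens_loop.append(x ** 2)

# List comprehension: the same expression, loop, and condition in one line.
squares_of_evens = [x ** 2 for x in range(10) if x % 2 == 0]

print(squares_of_evens)  # Output: [0, 4, 16, 36, 64]
```

Both versions produce the same list; the comprehension simply reads as "expression, for each item, if condition".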

4. What is Shallow and Deep Copying in Python?
Copying objects can be a pain in Python, especially when you try to copy nested structures such as lists of lists. There are two types of copying: shallow and deep.

A shallow copy creates a new outer object but does not copy the nested objects; it just references them. This means that if you modify a nested object through the shallow copy, the change also shows up in the original (and vice versa). We can make shallow copies with a list's copy() method or the copy module's copy() function.
Example of shallow copying:

import copy
original = [[1, 2, 3], [4, 5, 6]]
shallow_copy = copy.copy(original)
shallow_copy[0][0] = 10
print(original) # Output: [[10, 2, 3], [4, 5, 6]]

When we make a deep copy, we create a new object as well as independent copies of all nested objects. Thus, changes to the inner objects of the original are not reflected in the deep copy, and vice versa. You can create deep copies with the copy.deepcopy() function.
Example of deep copying:

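# Continues from the shallow copy example above, so original is [[10, 2, 3], [4, 5, 6]]
deep_copy = copy.deepcopy(original)
deep_copy[0][0] = 100
print(original) # Output: [[10, 2, 3], [4, 5, 6]]
print(deep_copy) # Output: [[100, 2, 3], [4, 5, 6]]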
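
A self-contained way to check the difference is to compare object identity with the is operator. This is only a sketch, and the variable names are made up for illustration.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Both copies are new outer lists...
outer_is_new = shallow is not original and deep is not original

# ...but only the shallow copy still shares the inner lists.
shallow_shares_inner = shallow[0] is original[0]   # True
deep_shares_inner = deep[0] is original[0]         # False

# Changing the outer list of a shallow copy does not touch the original.
shallow.append([5, 6])
original_length = len(original)                    # still 2

print(outer_is_new, shallow_shares_inner, deep_shares_inner, original_length)
```

So a shallow copy protects you from changes to the outer container only, while a deep copy protects the nested data as well.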

5. What is the Difference Between Lists and Tuples in Python?
Lists and tuples are both sequence data types in Python, but they have some key differences:

Mutability: Lists are mutable, so once a list is created, you can change its elements. Tuples are immutable, which means that they can’t be changed after they are created.
Syntax: Square brackets [ ] define a list, and parentheses ( ) define a tuple.
Here’s an example:

my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

# Lists are mutable
my_list[0] = 10
print(my_list) # Output: [10, 2, 3]

# Tuples are immutable
# my_tuple[0] = 10 # This would raise a TypeError

Tuples are often used precisely because they're immutable: when you have a collection of values that should not change, such as geographical coordinates or the days of the week, a tuple is usually the way to go. They are also a little faster to use and take up less memory than lists, which can help in performance-critical applications.
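
The sketch below illustrates these points. The coordinates and names are made-up sample values, and the size comparison assumes CPython, where sys.getsizeof reports the size of the container itself (exact byte counts vary between versions, so none are shown).

```python
import sys

# Illustrative (latitude, longitude) pair stored as a tuple.
location = (-1.2921, 36.8219)

# Trying to change a tuple raises a TypeError.
try:
    location[0] = 0.0
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Because tuples are immutable (and hashable when their items are),
# they can be used as dictionary keys. Lists cannot.
city_by_location = {location: "Nairobi"}
try:
    invalid = {[-1.2921, 36.8219]: "Nairobi"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# A tuple is a smaller container than a list holding the same items.
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])

print(assignment_failed, list_key_allowed, tuple_size < list_size)
```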

CONCLUSION
Python offers a lot of features that make it a great tool for data analytics. Understanding garbage collection helps you avoid wasted memory, and NumPy arrays give you the speed and efficiency needed for numerical calculations. List comprehensions are great for simplifying code, and an understanding of shallow vs. deep copying is crucial when you work with nested data. Finally, picking the right data structure for a particular task depends on knowing the characteristics of lists and tuples.

With these fundamentals, you'll be well-equipped to use Python as a powerful data analytics tool. Happy coding!
