intuitive-deep-learning

Lab-000: Scalars & Vectors

Implement and compare the performance of vector addition and dot product using standard Python lists and NumPy arrays, measuring the execution time for both implementations on large vectors to observe speed differences.
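
A minimal sketch of the comparison described above, not the notebook's actual code. The helper names `add_lists`, `dot_lists`, and `benchmark`, the default vector size, and the repeat count are all assumptions chosen for illustration. It uses the standard-library `timeit` module and reports the best of several runs for each implementation.

```python
import timeit

import numpy as np


def add_lists(a, b):
    """Element-wise addition of two equal-length Python lists."""
    return [x + y for x, y in zip(a, b)]


def dot_lists(a, b):
    """Dot product of two equal-length Python lists."""
    return sum(x * y for x, y in zip(a, b))


def benchmark(n=100_000, repeats=20):
    """Return the best single-call time (seconds) for each implementation."""
    # Hypothetical test data; any large numeric vectors would do.
    list_a = [float(i) for i in range(n)]
    list_b = [float(i) * 2.0 for i in range(n)]
    arr_a = np.array(list_a)
    arr_b = np.array(list_b)

    candidates = {
        "list add": lambda: add_lists(list_a, list_b),
        "numpy add": lambda: arr_a + arr_b,
        "list dot": lambda: dot_lists(list_a, list_b),
        "numpy dot": lambda: np.dot(arr_a, arr_b),
    }
    # Take the minimum over several runs to reduce timing noise.
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeats))
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>10}: {seconds * 1e3:.3f} ms")
```

Absolute timings depend on the machine and the NumPy build, so the ratio between the list and NumPy rows is more meaningful than either number on its own.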

Notebook

Notes

In summary, use Python lists for general-purpose programming where flexibility, mixed data types, and frequent modifications are needed, and use NumPy arrays for performance-critical numerical computations involving large datasets.

The performance difference between Python lists and NumPy arrays is significant, especially for numerical operations on large datasets. Python lists are general-purpose dynamic arrays that can hold elements of different data types, so each element is stored as a separate Python object and every operation pays for type checks and pointer dereferencing. NumPy arrays, in contrast, are designed for numerical computation and store elements of a single data type contiguously in memory. This contiguous storage, combined with NumPy’s compiled C implementation, allows highly optimized, vectorized operations.

In the tests we ran, NumPy arrays consistently outperformed Python lists by orders of magnitude for both vector addition and dot product as the vector size increased. At a vector size of 100,000, for example, the NumPy operations were dramatically faster. Two things explain this: NumPy operations are implemented in C, and they are vectorized, meaning a single call operates on the whole array and the per-element loop runs in compiled code rather than in the Python interpreter. This makes NumPy the preferred choice for scientific computing and data analysis in Python when performance is critical.

Comparison of Python lists and NumPy arrays

Feature	Python List	NumPy Array
Data Types	Heterogeneous (mixed types)	Homogeneous (single type)
Memory Storage	Non-contiguous (pointers to objects)	Contiguous (packed data values)
Operations	Slower, uses Python loops	Faster, uses optimized C functions (vectorization)
Memory Efficiency	Less efficient (high overhead per element)	Highly efficient (compact storage)
Flexibility	Dynamic sizing, easier append/insert	Fixed size upon creation
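
Several rows of the table can be checked directly. The sketch below is illustrative only: the variable names are hypothetical, and `sys.getsizeof` reports CPython-specific object sizes, so the list figure is an approximation that varies by interpreter version.

```python
import sys

import numpy as np

n = 1_000
values = [float(i) for i in range(n)]
arr = np.array(values)  # dtype is inferred as float64

# Memory efficiency: the list stores pointers, and each float is its own
# Python object; the array packs raw 8-byte values into one buffer.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes  # 1_000 elements * 8 bytes = 8000
print(f"list: ~{list_bytes} bytes, array data: {array_bytes} bytes")

# Data types: a list keeps mixed types as-is, while an array converts
# its inputs to one common dtype.
mixed = [1, "two", 3.0]
coerced = np.array([1, 2, 3.5])
print([type(x).__name__ for x in mixed], coerced.dtype)

# Flexibility: list.append grows the list in place, while np.append
# returns a new array and leaves the original unchanged.
values.append(1.0)
bigger = np.append(arr, 1.0)
print(len(values), arr.size, bigger.size)
```

Because `np.append` copies the data each time, growing an array one element at a time in a loop is usually slower than collecting values in a list and converting once at the end.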

Key Reasons for Speed Difference:

Homogeneous Data Types: Python lists can store elements of different data types (e.g., an integer, a string, and a float). This flexibility means each item must be stored as a full Python object with its own type information and reference counts, adding significant memory and processing overhead. In contrast, NumPy arrays enforce a single, consistent data type (e.g., all integers or all floats), allowing them to store data values directly in a compact, fixed-size format.
Contiguous Memory Allocation: NumPy arrays store their elements in a single, contiguous block of memory. This physical proximity allows modern CPUs to efficiently prefetch data into the cache, which dramatically speeds up operations that process elements sequentially (a principle known as locality of reference). A Python list stores only pointers; the objects they refer to may be scattered throughout memory, so each element access follows a pointer and benefits far less from these hardware optimizations.
Optimized C Implementation (Vectorization): The loops and mathematical operations in NumPy do not run in the slower Python interpreter; they are implemented in optimized, pre-compiled C code, and some linear algebra routines are delegated to BLAS/LAPACK libraries. A single call therefore operates on an entire array (vectorization), avoiding the overhead of explicit Python loops, which are slow for numerical computations.
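
The contiguity and vectorization points above can be inspected on a small array. This is a sketch with hypothetical variable names, using only standard `ndarray` attributes (`dtype`, `itemsize`, `strides`, `flags`).

```python
import numpy as np

arr = np.arange(6, dtype=np.float64)

# One dtype for every element, each occupying a fixed number of bytes.
print(arr.dtype)     # float64
print(arr.itemsize)  # 8

# Consecutive elements sit 8 bytes apart in a single C-contiguous buffer.
print(arr.strides)                # (8,)
print(arr.flags["C_CONTIGUOUS"])  # True

# A strided view skips elements, so it is no longer contiguous.
every_other = arr[::2]
print(every_other.strides)                # (16,)
print(every_other.flags["C_CONTIGUOUS"])  # False

# The same computation written as an explicit Python loop and as a
# vectorized expression; the results match, but the vectorized form
# runs its loop in compiled code.
looped = np.empty_like(arr)
for i in range(arr.size):
    looped[i] = arr[i] * 2.0 + 1.0
vectorized = arr * 2.0 + 1.0
print(np.array_equal(looped, vectorized))  # True
```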