0. Pre-knowledge
In Python, the list () is a dynamic array of pointers, and the array object provided by the Array Module is a dynamic array that holds the same type of values. Since the array directly stores the value, it uses less memory than the list. Both lists and arrays are dynamic arrays, so when new elements are added to them, and there is no space to save new elements, they will automatically reallocate the memory block and copy the values in the original memory to the new memory block.
In order to reduce the number of re-allocation of memory, usually every time it is re-allocated, the size is k times the original size. The larger the value of k, the fewer times the memory is re-allocated, but the more space is wasted. This section observes the memory allocation patterns of lists and arrays through a series of experiments.
list storage structure
After list() is instantiated, the structure is divided into 3 parts,
-variable name:
list object (structured data + pointer array)
-list content:
where id represents the position of the list object,
“v”refers to the name of the variable, v[:] refers to the list object, this rule is also true for other sequence structures in python, the following examples can be used as evidence,
When a=b, a and b point to the same list object
When a=b[:], the list object of a and the list object of b point to the same list content
In addition, [0] and [:1] are also different:
1 | 1. In [30]: a[0] |
Empty list memory space
1 | 1. In [32]: sys.getsizeof([]) |
1. Calculate the growth mode of the list by getsizeof()
Step1:
sys.getsizeof() can get the memory size occupied by the list. Please write a program to calculate a list with a length of 10000, and each subscript of it saves the size of the list when it grows to this subscript:
1 | import sys |
Step2:
Calculate the resize_pos array representing the memory size of the list each time memory is allocated:
1 | import numpy as np |
Step3:
We can use scipy.optimize.curve_fit() to fit the resize_pos array, and the fitting function is an exponential function:
1 | from scipy.optimize import curve_fit |
Question 1:
Are the element storage addresses consecutive?
First, test the memory address of the content stored in the list object.
1 | In [2]: id(a) |
Then look at the relative address,
1 | In [6]: for i in a: |
It can be seen that for the list object, the element content is not necessarily stored linearly, but due to memory allocation problems, the illusion of linear storage will appear. When the element appears in a container or changes relative to the previous element type, the memory space will no longer be continuous.
Question 2:
Are the address of the list object and the address of the element consecutive?
In fact, Q1 has already answered this question. After all, the element address itself is not continuous, but we still tested it.
1 | In [22]: id(a[0])-id(a) |
The difference is far, and we can see from the analysis of the source code that the body of the list object is an array of pointers, that is, the position of id(a) is an array of pointers to the position of the element, and of course there are auxiliary object header information and the like.
Question 3:
Analysis of the memory usage of list objects (without elements)
1 | In [16]: sys.getsizeof([1,2,3,'a','b','c','de']) |
It can be seen that each object of the list occupies 8 bytes and 32 bits of space. Let’s look at the slice,
1 | In [20]: sys.getsizeof(a[:3]) |
The slice object also occupies 8 bytes for each element, but the slice is also a list object, even if it is cut from the middle (without cutting the head), it will also contain the storage occupation of the header information.