Are arrays and pointers the same in C?

Pointers are the essence of C and C++, and pointers and arrays are a pair of friendly rivals. We often fail to tell them apart, and few computer science graduates fresh out of university have really mastered how pointers and arrays are used and how they differ. This probably has to do with current university teaching and with the many popular C and C++ tutorials on the market. These tutorials are easy to read, but they skip many key points or explain them poorly, and in many cases what they state is simply wrong. Beginners usually meet C/C++ through this kind of tutorial, and the results are predictable. Choosing a good tutorial really matters for a beginner, because once a wrong idea has taken hold it is hard to correct even after you learn better (I know this from experience). Here are three books I recommend as well suited to beginners:

"The C Programming Language", Brian W. Kernighan and Dennis M. Ritchie: the classic (the "K&R Bible")

"C++ Primer", Stanley B. Lippman, Josée Lajoie, Barbara E. Moo: the classic, authoritative C++ book

"Pointers on C", Kenneth A. Reek

People often say "a pointer is the same as an array." This is a dangerous statement and not entirely correct. Pointers and arrays are equivalent in certain contexts, but not in all of them. People tend to forget the conditions under which the equivalence holds and assume it holds everywhere. Let's focus on the differences between pointers and arrays.

1. Definition of pointers and arrays

A pointer is a pointer. A pointer variable stores an address and is used to access data indirectly. On a 32-bit system, a pointer variable (including a void pointer) occupies 4 bytes. A pointer can hold the address of any memory location, but not every location can actually be accessed through a pointer.

An array is an array. When an array is defined, the compiler reserves a contiguous block of memory sized according to the element type and the number of elements, and the data in it is accessed directly.

Let’s look at an example

There is the following code in file1.c:

char p[100]="abcdef";

There is the following code in file2.c:

#include <stdio.h>

extern char *p;   /* deliberately wrong: p is defined as an array in file1.c */

int main(void)
{
    printf("%c\n", p[1]);
    return 0;
}

This compiles and links without complaint, but does it run correctly? In the debugger an error occurs and the value of p[1] cannot be evaluated. The reason is explained later.

This shows that pointers and arrays are not equivalent: the definition of an array does not match an external declaration of a pointer. (Note the difference between a declaration and a definition: a definition allocates memory for a variable or object, whereas a declaration only describes its type.)

2. The difference between pointer and array access

[Figure: how a reference through an array subscript is resolved]

[Figure: how a reference through a pointer is resolved]

As the figures show, pointers and arrays are two completely different things. For an array, the compiler already knows the address of the symbol at compile time, so when an address is needed the operation can use it directly, with no extra instruction to fetch it first. For a pointer, the current value must first be read at run time before it can be dereferenced.

This explains why the program above does not run correctly. p is defined as an array in file1.c but declared as a pointer in file2.c, so file2.c treats p as a pointer variable, and whatever data a pointer variable holds is interpreted as an address. The first 4 bytes of the array, 0x61 0x62 0x63 0x64, are therefore read and combined into an address, 0x61626364 (ignoring byte order for now), and the program then reads a char from address 0x61626364. That address may not be valid, and even if it is, it does not hold what we want.

Now think about the opposite case: what happens if p is defined as a pointer in file1.c and declared as an array in file2.c?

The solution to the above problem is to keep definitions and declarations consistent at all times.

Test program:

file2.c

#include <stdio.h>

extern char p[];          /* deliberately wrong: p is defined as a pointer in file1.c */
extern void print(void);

int main(void)
{
    printf("%x\n", p[0]);
    printf("%x\n", p[1]);
    /* Here p evaluates to the address of the memory that stores the
       pointer p defined in file1.c, not to the address of the string. */
    printf("%p\n", (void *)p);
    print();
    return 0;
}

file1.c

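#include <stdio.h>

char *p = "abcdef";

void print(void)
{
    printf("%p\n", (void *)p);    /* address of the string literal */
    printf("%p\n", (void *)&p);   /* address of the pointer variable itself */
}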

The execution result (the three address lines, as printed on the author's 32-bit system; the values and their format will differ elsewhere):

00424a30

00422028

00424a30

3. Some things that should be noted

1. sizeof reports different things.

For an array, sizeof gives the space occupied by the entire array; for a pointer, sizeof gives the size of the pointer itself, which is always 4 on a 32-bit system.

2. An array name cannot be modified as an lvalue, whereas a pointer can be assigned to.

3. A pointer can be incremented and decremented (except a void pointer, whose step size is unknown), but an array name cannot.

4. Understand the difference between char *p="abcde" and char str[]="abcde". The first defines a pointer to a string literal, which must not be modified; the second defines a 6-byte array initialized with a copy of the string, which may be modified.

The C language standard's treatment of arrays and pointers can be summarized in three rules:

Rule 1: An array name in an expression is treated by the compiler as a pointer to the first element of the array.

Note: there are exceptions:

1) The array name is the operand of sizeof

2) The address of the array is taken with &

Rule 2: A subscript is always equivalent to an offset from a pointer.

Rule 3: In a function parameter declaration, an array name is treated by the compiler as a pointer to the first element of the array.

Rules 1 and 2 are best understood together: a reference through an array subscript can always be written as "a pointer to the start of the array plus an offset." For example, a[i] is always interpreted by the compiler as *(a+i).

Rule 1: An array name in an expression is treated as a pointer by the compiler (apart from the exceptions above), so the statements int a[3]; int *p=a; compile and run correctly. In the initializer, a is treated as a pointer to the first element of the array, so the types on both sides of the = match.

Rule 2: A subscript is always equivalent to an offset from a pointer. The main reason C rewrites array subscripts as pointer offsets is that pointers and offsets are the basic model used by the underlying hardware. For example, the i in a[i] is always interpreted as an offset, so a[i] is rewritten as *(a+i): a is a pointer to the first element of the array, adding the offset i moves the pointer forward i steps, and the content of the element at a+i is then fetched.

This explains why an array subscript in C can be negative, and in my opinion it is also related to the fact that C does not check whether a subscript is out of bounds. Consider the following program:

#include <stdio.h>

int main(void)
{
    int a[3] = {1, 2, 3};
    int *p = a + 3;           /* points one past the last element */

    printf("%d\n", p[-1]);    /* same as *(p - 1), i.e. a[2] */
    return 0;
}

The program prints 3. Although the subscript is -1, the compiler treats it as an offset, so p[-1] is equivalent to *(p-1).

Rule 3: In a function parameter declaration, an array name is treated by the compiler as a pointer to the first element of the array. C equates array parameters with pointers for efficiency. Otherwise every element of the array would have to be copied when the function is called, which could be very expensive in both time and space. To operate on the elements of an array, it is enough to pass the address of its first element to the called function and then reach the wanted elements through the pointer, which greatly reduces the cost. Inside the function, the compiler therefore always treats an array name declared as a parameter as a pointer to the first element, and it can generate correct code without distinguishing arrays from pointers.

As a result, void fun(int a[]); and void fun(int *a); have exactly the same effect, and any reference to a inside the function is treated as a pointer. Because void fun(int a[]); is ultimately interpreted as void fun(int *a);, the declaration tells us that a pointer to int must be passed in the call. So the following code compiles and runs correctly:

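#include <stdio.h>

void fun(int a[])    /* same as void fun(int *a) */
{
    printf("%d\n", a[0]);
}

int main(void)
{
    int a[3] = {1, 2, 3};
    int *p1, *p2;
    int b = 4;

    p1 = a;
    p2 = &b;

    fun(a);        /* prints 1 */
    fun(&a[1]);    /* prints 2 */
    fun(p1);       /* prints 1 */
    fun(p2);       /* prints 4 */
    fun(&b);       /* prints 4 */
    return 0;
}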

Distinguish the meanings of several expressions:

&p, p, a, &a

&p: the address of the memory that stores the pointer variable p; sizeof(&p)=4;

p: the address stored in the pointer variable p; sizeof(p)=4;

a: the address of the first element of the array; sizeof(a)=3*4=12;

&a: the address of the entire array; sizeof(&a)=4 (VC++6.0 reports 12, which I think is wrong because the type of &a is a pointer to an array)

Although a and &a have the same value, they mean completely different things: a is the address of the first element of the array, while &a is the address of the whole array. Their types also differ. a is treated as an int * pointer, while &a is an int (*)[3] pointer, that is, a pointer to an array (to be explained in a later article). So a+1 and &a+1 give different results. a+1 moves the pointer to the first element forward by one step, where the step is the number of bytes occupied by one element; &a+1 moves the pointer to the array forward by one step, where the step is the number of elements times the number of bytes per element.

#include <stdio.h>

int main(void)
{
    int a[3] = {1, 2, 3};
    int *p = a;

    printf("%p\n", (void *)&p);
    printf("%p\n", (void *)p);
    printf("%p\n", (void *)(&p + 1));
    printf("%p\n", (void *)(p + 1));
    printf("%p\n", (void *)a);
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)(a + 1));