In C, is the address of a[i][j] equal to *(a + i) + j?

Question

Last night someone asked me a question: do the two lines printed by the code below match?

#include <stdio.h>

int main(void) {
    int a[5][3];
    /* %p expects a void *, so cast explicitly */
    printf("%p\n", (void *)&a[3][2]);
    printf("%p", (void *)(*(a + 3) + 2));
    return 0;
}

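My wrong answer

I answered that they differ. I assumed *(a + 3) is a value, so adding 2 to it still gives a value, whereas &a[3][2] is clearly an address, so the two could not match.

My reasoning: although a is logically a two-dimensional array, physically (in the memory model) it is still one contiguous block of memory, and a points to the start of that block. In other words, I was treating a as if it had been declared int *a = <address of the first element>;. Dereferencing a single-level pointer naturally yields a value.

But I was wrong: the two lines printed by the code are exactly the same.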

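Analysis

I was stunned, and felt I had let my C teacher down. So the conclusion is that &a[3][2] == *(a + 3) + 2, but how do we explain it? I saw no better way than to look at the assembly.

Source file main.c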

#include <stdio.h>

int main() {
    int a[5][3];
    printf("%p\n", a);
    printf("%p", *(a + 1));
}

Compile with gcc

gcc -m32 -S main.c

which produces the following assembly

	.file	"main.c"
	.text
	.section	.rodata
.LC0:
	.string	"%p\n"
.LC1:
	.string	"%p"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	...
	# eax = address of the start of the 2D array a
	leal	-72(%ebp), %eax
	# pass eax to printf as an argument
	pushl	%eax
	# eax = address of the string "%p\n"
	leal	.LC0@GOTOFF(%ebx), %eax
	# pass eax to printf as an argument
	pushl	%eax
	# call printf
	call	printf@PLT
	addl	$16, %esp
	# eax = address of the start of the 2D array a
	leal	-72(%ebp), %eax
	# eax += 12
	# 12 bytes = 3 * 4 bytes (one row of 3 ints)
	# an int is 4 bytes here
	addl	$12, %eax
	subl	$8, %esp
	# pass eax to printf as an argument
	pushl	%eax
	# eax = address of the string "%p"
	leal	.LC1@GOTOFF(%ebx), %eax
	# pass eax to printf as an argument
	pushl	%eax
	# call printf
	call	printf@PLT
	...

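So for the two-dimensional array a, evaluating a + 1 makes the compiler compute the address of the start of row 1 (an offset of 12 bytes, one whole row), rather than advancing by a single int the way an ordinary int * would.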

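A follow-up test showed that the assembly is identical whether the argument is written a + 1 or *(a + 1). So for the two-dimensional array a, a + 1 and *(a + 1) evaluate to the same address. Their types still differ: a + 1 is a pointer to a row (int (*)[3]), while *(a + 1) is the row itself (int[3]), which decays to a pointer to its first element (int *).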

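With that, it is not hard to see why &a[3][2] == *(a + 3) + 2: *(a + 3) is row 3 decayed to an int * pointing at a[3][0], and adding 2 moves two ints forward to a[3][2].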

Why does the compiler handle it this way? The first thing to settle is whether this is undefined behavior. I found the following in the C99 standard:

6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type "pointer to object type", the other expression shall
  have integer type, and the result has type "type".
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
  designation of an element of an array object. The definition of the subscript operator []
  is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
  apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
  initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
  element of E1 (counting from zero).
3 Successive subscript operators designate an element of a multidimensional array object.
  If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
  other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
  dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
  implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array,
  which itself is converted into a pointer if used as other than an lvalue. It follows from this
  that arrays are stored in row-major order (last subscript varies fastest).
4 EXAMPLE Consider the array object defined by the declaration
         int x[3][5];
  Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an
  array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to
  a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually
  entails multiplying i by the size of the object to which the pointer points, namely an array of five int
  objects. The results are added and indirection is applied to yield an array of five ints. When used in the
  expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j]
  yields an int.

It looks like the standard anticipated this long ago and wrote the evaluation rule into the text.

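Since this is not undefined behavior, why did the C standards committee specify it this way? I don't know, but my guess is that C still counts as a high-level language and has to hide some of the low-level details. If you see it differently, please leave a comment so we can discuss it.

Closing thoughts

When I first gave my wrong answer I was full of confidence and told him it could not possibly be wrong, because the memory model is just a one-dimensional array and you can treat it as a single-level pointer. In the end my own experiment proved me wrong. It turns out I only thought I knew a lot.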
