Skip to content
frankpfenning edited this page Aug 15, 2011 · 22 revisions

We write t[] for the type of arrays with elements of type t. Note that t is arbitrary: we can have an array of integers (int[]), and array of booleans (bool[]) or an array of arrays of characters (char[][]). This syntax for the type of arrays is like Java, but is a minor departure from C; see C arrays for more information.

Arrays are of a fixed size, and they must be explicitly allocated using the expression alloc_array(t, n). Here t is the type of the array elements, and n is their number. Let's try in coin:

% coin
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> int[] A = alloc_array(int, 10);
A is 0xECE2FFF0 (int[] with 10 elements)
--> 

The result may be surprising: A is an array of integers with 10 elements (obvious), but what does it mean to say

A is 0xECE2FFF0

? As we discussed in the section on integers, variables can only hold values of a fixed size, the word size of the machine. An array of 10 integers would be 10 times this size, so we cannot hold it directly in the variable A. Instead, the variable A holds the address in memory where the actual array elements are stored. In this case, the address happens to be 0xECE2FFF0, but their is no guarantee that the next time you run coin you will get the same address. Fortunately, this is okay because you cannot actually ever do anything directly with this address as a number. Instead we access the array elements using the syntax A[i] where 0 ≤ i < n, where n is the length of the array. We say that arrays are 0-based because elements are numbered starting at 0. For example:

--> A[0];
0 (int)
--> A[1];
0 (int)
--> A[2];
0 (int)
--> A[10];
Error: accessing element 10 in 10-element array
Last position: <stdio>:1.1-1.5
--> A[-1];
Error: accessing element -1 in 10-element array
Last position: <stdio>:1.1-1.5
--> 

We notice that after allocating the array, all elements appear to be 0. This is guaranteed by the implementation, which initializes all array elements to a default value which depends on the type. The default value of type int is 0. We also observe that trying to access an array element not in the specified range will lead to an error. In this example, the valid accesses are A[0], A[1], ..., A[9] (which comes to 10 elements); everything else is illegal.

How do we change an element of an array? We can use it on the left-hand size of an assignment. We can set A[i] = e as long as e is an expression of the right type for an array element. For example:

--> A[0] = 0; A[1] = 1; A[2] = 2;
A[0] is 0 (int)
A[1] is 1 (int)
A[2] is 2 (int)
-->

Recall that an assignment (like A[0] = 0;) is a statement and as such has an effect, but no value. coin will print back the effect of the assignment. Here we have given three statements together, so all three effects are shown. Again, exceeding the array bounds will result in an error message and the program aborts.

--> A[10] = 10;
Error: accessing element 10 in 10-element array
Last position: <stdio>:1.1-1.11
--> 
## For Loops to Traverse Arrays

A common pattern of access and traversal of arrays is for-loops, where the index i is counted up from 0 to the length of the array. To continue the example above, we can assign i to the ith element of the array as follows:

--> for (int i = 0; i < 10; i++)
... A[i] = i;
--> A[7];
7 (int)
--> 

Characteristically, the exit condition of the loop test i < n where i is the array index and n is the length of the array.

After we type in the first line (the header of the for-loop), coin responds with the prompt ... instead of -->. This indicates that the expression or statement it has parsed so far is incomplete. We complete it by supplying the body of the loop, the assignment A[i] = i;. Note that no assignment effect is printed. This is because the assignment is part of a loop. In general, coin will only print effects of top-level statements such as assignments, because when a complicated program is executed, a huge number of effects could be taking place.

## Loop Invariants

When we use loops to traverse arrays, we need to make sure that all the array accesses are in bounds. In many cases this is evident, but in can be tricky in particular if we have two-dimensional data (for example, images). As an aid to this reasoning, we state an explicit loop invariant which expresses what will be true on every iteration of the loop.

The following example functions computes an array of the first n Fibonacci numbers, starting to count from 0. It uses the recurrence:

  • f0 = 0
  • f1 = 1
  • fn+2 = fn+1 + fn (for n ≥ 2)
/* file fib.c0 */
#use <conio>

int[] fib(int n)
//@requires n >= 0;
//@ensures \length(\result) == n;
{
  int[]	F = alloc_array(int, n);
  if (n	> 0) F[0] = 0;                      /* line 0 */
  if (n	> 1) F[1] = 1;                      /* line 1 */
  for (int i = 2; i < n; i++)
    //@loop_invariant 2 <= i;
    F[i] = F[i-1] + F[i-2];                 /* line 2 */
  return F;
}

The loop invariant states a property that must be true just before the exit condition is tested. Clearly, i is incremented each time around the loop (with the step statement i++ which is the same as i = i+1), so i will always be greater or equal to 2. More precisely:

  • When we enter the loop the first time, i = 2 so i ≥ 2.
  • Assume that i ≥ 2 (the loop invariant) when we enter the loop, we have to show it still holds after we traverse the loop body once. We obtain the next value of the loop by executing i = i+1 so the new value of i (written i`, read "i prime") will only be bigger, so it must still be greater of equalt to 2.

Now let's verify that all array accesses are guaranteed to be in bounds.

  • In line 0, we assign to F[0]. If the length of the array F (which is n) were 0, this would be out of bounds. But we check that n > 0 so that the assignment only takes place if there is at least one element in the array, labeled F[0].
  • Similarly, in line 1 we access F[1], but this is okay because we only access it if n > 1.
  • In line 2, we access F[i], F[i-1] and F[i-2]. By the loop invariant we know that i, i-1, and i-2 are all greater or equal to 0, because i ≥ 2. Since we only enter the loop body if i < n, and n is the length of array, we also know that i is less than the length of the array (and so are i-1 and i-2 because they are only smaller). So all three access must always be in bounds.

In the last case, we do not reason about how the loop operates but rely on the loop invariant instead. This is crucial, since the loop invariant is supposed to contain all the relevant information.

## Aliasing

We have seen assignments to array elements, such as A[0] = 0;. But we have also seen assignments to array variables themselves, such as

int[] A = alloc_array(int, n);

What do they mean? To explore this, we separate the declaration of array variable (here: F and G) from assignment to them.

--> int[] F;
--> int[] G;
--> F = fib(15);
F is 0xF6969A80 (int[] with 15 elements)
--> G = F;
G is 0xF6969A80 (int[] with 15 elements)
--> G = fib(10);
G is 0xF6969A30 (int[] with 10 elements)
--> 

The first assignment to F is as expected: it is the address of an array of Fibonacci numbers with 15 elements. When we set G = F, G and F (as variables) hold the same address! Holding the same address means that F and G are aliased. When we make the second assignment to G (changing its value) we get a new array, which is in fact smaller and definitely no longer aliased to F (not the different address).

If we are using integers, booleans, characters and string constants, all runtime values during the execution of a program are held in variables or on the stack. Arrays are the prototypical built-in data structure whose values must be held in explicitly allocated memory. Arrays must be allocated with a fixed size, and all elements of the array will be initialized to some default value, depending on the type of the element.

Arrays holding values of type t are allocated with alloc_array(t, n) where n is the number of elements of the array.

Clone this wiki locally