-
Notifications
You must be signed in to change notification settings - Fork 0
Arrays
We write t[]
for the type of arrays with elements of type t. Note that t is arbitrary: we can have an array of integers (int[]
), and array of booleans (bool[]
) or an array of arrays of characters (char[][]
). This syntax for the type of arrays is like Java, but is a minor departure from C.
Arrays are of a fixed size, and they must be explicitly allocated using the expression alloc_array(t, n)
. Here t is the type of the array elements, and n is their number. Let's try in coin:
% coin
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> int[] A = alloc_array(int, 10);
A is 0xECE2FFF0 (int[] with 10 elements)
-->
The result may be surprising: A is an array of integers with 10 elements (obvious), but what does it mean to say
A is 0xECE2FFF0
? As we discussed in the section on integers, variables can only hold values of a fixed size, the word size of the machine. An array of 10 integers would be 10 times this size, so we cannot hold it directly in the variable A. Instead, the variable A holds the address in memory where the actual array elements are stored. In this case, the address happens to be 0xECE2FFF0
(incidentally presented in hexadecimal notation, but there is no guarantee that the next time you run coin you will get the same address. Fortunately, this is okay because you cannot actually ever do anything directly with this address as a number. Instead we access the array elements using the syntax A[i]
where 0 ≤ i < n, where n is the length of the array. We say that arrays are 0-based because elements are numbered starting at 0. For example:
--> A[0];
0 (int)
--> A[1];
0 (int)
--> A[2];
0 (int)
--> A[10];
Error: accessing element 10 in 10-element array
Last position: <stdio>:1.1-1.5
--> A[-1];
Error: accessing element -1 in 10-element array
Last position: <stdio>:1.1-1.5
-->
We notice that after allocating the array, all elements appear to be 0. This is guaranteed by the implementation, which initializes all array elements to a default value which depends on the type. The default value of type int
is 0. We also observe that trying to access an array element not in the specified range will lead to an error. In this example, the valid accesses are A[0]
, A[1]
, ..., A[9]
(which comes to 10 elements); everything else is illegal.
How do we change an element of an array? We can use it on the left-hand size of an assignment. We can set A[i] = e
as long as e is an expression of the right type for an array element. For example:
--> A[0] = 0; A[1] = 1; A[2] = 2;
A[0] is 0 (int)
A[1] is 1 (int)
A[2] is 2 (int)
-->
Recall that an assignment (like A[0] = 0;
) is a statement and as such has an effect, but no value. coin will print back the effect of the assignment. Here we have given three statements together, so all three effects are shown. Again, exceeding the array bounds will result in an error message and the program aborts.
--> A[10] = 10;
Error: accessing element 10 in 10-element array
Last position: <stdio>:1.1-1.11
-->
A common pattern of access and traversal of arrays is for-loops, where the index i is counted up from 0 to the length of the array. To continue the example above, we can assign i to the ith element of the array as follows:
--> for (int i = 0; i < 10; i++)
... A[i] = i;
--> A[7];
7 (int)
-->
Characteristically, the exit condition of the loop test i < n
where i is the array index and n is the length of the array.
After we type in the first line (the header of the for-loop), coin responds with the prompt ...
instead of -->
. This indicates that the expression or statement it has parsed so far is incomplete. We complete it by supplying the body of the loop, the assignment A[i] = i;
. Note that no assignment effect is printed. This is because the assignment is part of a loop. In general, coin will only print effects of top-level statements such as assignments, because when a complicated program is executed, a huge number of effects could be taking place.
When we use loops to traverse arrays, we need to make sure that all the array accesses are in bounds. In many cases this is evident, but in can be tricky in particular if we have two-dimensional data (for example, images). As an aid to this reasoning, we state an explicit loop invariant which expresses what will be true on every iteration of the loop.
The following example functions computes an array of the first n Fibonacci numbers, starting to count from 0. It uses the recurrence:
- f0 = 0
- f1 = 1
- fn+2 = fn+1 + fn (for n ≥ 0)
/* file fib.c0 */
#use <conio>
int[] fib(int n)
//@requires n >= 0;
//@ensures \length(\result) == n;
{
int[] F = alloc_array(int, n);
if (n > 0) F[0] = 0; /* line 0 */
if (n > 1) F[1] = 1; /* line 1 */
for (int i = 2; i < n; i++)
//@loop_invariant 2 <= i;
F[i] = F[i-1] + F[i-2]; /* line 2 */
return F;
}
The loop invariant states a property that must be true just before the exit condition is tested. Clearly, i is incremented each time around the loop (with the step statement i++
which is the same as i = i+1
), so i will always be greater or equal to 2. More precisely:
- When we enter the loop the first time, i = 2 so i ≥ 2.
- Assume that i ≥ 2 (the loop invariant) when we enter the loop, we have to show it still holds after we traverse the loop body once. We obtain the next value of the loop by executing
i = i+1
so the new value of i (written i`, read "i prime") will only be bigger, so it must still be greater of equalt to 2.
Now let's verify that all array accesses are guaranteed to be in bounds.
- In line 0, we assign to F[0]. If the length of the array F (which is n) were 0, this would be out of bounds. But we check that n > 0 so that the assignment only takes place if there is at least one element in the array, labeled F[0].
- Similarly, in line 1 we access F[1], but this is okay because we only access it if n > 1.
- In line 2, we access F[i], F[i-1] and F[i-2]. By the loop invariant we know that i, i-1, and i-2 are all greater or equal to 0, because i ≥ 2. Since we only enter the loop body if i < n, and n is the length of array, we also know that i is less than the length of the array (and so are i-1 and i-2 because they are only smaller). So all three access must always be in bounds.
In the last case, we do not reason about how the loop operates but rely on the loop invariant instead. This is crucial, since the loop invariant is supposed to contain all the relevant information.
## AliasingWe have seen assignments to array elements, such as A[0] = 0;
. But we have also seen assignments to array variables themselves, such as
int[] A = alloc_array(int, n);
What do they mean? To explore this, we separate the declaration of array variable (here: F and G) from assignment to them.
--> int[] F;
--> int[] G;
--> F = fib(15);
F is 0xF6969A80 (int[] with 15 elements)
--> G = F;
G is 0xF6969A80 (int[] with 15 elements)
--> G = fib(10);
G is 0xF6969A30 (int[] with 10 elements)
-->
The first assignment to F is as expected: it is the address of an array of Fibonacci numbers with 15 elements. When we set G = F, G and F (as variables) hold the same address! Holding the same address means that F and G are aliased. When we make the second assignment to G (changing its value) we get a new array, which is in fact smaller and definitely no longer aliased to F (not the different address). Aliasing (or the lack thereof) is crucial, because modifying one of two aliased arrays will also change the other. For example:
--> int[] A = alloc_array(int, 5);
A is 0xE8176FF0 (int[] with 5 elements)
--> int[] B = A;
B is 0xE8176FF0 (int[] with 5 elements)
--> A[0] = 42;
A[0] is 42 (int)
--> B[0];
42 (int)
-->
There is no built-in way to copy from one array to another, but here is a simple function to accomplish this.
/* file copy.c0 */
int[] array_copy(int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@ensures \length(\result) == n;
{
int[] B = alloc_array(int, n);
for (int i = 0; i < n; i++)
//@loop_invariant 0 <= i;
B[i] = A[i];
return B;
}
For example, we can create B as a copy of A, and now assigning to the copy of B will not affect A. We will invoke coin with the -d
flag to make sure that if a pre- or post-condition or loop invariant is violated we get an error message.
% coin copy.c0 -d
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> int[] A = alloc_array(int, 10);
A is 0xF3B8DFF0 (int[] with 10 elements)
--> for (int i = 0; i < 10; i++) A[i] = i*i;
--> int[] B = array_copy(A, 10);
B is 0xF3B8DFB0 (int[] with 10 elements)
--> B[9];
81 (int)
--> A[9] = 17;
A[9] is 17 (int)
--> B[9];
81 (int)
-->
Internally, arrays are stored in the area of the memory called the heap. Memory on the heap is allocated with alloc_array
(which returns the address of an array) and alloc
(which returns a pointer). In C0, memory is not explicitly deallocated, but it is garbage collected in the sense that memory that can no longer be accessed from within the running program is freed so that it can be used to satisfy future allocation requests.
In order to check whether array accesses are in bounds, the C0 runtime system must store not only the array data, but also the length of the array. In the running program, this information can not be accessed directly: given an array we cannot obtain its length. This is mostly in order to simulate safe programming practices in C. For example, when arrays are passed as arguments to functions we usually also pass (a bound on) their length.
However, in contracts (that is, function preconditions @requires
, postconditions @ensures
, loop invariations @loop_invariant
and assertions @assert
) we can refer to the length of an array using the special function \length
. We have already used this in the examples above. For example, the copy function
int[] array_copy(int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@ensures \length(\result) == n;
;
requires n to be smaller than the length of the parameter array A and ensures that the result array will have length n.