Skip to content

From C0 to C

iliano edited this page Nov 27, 2019 · 259 revisions

C0 is a safe subset of C. As we coded C0 programs, we wrote contracts, we compiled with -d flag to make sure we catch any bugs before they become big headaches and huge time wasters. Transitioning from C0 to C can be a rocky road, unless you are very careful and do some planning ahead. But we need to warn you, no matter how much you prepare, there is no way to completely avoid many of C's nuances. They may include weird interpretations, implementation-dependent behavior and undefined program behavior. This tutorial will cover some of the basic ideas of transitioning from C0 to C programming. This guide is only intended for those who are familiar with C0 and now want to make a transition to C programming language. A good resource for learning C is the book titled "The C Programming Language" by Brian Kernighan and Dennis Ritchie, who originally designed and implemented the C language. They were also involved with co-designing the Unix operating system. The C language and an UNIX operating system are closely intertwined.

Quick Tutorial

We have organized this tutorial in such a way, that you can quickly find some helpful ideas when transitioning from C0 to C. In particular we briefly introduce some familiar C0 concepts such as arrays, integers, booleans, pointers, structs and strings, go over C1 concepts such asfunction pointers and generic data types, and then move on to introduce some things that are in C but were not covered in C0 or C1. The list includes C macros, header files, C preprocessing, separate compilation and C standard IO. We recommend that you start with this quick guide that explains familiar C0 constructs and how they look in C as well as some of the constructs in C that you have not seen so far in C0.

Detailed Section

We have also developed a detailed section that cover more material and show more examples to explain C concepts in some detail. We also include extra concepts such as file IOthat can be useful in upstream courses. If you would like to start with the table of content of the detailed section click here

A Quick Tutorial

Lets take a look at some of the familiar C0 constructs and how they look like in C.

Arrays

An array is an aggregate data structure that contains data elements of the same type (we call arrays homogeneous data structures). In C0, we define arrays of type t as t[]. So we can have integer arrays as int[] or boolean arrays as bool[]. We allocated arrays in C0 as follows.

int[] A = alloc_array(int, 10);

In C, arrays that are explicitly allocated on the heap are indistinguishable from pointers. C does not know how long your array is or even what type of data it holds. In other words, it is the programmer's responsibility rather than the compiler's to track this information. So the C0 code above can be converted to C as follows.

int* A = calloc(10, sizeof(int));  <--- include <stdlib.h> to access calloc 
or
int* A = xcalloc(10, sizeof(int)); <--- a safe version of calloc is xcalloc (validates A != NULL)
                                   <--- include <xalloc.h> 

calloc is a function that is included in the C standard library <stdlib.h> with the prototype

void* calloc (size_t num, size_t size);  <--- size_t is defined in stddef.h as an unsigned int  

calloc takes the number of elements as the first argument and the size of each element (in bytes) as the second argument and returns a memory block of total size (num×size) bytes (initialized to 0). The sizeof operator, which is implementation dependent, returns the number of bytes necessary to hold the data type. We note that sizeof(int) can return 4, 8 or even 16. The function calloc returns a void* (pointer to data of arbitrary type), an address that holds the starting location of a memory block of (num×size) bytes. It is possible that calloc may return a NULL pointer if it is not able to allocate the requested heap-allocated memory. We can avoid testing for NULL pointers by using xcalloc instead (included in the CMU-based xalloc library). The following code allocates an array of 10 integers and initializes the all elements to zero.

int* A = xcalloc(10, sizeof(int));

A here is just a pointer to a block of memory. Why is its type int important then? For one, the compiler can produce a warning or error if you are trying to use it inconsistently. Second, when we dereference the array A as A[i], then it needs to calculate the address of element i in the array based on the size of each element.

In C0, arrays are always heap-allocated. But in C, we can also allocate arrays on the stack. A stack-allocated array A of type t has the form

t A[constant_expression];

constant-expression must be an integral expression and must have a value > 0. The type t can be any standard type such int or char, a struct, or even another array.

Stack-allocated arrays are fixed in size and are discarded when variable A goes out of scope. You can read more about stack-allocated and heap-allocated arrays in the section on Declaring, Initializing and Accessing C arrays.

Integers

Integer are a fundamental data type in all programming languages. There are infinitely many integers and a finite subrange of them can be represented directly in computers. In C0, we use 32-bit signed integers, and all C0 integers obey the rules of modular arithmetic. However in C, there are signed and unsigned integers. Integers can be typed as short int, int, or long int indicating allocation of different memory sizes, an implementation dependent detail. Integers of different sizes can be converted to each other. Even a character (type char) can be treated as an integer (type int). This provides a great deal of flexibility in managing data types in your code, but it may also introduce some strange behavior.

Signed versus unsigned integers
Any integer declared as an int is by default a signed integer. Signed integers in C may not obey the laws of modular arithmetic. But unsigned integers are always positive or zero and obeys the laws of modular arithmetic. If int's are represented by 32-bits, then unsigned int's range from 0 to 232-1. Integers can be cast between types, but we must be careful in doing so. For example, a long int can be cast to an int as

long int x = 83748572718389549;   /*initialize x to some large value*/;
int y = (int)x;  /* cast long int x to an int y */

Depending on the implementation (int, long and short can be the same size), this may truncate or extend the data from one type to another. So for example, we can do the following.

int x = 256;
char ch = (char)x;

In this case, assuming sizeof(char) is 1, then ch will contain only the lowest order byte of x. You can also cast signed data types to unsigned types and vice versa.

We can convert between signed and unsigned integers. For example,

int x = -10;
unsigned int y = (unsigned int)x;

In this case, -10 will be converted to an unsigned integer by adding UINT_MAX + 1 where UINT_MAX is defined in the header file <limits.h>. It is the maximal value that can be stored in an unsigned integer.

We can also mix expressions that contains signed and unsigned ints. For example,

unsigned int x = 45;
int y = -56;
int z = x + y;
unsigned int uz = x + y;

In the case of expression x + y, the signed integer y is converted to an unsigned integer before adding x and y in the unsigned mode. The result is then converted back to signed or unsigned type based on expression on the left hand side (int z or unsigned int uz).

Booleans

In C0, we used the type bool. But there is no data type bool in C. Instead we can use the library <stdbool.h> to have access to a defined type bool and constants true and false. Internally, these are just integers, where true is 1 and false is 0. Boolean operations interpret all non-zero values as true and zero as false. For example even if x and y are integers, we can still do the following.

x&y    <---  bitwise operator & is applied to x and y and returns 0 (false) or some non-zero value
x&&y   <---  logical operator && is applied to x and y and returns 0 or 1. 

Therefore using constructs such as

while (x&y) {...}    <--- this loop can run for ever unless x&y=0 
while (x&&y) {...}   <--- this loop can run for ever unless x=0 or y=0
while (1) {...}      <--- this is an infinite loop unless you break
while (0) {...}      <--- this loop never runs

are all legal, but may not be what you want.

Pointers

In C0 and C, pointers are defined the same way. A pointer is an address in memory. A pointer to a data type t is defined as t*. For example, in C and Co we can define a pointer to an integer as

int* ptr;   <--- define pointer in C and C0

One difference you find in C is that, you can use the unary address-of operator (&) to find the address of any data location. For example, consider the following C code.

int x = 10;
int* ptr = &x;
printf("The value of x is %d and the address of x is %p\n", x, ptr);

The output shows the value of x and the address of x. The unary & operator when applied to the operand x (&x) gives the address of where the operand x is stored in memory. Even pointer variables have addresses. Consider the following code where we print the address of the pointer variable ptr.

int x = 10;
int* ptr = &x;
printf("The address of the ptr is &ptr = %x\n", &ptr);

Validating pointers
pointers must be validated by testing for NULL pointer. When calling calloc or malloc (from stdlib.h) we always need to perform NULL checks as follows.

int* ptr = calloc(10,sizeof(int));
if (ptr != NULL) {...}

But you can avoid NULL checks by using the CMU-based (not standard C) library xalloc.h that contains the xcalloc and xmalloc functions.

A pointer can be dereferenced only if it is not null. Even then, you may dereference an illegal block of memory and can cause the program to display undefined behavior. For example, consider this code.

char ch = 'a';
int* ptr = (int*)&ch;  
*ptr = 1000;

What is the problem here? The char ch is allocated only the sizeof(char) bytes. But as we dereference the ptr, we dereference sizeof(int) bytes, an illegal access of memory that we may not get a warning about.

structs

A struct is just an aggregate type, consisting of several data elements stored together of potentially different types. Compare this to arrays, which is an aggregate of elements of the same type. Structs must be explicitly declared. We use very similar syntax in C and C0 to declare structs. For example, a struct representing a point in two-dimensional space (such as the location of a pixel in an image), could be declared in C and C0 as

struct point {
 int x;
 int y;
};

The expression sizeof(struct point) returns the number of bytes necessary to hold a struct point data type. As we work with pointers and structs in C, we must be very careful. For example, sizeof(struct point) returns the total number of bytes necessary to hold the x and y fields in the struct. But sizeof(struct point*) only returns the bytes necessary to hold an address. It is also important to note that

sizeof(t*)

is expected to return the same value regardless of type t. That is, sizeof(int*) and sizeof(char*) and sizeof(bool*) must be the same. Although this is the standard practice in many machines, it is still possible that there is a machine out there such that sizeof(int*) and sizeof(char*) may not be the same.

Strings

In C and C0, string is a sequence of characters. In C0, strings are immutable, which means we cannot modify a string once it has been constructed. However in C, strings are mutable. Also C strings are NUL ('\0') terminated. Here is an example of C0 and C strings.

string s ="me";                    <--- C0 code

char* s = malloc(strlen("me")+1);  <--- C code
strcpy(s, "me");

In C0, memory was implicitly allocated to hold the sequence of characters "me" (plus any overhead) and the string s was initialized to "me". In C, we had to explicitly allocate the string of length "me" (equals to 2) plus 1 extra character for the NUL char. The behavior of the program is undefined if less memory is allocated or NUL character missing. We also note that the strcpy function included in the <string.h> library will add the NUL character to end of the string "me". You can read more about C strings later in this tutorial.

C1 constructs

We now go over C1-specific constructions, namely function pointers and generic data types, and how they work in C.

Function pointers

Function pointers in C work very much the same way as in C1, with some added flexibility.

As in C1, we can define pointers to functions, function pointers. The function pointers can be stored in arrays, passed to other functions and can be returned from other functions. The notation to declare the type of a function pointer is like in C1. For example, a declaration such as

typedef int binop_fn(int x, int y);

declare the type binop_fn of functions that take to int inputs and return an int in output.

Getting function pointers

We don't allocate space for function pointers like we do for pointers to integers or structs; instead, we use the address-of operation to get the address of an existing function. Consider a simple sum function as

int sum(int x, int y) {return x+y;}

We declare the variable f of type binop_fn and and assign sum to it as

binop_fn *f = &sum;   

Since fn is a function pointer, we dereference the pointer as (*fn).

Using function pointers

Having access to function pointers in C makes it possible to do elegant things such as passing a function as an argument to another function. A good example of passing a function pointer to another function is the qsort function. This function takes a pointer to a function that allow it to compare the elements to be sorted. The type of a generic comparison functions is

typedef int compare_fn(const void *a, const void *b);

The qsort function prototype defined as

void qsort(void *base, size_t nmemb, size_t size, compare_fn *compar); 

takes a pointer to a compare function as its fourth argument. Here is an example of a compare function that can be used with qsort above.

int intcompare(const void* a, const void* b){
    int x = *((int*)a);   <--- cast void* to int* before dereference
    int y = *((int*)b);   <--- cast void* to int* before dereference
    return (x-y);
}

The intcompare function returns positive, zero or negative value based on content stored at a and b. The qsort function can then be used as follows.

int A[3] = {4,1,3};   <---- define a stack-allocated array of size 3 and initializes elements
qsort(A, 3, sizeof(int), &intcompare);  <--- calls the qsort function on array A

More details about how to use function pointer are given in the section on Function Pointers.

Generic data types in C

A useful feature of C is the ability to define a pointer to an arbitrary type. Like in C1, we call this void*. Defining a pointer to an arbitrary type allows us to develop generic C libraries without having to bind data to any particular type. The type void* can also be used for multiple purposes. Here we define a stack-allocated array of pointers and initialize each element to point to an array of ints.

#define n 3
int A[n]={3,1,2};
void* ptr[n];
for (int i=0; i<n; i++) {ptr[i]=&A[i];}

Now if we need to print the content of array A using the indirect references in ptr array, we must do the following.

for (int i=0; i<n; i++) {
  printf ("%d ", *((int*)ptr[i]));
}

Since we can never dereference a void*, we cast the pointer to an int* before dereferencing as an int. A longer discussion on Generic Data Types in C is given later.

Not in C0, but in C

There are many things you will learn about C programming in the future. For the purpose of this tutorial we will discuss few things such as C macros, header files, C pre_processor, standard IO, detecting memory leaks, and separate compilation that were not covered in C0 or C1.

C macros

A macro allows us to define expressions that is replaced by a text during pre-compilation process. A generic form of a macro substitution is

#define name replacement_text

Here is an example.

#define size 10

allows us to use size throughout our program. But size is actually replaced by 10 after the pre-processing stage. This accomplishes two things. One, it makes the code more readable. The second is that, because it is a just a text replacement, there is no overhead associated with macro substitutions (unlike defining a constant or variable). We can do other interesting things with macro substitutions. For example we can define a C macro for C0 alloc function as follow,

#define alloc(t) (xcalloc(1,sizeof(t)))

This allows us to use C0 construct alloc in our C code without having to convert all alloc statements to calloc since compiler will take care of that during pre-preprocessing stage. Similarly, we can use a macro substitution for C0 function alloc_array as

#define alloc_array(t,n) (xcalloc(n,sizeof(t)))

In both cases, after the pre-processing stage, any code with alloc(t) will be replaced by the corresponding macro definition. For example, here is how alloc looks in your C0 code and after conversion to xalloc in C code.

int* ptr = alloc(int);                <--- C0 code

int* ptr = (xcalloc(1,sizeof(int)));  <--- C code after macro substitution

C macros are a great way to improve the readability and manageability of your code.

header files

C has a rich collection of header files that can be included with your code. Here are some of the header files that may be of use.

#include <assert.h>     <--- useful for debugging programs
#include <stdio.h>      <--- include functions such as printf and scanf for IO
#include <stdlib.h>     <--- includes malloc, calloc, rand etc
#include <math.h>       <--- common math functions such as abs, sqrt
#include <string.h>     <--- string processing functions strlen, strcpy, strcmp
#include <stdbool.h>    <--- bool type that represents TRUE(1) and FALSE(0)
#include <limits.h>     <--- limits of data types INT_MAX, INT_MIN, CHAR_MAX etc
#include <time.h>       <--- time functions such as clock, difftime, time

You can also develop your own header files such as stack.h that can be included in your code as follows

#include "stack.h"

C pre_processing

A pre processor performs some important functions. Among them are macro substitutions, conditional compilations and inclusion of header files. The pre_processor directives begin with # and syntax of these lines are independent of the rest of the code. They can appear anywhere and have an effect that last until the end of the compilation unit. Here are some examples of pre_processor directives.

#include <stdlib.h>    <--- includes the standard library functions
#include "mylib.h"     <--- a header file of a library you have developed

#define max(x,y) ((x)>(y)?(x):(y))   <-- defines a function macro

#ifndef STACK_H       <--- avoids multiple inclusion of code
#define STACK_H
  ....
#endif

header files and separate compilation

Unlike in C0, your C code can be separated into two units. A C header file (.h) may contain variables, function prototypes and other identifiers. For example, stack.h file may define the interface to our stack library. Our stack.h header file may look like,

#ifndef STACK_H
#define STACK_H
stack stack_new(int n);
void stack_push(stack s, elem x);
elem stack_pop(stack s);
bool stack_empty(stack s);
bool stack_full(stack s);
#endif

We only define function prototypes and leave the implementation details to the file stack.c. One advantage of having a header file is that, you can provide a client the header file so code can be compiled without having the actual library code in stack.c available. For example, a function can include

#include "stack.h"
void foo(){
   //uses functions in stack.h
}

So foo can be separately compiled without having access to stack.c or main.c as follows.

gcc -Wall -Wextra -std=c99 -pedantic -Werror -c foo.c 

This generates the file foo.o and when foo is all debugged, we can combine the stack library (compiled into its object code file stack.o) with the main program to create the executable a.out as follows.

gcc -Wall -Wextra -std=c99 -pedantic -Werror -c stack.o foo.o main.c 

Standard IO

In C0, the conio library contained functions for performing basic console input and output. First we will show how to do the same in C. lets consider all the print functions we used in C0. Here are the C0 functions used to write to standard output (stdout)

void print(string s); /* print s to standard output */  <--- in C0
void println(string s); /* print s with trailing newline */
void printint(int i); /* print i to standard output */
void printbool(bool b); /* print b to standard output */
void printchar(char c); /* print c to standard output */

All of the above functions are now replaced by printf function given by the following prototype.

int printf (const char* format, ... );

Where the format is a string(char*) and three dots(...) indicates variable number of arguments. For example, a signed integer x can be printed in C0 and C as follows.

printint(x);     <--- in C0

printf("%d",x);  <--- in C

In the latter, x is printed as a signed decimal integer format %d. There are many other ways printf can be used as shown below.

printf("%d\n",x);  <--- print x and then a newline
printf("%s", s);   <--- print s as a string
printf("%c", 65);  <--- print ASCII 65 as character A
printf("%u", x);   <--- print x as an unsigned integer
printf("%x", 65);  <--- print 65 in hexadecimal
printf("%s + %d", "me", 65);  <--- prints the string "me + 65"

Reading from standard input (stdin) C provides the function scanf for reading from standard input. The scanf function prototype is given as

int scanf (const char* format, ... );

Here is an example of reading a signed integer in C

int x;
scanf("%d",&x);

Note that the second argument in scanf is the address of x. Since static memory for x is already assigned by the previous statement this works. **Caution:**Forgetting to write & in scanf will cause to program to display undefined behavior.

Here is another example of reading into a string.

string s = readline(); <--- reads a string in C0

scanf("%s",s);         <--- reads into C string s (char*)

Caution: Unlike in C0, we need to have memory pre-allocated for reading into s. If no memory or insufficient memory is allocated for s, the program can have undefined behavior. These statements in C also can cause "buffer overflow" a serious vulnerability exploited by hackers. We provide more about input/output in the section on File I/O.

Detailed Section

This section reiterates and expands some of the ideas that were discussed in the quick start guide. One can just focus on this section or use both quick section and the detailed section as needed. Some of the C topics we discuss here may have been addressed briefly before. We will start this section with an example.

An Example

Lets start with a simple C0 array example. Here is a C0 function that we wrote to compute the first n Fibonacci numbers.

int[] fib(int n) 
//@requires 2 <= n;
//@ensures length(\result) == n;
{
 int[] F = alloc_array(int, n);
 F[0] = 0;
 F[1] = 1;
 for (int i = 0; i < n-2; i++)
   F[i+2] = F[i+1] + F[i];
 return F;
}

To compile this code as a C program we use gnu C compiler or simply known as gcc. We recommend that you always use the following sequence of commands to compile C programs. The recommended command line sequence for compiling C code is as follows.

gcc -Wall -Wextra -std=c99 -pedantic -Werror <files...> 

if you are planning to use debug mode and use gdb debugger, then we recommend that you use extra parameters in your command line as follows

gcc -g -DDEBUG -ggdb -Wall -Wextra -std=c99 -pedantic -Werror <files...> 

There are few things that we need to understand about the above command line. The -Wall and -Wextra enable all warnings and when we add -Werror, those warning turns into errors. We also pedantically follow the c99 standard. Also you will need to add -c for separate compilation of code.

We name the file that contains fib function fib.c and compile with gcc (with -c option) as follows.

gcc -Wall -Wextra -std=c99 -pedantic -Werror -c fib.c

This creates the object file fib.o but not a.out. We note the following error.

fib.c:3: error: expected identifier or ( before a token

It tells us that there is an error in line 1. So lets look at line 1 of the program.

int[] fib(int n) {

The fib function is expected return an array. Unlike in C0, int[] is not a type in C and arrays and pointers in C are indistinguishable. An array in C can be regarded as a pointer to some data type. So in this case. we change the line 1 of the program as follows.

int* fib(int n) {

Lets compile again to see what happens. You will note the following errors as a result.

fib.c: In function fib:
fib.c:7: error: expected identifier or ( before a token
fib.c:8: error: F undeclared (first use in this function)
fib.c:8: error: (Each undeclared identifier is reported only once
fib.c:8: error: for each function it appears in.)

Unfortunately, the list of errors given above do not tell us much. So lets try to convert more of our C0 code to C. The first thing we note is how arrays are declared in C0. We use alloc_array(int,n) to define an array that can hold n integers. In C0, it keeps track of array type and its length(for contract checking).

int[] F = alloc_array(int, n);

How do we declare an equivalent arrays in C? There are many ways to declare and initialize C arrays. In particular there are heap-allocated arrays and stack-allocated arrays in C (all C0 arrays are heap-allocated). To define a stack-allocated array of size n, we write the following C code. You should read more about how to declare and initialize C arrays to learn more about this topic.

int F[n*sizeof(int)];  

The sizeof is an operator and sizeof(int) returns the number of bytes required to hold an integer an implementation dependent quantity. We use sizeof operator instead of assuming our implementation uses 32-bit integers. In this case our code should work fine even if our implementation represents ints as 32-bit or 64-bit (or even 128-bits) integers.

Now lets take a look at our program again after making the changes.

int* fib(int n) {
 int F[n*sizeof(int)];
 F[0] = 0;
 F[1] = 1;
 for (int i = 0; i < n-2; i++)
   F[i+2] = F[i+1] + F[i];
 return F;
}

Lets compile the code to see what happens now.

cc1: warnings being treated as errors
fib.c: In function fib:
fib.c:12: error: function returns address of local variable

the above output indicates that warnings are treated as errors. This underscores the importance of compiling your code with "safe" options. If we had compiled the code without the -Werror option as follows,

gcc -Wall -Wextra -std=c99 -pedantic -c fib.c

Then we would have seen the following.

fib.c: In function fib:
fib.c:10: warning: function returns address of local variable

Now it is just a warning. But it must be taken pretty seriously. We note that F is declared locally as a stack-allocated array. As a result, when you leave the scope of the function fib, this variable is no longer available to any calling program. Returning a reference to a local variable that no longer exists is a serious error. Unfortunate thing is that if you decide to ignore this warning, you will not notice the impact of this error until you start accessing this returned address in your calling program. So lets write a main program to use our fib function. We have annotated the C code to show you how C code here maps to equivalent C0 code.

#include <stdio.h>                 //equivalent to #use <conio>
int* fib(int);                     //a prototype declaration of a function that is referenced in main. 
int main(int argc, char* argv[]){  //argc and argv[] are command line arguments. 
  int* F = fib(10);                //fib returns a pointer to an int array. Assigned to F
  for (int i=0; i<10; i++)
    printf("%d ", F[i]);           //printing the values of the array 
  printf("\n");
  return 0;
}

Now lets compile fib.c and fibmain.c together to create the executable file a.out. Note again, we do not have the -Werror option (you should always have it. This is just an illustration of what happens if you are not careful)

gcc -Wall -Wextra -std=c99 -pedantic fib.c fibmain.c
cc1: warnings being treated as errors
fib.c: In function fib:
fib.c:10: warning: function returns address of local variable
fibmain.c: In function main:
fibmain.c:3: warning: unused parameter argc
fibmain.c:3: warning: unused parameter argv

Except for the warnings, everything looks fine to us. Now the next step is to run the program. We get the following output (not guaranteed that you will get the same output if you run this code)

[guna@unix2]$ ./a.out
0 0 0 0 -1188248704 32767 965242938 32752 16 48
[guna@unix2]$

Clearly, there is something seriously wrong with our code. No one wants to believe that we got the first 10 Fibonacci numbers. So where did we go wrong? This is the price we paid for ignoring the "warning: function returns address of local variable" and not compiling with the option -Werror. What our main program prints is some content from a location that may not be a valid location. So how do we fix the error?
We need to rewrite the fib function to return an address of a valid location. That is, instead of declaring a stack-allocated array F that is deallocated when the program leaves the function scope of fib, we declare a heap-allocated array and return the address to that location.

#include <stdlib.h>                  //need this library to use calloc
#include <assert.h>

int* fib(int n) {
 int* F = calloc(n,sizeof(int));     // replaces: int F[sizeof(int)*n];
 assert(F!=NULL);
 F[0] = 0;
 F[1] = 1;
 for (int i = 0; i < n-2; i++)
   F[i+2] = F[i+1] + F[i];
 return F;
}

Now let us compile all modules with the -Werror option as follows.

cc1: warnings being treated as errors
fibmain.c: In function main:
fibmain.c:3: error: unused parameter argc
fibmain.c:3: error: unused parameter argv

Only thing we get now is the error as a result of having unused command line arguments argv and argc. We can remove argc and argv from the main, since we did not use them this time.

instead of

int main(int argc, char* argv[])

you can use

int main()

compile and run the program using a.out to get the following output.

0 1 1 2 3 5 8 13 21 34

Now that seems like a nice output and we seem to have correctly produced the first 10 Fibonacci numbers. There are some few important concepts to note here though.
--- When we wrote

int F[sizeof(int)*n];

we allocated memory from the stack. This memory is no longer valid when we leave the scope of the variable.
--- When we wrote

int* F = calloc(n, sizeof(int));

we allocated memory from the heap. Since heap memory is not deallocated when you leave a function scope, we still have access to where the fib array is stored.

--- we note that

calloc(n,sizeof(int));  <-- c ways of allocating an array
alloc_array(int, n);    <-- c0 way of allocating the array

are equivalent. The first argument to calloc is the size of the array and the first argument to alloc_array requires type of an element
. We already note that C does not know the type of the elements you are planning to store in the array, but it only needs to know how much memory to allocate. It is possible to use the CMU developed library xalloc.h that contains the function xcalloc that not only returns a heap-allocated array, but also validates (check for NULL) the pointer. If you decide to use the standard function calloc then you must validate the pointer before dereferencing it.

So far you may have seen brief discussions on several topics. Now we plan to expand some of the topics with more details and examples. The following tutorial will cover

  1. C pointers
  2. Declaring, Initializing and Accessing C arrays
  3. Memory Allocation and Deallocation
  4. Using valgrind
  5. "contracts" in C
  6. C strings
  7. Generic Data Types in C
  8. Function Pointers
  9. File I/O
  10. Converting C0 code to C
  11. Common Errors
  12. C Idioms

    in some detail here. A stated this tutorial is only intended for those who are familiar with C0 language, a safe subset of C. Throughout this tutorial we refer to C0 constructs and how they look in C. We start with pointers.

C pointers

^ to index

Pointer is an address in the memory. We can declare a pointer as

int *ptr;  /* int* ptr , int * ptr , int *ptr  are all equivalent */

and initialize a pointer by allocating memory as follows. Allocate memory to hold a single integer.

ptr = (int*)calloc(1,sizeof(int))

Altogether, we can write the following to declare, allocate and initialize the ptr.

int *ptr = (int*)calloc(1,sizeof(int));

Now ptr points to a block of memory that is large enough to hold one integer. We hope by now, you understand the difference between pointer and "pointee". A pointer is an address of a memory location. "Pointee" is the content stored at that address. Meaning, if ptr is the pointer, then *ptr is the "pointee".

Lets say we need to initialize the "pointee". Here is how we do this.

*ptr = 10;

Passing a pointer to a function

Passing a pointer to a function as an argument is a great way to exploit the efficiency of passing addresses of variables or large structs into functions. Arguments to functions are always passed by value. That is, you provide a copy of the value of the variable to the function. Now that value can be a numerical value like 10 or an address like 0x00FFA100. In the latter case, we may provide direct access to the variable in the calling function through its address. Let us illustrate this concept further. We define two functions below. foo1 just takes a value of a variable and foo3 takes a value of a pointer variable.

void foo1(int x);  <--- prototype of a function
void foo2(int* ptr);  <---- prototype of a function

int x=10;
foo1(x);    <--- passes a copy of 10 into function foo1
foo2(&x);   <--- passes a copy of the address of x into foo2. 

in the first case, foo1 only gets a copy of x. It cannot change the value of x. In the second case, foo2 gets a copy of the address of x, so it can dereference and change the content of x.

Passing the address of a variable is a great way to improve the efficiency of your program. For example, if the "size" of the variable is "large", and if you pass a copy of the variable, this can lead to unnecessarily copying data during function execution. We can reduce this overhead by passing a copy of an address (or a pointer) instead of the copy of the variable. Here is an example. Assume that we have declared the following struct in our code.

struct large_recd {
   int buf_size;
   int buffer[buf_size];
   int capacity;   
} 

The point here is that it is better to pass a pointer to this struct, as struct large_recd* instead of a copy of struct large_recd itself. Passing a pointer to a function is efficient, but it can also be dangerous as you may allow function to directly access content of a calling program variable. This can lead to unintended consequences if you are not careful.

We illustrate few ways of passing variables into functions using following examples. We will define 4 versions of a function as follows. Each function is supposed to take a pointer and change the content of the "pointee".

void foo1(int* ptr) {*ptr = 10;}
void foo2(const int* ptr) {*ptr=10;}
void foo3(int* const ptr) {*ptr=10;}
void foo4(const int* const ptr) {*ptr=10;}

All 4 functions seem to do the same thing, except that we pass the argument ptr in 4 different ways.
In foo1, argument ptr is passed and any changes to *ptr within the loop body affects the value of the *ptr in the calling function.

In foo2, argument ptr is passed, but as a const int*. This means that *ptr cannot be changed inside the function body. Hence

*ptr=10;        <-- is illegal /*will be flagged by the compiler*/
x = *ptr + 1;   <-- is legal

This is an important thing to remember. That is, you can pass a pointer ptr to a function without having to worry about *ptr being changed by the function. This allows us to efficiently pass an address of a variable (or large struct) to a function while keeping the original content secure.

In foo3, argument ptr is passed as a int* const. This means that ptr cannot be changed, but *ptr can be changed. So in this case,

int x=10; ptr = &x;  <-- is illegal; 
*ptr=x;              <-- is legal. 

A good example of a pointer that cannot be changed is an array in C. When we declare arrays in C

int A[n];

we can think of A as a pointer to the beginning of the memory where array is stored. So we can access and change A[i] which is equivalent to *(A+i). However, we cannot change A for example by assigning

int x=10;
int A[n];
A = &x;

foo4 is a combination of foo2 and foo3 actions. That is both ptr and *ptr are constants and cannot be changed by the function meaning both ptr = &x; *ptr=x; are illegal.

Declaring, Initializing and Accessing C Arrays

^ to index

There are many ways to declare and initialize arrays in C. First we start with the simple array declaration and initializing statement in C0. This line means allocating an array of n ints called F. C0 always allocate arrays from the heap.

int[] F = alloc_array(int, n);

However C arrays can be allocated from heap or from stack. We show below, two standard ways of allocating an array of capacity = sizeof(int)*n in C

int F[sizeof(int)*n];              //a stack-allocated array of capacity = sizeof(int)*n
int* F = calloc(n, sizeof(int));   //a heap-allocated array of capacity = sizeof(int)*n
int* F = xcalloc(n, sizeof(int));  //a heap-allocated "validated" array of capacity = sizeof(int)*n

Although they do the same thing (that is, allocate an array), there are some subtle differences in ways the memory is allocated. In the first case, we call it a "stack-allocated array" because the memory is allocated from the stack. The key here is that when F leaves its program scope, the memory is no longer available to the program.

{
 int F[sizeof(int)*n];             //a stack-allocated array of capacity = sizeof(int)*n
 ......
}
 -- memory allocated for F is no longer available here outside of { ... }

In the second case, we call it a "heap-allocated array" because the memory is allocated from the heap. When you no longer need this memory, you must call free to deallocate this memory manually. You can read more about allocation and deallocation of memory below.

{
 int* F = calloc(n,sizeof(int));   //a heap-allocated array of capacity = sizeof(int)*n
 ......
}
 -- memory allocated for F is still available here outside of { ... }. But memory must be accessed through 
    a valid variable that contains a reference to F

In general here are number of ways to declare and initialize arrays in C.

 int F[n];                         //a stack-allocated array of size n
 int F[] ={1,2,3,4,5};             //a stack-allocated array initialized to list {1,2,3,4,5}
 int* F = calloc(n,sizeof(int));   //a heap-allocated array of capacity = sizeof(int)*n
 int* F = malloc(n*sizeof(int));   //a heap-allocated array using malloc, but unlike calloc, 
                                     array is NOT initialized to 0
 int* F = xcalloc(n,sizeof(int));  //a heap-allocated array with a validated pointer (same as calloc above)
 int* F = xmalloc(n*sizeof(int));  //a heap-allocated array with a validated pointer (same as malloc above)

Arrays and Pointers

Are arrays and pointers are the same in C? So lets take a look at few examples.

int *x;
int y[n];

The first declares x as a pointer to an integer and second statement declares y as an array of integers of size n. Are x and y equivalent? Yes and no.

For example it is legal to say

x = y;   

but it is not legal to say

y = x;  

Since the array y stores a constant pointer, it is illegal to change y as illustrated in foo3 function above.

In general, when we make an assignment such as

a = b;

the symbol a in the context means the address of the location a represent (we call this the l-value). The symbol b means, actual content of the address that b represents (we call this the r-value). An l-value is known at the compile time, but r-value may not be known until the run time.

in general, there are few similarities and differences between arrays and pointers.

  1. A pointer holds the address of data AND an array holds the data itself
  2. A pointer can be allocated a dynamic block of memory at run time AND array sizes are fixed at compile time.
  3. A pointer is commonly used to represent a dynamic data structure (eg: dynamic array) AND array data structure holds fixed number of elements.
  4. A pointer points to anonymous data AND array points to known data (eg: array of ints)
  5. If A is an array and ptr is a pointer, then sizeof(ptr) returns the amount of memory needed to hold an address while sizeof(A) returns the total bytes allocated for the array A
  6. When passing an array A to a function, we are only passing a pointer to the array. The function is not aware of the sizeof(A) and hence one must always pass the length of A as a function argument.

In general arrays and pointers can be used interchangeably, but you need to be aware of what is allowed and what is not.

Dynamically allocating and Deallocating Memory in C

^ to index

In C0, you did not worry about the garbage collection. When heap memory was allocated using alloc or alloc_array, we could count on C0 compiler to clean up the mess when we are done with the heap memory. C0 returns a block of heap memory when you call alloc or alloc_array and it can keep track of your memory allocations and clean them up as needed. But C is quite different. C expect you to clean up your mess. The golden rule in C is "If you allocate it, you must deallocate it". Lets take a look at some of the functions that can be helpful in allocating and deallocating memory in C.
If you type the command "man malloc" in a unix shell, you are likely to see an output similar to the following.

MALLOC(3)                  Linux Programmerâs Manual                 MALLOC(3)

NAME
       calloc,  malloc,  free,  realloc  - Allocate and free
       dynamic memory

SYNOPSIS
       #include <stdlib.h>

       void *calloc(size_t nmemb, size_t size);
       void *malloc(size_t size);
       void free(void *ptr);
       void *realloc(void *ptr, size_t size);

<stdlib.h> gives you four functions, calloc, malloc, free and realloc that allows you to manage heap memory.
Lets take a look at how each function works.

void* ptr = calloc(n, sizeof(int));

allocates a contiguous block of memory (or an array) that is enough to hold n ints. By default, all locations in the array are initialized on 0. We also note that the function returns a void*, that is, a pointer to nothing. In order to use this pointer, you must cast this pointer to whatever type you want. Then only you can start to write something into this array. For example, we can cast the block as an array of ints and then allocate new ints into this array.

int* ptr = (int*)calloc(n, sizeof(int));
for (int i=0; i<n; i++)
   ptr[i] = i;           

We note that if ptr is a type void*, then it is illegal to write ptr[i]=i since ptr[i] would mean that we have to dereference a void*. If you are interested in knowing why, please refer to the section on generic pointers.

We also note that a C array is a contiguous block of bytes. Unlike C0 arrays, C arrays have no idea, how long they are. So it is programmers responsibility to keep track of the length and pass them to functions as needed.

The function malloc is equivalent to calloc, except that malloc does not initialize memory block to 0. If you are really curious, the glibc malloc uses it own data structure to keep track of allocated memory. When it runs out it asks the kernel for more. In some machines malloc asks for 1MB at a time. Here is an example of malloc use. In fact, both calloc and realloc are specialized versions of malloc and free.

int* ptr = (int*)calloc(n, sizeof(int));
and
int* ptr = (int*)malloc(n*sizeof(int));      

are equivalent except for the fact that calloc does initialize locations to zero.

The function free is a very important function. Remember the golden rule? "if you allocate, then you must deallocate". So we do so by using the function free. In the example below, we allocate a block of memory for the pointer ptr and then later in the program, call free on the ptr to free the block of memory associated with it.

void* ptr = calloc(n, sizeof(int));
.....
free(ptr);               

Although this seems fine, there are many things that can go wrong in the above code. You can free only the block allocated with calloc (or malloc). However, in order to free the memory you must retain a pointer to that location. That is, if the pointer changed after allocating memory and before calling free, you might be in for a big surprise. Lets look at the following example.

int* ptr = (int*)calloc(n, sizeof(int));
int x = 10;
ptr = &x;
free(ptr);               

What is wrong with above code? Everything seems legal. We allocate a block of memory, we declare a variable, and make ptr points to the address of the variable x, then we try to free it. There are many things wrong with this code.

  1. free(ptr) is trying to deallocate stack memory. This is not your job. It is compilers
  2. since you make the ptr points to the address of x, you lost a reference to the block you allocated. We call this a "memory leak". Because if there is no way of getting there, then there is no way of freeing it. So we leave garbage that we don't have a way to clean. Leaving too much garbage can have a serious affect on your program performance

So there are few important things that you need to know about free.

  1. You can only free memory allocated with a call to malloc, calloc or realloc.
  2. You need to know a pointer to the "beginning" of the memory block to free it. That is, you cannot free part of the memory that was allocated. You must free it all.

The only function left to discuss is the realloc. Realloc also allocates memory as calloc and malloc. But realloc can resize a block of memory that was allocated before. For example, the following code shows that a block of n ints can be doubled using realloc.

void* ptr = calloc(n, sizeof(int));
ptr = realloc(ptr, sizeof(int)*n*2);            

The above code can resize the block, expand or shrink, as needed. There is no guarantee that it will retain the original address, but it does return a pointer to the new block.

Using valgrind to detect memory leaks

^ to index

There are tools that detects and reports memory leaks. The most widely used tool is called “valgrind”. The Valgrind home page is at http://www.valgrind.org and you can find many resources on Valgrind there. To learn more about valgrind on unix, type

% man valgrind 

and you will see something as follows

VALGRIND(1)                      Release 3.6.0                     VALGRIND(1)

NAME
       valgrind - a suite of tools for debugging and profiling
       programs

SYNOPSIS
       valgrind [valgrind-options] [your-program]
                [your-program-options]

DESCRIPTION
       Valgrind is a flexible program for debugging and profiling
       Linux executables. It consists of a core, which provides a
       synthetic CPU in software, and a series of debugging and
       profiling tools. The architecture is modular, so that new
       tools can be created easily and without disturbing the
       existing structure.

       Some of the options described below work with all Valgrind
       tools, and some only work with a few or one. The section
       MEMCHECK OPTIONS and those below it describe tool-specific
       options.
       ...ctd

valgrind is a set of tools for debugging and profiling programs.

INVOCATION

valgrind is typically invoked as follows:

valgrind program args

This runs program (with arguments args) under valgrind using the memcheck tool. memcheck performs a range of memory checking functions, including detecting accesses to uninitialized memory, misuse of allocated memory (double frees, access after free, etc.) and detecting memory leaks.

To use a different tool, use the --tool option:

valgrind --tool=toolname program args

To use valgrind on linux, first compile your code as

% gcc -Wall -Wextra -std=c99 -pedantic -Werror <files...> 

Then run the code with valgrind enabled as follows

% valgrind --tool=memcheck --leak-check=full ./a.out 

In addition to memcheck, valgrind has many other tools to check the use of functions, cache events etc. For now, we are only interested in making sure our programs don’t leak memory. Lets consider an example where program memory leaks occur. We detect the leaks using valgrind. Lets look at this code.

#include <stdlib.h>
#include <assert.h>
int main(){
  int* ptr = malloc(10);
  assert(ptr!=NULL);
  /* some code here */
  ptr = NULL;
  return 0;
}

Close inspection of the above code will inform you why this code leaks memory. But let us compile and run with valgrind to detect any memory leaks.

gcc -Wall -Wextra -std=c99 -pedantic -Werror filename.c
valgrind --tool=memcheck --leak-check=full ./a.out

here is the valgrind output we get. It indicates that 10 bytes of memory is definitely lost.

[guna@unix1 ~]$ valgrind --tool=memcheck --leak-check=full ./a.out
==7826== Memcheck, a memory error detector
==7826== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==7826== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==7826== Command: ./a.out
==7826== 
==7826== 
==7826== HEAP SUMMARY:
==7826==     in use at exit: 10 bytes in 1 blocks
==7826==   total heap usage: 1 allocs, 0 frees, 10 bytes allocated
==7826== 
==7826== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==7826==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==7826==    by 0x4004D5: main (in /afs/andrew.cmu.edu/usr20/guna/a.out)
==7826== 
==7826== LEAK SUMMARY:
==7826==    definitely lost: 10 bytes in 1 blocks
==7826==    indirectly lost: 0 bytes in 0 blocks
==7826==      possibly lost: 0 bytes in 0 blocks
==7826==    still reachable: 0 bytes in 0 blocks
==7826==         suppressed: 0 bytes in 0 blocks
==7826== 
==7826== For counts of detected and suppressed errors, rerun with: -v
==7826== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)

Now how would you avoid any memory leaks? To avoid this memory leak, we should have properly deallocated the heap memory before exit. For example we should free the ptr memory before exiting as follows.

#include <stdlib.h>
#include <assert.h>
int main(){
  int* ptr = malloc(10);
  assert(ptr!=NULL);
  /* some code here */
  free(ptr);
  ptr = NULL;
  return 0;
}

Note that just making ptr=NULL does not free the memory. It only set the value of the ptr variable to NULL without having any impact on the memory ptr is pointing to.

After adding free(ptr) we compile and run the code with valgrind to get the following valgrind report.

==8268== Memcheck, a memory error detector
==8268== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==8268== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==8268== Command: ./a.out
==8268== 
==8268== 
==8268== HEAP SUMMARY:
==8268==     in use at exit: 0 bytes in 0 blocks
==8268==   total heap usage: 1 allocs, 1 frees, 10 bytes allocated
==8268== 
==8268== All heap blocks were freed -- no leaks are possible
==8268== 
==8268== For counts of detected and suppressed errors, rerun with: -v
==8268== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

We have a clean report now. The above report indicates that all heap memory were freed.

Strings in C

^ to index

In C0 and C, Strings are sequences of characters. To mention strings in program they must be enclosed in double quotes, such as: "Hello World!\n". \n is a newline character, written as a so-called escape sequence. There is an empty string, written as "" which consists of 0 characters. C0 also has a string library that can be used to perform many string operations.

In C0, strings are immutable, meaning we cannot modify a string once it has been constructed. However, C strings that are represented as null terminated array of characters are "mutable". That is one can easily replace one character of a string with another. Here is an example of creating a new string, allocating memory and mutating the string.

#include <strings.h>                   //contains strcpy and strlen functions among others
...
char* ptr = (char*)malloc(strlen("guna")+1); 
                                       //allocate memory to hold the string + one extra for null char
strcpy(ptr,"guna");                    //copy the characters of "guna" to newly allocated location
ptr[0]='l';                            //replace 'g' with 'l' and so new string is "luna"

You may have few questions about the code we presented above.

  1. we allocated 1 extra character for the string ending null character, but did not explicitly write that into the string -- Actually we did that when we called strcpy function that automatically places null character at the end.
  2. However, if you did not use the strcpy function and manually copied characters in string, the you must explicitly add the null character as shown below. Failure to do so may result in serious errors.
char* ptr = (char*)malloc(strlen("guna")+1); 
ptr[0]='g';
ptr[1]='u';
ptr[2]='n';
ptr[2]='a';
ptr[4]='\\0';

We suggest that you familiarize yourself with <strings.h> header file that contains many of the string library functions provided to you. Here is a comparison between C0 string functions and how to do the same in C.

int string_length(string s);  <-- in C0 /* number of characters in s */
int strlen(char* s);          <-- in C /* number of characters in s */
string string_join(string a, string b); <-- in C0 /* concatenate a and b */
char* strcat(char* a, char* b);         <-- in C
  -- append b to a. That is,  a <- a+b
  -- important to make sure sufficient memory has been allocated for a to hold both a and b
string string_sub(string a, int start, int end)  /* substring a[start..end) */
  //@requires 0 <= start && start <= end && end <= string_length(a);
  ;
no equivalent function in C
/* lexicographic comparison; less than (-1), equal (0), greater than (1) */
int string_compare(string a, string b)    <-- in C0
  //@ensures -1 <= \result && \result <= 1;
  ;
int strcmp ( const char * a, const char * b ); <-- in C
string string_fromint(int i);
string string_frombool(bool b);
string string_fromchar(char c)
  //@requires c != '\0';
  ;
-- no equivalent functions in C

more details on C strings libraries can be found at http://www.cplusplus.com/reference/clibrary/cstring/

Function Pointers

^ to index

Function Pointers provide an extremely interesting, efficient and elegant programming technique. You can use them to replace switch/if-statements, and to realize late-binding. Late binding refers to deciding the proper function during runtime instead of compile time. Unfortunately function pointers have complicated syntax and therefore are not widely used. If at all, they are addressed quite briefly and superficially in textbooks. They are less error prone than normal pointers because you will never allocate or deallocate memory with them.

What is a Function Pointer?

Function Pointers are pointers, i.e. variables, which point to the address of a function. we can think of this value of a function pointer as the starting address of where the function is stored in working memory. It is only important how the compiler and processor interpret the memory the function pointer points to. A function can take many types of arguments including the address of another function. This can be an extremely elegant way to bind a function to a specific algorithm at run time. For example, let us assume that we plan to use a sorting algorithm in one of our functions. At the compile time, we are not sure which sorting algorithm to use since we may not know the size of the data set n. For example, if n < 100 we may use something like insertion sort, or if n is large, then we may decide to use something like quicksort. C provides an interesting way to achieve this by allowing the programmer to decide the algorithm at run time.

Perhaps the best way to introduce function pointers is to introduce you to unix native function qsort. The qsort function prototype is given by
void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *, const void *)); 

We note that the qsort function takes 4 arguments.

  1. void* base ---> base is the starting address of the array you are trying to sort with qsort
  2. size_t nmemb ---> nmemb represents the number of elements in the array (size_t is equivalent to long)
  3. size_t size ---> size refers to the size of a single element in the array (like an int)
  4. int(*compar)(const void *, const void *) ---> is perhaps the trickiest part to understand. That is, the 4th argument to qsort function is a function pointer variable. It is called compar because it is intended to be written by the user to define how to sort the array.

We give below an example of how to use qsort

int compare(const void* x, const void* y){
   int a = *((int*)x);   // return the int value pointed to by x
   int b = *((int*)y);   // return the int value pointed to by y
   return (b-a);         //return > 0 if b>a, 0 if b=a and < 0 is b<a
}

int A[]={5,2,1,3};
qsort(A,4,sizeof(int),compare);  //sorts the array A and return {1,2,3,5}

Defining a Function Pointer

A specific function pointer variable can be defined as follows.

   int  (*fn)(int,int) ;

Here we define a function pointer fn, that can be initialized to any function that takes two integer arguments and return an integer. Here is an example of such a function

    int sum(int x, int y) {  
      return (x+y); 
    }

Now to initialize fn to the address of the sum, we can do the following.

fn = &sum;   /* make fn points to the address of sum */

So we use the sum function in two ways.

int  x = sum(10,12);  /* direct call to the function  */
           or
 int x = (*fn)(12,10);  /* call to the function through a pointer */

An array of function pointers

When implementing VM, our code may have to call a specific function to execute an instruction. For example, you may use call a function native_print whose signature (list of arguments and return type) include an input array of void*'s and returning a void*. A prototype of native_print is as given below.

void *native_print(void **args); 

You may have a number of similar functions that needs to be called at different times. How would we represent and access all these functions conveniently? Since all such functions have similar prototype declarations, we can start by defining a "type" to represent any such function. For example, we define a typedef native_fn as follows.

typedef void *(*native_fn)(void **);

It is somewhat harder to understand above syntax for typedefing a function pointer, but this is the best way to keep all the confusing syntax out of your code. You typedef your function type once and you can use it in multiple context.
Now lets define an array of such function pointers as follows.

native_fn native_function_table[NATIVE_FUNCTION_COUNT]

where native_function_table is an array of native_fn types that happens to be a function pointer to functions such as

void *native_print(void **args); 

So for example, the pointer to the function native_print can be stored at the i-th index of the native_function_table as native_function_table[i]=&native_print
Above example is an illustration of how to use function pointers in an extremely elegant way. From an array of function pointers, when different functions needs to be called(say from an eval function), it makes it easy to just call native_function_table[i] as needed. We hope you will be able to experience functions pointers in the VM assignment and practice some good uses of it.

Generic Data Types

^ to index

C0 does not have a way to represent "typeless" data. However in C, we can work with type void* to represent arbitrary types. This allows us to develop generic libraries and avoid binding data to a specific type until a client defines the data type. A good example is the development of a generic stack data type. In C0, we defined the elem type and allowed the client to define what elem is. However, the problem was that there can only be one elem type in a particular context. So if we need a stack of ints and a stack of queues, there is no easy way to code this in C0. But we can find some relief in C. Lets take a look how we would do this in C. We defined stack interface in class. We convert the C0 interface code to C code as follows.

typedef void* elem;  <-- changed from int/string to void*
// there is no bool type in C, so #include <stdbool.h>
/* Interface to stacks of elems */
typedef struct stack_header* stack;
bool stack_empty(stack S);	/* O(1) */
stack stack_new();		/* O(1) */
void push(stack S, elem e);	/* O(1) */
elem pop(stack S)		/* O(1) */
//@requires !stack_empty(S);
int stack_size(stack S);	/* O(1) */

The elements in this stack are void*'s. The void* can be cast as an int (assuming sizeof(int) = sizeof(void*)) or another pointer. Now lets take a look at how a client may use this stack library to define a stack of strings and a stack of queues.

typedef void* elem;
/* stack of ints */
stack string_stack = stack_new();
push(int_stack, (elem)s1);   <-- assume that s1 is a properly allocated C string
push(int_stack, (elem)s2);   <-- assume that s2 is a properly allocated C string
/* stack of queues */
stack queue_stack = stack_new();
queue Q = new_queue();  <-- Q is really a pointer
push(queue_stack, (elem)Q);

Dealing with void*'s

You will have to be very careful when dealing with void*'s. Memory allocation functions such as malloc, calloc, realloc (or in our safe libraries xmalloc and xcalloc) returns an address of an arbitrary type (that is a void*). Although not required, it is a good practice to cast this pointer to the correct type. For example, you can dynamically allocate memory for one integer and cast the void* pointer to an int* as follows.

int* ptr = (int*)malloc(sizeof(int));

One thing you are not allowed to do is to dereference a void*. For example, the following statement would cause the program to raise an exception.

void* ptr = malloc(sizeof(int));
*ptr = 10;

Also be extremely careful when casting integers into pointers (int to void*) as sizes may be different for each type.

File IO in C

^ to index

Input and Output

Input/output(I/O) is fundamental to any programming language. Most programming languages provide libraries that can be used to do I/O operations. In C, I/O operations can be performed by using stdio.h. You are encouraged to type man stdio.h at the Unix prompt to read about the standard buffered input/output routines.

In C, files are treated as streams of data that can come from terminal, hard drive or from an external storage device such as a tape or a disk. Files can be read as ASCII files or can be treated as binary files. An ASCII file is an array of bytes where each byte is represented using a character in the ASCII table. A binary file is a sequence of bytes and some examples of binary files are sound and image files. To process standard input and output you can use the following references.

STDIN – refers to Standard input device (typically a keyboard)
STDOUT – defines standard output device (typically a terminal)
STDERR – defines a device where the errors are recorded.

There are four functions (from stdio.h) that can be used for I/O

 #include <stdio.h>
 int scanf(const char *format, ...);
 int fscanf(FILE *stream, const char *format, ...);

 int printf(const char *format, ...);
 int fprintf(FILE *stream, const char *format, ...);

fprintf is used for formatted output while fscanf is used for formatted input. Both functions take variable number of arguments.
-- FILE* stream: stream is a pointer to a FILE type. File must be opened before calling this or use STDIN or STDOUT
-- char *format : format is a pointer to a character or a C string. It contains the formatted string for input and output.
-- The third argument denoted by ... is a variable length argument in C. This means that any number of variables can be included there
Here are some examples to illustrate the use of IO functions.

int n;
fscanf(STDIN,"%d", &n); // read a value into the variable n, equivalent to  scanf("%d", &n);

In the above example, an integer is read from STDIN and assign to variable n. One important thing to note is that the third argument in the function call is &n, indicating the address of the variable n.

int n=10;
fprintf(STDOUT,"%d", n); // print a value to standard output, equivalent to  printf("%d", n);

unlike in fscanf we do not provide an address to read the data into. In fprintf only the reference to variable is needed to print the values.

opening, processing and closing files

A file is a data stream. The first part of reading from a file is to open a pointer to that data stream. For example;

FILE* fp; // defines a pointer to any file (input/output/text/binary) 

To open a file for reading we simply use the fopen function defined in stdio.

fp = fopen(filename, “r”);

This indicates that the file with the filename(a string) is open for read only. We can combine the two statements for example by writing

FILE* fp = fopen(“data.txt”, “r”);

Available file opening modes are “r” = read, “w” = write and “a” = append. Some systems may require binary files to open with “b” mode. So you may write:

FILE* fp = fopen(“data.bin”, “rb”); 

If the file cannot be open fopen will return NULL. You need to check this before starting to read data from the file.

FILE* fp = fopen(“data.txt”, “r”);
if (fp!=NULL) {...process the file...}

All files must be closed. For example we can close the file associated with fp by calling,

fclose(fp);

Contracts using assert and macros

^ to index

C0 is a safe subset of C that is designed to help develop safe programs with checks and balances. It allows us to use contracts such as //@requires, //@ensures and //@loop_invariant to test our code. However, C does not have these contract features or ways to test them using -d flag. But we'd like to continue the good habit of developing "safe" C programs whenever possible, especially at the debugging stage. When writing C programs we follow a similar style, but using C macros.
A macro is a fragment of a C code that is given a special name. Whenever this special name is used, it replaces the name with the code fragment. Here is an example of a C macro.

#define BUFFER_SIZE 1024
for (int i=0; i<BUFFER_SIZE; i++) {....}

Important thing to note here is that the above macro (we call it an object like macro) starts with a # indicating it should be pre-processed (like #include <stdio.h>). In this case, C pre-processor will replace ALL occurrences of BUFFER_SIZE in the code with the value 1024. This makes the code more "readable" and "manageable". However, it does not carry the overhead of declaring a specific variable and initialization. Now that we have "some" understanding of basic macros, we can define #ASSERT, #REQUIRES and #ENSURES macros as follows.

#ifdef DEBUG
 #define ASSERT(COND) assert(COND)
 #define REQUIRES(COND) assert(COND)
 #define ENSURES(COND) assert(COND)
#else
 #define ASSERT(COND) ((void)0)
 #define REQUIRES(COND) ((void)0)
 #define ENSURES(COND) ((void)0)
#endif

It is a good idea to include this code with all your C programs. Using #ifdef statements, debugging can be enabled during code testing. A good way to include this code with any C project is to create a header file (or .h file) called contracts.h and include the above lines there. The header file contracts.h will be included with your C programming projects in this course.
Lets take a look at how we might use this code in an assignment. Suppose we write a function string_reverse that takes a valid string and returns a string that is reversed its characters. We note that in C a string is an array of characters (ending with null) and it is treated a pointer to a char, char*. Here is the code.

char* string_reverse(char* str, int length) {
  REQUIRES(length>0);
  REQUIRES(str!=NULL);
  REQUIRES(strlen(str)==length);  //strlen is a function in <string.h>
  for (int i=0;i<length;i++){  //swap the array to reverse it
     ASSERT(0<= i && i < length);  // just for fun.
     char first = str[i];          //just to check if swap works        
     char last = str[length-i-1];  //just to check if swap works       

     char ch = str[i];
     str[i] = str[length-i-1];
     str[length-i-1]=ch;
     ASSERT(str[i]==last && str[length-i-1]==first); //checks if properly swapped
  }
  ENSURES(str!=NULL && strlen(str)==length);
  return str;
}

There are few things to note here. The REQUIRES and ENSURES are just macro replacements that are included within the function body {...}. We do not have a way to check loop_invariants in C, but still ASSERTS can be included within the loop to check expected conditions.

converting some C0 code

^ to index

In this section, we will consider converting some code we wrote in C0 into C. In the process we will discuss all the things that are similar, that are different and where you have to be careful. .

Example 1: binary search

Here is the correct binary search function we wrote in class.

int binsearch(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,n);
/*@ensures (-1 == \result && !is_in(x, A, n))
|| ((0 <= \result && \result < n) && A[\result] == x);
@*/
{ int lower = 0;
  int upper = n;
  while (lower < upper)
  //@loop_invariant 0 <= lower && lower <= upper && upper <= n;
  //@loop_invariant (lower == 0 || A[lower-1] < x);
  //@loop_invariant (upper == n || A[upper] > x);
   { int mid = lower + (upper-lower)/2;
     //@assert lower <= mid && mid < upper;
     if (A[mid] == x) return mid;
     else if (A[mid] < x) lower = mid+1;
     else /*@assert(A[mid] > x);@*/
     upper = mid;
}
  return -1;
}

Here is the equivalent C version.

int binsearch(int x, int* A, int length, int n)
{
 REQUIRES (0 <= n && n <= length);        <--- changed to use REQUIRES macro
 REQUIRES (is_sorted(A,n));               <--- changed to use REQUIRES macro
/*@ensures (-1 == \result && !is_in(x, A, n))
|| ((0 <= \result && \result < n) && A[\result] == x);  <--- left as a C comment
@*/
{ int lower = 0;
  int upper = n;
  while (lower < upper)
  //@loop_invariant 0 <= lower && lower <= upper && upper <= n;  <-- left as a C comment
  //@loop_invariant (lower == 0 || A[lower-1] < x);              <-- left as a C comment
  //@loop_invariant (upper == n || A[upper] > x);                <-- left as a C comment
   { int mid = lower + (upper-lower)/2;
     ASSERT(lower <= mid && mid < upper);                      <-- changed to use ASSERT macro
     if (A[mid] == x) return mid;
     else if (A[mid] < x) lower = mid+1;
     else /*@assert(A[mid] > x);@*/
     upper = mid;
}
  return -1;
}

There are few changes to note here.

int binsearch(int x, int[] A, int n)   <--- C0 version

int binsearch(int x, int* A, int length, int n)   <--- C version

The difference is that int[] is not a type in C and instead we replace int[] in C0 as int* in C. We also note that once an array is passed to a function as a pointer, it has no knowledge of its length. Hence we need to pass length also as an argument and assure that n is less or equal to length. C does not have contracts. But to continue our good practice of writing contracts using the macros REQUIRES and ASSERTS and in some cases ENSURES. In this code //@requires was replaced by our macro REQUIRES, //@assert was replaced by macro ASSERT and loop invariant and //@ensures statements were left as C comments.

Example 2: partition in quicksort

We discussed the recursive quicksort in lecture and the partition function is a key component of the quicksort function. Here is the C0 version of partition.

int partition(int[] A, int lower, int pivot_index, int upper)
//@requires 0 <= lower && lower <= pivot_index;
//@requires pivot_index < upper && upper <= \length(A);
//@ensures lower <= \result && \result < upper;
//@ensures ge_seg(A[\result], A, lower, \result);
//@ensures le_seg(A[\result], A, \result+1, upper);
{
 int pivot = A[pivot_index];
 swap(A, pivot_index, upper-1);
 int left = lower;
 int right = upper-2;
 while (left <= right)
 //@loop_invariant lower <= left && left <= right+1 && right+1 < upper;
 //@loop_invariant ge_seg(pivot, A, lower, left);
 //@loop_invariant le_seg(pivot, A, right+1, upper-1);
 {
  if (A[left] <= pivot) {
  left++;
  } 
  else { //@assert A[left] > pivot;
         swap(A, left, right);
         right--;
  }
}
 //@assert left == right+1;
 //@assert A[upper-1] == pivot;
 swap(A, left, upper-1);
 return left;
}

We convert the C0 code to C as follows (assuming a C version of swap is defined elsewhere)

int partition(int* A, int length, int lower, int pivot_index, int upper)
{
  REQUIRES(0 <= lower && lower <= pivot_index);       <--- replaced with REQUIRES macro
  REQUIRES( pivot_index < upper && upper <= length);  <--- replaced with REQUIRES macro
 
 int pivot = A[pivot_index];
 swap(A, pivot_index, upper-1);   <--- assumes the existence of C swap function
 int left = lower;
 int right = upper-2;
 while (left <= right)
 //@loop_invariant lower <= left && left <= right+1 && right+1 < upper;   <--- left as a C comment
 //@loop_invariant ge_seg(pivot, A, lower, left);                         <--- left as a C comment
 //@loop_invariant le_seg(pivot, A, right+1, upper-1);                    <--- left as a C comment
 {
  if (A[left] <= pivot) {
   left++;
  } 
  else { 
     ASSERT(A[left] > pivot);
     swap(A, left, right);   <--- assumes the existence of C swap function
     right--;
  }
}
 ASSERT(left == right+1);
 ASSERT(A[upper-1] == pivot);
 swap(A, left, upper-1);   <--- assumes the existence of C swap function

 ENSURES(lower <= left && left < upper);           <--- ensures can only be checked here with macro
 ENSURES(ge_seg(A[left], A, lower, left);          <--- \result is replaced by left, the return value
 ENSURES(le_seg(A[left], A, left+1, upper);        <--- assumes the functions ge_seg and le_seg
 return left;
}

Example 3: implementing binary search trees

In this example, we will discuss how to convert some of our binary search tree(bst) C0 code to C. Lets start with the tree_node declaration in C0.

struct tree_node {
 elem data;
 struct tree_node* left;
 struct tree_node* right;
};
typedef struct tree_node tree;

struct bst_header {
 tree* root;
};

assuming that elem is defined by the client, here is the equivalent C code

struct tree_node {
 elem data;
 struct tree_node* left;
 struct tree_node* right;
};
typedef struct tree_node tree;

struct bst_header {
 tree* root;
};

As you can see, the definition of a struct in C and C0 are identical. Now lets take a look at the C0 function bst_insert.

tree* tree_insert(tree* T, elem e)
 //@requires is_ordtree(T);
 //@requires e != NULL;
 //@ensures is_ordtree(\result);
{
 if (T == NULL) {
 /* create new node and return it */
 T = alloc(struct tree_node);
 T->data = e;
 T->left = NULL; T->right = NULL;
 return T;
}
 int r = key_compare(elem_key(e), elem_key(T->data));
 if (r == 0)
  T->data = e; /* modify in place */
 else if (r < 0)
  T->left = tree_insert(T->left, e);
 else //@assert r > 0;
  T->right = tree_insert(T->right, e);
 return T;
}

Here is the equivalent code in C.

tree* tree_insert(tree* T, elem e)
{
  REQUIRES(is_ordtree(T));
  REQUIRES(e != NULL;
  //@ensures is_ordtree(\result);   <--- left as a C comment for now. See below

  if (T == NULL) {
   /* create new node and return it */
   T = xcalloc(1,sizeof(struct tree_node));  <--- changed from C0 alloc to C xcalloc
   T->data = e;
   T->left = NULL; T->right = NULL;
   ENSURES(is_ordtree(T));
   return T;
  }
  int r = key_compare(elem_key(e), elem_key(T->data));  <-- assume the existence of C functions
  if (r == 0)
   T->data = e; /* modify in place */
  else if (r < 0)
   T->left = tree_insert(T->left, e);
  else {
   ASSERT(r > 0);
   T->right = tree_insert(T->right, e);
   ENSURES(is_ordtree(T));
   return T;
  }
}

The above C code is annotated to show the changes made. We have also used the macro ENSURES as we can check the return value just before the return is executed.

Common Errors when programming in C

^ to index

There are many things that can go wrong in your C program unless you are aware of some of the most common things to look for. The following is a list of thing you can check when your program display some weird behavior or altogether crash.

Dereference of an invalid or uninitialized pointer

```c int* ptr; //ptr declared but not initialized *ptr = 10; //cause the program to seg fault as no memory is allocated for pointer. .... int A[]={1,2,3}; //declare and initialize an array int* ptr = A; //ptr now points to the address of A[0] *(ptr+3) = 4; //trying to access ptr[3] that does not exists ```

Insufficient or no memory allocated for the operation

```c int* ptr; //a variable of type int* declared *ptr = 10; //since no memory allocated for ptr, this could cause a segmentation fault .... char* ptr = malloc(1); //allocate one byte of memory int* ptr2 = (int*)ptr; //ptr and ptr2 points to the same location *ptr2 = 10; //insufficient memory to dereference ptr2 (need 4 bytes) ```

Storage used after free

```c int* ptr = malloc(sizeof(int)); //a variable of type int* declared and initialized .... free(ptr); // memory is freed *ptr = 10; //ptr can no longer be dereferenced as no memory is associated with ptr ```

Allocation freed repeatedly

```c int* ptr = malloc(sizeof(int)); //a variable of type int* declared and initialized int* ptr1 = ptr; //ptr1 is an alias of ptr free(ptr); // memory is freed ..... free(ptr1); //since ptr1 is also points to the same location, this is double freeing of memory -error ```

Free of unallocated or potentially storage

```c int* ptr; //a variable of type int* declared free(ptr); //attempting to free unallocated memory ```

Free of stack space

```c int x = 10; //memory for x is allocated from stack int* ptr = &x; //ptr is an alias for the address of x free(ptr); //attempting to free stack memory ```

Return a pointer to a local variable

```c int* foo(){ int x; return &x; // returns the address of x } int* ptr = foo(); *ptr = 10; //local variable x no longer exists outside of foo() ```

Dereference of wrong type

```c char* ptr = malloc(2); int* ptr1 = (int*)ptr; *ptr1 = 10; ```

Assignment of incompatible types

```c char ch ='a'; int x = ch; //legal, but be careful ```

Program logic confuses pointer and reference types

```c int x; // x is a reference type int* ptr; // ptr is pointer type ptr = x; // legal but very dangerous ```

incorrect use of pointer arithmetic

```c int A[]={1,2,3}; *A = 1; //is equivalent to A[0]=1 *(A+1) = 2; //is equivalent to A[1]=2 *(A+2) = 3; //is equivalent to A[2]=3 /* A+i is the address of A[i]. This is obtained by adding i*sizeof(int) to the address of A ```

Array index out of bounds

int A[]={1,2,3};
A[3] = 4;        //legal, but just violated the array boundaries. consequences are undefined

C Idioms

^ to index

This section describes a number of C idioms you may find useful in your implementation of the C0VM. This text is modified from original content written by Willam Lovas.

1. The switch/case construct:

The 'switch' statement is a multi-way branching control construct something like an iterated conditional. Given an int expression e and constants c1, c2, ..., cn, the following 'switch'-statement and 'if'-statement are roughly equivalent:

        switch (e) {
            case c1:
                <statement 1>
                break;

            case c2:
                <statement 2>
                break;

            ...

            case cn:
                <statement n>
                break;

            default:
                <default statement>
                break;
        }

and

       int x = e;
        if (x == c1)
            <statement 1>
        else if (x == c2)
            <statement 2>
        ...
        else if (x == cn)
            <statement n>
        else
            <default statement>

Note: the 'break;' statements in the cases of the switch are important: without a 'break;', control falls through to the following case. This design choice can easily lead to unexpected behaviors. Consider:

        /* XXX WRONG XXX */
        switch (x % 2) {
            case 0:
                printf("x is even\n");

            case 1:
                printf("x" is odd\n");
        }

The fall through behavior of cases can sometimes be exploited, though:

        switch (x) {
            case 1:
            case 4:
            case 9:
                printf("x is a single digit perfect square\n");
                break;

            case 2:
            case 3:
            case 5:
            case 7:
                printf("x is a single digit prime\n');
                break;
        }

To introduce new variables local to a case, create a new block using curly braces:

        switch(x) {
            case 1:
            case 4:
            case 9:
            {
                int sx = isqrt(x);
                printf("sqrt(x) = %d\n", sx);
                break;
            }

            default:
                printf("not a perfect square...\n");
        }

The entire main loop of your VM will be one big switch statement. Be wary of fall through!

2. Structs that aren't pointers

In C0 and in much of our C programming, we have accessed structs through pointers almost exclusively. For example, we would write

        typedef struct stack * stack;
        struct stack {
            list top;
        };

        ...
        stack S = stack_new();
        ...
        ... S->top ...

Sometimes it is convenient to access structs directly, though, like when they're stored in an array. To access a field f of a struct expression e, one uses the syntax 'e.f'. For example, if stack_array is an array of stacks we can access the top element of the i-th stack of the array using stack_array[i].top

        stack stack_array[] = ...

        ... stack_array[i].top ...

In fact, the familiar notation 'p->f' for accessing a struct field through a pointer p is just syntactic sugar for '(*p).f', and since array indexing is just syntactic sugar for pointer arithmetic. The above expression really amounts to

   ...(*(stack_array+i)).top... 
    or 
   (stack_array+i)->top.

Accessing structs directly will be useful when dealing with various pools.

3. Flexible array members

Usually, structs aren't allowed to contain arrays directly, only pointers, but C99 allows the last member of a struct to be an array of an unknown size. Just as with stack-allocated arrays, the array will be contiguous in memory with the earlier struct elements. Flexible array members are used by the "bare" C0 run time to implement the struct c0_array, an array stored with its length and the size of its elements:

        struct c0_array {
            int count;
            int elt_size;
            char elems[];
        };

The sizeof operator on a struct with a flexible array member returns the size the struct would have if the array had length 0. To allocate a struct with a flexible array member, request space for the struct itself plus however much space you want for the array. For instance:

        struct c0_array *a = xcalloc(1, sizeof(struct c0_array) + 23);

where xcalloc is a safe version of calloc. The above code allocates space for the c0 representation of an array containing 23 bytes, perhaps a char array. The fields can be accessed as usual using struct-pointer notation:

        a->count = 23;
        a->elt_size = 1;
        ... a->elems[17] ...

An alternative that would be equivalent under most implementations is to simply allocate sizeof(int) bytes for the first int plus sizeof(int) bytes for the second int plus 23 bytes for the array, giving you a block of memory laid out as follows:

        [<- 4bytes ->][<- 4bytes ->][<--------- 23 bytes --------->]

Using a struct with a flexible array member offers the advantage of being able to access the non-flexible fields by name.

4. Casting pointers <-> integers, signed <-> unsigned

C99 leaves the behavior of casting pointers to integral types and the representation of signed integers up to the implementation. They are both "implementation-defined" behaviors. Please refer to gcc documentation:
GCC supports only two's complement integer types, and all bit patterns are ordinary values. (from http://gcc.gnu.org/onlinedocs/gcc-4.3.5/gcc/Integers-implementation.html#Integers-implementation)


A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged(from http://gcc.gnu.org/onlinedocs/gcc-4.3.5/gcc/Arrays-and-pointers-implementation.html#Arrays-and-pointers-implementation)

We encourage you to leverage these implementation decisions to implement reliable signed modular arithmetic using unsigned modular arithmetic and to store integer types directly onto a stack that holds elements of type void *.

The macros INT(p) and VAL(x) cast back and forth between void* pointers and ints, and we always have INT(VAL(x)) == x. INT(p) is not defined, unless p was obtained by casting an int (or is equal to NULL). You can see the definition of these macros in the file c0vm.h.

5. Coping with unused variables

When compiling with -Wall -Wextra, a warning will be generated if any variable is unused, and when compiling with -Werror, that warning turns into an error:

        cc1: warnings being treated as errors
        foo.c: In function 'bar':
        foo.c:12: error: unused variable 'x'

Sometimes this indicates a bug in your program, but sometimes it just indicates that you're program is incomplete, and you may want to compile regardless in order to test the partial functionality you have written thus far. You might think to suppress the error simply by mentioning the unused variable on a line by itself:
x;
but that just triggers another warning/error:

        cc1: warnings being treated as errors
        foo.c: In function 'bar':
        foo.c:12: error: statement with no effect

The proper solution is to mention the unused variable on a line by itself but additionally indicate your intent to ignore its value by casting it to void;

        (void) x;

That should eliminate the warning and get you back on track to compiling and testing your code.

Conclusion

This tutorial is intended for those who are already familiar with C0 language and want to make a transition into C. We plan to expand this tutorial to include more examples in the near future. In the meantime, if you find any errors or have any comments or suggestions, please send email to [email protected]

Clone this wiki locally