Welcome, Guest: Register On Nairaland / LOGIN! / Trending / Recent / New
Stats: 3,148,046 members, 7,799,539 topics. Date: Wednesday, 17 April 2024 at 12:19 AM

CC500: A Tiny Self-hosting C Compiler - Programming - Nairaland

Nairaland Forum / Science/Technology / Programming / CC500: A Tiny Self-hosting C Compiler (729 Views)

Best C/C++ Compiler? / Pls Help, Website For C/c++ Compiler Free Download Windows 7 / Is There Any C++ Compiler For Mobile Phone Se(java) (2) (3) (4)

(1) (Reply)

CC500: A Tiny Self-hosting C Compiler by africanaija: 8:17pm On Nov 08, 2014
How it works

The compiler reads the whole program in C from stdin and writes an ELF binary for Linux/x86 to stdout. The ELF binary is somewhat non-standard as it consists of a single section that is both writable and executable. It seems to work for me on a real Linux box, but it might cause problems on an emulator.

No libraries are used. Tiny, machine-code implementations of exit(), getchar(), malloc() and putchar() are included in the binary that is generated. There is no free() as malloc() is just sbrk(), really, implemented with two calls to brk().

There is almost no error-checking. With some invalid C programs the compiler calls exit(1) instead of generating a binary, but in most cases it generates a binary that crashes when you run it.

There is no preprocessor. The lexer looks for tokens that match one of the following Perl regular expressions: /[a-z0-9_]+/, /[<=>|&!]+/, /'[^']*'/, /"[^"]*"/, /./. Traditional C-style comments are handled, but not C++-style comments.

The symbol table is implemented with a single array of char. The type of each symbol is recorded as 'D' (defined global), 'U' (undefined global), 'L' (local variable) or 'A' (argument of a function) - see sym_get_value(). There is also an address recorded for each symbol, but no type, as there is no type-checking and all arrays are assumed to be arrays of char. The address of an 'L' or 'A' is its position on the stack. The address of a 'D' is the position of that function or global variable in the binary. The address of a 'U' is also a pointer into the binary, but a pointer to where the symbol was last used and the head of a linked list of all the forward references to that symbol.

Scoping rules for local variables are implemented correctly as it is easy to do so: at the end of a compound statement the pointer to the end of the symbol table is restored to what it was at the start of the compound statement - see the assignment to table_pos in statement().

I started off writing CC500 without local variables, but it felt very unnatural. Nested scopes are an essential part of C and it is not easy to write a recursive-descent parser without automatic variables.

It is assumed that all variables, arguments and return values are char * or int, though the compiler does not even parse the types, let alone record them in the symbol table. The functions that implement the recursive-descent parser return 1, 2 or 3, corresponding to a "char lval", "int lval" or "other" - see promote().

Each parsing function is preceded by a comment that documents the syntactic structures recognised, in an obvious notation. In some cases the code accepts additional, incorrect structures in addition to what is documented in the comment. Only a tiny subset of C's operators are accepted: just the ones I needed while writing the compiler. There are no unary operators: instead of the indirection operator * you must use x[y], where it is assumed that x is char * and y is int. The only statements recognised are compound statements, expressions, if, while and return. Local variable declarations, with initialisers, are treated as a kind of statement. (If you write something like if (x) int y; the compiler will not complain, but the binary will crash.)

There are no arbitary limits on the length of symbols, number of symbols are the like: all three buffers (token, table and code) are supposed to be automatically extended as required by calls to my_realloc().

Original Post - http://homepage.ntlworld.com/edmund.grimley-evans/cc500/

(1) (Reply)

Nce Mat/phy Or Phy/comp Which Is Best? / Using Php To Create A Form / Help With Android Sqlite

(Go Up)

Sections: politics (1) business autos (1) jobs (1) career education (1) romance computers phones travel sports fashion health
religion celebs tv-movies music-radio literature webmasters programming techmarket

Links: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved. See How To Advertise. 17
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.