MrGreyfun/C-tokenizer

C-tokenizer

A C language tokenizer implemented in pure Python. It can be used to tokenize C code from Python.

example

tokenize code from a string

# Import only the two names this example uses
from c_tokenizer import CodeReader, TokenStream

codeString = '''
int main()
{
	char* inputString = input();
	printf("%s", inputString);
	return 0;
}
'''
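# Wrap the source text in a code stream, then tokenize it
codeStream = CodeReader.readFromString(codeString)
tokenStream = TokenStream(codeStream)
# Printing the token stream lists every token with its type and text
print(tokenStream)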

output

[(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 2 (NAME), string = 'int'),
(typeof = 2 (NAME), string = 'main'),
(typeof = 1 (OP), string = '('),
(typeof = 1 (OP), string = ')'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 1 (OP), string = '{'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 2 (NAME), string = 'char'),
(typeof = 1 (OP), string = '*'),
(typeof = 2 (NAME), string = 'inputString'),
(typeof = 1 (OP), string = '='),
(typeof = 2 (NAME), string = 'input'),
(typeof = 1 (OP), string = '('),
(typeof = 1 (OP), string = ')'),
(typeof = 1 (OP), string = ';'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 2 (NAME), string = 'printf'),
(typeof = 1 (OP), string = '('),
(typeof = 3 (STRING), string = '"%s"'),
(typeof = 1 (OP), string = ','),
(typeof = 2 (NAME), string = 'inputString'),
(typeof = 1 (OP), string = ')'),
(typeof = 1 (OP), string = ';'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 2 (NAME), string = 'return'),
(typeof = 0 (NUM), string = '0'),
(typeof = 1 (OP), string = ';'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 1 (OP), string = '}'),
(typeof = 5 (NEWLINE), string = '\n'),
(typeof = 5 (NEWLINE), string = '\n')]
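The printed listing above pairs each token with a numeric type code and a type name (0 NUM, 1 OP, 2 NAME, 3 STRING, 5 NEWLINE in this sample). The following is a minimal sketch, not part of the library, that parses text in that printed format and counts tokens per type. It assumes the dump looks exactly like the output shown above; the names `TOKEN_LINE` and `parse_token_dump` are hypothetical helpers introduced here for illustration.

```python
import re
from collections import Counter

# Matches one printed token, e.g. (typeof = 2 (NAME), string = 'int')
# Assumption: the format is exactly the one shown in the output above.
TOKEN_LINE = re.compile(r"\(typeof = (\d+) \((\w+)\), string = '(.*)'\)")


def parse_token_dump(dump):
    """Return a list of (type_code, type_name, text) tuples from a printed dump."""
    tokens = []
    for line in dump.splitlines():
        match = TOKEN_LINE.search(line)
        if match:
            code, name, text = match.groups()
            tokens.append((int(code), name, text))
    return tokens


# A short excerpt of the output above, kept as a raw string so that
# the escaped newline stays as the two characters backslash and n.
sample = r"""[(typeof = 2 (NAME), string = 'return'),
(typeof = 0 (NUM), string = '0'),
(typeof = 1 (OP), string = ';'),
(typeof = 5 (NEWLINE), string = '\n')]"""

tokens = parse_token_dump(sample)
counts = Counter(name for _, name, _ in tokens)
print(counts)
```

Working from the printed text keeps the sketch independent of the token objects themselves, whose attributes are not documented in this README.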

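tokenize code from a file

# Import only the two names this example uses
from c_tokenizer import CodeReader, TokenStream

# Read the C source from disk; replace the path with your own file path
codeStream = CodeReader.readFromFile(path="example.c")
tokenStream = TokenStream(codeStream)
print(tokenStream)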
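To try the file-based example without an existing C file, one option is to write a small source file first. The sketch below is an illustration under assumptions: `write_sample_c_file` is a hypothetical helper, the temporary file is left on disk for inspection, and the tokenizer is only called if `c_tokenizer` can be imported, using the same `readFromFile(path=...)` call shown above.

```python
import os
import tempfile


def write_sample_c_file(source):
    """Write C source text to a new temporary .c file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".c")
    with os.fdopen(fd, "w") as handle:
        handle.write(source)
    return path


# The file is not deleted afterwards; remove it manually when done.
path = write_sample_c_file("int main()\n{\n\treturn 0;\n}\n")

try:
    from c_tokenizer import CodeReader, TokenStream
except ImportError:
    # The library is not on the import path; only the file was created.
    print("c_tokenizer is not importable; sample file written to", path)
else:
    # Same calls as in the example above, pointed at the temporary file.
    print(TokenStream(CodeReader.readFromFile(path=path)))
```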
