Friday 16 December 2016

Libclang python : hello world!

Many a time we are faced with the challenge of trying to understand large code repositories for various reasons. There is a plethora of tools out there, but sometimes you just get stuck with a problem for which there simply isn't any tool. One such problem hit me today, I wanted to find the call graph of a function, say A, and check where a function, say B lies in it, and what's the call path if I want to go from function A to B.

This is where code analysis libraries enter. They allow us to parse source code in a programmatic way, thus for my case - no more manually exploring the call path.
One such library is libclang (from the awesome CLANG/LLVM project). libclang is for c, but it has python bindings as well, which we are going to explore.

Credits - The following awesome blog helped me in getting started with libclang, and some code, and examples may be from it.

http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang

Before you proceed, make sure of the following.

  • LLVM(Clang) is installed, and is present in computer's path
  • minGW is installed, and is present in computer's path
  • You are using 32bit (x86) windows OS (I was unable to use it on x64 windows, I kept getting access violation errors, which is perhaps due to a known bug). I have not tested this on Linux, but it should work on x32 Linux equally well.
  • Install python clang module from "https://github.com/llvm-mirror/clang/tree/master/bindings/python"
    Here are the steps -
       a. Download clang folder from the link
       b. Put it in python's lib folder. On my windows 7 x86, I had to paste it in "C:\Python27\Lib".
           
    If you are not sure where the lib folder is on your system, open a python console, import a module for which you are sure that it exists in your system, say ctypes, and print the value of ctypes.__file__. This will give you an idea about the location of the folder. See the following screenshot.
Finding python lib folder
Finding python lib folder


Let's suppose we have the following C++ code that we want to parse.


//demo_code.cpp
class Person {
};


class Room {
public:
    void add_person(Person person)
    {
        // do stuff
    }

private:
    Person* people_in_room;
};


template <class T, int N>
class Bag<T, N> {
};


int main()
{
    Person* p = new Person();
    Bag<Person, 42> bagofpersons;

    return 0;
}

Let's start writing the python script which will parse this code.

Step 1 Import clang


import clang.cindex

Step 2 Load the C++ source code


index = clang.cindex.Index.create()
translation_unit = index.parse("demo_code.cpp")

Step 3 Get an iterator over the syntax tree of the source code, and print some tokens


print 'Translation unit:', translation_unit.spelling
for child in translation_unit.cursor.get_children():
   print child.spelling

for node in translation_unit.cursor.walk_preorder():
   print "pre-order-traversal",node.spelling,node.get_usr()

See the following screenshot.


python libclang parsing over our source code demo_code.cpp
Few things to note

  1. spelling member in python code tells the name of the token we are referring to.
  2. get_usr() gives us something called as Unified Symbol Resolution, this is like a global name for the token we are referring to. This helps is identifying a token uniquely when we have multiple source files.
Here is the complete python code.


import sys
import clang.cindex

index = clang.cindex.Index.create()
translation_unit = index.parse("demo_code.cpp")
print 'Translation unit:', translation_unit.spelling
for child in translation_unit.cursor.get_children():
    print child.spelling
for node in translation_unit.cursor.walk_preorder():
    print "preorder",node.spelling,node.get_usr()

This is my first attempt at parsing source code via a python script, and I believe it will help me greatly  in debugging and understanding source codes, as I learn more. Hope it helps you as well.

References-
  • Installing clang on windows - http://blog.johannesmp.com/2015/09/01/installing-clang-on-windows-pt2
  • Parsing C++ in Python with Clang - http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang
  • libclang slides - http://llvm.org/devmtg/2010-11/Gregor-libclang.pdf