Here I describe a simple way to extract information on variables, classes and functions from Python source code without actually compiling or interpreting it. Two very useful standard-library modules for this task are ast and tokenize.

The first goal I had was to get a list of all classes, functions and variables defined in a Python file, including useful information such as the length of each function and class and the hierarchy of variables, functions and classes in the file. As it turns out, this is really simple with ast.

The first step consists in loading the source code and parsing it using ast:

import ast

# filename is the path of the Python file to analyze
with open(filename, 'r') as f:
    content = f.read()
p = ast.parse(content, filename, mode='exec')
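
As a quick sanity check of this step, the sketch below parses a small source string instead of a file and lists the top-level statements that ast.parse returns. The `source` snippet and the `top_level` name are made up for illustration.

```python
import ast

# Hypothetical source snippet, used here instead of a file on disk
source = '''
import os

class Greeter(object):
    def greet(self, name):
        return "Hello " + name

answer = 42
'''

tree = ast.parse(source, "<example>", mode="exec")

# tree is an ast.Module; its body holds the top-level statements in order
top_level = [(type(node).__name__, node.lineno) for node in tree.body]
print(top_level)
```

Each entry pairs the node type with the line on which the statement starts, which is the raw material the visitor below works with.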

The next step is to define a custom NodeVisitor class, which we can use to walk down all branches of the ast tree generated by the ast.parse function. In my case, I wanted to extract information on functions, classes, variables and import statements, so I define the following node visitor:

class AnalysisNodeVisitor(ast.NodeVisitor):

    def __init__(self,rootNode = None):
        self._modules = []
        self._classes = []
        self._functions = []
        self._variables = []
        self._imports = []
        self._rootNode = rootNode
        self._parentNode = rootNode
        self._level = 0

    def rootNode(self):
        return self._rootNode

    def imports(self):
        return self._imports

    def functions(self):
        return self._functions

    def variables(self):
        return self._variables

    def classes(self):
        return self._classes

    def visit_Import(self, node):
        # One node per import statement, listing every imported name
        names = [alias.name for alias in node.names]
        importNode = Node(attributes={'type': 'import', 'names': names}, parent=self._parentNode)
        self._imports.append(importNode)
        ast.NodeVisitor.generic_visit(self, node)

    def visit_ImportFrom(self, node):
        names = [alias.name for alias in node.names]
        importNode = Node(attributes={'line_number': node.lineno, 'type': 'from_import', 'module': node.module, 'names': names}, parent=self._parentNode)
        self._imports.append(importNode)
        ast.NodeVisitor.generic_visit(self, node)

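    def visit_Assign(self, node):
        for target in node.targets:
            self._add_target_to_variables(target)
        ast.NodeVisitor.generic_visit(self, node)

    def visit_AugAssign(self, node):
        # ast calls this node type AugAssign; it has a single target
        self._add_target_to_variables(node.target)
        ast.NodeVisitor.generic_visit(self, node)

    def _add_target_to_variables(self, target):
        if hasattr(target, 'value'):
            # Attribute or subscript target (e.g. obj.x): look at the base expression
            self._add_target_to_variables(target.value)
        elif hasattr(target, 'id'):
            knownNames = [v.attributes['name'] for v in self._variables]
            if target.id not in knownNames and target.id != "self":
                variableNode = Node(attributes={'type': 'variable', 'name': target.id}, parent=self._parentNode)
                self._variables.append(variableNode)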

    def visit_FunctionDef(self, node):
        body = node.body
        functionNode = Node(attributes={'type': 'function', 'name': node.name, 'start_line': body[0].lineno, 'end_line': _get_last_line_number(body), 'docstring': ast.get_docstring(node)}, parent=self._parentNode)
        self._functions.append(functionNode)
        # Everything found inside the function body becomes a child of functionNode
        oldParent = self._parentNode
        self._parentNode = functionNode
        ast.NodeVisitor.generic_visit(self, node)
        self._parentNode = oldParent

    def visit_ClassDef(self, node):
        body = node.body
        classNode = Node(attributes={'type': 'class', 'name': node.name, 'start_line': body[0].lineno, 'end_line': _get_last_line_number(body), 'docstring': ast.get_docstring(node)}, parent=self._parentNode)
        self._classes.append(classNode)
        # Everything found inside the class body becomes a child of classNode
        oldParent = self._parentNode
        self._parentNode = classNode
        ast.NodeVisitor.generic_visit(self, node)
        self._parentNode = oldParent
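
To see the dispatch mechanism of ast.NodeVisitor in isolation, here is a minimal sketch. `NameCollector` and the `source` snippet are hypothetical and exist only for this example. It records the order in which function and class definitions are visited.

```python
import ast

class NameCollector(ast.NodeVisitor):
    """Hypothetical minimal visitor: records function and class names."""

    def __init__(self):
        self.seen = []

    def visit_FunctionDef(self, node):
        self.seen.append(("function", node.name))
        # Without this call the visitor would not descend into the function body
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        self.seen.append(("class", node.name))
        self.generic_visit(node)

source = '''
class A(object):
    def method(self):
        def inner():
            pass

def top():
    pass
'''

collector = NameCollector()
collector.visit(ast.parse(source))
print(collector.seen)
```

Nested definitions such as `inner` are only reached because each handler calls generic_visit; dropping that call would stop the traversal at the first definition on each branch.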

When a syntax tree is passed to the visitor, its visit(node) method gets called on the nodes of the tree. The default implementation of visit() then dispatches to another method depending on the type of the node it encounters. For example, for function and class definitions, it calls `visit_FunctionDef` and `visit_ClassDef`. A complete list of all node types can be found in the abstract grammar section of the ast documentation page. So in my class, I just redefine `visit_Import`, `visit_ImportFrom`, `visit_Assign`, `visit_AugAssign` (the handler for augmented assignments such as `x += 1`), `visit_FunctionDef` and `visit_ClassDef` to extract all the required information on imports, classes, functions and variables, giving me the names, locations and lengths of all of them. Each of these methods also calls generic_visit, since otherwise the traversal would not continue into the child nodes. The calculation of the length of a class or function is a bit tricky since it involves the lengths of child nodes, so I wrote a little helper function to get the last line number associated with a class or function body:

def _get_last_line_number(nodes):
    # Return the largest starting line number reachable from the last statement in nodes
    last = nodes[-1]
    children = None
    # Look at trailing blocks first and skip attributes that exist but are empty
    # (an if statement without else still has an empty orelse list)
    for name in ('finalbody', 'orelse', 'handlers', 'body'):
        children = getattr(last, name, None)
        if children:
            break
    if children:
        return max(last.lineno, _get_last_line_number(children))
    return last.lineno
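
As an aside, on Python 3.8 and later the parser records an `end_lineno` attribute on statement and expression nodes, which gives the last line directly and also covers statements that span several lines. A minimal sketch, assuming such an interpreter (the `source` snippet is invented for illustration):

```python
import ast

source = '''
def f(x):
    if x:
        return 1
    else:
        return compute(
            x,
            2,
        )
'''

func = ast.parse(source).body[0]

# lineno is the line of the def statement; end_lineno is the last line of its body
span = (func.lineno, func.end_lineno)
length = func.end_lineno - func.lineno + 1
print(span, length)
```

A helper based on `lineno` alone would stop at the line where the final `return` starts, because `lineno` only marks the beginning of a statement.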

Now, a syntax tree can be analyzed by creating an `AnalysisNodeVisitor()` instance and calling its `visit(p)` method with the syntax tree as argument. The Node class that appears in the code is a simple class that stores the information on each node and contains a list of its child nodes in the code hierarchy:

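class Node(object):

    def __init__(self, attributes=None, parent=None):
        self._attributes = attributes if attributes is not None else {}
        self._children = []
        self._parent = None
        self.parent = parent  # goes through the property setter below

    def __repr__(self):
        return self.__class__.__name__ + "(attributes = " + str(self.attributes) + ")"

    @property
    def attributes(self):
        return self._attributes

    @attributes.setter
    def attributes(self, attributes):
        self._attributes = attributes

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self, parent):
        # Detach from the old parent, then attach to the new one
        if self._parent is not None:
            self._parent.removeChild(self)
        self._parent = parent
        if self._parent is not None:
            self._parent.addChild(self)

    def children(self):
        return self._children

    def removeChild(self, child):
        if child in self._children:
            self._children.remove(child)

    def addChild(self, child):
        if child not in self._children:
            self._children.append(child)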

When the `AnalysisNodeVisitor` class is initialized with a root node, all nodes generated during the analysis are attached to this node, either directly or through their enclosing class or function, allowing us to reconstruct the code structure from the node hierarchy. In addition, all function, variable, class and import nodes are stored in flat lists that can be retrieved through the `functions()`, `variables()`, `classes()` and `imports()` methods of the visitor.
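
To illustrate how such a node hierarchy can be used, here is a condensed, self-contained sketch of the same parent-tracking idea. `TreeNode`, `OutlineVisitor` and `outline` are simplified stand-ins invented for this example, not the classes defined above.

```python
import ast

class TreeNode(object):
    """Simplified stand-in for the Node class: a kind, a name and children."""

    def __init__(self, kind, name, parent=None):
        self.kind = kind
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)

class OutlineVisitor(ast.NodeVisitor):
    """Attaches every class and function to its enclosing definition."""

    def __init__(self, root):
        self._parent = root

    def _visit_definition(self, kind, node):
        current = TreeNode(kind, node.name, parent=self._parent)
        old_parent, self._parent = self._parent, current
        self.generic_visit(node)   # nested definitions attach to current
        self._parent = old_parent  # restore when leaving the definition

    def visit_FunctionDef(self, node):
        self._visit_definition("function", node)

    def visit_ClassDef(self, node):
        self._visit_definition("class", node)

def outline(node, level=0):
    """Return the hierarchy below node as a list of indented lines."""
    lines = []
    for child in node.children:
        lines.append("  " * level + child.kind + " " + child.name)
        lines.extend(outline(child, level + 1))
    return lines

source = '''
class Shape(object):
    def area(self):
        def helper():
            pass

def main():
    pass
'''

root = TreeNode("module", "<example>")
OutlineVisitor(root).visit(ast.parse(source))
print("\n".join(outline(root)))
```

The indentation of each printed line reflects the nesting depth, so the output reads like an outline of the analyzed file.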