Extract Python function source text from the source code string

Suppose I have valid Python source code, as a string:

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
""".strip()

Objective: I would like to obtain the lines containing the source code of the function definitions, preserving whitespace. For the code string above, I would like to get the strings

def foo(a, b):
  return a + b

and

  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

Or, equivalently, I'd be happy to get the line numbers of functions in the code string: foo spans lines 2-3, and __init__ spans lines 5-9.

Attempts

I can parse the code string into its AST:

code_ast = ast.parse(code_string)

And I can find the FunctionDef nodes, e.g.:

function_def_nodes = [node for node in ast.walk(code_ast)
                      if isinstance(node, ast.FunctionDef)]

Each FunctionDef node's lineno attribute tells us the first line for that function. We can estimate the last line of that function with:

last_line = max(node.lineno for node in ast.walk(function_def_node)
                if hasattr(node, 'lineno'))

but this doesn't work perfectly when the function ends with syntactic elements that don't show up as AST nodes, for instance the last ] in __init__.

I doubt there is an approach that only uses the AST, because the AST fundamentally does not have enough information in cases like __init__.
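To make the shortfall concrete, here is a minimal sketch that applies the max-lineno estimate to the example string. The helper name estimate_spans is made up for illustration and is not part of the ast module.

```python
import ast

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
""".strip()


def estimate_spans(source):
    """Map each function name to (first_line, estimated_last_line)."""
    spans = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Largest lineno among all descendants that carry one.
            last_line = max(child.lineno for child in ast.walk(node)
                            if hasattr(child, 'lineno'))
            spans[node.name] = (node.lineno, last_line)
    return spans


spans = estimate_spans(code_string)
print(spans)
```

For foo the estimate is right (lines 2-3), but for __init__ it stops at line 8, the line holding 'b', one short of the closing ] on line 9.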

I cannot use the inspect module because that only works on "live objects" and I only have the Python code as a string. I cannot eval the code because that's a huge security headache.

In theory I could write a parser for Python but that really seems like overkill.

A heuristic suggested in the comments is to use the leading whitespace of lines. However, that can break for strange but valid functions with weird indentation like:

def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass

edited 3 hours ago

asked 5 hours ago

pkpnd


I suppose you could just iterate lines, and when one matches ^(\s*)def\s.*$, extract that matched group (the leading whitespace) and then consume the line and all subsequent lines that startWith(thatWhitespace)

– Blorgbeard
4 hours ago

You mean, extract all subsequent lines that start with strictly more than that whitespace? Or else you'd also extract the following functions defined at the same indentation level

– pkpnd
4 hours ago

Oops, yes. You get the idea, anyway.

– Blorgbeard
4 hours ago

Hmm, doesn't work if the function has weird indentation inside, for example def baz():\n  return [\n1,\n  ]

– pkpnd
4 hours ago

Ah, I didn't even realise that was valid python. Looks like there's no simple text-processing method, then.

– Blorgbeard
3 hours ago
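For reference, the indentation heuristic discussed in these comments could be sketched roughly as follows. The function name indent_heuristic and the exact regex are assumptions made for illustration; nobody in the thread posted this code.

```python
import re


def indent_heuristic(source):
    """Return (first_line, last_line) pairs, 1-indexed, using indentation only."""
    lines = source.split('\n')
    spans = []
    for i, line in enumerate(lines):
        match = re.match(r'^(\s*)def\s', line)
        if not match:
            continue
        indent = len(match.group(1))
        end = i
        for j in range(i + 1, len(lines)):
            text = lines[j]
            if not text.strip():
                continue  # blank lines neither extend nor end the function
            if len(text) - len(text.lstrip()) <= indent:
                break  # not strictly deeper than the def: assume the body ended
            end = j
        spans.append((i + 1, end + 1))
    return spans


weird = "def baz():\n  return [\n1,\n  ]"
print(indent_heuristic(weird))
```

On the baz example it reports lines 1-2 even though the function really spans lines 1-4, because the line holding 1, sits at column 0 inside the brackets.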



3 Answers

A much more robust solution would be to use the tokenize module. The following code can handle weird indentations, comments, multi-line tokens, single-line function blocks and empty lines within function blocks:

import tokenize
from io import BytesIO
from collections import deque

code_string = """
# A comment.
def foo(a, b):
  return a + b

class Bar(object):
  def __init__(self):

    self.my_list = [
        'a',
        'b',
    ]

  def test(self): pass
  def abc(self):
    '''multi-
    line token'''

def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # unmatched parenthesis: ( }
  pass
""".strip()

file = BytesIO(code_string.encode())
tokens = deque(tokenize.tokenize(file.readline))
lines = []
while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NAME and token.string == 'def':
        start_line, start_column = token.start
        end_line, _ = token.end
        enclosures = 0  # depth of currently open brackets
        while tokens:
            token = tokens.popleft()
            if token.type == tokenize.NL: # ignore empty lines
                continue
            if token.type == tokenize.OP and token.string in '([{':
                enclosures += 1
            _, column = token.start
            # a token at or left of the def's column, outside brackets, ends the function
            if column <= start_column and token.type != tokenize.INDENT and not enclosures:
                tokens.appendleft(token)
                break
            if token.type == tokenize.OP and token.string in ')]}':
                enclosures -= 1
            end_line, _ = token.end
        lines.append((start_line, end_line))
print(lines)

This outputs:

[(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]

edited 21 mins ago

answered 2 hours ago

blhsing


This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

– pkpnd
2 hours ago

Oops did not actually have any logic to handle weird indentation. Added now.

– blhsing
28 mins ago

This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

– user2357112
9 mins ago
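A rough sketch of the INDENT/DEDENT idea from the comment above. This is not blhsing's code: function_spans is a hypothetical helper, and it rests on the assumption that tokenize emits no INDENT, DEDENT or NEWLINE tokens inside brackets or across backslash continuations, so odd indentation there does not matter. Decorators and comments trailing a function body are not counted as part of the function.

```python
import io
import tokenize


def function_spans(source):
    """Return sorted (first_line, last_line) pairs, 1-indexed, one per def."""
    spans = []
    stack = []           # defs still open, innermost last
    depth = 0            # INDENT tokens seen minus DEDENT tokens seen
    last_code_line = 0   # last line holding a non-layout token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        # A def header has just ended: INDENT means a block body follows,
        # anything else means the def was a single logical line.
        if stack and stack[-1]['state'] == 'after_header':
            if tok.type == tokenize.INDENT:
                stack[-1]['state'] = 'body'
            else:
                done = stack.pop()
                spans.append((done['start'], done['end']))
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
            # Back at the def's own depth: its block is closed.
            if (stack and stack[-1]['state'] == 'body'
                    and stack[-1]['depth'] == depth):
                spans.append((stack.pop()['start'], last_code_line))
        elif tok.type == tokenize.NEWLINE:
            if stack and stack[-1]['state'] == 'header':
                stack[-1]['state'] = 'after_header'
                stack[-1]['end'] = last_code_line
        elif tok.type != tokenize.ENDMARKER:
            if tok.type == tokenize.NAME and tok.string == 'def':
                stack.append({'start': tok.start[0], 'depth': depth,
                              'state': 'header'})
            last_code_line = tok.end[0]
    return sorted(spans)


example = (
    "def foo(a, b):\n"
    "  return a + b\n"
    "\n"
    "class Bar(object):\n"
    "  def test(self): pass\n"
    "  def baz(self):\n"
    "    return [\n"
    "1,\n"
    "    ]\n"
    "\n"
    "def cont():\n"
    "  return 1 + \\\n"
    "2\n"
)
print(function_spans(example))
```

The stack is there so that nested functions each get their own span; the single-logical-line case (def test(self): pass) is detected by the absence of an INDENT after the header's NEWLINE.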


Rather than reinventing a parser, I would use python itself.

Basically I would use the compile() built-in function, which can check whether a string is valid Python code by compiling it. I pass it a string made of selected lines, starting at each def and extending to the farthest line for which compilation still succeeds.

code_string = """
#A comment
def foo(a, b):
  return a + b

def bir(a, b):
  c = a + b
  return c

class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

def baz():
  return [
1,
  ]

""".strip()

lines = code_string.split('\n')

#looking for lines with 'def' keywords
defidxs = [e[0] for e in enumerate(lines) if 'def' in e[1]]

#getting the indentation of each 'def'
indents = {}
for i in defidxs:
    ll = lines[i].split('def')
    indents[i] = len(ll[0])

#extracting the strings
end = len(lines)-1
while end > 0:
    if end < defidxs[-1]:
        defidxs.pop()
    try:
        start = defidxs[-1]
    except IndexError: #break if there are no more 'def'
        break

    #empty lines between functions will cause an error, let's skip them
    if len(lines[end].strip()) == 0:
        end = end -1
        continue

    try:
        #strip the def's indentation from each line, otherwise compile fails
        fixlines = [ll[indents[start]:] for ll in lines[start:end+1]] #remove indentation
        body = '\n'.join(fixlines)
        compile(body, '<string>', 'exec') #if it fails, throws an exception
        print(body)
        end = start #no need to try fewer lines once it succeeds
    except:
        pass

    end = end -1

It is a bit nasty because of the except clause without specific exceptions, which is usually not recommended, but there is no way to know what may cause compile to fail, so I do not know how to avoid it.

This prints

def baz():
  return [
1,
  ]
def __init__(self):
  self.my_list = [
      'a',
      'b',
  ]
def bir(a, b):
  c = a + b
  return c
def foo(a, b):
  return a + b

Note that the functions are printed in the reverse of the order in which they appear in code_string.

This should handle even the weirdly indented code, but I think it will fail if you have nested functions.

answered 1 hour ago

Valentino


I think a small parser is in order, to try to take these odd cases into account:

import re

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass

def test_multiline():
    \"\"\"
    asdasdada
sdadd
    \"\"\"
    pass

def test_comment(
    a #)
):
    return [a,
    # ]
a]

def test_escaped_endline():
    return "asdad \
asdsad \
asdas"

def test_nested():
    return {():[[],
{
}
]
}
""".strip()

code_string += '\n'


func_list=[]
func = ''
tab  = ''
brackets = {'(':0, '[':0, '{':0}
close = {')':'(', ']':'[', '}':'{'}
string=''
tab_f=''
c_old=''
c_old2=''
multiline=False
check=False
for line in code_string.split('\n'):
    tab = re.findall(r'^\s*',line)[0]
    if 'def ' in line and not func:
        func += line + '\n'
        tab_f = tab
        check=True
    if func:
        if not check:
            if sum(brackets.values()) == 0 and not string and not multiline:
                if len(tab) <= len(tab_f):
                    func_list.append(func)
                    func=''
                    c_old=''
                    c_old2=''
                    continue
            func += line + '\n'
        check = False
        for c in line:
            if c == '#' and not string and not multiline:
                break
            if c_old != '\\':
                if c in ['"', "'"]:
                    if c_old2 == c_old == c == '"' and string != "'":
                        multiline = not multiline
                        string = ''
                        continue
                    if not multiline:
                        if c in string:
                            string = ''
                        else:
                            if not string:
                                string = c
                if not string and not multiline:
                    if c in brackets:
                        brackets[c] += 1
                    if c in close:
                        b = close[c]
                        brackets[b] -= 1
            c_old2=c_old
            c_old=c

for f in func_list:
    print('-'*40)
    print(f)

output:

----------------------------------------
def foo(a, b):
  return a + b

----------------------------------------
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

----------------------------------------
def baz():
  return [
1,
  ]

----------------------------------------
  def hello(self, x):
    return self.hello(
x - 1)

----------------------------------------
def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass

----------------------------------------
def test_multiline():
    """
    asdasdada
sdadd
    """
    pass

----------------------------------------
def test_comment(
    a #)
):
    return [a,
    # ]
a]

----------------------------------------
def test_escaped_endline():
    return "asdad asdsad asdas"

----------------------------------------
def test_nested():
    return {():[[],
{
}
]
}

edited 1 hour ago

answered 2 hours ago

Crivella


Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

– pkpnd
2 hours ago

Please do try it. I should have included test cases with strings; open/close brackets are not counted when they are inside a string. EDIT: escaped delimiters are an exception, I will handle them.

– Crivella
2 hours ago

You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

– pkpnd
2 hours ago

Included both escaped characters and comments. Sorry, I tend to write parsers by starting simple and adding things as I find exceptions; not the best practice, I realize.

– Crivella
2 hours ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54374296%2fextract-python-function-source-text-from-the-source-code-string%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

import tokenize

from io import BytesIO

from collections import deque

code_string = """

# A comment.

def foo(a, b):

  return a + b



class Bar(object):

  def __init__(self):



    self.my_list = [

        'a',

        'b',

    ]



  def test(self): pass

  def abc(self):

    '''multi-

    line token'''



def baz():

  return [

1,

  ]



class Baz(object):

  def hello(self, x):

    return self.hello(

x - 1)



def my_type_annotated_function(

  my_long_argument_name: SomeLongArgumentTypeName

) -> SomeLongReturnTypeName:

  # unmatched parenthesis: ( }

  pass

""".strip()

file = BytesIO(code_string.encode())

tokens = deque(tokenize.tokenize(file.readline))

lines = 

while tokens:

    token = tokens.popleft()

    if token.type == tokenize.NAME and token.string == 'def':

        start_line, start_column = token.start

        end_line, _ = token.end

        enclosures = 0

        while tokens:

            token = tokens.popleft()

            if token.type == tokenize.NL: # ignore empty lines

                continue

            if token.type == tokenize.OP and token.string in '([{':

                enclosures += 1

            _, column = token.start

            if column <= start_column and token.type != tokenize.INDENT and not enclosures:

                tokens.appendleft(token)

                break

            if token.type == tokenize.OP and token.string in ')]}':

                enclosures -= 1

            end_line, _ = token.end

        lines.append((start_line, end_line))

print(lines)

This outputs:

[(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]

edited 21 mins ago

answered 2 hours ago

blhsing

29.9k41336

This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

– pkpnd
2 hours ago

Oops did not actually have any logic to handle weird indentation. Added now.

– blhsing
28 mins ago

This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

– user2357112
9 mins ago

add a comment |

import tokenize

from io import BytesIO

from collections import deque

code_string = """

# A comment.

def foo(a, b):

  return a + b



class Bar(object):

  def __init__(self):



    self.my_list = [

        'a',

        'b',

    ]



  def test(self): pass

  def abc(self):

    '''multi-

    line token'''



def baz():

  return [

1,

  ]



class Baz(object):

  def hello(self, x):

    return self.hello(

x - 1)



def my_type_annotated_function(

  my_long_argument_name: SomeLongArgumentTypeName

) -> SomeLongReturnTypeName:

  # unmatched parenthesis: ( }

  pass

""".strip()

file = BytesIO(code_string.encode())

tokens = deque(tokenize.tokenize(file.readline))

lines = 

while tokens:

    token = tokens.popleft()

    if token.type == tokenize.NAME and token.string == 'def':

        start_line, start_column = token.start

        end_line, _ = token.end

        enclosures = 0

        while tokens:

            token = tokens.popleft()

            if token.type == tokenize.NL: # ignore empty lines

                continue

            if token.type == tokenize.OP and token.string in '([{':

                enclosures += 1

            _, column = token.start

            if column <= start_column and token.type != tokenize.INDENT and not enclosures:

                tokens.appendleft(token)

                break

            if token.type == tokenize.OP and token.string in ')]}':

                enclosures -= 1

            end_line, _ = token.end

        lines.append((start_line, end_line))

print(lines)

This outputs:

[(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]

edited 21 mins ago

answered 2 hours ago

blhsing

29.9k41336

This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

– pkpnd
2 hours ago

Oops did not actually have any logic to handle weird indentation. Added now.

– blhsing
28 mins ago

This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

– user2357112
9 mins ago

add a comment |

import tokenize

from io import BytesIO

from collections import deque

code_string = """

# A comment.

def foo(a, b):

  return a + b



class Bar(object):

  def __init__(self):



    self.my_list = [

        'a',

        'b',

    ]



  def test(self): pass

  def abc(self):

    '''multi-

    line token'''



def baz():

  return [

1,

  ]



class Baz(object):

  def hello(self, x):

    return self.hello(

x - 1)



def my_type_annotated_function(

  my_long_argument_name: SomeLongArgumentTypeName

) -> SomeLongReturnTypeName:

  # unmatched parenthesis: ( }

  pass

""".strip()

file = BytesIO(code_string.encode())

tokens = deque(tokenize.tokenize(file.readline))

lines = 

while tokens:

    token = tokens.popleft()

    if token.type == tokenize.NAME and token.string == 'def':

        start_line, start_column = token.start

        end_line, _ = token.end

        enclosures = 0

        while tokens:

            token = tokens.popleft()

            if token.type == tokenize.NL: # ignore empty lines

                continue

            if token.type == tokenize.OP and token.string in '([{':

                enclosures += 1

            _, column = token.start

            if column <= start_column and token.type != tokenize.INDENT and not enclosures:

                tokens.appendleft(token)

                break

            if token.type == tokenize.OP and token.string in ')]}':

                enclosures -= 1

            end_line, _ = token.end

        lines.append((start_line, end_line))

print(lines)

This outputs:

[(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]

edited 21 mins ago

answered 2 hours ago

blhsing

29.9k41336


This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

– pkpnd
2 hours ago

Oops did not actually have any logic to handle weird indentation. Added now.

– blhsing
28 mins ago

This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

– user2357112
9 mins ago
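
The INDENT/DEDENT idea from the last comment could look roughly like the sketch below. It is only an illustration of that suggestion, not code from any answer here; the function name `function_line_ranges` is hypothetical. It relies on the tokenizer emitting no INDENT or DEDENT tokens inside brackets and on each logical line ending with a NEWLINE token.

```python
import io
import tokenize


def function_line_ranges(source):
    """Return 1-based, inclusive (start, end) line ranges for every def."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    ranges = []
    for i, tok in enumerate(tokens):
        if tok.type != tokenize.NAME or tok.string != 'def':
            continue
        start_line = tok.start[0]
        # The first NEWLINE ends the header's logical line, even if the
        # signature spans several physical lines inside parentheses.
        j = i
        while tokens[j].type != tokenize.NEWLINE:
            j += 1
        end_line = tokens[j].start[0]
        # Blank lines and comment-only lines may precede the body's INDENT.
        j += 1
        while tokens[j].type in (tokenize.NL, tokenize.COMMENT):
            j += 1
        if tokens[j].type == tokenize.INDENT:
            # Block body: scan until the DEDENT that matches this INDENT,
            # remembering the row of the last completed logical line.
            depth = 0
            while True:
                current = tokens[j]
                if current.type == tokenize.INDENT:
                    depth += 1
                elif current.type == tokenize.DEDENT:
                    depth -= 1
                    if depth == 0:
                        break
                elif current.type == tokenize.NEWLINE:
                    end_line = current.start[0]
                j += 1
        # Otherwise the def is a single logical line, e.g. "def f(): pass".
        ranges.append((start_line, end_line))
    return ranges


sample = (
    "# A comment.\n"
    "def foo(a, b):\n"
    "  return a + b\n"
    "class Bar(object):\n"
    "  def __init__(self):\n"
    "    self.my_list = [\n"
    "        'a',\n"
    "        'b',\n"
    "    ]\n"
    "  def test(self): pass\n"
    "def baz():\n"
    "  return [\n"
    "1,\n"
    "  ]\n"
)
print(function_line_ranges(sample))
```

Decorators and comments trailing the last statement are not included in the ranges; whether they should be depends on the use case.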


Rather than reinventing a parser, I would use Python itself: try to compile() candidate spans of lines and keep the ones that succeed.

code_string = """

#A comment

def foo(a, b):

  return a + b



def bir(a, b):

  c = a + b

  return c



class Bar(object):

  def __init__(self):

    self.my_list = [

        'a',

        'b',

    ]



def baz():

  return [

1,

  ]



""".strip()



lines = code_string.split('\n')



#looking for lines with 'def' keywords

defidxs = [e[0] for e in enumerate(lines) if 'def' in e[1]]



#getting the indentation of each 'def'

indents = {}

for i in defidxs:

    ll = lines[i].split('def')

    indents[i] = len(ll[0])



#extracting the strings

end = len(lines)-1

while end > 0:

    if end < defidxs[-1]:

        defidxs.pop()

    try:

        start = defidxs[-1]

    except IndexError: #break if there are no more 'def'

        break



    #empty lines between functions will cause an error, let's remove them

    if len(lines[end].strip()) == 0:

        end = end -1

        continue



    try:

        #strip the def's own indentation, otherwise compile() raises IndentationError

        fixlines = [ll[indents[start]:] for ll in lines[start:end+1]] #remove indentation

        body = '\n'.join(fixlines)

        compile(body, '<string>', 'exec') #if it fails, throws an exception

        print(body)

        end = start #this span compiled, so skip ahead to the lines above this def

    except SyntaxError: #not a complete function yet, try a shorter span

        pass



    end = end -1

This prints:

def baz():

  return [

1,

  ]

def __init__(self):

  self.my_list = [

      'a',

      'b',

  ]

def bir(a, b):

  c = a + b

  return c

def foo(a, b):

  return a + b

Note that the functions are printed in the reverse of the order in which they appear in code_string.

This should handle even the weirdly indented code, but I think it will fail if you have nested functions.

answered 1 hour ago

Valentino

39929
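
For the nested-function case mentioned above, one alternative is to let the parser report the end line. The sketch below assumes Python 3.8 or later, where AST nodes carry an `end_lineno` attribute; on older versions that attribute does not exist, which is the limitation the question describes. The helper name `function_sources` is invented for this example.

```python
import ast


def function_sources(source):
    """Return (name, source_text) for every function, nested ones included."""
    lines = source.splitlines(keepends=True)
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno and end_lineno are 1-based and inclusive;
            # end_lineno is only available on Python 3.8+ (assumption).
            text = ''.join(lines[node.lineno - 1:node.end_lineno])
            found.append((node.name, text))
    return found


sample = (
    "def outer():\n"
    "  def inner():\n"
    "    return [\n"
    "1,\n"
    "    ]\n"
    "  return inner\n"
)
for name, text in function_sources(sample):
    print(name)
    print(text)
```

Because only `ast.parse` is used, nothing in the code string is executed.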


I think a small parser is in order, to try to take these weird exceptions into account:

import re



code_string = """

# A comment.

def foo(a, b):

  return a + b

class Bar(object):

  def __init__(self):

    self.my_list = [

        'a',

        'b',

    ]



def baz():

  return [

1,

  ]



class Baz(object):

  def hello(self, x):

    return self.hello(

x - 1)



def my_type_annotated_function(

  my_long_argument_name: SomeLongArgumentTypeName

) -> SomeLongReturnTypeName:

  # This function's indentation isn't unusual at all.

  pass



def test_multiline():

    """

    asdasdada

sdadd

    """

    pass



def test_comment(

    a #)

):

    return [a,

    # ]

a]



def test_escaped_endline():

    return "asdad \

asdsad \

asdas"



def test_nested():

    return {():[[],

{

}

]

}

""".strip()



code_string += '\n'





func_list=[]

func = ''

tab  = ''

brackets = {'(':0, '[':0, '{':0}

close = {')':'(', ']':'[', '}':'{'}

string=''

tab_f=''

c_old=''

c_old2=''

multiline=False

check=False

for line in code_string.split('\n'):

    tab = re.findall(r'^\s*',line)[0]

    if 'def ' in line and not func:

        func += line + '\n'

        tab_f = tab

        check=True

    if func:

        if not check:

            if sum(brackets.values()) == 0 and not string and not multiline:

                if len(tab) <= len(tab_f):

                    func_list.append(func)

                    func=''

                    c_old=''

                    c_old2=''

                    continue

            func += line + '\n'

        check = False

        for c in line:

            if c == '#' and not string and not multiline:

                break

            if c_old != '\\':

                if c in ['"', "'"]:

                    if c_old2 == c_old == c == '"' and string != "'":

                        multiline = not multiline

                        string = ''

                        continue

                    if not multiline:

                        if c in string:

                            string = ''

                        else:

                            if not string:

                                string = c

                if not string and not multiline:

                    if c in brackets:

                        brackets[c] += 1

                    if c in close:

                        b = close[c]

                        brackets[b] -= 1

            c_old2=c_old

            c_old=c



for f in func_list:

    print('-'*40)

    print(f)

output:

----------------------------------------

def foo(a, b):

  return a + b



----------------------------------------

  def __init__(self):

    self.my_list = [

        'a',

        'b',

    ]



----------------------------------------

def baz():

  return [

1,

  ]



----------------------------------------

  def hello(self, x):

    return self.hello(

x - 1)



----------------------------------------

def my_type_annotated_function(

  my_long_argument_name: SomeLongArgumentTypeName

) -> SomeLongReturnTypeName:

  # This function's indentation isn't unusual at all.

  pass



----------------------------------------

def test_multiline():

    """

    asdasdada

sdadd

    """

    pass



----------------------------------------

def test_comment(

    a #)

):

    return [a,

    # ]

a]



----------------------------------------

def test_escaped_endline():

    return "asdad asdsad asdas"



----------------------------------------

def test_nested():

    return {():[[],

{

}

]

}

edited 1 hour ago

answered 2 hours ago

Crivella

33627

Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

– pkpnd
2 hours ago

Please do try it; I believe I included cases with strings, and open/close brackets are not counted inside a string. EDIT: escaped delimiters are an exception; I will include them.

– Crivella
2 hours ago

You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

– pkpnd
2 hours ago

Included both escaped characters and comments. Sorry, I tend to write parsers by starting simple and adding cases as I find exceptions; not the best practice, I realize.

– Crivella
2 hours ago
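
The point in these comments, that brackets inside comments and strings must not be counted, is something the standard `tokenize` module already takes care of. The sketch below is only an illustration (the name `real_brackets` is hypothetical): it reports just the brackets the tokenizer classifies as operators.

```python
import io
import tokenize

BRACKETS = {'(', ')', '[', ']', '{', '}'}


def real_brackets(source):
    """Return (line, bracket) pairs for brackets that are real operators."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Brackets inside comments arrive as COMMENT tokens and brackets
        # inside string literals as STRING tokens, so they are skipped here.
        if tok.type == tokenize.OP and tok.string in BRACKETS:
            found.append((tok.start[0], tok.string))
    return found


sample = (
    "def test_comment(\n"
    "    a #)\n"
    "):\n"
    "    return [a,\n"
    "    # ]\n"
    "a, ')']\n"
)
print(real_brackets(sample))
```

With this in place, a hand-written scanner would only need to track bracket depth and indentation, not quoting or comment rules.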
