Extract Python function source text from the source code string












Suppose I have valid Python source code, as a string:



code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
""".strip()


Objective: I would like to obtain the lines containing the source code of the function definitions, preserving whitespace. For the code string above, I would like to get the strings



def foo(a, b):
  return a + b


and



  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]


Or, equivalently, I'd be happy to get the line numbers of functions in the code string: foo spans lines 2-3, and __init__ spans lines 5-9.



Attempts



I can parse the code string into its AST:



import ast

code_ast = ast.parse(code_string)


And I can find the FunctionDef nodes, e.g.:



function_def_nodes = [node for node in ast.walk(code_ast)
                      if isinstance(node, ast.FunctionDef)]


Each FunctionDef node's lineno attribute tells us the first line for that function. We can estimate the last line of that function with:



last_line = max(node.lineno for node in ast.walk(function_def_node)
                if hasattr(node, 'lineno'))


but this doesn't work perfectly when the function ends with syntactic elements that don't show up as AST nodes, for instance the last ] in __init__.
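To make the shortfall concrete, here is a minimal, self-contained sketch of the estimate above applied to the example string. The helper name estimate_spans and the 2-space indentation of the sample are assumptions made for illustration.

```python
import ast

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
""".strip()


def estimate_spans(source):
    """Map each function name to (first_line, estimated_last_line)."""
    spans = {}
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            # Largest lineno carried by any node inside the function.
            last_line = max(node.lineno for node in ast.walk(func)
                            if hasattr(node, 'lineno'))
            spans[func.name] = (func.lineno, last_line)
    return spans


spans = estimate_spans(code_string)
print(spans)
```

For this sample the estimate for __init__ ends on the line holding 'b' (line 8), so the closing ] on line 9 is missed.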



I doubt there is an approach that only uses the AST, because the AST fundamentally does not have enough information in cases like __init__.
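One caveat to the doubt above: on Python 3.8 and later, the nodes returned by ast.parse also carry an end_lineno attribute, and that attribute does cover closing brackets. The following is a minimal sketch assuming Python 3.8+ is available; function_spans is a hypothetical helper name, and the sample reuses the string from the question with an assumed 2-space indentation.

```python
import ast

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
""".strip()


def function_spans(source):
    """Return {name: (first_line, last_line, text)} for each def in source."""
    lines = source.splitlines()
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno and end_lineno are 1-indexed and inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1:node.end_lineno])
            result[node.name] = (node.lineno, node.end_lineno, text)
    return result


for name, (first, last, text) in function_spans(code_string).items():
    print(name, first, last)
    print(text)
```

Slicing whole lines keeps the leading whitespace of the def line. On older interpreters end_lineno does not exist, so this sketch does not apply there.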



I cannot use the inspect module because that only works on "live objects" and I only have the Python code as a string. I cannot eval the code because that's a huge security headache.



In theory I could write a parser for Python but that really seems like overkill.



A heuristic suggested in the comments is to use the leading whitespace of lines. However, that can break for strange but valid functions with weird indentation like:



def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass
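The failure of the whitespace heuristic can be reproduced with a short sketch. extract_by_indent is a hypothetical name, and the rule it implements (a def owns every following line that is indented strictly deeper) is one reading of the suggestion from the comments, not a tested proposal.

```python
import re

weird = """
def baz():
  return [
1,
  ]
""".strip()


def extract_by_indent(source):
    """Naive heuristic: a def owns each following line indented deeper than it."""
    lines = source.split("\n")
    found = []
    i = 0
    while i < len(lines):
        match = re.match(r"^(\s*)def\s.*$", lines[i])
        if not match:
            i += 1
            continue
        indent = len(match.group(1))
        j = i + 1
        # Consume lines with strictly more leading whitespace than the def.
        while j < len(lines) and len(lines[j]) - len(lines[j].lstrip()) > indent:
            j += 1
        found.append("\n".join(lines[i:j]))
        i = j
    return found


print(extract_by_indent(weird))
```

The extraction stops at the line holding 1, because that line has no leading whitespace, so only the first two lines of baz are returned.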









asked 5 hours ago by pkpnd (edited 3 hours ago)













  • I suppose you could just iterate lines, and when one matches ^(\s*)def\s.*$, extract that matched group (the leading whitespace) and then consume the line and all subsequent lines that startWith(thatWhitespace)

    – Blorgbeard
    4 hours ago











  • You mean, extract all subsequent lines that start with strictly more than that whitespace? Or else you'd also extract the following functions defined at the same indentation level

    – pkpnd
    4 hours ago











  • Oops, yes. You get the idea, anyway.

    – Blorgbeard
    4 hours ago











  • Hmm, doesn't work if the function has weird indentation inside, for example def baz():\n  return [\n1,\n  ]

    – pkpnd
    4 hours ago











  • Ah, I didn't even realise that was valid python. Looks like there's no simple text-processing method, then.

    – Blorgbeard
    3 hours ago



















3 Answers
A much more robust solution would be to use the tokenize module. The following code can handle weird indentations, comments, multi-line tokens, single-line function blocks and empty lines within function blocks:



import tokenize
from io import BytesIO
from collections import deque
code_string = """
# A comment.
def foo(a, b):
  return a + b

class Bar(object):
  def __init__(self):

    self.my_list = [
        'a',
        'b',
    ]

  def test(self): pass
  def abc(self):
    '''multi-
    line token'''

def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # unmatched parenthesis: ( }
  pass
""".strip()
file = BytesIO(code_string.encode())
tokens = deque(tokenize.tokenize(file.readline))
lines = []
while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NAME and token.string == 'def':
        start_line, start_column = token.start
        end_line, _ = token.end
        enclosures = 0
        while tokens:
            token = tokens.popleft()
            if token.type == tokenize.NL: # ignore empty lines
                continue
            if token.type == tokenize.OP and token.string in '([{':
                enclosures += 1
            _, column = token.start
            if column <= start_column and token.type != tokenize.INDENT and not enclosures:
                tokens.appendleft(token)
                break
            if token.type == tokenize.OP and token.string in ')]}':
                enclosures -= 1
            end_line, _ = token.end
        lines.append((start_line, end_line))
print(lines)


This outputs:



[(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]





answered 2 hours ago by blhsing (edited 21 mins ago)


























  • This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

    – pkpnd
    2 hours ago











  • Oops did not actually have any logic to handle weird indentation. Added now.

    – blhsing
    28 mins ago











  • This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

    – user2357112
    9 mins ago
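The INDENT/DEDENT idea from the comment above could look roughly like the following. This is a sketch under stated assumptions, not the answer author's code: token_spans and the sample string are hypothetical, the sample includes a backslash continuation on purpose, and comment-only lines that trail a function's last statement are left out of the span.

```python
import io
import tokenize

# Token types that never count as the last "real" token of a function.
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_spans(source):
    """Return 1-indexed (first_line, last_line) pairs, one per def."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    spans = []
    for i, tok in enumerate(tokens):
        if tok.type != tokenize.NAME or tok.string != "def":
            continue
        last = tok
        j = i + 1
        # 1. Walk to the NEWLINE that ends the logical line holding def.
        while j < len(tokens) and tokens[j].type != tokenize.NEWLINE:
            if tokens[j].type not in _SKIP:
                last = tokens[j]
            j += 1
        # 2. Skip blank and comment-only lines that precede the body.
        j += 1
        while j < len(tokens) and tokens[j].type in (tokenize.NL, tokenize.COMMENT):
            j += 1
        # 3. An INDENT here means a block body; follow it to its DEDENT.
        if j < len(tokens) and tokens[j].type == tokenize.INDENT:
            depth = 0
            while j < len(tokens):
                t = tokens[j]
                if t.type == tokenize.INDENT:
                    depth += 1
                elif t.type == tokenize.DEDENT:
                    depth -= 1
                    if depth == 0:
                        break
                elif t.type not in _SKIP:
                    last = t
                j += 1
        spans.append((tok.start[0], last.end[0]))
    return spans


sample = (
    "def foo(a, b):\n"
    "  return a + \\\n"
    "b\n"
    "class Bar(object):\n"
    "  def test(self): pass\n"
    "  def __init__(self):\n"
    "\n"
    "    self.my_list = [\n"
    "'a',\n"
    "    ]\n"
)
print(token_spans(sample))
```

Because the tokenizer only emits INDENT and DEDENT at logical-line boundaries, lines inside brackets or after a backslash cannot end the function early, whatever their leading whitespace.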

































Rather than reinventing a parser, I would use Python itself.

Basically, I would use the compile() built-in function, which can check whether a string is valid Python code by compiling it. I pass it a string made of selected lines, starting from each def and extending to the farthest line for which compilation does not fail.



code_string = """
#A comment
def foo(a, b):
  return a + b

def bir(a, b):
  c = a + b
  return c

class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

def baz():
  return [
1,
  ]

""".strip()

lines = code_string.split('\n')

#looking for lines with 'def' keywords
defidxs = [e[0] for e in enumerate(lines) if 'def' in e[1]]

#getting the indentation of each 'def'
indents = {}
for i in defidxs:
    ll = lines[i].split('def')
    indents[i] = len(ll[0])

#extracting the strings
end = len(lines)-1
while end > 0:
    if end < defidxs[-1]:
        defidxs.pop()
    try:
        start = defidxs[-1]
    except IndexError: #break if there are no more 'def'
        break

    #empty lines between functions will cause an error, let's remove them
    if len(lines[end].strip()) == 0:
        end = end -1
        continue

    try:
        #fix lines removing indentation or compile will not compile
        fixlines = [ll[indents[start]:] for ll in lines[start:end+1]] #remove indentation
        body = '\n'.join(fixlines)
        compile(body, '<string>', 'exec') #if it fails, throws an exception
        print(body)
        end = start #no need to parse fewer lines if it succeeds.
    except:
        pass

    end = end -1


It is a bit nasty because of the except clause without specific exceptions, which is usually not recommended, but there is no way to know what may cause compile to fail, so I do not know how to avoid it.
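On the bare except: the documentation for compile() names SyntaxError (of which IndentationError is a subclass) for invalid source and ValueError for source containing null bytes, so the clause can probably be narrowed to those two. A small sketch, with compiles as a hypothetical helper name; other exceptions, such as MemoryError on pathological input, are deliberately not caught here.

```python
def compiles(source):
    """Return True if source compiles as a module, False otherwise."""
    try:
        compile(source, '<string>', 'exec')
    except (SyntaxError, ValueError):
        return False
    return True


complete = "def baz():\n  return [\n1,\n  ]"
truncated = "def baz():\n  return ["
print(compiles(complete), compiles(truncated))
```

The complete function compiles, while the truncated one is rejected because its bracket is never closed.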



This prints:



def baz():
  return [
1,
  ]
def __init__(self):
  self.my_list = [
      'a',
      'b',
  ]
def bir(a, b):
  c = a + b
  return c
def foo(a, b):
  return a + b


Note that the functions are printed in the reverse of the order in which they appear in code_string.

This should handle even the weirdly indented code, but I think it will fail if you have nested functions.




















I think a small parser is in order, to try to take these unusual cases into account:



import re

code_string = """
# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

def baz():
  return [
1,
  ]

class Baz(object):
  def hello(self, x):
    return self.hello(
x - 1)

def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass

def test_multiline():
  \"""
  asdasdada
  sdadd
  \"""
  pass

def test_comment(
  a #)
):
  return [a,
  # ]
  a]

def test_escaped_endline():
  return "asdad \
asdsad \
asdas"

def test_nested():
  return {():[[],
    {
    }
    ]
  }
""".strip()

code_string += '\n'


func_list=[]
func = ''
tab = ''
brackets = {'(':0, '[':0, '{':0}
close = {')':'(', ']':'[', '}':'{'}
string=''
tab_f=''
c_old=''
multiline=False
check=False
for line in code_string.split('\n'):
    tab = re.findall(r'^\s*',line)[0]
    if 'def ' in line and not func:
        func += line + '\n'
        tab_f = tab
        check=True
    if func:
        if not check:
            if sum(brackets.values()) == 0 and not string and not multiline:
                if len(tab) <= len(tab_f):
                    func_list.append(func)
                    func=''
                    c_old=''
                    c_old2=''
                    continue
            func += line + '\n'
        check = False
    for c in line:
        if c == '#' and not string and not multiline:
            break
        if c_old != '\\':
            if c in ['"', "'"]:
                if c_old2 == c_old == c == '"' and string != "'":
                    multiline = not multiline
                    string = ''
                    continue
                if not multiline:
                    if c in string:
                        string = ''
                    else:
                        if not string:
                            string = c
            if not string and not multiline:
                if c in brackets:
                    brackets[c] += 1
                if c in close:
                    b = close[c]
                    brackets[b] -= 1
        c_old2=c_old
        c_old=c

for f in func_list:
    print('-'*40)
    print(f)


Output:

----------------------------------------
def foo(a, b):
  return a + b

----------------------------------------
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]

----------------------------------------
def baz():
  return [
1,
  ]

----------------------------------------
  def hello(self, x):
    return self.hello(
x - 1)

----------------------------------------
def my_type_annotated_function(
  my_long_argument_name: SomeLongArgumentTypeName
) -> SomeLongReturnTypeName:
  # This function's indentation isn't unusual at all.
  pass

----------------------------------------
def test_multiline():
  """
  asdasdada
  sdadd
  """
  pass

----------------------------------------
def test_comment(
  a #)
):
  return [a,
  # ]
  a]

----------------------------------------
def test_escaped_endline():
  return "asdad asdsad asdas"

----------------------------------------
def test_nested():
  return {():[[],
    {
    }
    ]
  }































  • Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

    – pkpnd
    2 hours ago

  • Please do try it. I should have covered the cases involving strings: open/close brackets should not count if they are inside a string. EDIT: escaped delimiters are an exception; I will include them.

    – Crivella
    2 hours ago

  • You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

    – pkpnd
    2 hours ago

  • Included both escaped characters and comments. Sorry, I tend to write parsers by starting simple and adding handling as I find exceptions; not the best practice, I realize.

    – Crivella
    2 hours ago











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54374296%2fextract-python-function-source-text-from-the-source-code-string%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    A much more robust solution would be to use the tokenize module. The following code can handle weird indentations, comments, multi-line tokens, single-line function blocks and empty lines within function blocks:



    import tokenize
    from io import BytesIO
    from collections import deque
    code_string = """
    # A comment.
    def foo(a, b):
    return a + b

    class Bar(object):
    def __init__(self):

    self.my_list = [
    'a',
    'b',
    ]

    def test(self): pass
    def abc(self):
    '''multi-
    line token'''

    def baz():
    return [
    1,
    ]

    class Baz(object):
    def hello(self, x):
    return self.hello(
    x - 1)

    def my_type_annotated_function(
    my_long_argument_name: SomeLongArgumentTypeName
    ) -> SomeLongReturnTypeName:
    # unmatched parenthesis: ( }
    pass
    """.strip()
    file = BytesIO(code_string.encode())
    tokens = deque(tokenize.tokenize(file.readline))
    lines =
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NAME and token.string == 'def':
    start_line, start_column = token.start
    end_line, _ = token.end
    enclosures = 0
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NL: # ignore empty lines
    continue
    if token.type == tokenize.OP and token.string in '([{':
    enclosures += 1
    _, column = token.start
    if column <= start_column and token.type != tokenize.INDENT and not enclosures:
    tokens.appendleft(token)
    break
    if token.type == tokenize.OP and token.string in ')]}':
    enclosures -= 1
    end_line, _ = token.end
    lines.append((start_line, end_line))
    print(lines)


    This outputs:



    [(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]





    share|improve this answer


























    • This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

      – pkpnd
      2 hours ago











    • Oops did not actually have any logic to handle weird indentation. Added now.

      – blhsing
      28 mins ago











    • This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

      – user2357112
      9 mins ago
















    2














    A much more robust solution would be to use the tokenize module. The following code can handle weird indentations, comments, multi-line tokens, single-line function blocks and empty lines within function blocks:



    import tokenize
    from io import BytesIO
    from collections import deque
    code_string = """
    # A comment.
    def foo(a, b):
    return a + b

    class Bar(object):
    def __init__(self):

    self.my_list = [
    'a',
    'b',
    ]

    def test(self): pass
    def abc(self):
    '''multi-
    line token'''

    def baz():
    return [
    1,
    ]

    class Baz(object):
    def hello(self, x):
    return self.hello(
    x - 1)

    def my_type_annotated_function(
    my_long_argument_name: SomeLongArgumentTypeName
    ) -> SomeLongReturnTypeName:
    # unmatched parenthesis: ( }
    pass
    """.strip()
    file = BytesIO(code_string.encode())
    tokens = deque(tokenize.tokenize(file.readline))
    lines =
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NAME and token.string == 'def':
    start_line, start_column = token.start
    end_line, _ = token.end
    enclosures = 0
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NL: # ignore empty lines
    continue
    if token.type == tokenize.OP and token.string in '([{':
    enclosures += 1
    _, column = token.start
    if column <= start_column and token.type != tokenize.INDENT and not enclosures:
    tokens.appendleft(token)
    break
    if token.type == tokenize.OP and token.string in ')]}':
    enclosures -= 1
    end_line, _ = token.end
    lines.append((start_line, end_line))
    print(lines)


    This outputs:



    [(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]





    share|improve this answer


























    • This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

      – pkpnd
      2 hours ago











    • Oops did not actually have any logic to handle weird indentation. Added now.

      – blhsing
      28 mins ago











    • This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

      – user2357112
      9 mins ago














    2












    2








    2







    A much more robust solution would be to use the tokenize module. The following code can handle weird indentations, comments, multi-line tokens, single-line function blocks and empty lines within function blocks:



    import tokenize
    from io import BytesIO
    from collections import deque
    code_string = """
    # A comment.
    def foo(a, b):
    return a + b

    class Bar(object):
    def __init__(self):

    self.my_list = [
    'a',
    'b',
    ]

    def test(self): pass
    def abc(self):
    '''multi-
    line token'''

    def baz():
    return [
    1,
    ]

    class Baz(object):
    def hello(self, x):
    return self.hello(
    x - 1)

    def my_type_annotated_function(
    my_long_argument_name: SomeLongArgumentTypeName
    ) -> SomeLongReturnTypeName:
    # unmatched parenthesis: ( }
    pass
    """.strip()
    file = BytesIO(code_string.encode())
    tokens = deque(tokenize.tokenize(file.readline))
    lines =
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NAME and token.string == 'def':
    start_line, start_column = token.start
    end_line, _ = token.end
    enclosures = 0
    while tokens:
    token = tokens.popleft()
    if token.type == tokenize.NL: # ignore empty lines
    continue
    if token.type == tokenize.OP and token.string in '([{':
    enclosures += 1
    _, column = token.start
    if column <= start_column and token.type != tokenize.INDENT and not enclosures:
    tokens.appendleft(token)
    break
    if token.type == tokenize.OP and token.string in ')]}':
    enclosures -= 1
    end_line, _ = token.end
    lines.append((start_line, end_line))
    print(lines)


    This outputs:



    [(2, 3), (6, 11), (13, 13), (14, 16), (18, 21), (24, 26), (28, 32)]





    answered 2 hours ago by blhsing (edited 21 mins ago)













    • This looks promising. Are you sure it works for the "weird indentation" cases? I tried your code and it seems to break on all of the "weird indentation" functions I provided, extracting only the first part of each function.

      – pkpnd
      2 hours ago











    • Oops, I did not actually have any logic to handle weird indentation. Added now.

      – blhsing
      28 mins ago











    • This fails to handle line continuations. Looking for INDENT and DEDENT tokens (and checking for the single-logical-line case, where there is no INDENT) would probably be more robust.

      – user2357112
      9 mins ago
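The INDENT/DEDENT idea from the last comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: `function_line_ranges` and the sample are hypothetical, the input is assumed to be valid Python, and trailing comment-only lines are not counted as part of a function.

```python
import io
import tokenize


def function_line_ranges(source):
    """Return 1-based, inclusive (first_line, last_line) pairs for each def."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    blank = (tokenize.NL, tokenize.COMMENT)
    ranges = []
    for i, tok in enumerate(tokens):
        if tok.type != tokenize.NAME or tok.string != 'def':
            continue
        # The header's logical line ends at the first NEWLINE token; line
        # breaks inside brackets or after a backslash do not produce one.
        j = i + 1
        while j < len(tokens) and tokens[j].type != tokenize.NEWLINE:
            j += 1
        end_line = tokens[j - 1].end[0]
        # Skip blank and comment-only lines to see whether a block follows.
        k = j + 1
        while k < len(tokens) and tokens[k].type in blank:
            k += 1
        # No INDENT means the single-logical-line case, e.g. "def f(): pass".
        if k < len(tokens) and tokens[k].type == tokenize.INDENT:
            depth = 0
            while k < len(tokens):
                kind = tokens[k].type
                if kind == tokenize.INDENT:
                    depth += 1
                elif kind == tokenize.DEDENT:
                    depth -= 1
                    if depth == 0:
                        break  # the block opened by this def is closed
                elif kind not in blank and kind != tokenize.NEWLINE:
                    end_line = tokens[k].end[0]
                k += 1
        ranges.append((tok.start[0], end_line))
    return ranges


# Hypothetical sample reusing the question's oddly indented functions.
sample = "\n".join([
    "class Baz(object):",
    "  def hello(self, x):",
    "    return self.hello(",
    "x - 1)",
    "",
    "  def test(self): pass",
    "",
    "def baz():",
    "  return [",
    "1,",
    "  ]",
]) + "\n"
print(function_line_ranges(sample))
```

Because the tokenizer only emits INDENT and DEDENT at logical-line boundaries, column positions inside brackets or continued lines never need to be inspected.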

































    Rather than reinventing a parser, I would use Python itself.



    Basically, I would use the built-in compile() function, which checks whether a string is valid Python code by compiling it. I pass it a string made of selected lines, starting at each def and extending to the farthest line at which the selection still compiles.



    code_string = """
    #A comment
    def foo(a, b):
      return a + b

    def bir(a, b):
      c = a + b
      return c

    class Bar(object):
      def __init__(self):
        self.my_list = [
            'a',
            'b',
        ]

    def baz():
      return [
    1,
      ]

    """.strip()

    lines = code_string.split('\n')

    # look for lines containing the 'def' keyword
    defidxs = [e[0] for e in enumerate(lines) if 'def' in e[1]]

    # record the indentation of each 'def'
    indents = {}
    for i in defidxs:
        ll = lines[i].split('def')
        indents[i] = len(ll[0])

    # extract the strings, scanning from the last line backwards
    end = len(lines)-1
    while end > 0:
        if end < defidxs[-1]:
            defidxs.pop()
        try:
            start = defidxs[-1]
        except IndexError: # break if there are no more 'def'
            break

        # empty lines between functions will cause an error, so skip them
        if len(lines[end].strip()) == 0:
            end = end -1
            continue

        try:
            # strip the def's indentation, otherwise compile will fail
            fixlines = [ll[indents[start]:] for ll in lines[start:end+1]]
            body = '\n'.join(fixlines)
            compile(body, '<string>', 'exec') # raises an exception if it fails
            print(body)
            end = start # no need to try fewer lines once it succeeds
        except:
            pass

        end = end -1


    It is a bit nasty because of the bare except clause, which is usually not recommended; I used it because I was not sure which exceptions compile can raise. (For invalid source it is documented to raise SyntaxError, so catching that is probably enough.)



    This prints:



    def baz():
      return [
    1,
      ]
    def __init__(self):
      self.my_list = [
          'a',
          'b',
      ]
    def bir(a, b):
      c = a + b
      return c
    def foo(a, b):
      return a + b


    Note that the functions are printed in the reverse of the order in which they appear in code_string.



    This should handle even the weirdly indented code, but I think it will fail if you have nested functions.
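The compile() check at the heart of this answer can be isolated into a small helper that catches only the exceptions compile() is documented to raise for bad source. A minimal sketch; `compiles` is a hypothetical name, and it assumes that nothing beyond SyntaxError and ValueError needs handling for ordinary input:

```python
def compiles(snippet):
    """Return True if snippet compiles as a module, False otherwise."""
    try:
        # compile() only builds a code object; it does not execute the snippet.
        compile(snippet, '<string>', 'exec')
    except (SyntaxError, ValueError):
        # SyntaxError also covers IndentationError and TabError; ValueError is
        # raised by some Python versions for source containing null bytes.
        return False
    return True


print(compiles("def foo(a, b):\n  return a + b"))   # complete function
print(compiles("def foo(a, b):\n  return [\n1,"))   # bracket still open
```

Keeping the check in one place makes it easier to narrow or widen the caught exceptions later without touching the scanning loop.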






    answered 1 hour ago by Valentino





































            I think a small parser is in order, to try to take these odd cases into account:



            import re

            code_string = """
            # A comment.
            def foo(a, b):
              return a + b
            class Bar(object):
              def __init__(self):
                self.my_list = [
                    'a',
                    'b',
                ]

            def baz():
              return [
            1,
              ]

            class Baz(object):
              def hello(self, x):
                return self.hello(
            x - 1)

            def my_type_annotated_function(
              my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
              # This function's indentation isn't unusual at all.
              pass

            def test_multiline():
              \"""
              asdasdada
              sdadd
              \"""
              pass

            def test_comment(
              a #)
              ):
              return [a,
              # ]
            a]

            def test_escaped_endline():
              return "asdad \
            asdsad \
            asdas"

            def test_nested():
              return {():[[],
            {
            }
            ]
            }
            """.strip()

            code_string += '\n'


            func_list=[]
            func = ''
            tab = ''
            brackets = {'(':0, '[':0, '{':0}
            close = {')':'(', ']':'[', '}':'{'}
            string=''       # quote character of the currently open string, if any
            tab_f=''        # indentation of the current function's def line
            c_old=''        # previous character
            c_old2=''       # character before the previous one
            multiline=False # True while inside a triple-quoted string
            check=False     # True only on the line that starts a function
            for line in code_string.split('\n'):
                tab = re.findall(r'^\s*',line)[0]
                if 'def ' in line and not func:
                    func += line + '\n'
                    tab_f = tab
                    check=True
                if func:
                    if not check:
                        # outside brackets and strings, a line indented no deeper
                        # than the def ends the function
                        if sum(brackets.values()) == 0 and not string and not multiline:
                            if len(tab) <= len(tab_f):
                                func_list.append(func)
                                func=''
                                c_old=''
                                c_old2=''
                                continue
                        func += line + '\n'
                    check = False
                for c in line:
                    if c == '#' and not string and not multiline:
                        break
                    if c_old != '\\':
                        if c in ['"', "'"]:
                            if c_old2 == c_old == c == '"' and string != "'":
                                multiline = not multiline
                                string = ''
                                continue
                            if not multiline:
                                if c in string:
                                    string = ''
                                else:
                                    if not string:
                                        string = c
                        if not string and not multiline:
                            if c in brackets:
                                brackets[c] += 1
                            if c in close:
                                b = close[c]
                                brackets[b] -= 1
                    c_old2=c_old
                    c_old=c

            for f in func_list:
                print('-'*40)
                print(f)


            Output:



            ----------------------------------------
            def foo(a, b):
              return a + b

            ----------------------------------------
              def __init__(self):
                self.my_list = [
                    'a',
                    'b',
                ]

            ----------------------------------------
            def baz():
              return [
            1,
              ]

            ----------------------------------------
              def hello(self, x):
                return self.hello(
            x - 1)

            ----------------------------------------
            def my_type_annotated_function(
              my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
              # This function's indentation isn't unusual at all.
              pass

            ----------------------------------------
            def test_multiline():
              """
              asdasdada
              sdadd
              """
              pass

            ----------------------------------------
            def test_comment(
              a #)
              ):
              return [a,
              # ]
            a]

            ----------------------------------------
            def test_escaped_endline():
              return "asdad asdsad asdas"

            ----------------------------------------
            def test_nested():
              return {():[[],
            {
            }
            ]
            }
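The scanner above tracks string and comment state by hand in order to decide which brackets count. For comparison, the standard tokenize module already makes that distinction: brackets inside strings or comments never appear as OP tokens. A small sketch, assuming complete and valid input; `bracket_tokens` and `sample` are hypothetical names:

```python
import io
import tokenize

BRACKETS = {'(', ')', '[', ']', '{', '}'}


def bracket_tokens(source):
    """List the bracket characters that are real tokens, in source order."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.OP and tok.string in BRACKETS]


# One real call, a stray ")" in a comment, and "[(" inside a string literal.
sample = "f(a)  # )\ns = '[('\n"
print(bracket_tokens(sample))
```

Only the call's own parentheses are reported here, so a depth counter built on these tokens is not misled by the comment or the string.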





            • Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

              – pkpnd
              2 hours ago











            • Please do try it. I should have included test cases with strings; open/close brackets are not counted if they are inside a string. EDIT: escaped delimiters are an exception, I will include them.

              – Crivella
              2 hours ago













            • You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

              – pkpnd
              2 hours ago






            • Included both escaped characters and comments. Sorry, I do tend to write parsers by starting simple and adding cases as I find exceptions; not the best practice, I realize.

              – Crivella
              2 hours ago
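Given how many corner cases a hand-written scanner has to chase, one alternative that avoids scanning altogether is worth noting. It assumes Python 3.8 or later, where AST nodes carry an `end_lineno` attribute that covers closing brackets; on older versions that attribute does not exist. `function_sources` and `sample` are hypothetical names, and decorator lines are not included in the extracted text:

```python
import ast


def function_sources(source):
    """Return (name, source_text) for every function, nested ones included."""
    found = []
    # ast.parse only parses; it does not execute the code.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append((node.lineno, node.end_lineno, node.name))
    lines = source.splitlines(keepends=True)
    # Sort by starting line so results follow source order.
    return [(name, ''.join(lines[start - 1:end]))
            for start, end, name in sorted(found)]


# Hypothetical sample with a trailing bracket and a nested function.
sample = (
    "class Bar(object):\n"
    "  def __init__(self):\n"
    "    self.my_list = [\n"
    "        'a',\n"
    "        'b',\n"
    "    ]\n"
    "\n"
    "def outer():\n"
    "  def inner():\n"
    "    return 1\n"
    "  return inner\n"
)
for name, text in function_sources(sample):
    print('-' * 40)
    print(text, end='')
```

Whole lines are sliced rather than exact column spans, so each function keeps its original leading whitespace.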
















            1














            I think a small parser is in order to try and take into account this weird exceptions:



            import re

            code_string = """
            # A comment.
            def foo(a, b):
            return a + b
            class Bar(object):
            def __init__(self):
            self.my_list = [
            'a',
            'b',
            ]

            def baz():
            return [
            1,
            ]

            class Baz(object):
            def hello(self, x):
            return self.hello(
            x - 1)

            def my_type_annotated_function(
            my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
            # This function's indentation isn't unusual at all.
            pass

            def test_multiline():
            """
            asdasdada
            sdadd
            """
            pass

            def test_comment(
            a #)
            ):
            return [a,
            # ]
            a]

            def test_escaped_endline():
            return "asdad
            asdsad
            asdas"

            def test_nested():
            return {():[,
            {
            }
            ]
            }
            """.strip()

            code_string += 'n'


            func_list=
            func = ''
            tab = ''
            brackets = {'(':0, '[':0, '{':0}
            close = {')':'(', ']':'[', '}':'{'}
            string=''
            tab_f=''
            c_old=''
            multiline=False
            check=False
            for line in code_string.split('n'):
            tab = re.findall(r'^s*',line)[0]
            if 'def ' in line and not func:
            func += line + 'n'
            tab_f = tab
            check=True
            if func:
            if not check:
            if sum(brackets.values()) == 0 and not string and not multiline:
            if len(tab) <= len(tab_f):
            func_list.append(func)
            func=''
            c_old=''
            c_old2=''
            continue
            func += line + 'n'
            check = False
            for c in line:
            if c == '#' and not string and not multiline:
            break
            if c_old != '\':
            if c in ['"', "'"]:
            if c_old2 == c_old == c == '"' and string != "'":
            multiline = not multiline
            string = ''
            continue
            if not multiline:
            if c in string:
            string = ''
            else:
            if not string:
            string = c
            if not string and not multiline:
            if c in brackets:
            brackets[c] += 1
            if c in close:
            b = close[c]
            brackets[b] -= 1
            c_old2=c_old
            c_old=c

            for f in func_list:
            print('-'*40)
            print(f)


            output:



            ----------------------------------------
            def foo(a, b):
            return a + b

            ----------------------------------------
            def __init__(self):
            self.my_list = [
            'a',
            'b',
            ]

            ----------------------------------------
            def baz():
            return [
            1,
            ]

            ----------------------------------------
            def hello(self, x):
            return self.hello(
            x - 1)

            ----------------------------------------
            def my_type_annotated_function(
            my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
            # This function's indentation isn't unusual at all.
            pass

            ----------------------------------------
            def test_multiline():
            """
            asdasdada
            sdadd
            """
            pass

            ----------------------------------------
            def test_comment(
            a #)
            ):
            return [a,
            # ]
            a]

            ----------------------------------------
            def test_escaped_endline():
            return "asdad asdsad asdas"

            ----------------------------------------
            def test_nested():
            return {():[,
            {
            }
            ]
            }





            share|improve this answer


























            • Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

              – pkpnd
              2 hours ago











            • Please do try it i should've included cases including strings and open/close brackets should not count if inside a string. EDIT: the escaped delimiters are an exception i will include it

              – Crivella
              2 hours ago













            • You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

              – pkpnd
              2 hours ago






            • 1





              Included both escaped characters and comments. Sorry i do tend to write parsers by starting simple and adding stuff as i find exception, not the best practice i realize

              – Crivella
              2 hours ago














            1












            1








            1







            I think a small parser is in order to try and take into account this weird exceptions:



            import re

            code_string = """
            # A comment.
            def foo(a, b):
            return a + b
            class Bar(object):
            def __init__(self):
            self.my_list = [
            'a',
            'b',
            ]

            def baz():
            return [
            1,
            ]

            class Baz(object):
            def hello(self, x):
            return self.hello(
            x - 1)

            def my_type_annotated_function(
            my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
            # This function's indentation isn't unusual at all.
            pass

            def test_multiline():
            """
            asdasdada
            sdadd
            """
            pass

            def test_comment(
            a #)
            ):
            return [a,
            # ]
            a]

            def test_escaped_endline():
            return "asdad
            asdsad
            asdas"

            def test_nested():
            return {():[,
            {
            }
            ]
            }
            """.strip()

            code_string += 'n'


            func_list=
            func = ''
            tab = ''
            brackets = {'(':0, '[':0, '{':0}
            close = {')':'(', ']':'[', '}':'{'}
            string=''
            tab_f=''
            c_old=''
            multiline=False
            check=False
            for line in code_string.split('n'):
            tab = re.findall(r'^s*',line)[0]
            if 'def ' in line and not func:
            func += line + 'n'
            tab_f = tab
            check=True
            if func:
            if not check:
            if sum(brackets.values()) == 0 and not string and not multiline:
            if len(tab) <= len(tab_f):
            func_list.append(func)
            func=''
            c_old=''
            c_old2=''
            continue
            func += line + 'n'
            check = False
            for c in line:
            if c == '#' and not string and not multiline:
            break
            if c_old != '\':
            if c in ['"', "'"]:
            if c_old2 == c_old == c == '"' and string != "'":
            multiline = not multiline
            string = ''
            continue
            if not multiline:
            if c in string:
            string = ''
            else:
            if not string:
            string = c
            if not string and not multiline:
            if c in brackets:
            brackets[c] += 1
            if c in close:
            b = close[c]
            brackets[b] -= 1
            c_old2=c_old
            c_old=c

            for f in func_list:
            print('-'*40)
            print(f)


            output:



            ----------------------------------------
            def foo(a, b):
            return a + b

            ----------------------------------------
            def __init__(self):
            self.my_list = [
            'a',
            'b',
            ]

            ----------------------------------------
            def baz():
            return [
            1,
            ]

            ----------------------------------------
            def hello(self, x):
            return self.hello(
            x - 1)

            ----------------------------------------
            def my_type_annotated_function(
            my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
            # This function's indentation isn't unusual at all.
            pass

            ----------------------------------------
            def test_multiline():
            """
            asdasdada
            sdadd
            """
            pass

            ----------------------------------------
            def test_comment(
            a #)
            ):
            return [a,
            # ]
            a]

            ----------------------------------------
            def test_escaped_endline():
            return "asdad asdsad asdas"

            ----------------------------------------
            def test_nested():
            return {():[,
            {
            }
            ]
            }





            share|improve this answer















            I think a small parser is in order to try and take into account this weird exceptions:



            import re

            code_string = """
            # A comment.
            def foo(a, b):
              return a + b
            class Bar(object):
              def __init__(self):
                self.my_list = [
                    'a',
                    'b',
                ]

            def baz():
              return [
            1,
              ]

            class Baz(object):
              def hello(self, x):
                return self.hello(
            x - 1)

            def my_type_annotated_function(
              my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
              # This function's indentation isn't unusual at all.
              pass

            def test_multiline():
                \"""
                asdasdada
            sdadd
                \"""
                pass

            def test_comment(
                a #)
            ):
                return [a,
                # ]
            a]

            def test_escaped_endline():
                return "asdad \
            asdsad \
            asdas"

            def test_nested():
                return {():[[],
            {
            }
            ]
            }
            """.strip()

            # A trailing newline guarantees a final empty line that closes the last function.
            code_string += '\n'


            func_list = []
            func = ''                 # text of the function currently being collected
            tab = ''                  # leading whitespace of the current line
            brackets = {'(': 0, '[': 0, '{': 0}
            close = {')': '(', ']': '[', '}': '{'}
            string = ''               # quote character of the open string, if any
            tab_f = ''                # leading whitespace of the current def line
            c_old = ''                # previous character
            c_old2 = ''               # character before the previous one
            multiline = False         # inside a triple-quoted string
            check = False             # the current line is the def line itself
            for line in code_string.split('\n'):
                tab = re.findall(r'^\s*', line)[0]
                if 'def ' in line and not func:
                    func += line + '\n'
                    tab_f = tab
                    check = True
                if func:
                    if not check:
                        # The function ends at the first line that is not inside brackets
                        # or a string and is indented no deeper than its def line.
                        if sum(brackets.values()) == 0 and not string and not multiline:
                            if len(tab) <= len(tab_f):
                                func_list.append(func)
                                func = ''
                                c_old = ''
                                c_old2 = ''
                                continue
                        func += line + '\n'
                    check = False
                for c in line:
                    # Everything after an unquoted '#' is a comment.
                    if c == '#' and not string and not multiline:
                        break
                    if c_old != '\\':
                        if c in ['"', "'"]:
                            # Three double quotes in a row toggle the multiline state.
                            if c_old2 == c_old == c == '"' and string != "'":
                                multiline = not multiline
                                string = ''
                                continue
                            if not multiline:
                                if c in string:
                                    string = ''
                                else:
                                    if not string:
                                        string = c
                        if not string and not multiline:
                            if c in brackets:
                                brackets[c] += 1
                            if c in close:
                                b = close[c]
                                brackets[b] -= 1
                    c_old2 = c_old
                    c_old = c

            for f in func_list:
                print('-'*40)
                print(f)


            Output:



            ----------------------------------------
            def foo(a, b):
              return a + b

            ----------------------------------------
              def __init__(self):
                self.my_list = [
                    'a',
                    'b',
                ]

            ----------------------------------------
            def baz():
              return [
            1,
              ]

            ----------------------------------------
              def hello(self, x):
                return self.hello(
            x - 1)

            ----------------------------------------
            def my_type_annotated_function(
              my_long_argument_name: SomeLongArgumentTypeName
            ) -> SomeLongReturnTypeName:
              # This function's indentation isn't unusual at all.
              pass

            ----------------------------------------
            def test_multiline():
                """
                asdasdada
            sdadd
                """
                pass

            ----------------------------------------
            def test_comment(
                a #)
            ):
                return [a,
                # ]
            a]

            ----------------------------------------
            def test_escaped_endline():
                return "asdad asdsad asdas"

            ----------------------------------------
            def test_nested():
                return {():[[],
            {
            }
            ]
            }
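For comparison, here is a minimal sketch that gets the line spans the question asks for (foo on lines 2-3, __init__ on lines 5-9) from the standard library's `tokenize` module instead of the character scanner above. This swaps in a different technique, not the method of this answer. The tokenizer already knows about strings, comments and brackets, and it emits INDENT/DEDENT tokens only outside brackets. The names `function_spans` and `sample` are made up for illustration. The sketch assumes the source tokenizes cleanly, does not include decorator lines, and leaves out comment-only lines that follow a function body.

```python
import io
import tokenize

# Token types that carry no statement content of their own.
_LAYOUT = {tokenize.NL, tokenize.COMMENT, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def function_spans(source):
    """Return sorted (name, first_line, last_line) tuples, 1-based, for each def."""
    spans = []
    open_defs = []          # stack of (indent_depth, first_line, name)
    depth = 0               # current block nesting, from INDENT/DEDENT tokens
    at_line_start = True    # the next content token begins a logical line
    stmt_line = 0           # physical line where the current statement began
    last_line = 0           # last physical line holding a content token
    saw_def = False         # the previous content token was the keyword `def`

    def close(min_depth):
        # A def at depth d ends when a new statement starts at depth <= d.
        while open_defs and open_defs[-1][0] >= min_depth:
            _, first, name = open_defs.pop()
            spans.append((name, first, last_line))

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            at_line_start = True
        elif tok.type == tokenize.ENDMARKER:
            close(0)
        if tok.type in _LAYOUT:
            continue
        if at_line_start:
            close(depth)
            stmt_line = tok.start[0]
            at_line_start = False
        if saw_def:
            # The token right after `def` is the function name.
            open_defs.append((depth, stmt_line, tok.string))
        saw_def = tok.type == tokenize.NAME and tok.string == "def"
        last_line = tok.end[0]
    return sorted(spans, key=lambda span: span[1])


sample = """# A comment.
def foo(a, b):
  return a + b
class Bar(object):
  def __init__(self):
    self.my_list = [
        'a',
        'b',
    ]
"""

lines = sample.splitlines()
for name, first, last in function_spans(sample):
    print(name, first, last)
    print("\n".join(lines[first - 1:last]))
```

Slicing the original lines by the reported span keeps the whitespace exactly as written, including the closing `]` that the AST-based estimate in the question misses.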














            answered 2 hours ago by Crivella (edited 1 hour ago)













            • Writing a parser is hard. I haven't run your code but just by glancing at it, I think it fails for multiline strings (delimited with """) and escaped string delimiters, and it doesn't understand comments (which may contain stray brackets or string delimiters).

              – pkpnd
              2 hours ago











            • Please do try it. I should have included test cases with strings: open/close brackets are not counted when they are inside a string. EDIT: escaped delimiters are an exception; I will handle them.

              – Crivella
              2 hours ago













            • You aren't checking for comments so there's no way you can tell if a close parenthesis should be counted or not (it shouldn't count if it's inside a comment).

              – pkpnd
              2 hours ago






            • Included both escaped characters and comments. Sorry, I do tend to write parsers by starting simple and adding stuff as I find exceptions; not the best practice, I realize.

              – Crivella
              2 hours ago
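The point raised in these comments, that a bracket inside a comment or a string must not be counted, can be shown with a small sketch. It assumes the standard library's `tokenize` module is acceptable for the job; `bracket_balance` and `tricky` are hypothetical names used only here. Only `OP` tokens are counted, so anything the tokenizer classifies as a comment or a string literal is skipped automatically.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def bracket_balance(source):
    """Net bracket depth counted from OP tokens only."""
    depth = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            if tok.string in OPENERS:
                depth += 1
            elif tok.string in CLOSERS:
                depth -= 1
    return depth


# The test_comment case from the answer: two brackets live inside comments.
tricky = (
    "def test_comment(\n"
    "    a #)\n"
    "):\n"
    "    return [a,\n"
    "    # ]\n"
    "a]\n"
)

# Naive character counting also counts the brackets inside the comments.
naive = (sum(tricky.count(c) for c in OPENERS)
         - sum(tricky.count(c) for c in CLOSERS))

print(bracket_balance(tricky), naive)  # prints: 0 -2
```

A balance of zero at the end of a line is the condition the answer's scanner relies on before it compares indentation.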




































