Channel: Active questions tagged python - Stack Overflow

Parsing C# code in Python (separating statements and/or blocks): regex or state machine?


I need to parse C# code: just separate the statements, taking line breaks into account. The parser must ignore comments, multiline comments, verbatim strings, and multiline verbatim strings.

What I tried: I read the file into a variable and split it on line breaks (because I need the original line numbers). I then prefix each line with its line number wrapped in a marker pattern, split the whole string on the characters ;, { and }, and remove the markers that are not needed (keeping only the first one in each piece).

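    import re

    # Prefix every source line with a marker that carries its line number.
    with open("./program.cs", "r") as f:
        prg = []
        for number, line in enumerate(f, start=1):  # 1-based, like an editor
            prg.append(f"<#<{number}>#>{line}")

    # Split on statement/block delimiters (no awareness of comments or strings).
    dotnet_lines = re.split(r'[;\{\}]', "".join(prg))
    for i in range(len(dotnet_lines)):
        dotnet_lines[i] = dotnet_lines[i].replace("\n", "")
        # Drop every marker that is not at the very start of the piece.
        dotnet_lines[i] = re.sub(r'(.)(\<#\<[0-9]+\>#\>)', r'\1', dotnet_lines[i])

    # Result....
    for ln in dotnet_lines:
        ocorrencia = ln.find('>#>') + 3  # end of the leading marker
        line = ln[ocorrencia:]
        number = re.sub('[<#>]', '', ln[:ocorrencia])
        print(f"Ln Nr: {number}   {line}")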

It's a basic solution, but it doesn't solve the issue of comments or strings.

Using Pygments is fine too, but I only want to separate the statements and blocks.

    from pygments.lexers.dotnet import CSharpLexer
    from pygments.token import Token

    def tokenize_dotnet_file(file_path):
        with open(file_path, 'r') as file:
            code = file.read()
        lexer = CSharpLexer()
        tokens = lexer.get_tokens(code)
        for token in tokens:
            token_type = token[0]
            token_value = token[1]
            print(f"Type: {token_type}, Value: {token_value}")

    if __name__ == "__main__":
        file_path = "./program.cs"
        tokenize_dotnet_file(file_path)

This is better, but I need the statements, not the individual tokens.

