
SQL Select Statement Parser not returning JOIN type


I want to parse a SQL SELECT statement that supports all the features of a typical SQL dialect such as MySQL. I looked for parsing libraries in Python but couldn't find one that does the job: the ones I found could only parse basic SELECT statements (FROM and WHERE, not even ORDER BY). So as an alternative I wrote my own parser (which I know is not a great solution). I spent a few hours working on it, but I keep getting a weird error and don't know how to approach it.

Before I show the code, I just want to mention that if you know a Python library that can parse SQL statements, not just SELECT but also CREATE TABLE, INSERT, etc., let me know.

Here is my language grammar string:

select_grammar = """
    start: select_statement ";"
    select_statement: "SELECT" column_list "FROM" table_list join_list? where_clause? groupby_clause? having_clause? orderby_clause?
    column_list: "*" | column_expr ("," column_expr)*
    column_expr: function_call | column_name | subquery
    column_name: (table_name ".")? NAME ("AS" NAME)?
    table_name: NAME ("AS" NAME)?
    function_call: NAME "(" function_args ")" ("AS" NAME)?
    function_args: expression ("," expression)*
    where_clause: "WHERE" condition
    groupby_clause: "GROUP BY" column_expr ("," column_expr)*
    having_clause: "HAVING" logical_expr
    orderby_clause: "ORDER BY" order_column ("," order_column)*
    order_column: column_expr ["ASC" | "DESC"]?
    condition: logical_expr
    logical_expr: logical_term
                | logical_expr "AND" logical_term
                | logical_expr "OR" logical_term
                | "NOT" logical_term
    logical_term: comparison_expr
                | "(" logical_expr ")"
                | subquery
    comparison_expr: expression OPERATOR expression
                    | expression "IS" ("NULL" | "NOT NULL")
    expression: (table_name ".")? NAME | INT | string | function_call | subquery
    table_list: table_name ("," table_name)* | subquery
    subquery: "(" select_statement ")"
    join_list: join_expr+
    join_expr: join_type (table_name | subquery) "ON" condition
    join_type: "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"
    string: ESCAPED_STRING | /'[^']*'/
    OPERATOR: ">" | "<" | ">=" | "<=" | "=" | "!="
    %import common.CNAME -> NAME
    %import common.INT
    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS
"""

I also created the Transformer class, which looks like this:

from lark import Lark, Token, Transformer, Tree, v_args


@v_args(inline=True)
class SelectTransformer(Transformer):
    def start(self, *args):
        print("start result: ", args)
        return Tree("SELECT statement", args)

    def column_list(self, *args):
        return args

    def column_expr(self, *args):
        return args[0] if len(args) == 1 else args

    def function_call(self, name, args, alias=None):
        return (name, args, alias)

    # Note: overridden by the second subquery() defined further down.
    def subquery(self, value):
        print("Subquery:", value)

    def where_clause(self, condition=None):
        return condition

    def groupby_clause(self, *args):
        return args

    def having_clause(self, condition=None):
        return condition

    def orderby_clause(self, *args):
        return args

    def order_column(self, *args):
        return args

    def condition(self, *args):
        return args

    def logical_expr(self, *args):
        return args

    def logical_term(self, *args):
        return args

    def comparison_expr(self, *args):
        return args

    def expression(self, *args):
        return args[0] if len(args) == 1 else args

    def column_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args

    def table_list(self, *args):
        return args

    def join_list(self, *args):
        return args

    def join_expr(self, *args):
        return args

    def join_type(self, *args):
        return args

    def subquery(self, *args):
        return args

    def string(self, value):
        return value.strip("'")

    def table_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args

I don't know if it matters, but I also created a small function that displays the final tree nicely:

def format_ast(ast, level=0):
    result = ""
    indent = "  " * level
    if isinstance(ast, tuple):
        for item in ast:
            result += format_ast(item, level + 1)
    elif isinstance(ast, Token):
        result += f"{indent}{ast.type}, Token('{ast.value}')\n"
    elif isinstance(ast, Tree):
        result += f"{indent}Tree({ast.data}), [\n"
        for child in ast.children:
            result += format_ast(child, level + 1)
        result += f"{indent}]\n"
    else:
        result += f"{indent}{ast}\n"
    return result

Here's the statement I'm parsing:

sql_query = 'SELECT ' \
            'name AS alias, ' \
            'COUNT(age) AS age_alias, ' \
            '(SELECT department_name FROM departments WHERE department_id = employees.department_id) ' \
            'FROM employees AS emp, department ' \
            'INNER JOIN departments AS dep ON employees.department_id = departments.id ' \
            'LEFT JOIN other_table AS ot ON other_table.id = employees.table_id ' \
            'WHERE age > 25 ' \
            'GROUP BY age, name ' \
            'HAVING COUNT(age) > 1 ' \
            'ORDER BY name ASC, age DESC;'

The code I'm executing is this:

parser = Lark(select_grammar, parser='lalr', transformer=SelectTransformer())
tree = parser.parse(sql_query)

# Print the custom export format
print(format_ast(tree))

The problem is related to the join_type() method of my SelectTransformer class. Somehow *args is always empty, although in theory it should contain (as defined in the rule) "INNER JOIN", "LEFT JOIN", "RIGHT JOIN" or "FULL JOIN".

My output looks like this:

Tree(SELECT statement), [
  Tree(select_statement), [
        NAME, Token('name')
        NAME, Token('alias')
        NAME, Token('COUNT')
        Tree(function_args), [
          NAME, Token('age')
        ]
        NAME, Token('age_alias')
        Tree(select_statement), [
            NAME, Token('department_name')
            NAME, Token('departments')
                  NAME, Token('department_id')
                  OPERATOR, Token('=')
                    NAME, Token('employees')
                    NAME, Token('department_id')
        ]
        NAME, Token('employees')
        NAME, Token('emp')
      NAME, Token('department')
          NAME, Token('departments')
          NAME, Token('dep')
                  NAME, Token('employees')
                  NAME, Token('department_id')
                OPERATOR, Token('=')
                  NAME, Token('departments')
                  NAME, Token('id')
          NAME, Token('other_table')
          NAME, Token('ot')
                  NAME, Token('other_table')
                  NAME, Token('id')
                OPERATOR, Token('=')
                  NAME, Token('employees')
                  NAME, Token('table_id')
            NAME, Token('age')
            OPERATOR, Token('>')
            INT, Token('25')
      NAME, Token('age')
      NAME, Token('name')
            NAME, Token('COUNT')
            Tree(function_args), [
              NAME, Token('age')
            ]
            None
          OPERATOR, Token('>')
          INT, Token('1')
        NAME, Token('name')
        NAME, Token('age')
  ]
]

As you can see, no join type is displayed. I am relatively new to parsing, so I don't really know what to try.

