Friday, April 29, 2011

How do I remember which way round PRIOR should go in CONNECT BY queries

Hi everyone. I've a terrible memory. Whenever I do a CONNECT BY query in Oracle - and I do mean every time - I have to think hard and usually through trial and error work out on which argument the PRIOR should go.

I don't know why I don't remember - but I don't.

Does anyone have a handy memory mnemonic so I always remember ?

For example:

To go down a tree from a node - obviously I had to look this up :) - you do something like:

select
    *
from
    node
connect by
    prior node_id = parent_node_id
start with
    node_id = 1

So - I start with a node_id of 1 (the top of the branch) and the query looks for all nodes where the parent_node_id = 1 and then iterates down to the bottom of the tree.

To go up the tree the prior goes on the parent:

select
    *
from
    node
connect by
    node_id = prior parent_node_id
start with
    node_id = 10

So starting somewhere down a branch (node_id = 10 in this case) Oracle first gets all nodes where the parent_node_id is the same as the one for which node_id is 10.

EDIT: I still get this wrong so thought I'd add a clarifying edit to expand on the accepted answer - here's how I remember it now:

select
    *
from
    node
connect by
    prior node_id = parent_node_id
start with
    node_id = 1

The 'english language' version of this SQL I now read as...

In NODE, starting with the row in which node_id = 1, the next row selected has its parent_node_id equal to node_id from the previous (prior) row.

EDIT: Quassnoi makes a great point - the order you write the SQL makes things a lot easier.

select
    *
from
    node
start with
    node_id = 1
connect by
    parent_node_id = prior node_id

This feels a lot clearer to me - the "start with" gives the first row selected and the "connect by" gives the next row(s) - in this case the children of node_id = 1.

From stackoverflow
  • Think about the order in which the records are going to be selected: the link-back column on each record must match the link-forward column on the PRIOR record selected.

  • I always try to put the expressions in JOIN's in the following order:

    joined.column = leading.column
    

    This query:

    SELECT  t.value, d.name
    FROM    transactions t
    JOIN
            dimensions d
    ON      d.id = t.dimension
    

    can be treated either like "for each transaction, find the corresponding dimension name", or "for each dimension, find all corresponding transaction values".

    So, if I search for a given transaction, I put the expressions in the following order:

    SELECT  t.value, d.name
    FROM    transactions t
    JOIN
            dimensions d
    ON      d.id = t.dimension
    WHERE   t.id = :myid
    

    , and if I search for a dimension, then:

    SELECT  t.value, d.name
    FROM    dimensions d
    JOIN
            transactions t
    ON      t.dimension = d.id
    WHERE   d.id = :otherid
    

    Ther former query will most probably use index scans first on (t.id), then on (d.id), while the latter one will use index scans first on (d.id), then on (t.dimension), and you can easily see it in the query itself: the searched fields are at left.

    The driving and driven tables may be not so obvious in a JOIN, but it's as clear as a bell for a CONNECT BY query: the PRIOR row is driving, the non-PRIOR is driven.

    That's why this query:

    SELECT  *
    FROM    hierarchy
    START WITH
            id = :root
    CONNECT BY
            parent = PRIOR id
    

    means "find all rows whose parent is a given id". This query builds a hierarchy.

    This can be treated like this:

    connect_by(row) {
      add_to_rowset(row);
    
      /* parent = PRIOR id */
      /* PRIOR id is an rvalue */
      index_on_parent.searchKey = row->id;
    
      foreach child_row in index_on_parent.search {
        connect_by(child_row);
      }
    }
    

    And this query:

    SELECT  *
    FROM    hierarchy
    START WITH
            id = :leaf
    CONNECT BY
            id = PRIOR parent
    

    means "find the rows whose id is a given parent". This query builds an ancestry chain.

    Always put PRIOR in the right part of the expression.

    Think of PRIOR column as of a constant all your rows will be searched for.

    Nick Pierpoint : I like "Always put PRIOR in the right part of the expression" - it does make it easier. Also putting "start with" at the start.

0 comments:

Post a Comment

Note: Only a member of this blog may post a comment.