Storing hierarchical data in a relational database is a classic problem.
 Relational databases are not really designed for this purpose, because 
they store data in flat rows. Storage is actually the least of our 
problems. Imagine a commenting system with an unlimited level of nesting
 in replies. This is easy to store in a relational database:
 
   | comment_id 
 | body 
 | parent_id 
 | 
   | 1 | JungleDragon rocks 
 |  | 
   | 2 | No it doesn't! 
 | 1 
 | 
   | 3 | Oh yes it does! 
 | 2 | 
   | 4 | The economy sucks 
 |  | 
   | 5 | True | 4 | 
For the sake of simplicity, I have omitted other comment fields such as 
author, date, etc. The table shows that we can easily store unlimited 
levels of nesting by simply linking to the parent of the current 
comment. Comments that have no parent are the root comments, the others 
are leafs. If we would apply a foreign key to the parent_id column and 
link it to the id column we even get full referential integrity. 
So far so good? Yes, but the real problems start when we want to query that table for useful stuff. Imagine the following tasks:
- Get the full comment tree sorted in the correct order. Hint: you 
cannot use date sorting, since replies can be made anywhere and must be 
positioned below their parent at all times.
- For each comment in the tree, we want to know how deep it is (how 
many levels deep), so we can use that for indenting the comment below 
its parent.
- For each comment we want to know how many replies there are, no matter how deep in the tree.
None of the above tasks can be queried efficiently using the table we 
started with. The problem is that since we do not know how many levels 
deep the hierarchy is, we need to make recursive calls, leading to many 
queries to the database, high memory usage in our scripts and generally a
 system that does not scale.
As said, this is a classic problem, and luckily there are existing solutions and patterns for this. 
This article explains four different approaches, I highly recommend it. In summary, these are the four methods:
- Recursive. The approach we already discussed. This approach is inefficient and does not scale.
- Stack. This one is not much different from the recursive 
approach. The idea is that you build a stack of hierarchical data by 
looping through the rows, typically inside a stored procedure. Once you 
have the stack built, it is very easy to use and query, however, it 
still requires a lot of database connections.
- Modified Preorder Tree Traversal. This surely is the most 
sophisticated, but also most complex, of the methods to work with 
hierarchical data. I encourage you to read the full description to 
really understand it. The principle of this method is that we assign 
smart numbers (left and right) to each leaf in the tree, these numbers 
indicate the relative position of the node towards other nodes. With a 
tree preordered like this, querying the hierarchical data becomes very 
fast and easy. There is a major downfall though: as soon as you insert 
or change something in the tree, it has to be rebuild. This will lead to
 many update queries, and these are expensive. The larger the tree, the 
larger the time of inserting a new node.
- Flat table model. Surely we know that if we could preprocess
 things like the level of the comment (how deep is it) and the rank 
(sort order), querying would become much easier. The author also admits 
that this does shift the performance problem to the writes: upon 
inserting a comment, all others ranks have to be recalculated. Anyways, 
the idea of this model is that you store essential attributes of the tree, instead of calculating them each time we retrieve the tree. Kind of like a tree cache.
Based on the options presented so far, it seems the flat table model 
looks promising, yet it does have a major limitation: a very expensive 
write operation. Luckily, one of the commenters of the article posted a 
way around this. I have taken his idea as a basis, and extended it 
somewhat. 
The Flat Table Model done right
Let's just get to it right away. Here's my(heavily inspired) solution:
 
   | comment_id 
 | body 
 | parent_id 
 | lineage | deep | 
   | 1 | JungleDragon rocks 
 |  | 1 
 | 0 | 
   | 2 | No it doesn't! 
 | 1 
 | 1-2 
 | 1 
 | 
   | 3 | Oh yes it does! 
 | 2 | 1-2-3 
 | 2 | 
   | 4 | The economy sucks 
 |  | 4 
 | 0 | 
   | 5 | True | 4 | 4-5 | 1 | 
Instead of using a rank, we are using a lineage. They have the same 
purpose (correctly ordering the comments), it's just that a lineage has 
one major advantage: it never needs to be updated! As you can see, the 
lineage column displays the hierarchy ids in a flat column (from highest
 parent to lowest child). This field is calculated upon insertion, no 
other inserts need to take place. The deep column simply sets the 
nesting level of the comment, which is handy when we need this value for
 indentation later on. By storing it, we require no recursive queries at
 all.
Let's go into a little bit more detail concerning the insertion of a new comment. Here's how it works:
- in our save routine of a new comment, we check whether it came from a parent (is parent_id set?)
- if so, we do a query to retrieve the parent row
- next, we calculate the lineage and deepness of the new comment. For
 lineage, we use the parent's lineage. For deepness, we use the parent's
 deepness + 1.
- we now insert our new comment. next, we use the returned comment id
 of the new comment, add it to the lineage of the newly created comment,
 and save it again (update).
All in all, we are retrieving a row, inserting one, and updating one. 
Considering the other highly inefficient methods and knowing that most 
people will read comments and not write them, this is pretty awesome. We
 have no recursion, not even a regular loop.
Querying this structure is even better. It is super easy to get all 
comments in the right order, get the indentation, and even the replies 
per comment, no matter how many levels of nesting we have. This query 
kind of combines these three tasks:
SELECT c.id, c.user_id, c.body, c.deep, c.lineage, c.parent_id, 
(SELECT COUNT(*) FROM comment where comment.lineage LIKE 
(CONCAT(c.lineage,'%')) AND comment.lineage!=c.lineage) as replies
FROM comment as c
order by c.lineage
Careful readers will see that we are using a subquery to count the 
replies for each comment. This part does have some performance overhead,
 but not much. If you do not need to show the number of replies to 
each
 comment, leave it our for even faster results. Or, you could take this 
even further by storing the number of replies in the database. The 
downfall of that approach is that you will need to recursively update 
all parents of the current comment row. That's not as bad as it sounds, 
it is not likely that you will have more than a handful of nesting 
levels for a single comment, in fact, most will have none, one or two. 
Referential integrity 
What about referential integrity? Since we have flattened the lineage 
and deep columns, we have lost some of that. No worries though, we are 
keeping our referential parent_id -> id foreign key. We do have 
referential integrity at the foundation. Should anything go wrong with 
the lineage or deep columns, then we can easily use the parent_id -> 
id relationship to "rebuild" those values.
Usage
The advanced flat model is optimized for a comment system, it should not
 be used everywhere. Particularly it is suitable for wide trees (little 
nesting), not deep trees. The reason for this limitation is the lineage 
column, which flattens the hierarchy in a single column. When there are 
too many levels of nesting (let's say over 20), a different model may be
 more suitable. 
Yet another way
Throughout my online research, I found 
yet another way to work with hierarchical data.
 This approach is again a flat model approach, but this time the author 
reserves one column per nesting level in the table to store the 
deepness. The problem is that this results in a hardcoded nesting level 
limit. The other problem is that it is patented. Go figure. That's like 
patenting breathing.
Conclusion 
There are multiple approaches towards storing and retrieving 
hierarchical data in (my)SQL, but for comment systems and other wide 
tree hierarchies, I hope you like my tweaked flat model approach. It has
 no recursion, not even loops, fast retrieves, the least writes, and a 
solid foundation of referential integrity. Credits go out to the author 
of the article and the commenter that suggested the lineage method. I 
only extended those ideas a little bit. 
In closing, I like to end with a small tip. In many online examples, I 
see people looping through a depth counter to append things like spaces 
for indentation. This is not needed, almost every language has a 
function for this, here it is in PHP:
str_repeat("   ",$row['deep']); 
...where the first param is the indentation string, and the second param is the number of indentations.
Enjoy!