Talk:Van Emde Boas tree

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

Mid

This article has been rated as Mid-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

van or Van?[edit]

Is this structure really known as the vEB tree, with a lowercase v? In Dutch last names written separately, initial prepositions or articles are capitalized, thus:

Peter van Emde Boas
Van Emde Boas, Peter
meneer Van Emde Boas (mister VEB)

Qwertyus 01:16, 21 July 2006 (UTC)[reply]

They are popularly called vEB trees in many other sources that I've seen, which I'm pretty sure predate this article.[1][2][3] This may not be technically correct, but it seems to be conventional. I don't know whether the lowercase v is used when the name is spelled out in full. Deco 01:48, 21 July 2006 (UTC)[reply]

In many sources that I have noticed when the name is spelled out in full it is written as "Van Emde Boas Queue". But when written in the abbreviated form it is written as vEB queue. 203.200.95.130 12:28, 1 November 2007 (UTC)[reply]

I've changed it to Van Emde Boas tree since that seems common enough.. With a lowercase initial letter, it just looks too strange. Qwertyus (talk) 12:19, 22 June 2012 (UTC)[reply]

Usually spelled van Emde Boas Trees, see van Emde Boas Trees - Stanford University or van Emde Boas Trees (vEB trees) 185.176.78.179 (talk) 07:51, 4 June 2019 (UTC).[reply]

.max is never returned by findNext[edit]

findNext returns always only ".min" values, never ".max"... What if the value we are searching for is stored in some .max field? (E.g. the tree contains values 34 and 47 and none between them, 47 is stored in some .max field, and "findNext(38)" is called.) —Preceding unsigned comment added by 80.95.85.174 (talk) 22:01, 7 October 2009 (UTC)[reply]

I implemented the algorithms as described here, and found they are full of mistakes (such as the one you mentioned - the last line for FindNext is completely wrong) or unclear (M as the keysize depends on the level of the subtree, but here it looks as if it's the same for all trees). I suggest removing the current code altogether until someone provides a correct sample implementation. Greetings, IP. 92.77.65.80 (talk) 13:27, 23 July 2011 (UTC)[reply]

I looked up the algorithm in the textbook CLRS. I only looked at it for a few minutes, but I found at least the fix for one of the problems mentioned above: T.min should not be stored recursively, but T.max SHOULD. — Preceding unsigned comment added by 128.238.245.59 (talk) 02:22, 5 March 2013 (UTC)[reply]

${\sqrt {M}}$ could be changed to ${\sqrt {T.M}}$ to make it more clear that M is a function specific to T. Dhirallin (talk) 06:33, 4 March 2014 (UTC)[reply]

Note also, the FindNext function is broken. the <= operators need to be replaced with <, and the > operator needs to be replaced with >=.

This is for 2 reasons: Firstly, a successor function should find the next item AFTER x, and should not return x. Otherwise you will be unable to iterate over items in the tree using FindNext(). Secondly if x falls into bucket i, but x >= T.children[i].max then T.children[FindNext(T.aux, i)] == T.children[i] which is not correct. So it should look like this:

function FindNext(T, x)
    if x < T.min then
        return T.min
    if x ≥ T.max then // no next element
        return M
    i = floor(x/ ${\sqrt {M}}$ )
    lo = x %  ${\sqrt {M}}$ 
    hi = x - lo
    if lo < T.children[i].max then
        return hi + FindNext(T.children[i], lo)
    return hi + T.children[FindNext(T.aux, i)].min
end

Dhirallin (talk) 00:51, 6 March 2014 (UTC)[reply]

How to pronounce?[edit]

I cannot find anywhere on the net about how to pronounce Van Emde Boas. Can someone add the IPA into the article? --112.118.153.103 (talk) 03:46, 12 July 2010 (UTC)[reply]

I added my best shot. QVVERTYVS (hm?) 15:27, 7 March 2014 (UTC)[reply]

Change letter[edit]

I think, it would be easier to read, if the letter m was changed to w, which is the conventional letter for "word length". Gabriel (talk) 15:26, 13 May 2011 (UTC)[reply]

Bug in insert function[edit]

There's a bug in insert function when T.min<T.max<x The T.max won't be updated Please fixTheme (talk) 03:13, 13 March 2013 (UTC)[reply]

Ok I think someone made an attempt to fix that last bug, however, now if you look at the following code:

   if T.min == T.max then
       if x < T.min then
           T.min = x
           return
       if x > T.max then
           T.max = x

The return statement above is surely an error, because if x < T.min, then the old T.min needs to be added to a sub-tree. In fact that whole block seems unnecessary to me and the Insert function could look like this:

function Insert(T, x)
   if T.min > T.max then // T is empty
       T.min = T.max = x;
       return
   if x < T.min then
       swap(x, T.min)
   else if x > T.max then
       T.max = x
   i = floor(x /  ${\sqrt {M}}$ )
   Insert(T.children[i], x %  ${\sqrt {M}}$ )
   if T.children[i].min == T.children[i].max then
       Insert(T.aux, i)
end

Correct me if I'm wrong? Dhirallin (talk) 05:34, 4 March 2014 (UTC)[reply]

You were absolutely right. I just implemented the pseudocode in C++ and it lost elements every time the subtree was assigned a new value. The code worked just fine after modifying the code like you suggested. I will correct the pseudocode in the article right away. Kilgore T (talk) 07:57, 19 July 2018 (UTC)[reply]

Initial values[edit]

I can't find anything that talks about the initial values of T.min and T.max Theme (talk) 03:13, 13 March 2013 (UTC)[reply]

unreachable code in delete function[edit]

       if T.aux is empty then
           T.min = T.max
           return

       if T.aux is empty then
           T.max = T.min
           return

This fragment of code is unreachable because if T.aux is empty, the only element in the tree is min, so min=max and is caught in the first ifTheme (talk) 03:13, 13 March 2013 (UTC)[reply]

Bug in delete function[edit]

When the only data in the tree are T.min and T.max, and x==T.max, T.children[T.aux.max].max will be T.max, so T.max=T.children[T.aux.max].max is incorrectTheme (talk) 03:20, 13 March 2013 (UTC)[reply]

Indeed, I agree with the two bugs above. Also the following statements are generally incorrect:

T.min = T.children[T.aux.min].min 
T.max = T.children[T.aux.max].max

i.e. remember these values were stored in the sub-tree as x % sqrt(M) and need to be re-combined with the high part of the number. e.g.

i = T.aux.max
T.max = i *  ${\sqrt {M}}$  + T.children[i].min

The code would look like:

function Delete(T, x)
   if T.min == T.max == x then
       T.min = M
       T.max = -1
       return
   if x == T.min then
       i = T.aux.min
       x = i *  ${\sqrt {M}}$  + T.children[i].min
       T.min = x
   
   i = floor(x /  ${\sqrt {M}}$ )
   Delete(T.children[i], x %  ${\sqrt {M}}$ )
   if T.children[i] is empty then
       Delete(T.aux, i)

   if x == T.max then
       if T.aux is empty then
           T.max = T.min
       else
           i = T.aux.max
           T.max = i *  ${\sqrt {M}}$  + T.children[i].max
end

Dhirallin (talk) 06:13, 4 March 2014 (UTC)[reply]

Practical applications[edit]

It would be nice to read something about practical applications of the tree. Maybe some software that uses it? — Preceding unsigned comment added by Michael.lill (talk • contribs) 11:32, 1 June 2017 (UTC)[reply]

Fixed two bugs in insert and delete[edit]

I just fixed the pseudocode bugs in 'insert' and 'delete' previously reported here. I tested the original version with a C++ program and gave wrong results for both operations. The fixed version correctly applied the two operations. Kilgore T (talk) 08:19, 19 July 2018 (UTC)[reply]

simple explanation for its efficiency[edit]

As someone who does not have a good background in computer science, it would be really nice to add a few words to intuitively explain why the query operation takes $O (log log M)$ time, as compared to $O (log M)$ time for the more conventional binary search algorithm.

According to my understanding, maybe one can mention that the van Emde Boas tree is a tentative algorithm for the predecessor problem. While the binary search algorithm cuts the total number of sorted elements in half at each step and proceeds recursively, the total amount of time $n$ is $M^{{\frac {1}{2}}n}=O(1)$ . By using two indices of equal length, the van Emde Boas tree gets the square root of the remaining number of the elements at each recursive step, and therefore one has $M^{\left({\frac {1}{2}}\right)^{n}}=O(1)$ , which results in a more efficient approach.

Gamebm (talk) 22:30, 13 February 2022 (UTC)[reply]

The easiest explanation is, in my opinion, that going from a vEB tree to one of its subtrees takes constant time (as you can calculate which is the subtree you seek), and makes the elements you need to look at go from

M

to

{\sqrt {M}}

.

So, the number of times

x

you have to select a subtree till you reach some constant

c

is given by

M^{1/2^{x}}=c

. Solving that for

x

then gives you the

{\mathcal {O}}(\log \log M)

bound. Sanitiy (talk) 18:23, 2 August 2022 (UTC)[reply]

Incorrect space requirement for vEB trees with hash tables[edit]

In section "Similar structures", the article states that by replacing the arrays in the data structure with hash tables, one can reduce the space to $O(n)$ . Although this was believed to be the case for some time and and a CLRS (ed. 3) exercise asks for a proof of this, the actual worst case space bound is $O(n\log m)$ . This bound is proved in https://v.cunat.cz/theses/master.pdf (page 29 f.) by proving an upper bound and a matching lower bound. That the $O(n)$ bound is incorrect is also mentioned in the CRLS errata (https://www.cs.dartmouth.edu/~thc/clrs-bugs/bugs-3e.php).

It is possible to achieve $O(n)$ using hash tables, but this requires an additional technique of dividing the elements into groups of size $\Theta (\log m)$ (the thesis above also proves this on page 33 f.). However, this is additional technique is not mentioned by the (Wikipedia) article. Sith98 (talk) 09:09, 30 May 2023 (UTC)[reply]

I updated the space bound in the article text. The lower bound counterexample is really simple: just use a set of numbers whose low-order bits are all zero, as far up as possible for how many there are, so that everything goes into the high-halfword structure and then recurses in the same way. —David Eppstein (talk) 17:45, 30 May 2023 (UTC)[reply]