Optimized lists

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Mon Mar 14 10:50:38 CET 2005


> Currently there's no length information stored in the list 
> elements, so
> length/1 is an O(n) operation.
> (This is not so nice if an application wants to safeguard against
> operating on too long lists, but even length itself may last 
> too long.)
> However, list handling BIFs seem to count as a fixed number of
> reductions, irrespectively of list length.
> (See 
> http://www.erlang.org/ml-archive/erlang-questions/200502/msg00
168.html)



>Idea:
> Store in each list element the length of the remaining part of the list.

> This way, length() would become an O(1) operation, 
> [this may be a benefit if it's used frequently enough]
> but also, the list BIFs could count as variable number
> of reductions (based on length), keeping the real time behaviour.
> Tail recursion would also work, as the length information in the
> tail of the list would be correct too.

No no no.

Whenever you write an algorithm you must have some idea in the back of you head
as to the sizes of the data structures involved.

The one-data-structure fits all approach is doomed to failure.

If I want to write a program that manipulates a file I need to have some idea of
the limitations of the program. If the file is s mall (a few meg) I'd do manipulations in
memory, with file-at-a-time operations. If the file was large Tera Bytes then I'd do things on
disk. Between the small and large file categories there are many varients and almost unlimited
scope for imagination.

If you need or want "a list and a length" then just make a tuple {List, Len} and maintain the
invarient that len is the length of List.

There is absolutly no point in making specialised data structures in excess of a small number
of pre-defined and efficiently implemented primitive data structures.

If you really want "very large lists" (like infinite lists, then the length gets too big to be 
stored :-) - so then you need to make a abstract data type for lazy lists etc.

> (Currently I think it is possible to call a BIF on a long list that keeps
> the beam busy in that process so long that the net_ticktime elapses and
> other nodes lose contact with the node. Am I wrong ?)

If you have to call a BIF on a very long list like this your algorithm might be wrong :-)

> How feasible this would be to implement ?
> (The more ambitious solution would be that if a list BIF would take more
> than 1000 reductions (the limit for rescheduling) then it is scheduled out.
> This part may be tricky, though.)

But all processes get rescheduled after something like 500 reductions - not just ones
that do things to lists ...


/Joe

 



More information about the erlang-questions mailing list