[erlang-questions] Not an Erlang fan

Caoyuan dcaoyuan@REDACTED
Sun Sep 23 21:16:42 CEST 2007


Tim's example is not really about I/O. Reading a whole file into a
binary is very quick, but even simply traversing that binary byte by
byte costs a lot of time. I wrote a simple test module; please take a
look at test2/1, a function that does nothing but traverse a binary.
When the binary is read from a 200 MB file, traversing it takes about
30 s.

-module(widefinder).

-export([test/1,
         test1/1,
         test2/1,
         test3/1]).

%% test/1: read the file line by line with io:get_line/2.
test(FileName) ->
    statistics(wall_clock),
    {ok, IO} = file:open(FileName, [read]),
    {Matched, Total} = scan_line(IO),
    ok = file:close(IO),
    {_, Duration} = statistics(wall_clock),
    io:format("Duration ~pms~n Matched:~B, Total:~B~n",
              [Duration, Matched, Total]).

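%% Starts from a dummy empty line with Total = -1, so the dummy line is
%% not counted once the first real line has been read.
scan_line(IO) -> scan_line("", IO, 0, -1).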
scan_line(eof, _, Matched, Total) -> {Matched, Total};
scan_line(Line, IO, Matched, Total) ->
    NewCount = Matched + process_match(Line),
    scan_line(io:get_line(IO, ''), IO, NewCount, Total + 1).

process_match([]) -> 0;
process_match("/ongoing/When/"++Rest) ->
    case parse_until_space(Rest, false) of
	true  -> 0;
	false -> 1
    end;
process_match([_H|Rest]) -> process_match(Rest).

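%% test1/1: read the whole file into a binary with file:read_file/1,
%% then split it into lines byte by byte (the matching is commented out).
test1(FileName) ->
    statistics(wall_clock),
    {ok, Bin} = file:read_file(FileName),
    {Matched, Total} = scan_line1(Bin),
    {_, Duration} = statistics(wall_clock),
    io:format("Duration ~pms~n Matched:~B, Total:~B~n",
              [Duration, Matched, Total]).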

scan_line1(Bin) -> scan_line1(Bin, [], 0, 0).
scan_line1(<<>>, _Line, Matched, Total) -> {Matched, Total};
scan_line1(<<$\n, Rest/binary>>, Line, Matched, Total) ->
    %Line1 = lists:reverse(Line),
    scan_line1(Rest, [], Matched, Total + 1);
scan_line1(<<C:1/binary, Rest/binary>>, Line, Matched, Total) ->
    %NewCount = Matched + process_match(Line),
    scan_line1(Rest, [C|Line], Matched, Total).

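%% test2/1: only traverse the binary one byte at a time, doing no other
%% work per byte; this alone takes about 30 s on a 200 MB file.
test2(FileName) ->
    statistics(wall_clock),
    {ok, Bin} = file:read_file(FileName),
    Total = travel_bin(Bin),
    {_, Duration} = statistics(wall_clock),
    io:format("Duration ~pms~n Total:~B~n", [Duration, Total]).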

travel_bin(Bin) -> travel_bin(Bin, 0).
travel_bin(<<>>, ByteCount) -> ByteCount;
travel_bin(<<_C:1/binary, Rest/binary>>, ByteCount) ->
    travel_bin(Rest, ByteCount + 1).

%% test3/1: convert the binary to a list first, then traverse the list.
test3(FileName) ->
    statistics(wall_clock),
    {ok, Bin} = file:read_file(FileName),
    Total = travel_list(binary_to_list(Bin)),
    {_, Duration} = statistics(wall_clock),
    io:format("Duration ~pms~n Total:~B~n", [Duration, Total]).

travel_list(List) -> travel_list(List, 0).
travel_list([], CharCount) -> CharCount;
travel_list([_C|Rest], CharCount) ->
    travel_list(Rest, CharCount + 1).

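%% Returns true if a '.' appears before the first space (so the hit is
%% not counted), otherwise Bool. The [] clause avoids a function_clause
%% crash on a line with no space or '.' after the prefix.
parse_until_space([$\040|_Rest], Bool) -> Bool;
parse_until_space([$.|_Rest], _Bool) -> true;
parse_until_space([_H|Rest], Bool) -> parse_until_space(Rest, Bool);
parse_until_space([], Bool) -> Bool.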
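
For comparison, a minimal sketch (the module name travel_fast is
hypothetical) that matches each byte as an 8-bit integer,
`<<_C, Rest/binary>>`, instead of as a 1-byte sub-binary,
`<<_C:1/binary, Rest/binary>>`, as travel_bin/2 above does. It assumes an
OTP release from R12B onward, whose compiler keeps a single match context
across such a loop instead of building a new sub-binary per byte; it is
untimed here, and on the R11B-era runtime of this thread the gain may be
small.

```erlang
-module(travel_fast).
-export([travel_bin/1, count_lines/1]).

%% Count bytes by matching one 8-bit integer at a time rather than a
%% 1-byte sub-binary. Rest is only passed to the recursive call, which
%% lets an R12B+ compiler reuse the match context.
travel_bin(Bin) when is_binary(Bin) -> travel_bin(Bin, 0).

travel_bin(<<_C, Rest/binary>>, ByteCount) ->
    travel_bin(Rest, ByteCount + 1);
travel_bin(<<>>, ByteCount) ->
    ByteCount.

%% Count newline characters with the same kind of loop.
count_lines(Bin) when is_binary(Bin) -> count_lines(Bin, 0).

count_lines(<<$\n, Rest/binary>>, N) -> count_lines(Rest, N + 1);
count_lines(<<_C, Rest/binary>>, N)  -> count_lines(Rest, N);
count_lines(<<>>, N)                 -> N.
```

If only the byte count is needed, byte_size/1 (size/1 on older
releases) returns it in constant time; the loop is only interesting as a
stand-in for real per-byte work such as line splitting.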



On 9/24/07, Thomas Lindgren <thomasl_erlang@REDACTED> wrote:
>
> --- Keith Irwin <keith.irwin@REDACTED> wrote:
>
> > On 9/23/07, Alex Alvarez <eajam@REDACTED> wrote:
> > >
> > > He definitely seems kind of biased at some points, but it would
> > > still be great to find out where he went wrong!
> > >
> >
> > Isn't he basing his whole analysis on file I/O, and on a single
> > file at that, which doesn't lend itself to parallelism? Had he
> > started by writing a client/server or p2p application in some
> > domain other than system-admin work, perhaps he'd be much more
> > favorable towards the many strengths Erlang has to offer.
>
> He's also using the obvious, tempting, but very slow io:get_line.
> Reading the entire file into a binary with file:read_file takes 7 ms
> (sic), not the 34 seconds he reports for reading it line by line
> with io:get_line. For a beginner it's not obvious what to use,
> though, so life could be easier.
>
> My own experience with parsing XML in Erlang vs Ruby is that xmerl,
> parsing about 4 MB of XML, handily beat "the obvious" Ruby library
> the other guy used (REXML?), being more than 10 times faster: xmerl
> needed 10 seconds versus "a few minutes" for Ruby. So I wouldn't say
> Erlang is inherently slow at parsing, but again, one may need some
> experience to get it right.
>
> Best,
> Thomas
>
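
A minimal sketch of the whole-file approach Thomas describes, assuming a
modern OTP: the binary module used here (binary:split/3, binary:match/2,
binary:part/3) only arrived in R14A, so this would not have run on the
systems of 2007. The module name wf_bin is hypothetical. It reads the
file once with file:read_file/1, splits it on newlines and applies the
same rule as process_match/1 in the module above: a line counts if it
contains "/ongoing/When/" and no '.' appears before the next space.

```erlang
-module(wf_bin).
-export([count_matches/1, count_matches_in/1]).

-define(PREFIX, <<"/ongoing/When/">>).

%% Read the whole file in one call, then count matching lines.
count_matches(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    count_matches_in(Bin).

%% binary:split/3 returns sub-binaries that reference Bin, so the file
%% contents are not copied when splitting.
count_matches_in(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global]),
    lists:foldl(fun(Line, Acc) -> Acc + match_line(Line) end, 0, Lines).

%% 1 if the line contains the prefix and no '.' appears between the
%% prefix and the next space; 0 otherwise.
match_line(Line) ->
    case binary:match(Line, ?PREFIX) of
        nomatch ->
            0;
        {Pos, Len} ->
            Start = Pos + Len,
            Rest = binary:part(Line, Start, byte_size(Line) - Start),
            case dot_before_space(Rest) of
                true  -> 0;
                false -> 1
            end
    end.

dot_before_space(<<$\s, _/binary>>)   -> false;
dot_before_space(<<$., _/binary>>)    -> true;
dot_before_space(<<_, Rest/binary>>)  -> dot_before_space(Rest);
dot_before_space(<<>>)                -> false.
```

Called as wf_bin:count_matches(FileName), it returns only the match
count; a line total could be added with length(Lines), keeping in mind
that a trailing newline yields one extra empty element.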


-- 
- Caoyuan


