File IO
Written by Raimo, 29 Apr 2010

Next: Socket IO

The original article and examples were written by Claes Wikstrom in 1998. In April 2008, we removed an example that no longer worked and revised the examples to eliminate warnings and use the modern type tests (when is_list(List) instead of when list(List)). In April 2010 we removed the GS example since that GUI library should be deprecated.

Counting x'es

Our first example is really simple: the idea is to open a file, read its contents and count the number of 'x' characters in it. A file can be opened in either of two modes, binary or normal. In all our examples here, all IO will be in binary mode. This means that all data that comes from the file arrives as Erlang binary data objects. So let's create a new Erlang module. We do that by invoking our favourite editor on a file; let's call it "count_chars.erl". If we use the famous Emacs editor, we can get a whole lot of support in our Erlang programming. Turn on all the bells and whistles, font-lock-mode and everything. Anyway, the head of the file shall be:

%%%----------------------------------------------------------------------
%%% File    : count_chars.erl
%%% Author  : Claes Wikstrom [klacke@bluetail.com]
%%% Purpose : Count the x chars in a file
%%% Created : 20 Oct 1998 by Claes Wikstrom [klacke@bluetail.com]
%%%----------------------------------------------------------------------

-module(count_chars).
-author('klacke@bluetail.com').

-export([file/1]).

The actual code to open the file is contained in a function file/1. This function has a minor flaw which we shall soon rectify. But here goes:

file(Fname) ->
    case file:open(Fname, [read, raw, binary]) of
        {ok, Fd} ->
            scan_file(Fd, 0, file:read(Fd, 1024));
        {error, Reason} ->
            {error, Reason}
    end.

scan_file(Fd, Occurs, {ok, Binary}) ->
    scan_file(Fd, Occurs + count_x(Binary), file:read(Fd, 1024));
scan_file(Fd, Occurs, eof) ->
    file:close(Fd),
    Occurs;
scan_file(Fd, _Occurs, {error, Reason}) ->
    file:close(Fd),
    {error, Reason}.

The file/1 function opens the file and reads the characters in chunks of 1024 bytes. For each chunk it calls a function count_x/1 to do the real counting. This function transforms each binary into a list of characters in order to be able to traverse and count. We have:

count_x(Bin) ->
    count_x(binary_to_list(Bin), 0).
count_x([$x|Tail], Ack) ->
    count_x(Tail, Ack+1);
count_x([_|Tail], Ack) ->
    count_x(Tail, Ack);
count_x([], Ack) ->
    Ack.
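
As an aside, the conversion to a list is not strictly necessary. The following is a minimal sketch, not part of the original article, that counts the 'x' bytes by matching directly on the binary with the bit syntax. The module name count_x_bin is made up for this illustration.

```erlang
-module(count_x_bin).
-export([count_x/1]).

%% Count the $x bytes in a binary without converting it to a list.
count_x(Bin) when is_binary(Bin) ->
    count_x(Bin, 0).

%% First byte is an x: count it and continue with the rest.
count_x(<<$x, Rest/binary>>, Ack) ->
    count_x(Rest, Ack + 1);
%% Any other byte: skip it.
count_x(<<_, Rest/binary>>, Ack) ->
    count_x(Rest, Ack);
%% Empty binary: done.
count_x(<<>>, Ack) ->
    Ack.
```

The structure mirrors the list version above clause by clause, so it can be dropped in as a replacement for count_x/1 if desired.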

Now, to compile and run this code, we invoke the Erlang system at the Unix prompt and enter the following commands:

% erl
Erlang (JAM) emulator version 4.7.3
 
Eshell V4.7.3  (abort with ^G)
1> c:c(count_chars).
{ok,count_chars}
2> count_chars:file("count_chars.erl").
17
3> 

So the file "count_chars.erl" contains 17 x'es.

The flaw mentioned above is that we do not close the file in the same function where we opened it. In this particular case we close it in the function just below, but it is a good general rule to release resources at the place in the source code where they are allocated. One way to rectify this here is to rewrite the code like:

file(Fname) ->
    case file:open(Fname, [read, raw, binary]) of
        {ok, Fd} ->
            Res = scan_file(Fd, 0, file:read(Fd, 1024)),
            file:close(Fd),
            Res;
        ....
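
The rule of releasing a resource where it is allocated can be taken one step further with try ... after, which closes the file even if the scanning code crashes. This is only a sketch under that assumption and not code from the original article; the module name safe_count and the helper scan/2 are invented for the illustration.

```erlang
-module(safe_count).
-export([file/1]).

%% Open, scan and close in one function. The 'after' part runs
%% whether scan/2 returns normally or raises an exception.
file(Fname) ->
    case file:open(Fname, [read, raw, binary]) of
        {ok, Fd} ->
            try
                scan(Fd, 0)
            after
                file:close(Fd)
            end;
        {error, Reason} ->
            {error, Reason}
    end.

%% Read 1024 bytes at a time and sum the x counts. Unlike scan_file/3
%% in the text, this helper never closes the file itself.
scan(Fd, Occurs) ->
    case file:read(Fd, 1024) of
        {ok, Bin}       -> scan(Fd, Occurs + count_x(Bin));
        eof             -> Occurs;
        {error, Reason} -> {error, Reason}
    end.

%% Count the $x bytes with a binary generator.
count_x(Bin) ->
    length([X || <<X>> <= Bin, X =:= $x]).
```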

However, we can do better: we can write a general purpose function with_file which feeds a user-provided Fun with chunks of data until done.

General purpose file IO

The technique of generalizing a common pattern of functionality into a framework which uses higher order functions to do the work is a powerful lines-of-code saver. So the general purpose function is:

with_file(File, Fun, Initial) ->
    case file:open(File, [read, raw, binary]) of
        {ok, Fd} ->
            Res = feed(Fd, file:read(Fd, 1024), Fun, Initial),
            file:close(Fd),
            Res;
        {error, Reason} ->
            {error, Reason}
    end.

feed(Fd, {ok, Bin}, Fun, Farg) ->
    case Fun(Bin, Farg) of
        {done, Res} ->
            Res;
        {more, Ack} ->
            feed(Fd, file:read(Fd, 1024), Fun, Ack)
    end;
feed(_Fd, eof, _Fun, Ack) ->
    Ack;
feed(_Fd, {error, Reason}, _Fun, _Ack) ->
    {error, Reason}.

The user provides three arguments to the function.

File
the name of the file to work on.
Fun
a functional object which must return either {more, Ack} or {done, Res} in order to tell the feed loop what to do.
Initial
the initial accumulator parameter to the fun.

This code is typical general purpose code and we shall add it to a library of "nice to have functions". We call this library klib.erl.
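
To illustrate the {done, Res} return value, here is a small sketch that stops reading as soon as the first 'x' is found. It is an illustration only: the module name with_file_demo and the function has_x/1 are invented, and with_file/3 and feed/4 are repeated from the text so that the example is self-contained.

```erlang
-module(with_file_demo).
-export([with_file/3, has_x/1]).

%% with_file/3 and feed/4 as given in the text.
with_file(File, Fun, Initial) ->
    case file:open(File, [read, raw, binary]) of
        {ok, Fd} ->
            Res = feed(Fd, file:read(Fd, 1024), Fun, Initial),
            file:close(Fd),
            Res;
        {error, Reason} ->
            {error, Reason}
    end.

feed(Fd, {ok, Bin}, Fun, Farg) ->
    case Fun(Bin, Farg) of
        {done, Res} ->
            Res;
        {more, Ack} ->
            feed(Fd, file:read(Fd, 1024), Fun, Ack)
    end;
feed(_Fd, eof, _Fun, Ack) ->
    Ack;
feed(_Fd, {error, Reason}, _Fun, _Ack) ->
    {error, Reason}.

%% Return true if the file contains at least one x, otherwise false.
%% The fun answers {done, true} on the first hit, so the rest of the
%% file is never read.
has_x(File) ->
    Fun = fun(Bin, false) ->
                  case binary:match(Bin, <<"x">>) of
                      nomatch -> {more, false};
                      _Found  -> {done, true}
                  end
          end,
    with_file(File, Fun, false).
```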

Now the original count_chars:file/1 function becomes much shorter.

file1(File) ->
    F = fun(Bin, Int) -> 
               {more, count_x(Bin) + Int}
        end,
    klib:with_file(File, F, 0).

As a matter of fact we can do even better than that: we can use a function defined in "lists.erl" which folds over a list. This is admittedly not code that ought to be in the first section of a beginner's guide, but we provide it anyway. We have:

file2(File) ->
    {ok,B} = file:read_file(File),
    lists:foldl(fun($x, Ack) ->
                       1 + Ack;
                   (_, Ack) ->
                       Ack
    end, 0, binary_to_list(B)).

Now we have three versions of the same function. Say we want to measure how fast they are. The module "timer" has a function called "tc(Mod, Fun, Args)" which executes a function and times it. Let's do it at the shell prompt:

32> timer:tc(count_chars, file, ["index.html"]).
{59554,68}
33>  timer:tc(count_chars, file1, ["index.html"]).
{69939,68}
34> timer:tc(count_chars, file2, ["index.html"]).
{335579,68}

The tc/3 function returns a tuple {MicroSeconds, Result} where MicroSeconds is the number of microseconds it took to evaluate the function and Result is the evaluation result.

We see that the first (and most complicated) version is the fastest. It takes about 60 milliseconds, whereas the next version, which uses the with_file/3 function, takes about 70 milliseconds. On the other hand the most beautiful version, file2/1, which folds over the entire list, takes an awful 336 milliseconds.
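
A single run can be noisy, so it may be worth timing the same call several times. The following sketch averages timer:tc/3 over N runs. The module name tc_demo and the function avg_tc/4 are made up for this illustration and are not part of the timer module.

```erlang
-module(tc_demo).
-export([avg_tc/4]).

%% Run M:F(Args) N times with timer:tc/3.
%% Returns {AverageMicroSeconds, ResultOfLastRun}.
avg_tc(M, F, Args, N) when is_integer(N), N > 0 ->
    {Total, Res} = run(M, F, Args, N, 0, undefined),
    {Total div N, Res}.

run(_M, _F, _Args, 0, Total, Res) ->
    {Total, Res};
run(M, F, Args, Left, Total, _Res) ->
    {Micros, Res} = timer:tc(M, F, Args),
    run(M, F, Args, Left - 1, Total + Micros, Res).
```

For example, tc_demo:avg_tc(count_chars, file, ["index.html"], 10) would time the first version ten times and report the mean.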

Word count

We continue with a program which is a little bit more useful than just counting the 'x' characters in a random file. We want to write a program that counts the number of words, chars and lines in a file. We wish the module to have the following interface:

wc:file(File)
wc:files(FileList)

-module(wc).
-author('klacke@erix.ericsson.se').

-import(klib, [with_file/3]).
-import(lists, [map/2, foreach/2]).

-export([file/1, files/1]).

file(File) ->
    output([gfile(File)]).

gfile(File) ->
    Fun = fun(Bin, Count) ->
                  count_bin(binary_to_list(Bin), inspace, Count)
          end,
    {File, with_file(File, Fun, {0,0,0})}.


count_bin([H|T], Where, {C,W,L}) ->
    case classify_char(H) of
        newline when Where == inspace ->
            count_bin(T, inspace, {C+1, W, L+1});
        newline when Where == inword ->
            count_bin(T, inspace, {C+1, W+1, L+1});
        space when Where == inspace ->
            count_bin(T, inspace, {C+1, W, L});
        space when Where == inword ->
            count_bin(T, inspace, {C+1, W+1, L});
        char ->
            count_bin(T, inword, {C+1, W, L})
    end;
count_bin([], inword, {C, W, L}) ->
    {more, {C, W+1, L}};
count_bin([], inspace, {C, W, L}) ->
    {more, {C, W, L}}.


classify_char($ ) ->
    space;
classify_char($\t) ->
    space;
classify_char($\n) ->
    newline;
classify_char(_) ->
    char.

files(Files) ->
    output(map(fun(F) -> gfile(F) end, Files)).

output(Counts) ->
    io:format("~-25s ~-10s ~-10s ~-10s~n",
              ["file", "chars", "words", "lines"]),
    foreach(fun({File, {C,W,L}}) ->
                    ok = io:format("~-25s ~-10w ~-10w ~-10w~n",
                                   [File, C, W, L])
            end, Counts).

This function is really not very representative as an example of file IO. All we do is call the previously defined klib:with_file/3 function with an appropriate Fun. An example session with the Erlang shell is:

60> {ok, L} = file:list_dir("."), wc:files(L).
file                      chars      words      lines     
wc.beam                   2079       40         21        
count_chars.beam          1796       31         22        
klacke_ex.html~           6173       897        219       
test                      11         3          1         
wc.erl~                   587        46         27        
klacke_ex.html.orig       6173       897        219       
klacke_ex.html            7642       1079       286       
wc.erl                    1661       176        73        
count_chars.erl           1767       198        87       
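
One detail of the code above is worth noting: the Fun in gfile/1 starts every 1024-byte chunk in the inspace state and closes any open word at the end of the chunk, so a word that happens to straddle a chunk boundary is counted twice. The sketch below shows one way to avoid that by carrying the inword/inspace state along in the accumulator. It is not the original author's code; the module name wc_state is invented, and it works on a list of binaries instead of a file to keep the example small.

```erlang
-module(wc_state).
-export([count_chunks/1]).

%% Count {Chars, Words, Lines} over a list of binary chunks, keeping
%% the inword/inspace state from one chunk to the next.
count_chunks(Chunks) ->
    Initial = {inspace, {0, 0, 0}},
    {Where, {C, W, L}} = lists:foldl(fun count_chunk/2, Initial, Chunks),
    %% Only after the last chunk do we close a word that is still open.
    case Where of
        inword  -> {C, W + 1, L};
        inspace -> {C, W, L}
    end.

count_chunk(Bin, {Where, Counts}) ->
    count(binary_to_list(Bin), Where, Counts).

count([H|T], Where, {C, W, L}) ->
    case {classify_char(H), Where} of
        {newline, inword}  -> count(T, inspace, {C+1, W+1, L+1});
        {newline, inspace} -> count(T, inspace, {C+1, W,   L+1});
        {space,   inword}  -> count(T, inspace, {C+1, W+1, L});
        {space,   inspace} -> count(T, inspace, {C+1, W,   L});
        {char,    _}       -> count(T, inword,  {C+1, W,   L})
    end;
count([], Where, Counts) ->
    {Where, Counts}.

classify_char($\s) -> space;
classify_char($\t) -> space;
classify_char($\n) -> newline;
classify_char(_)   -> char.
```

The same idea could be used with with_file/3 by passing {inspace, {0,0,0}} as the initial value and doing the final adjustment on the result.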

Finding files

The next example is a function that 'finds' files and does things with the files it finds. For example, we might want to look for all Erlang files in a directory tree and recompile them. We want to be able to write things like:

find:files("/home/klacke", 
           ".*\\.erl", fun(File) -> {File, c:c(File)} end)

This finds all my Erlang files and compiles them. The function takes three arguments:

  • A top directory where to start the search
  • A regular expression that must match the files.
  • A Fun to perform some action on the files we find.

The source code is:

-module(find).
-author('klacke@erix.ericsson.se').
-include_lib("kernel/include/file.hrl").

-export([files/3]).

%% Top is the Top directory where everything starts
%% Re is a regular expression to match for (see module regexp)
%% Actions is a Fun to apply to each found file
%% Return value is a lists of the return values from the 
%% Action function

%% Example: find:files("/home/klacke", 
%%                     ".*\\.erl", fun(File) -> {File, c:c(File)} end)
%% Will find all erlang files in my top dir, compile them and 
%% return a long list of {File, CompilationResult} tuples
%% If an error occurs, {error, {File, Reason}} is returned 
%% The Action fun is passed the full long file name as parameter


files(Top, Re, Action) ->
    case file:list_dir(Top) of
	{ok, Files} ->
	    files(Top, Files, Re, Action, []);
	{error, Reason}  ->
	    {error, {Top, Reason}}
    end.

files(Top, [F|Tail], Re, Action, Ack) ->
    F2 = Top ++ "/" ++ F,
    case file:read_file_info(F2) of
	{ok, FileInfo} when FileInfo#file_info.type == directory ->
	    case files(F2, Re, Action) of
		{error, Reason} ->
		    {error, Reason};
		List ->
		    files(Top, Tail, Re, Action, List ++ Ack)
	    end;
	{error, Reason} ->
	    {error, {F2, Reason}};
	{ok, FileInfo} when FileInfo#file_info.type == regular ->
	    case catch regexp:match(F, Re) of
		{match, _,_} ->
		    files(Top, Tail, Re, Action, [Action(F2) | Ack]);
		nomatch ->
		    files(Top, Tail, Re, Action, Ack);
		{error, Reason} ->
		    {error, {F2, {regexp, Reason}}}
	    end;
	_Other ->
	    files(Top, Tail, Re, Action, Ack)
    end;

files(_Top, [], _Re, _Action, Ack) ->
    Ack.

The code pulls in the file_info record definition from the file.hrl include file by means of an "include_lib" compiler directive. The code is completely straightforward, since it simply reads the file_info records recursively and applies the supplied function to every matching file.
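
The regexp module used above is no longer available in later OTP releases, where the re module has taken its place. As a hedged sketch of how the matching step could look with re, here is a small helper. The module name re_match_demo and the function matches/2 are invented for this illustration and are not part of the find module.

```erlang
-module(re_match_demo).
-export([matches/2]).

%% Return true if Name matches the regular expression Re, false if it
%% does not, and {error, Reason} if Re cannot be compiled.
matches(Name, Re) ->
    case re:compile(Re) of
        {ok, MP} ->
            %% With {capture, none}, re:run/3 returns match or nomatch.
            re:run(Name, MP, [{capture, none}]) =:= match;
        {error, Reason} ->
            {error, Reason}
    end.
```

Note that the backslash has to be doubled inside an Erlang string, so a pattern for Erlang source files is written ".*\\.erl$".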

A simple term logger

In this final section on file IO we provide a simple term logger that writes Erlang terms to a file.

The BIF term_to_binary/1 produces a binary data object from any term. Such a binary can be written to a file. The reverse operation is binary_to_term/1, which can be used to reproduce the original term. The file format we choose is to prepend each term with a four (4) byte length field that indicates the length of the actual term. The logger runs as a separate process and we shall provide two different versions of the logger, one written in plain Erlang and the other utilizing the generic server module gen_server. One function which is used by the logger is the function that transforms an integer to a four byte list and vice versa. This function is added to the klib.erl library. Here is the code:

i32(B) when is_binary(B) ->
    i32(binary_to_list(B, 1, 4));
i32([X1, X2, X3, X4]) ->
    (X1 bsl 24) bor (X2 bsl 16) bor (X3 bsl 8) bor X4;
i32(Int) when is_integer(Int) ->
    [(Int bsr 24) band 255,
     (Int bsr 16) band 255,
     (Int bsr  8) band 255,
     Int band 255].
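
For comparison, the same big-endian four byte encoding can be expressed with the bit syntax. This is a sketch only, with an invented module name i32_bits; unlike i32/1 above it produces a binary instead of a list of bytes.

```erlang
-module(i32_bits).
-export([encode/1, decode/1]).

%% Encode a non-negative integer below 2^32 as four big-endian bytes.
encode(Int) when is_integer(Int), Int >= 0, Int < 16#100000000 ->
    <<Int:32/big-unsigned>>.

%% Decode the first four bytes of a binary as a big-endian integer.
%% Any bytes after the first four are ignored.
decode(<<Int:32/big-unsigned, _Rest/binary>>) ->
    Int.
```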

It is also nice to be able to read an integer from a file. The code in klib.erl is:

getint32(F) ->
    {ok, B} = file:read(F, 4),
    i32(B).

This is what I call an aggressive function. It assumes that everything goes well and crashes hard if, for example, the read call returns eof or the read attempt fails for any other reason. The logger module itself, slogger, exports a number of functions, in particular:

-module(slogger).
-author('klacke@bluetail.com').

-export([start/0, start/1, stop/0, log/1, upread/1, truncate/0]).
-export([loop0/1]).  %% must be exported since it is started with spawn/3

We need to be able to stop and start the server. The client functions are:

log/1
Log a term to the end of the file.
truncate/0
Truncate the log file.
upread/1
Read all the terms in the logfile and apply Fun to each and every term.

The start and stop code is written in the classical, traditional Erlang server style. This code has flaws, but we will come back to that in the next section. We have:

-define(LOGFILE, "slog.log").

start() ->
    start(?LOGFILE).

start(F) ->
    case whereis(?MODULE) of
	undefined ->
	    register(?MODULE, spawn(?MODULE, loop0, [F]));
	_Pid ->
	    true
    end.

stop()->
    req(stop).

req(R) ->
    ?MODULE ! {self(), R},
    receive
	{?MODULE, Reply} ->
	    Reply
    end.
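
The text does not spell out which flaws it has in mind. One weakness that can be seen directly is that req/1 waits forever if the server dies before it replies. The sketch below shows a possible variant that monitors the server and gives up after a timeout. It is an assumption about what an improvement could look like, not the author's code; the module name req_timeout and the extra Name and Timeout arguments are invented.

```erlang
-module(req_timeout).
-export([req/3]).

%% Send Request to the process registered as Name and wait for a reply
%% of the form {Name, Reply}. Returns Reply, or {error, Why} if the
%% server is not registered, dies, or does not answer within Timeout ms.
req(Name, Request, Timeout) ->
    case whereis(Name) of
        undefined ->
            {error, noproc};
        Pid ->
            MRef = erlang:monitor(process, Pid),
            Pid ! {self(), Request},
            receive
                {Name, Reply} ->
                    erlang:demonitor(MRef, [flush]),
                    Reply;
                {'DOWN', MRef, process, Pid, Reason} ->
                    {error, Reason}
            after Timeout ->
                    erlang:demonitor(MRef, [flush]),
                    {error, timeout}
            end
    end.
```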

The loop function that is spawned by the start function has two parts, an initialization part and a loop part. The code is:

loop0(FileName) ->
    case file:open(FileName, [read, write, raw, binary]) of
	{ok, Fd} ->
	    {ok, Eof} = file:position(Fd, eof),
	    file:position(Fd, bof),
	    FilePos = position_fd(Fd, 0),
	    maybe_warn(FilePos, Eof),
	    loop(Fd);
	{error, Reason} ->
	    exit(Reason)
    end.

maybe_warn(FilePos, Eof) ->
    if
	FilePos == Eof ->
	    ok;
	true ->
	    warn("~w bytes truncated \n", 
		 [Eof - FilePos])
    end.

We have added some extra code to check the logfile when it is opened. We need to position the file descriptor at the end of the logfile. To do that we could have called file:position(Fd, eof); however, we need to cater for the case where the last logged term in the previous session was corrupted due to an interrupted write operation. The function that does the check is:

position_fd(Fd, LastPos) ->
    case catch getint32(Fd) of
	Int when is_integer(Int) ->
	    case file:read(Fd, Int) of
		{ok, B} when size(B) ==  Int ->
		    position_fd(Fd, LastPos + 4 + Int);
		_ ->
		    file:position(Fd, LastPos),
		    file:truncate(Fd),
		    LastPos
	    end;
	_ ->
	    file:position(Fd, LastPos),
	    file:truncate(Fd),
	    LastPos
    end.

The position_fd/2 function returns the last file position that was OK. If this is equal to the physical end of the file, all is well. So, we are approaching the real code that does the actual logging work: the loop/1 server loop.

loop(Fd) ->		    
    receive
	{From, {log, Bin}} ->
	    From ! {?MODULE, log_binary(Fd, Bin)};
	{From, {upread, Fun}} ->
	    From ! {?MODULE, upread(Fd, Fun)};
	{From, truncate} ->
	    file:position(Fd, bof),
	    file:truncate(Fd),
	    From ! {?MODULE, ok};
	{From, stop} ->	
	    file:close(Fd),
	    From ! {?MODULE, stopped},
	    exit(normal)
    end,
    loop(Fd).

The truncate and the stop requests are handled immediately in the loop, whereas the log and upread requests are handled by special help functions. First we have the log_binary/2 function:

log_binary(Fd, Bin) ->
    Sz = size(Bin),
    case file:write(Fd, [i32(Sz), Bin]) of
	ok ->
	    ok;
	{error, Reason} ->
	    warn("Can't write logfile ~p ", [Reason]),
	    {error, Reason}
    end.

The only noticeable thing here is the type of the structure that is passed to file:write/2 for IO. It is a list of the form [[int, int, int, int], binary]. This structure will be flattened by the port. This applies to all ports, and if the port supports the so-called writev() interface (which is documented elsewhere), the writev() function of the native operating system will be called with an array length of 2, where the first element will hold the four bytes produced by the call to i32() and the second element will hold the actual binary data.
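
The on-disk format, a four byte length followed by the term, can also be illustrated without any file at all. The sketch below builds such frames in memory and splits a binary back into terms, stopping at the first incomplete frame in the same spirit as position_fd/2. The module name frame_demo and its functions are invented for this illustration.

```erlang
-module(frame_demo).
-export([frame/1, unframe/1]).

%% Build one log entry: a four byte big-endian length followed by the
%% external representation of the term. The result is an iolist.
frame(Term) ->
    Bin = term_to_binary(Term),
    [<<(byte_size(Bin)):32>>, Bin].

%% Split a binary of concatenated frames back into terms. A trailing
%% incomplete frame (as after an interrupted write) is ignored.
unframe(Bin) ->
    unframe(Bin, []).

unframe(<<Len:32, Payload:Len/binary, Rest/binary>>, Acc) ->
    unframe(Rest, [binary_to_term(Payload) | Acc]);
unframe(_Incomplete, Acc) ->
    lists:reverse(Acc).
```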

Finally, the function that does the upreading, i.e. a function that blocks the server for logging and then traverses the log and applies a function to each and every logged item, is:

upread(Fd, Fun) ->
    {ok, _Curr} = file:position(Fd, cur),
    file:position(Fd, bof),
    upread(Fd, catch get_term(Fd), Fun).

upread(_Fd, {'EXIT', _}, _Fun) ->
    ok;
upread(Fd, Term, Fun) ->
    Fun(Term),
    upread(Fd, catch get_term(Fd), Fun).
    
    
get_term(Fd) ->
    I = getint32(Fd),
    {ok, B} = file:read(Fd, I),
    binary_to_term(B).

And finally the client functions to access the server:

upread(Fun) ->
    req({upread, Fun}).

truncate() ->
    req(truncate).

log(Term) ->
    req({log, term_to_binary(Term)}).

A typical little caveat with the upread/1 function is that the client sits in a receive statement waiting for upread/1 to perform its work. It is thus not possible to have the upread/1 function send all the terms to the client (since the client is already suspended). To do that, we need an auxiliary process. Here is a little example session at the shell prompt:

2> slogger:start().
true
3> slogger:log({abc, "cba"}).
ok
4> slogger:log(code:which(slogger)).  
ok
5> slogger:upread(fun(X) ->  io:format("~p~n", [X])  end).
{abc,"cba"}
"/home/super/klacke/doc/examples/slogger.jam"
ok
6> 

A simple term logger (again)

In this section we have rewritten the above term logger but this time using the gen_server module as a utility to get help with the process structure.

gen_servers are extremely powerful and easy to use. The absolutely easiest way to write a gen server is to invoke the gen_server skeleton from the emacs mode while editing.

The entire framework of a "do nothing" gen server is generated and we just fill in the details. This way we get all the goodies that come from gen servers in general. They can be upgraded and downgraded while running, they can be debugged and traced, and they fit into the general application concept of Erlang.

The code is available in glogger.erl.
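
To give a feel for what the filled-in skeleton looks like, here is a much reduced sketch of a gen_server based logger that only supports log and stop. It is not the contents of glogger.erl; the module name glogger_sketch and all of its details are assumptions made for this illustration, and the startup check of the logfile is left out.

```erlang
-module(glogger_sketch).
-behaviour(gen_server).

%% Client API
-export([start_link/1, log/1, stop/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(FileName) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, FileName, []).

log(Term) ->
    gen_server:call(?MODULE, {log, term_to_binary(Term)}).

stop() ->
    gen_server:call(?MODULE, stop).

%% The server state is simply the file descriptor.
init(FileName) ->
    case file:open(FileName, [read, write, raw, binary]) of
        {ok, Fd} ->
            {ok, _Pos} = file:position(Fd, eof),   % append to the log
            {ok, Fd};
        {error, Reason} ->
            {stop, Reason}
    end.

%% Write a four byte length followed by the binary, as in slogger.
handle_call({log, Bin}, _From, Fd) ->
    Reply = file:write(Fd, [<<(byte_size(Bin)):32>>, Bin]),
    {reply, Reply, Fd};
handle_call(stop, _From, Fd) ->
    {stop, normal, stopped, Fd}.

handle_cast(_Msg, Fd) ->
    {noreply, Fd}.

handle_info(_Info, Fd) ->
    {noreply, Fd}.

%% Called when the server stops; the file is closed here.
terminate(_Reason, Fd) ->
    file:close(Fd).

code_change(_OldVsn, Fd, _Extra) ->
    {ok, Fd}.
```

Compared with the hand-written loop, the receive statement, the reply protocol and the registration are all handled by gen_server, and only the file handling remains in the callback module.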
