[erlang-questions] List Question

Joe Armstrong erlang@REDACTED
Tue Aug 8 12:23:00 CEST 2017


On Tue, Aug 8, 2017 at 11:52 AM, Andrew McIntyre
<andrew@REDACTED> wrote:
> Hello Joe,
>
> Thanks for your thoughts. I have been following erlang since your first
> book, but have 20yrs of OO Delphi code that parses HL7, and I was trying to
> learn erlang by taking something I am very familiar with in an
> OO way and implementing it in a functional way. I am sure it will take a
> while to get the hang of the community and functional programming.

> Have a bad habit of
> trying to implement everything myself,

I have this too - I think it's a very good habit. There are only a few
ways to *really*
learn things - Implement it yourself, teach it, write a book about it.

/Joe



> and HL7 has nuances that make
> backward and forward compatibility quite good, and that starts in the
> parser. We have also done it in elixir, but there is a little too much magic
> and too many syntax inconsistencies for my taste in elixir. I guess that's
> because I am used to very readable explicit pascal code, but elixir appeals to
> people coming from ruby. In the end it runs on the beam, and that is
> what attracts me!
>
> Will try and explain questions better, thanks for all the responses
>
> Andrew
>
>
> Tuesday, August 8, 2017, 6:00:59 PM, you wrote:
>
> JA> Hello,
>
> JA> I'm going to go way off topic here and not answer your specific
> JA> question about lists ...
>
> JA> Your last mail had the information I need - you're trying to parse HL7.
> JA> I have a few comments.
>
> JA> 1) Your original question did not bother to mention what problem you
> JA> were trying to solve -
> JA>     You asked about a sub-problem that you encountered when trying to
> JA> solve your principal
> JA>     problem (principal problem = parse HL7) (sub-problem = differentiate lists)
>
> JA> 2) It's *always* a good idea to ask questions about the principal
> JA> problem first !!!!
>
> JA> I didn't know what HL7 was - my immediate thought was
> JA>  'I wonder if anybody has written a *proper* HL7 parser in Erlang' - by
> JA> proper I mean "has expended a significant amount of thought on writing a parser"
>
> JA> Google is your friend - It told me what HL7 was (I hadn't a clue here
> JA> - "never heard of it")
> JA> and it turned up a parser in elixir
>
> JA>     https://github.com/jcomellas/ex_hl7
>
> JA> From the quality of the documentation I assume this is a *proper*
> JA> implementation.
>
> JA> Now elixir compiles to .beam files and can be called from Erlang -
> JA> which raises another
> JA> sub problem "how do I compile the elixir code and call it from Erlang"
> JA> and begs the
> JA> question "is this effort worthwhile"
>
> JA> Given that a parser for HL7 exists in elixir it might be sensible to
> JA> use it "off the shelf"
>
> JA> I have a feeling that elixir folks are good at reusing erlang code -
> JA> but that reuse in the
> JA> opposite direction is less easy.
>
> JA> The last time I fiddled a bit (yesterday as it happened) - it turned
> JA> out to be less than
> JA> blindingly obvious how to call other than trivial elixir code from erlang.
>
> JA> I was also wondering about cross-compilation. Has anybody written
> JA> something that turns
> JA> erlang code into elixir source code or vice versa?
>
> JA> Cheers
>
> JA> /Joe
>
>
>
>
>
> JA> On Mon, Aug 7, 2017 at 3:46 PM, Andrew McIntyre
> JA> <andrew@REDACTED> wrote:
>>> Hello Craig,
>>>
>>> Thanks for your help.
>>>
>>> I am trying to store the data as efficiently as possible. It's HL7
>>> natively and this is my test:
>>>
>>> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>>>
>>> |~^& are delimiters. The hierarchy is only so deep, and I am using lists of
>>> lists to provide a tree-like way to access the data, e.g. Field 3, repeat
>>> 1, component 2, subcomponent 1.
>>>
>>> Parsed it looks like this:
>>>
>>> [["OBX","17",
>>>   ["FT","TEST"],
>>>   [["8265-1",[],["LN","SUBCOMP"]]],
>>>   [[["1","2","3","4"]]],
>>>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]
>>>
>>> As the format evolves over time the hierarchy can be extended, but
>>> older clients can still read the value they are expecting if they
>>> follow the rules, like reading the first value in the list when you
>>> only expect one value to be there.
>>>
>>> Currently a typical system might have 12 million of these records, so I
>>> want to keep the format as small as possible in the erlang format, hence
>>> I am reluctant to tag too much, but I know how to get the value of
>>> interest. Maybe that is my non erlang background showing up? Traversing
>>> 4 small lists by index should be fast??
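
On the point about traversing a few small lists by index, here is a minimal sketch. `path_sketch` and `at/2` are hypothetical names, not something from the thread. It walks a path of 1-based list positions with `lists:nth/2`; the positions are positions in the parsed term, which need not line up with HL7 field numbers.

```erlang
-module(path_sketch).
-export([at/2]).

%% at(Path, Term): follow a list of 1-based positions down a nested list.
%% Hypothetical helper for illustration only.
at([], Value) ->
    Value;
at([Index | Rest], List) when is_list(List) ->
    %% lists:nth/2 is linear in Index, which is cheap for lists this short.
    at(Rest, lists:nth(Index, List)).
```

With the parsed OBX segment (the inner list shown above) bound to `Seg`, `path_sketch:at([4,1,3,2], Seg)` would return `"SUBCOMP"`. Because strings are themselves lists, a path that is one step too long will index into the characters of a string rather than fail.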
>>>
>>> I guess I could save strings as binaries in the lists, then is_binary
>>> should work?? Is that the case? I gather that on a 64bit system especially,
>>> a binary is more space efficient.
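
A rough sketch of the binary idea, under stated assumptions: the module and function names are hypothetical, the delimiters are the four from the test message, and a value with no lower-level delimiter is kept as a plain binary rather than wrapped in single-element lists (so the shape differs slightly from the parsed term shown earlier). Since every leaf is a binary and every branch is a list, `is_binary/1` tells them apart.

```erlang
-module(hl7_sketch).
-export([parse_segment/1, is_leaf/1, first_leaf/1]).

%% Split one segment into fields (|), then repeats (~), components (^)
%% and subcomponents (&). Escape sequences such as \H\ are left untouched.
parse_segment(Bin) when is_binary(Bin) ->
    [split(Field, [<<"~">>, <<"^">>, <<"&">>])
     || Field <- binary:split(Bin, <<"|">>, [global])].

%% A value that does not contain the delimiter stays at its current depth.
split(Bin, []) ->
    Bin;
split(Bin, [Delim | Lower]) ->
    case binary:split(Bin, Delim, [global]) of
        [Single] -> split(Single, Lower);
        Parts -> [split(Part, Lower) || Part <- Parts]
    end.

%% Leaves are binaries, branches are lists.
is_leaf(Value) -> is_binary(Value).

%% Read "the first value" no matter how deep the hierarchy has grown.
first_leaf(Bin) when is_binary(Bin) -> Bin;
first_leaf([First | _]) -> first_leaf(First).
```

On a 64-bit system a string held as a list costs two machine words (16 bytes) per character, while a binary stores one byte per byte of text plus a fixed overhead, so binaries are usually the smaller choice for millions of records.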
>>>
>>> Monday, August 7, 2017, 10:53:11 PM, you wrote:
>>>
>>> z> On 2017年08月07日 月曜日 22:29:31 you wrote:
>>>>> Hello zxq9,
>>>>>
>>>>> Thanks. Unfortunately I do not know the value of the string that will
>>>>> be there. It's an extensible hierarchy that can be several lists deep -
>>>>> or not. Might need to revise the data structure
>>>
>>> z> In this case it can be useful to consider a way of tagging values.
>>>
>>> z> Imagine we want to represent a directory tree structure and have a
>>> z> descent-first traversal function recurse over it while creating the
>>> z> tree. We have two things that can happen, there is a flat list of
>>> z> new directories that need to be created, and there is the
>>> z> possibility that the tree depth extends deeper at each node.
>>>
>>> z> The naive version would look like what you have:
>>>
>>> z> ["top_dir_1",
>>> z>  "top_dir_2",
>>> z>  ["next_level_1",
>>> z>   "next_level_2"]]
>>>
>>> z> This leaves a bit to be desired, not only because of the problem
>>> z> you have pointed out that makes it difficult to know what is deep
>>> z> and what is shallow, but also because we don't really have a good
>>> z> way to represent a full tree (what would be the name of a directory containing other directories?).
>>>
>>> z> So consider instead something like this:
>>>
>>> z> [{"top_dir_1", []},
>>> z>  {"top_dir_2", []},
>>> z>  {"top_dir_3",
>>> z>   [{"next_level_1", []},
>>> z>    {"next_level_2", []}]}]
>>>
>>> z> Now we have a representation of each directory's name AND its contents.
>>>
>>> z> We can traverse this laterally AND in depth without any ambiguity
>>> z> or need for carrying around a record of where we have been (by
>>> z> using depth recursion and tail-call recursion):
>>>
>>>
>>> z> make_tree([{Dir, Contents} | Rest]) ->
>>> z>     ok =
>>> z>         case filelib:is_dir(Dir) of
>>> z>             true ->
>>> z>                 ok;
>>> z>             false ->
>>> z>                 ok = log(info, "Creating dir: ~p", [Dir]),
>>> z>                 file:make_dir(Dir)
>>> z>         end,
>>> z>     ok = file:set_cwd(Dir),
>>> z>     ok = make_tree(Contents),
>>> z>     ok = file:set_cwd(".."),
>>> z>     make_tree(Rest);
>>> z> make_tree([]) ->
>>> z>     ok.
>>>
>>>
>>> z> Not so bad.
>>>
>>> z> In your case we could represent things perhaps a bit better by
>>> z> separating the types and tagging them. Instead of just "FT" and
>>> z> whatever other string labels you might want, you could either use
>>> z> atoms (totally unambiguous) or tuples as we have in the example
>>> z> above (also totally unambiguous). I prefer tuples, though, because
>>> z> they are easier to read.
>>>
>>> z> [{value, "foo"},
>>> z>  {tree,
>>> z>   [{value, "bar"},
>>> z>    {value, "foo"}]},
>>> z>  {value, "baz"}]
>>>
>>>
>>> z> So then we do something like:
>>>
>>>
>>> z> traverse([{value, Value} | Rest]) ->
>>> z>    ok = do_thing(Value),
>>> z>    traverse(Rest);
>>> z> traverse([{tree, Contents} | Rest]) ->
>>> z>    ok = traverse(Contents),
>>> z>    traverse(Rest);
>>> z> traverse([]) ->
>>> z>    ok.
>>>
>>>
>>> z> Anyway, don't be afraid of varying your value types to say exactly
>>> z> what you mean. If your strings like "FT" only had meaning within
>>> z> your system consider NOT USING STRINGS, and using atoms instead. That makes it even easier:
>>>
>>>
>>> z> [foo,
>>> z>  bar,
>>> z>  [foo,
>>> z>   bar],
>>> z>  foo]
>>>
>>>
>>> z> So then we can do:
>>>
>>>
>>> z> traverse([foo | Rest]) ->
>>> z>     ok = do_foo(),
>>> z>     traverse(Rest);
>>> z> traverse([bar | Rest]) ->
>>> z>     ok = do_bar(),
>>> z>     traverse(Rest);
>>> z> traverse([Value | Rest]) when is_list(Value) ->
>>> z>     ok = traverse(Value),
>>> z>     traverse(Rest);
>>> z> traverse([]) ->
>>> z>     ok.
>>>
>>>
>>> z> And of course, you can skip the guard if you would rather match on a
>>> z> list shape in the listy clause there, but that is a minor detail.
>>> z> The point is to make your data types MEAN SOMETHING REASONABLE
>>> z> within your system. Use atoms when your values are meaningful only
>>> z> within your system. Strings are for the birds.
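
To illustrate the remark about matching on a list shape in place of a guard, here is a small sketch. `traverse_sketch` and `count/1` are hypothetical names, and counting atoms stands in for whatever `do_foo/0` and `do_bar/0` would do. The subtree clauses match `[_ | _]` and `[]` directly, so no `is_list/1` guard is needed.

```erlang
-module(traverse_sketch).
-export([count/1]).

%% Returns {NumberOfFoo, NumberOfBar} for a nested list of foo/bar atoms.
count(Tree) ->
    count(Tree, {0, 0}).

count([foo | Rest], {Foo, Bar}) ->
    count(Rest, {Foo + 1, Bar});
count([bar | Rest], {Foo, Bar}) ->
    count(Rest, {Foo, Bar + 1});
count([[_ | _] = Sub | Rest], Acc) ->
    %% Non-empty subtree, matched by shape: descend, then carry on.
    count(Rest, count(Sub, Acc));
count([[] | Rest], Acc) ->
    %% Empty subtree: nothing to count.
    count(Rest, Acc);
count([], Acc) ->
    Acc.
```

For the atom example above, `traverse_sketch:count([foo, bar, [foo, bar], foo])` would return `{3, 2}`. This only works cleanly because the leaves are atoms; with string leaves the shape `[_ | _]` would also match a string, which is the original ambiguity.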
>>>
>>> z> -Craig
>>> z> _______________________________________________
>>> z> erlang-questions mailing list
>>> z> erlang-questions@REDACTED
>>> z> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>  Andrew                             mailto:andrew@REDACTED
>>>
>>> sent from a real computer
>>>
>>>
>
>
>
>
> --
> Best regards,
>  Andrew                             mailto:andrew@REDACTED
>
> sent from a real computer
>
>


