Hacker News

Map/Reduce for Mortals(billwadge.wordpress.com)

75 pointsherodotus posted 5 days ago27 Comments
27 Comments:
darkkindness said 5 days ago:

The conversation Bill's touching on -- the tradeoffs between different notations -- is really valuable but I think it misses something a lot of us desire in a syntax: compositionality. Take the provided "onion" notation, loops, and the "new" syntax. They all look something like this:

  ┌────────┬────────────────────┐
  │myreduce│┌─────┬────────────┐│
  │        ││mymap│┌────────┬─┐││
  │        ││     ││myfilter│x│││
  │        ││     │└────────┴─┘││
  │        │└─────┴────────────┘│
  └────────┴────────────────────┘
(The math one looks more like this:)

  ┌────────┬────────────┐
  │myreduce│┌────────┬─┐│
  │        ││mymap   │x││
  │        ││        │ ││
  │        ││myfilter│ ││
  │        │└────────┴─┘│
  └────────┴────────────┘
But none of them provide the same kind of "putting pieces together" feeling as this:

  ┌────────┬─────┬────────┬─┐
  │myreduce│mymap│myfilter│x│
  └────────┴─────┴────────┴─┘
which we see in the wild as this:

  (myreduce ∘ mymap ∘ myfilter)(x)
or this:

  x | myfilter | mymap | myreduce
or this:

  myreduce mymap myfilter x
nikonyrh said 5 days ago:

I didn't spot Clojure here yet:

  (->> [1,2,3,4,5]
       (filter #(-> % (mod 3) (= 1)))
       (map    #(* % %))
       (reduce +))
#( ... ) is an anonymous function, arrows are threading macros: https://clojure.org/guides/threading_macros

If you wanted more trickery you could play with transducers and compose them with "comp".

fulafel said 4 days ago:

List comperhension version without threading:

  (reduce +
    (for [n [1, 2, 3, 4, 5]
          :when (= 1 (mod n 3))]
      (* n n)))
vmchale said 5 days ago:

> reduce(lambda t,x t+x,map(square,filter(lambda x: x % 3 == 1,[1,2,3,4,5])))

> Clear? As mud

> the lambdas and the functions obscure what is being computed.

Simply use J!

g =: +/ @: : @ (] * =&1@(3&|))

Or:

f =: +/ @: *: @ (#~ (=&1@(3&|)))

enragedcacti said 5 days ago:

This already exists in Python in the form of comprehensions

  sum(x * x for x in [1,2,3,4,5] if x % 3 == 1)
I'm not familiar with Lucid but maybe evaluating the implementation of comprehensions could help solve some of the design challenges.
bade said 5 days ago:

Might work, worth thinking about

lalaithion said 5 days ago:

Having pointfree functions and curried application makes the functional example more readable, as it removes the obscuring lambdas:

    (reduce (+) . map (^2) . filter ((==1).(%3))) [1,2,3,4,5]
The above is pseudo-Haskell. If we want to
lgas said 5 days ago:

In actual Haskell I would write it as

    sum [ n^2 | n <- [1..5], n % 3 == 1 ]
which seems to express the idea fairly directly and I'm fairly sure has a close analog in python.
lightsighter said 5 days ago:

python: sum([n * n for n in range(1,5,3)])

marai2 said 5 days ago:

haskell: sum [n^2 | n <- [1,3..5]]

I just remembered haskell's "range" operator ".." can also take a step size.

['a', 'd' .. 'z'] => "adgjmpsvy"

1-more said 5 days ago:

yeah to me this just hollers for the kind of "pour it through the functions" approach we see in functional languages or even with Ramda.pipe in JS. Ezpz.

adamch said 5 days ago:

Or in functional rust

  (1..5)
    .into_iter()
    .filter(|i| i % 3 == 1)
    .map(|i| i * i)
    .sum();
I don't know if we really need a new syntax for map reduce.
crooked-v said 5 days ago:

That's also basically the same syntax as doing it in JS, aside from the lack of builtins:

    import { range } from 'ramda';

    range(1, 6)
      .filter(i => i % 3 === 1)
      .map(i => i * i)
      .reduce((a, b) => a + b, 0);
Or, if you're using the pipeline operator (experimental, stage 1, nobody can quite decide on exactly the syntax but this gives the rough idea):

    import { range, sum } from 'ramda';

    range(1, 6)
      |> x => x.filter(i => i % 3 === 1)
      |> x => x.map(i => i * i)
      |> sum
1-more said 4 days ago:

We can get rid of these arrows using ramda's curried list functions, at the expense of being slower and harder to read but making us feel more clever:

  pipe(
    range,
    filter(pipe(
      modulo(__, 3), // x => x % 3
      equals(1) // y => y === 1
    )),
    map(flip(Math.pow)(2)),
    sum
  )(1,6)
oneplane said 5 days ago:

> "that’s as close as I can get in wordpress but you know what it looks like"

Um, I still don't know what it looks like. I know mod and there is a range of numbers it seems and an i to the power of two, but I really have no idea what the other symbols mean or if their position/layout in that fashion is important. I can probably reverse engineer what the mathematician writes by reading the code but I'm still confused as to why many articles about software assume a complete mathematics syntax knowledge.

darkkindness said 5 days ago:

Agreed, I think the syntax used is not that common. (This is a problem, I think, with math notation -- everyone's notation means different things, google "substitution notation history" for the worst.) They definitely meant to write:

https://latex.codecogs.com/gif.latex?\sum_{\substack{i\in[1,...

"sum of all i² where i is in [1,5] and i mod 3 = 1".

sp332 said 5 days ago:

https://en.wikipedia.org/wiki/Element_(mathematics)#Notation...

I'm not sure what the third line is supposed to look like, but I guess it's a filter. So: Sum i^2 where i is in the set {1,2,3,4,5} and i % 3 = 1.

said 5 days ago:
[deleted]
np_tedious said 4 days ago:

What exactly is this pyLucid? I searched for it and found a CMS and a generative video ML library. Don't see how either one connects with this.

https://github.com/jedie/PyLucid https://github.com/yelantingfeng/pyLucid

herodotus said 4 days ago:

The author of the post is the co-inventor of the Lucid programming language. In an earlier blog posts he describes a Python interpreter for a version of Lucid that he named pyLucid.

np_tedious said 3 days ago:

Thanks. For anyone else reading, it may or may not be this https://code.google.com/archive/p/plucid/

proc0 said 5 days ago:

List comprehension in Haskell is probably the best approach so far, imho. However once you're using these tools then map, reduce, filter should be easy to understand, so the terseness and clarity of it doesn't come without a cost.

let mod3 n = if n `mod` 3 == 1 then n else 0

sum [ i^2 | i <- [ mod3 n | n <- [1..5] ] ]

vmchale said 5 days ago:

> sort of objects are ranges? In Python they would be iterators, and that works out fine. In Haskell they would be lists.

> I think the best choice is to make them vectors (arrays)

In Haskell they are lazy linked lists! Which in theory is more efficient than vectors (in terms of space).

dnautics said 5 days ago:

this seems to be a syntax problem. The for loop is not really 'equivalent' in complexity to the functional solution. In Elixir, I'd do this:

   list
   |> Enum.reduce(0, fn ->
      x, acc when mod(x, 3) == 0 -> acc + x * x
      _, acc -> acc
   end)
But the presented functional solution throws more functions at it, which might look like this, which I think is quite clear.

    list
    |> Enum.filter(&(mod(&1, 3) == 0))
    |> Enum.map(&(&1 * &1))
    |> Enum.reduce(acc, &Kernel.+/2)  #better yet use Enum.sum
nesarkvechnep said 5 days ago:

You might even want to use the Stream module. Beautiful Elixir.

cutety said 5 days ago:

Stream would technically better, however given the discussion is about Map/Reduce, the only thing Stream has in common with Map/Reduce is it's lazy. If you wanted something comparable (mapping is done in parallel, reducing as well just over partitions), then you'd want to use the Flow[1] library. As it does the same thing as Stream.map |> Enum.reduce just parallelized/partitioned, and what's great is the Flow module is more-or-less a drop in replacement for Enum/Stream (with a few caveats like calling Flow.partition before Flow.reduce). But, with just some quick a dirty benchmarks you can see Flow outperforms Stream on all but the smallest data set (range 1..100):

    with_stream = fn range ->
      range
      |> Stream.filter(&(rem(&1, 3) == 0))
      |> Stream.map(&(&1 * &1))
      |> Enum.reduce(0, &Kernel.+/2)
    end
    
    with_flow = fn range ->
      range
      |> Flow.from_enumerable()
      |> Flow.filter(&(rem(&1, 3) == 0))
      |> Flow.map(&(&1 * &1))
      |> Flow.partition()
      |> Flow.reduce(fn -> [0] end, fn val, [acc | _] ->
        [Kernel.+(val, acc)]
      end)
      |> Enum.sum()
    end
    
    iex(4)> Benchee.run(
    iex(4)>   %{"stream" => with_stream, "flow" => with_flow},
    iex(4)>   inputs: %{"small" => 1..100, "medium" => 1..10_000, "large" => 1..10_000_000}
    iex(4)> )
    Operating System: macOS
    CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Number of Available Cores: 4
    Available memory: 8 GB
    Elixir 1.9.4
    Erlang 22.2.1
    
    Benchmark suite executing with the following configuration:
    warmup: 2 s
    time: 5 s
    memory time: 0 ns
    parallel: 1
    inputs: large, medium, small
    Estimated total run time: 42 s
    
    Benchmarking flow with input large...
    Benchmarking flow with input medium...
    Benchmarking flow with input small...
    Benchmarking stream with input large...
    Benchmarking stream with input medium...
    Benchmarking stream with input small...
    
    ##### With input large #####
    Name             ips        average  deviation         median         99th %
    flow          0.0994        10.06 s     ±0.00%        10.06 s        10.06 s
    stream        0.0782        12.78 s     ±0.00%        12.78 s        12.78 s
    
    Comparison:
    flow          0.0994
    stream        0.0782 - 1.27x slower +2.72 s
    
    ##### With input medium #####
    Name             ips        average  deviation         median         99th %
    flow           83.87       11.92 ms    ±20.48%       11.30 ms       25.53 ms
    stream         74.88       13.35 ms    ±32.02%       12.32 ms       30.22 ms
    
    Comparison:
    flow           83.87
    stream         74.88 - 1.12x slower +1.43 ms
    
    ##### With input small #####
    Name             ips        average  deviation         median         99th %
    stream        4.98 K        0.20 ms    ±87.16%       0.169 ms        0.56 ms
    flow          0.70 K        1.42 ms    ±21.58%        1.35 ms        2.52 ms
    
    Comparison:
    stream        4.98 K
    flow          0.70 K - 7.06x slower +1.22 ms
[1] https://hexdocs.pm/flow/Flow.html
said 5 days ago:
[deleted]