Write your own Excel in F# (2018)(tomasp.net)
I was lead developer for a company ten years ago who developed tools for industrial companies. We had a steel rack manufacturer approach us with a panic inducing spreadsheet of the calculations they needed. It was 8 very full worksheets and included everything you could think of, including seismic data by zip code.
To build this proper was going to take our small team at least a couple months. The sales people didn’t like this. I had an idea however to extend a SpreadsheetML parser I’d written a few years prior with a formula evaluator. I had a prototype working within a couple days, I reimplemented a large number of Excel functions warts and all, and since Excel saved all the last computed values along side the formulas it was easy to automatically test. It was the first project that really taught me the value of unit testing.
It was a fun project and took around a month in the end, which was still over the amount of time sales had given us. The client actually sent us a fair number of revisions to the spreadsheet such that had we built it out any other way would have been a major project, but was as easy as swapping the spreadsheet the tool read. I left the company a few months later but I was sad to hear from coworkers they never did anything else with the tool. I thought it was a really neat hack.
Damn It. Where are these problems today? Or is it a matter of just choosing between getting paid vs passion? As a hirer in a FANG, I feel ashamed at how much we put some of the smartest and hard-working people to work on yet-another-crud-rest-api.
All the engineering companies that aren't software engineering companies. I work for a largish civil engineering and urban planing consultancy and we deal with all kinds of hard and interesting technical problems like this all the time.
The main downside is that the software development culture in these companies are at least a decade behind the state of the art and you will have to deal with weird things like TFS 2012 and no one having heard of the phrase "Unit Test". On the upside since there is no software development culture in place, there is less argument when you tell them you're moving all your code to gitlab and want to use F#.
I work at one of those rare manufacturing companies here in the bay area(pay isn't anywhere close to what FAANG median is).We tend to use Excel for everything from manufacturing quality Pareto charts, SPC charts and its a Big mess.All I hear from Big tech enterprise tech is IoT and datalakes but no immediate solution that aggregates all this data in manner that can be consumed by various LOB Apps that can do things with the data. So I am in the process of building one right now. But You are right. FAANG along with every other unicorn is running after the NBT, wherein there is so much more to automate in well established Non-big tech(Dino tech) like P&G case below.
You can do nearly anything in Excel/Office with VBA, from automating manual processes, to BI reporting, to gluing together any and all data sources, it's just that CS type geeks hate it and refuse to learn it.
Being limited to only Office for a few months has not made me think "how horrible it is not to have real tools", but rather "how amazing is the amount of money and effort people waste on 'enterprise solutions' that are both orders of magnitude more expensive and inferior".
On the other hand I once had the enviable task of converting a spreadsheet to SQL - minus the data import, the calculations in Excel ran for hours, after the conversion the stored procedure took about 2 minutes to run.
And this was something that the business had to do daily - you can imagine how effective it was to start the calc in the morning, have it hog all your CPU for hours then get the results after lunch if you were lucky and did not have to restart the whole shebang.
Yes, Excel is brilliant at communicating with business users and getting stuff done quickly. But it does not scale and has some problems with the normal software development flow - just try to put your Excel files in version control and you'll see what I mean...
I am inclined to think that 99% of the time people blame a tool for being slow, there are very, very basic issues in how they use it. And often people are just not motivated to solve those issues if they don't like the tool anyway.
I was handed a T-SQL script once, that produced a report on a few thousand items in a document management system. I was asked to use it to get statistics on a few hundred databases with millions of documents each. Which would have taken a few centuries at the speed that it ran. So I looked at it and realized the core was a procedural loop, which ran a separate SQL query for every row of output. Once the procedural part was replaced with one SQL query, it ran thousands of times faster and then all that remained was to replicate it to a bunch of databases and build the report in parts.
My rule of thumb is that if anything is too slow these days, it's because something is terribly misconfigured or something is severely wrong with the algorithm.
>> the amount of money and effort people waste on 'enterprise solutions' that are both orders of magnitude more expensive and inferior".
Let me give you another perspective: One of the reasons the Enron scandal happened was because traders for a major canadian bank,my former employer, kept the trades on their deskop Excel and audit and risk had no way of knowing the actual risk to the bank. When I was hired to build an expensive enterprise app that allowed only limited set of financial models, every single user hated it at first: it was both orders of magnitude more expensive and inferior". But the bank now has full uptothe minute knowledge of its risk exposure and the system has proven itself.
That's not really what I'm talking about. That's back to front, where you're using Excel as the core repository for important stuff. I'm talking about using VBA/Office the other way around, to combine and report on stuff that lives in various databases and systems, and thereby prevents what you're talking about.
The kind of alternatives I was questioning are BI reporting solutions - I have especially painful memories of Qlikview, and I just googled the licensing fees and they are insane.
Finanacial models need to exist in a central repository where they could be audited or updated. The issue with Excel is that the model is created/managed on desktops and the enterprise has no way of knowing if the model is accurate. We use Tableau and we like being able to centrally manage the model.
I agree that people waste a lot of money on solutions without really understanding the problem.
I think the solution is rarely automate with VBA - however, using Excel with a little bit of VBA can be a useful first step on prototyping what a solution could look like.
But people do waste a ton of money on enterprise solutions that are just overkill for a given problem. This usually occurs in organizations where management has no real technical background and those who do don't have any real authority.
Damn It. Where are these problems today?
They're still here. When we had to generate reports for municipalities, and ingest new data from non-tech sorts of people. What did we choose? Excel 2003 XML. Because it's what we could import easily.
I feel ashamed at how much we put some of the smartest and hard-working people to work on yet-another-crud-rest-api.
Why? Excel is inherently poorly specified and the file format support matrix is a bit… unwieldly. Interoperability is terrible, any sort of multi-user use case becomes an exercise in archaic procedures, and the various scripting languages, while powerful, are janky as fuck.
See my other answer...
Cannabis tech is where these problems are....
And i have a big network and some amazing work being developed....
It is always satisfying when you can knock out an entire class of problems in one go. Interestingly, this almost invariably involves writing either a parser or an interpreter
Uhm... can i have this app/script whatever???
I have a beautiful problem space for you: the newly legalized cannabis market.
(Btw im doing a cannabis tech consultancy and anyone who is interested in working in this space the ploblem-scope is very interesting)
Please contact me.
Nowadays, wouldn't you just use org.apache.poi for such a task? It can load an Excel file, fill in the input data, and retrieve whatever output data you need, evaluatong and caching formulas as needed.
It's in PHP and really rough by today's standards. We can talk more about it if you wanted to hit me up, my contact info is on my profile.
my typical experience with these kinds of things: I start looking at the source code, and the code is actually simple and short, the logic is easy to follow (make a grid, allow and process formulas in cells, display the grid and some additional edit-mode display if you are editing a cell)... but then I start seeing dependencies, references to other frameworks and multiple languages, libraries... and all that looks so confusing! the dependencies require at least an order of magnitude more knowledge to be understood or used effectively than what we are being shown in the post; that's the really tricky part to me. don't mean to criticize that, those are probably very powerful tools that in most cases you want to know about and use (if you are working in that specific environment), but it's like... are those 100 lines of F# even that important? writing your own excel sounds nice, but I feel it's more "how to glue a bit of F# logic to a web-compatible UI" (well, yeah, one could argue those are really the same things).
> "The final spreadsheet application is quite simple"
Is it? I easily understand the logic of the F# code, but I couldn't make another "similar" application so easily unless I was just copying the dependencies. I wouldn't really know what I'm doing, maybe it's not so simple!
(this isn't meant to be a criticism to the article itself, I just wanted to share this uneasiness I get when checking some code I don't know much about that's presented as "simple"/"short"/...)
I remember feeling this in regards to some early "rapid application development" software in the 90's (Visual Basic or Borland Delphi or something).
"Look how easy it is to make a text editor!" - Drag the "text editor component" over a blank form. Look how easy it is to make a web browser! - Drag the "web browser component".
Or, maybe it was in PHP, "As an example, here's code to format an integer as Roman numerals." Oh, that sounds interesting... Expecting a cool algorithm, then discover it's (for some reason) a standard library function.
> "Look how easy it is to make a text editor!" - Drag the "text editor component" over a blank form. Look how easy it is to make a web browser! - Drag the "web browser component".
But it is easy to make those things using those tools you mentioned and for the problems they were intended to solve these systems work perfectly fine without you needing to know how they work underneath.
However sometimes you need more than that and you do need to open the box and see what is inside and how to work with it. Both VB and Delphi allowed that and at least Delphi had extensive documentation (and full framework source code) about all aspects of it (VB was a bit different in that it didn't had documentation for itself but as it was built on full Win32/COM tech the knowledge was still part of the Microsoft tech - e.g. the MSDN CDs in VS6 had everything you needed to know - though Delphi was probably better on that front as a self-contained tool). And escape hatches were placed all the way down (e.g. getting a window handle was just a property in both systems, you could directly call Win32 API functions directly from both systems, interfacing DLLs was trivial, making custom controls was also trivial - at least in later versions of both systems, etc).
Last time i used PHP was many many years ago so i cannot judge that, but from what i remember it wasn't like VB6/Delphi that built on top of an existing system (for all their nice facades, they were inherently related to Win32 - an issue that still exists for LCL/Lazarus that reimplements VCL as a cross-platform library using native widgets) but instead was a system of its own that provided a raw(ish) interface to existing libraries through its module system.
Why solve problems other people have already solved?
To quote Feynman, "what I cannot create, I do not understand".
So in order for you to develop a payroll system you first need to know how to create a database?
Well, to write a high performance payroll system with lots of concurrent users it helps a lot to know how the database you’re using works internally.
There are situations where one uses a full featured abstraction like a database without needing to understand fully or modify it's internals. This is simply not one of them.
This is a situation where building a new applications with handmade layers that produce the outcome you want is more efficient than gluing together existing code.
You're coming at a complex problem by initially filtering the solution space. This discards much of the potential solution space for which the average case may be more expensive than The average case in the space after culling. However in many situations that solution space has many gems which are more efficient. Companies' tolerance to selecting those rather than using existing technology varies.
You have the quote slightly wrong, in order to understand a payroll system you first need to know how to create a database.
Knowing how a database is constructed and having to building one for the project are two different things. Yes friend, if a database is a major component of your system you should know how it works under the hood. They’re quite complicated and impose significant constraints on the system you’re building.
Well, yes, you should know _how_, even if you’ve never actually taken the time to do it.
If you are making a commericial product that features an in-memory database that can store non-standard types of data perhaps.
More often than not you would just use an existing product if you are not going to add any value.
That’s what they say, if they require a CS degree to work there.
with the current state of technology and the speed of it's progress, there will always be much more things that you don't understand than those that you do...
With regard to the details sure, but the fundamentals should be studied and understood. That's why this blog is not attempting to recreate the entire of Microsoft Excel.
That's not really the point. If someone says "look how easy it is to do X," then I want to see them doing X from scratch, so I can get an idea how hard it would be to use the language for my unsolved problems. If it turns out they're just dragging in prewritten code, that's useless to me.
Depends. If they told me "look how easy it is to apply this function to a certain subset of the elements of this array" and they show me map and filter instead of a for loop I would be all for it, even if it's "prewritten" code.
But I get the example about roman numerals, editors and browsers.
The browser drop-in was extremely useful. I made a blazing fast browser had view-source as a first class button was able to add in my favourites as icons. Better yet I could make the browser fullscreen.
I would consider it a browser that used an engine. Now a days everyone but firefox is using WebKit.. not much of a difference.
Curious how you would do that today without vb.net
I guess the point is that you can't really say you've made a browser. It's like bringing store-bought brownies to a party and claiming you made them yourself.
But he made the browser - the browser is the application itself (what he made) - not the browser's engine. Saying that he didn't make a browser is like saying Notepad++'s author didn't really made an editor because he used Scintilla.
A lot of the time, understanding someones solution is harder than just solving it myself.
If it's a complicated problem that can be boiled down to a simple API, I'll reach for it for sure. But I never have good luck with complicated problems wrapped in a complicated API. Usually I only need to solve 5% of the problem, so I'll either find a library that focuses on that 5%, or I'll code it myself.
Just to give my perspective, I read it totally differently. The impressive part to me was the F# core doing spreadsheety things, and the rest felt like incidental view logic that had to be covered for the sake of completeness but wasn't really important. I thought the fact that the demo is embedded right in the page was incredibly cool because it let you poke the logic interactively, but I wasn't too concerned with how it was done.
From a quick skim, it looks like the only dependencies are the basic UI library Elmish, and a parser combinator lib that itself is only 129 lines.
You might argue that if I tried to become an F# developer I would get to know most of those on days one and two, and I wouldn't have to worry about many of them as they are just auto-generated or whatever. And I would agree. But those are the files that exist on the repository and that the developer needed to make this project work, despite making a "toy project". And they are there for a reason, and are related to other, way more complex pieces of software. Not like there's anything inherently wrong with it, but that's the "hidden" knowledge and complexity I was commenting about. Not something that's particular to F#, though.
> Not something that's particular to F#, though.
So what's the point of the comment? You could make this exact same comment almost any time someone posts any cool project they made. 'Look at all the dependencies! Project tooling, IDE, runtime platform, version control, libraries.' It doesn't add anything useful to the discussion about a specific project.
Please reread my original comment. I'll try to summarize: my point is not about a specific project, but rather that sometimes we talk about / sell something as being easy, short or simple without accounting for all the things you said: project tooling, IDE, version control, etc., while all that is indeed part of the software complexity. I was just pointing out that in some cases we may be obviating that too much. Yes, I could have written this in any other project that made remarks about some code being simple or easy when (imo) it's not.
I never criticised tooling, libraries, etc, themselves, I even made that explicit a couple times, and it's true I've said that it might happen in all languages. That still doesn't mean it's equally bad in all languages, or that it can't be improved, etc. I wasn't trying to start a discussion about this, it just happened. It was just a thought that I transcribed as a comment, in case it gave something to think about to someone.
> sometimes we talk about / sell something as being easy, short or simple without accounting for ... software complexity.
Software complexity exists, yes. But in your original comment, you did say: 'the code is actually simple and short, the logic is easy to follow...' I think the fact that modern software is built on a complex base shouldn't take away from when it accomplishes something in a simple, elegant way. Abstractions are after all the key enabler of software systems. Criticizing them and the complexity they sit on top of, seems like missing the point.
I'm this case, there is virtual-dom dependency, one more file with some code to simplify the creation of UI and whatever Fable (F# to JS) provides. So it's not 100 lines, but there is a lot less dependencies than you'd expect.
I implemented a React component with a subset of Excel features and additional customisation features: https://github.com/iddan/react-spreadsheet
Fantastic example of how to structure a real life (ish) application with functional programming. I don't know any F#, but I can see how to apply this to imperative languages and make them easier to debug
How does F# on Fable compare to Elm and Purescript, and how is each doing these days?
I don't have much experience with Elm or Purescript, but one nice thing about F# with Fable is that it's a pretty flexible language that let's you do things in many ways and integrates well with libraries that use different styles.
For example, the Excel blog post is using virtual-dom directly (without higher level libraries like React) and getting that to work was a few lines of simple code with one mutable variable, but that's all you need to get a nice declarative Elm-style architecture.
Unlike Elm, F# is 'functional first' but supports OO and impurity (with good and bad).
Biggest clear plus is maybe full stack development support. Checkout SAFE stack (Saturn + Azure + Fable + Elmish).
There is also Bolero, an alternative for Fable how to run F# in client side. Where Fable compiles F# to JS, Bolero relies on Microsoft Blazor project which has a whole .Net runtime on WebAssembly allowing to run .net binaries directly on browser. I think this is still quite experimental but if you need/want some dependencies from .net ecosystem it might be easier than you think.
Fable/Elmish is a cool combo, I'm writing really simple admin tool for work with it now and really enjoying it.
Thanks for sharing. I'll try to write an equivalent code using Rust for learning purposes. Since it has algebraic datatypes too, it should be fine.
How does F# compare to OCaml or Haskell?
F# started as Ocaml for .Net and then added all sorts of pragmatic language constructs. I'm not an expert - F# AFAIK misses some of the higher level type theoretic stuff, but you can write 'basic' Ocaml that would be isomorphic with F#.
The key high level features that F# inherits from Ocaml is a succint syntax due to type inference and 'if it compiles it's bug free' development.
F# is missing OCaml modules and functors, which are the biggest reason to prefer OCaml versus other similar languages. OCaml also has structural typing via row polymorphism. And GADTs.
It's easy to dismiss this as "type theoretic stuff", however these features we are talking about are very natural to use and when dealing with static typing and FP you bump into them quite a lot. And given that OCaml has some of the best type inference in the business, this happens without you even knowing about it.
F# is also missing a means for doing ad-hoc polymorphism like in Haskell. So no type class capabilities, no higher kinded types either. This greatly diminishes what you can express in it and the gymnastics you have to pull off to work around it are not fun.
Here's for example what you need to do in absence of GADTs: http://fssnip.net/mq (n.b. this pattern is called "tagless final", resembling "visitor" from OOP land).
I love this sample because if it looks complex, well it's in fact more complex than it seems, as the type system is actually not fine with this usage either, so you end up with forced type castings anyway.
F# is a fine language, all things considering and not having certain features might be considered an advantage in certain circles.
However do learn OCaml and Haskell, because they have plenty to offer and you'll learn useful abstractions that in F# are not very accessible.
At $WORK we use initial algebras in F# instead of tagless final. (They're isomorphic ideas.) It's possible to hack up a GADT using the initial algebra pattern, if you have existential types (which you can get via skolemisation) and type-equality types (which you can pretend exist as a pair 'a -> 'b and 'b -> 'a).
What is a GADT?
"Generalised algebraic data type". Pattern-matching on a GADT not only gives you information about the term that was in the datatype, but also about a type that was in the datatype.
For an ADT that doesn't need the power of GADTs: pattern-matching on a List<'a> gives you either an Empty or a Cons(x, xs), where you know up front before you do the pattern-match that x and xs are of type 'a and List<'a>.
For a GADT that is not an ADT: pattern-matching on an Expression might give you "Const(a) where a is an int", or "Equal(a, b) where a, b are bools", etc. Without doing the pattern-match, you can't necessarily tell what types you've ended up with.
F# has operator overloading, which is missing from ocaml?
That is by design. OCaml philosophy has been to be explicit about code. It's very Go-ish in that sense. So for example integer and float operators are different:
However OCaml does allow operator redefinition and shadowing, so you can redefine and use any operators you want just by opening their specific modules (local opens, i.e. module opens that last for the scope of a single expression, are preferred):
let int3 = int1 + int2 let float3 = float1 +. float2
Note: the above modules Int.Ops and Float.Ops are for illustration purposes only, they don't exist in the standard library. You could write them pretty easily, though.
let int3 = Int.Ops.(int1 + int2) let float 3 = Float.Ops.(float1 + float2)
My Ocaml is rusty, but I am pretty sure you can do Int32.(3 + 2) or similar. This may be the core library though..
I'm not sure if I'd phrase it in terms of something missing from OCAML (or F# for that matter).
There's a substantial diff between OCAML and F# at this point, once you get past "core" programming with functions in both languages. Some of the more prominent things F# has that
* Computation Expressions
* Perf abstractions and compiler analysis for them, namely Span<'T>
* Type Providers
* The .NET generics system
* Anonymous Records
* Slices and ranges for slicing or generating list-like data
And as the parent user said, there's plenty OCAML offers that is missing from F#. I think it's a good idea to consider them quite differently, despite having the same functional core.
It's interesting to compare some of those things, though.
* OCaml has now landed monadic and applicative binding syntax ( http://jobjo.github.io/2019/04/24/ocaml-has-some-new-shiny-s... ), which I think makes it a superset of F# computation expressions.
* Regarding Span<'T> (or my preference 'T Span :-), I believe F# and OCaml have developed different optimizations and constraints. For example, in OCaml allocation is really cheap. But there are still well-known techniques for minimizing it.
* OCaml also has the PPX system, which allows to programmatically transform a program's syntax tree into a new tree. I believe this allows somewhere around the same power as type providers and quasiquotations, combined.
* Anonymous records are cool, OCaml objects are pretty similar in that they are structurally typed. Also interestingly neither of them supports pattern matching–if I recall the F# limitation correctly.
* Slices and ranges are cool, in practice there are pretty powerful and well-known OCaml libraries that provide those, e.g. https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base...
This isn’t really true.
You can do: let ( + ) a b = ...
That's a custom operator definition. Operator overloading is when the same operator can be used on different types (for example, if `+` worked on both floats and ints, instead of explicitly having to reach for the correct `+` like you do in OCaml).
Which can be done, since OCaml has objects
let (+) a b = a#add b
Yeah, I guess you can get a limited sort of operator overloading by building it out of row polymorphism. That doesn't solve the most common use-case of operator overloading, though, which is allowing the same operators to work on floats and ints, because it only works on objects.
Does "if it compiles it's bug free" actually work in practice? I've heard a lot of advocates for Haskell and Rust use similar lines to this, but it seems to me like it can only be true for a limited subset of all bugs. Elm's "no runtime errors in production" claim has a more reasonable-sounding scope.
Sure, it's limited to a subset of all bugs but with a proper typesystem the category of bugs you can remove from code by just encoding the semantics and algebra into the types is quite large.
So if you know how to formulate the problem so it leans heavily on the type system, you can implement something like a binary tree container, and presume it is correct when it compiles.
Following the same parameters bestows rest of the codebase with this robustness.
See for example the ACM article "Ocaml for the masses" by Yaron Minsky of Jane Street for very lucid practical examples https://queue.acm.org/detail.cfm?id=2038036
"It doesn't eliminate bugs, but the bugs that remain are better bugs".
Better bugs in this context is bugs at a higher level, eg. application logic issues.
... to be fair, it takes a bit getting used to a compiler that is as strict, but, when you learn how you need to implement various things, it increases your velocity since you can type stuff and not bother with too much intermediary testing because you can trust you've pretty much typed as you meant.
If anyone is interested in gamedev in F#, try Xelmish. Its F# + the elm architecture + monogame
Value of this cell = formula which may depend on other cells
The coolest part is that it's in F#. Add that to the title. How many F# samples do we get around here anymore?
F# was in the title when I upvoted. Someone changed it.
The original title is "Write your own Excel in 100 lines of F#." I don't know why the F# part was removed either. HN guidelines say "please use the original title, unless it is misleading or linkbait; don't editorialize.". Having read the article, I don't think the "in 100 lines of F#" part is misleading, nor is it Buzzfeed-style clickbait.
"in 100 lines", especially when not strictly accurate, is a nerdier version of "9 zany reasons ..." and the like. HN's title star chamber is also generally down on magic numbers in titles.
That's pretty yucky since anything written in Rust gets to keep that detail in its title
Rewriting code in rust is new fashion, so for the purpose of publicity anyone writing anything in Rust will put the language name in headline.
But I believe this trend started with Go lang and later picked up by rust community.
It's nice to see some F# code. I am happy a lot of new programming language innovations happening, I did not pick up rust, because the programming paradigm is similar to other programming language with focus on memory safety. I will try F# may be later, these days more busy with learning lisp as many people recommend it to become a better programmer even though it's not popular as Go and Rust.
F# is in the title, but it was changed after posting on HN
> But I believe this trend started with Go lang
You might be right, when node js was becoming popular.
Personally I observed it only when I myself started programming in go, at one point in time in the early 1.0 version.
The Rust people have taken over YCombinator. It's sad.
The Rust people understand that language adoption is a game of popularity, not a meritocracy as we'd all like to imagine. Can you blame them for doing all they can for their language to win?
Totally agree with this. We, (folks working in tech), needs to get away from click-baity titles like this, and write more factual titles.