Hacker News

Guix Workflow Language(guixwl.org)

153 points | smartmic posted 3 months ago | 51 Comments
batbomb said 3 months ago:

A workflow language is only as good as its engine.

Nextflow was mentioned. I think what most people want is probably closer to Airflow, although it takes some time getting it up to production in a cloud environment (there is astronomer.io and a GCP product).

HTCondor via DAGMan has existed a long time, and there’s even engines built on that (Pegasus, Wings).

There’s Swift (http://swift-lang.org/main/) and its successor Parsl. Cray has Chapel. These are a bit different, in that they are more like a distributed computer program. Of course, so is Julia, but built into these languages is the assumption that the computing resources you use may be unreliable in some way. Makeflow and GNU Parallel are closer to this category too.

Then there’s Beam, but that’s dataflow.

The crappy thing about this is it’s hard to understand when to use a solution and when to not use a solution. Why are there so many solutions? Because there’s a ton of different needs, and a lot of these focus on a few in particular:


Scalability of workers

Dynamic Scalability of workers



Integration with existing Schedulers

Workflow Code Management (container support)

Maintainability of very large DAGs

Testability of DAGs/Development support

Execution Management support/Web APIs

Error recovery (especially for long running workflows)

Re-execution capabilities

Provenance tracking

Domain Specificity

Data Management (next to data processing)

... the list goes on.

djtriptych said 3 months ago:

Just regarding Airflow: unless Google has done a lot of work upgrading the internals since embracing Airflow as a supported cloud provider, I would think twice about using it.

It's amazing it works at all in my opinion.

This file [0] contains much of the complexity as a messy, stateful, monolithic block of Python. Having had to chase down deep bugs / limitations in this software, I'm now convinced that Python, with its GIL, weak typing, lack of concurrency primitives, and generally OOP / imperative style, is just the wrong tool for the job.

[0]: https://github.com/apache/airflow/blob/master/airflow/jobs/s...

j88439h84 said 3 months ago:

I don't know if Python is the best tool for the job, but with modern tooling it is leagues better for complex applications than old Python.

https://trio.readthedocs.io is an extremely good Python concurrency library based on the model of Structured Concurrency (https://vorpus.org/blog/notes-on-structured-concurrency-or-g...).

The typing issues are far improved in current Python with annotations and attrs/dataclasses.

thesorrow said 3 months ago:

I'm using Airflow for a lot of critical tasks and it works really well. But I agree that Python may not be the best language to implement a workflow engine.

djtriptych said 3 months ago:

It's fine for moderate workflows. We ran into several hard limits when scaling up, and thought to try to patch some limitations. I think it's got a number of edge cases / scalability issues that will be very hard for them to fix without a full rewrite of the internals.

kinow said 3 months ago:

>Because there’s a ton of different needs, and a lot of these focus on a few in particular:

Indeed! I am working on Cylc [1] right now, which is a cyclic workflow system, for cases where users need more than a DAG.

It was created to automate weather forecast operations, but now there are a few cases of users trying to use it for cyclic graphs for more general problems.

[1] https://github.com/cylc/cylc-flow

arianvanp said 3 months ago:

I wonder why it is not in Guile? I thought one of the selling points of Guix was one config language to rule them all. Or is it some syntactic sugar on top of Guile? It's not clear. The page doesn't really explain the syntax clearly anywhere so I'm a bit confused.

leethargo said 3 months ago:

I think it is actually in Guile. Even though there are no parens in sight, this is a whitespace-based syntax for Scheme.

I don't have a reference at hand now, but I vaguely remember such a comment from the FOSDEM presentation.

serhart said 3 months ago:

It is implemented in Guile. The language is just syntactic sugar via macros and utilizes WISP.

https://git.savannah.gnu.org/cgit/gwl.git/tree/gwl/sugar.scm https://srfi.schemers.org/srfi-119/srfi-119.html

Sean1708 said 3 months ago:

> Processes and workflows are composed using a domain specific language embedded in the general purpose language Scheme. They can be executed in order with the guix workflow command.

So I guess it's just a set of macros for Guile (which I believe is a Scheme implementation, or contains a Scheme implementation, or something like that...).

jimhefferon said 3 months ago:

Guile is an implementation of Scheme that supports Revised^5 and a good chunk of Revised^6.

danielecook said 3 months ago:

For bioinformatics, take a look at Nextflow. I personally think it is miles ahead of the competition having reviewed about a dozen options out there.

This looks useful, but can it submit jobs to cloud compute clusters or HPC systems and operate locally? Maybe I’m missing the point in terms of the purpose.

snackematician said 3 months ago:

Nextflow is indeed the best option today. It's sad that it's based on Groovy though, which seems past its heyday as a language/community.

I'm very excited that this is based on Scheme -- I've long wanted a lispy workflow language!

I'm a little concerned about the coverage of the Guix package manager though. I guess instead of writing Dockerfiles the user would have to learn to write Guix packages.

The "Getting Started" example uses samtools so I guess this is oriented towards a similar bioinformatics audience. However without HPC/Cloud support it's probably not too practical, yet.

Addendum: Listened to the FOSDEM 2019 talk, seems like it does support Docker and HPC. However I need AWS Batch support for it to be really useful to me, hopefully that will be implemented at some point.

zekrioca said 3 months ago:

If there is Docker/HPC/AWS support in the system through its local commands, there will be support. Check the cluster mode setup in the Guix documentation (https://www.gnu.org/software/guix/manual/en/guix.html)

Edit: adding gnu guix manual link

zmmmmm said 3 months ago:

> It's sad that it's based on Groovy though

It's actually very cool since you can drop in any Java library you like, which is particularly nice in the bioinformatics space where HTSJDK, Picard and co. give you enormous power in that space.

vk3wtf said 3 months ago:

Nextflow does not capture the software required, which is important for reproducibility. Containers are not a solution for this, as they only shift the problem up a level.

My efforts with BioNix (https://github.com/PapenfussLab/bionix) achieve reproducible pipelines by using Nix to capture the software, workflow, and handle execution either locally, on a compute cluster, or HPC.

Guixwl looks similar to BioNix, though BioNix is a thin layer of Nix expressions and Guixwl seems to be more than that and could be more general. BioNix is targeted at bioinformatics and just builds on nixpkgs.

totalperspectiv said 3 months ago:

I've been migrating to nextflow over the last month. It really is fantastic. I even use it in place of bash scripts for little things now just because resume is so nice.

zekrioca said 3 months ago:

It seems to be used mainly for generic operations (available locally on localhost). However, commands like 'sbatch' (HPC manager system) could theoretically be used to manage the infrastructure. Then again, I don't know whether such commands would be integrated natively with Guix, though it would be interesting.

fwip said 3 months ago:

I'm not sure if you're talking about nextflow, but it has support for many different job managers (slurm plus like ten more), AWS batch, and Google cloud pipelines.

zekrioca said 3 months ago:

No, I was indeed talking about guix.

svd4anything said 3 months ago:

I’ve found that combining Nix with Luigi provides a solution to managing complex reproducible workflows.


“Conceptually, Luigi is similar to GNU Make where you have certain tasks and these tasks in turn may have dependencies on other tasks.”

Having this type of functionality integrated directly into Guix (an alternative to Nix) looks very interesting. I’d encourage the Guix workflow developers to study Luigi and SciLuigi for inspiration on design ideas.

I will be sure to follow this effort and see how it progresses.

dkimbel said 3 months ago:

For the curious, 'Guix' is pronounced the same way as 'geeks' [0].

0: https://www.gnu.org/software/guix/manual/en/html_node/Introd...

yarrel said 3 months ago:

Nobody is going to do that.

It's goo-icks.

lfam said 3 months ago:

Debian, Ubuntu, UNIX, Linux: none of these have an obvious pronunciation to a native English speaker from the USA.

Why take such a defeatist attitude? I'm sure you can get the pronunciation right with a little effort.

michaelmrose said 3 months ago:

I also have a hard time saying gaaaa nome or matey like all aboard matey with a straight face or hey you should use the gimp.

vector_spaces said 3 months ago:

What is a workflow language exactly? What benefits do they bring vs using say Python?

danielecook said 3 months ago:

The two main things you want out of a workflow language are re-entrancy and a DAG of job dependencies. Re-entrancy is the ability for the workflow to pick up where it left off if something crashes (basically via caching or detecting the presence of expected output files). The DAG is a directed acyclic graph of job dependencies: First do A, then B, then C. The DAG is worked out by the workflow manager, and jobs can be managed accordingly.

A good workflow manager builds on these ideas further by managing environments, job submission, parallelization, cloud/cluster submission, and other options that make processing large amounts of data a lot easier and more efficient.
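To make the two ideas above concrete, here is a minimal sketch in Python using only the standard library. It is not how Nextflow, GWL, or any other engine mentioned in this thread is implemented; `run_workflow`, `make_step`, `STEPS`, and `DEPS` are hypothetical names invented for illustration. The DAG is ordered with `graphlib.TopologicalSorter`, and re-entrancy is approximated by skipping any step whose expected output file already exists.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path
import tempfile


def make_step(output, inputs):
    """Build a toy step (hypothetical helper): read inputs, write one output file."""
    def action(workdir):
        parts = [(workdir / name).read_text() for name in inputs]
        (workdir / output).write_text("+".join(parts + [output]))
    return output, action


# Step name -> (expected output file, function that produces it).
STEPS = {
    "A": make_step("a.txt", []),
    "B": make_step("b.txt", ["a.txt"]),
    "C": make_step("c.txt", ["b.txt"]),
}

# Step name -> set of steps that must finish first (the DAG).
DEPS = {"A": set(), "B": {"A"}, "C": {"B"}}


def run_workflow(steps, deps, workdir):
    """Run steps in dependency order; return the names actually executed."""
    workdir = Path(workdir)
    executed = []
    for name in TopologicalSorter(deps).static_order():
        output, action = steps[name]
        if (workdir / output).exists():
            continue  # re-entrancy: output survives from an earlier run
        action(workdir)
        executed.append(name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_workflow(STEPS, DEPS, tmp))  # first run executes every step
        print(run_workflow(STEPS, DEPS, tmp))  # second run has nothing left to do
```

Note that this sketch only checks whether an output file exists. It does not notice stale outputs when an upstream file changes, which is one of the things a real engine has to handle.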

jkh1 said 3 months ago:

Workflow languages are meant to formally describe computational data processing pipelines involving many steps and tools. They are often associated with an engine that allows setting up, running and monitoring of a workflow.

TeMPOraL said 3 months ago:

I see a surprising lack of parentheses for something related to Guix. What's the story behind it?

jboynyc said 3 months ago:

There are talks by the creators presented at FOSDEM that go into the development of the workflow language.



chriswarbo said 3 months ago:

Lisp doesn't need to use s-expressions (i.e. parentheses); they just so happen to be the most popular serialisation format. These examples look more like I-expressions to me ( https://srfi.schemers.org/srfi-49/srfi-49.html ); i-expressions and s-expressions are equivalent and can be converted back and forth trivially.

Lisps can also support arbitrary input formats using reader macros, so it might be using that (I haven't looked at the implementation yet).
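As a rough illustration of that equivalence, here is a small Python sketch that turns an indentation-based form into the parenthesised one. It is a deliberately simplified scheme made up for this example, not an implementation of SRFI-49 or of Wisp (SRFI-119), and it assumes consistent space indentation. The function name `indent_to_sexp` is hypothetical. The rule used: a line plus its more deeply indented children becomes one list, and a lone token with no children stays a bare atom.

```python
def indent_to_sexp(text):
    """Convert a simplified indentation-based form into s-expression text."""
    lines = [line for line in text.splitlines() if line.strip()]

    def indent_of(line):
        return len(line) - len(line.lstrip(" "))

    def parse(index, indent):
        """Parse sibling lines at `indent`; return (s-expression strings, next index)."""
        nodes = []
        while index < len(lines):
            current = indent_of(lines[index])
            if current < indent:
                break  # dedent: this line belongs to an outer level
            head = lines[index].strip()
            index += 1
            children = []
            # Lines indented deeper than the current one are its children.
            if index < len(lines) and indent_of(lines[index]) > current:
                children, index = parse(index, indent_of(lines[index]))
            if len(head.split()) == 1 and not children:
                nodes.append(head)  # lone token: keep as an atom
            else:
                nodes.append("(" + " ".join([head] + children) + ")")
        return nodes, index

    nodes, _ = parse(0, 0)
    return "\n".join(nodes)


if __name__ == "__main__":
    source = "define\n  square x\n  * x x"
    print(indent_to_sexp(source))  # (define (square x) (* x x))
```

Going the other way (parentheses back to indentation) is a similar tree walk, which is why the two notations can be treated as different views of the same data.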

blunte said 3 months ago:

But with sexps, if your code suffers some formatting catastrophe (such as all instances of whitespace being reduced to a single space), the sexp code is trivially recoverable/reformattable.

Depending on whitespace and indentation creates such fragile code that I can't understand why the trade-off would be made.

chriswarbo said 3 months ago:

If that's a concern then serialise using s-expressions. You can still edit using i-expressions or something equivalent if you like (it's trivial to convert, after all). Code on-disk doesn't need to be the same as code in-editor (for example, syntax highlighting isn't saved to disk either)

nerdponx said 3 months ago:

Formatting catastrophes are exceedingly rare. You have your code in version control anyway, right?

shakna said 3 months ago:

It's based around WISP [0], an SRFI that grew out of a couple previous attempts to build alternative syntax.

[0] https://srfi.schemers.org/srfi-119/srfi-119.html

lenkite said 3 months ago:

https://www.commonwl.org/ is pretty good and supported by several workflow engines

lkirk said 3 months ago:

It would be nice if this language was extensible to running on various compute cluster managers. From what I can tell, these workflows only run on one machine. I like the bioinformatics tool examples though... you can tell who their target market is ;P

eterps said 3 months ago:

What are common use cases for this?

stilley2 said 3 months ago:

I've recently been using nipype [0] for workflows, which is fairly domain specific, but pretty nice.

0: https://nipype.readthedocs.io/en/latest/

gravypod said 3 months ago:

Would something like this make sense for defining machine learning/data science processes? Like obtain, clean, reformat, split (train/test) datasets.

1-6 said 3 months ago:

I love workflow languages. I would really want to make a visual one with input output nodes (in a true GUI fashion). See Autodesk Dynamo for a close concept.

jkh1 said 3 months ago:

There are already a few visual workflow composers, e.g. Rabix, KNIME.

jkh1 said 3 months ago:

Q: How many workflow languages/engines does the world need? A: As many as possible: https://github.com/common-workflow-language/common-workflow-...

heavenlyhash said 3 months ago:

Is it really so surprising that people continue to iterate and explore the space of possible DSLs -- literally, domain specific languages -- especially when people are solving problems from many different specific domains?

jkh1 said 3 months ago:

But many, if not most, are not domain-specific.

zmmmmm said 3 months ago:

Depends what you mean by that. For example, in the bioinformatics space it's super common to parallelise a workflow over genomic regions and then merge the results. So I use a tool that has a top level construct for that, literally language syntax which makes that both utterly trivial and extremely robust (for example, deals with the annoying problems of edge effects, overlapping regions, trying not to create breaks in important regions, etc). You can argue all of that is basic parallelism and not domain specific, but in practice it's extremely useful to have these constructs at the language level.
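The basic scatter/merge shape described above could look something like the following Python sketch. It is not the syntax or implementation of the tool being referred to; `split_regions`, `count_gc`, and `scatter_gather` are hypothetical names, and the per-region job is a toy GC count on an in-memory string. It also does none of the hard parts mentioned (edge effects, overlapping regions, avoiding breaks in important regions); half-open intervals are only enough here because the statistic is computed per base.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(length, chunk):
    """Return half-open (start, end) intervals that cover [0, length)."""
    return [(start, min(start + chunk, length)) for start in range(0, length, chunk)]


def count_gc(sequence, region):
    """Toy per-region job: count G and C bases inside one region."""
    start, end = region
    return sum(1 for base in sequence[start:end] if base in "GC")


def scatter_gather(sequence, chunk):
    """Scatter the job over regions, run them concurrently, then merge by summing."""
    regions = split_regions(len(sequence), chunk)
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(lambda region: count_gc(sequence, region), regions))
    return sum(partial_counts)


if __name__ == "__main__":
    print(split_regions(11, 4))
    print(scatter_gather("GATTACAGGCC", 4))
```

Having this as a language-level construct means the region splitting and merging is written once by the tool rather than by every pipeline author.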

heavenlyhash said 3 months ago:

That's a really excellent example -- crossing the regionality information of genomics with an otherwise-basic parallelization problem definitely makes it nontrivial. Thank you :D

fwip said 3 months ago:

They're specific to the problem domain, which happens to be cross-field.

The venerable `make` is a DSL. awk is a DSL.

xmonkee said 3 months ago:

`make`, sure, but calling awk a DSL is a huge stretch

fwip said 3 months ago:

It's not, really. It's even cited on the Wikipedia page as a well-known DSL. Give it a read, there's a lot of DSLs that maybe don't seem like one at first blush.
