Suppose you find a library that kinda does what you need, but not really. Do you adapt it to your needs or do you build a version from scratch that really suits your needs, but might take a bit of time to develop?
My rule of thumb when considering adding a dependency these days is to start by trying to implement the functionality myself. But I don't let myself spend too much time on it (maybe an hour or two, maybe a day). More often than not I'll get bogged down pretty quickly and realize the problem is more complicated than I assumed. This approach gives me:
1. Greater appreciation for the dependency
2. Better chance of modifying the dependency when it breaks or doesn't work the way I want, because I'm at least somewhat familiar with the types of tradeoffs you have to make to implement such a thing.
3. A chance that the problem turns out to be simple enough to not require a dependency and the prototype works just fine forever.
4. A chance to learn something new.
Obviously there are exceptions. I wouldn't try to implement a relational database just because I need one. But in these situations I try to take a step back and ask if I even need a full DB. Would flat files work fine for the task at hand?
I second this approach, which is what I do as well. My baseline is that I prefer to not have any dependencies. This is usually not realistic nor necessarily pragmatic in practice (e.g. reinventing a relational database), but it's a good baseline.
My general feeling towards dependencies (in programs or life) is "do I really want to depend on this thing?" Almost always the answer should be "no". But the follow-up is "do I really need to depend on this thing?" and sometimes the answer is yes because I don't want to or can't re-implement the thing due to time or because I don't have the expertise.
When the packages and configuration files for the project are more complicated than the python code it would take to do the one simple thing that you need. A lot of frameworks tend to bloat to handle every use case and become overly complicated because of it.
Things like passing protobufs/gRPC when C structs, JSON, or CSV files will do, job scheduling systems that could be replaced by a postgres table, most Kafka applications.
I once worked with a guy who would pattern match whatever problem you had to the trendy Apache or Google library and then spend weeks/months setting it all up. He was easily the least productive person I've ever worked with.
>Things like passing protobufs/gRPC when C structs, JSON, or CSV files will do
These will always get the job done, but at great cost to future maintainers and callers. Explicitly declared interfaces that give some thought to future evolution aren't that hard, and can save a lot of pain down the road.
I vividly remember the outage I caused by changing a struct field name, only to find out it was implicitly a critical part of the API "contract" when serialized to JSON. Another where I relied on a certain field from a dependency's response, only to find out it was populated in some situations (the examples I used to learn the interface) and not others (revealed by production). Our Thrift services do not have these problems. Anyone can look at a diff and tell a) that the IDL is changing, b) whether the change is breaking, and c) whether a field is required or optional.
gRPC is great as an API generator, but there's a lot of machinery under the hood in its network layer. It's not small - there was an embedded project where we were considering using it to replace our roll-your-own, then we discovered that the compiled library for gRPC was larger than the entire application.
C structs are so brittle they should not be used for any kind of serialisation. You're just setting yourself up for upgrade nightmares. I wish BSON or the ASN.1 toolset were easier to use for this use case.
For C structs, start with a uint32_t 4 char code identifier and a size field. Then you can extend them as long as you only ever append, which is a constraint on protobufs anyways. This is how most media containers have worked since the Amiga days.
Meanwhile, their encoding/decoding is infinitely faster than most other serialization formats, allowing you to serialize data back and forth to disk or as IPC faster than anything else.
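To make the tag-plus-size idea concrete, here is a minimal sketch using Python's struct module rather than C. The record layout, the DEMO tag, and the field names (ident, gain, channels) are invented for illustration and do not come from any real container format. It assumes little-endian, unpadded fields and that newer versions only ever append.

```python
import struct

# Hypothetical layout: 4-byte tag, uint32 total record size, then the fields.
# "<" selects little-endian with no padding, so the byte layout is explicit.
HEADER = struct.Struct("<4sI")
V1_FIELDS = struct.Struct("<If")   # v1: ident (uint32), gain (float32)
V2_FIELDS = struct.Struct("<IfH")  # v2 appends: channels (uint16)
TAG = b"DEMO"                      # made-up four-character code


def _frame(body):
    """Prefix a packed body with the tag and the total record size."""
    return HEADER.pack(TAG, HEADER.size + len(body)) + body


def pack_v1(ident, gain):
    return _frame(V1_FIELDS.pack(ident, gain))


def pack_v2(ident, gain, channels):
    return _frame(V2_FIELDS.pack(ident, gain, channels))


def read_v1(buf):
    """Old reader: decode the v1 fields, then use size to skip anything appended.

    Returns (ident, gain, rest), where rest is the bytes after this record.
    """
    tag, size = HEADER.unpack_from(buf, 0)
    if tag != TAG:
        raise ValueError("unexpected tag")
    if size < HEADER.size + V1_FIELDS.size or size > len(buf):
        raise ValueError("bad record size")
    ident, gain = V1_FIELDS.unpack_from(buf, HEADER.size)
    return ident, gain, buf[size:]
```

A v1 reader handed a v2 record followed by a v1 record can decode both, because the size field tells it where the next record starts even though it does not know about the appended field.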
.. as long as you never change endianness, structure packing, word alignment requirements, and always use types with explicit sizes. And you have to keep the data in the structure itself, so as soon as you hit a pointer you need to write explicit serialisation.
> .. as long as you never change endianness, structure packing, word alignment requirements, and always use types with explicit sizes.
#pragma pack(1) and byteswapping solve all this and neither is more work than specifying some intermediate format and calling some external tool in your build system. In C++ you can make types that swap automatically if required.
> And you have to keep the data in the structure itself, so as soon as you hit a pointer you need to write explicit serialisation.
I have yet to see a serialization format that does this well and quickly. Just make a new message type for it that follows the first.
Or just use RIFF, which is useful and provides just enough to implement WAV, BMP and other file formats (so you'll find heaps of examples).
Indeed. Using plain structs one can't just add a single field with a reasonable default and still be backward compatible - which is a very very common case in API design.
You could add a completely new revision of the API. But having to maintain it, typically in addition to the old revision since both sides might not be updatable in atomic fashion, is a lot of extra work.
Serialization that allows adding and removing fields in a backward-compatible fashion is a mandatory requirement for any bigger project for me. If you need super high performance, maybe use FlatBuffers or Cap'n Proto instead of Protobuf/JSON/CBOR, but I really wouldn't recommend going for plain structs.
>but there's a lot of machinery under the hood in its network layer
gRPC is designed to enable peer-to-peer communication in a microservices mesh, so it has hooks for a number of things like service discovery, authentication, health checking, retry policy, connection pooling, observability, etc. If you're just looking for a serialization format you might prefer Protobuf itself.
I hope the CBOR tooling is better https://cbor.io/tools.html
> I vividly remember the outage I caused by changing a struct field name, only to find out it was implicitly a critical part of the API "contract" when serialized to JSON.
Lack of unit tests for serialization can be deadly for projects where the serialization format is defined implicitly via the application model. I've seen enough projects bitten by this to be convinced it's worth writing ser/des unit tests for every API object. For some reason it's a form of testing that even gung-ho unit testers tend to leave out. Arguably it should be covered by integration tests, but integration tests usually only test round-trip serialization, which rarely breaks when you're testing a codebase against itself.
But I think the issue you faced was a documentation issue as well. It should have been evident which objects defined the API. In some projects the objects have to be heavily annotated to make ser/des work at all, which makes it obvious, but when it's possible to separate concerns and define ser/des separately, there should be documentation or naming to make it just as easy to see.
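As a sketch of what such a ser/des test can look like: pin the wire format to a literal string checked into the test suite, and test both directions against it instead of only round-tripping. UserProfile, to_wire, and from_wire are hypothetical names, and the JSON layout is an assumption made up for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserProfile:  # hypothetical API object
    user_id: int
    display_name: str


def to_wire(profile):
    # sort_keys keeps the output stable so it can be compared to a literal
    return json.dumps(asdict(profile), sort_keys=True)


def from_wire(text):
    return UserProfile(**json.loads(text))


# Frozen example of the wire format. Renaming a dataclass field breaks the
# tests below, while a pure round-trip test would keep passing.
GOLDEN = '{"display_name": "Ada", "user_id": 1}'


def test_serializes_to_golden():
    assert to_wire(UserProfile(user_id=1, display_name="Ada")) == GOLDEN


def test_parses_golden():
    assert from_wire(GOLDEN) == UserProfile(user_id=1, display_name="Ada")
```

The two test functions are written so a runner such as pytest would pick them up, but they can also be called directly.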
I agree, but in this case I moved the interface in question to Thrift because it provides both documentation and compile-time safety for substantially less code than doing it manually with JSON. (There are also various ways to do some or all of it with JSON, such as JSON Schema and Swagger).
Oh yeah NIH is a problem, but "we can't invent it here" is also a problem
Person tries to find a library that does what one line of code would do (cough left pad cough and others) and then guess what, the library fails at that thing that maybe not 90% of people need but 50% need
And that even applies to some corners of famous apps or libraries big sigh
This is too common of a problem because many developers are functionally illiterate in their primary programming language.
Invented here syndrome: https://dev.to/mortoray/invented-here-syndrome-4mg8
>...pattern match whatever problem...
I agree with your comment. I would add that there are cases where your problem looks really easy right now, but will get increasingly complicated over time in ways you don't expect. In this case, "overkill" 3rd party solutions are actually better. For example, you could write an HTTP client using sockets pretty easily, and it will work in simple cases. But you will, over time, run into edge cases and problems that a mature library has already solved for you.
Also, this problem reminds me of having to decide whether to solve the integral, or look it up, back in the day. Solving from scratch was always preferable if you could do it, but sometimes it was just too hard.
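For a sense of how small the "simple case" is, here is a rough sketch of a plain HTTP/1.0 GET over a socket. The function names are made up for the example. It deliberately ignores TLS, redirects, chunked transfer encoding, keep-alive, repeated headers, and malformed responses, which is exactly the territory where a mature library earns its keep.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_response(raw):
    """Split a complete raw response into (status, headers, body).

    Naive on purpose: later headers with the same name overwrite earlier ones.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def http_get(host, path="/", port=80, timeout=5.0):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```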
> But you will, over time, run into edge cases and problems that a mature library has already solved for
It’s entirely possible that the new problems you encounter will be different than the ones the mature library was designed to address, and then you end up hacking around solutions to problems you don’t have in order to fix the ones you do.
I actually did have to write an HTTP library on sockets once. It wasn’t trivial (took a month to productionize), but it was core to our business. We were writing developer tools (Parse) and our file upload library had major problems on the latest iPhone due to an OS bug (if a kernel buffer filled before the modem could upload it, the file transfer would fail).
We became the third company I know about to ever solve this problem, but I had built what my colleague called “the hardest to use HTTP library ever”. I was happy to move back to AFNetworking once we could do so without a regression in performance or hitting that bug.
Can you shed a bit more light on this -
> job scheduling systems that could be replaced by a postgres table
I've never used Postgres in my life ever.
I think the point is to use a DB as a message broker. Schedule a job by adding a row to a table. Have services running that secure a lock on a row, run the task, and then update/remove the job from the DB.
Django-Q (the multiprocessing task queue framework, annoyingly named the same as Django Q objects), allows this as one of the configurable brokers. https://django-q.readthedocs.io/en/latest/brokers.html#djang...
+1. And I think specifically the SKIP LOCKED Postgres feature is relevant for this: https://tnishimura.github.io/articles/queues-in-postgresql/
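Here is a minimal sketch of the table-as-queue idea. To keep it runnable without a server it uses SQLite from Python's standard library as a stand-in, so the claim step takes a whole-database write lock instead of a row lock. The jobs table, its columns, and the function names are all hypothetical. On Postgres the claim query would instead be something like `SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED` inside a transaction, so concurrent workers skip rows that another worker has already locked.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None means autocommit; transactions are started explicitly
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    return db


def enqueue(db, payload):
    """Schedule a job by adding a row."""
    return db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def claim(db):
    """Atomically mark the oldest queued job as running and return (id, payload)."""
    db.execute("BEGIN IMMEDIATE")  # SQLite stand-in for row-level locking
    try:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return row


def finish(db, job_id):
    """Remove a completed job."""
    db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
```

A worker loop would call claim, run the task, and then call finish. Retries, timeouts for jobs stuck in 'running', and failure tracking are left out, and those are the parts that tend to get hard.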
Oh, I hadn't noticed that. I'm not sure how Django Q manages with a non-postgres DB, since it doesn't mention that as a requirement.
I have used nowait before to batch scheduled jobs together for a slow API call.
Why would you pick django-q over Celery, which also supports a database backend?
We've had a lot of mysterious "it just stops working" issues with Celery, and their # of open tickets is approaching scary levels.
Moving everything to rq, and so far so good.
Are you sure that changing your implementation didn't fix a bug?
I've noticed that a lot of "dependency" problems on "complex systems" are usually programmer implementation problems. Changing the implementation forces you to review the original implementation (or throw it out) and fix the bug that caused you to change the implementation in the first place.
We’ve had exactly the opposite experience. RQ was an absolute nightmare in every aspect across two companies I’ve worked at. And when you need something that Celery supports that RQ does not, you are in a pickle.
I wish you the best, but I cannot help but think you will regret that decision.
Also, Kiwi.com gave a talk about their experiences with RQ and how/why they migrated to Celery at EuroPython last week. It's worth a watch.
Have never experienced this and we sold a huge number of our app using celery. Based on other replies one should pause, take a step back and consider if their code or design is to blame..
Funnily enough, we also use this approach in Google. There's at least one chapter in the first (the unactionable) SRE book about this.
As long as I want to avoid very edgy race condition cases (low throughput), any DB will work right?
Either Mongo, MySQL or List in Redis.
Is there anything specific about Postgres when it comes to this? (except SKIP LOCKED, as mentioned in other comments)
I said postgres because it's just all around good at a bunch of stuff. But I've used redis lists in the past to great success. Redis streams might be even better but they weren't ready the last time I needed a queue.
rq ("redis queue") for python does this with, obviously, redis.
API is pretty small and comprehensible.
It can't. Not easily.
Job scheduling is trivial. Doing it reliably, traceably, and in a way that handles failure predictably is really a pain in the arse.
There are zillions of ETL frameworks out there, just use those. Some are unspeakably complex (Airflow, I'm looking at you), some are simple but require glue (AWS Batch), and some are trendy (Argo on Kubernetes).
However, most CI tools are effectively job scheduling systems. Jenkins works surprisingly well as a cron replacement for >100 machines
C structs are really, really brittle; CSV is alarmingly complicated (unicode); JSON can't pass "None". I like CBOR, myself.
> JSON can't pass "None"
What about null?
Ahh, then I might be talking crap. I'll check it out, thanks.
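You can pass null or remove the property. (undefined is a JavaScript value, not valid JSON; JSON.stringify drops object properties whose value is undefined.)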
JSON Schema – a whole topic unto itself – gives you further control over locking down how this can be represented (e.g. field MUST be present but can be string matching this regex or null).
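A quick illustration in Python, where JSON null maps to None in both directions and a field that is present but null stays distinguishable from a field that is missing. The nickname key and the describe helper are made up for the example.

```python
import json

# JSON null maps to Python None when encoding and when decoding.
encoded = json.dumps({"nickname": None})
decoded = json.loads('{"nickname": null}')
missing = json.loads("{}")


def describe(obj, key):
    """Tell apart a missing key, an explicit null, and a real value."""
    if key not in obj:
        return "absent"
    return "null" if obj[key] is None else "value"
```

Whether "absent" and "null" should mean different things is an API design decision; the format itself supports both.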
CBOR has extremely slow encoding/decoding in C/C++ when compared to structs. Nothing is perfect.
For sure, compared to structs. Which you can almost certainly RDMA from one place to another. It's a little bit apples and oranges though since CBOR is self describing (and very lightly compressed).
Shameless plug: https://github.com/RantyDave/cppbor
Depends on the quality of what currently exists and the license. Typically if I find something liberally licensed that does 80-90% of what I want, I'll fork the project, add the remainder and send a pull request back to the parent project (which is normally ignored).
Other things I take into account are how important the X is to the business. My company's product parses a very strange industry-specific file format. There are some commercial parsers and some open-source ones which do most of what I need but I opted to roll my own as working with this file format is very important and I can't risk license concerns.
Also, it might not be very nice but I don't want to help my competitors.
I'll strongly second your comment regarding business importance. However, I will slightly rephrase it:
By expending resources to build "X", your company is now in the business of building "X". Is building "X" a business the company wants to be in?
The story behind Amazon building AWS for internal needs, and then selling it externally, is an exemplar of this thought process. They were already in the AWS business... They just decided to package and sell the product to others.
Only maintain a separate codebase if the feature is accepted; that way there is a finite timeline on having to be in the build business. As someone who maintained a forked OpenStack distribution for 5 years at a company selling OpenStack tooling... we wasted significant time tracking down integration bugs rather than working with the community to integrate our changes upstream.
"If it’s a core business function — do it yourself, no matter what." was Joel Spolsky's take.
He wrote an article on the topic: https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-...
Where do you draw the line, though, on what’s “core”?
Let’s imagine you’re a software shop that sells a SaaS app. Something like Slack as an example.
Should you roll your own dedicated compute for your SaaS app or use AWS?
Should you roll your own web framework, or use Rails/etc?
Should you use AWS for file storage of Slack media files that are uploaded, or roll your own dedicated storage solution?
I have the utmost respect for Joel. His company created their own programming language because they felt it was so core to their business to do so.
As Facebook did with Hack.
This really depends on what the company wants to adopt as “core”. They cannot be great at everything, so they must be careful in picking what topics to be experts in.
For example, maybe the Slack-like app wants to be able to upload, distribute, and archive huge computer files. Everyone has tried to upload a file to an email, Slack, or Discord message and been told the file was too large. Their pitch: our platform is going to be the platform for sharing 1TB sized files for collaboration purposes, with the goal of supporting 10TB in summer 2020 and 1PB before 2022.
The company should 100% make their own network protocol for transferring files.
* Is your business managing hardware and data centers? No, so you use 3rd party hosting.
* Is your business developing and maintaining a general-purpose web framework? No, so you pick one off the shelf.
Once you have your product, have proven your market and have a healthy customer base you can start thinking about optimizing your verticals.
* Maybe your cloud hosting bill is enormous because of your workload and it turns out that colocating at a local data center will be a lot cheaper so you migrate.
* Maybe you realize that none of the current web frameworks make your devs feel productive and they keep having to fight the tooling to accomplish what they want. If you find that you're carrying a lot of patches for your framework maybe it's worth your time to have someone working full-time to upstream them. If you think it's completely lost then it might be worth it to hire a team to work on building a new one.
>>”Is your business managing hardware and data centers? No, so you use 3rd party hosting.”
So should Dropbox roll its own storage solution, or use cloud hosting - based on your statement?
Hack for Facebook isn't really a good example of this. It, along with HHVM and kin was more of a mitigation of the fact that there was too much PHP to rewrite.
Ah, this is a very difficult, multi-variable question. I think this question should be given with a lot more context. Of importance, off the top of my head, are:
- Do you need to get results now, or is long-term maintenance more important?
- Are you going to roll your own internal tool or publish it as open source?
- How responsive are the developers of the original library? Did you attempt to ask them for help/consulting/etc?
- Can you wrap the library so that you get 80-90% of the work out of the way? Can you fork it so that you get 50-70% of the work already done? As everyone mentions here, licensing is important for this question.
- Do you have people who have created and maintained libraries before? Do you know what it takes to publish your own library?
I heartily agree with all the concerns here, but there are also business and professional development factors to consider, in addition to your present situation:
- How stable are your needs? Will your ideal setup be the same 6 months from now?
- Is this a core piece of the business or a peripheral concern?
- Which is more valuable: developers that understand the underlying theory, or developers that are familiar with the preexisting library?
- Will the act of rolling your own library make your team more valuable? Will it make them too expensive?
- How will this choice affect developer retention? Your ability to hire new developers?
A problem I wonder about is that frameworks and doing things well are often diametric opposites. Frameworks get you from zero to functional really fast, and then you're stuck. For example, studies keep pointing out that for e-commerce, low latency and ridiculously fast sites matter a lot.
You're not getting that with django, or ror, or ... And no, caching will not get you there, especially not if you take every framework's approach to caching: always set cache headers to never ever cache because it might be dynamic.
The thing is, if you expect to really grow, you're going to have to roll your own, as that's the only way you'll ever get it really fast, really doing what you want.
Take two nominally identical physical products (from IKEA, for example), one that’s on the shelves today and the other produced several years ago. There’ll be lots of subtle changes that got made as the manufacturer figured out how to improve the product, by cutting costs or improving reliability.
The frameworks you mention are great prototyping tools but as a community, we’re missing the knowledge of how to take a proven prototype and continue improving the quality rather than bolting on questionable new features.
> as a community, we’re missing the knowledge of how to take a proven prototype and continue improving the quality rather than bolting on questionable new features.
No, we aren't. We know how to do that, and can do it when we want to. In software dev we rarely want to, because (loaded use of the qualifier “questionable” aside) product teams tend to perceive (not entirely inaccurately, though sometimes the particulars are in error) market demand for additional features. (Also, what physical products are very often optimizing is unit production cost for equivalent products, not quality. But software already has an essentially zero unit cost, so there are essentially no gains to be had optimizing that.)
Oh yeah, I just realized I missed a whole category: personal development. I would definitely not be here (probably not in software at all) if I hadn't reinvented the wheel again and again for fun and learning.
> Will the act of rolling your own library make your team more valuable? Will it make them too expensive?
Are you saying that developers should be kept ignorant and incompetent so that they'll have no choice but to stay at your company writing poor-quality software?
I’m saying that pushing your developers to improve will increase their market value, possibly outside your personnel budget. One possible outcome here is that they decide to leave and join a company that will give them a higher salary to reflect their increased skills.
On the other hand, withholding opportunities for growth is also a way to frustrate employees and cause them to leave. Know your people, and take their changing needs into consideration when planning business strategy. There’s lots of strategies that can work here, but not all of them are right for every company. A few off the top of my head:
- Build a training pipeline so you can regularly hire promising young developers for cheap and help them grow into a more senior position elsewhere.
- As your developers improve, be prepared to give raises and promotions to reflect their increasing value.
- Hire experienced developers with the skills you need at a fair price and offer stable employment instead of fast-track growth.
The most important aspect: is the functionality provided by the library core to your business need, or just an incidental need?
E.g., you need a web framework to program your e-commerce site. The web framework is not a core business need (you are just selling shit online), but incidental. Therefore, don't roll your own.
You need a web framework to implement a SaaS app, and this SaaS app is your main business (think Canva, or Lucidchart etc). Therefore, you should roll your own to suit your SaaS app, and make it fit intimately with your business needs.
OP doesn't mention a company at all. If they are doing a fun side project at uni, or a research project, it might be that business needs are not the most important aspect!
> If they are doing a fun side project
In that case, their business need is to have fun! And I would say writing it all from scratch is most fun of all.
Isn't it always better to open-source something that you need a solution to? (as someone else will likely need it, too)
If it's something that others will likely need and will likely pay for, it may make more sense to build it and sell to them. By open sourcing, you eliminate one major potential revenue stream for unknown benefit.
That someone may be a direct competitor to you. Giving them access to your R&D may let them undercut your prices and make it harder for you to find clients.
It depends on how critical the X is to your risk.
For example, I led a project where we wrote a driver instead of using an open source one. Writing our own managed risk, because the open source ones were unstable or had data integrity issues. Overall, the "cost" of "writing a driver" was about 5-10% of the total effort, once we consider integration, QE, installers, signing...
The thing with 3rd party frameworks is that they aren't built for your requirements. So, if you don't "roll your own X," you need to make sure that the X you choose will meet your business critical requirements early in a proof of concept.
Also, if you are going to "roll your own X," it's worth it to do a few POCs with other Xs. This way you can learn how to make a better design, and what to encapsulate in your X.
(In my case, the 3rd party drivers encapsulated poorly. Most of my effort spent writing our driver was things I'd need to figure out anyway, because the other drivers didn't encapsulate basic details that I didn't care about.)
Other times, it makes sense to "roll your own X" to keep things simple. ORMs are a great example. Their learning curve can be steeper than just writing simple queries and boring code. If you only have a few queries, why bother adding something that's just going to get in the way?
Which gets to: Do not use frameworks in place of design patterns.
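As an illustration of the "simple queries and boring code" option from the ORM point above, a sketch with Python's built-in sqlite3: one parameterized query mapped by hand onto a dataclass. The orders table, its columns, and the function names are invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:  # hypothetical row type
    id: int
    customer: str
    total_cents: int


def orders_over(db, min_cents):
    """One plain parameterized query, mapped onto a dataclass by hand."""
    rows = db.execute(
        "SELECT id, customer, total_cents FROM orders "
        "WHERE total_cents >= ? ORDER BY id",
        (min_cents,),
    )
    return [Order(*row) for row in rows]


def demo_db():
    """Build a throwaway in-memory database with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
    )
    db.executemany(
        "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
        [("ada", 1200), ("bob", 300), ("cy", 5000)],
    )
    return db
```

With only a handful of queries like this, the mapping code stays small and obvious. Once there are dozens of tables and relationships the tradeoff may look different.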
"When it adds value", or so it is said. Define when value is added and you get the edge you need to cross when it becomes viable.
Say you are working on a FOSS project to replace something and one of the dependencies doesn't work out. If you have the project capacity, it might add more value to roll a replacement.
Example: say you are writing a metrics collector, and you need something that can run inside a container because you want to deploy it as a sidecar. A dependency you use needs root network access for some reason, and you can't really bypass it using CAP_'s because it's hardcoded to look for UID 0. You can patch it if the code allows for it, but if it doesn't, you still need a fix so you might as well roll your own.
On the other hand, say you are in a commercial setting, then value is probably not added by rolling your own until it's either freeing up resources or addressing a business need.
Example: you are running a bunch of APIs behind a gateway, and the gateway exposes those APIs on a subdomain based on their name. But you kinda want them on the path of a single domain instead. You could do an ugly patch with a reverse proxy in front of it and rewrite all requests, but at that point you might as well implement a replacement gateway that does the same thing. While it might be a slightly bigger effort, it both frees up resources and gets a business requirement resolved.
There really is no one-size-fits-all rule; even if you take licenses, quality, maintenance, sourcing, cost etc. into account, it still won't cover all cases. Say you are a small company but your software is used on billions of devices. Rewriting part of a kernel would shave a few CPU cycles off a task, which multiplied by billions of devices saves a lot of computational power, but you, as a small company, don't have the resources to do so. Even if the license, quality and maintenance are right, you still wouldn't rewrite something like that.
Every time you roll your own, you need to attach a future cost: bug fixes, documentation, testing, training, and so on. Your future costs can quickly spiral out of control when you implement your own solutions. Furthermore, because these are so expensive to maintain, yours will be a bare-bones version and start to show its age rather quickly. The cost of switching away from something increases over time, so inertia will incentivize staying with your minimal, custom solution while other, better solutions will become more full-featured and robust.
Unless you pay the cost to keep up with the industry (or switch away from your custom solution once others meet your needs), you will fall behind. But keep in mind that others are most likely not paying this cost, so they're investing in features, continuing to outpace you in areas that matter.
In short, only implement something yourself when absolutely necessary—for example, when it's a core competency, or when no current solution exists. Even a de facto solution that's suboptimal tends to be better than your own, because it will be documented, new contributors won't have to train on it, and it may improve over time. If you have to switch away from it at some point, there may be tools to automate or simplify that transition.
The smartest decision for things outside your core competency is "do what everyone else is doing." Focus on innovating on the things that matter instead.
One of the key considerations I look at is the license: does the 3rd party library have a license I can work with, one that lets me modify and use the code? You have to consider distribution in that conversation too. A lot of times this one fact takes away all doubt on the right way forward. People use Apache licenses on a lot of projects where it is questionable if not down right improper, and I am kinda anal about that, because for commercial products you don't want someone coming back and saying you violated the terms/spirit of the license.
Outside of licensing, it is usually based on the needs of the project. If I can find a library that does 80% of what I need, I can add the remaining 20%, and I am happy to live with that code, then I'll definitely use a 3rd party. In general I favor those libraries, but so many times the licensing raises enough of a flag that it isn't worth the risk, and adding a few extra days or even weeks to a project is well worth the time.
One other point: I also am really big on logging/metric collection. If I need to go in and instrument an entire library, then it adds another level of work, so I start thinking about just creating our own. But not all libraries need detailed logging and metrics (although IMO most do).
The Apache 2.0 license very clearly allows commercial use. The major differences between Apache 2.0 and, say, MIT are that Apache 2.0 includes an explicit patent grant and does not grant permission to use the trademarks of the entity that created the original software that you are modifying.
"This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file." from https://www.apache.org/licenses/LICENSE-2.0
I'm not a lawyer and not your lawyer.
I typed Apache and was thinking GPL, totally my fault.
Over the years I have sought counsel and been advised by multiple different attorneys to avoid certain licenses, like GPL and others, for a closed source commercial product. Apache in fairness is not one of them, my bad. MIT and Apache are the two we have used and included in projects in recent memory.
In the end I am not a lawyer either and my core point still is the same, a major question for whether to use a 3rd party library still is around what license the library is released under. And it is usually best to get advice from an attorney for any of the more complex license models.
GPL allows for commercial use, but you must also provide source code with your product. As such, it can often be difficult to monetize.
What is a situation where licensing your software as GPL is "questionable if not downright improper"?
The entire point of the GPL is to spread the adoption of free as in freedom software. If you aren't willing to share your project with the community you are not allowed to use the community's work.
I agree overall: nothing is wrong or improper with GPL if the product you are creating also complies with the same license terms. The context in which I said questionable and improper was commercial software. When I think of commercial software, I think of a non-open-source product being sold, licensed, etc. Not that open source can't be commercialized, but notice that almost all companies run a dual license for their commercial vs. open source software, e.g. Qt, Mongo and others. They do this because it is the only way to monetize the software without every competitor getting the same proprietary add-ons that they created and that companies are willing to pay for.
An example I dealt with a few years ago: I was working on an embedded project and the associated desktop software, in which the original developer had utilized some GPL-licensed open source libraries compiled into the binaries (both embedded and desktop). When I did a license review and discussed it with the attorney, it was deemed that GPL-licensed components could not be included, due to the licensing terms, the way the product was distributed, and its closed nature.
In general I follow the rule, and have been told multiple times, that GPL (any version), with rare exception, is not to be used in commercial software (including APIs and server-based products) unless you are strictly dynamically linking to the library. But if you are compiling it into your commercial product, and especially if you are making alterations to it and compiling it in, you are in violation of the license terms (and definitely the spirit of the license). Of course, if your product is open source, then that is a totally different story.
Even more open licenses like Apache and MIT still have copyright notices you need to comply with at minimum, so all of that has to be taken into consideration: the notices need to be added to product manuals, about pages, etc., otherwise you are in technical violation of the terms. Hence, the license terms of any library should be a key consideration in whether you include it in your project.
So someone you work with broke (or attempted to break? It's unclear whether the product you're referring to was released) the law by using GPL code in a closed source commercial product? This is downright improper (and illegal) on the part of the developer who used the GPL software, not the creator who chose the GPL, likely specifically to prevent what you described.
People tend to understand open source software as software that is free to use however you want. This is not what GPL software is, though. It's intended to be an ecosystem of software that is open and free to use as long as you are also willing to share the software you create using the ecosystem.
My default these days is to roll my own (at least within some predefined context); this is because I view each new dependency as a significant cost, so I'll invert my answer: for me to include an external dependency, it must provide highly relevant, high value to the project to outweigh its inherent costs. Even if that is satisfied, I feel I should also trust the authors.
This is interesting to me because I often see rolling my own X as having a higher cost than bringing in a dependency. Maybe this is due to a lack of experience on my part, so I’m curious what you all consider when evaluating the cost of a dependency?
My opinion: rolling your own does have a higher initial cost, but it can have a much lower long term cost. Consider the case where you need a very small modification to the dependency. In many cases that will require a much greater investment because you will now have to a) understand the dependency, b) make and test the change and c) support the change long term (unless it is folded into the mainline). Also consider the case where some dependency or specification changes (way too frequently) or there is a bug that you desperately need fixed. What do you do then? Technical debt. For me then, understanding the probability and consequences of events like that is important.
One thing I’ve gotten better at over the years is identifying the viability of off-the-shelf tools early on. If you minimize subpar tooling at the source, you can mitigate long-term frustration down the line.
Yes, this is my point: the upfront cost of a dependency is obviously zero (well, maybe not zero if you include the cost of learning someone else's interface and integrating it into your code); it's always later that you pay in unforeseen ways.
> this is because I view each new dependency as a significant cost
I view lines of code that other people have to maintain as less costly than lines of code I have to maintain.
> I view lines of code that other people have to maintain as less costly than lines of code I have to maintain.
Initially... but they can break your lines of code in unforeseen ways in the future, and when that happens it will be you maintaining those external lines of code, because no one else cares.
To make things worse, external dependencies are often highly generalised and complex because they are trying to cater to more than your project. As a result you may incur the cost of those lines of code even if your project doesn't need them, as you have to wade through them and accommodate them when things go wrong with the parts you care about.
I'm talking worst case, but the point is, you want to only use dependencies you trust, which usually means minimizing your total number of dependencies also.
Code reuse is initially harder, as you have to figure out the existing system, how to integrate with it, how to frame your problem in its terms, etc. This is usually slower than just writing the thing. It's over time, when other people/teams are contributing to and fixing bugs in the shared component, where you and the company are saving work vs. each team with a similar-shaped problem learning the same lessons independently.
My brain says never: for everything there is some turn-key thing. Even your entire project might be Frankensteined from some third-party products and Zapier into something working. Same with libs: better a crappy lib than working for days on something you might not need.
My gut says always: most of the time turn-key stuff has weaknesses, and good libs with perfect APIs, good maintenance and a great community are rare, though of course they do exist.
Hard question, since what the gut says is more fun while the alternative is just about gluing libs together. But using libs, and building your own once they no longer fulfill your needs, is the right but more boring way to go.
Your brain is falling into the classic JS/NPM trap of thinking that dependencies are zero cost. The time it takes to use something is not 0, there's still a comparison to make. There's tons of stuff out there with APIs, documentation and community resources that are way more complex to navigate than whipping out a text editor and building something that handles your own use case.
You are nitpicking and TBH I don't get your message. My post was general advice and reflects the tendency of a creative mind; you still need to look into each case and decide whether to make or buy. And your NPM trap analogy is just blatantly wrong and doesn't help OP. You could have mentioned any package manager btw.
I'm not nitpicking, I just picked out the first part of your post because it's the only bit I disagree with.
I'm saying it's dangerous to have this line of thought that the "correct" way to build a system is with a mish mash of third party libraries and that doing any non-glue coding yourself is only for fun. You're insinuating that the decision OP is making is between efficiency and fun, whereas in reality it's an optimisation problem for efficiency that a lot of people mess up because they don't understand one side of the equation. (and if OP wants to have fun and roll his own that's great but it's another case entirely, I'm assuming he's not asking HN for permission to do that).
I use NPM as an example because for JS devs in the places I visit it's becoming a cultural thing, which IME is not the case in other communities. I don't see as many Python devs entertain the thought of maintaining something for years with "crappy libraries" in it for the sake of saving "days" of work. But in JS land that attitude is all over the place and the phrasing of the first bit of your post reminded me of it, so...
If the library is crucial to what you are making, it is worthwhile to consider rolling your own. Not because it will be better, but because in the process you will understand the problem in a way that you otherwise cannot.
Libraries also tend to be general purpose so up to half of the code in a library may not be pertinent to your usage.
I wrote an app that used a lot of OAuth and ended up writing my own instead of using the excellent Passport library. It would have been easier to use that, but I gained an understanding of OAuth that I would not otherwise have, and my code is small, understandable and easy for me to maintain and modify. Just my 3 cents worth.
For personal projects, I will happily build everything from scratch myself. It’s great fun and I learn so much, so why wouldn’t I? (Although it’s fricken awesome when people show up to help out.)
For work projects, I like to look for the best available FOSS option and then try to help out on that project so I can feel a bit invested in it. This helps calm down my hacker urges to build my own thing from scratch :)
It always makes sense to roll your own X when you're a leader in the field and X doesn't exist yet, until you (or somebody else) make it and release it.
I've had a lot of situations like that and been accused of NIH syndrome, but what is the alternative…not doing something at all because nobody else has built or released something that can be used? Every library or tool in existence started out because somebody wasn't content with the status quo enough to bother inventing a new thing. This is the only way the state of the art can progress.
This is not a popular opinion, but I think you should always rewrite the wheel. The reason is simple, the product you are building cannot be more interesting than the technology that lies under it. If you rewrite that, you will have to learn how things really work. And only when you do that, can you improve the fundamentals.
To me the flaw in this approach is that we have finite time and attention. I'm all for rewriting something when that something has the potential to radically improve what I'm building. But if we are busy rewriting everything else, we may never get to the one thing that delivers the most value.
As an example, if I'm building some new piece of SaaS that has an API, I could write my own JSON parser, my own HTTP server, my own network stack, my own OS. Heck, I could eventually learn how to make my own servers, my own processors. But I'd be comfortably dead before I learned how to do all those things as well as off-the-shelf components deliver for a SaaS API.
Instead I'd rather spend my time on activities that are more valuable to my users. If one of those things turns out to be writing a custom JSON processor, sure, I'll do it. But if the only purpose in me doing that is to learn something, then I shouldn't be jamming my educational experiments into a production system that doesn't need them. I've had to clean up too much of other people's flavor-of-the-month experiments to want to inflict that on whoever maintains my code next.
Writing something new comes with tremendous future cost. 99% of the functionality will be pretty quick to implement. But 3 years later, you'll still be fixing edge cases and you'll have to keep the documentation up to date if you ever want to hire other developers to work on it.
Writing something from scratch just because you want to understand it is fine. But you should think long and hard before actually using the new thing you just wrote in production code.
A few scenarios I can think of are:
1) If it doesn't work or works poorly.
2) If it is trivial to implement.
3) Licensing problems.
4) If the implementation is overly complicated.
5) The (framework) vendor has a habit of updating and breaking your code. This happens a lot with the Angular team. There is nothing like the annoyance of an Angular update that breaks some feature you are using, so that you have to update code that was working absolutely fine. Since then I have avoided Angular entirely.
Things I am less strict on:
1) The project is EOL. It may work perfectly fine. This happens to open-source software as well. Typically this means there is a better alternative out there.
2) Awkward syntax / api. You can normally just wrap this.
3) Maintainers are difficult to work with or don't fix bugs.
Depends on the library. If it's a good one, makes sense to adopt, I sometimes do: https://github.com/Const-me/nanovg
Some libraries have hundreds of megabytes of their own dependencies hard to remove.
Some libraries have very low code quality, makes it expensive to modify.
Some libraries are very hard to build, e.g. I've never managed to build Skia. When you can't build it you can't adopt it.
You should also pay attention to the license: there are GPL-licensed libraries out there, and if you try to use or adapt them you'll have to open source your complete software under the same license.
Makes sense to do it when you can hack together a quick prototype that will teach you more about the problem and will be fun.
We needed a sensor data capture platform at work. There were a lot of odd requirements, like consuming data from embedded hardware, severe bandwidth restrictions, and high-precision (GPS) timekeeping. We hacked something together in python + flask in 2 weeks, and it was easy to experiment on the analysis side by just writing another flask route and generating some html with plot.ly and leaflet.
We're also evaluating thingsboard and the elastic stack, but setting those up, experimenting, and extending them was slower, less fun, and harder to debug. At some point we'll do a shootout between all the platforms and pick the winner if this is something that gains significant users, in which case the community and active support of the platforms will be a big advantage.
Why hack something instead of using gprmc which is designed to handle the backend for exactly what you are looking for?
If the subset of features that I need can be written in the amount of time it would take me to read through and internalize the library, I'll just write it from scratch. Then if there's a problem with it, my team and I will already have a good understanding of the code, instead of discovering that the library is in fact crap (which libraries often turn out to be).
Then if it's something that doesn't really matter and the library code is pedestrian (a logging library), I'll use the library.
If it's something that matters a lot and it's expert knowledge and is a large body of work (a complex math/compression/crypto library), I'll use the library.
You might want to roll your own prototype to discover what you actually need. Then when you know that, it's much easier to pick a library that satisfies your actual needs, not the needs you thought you'd have.
Whenever X is closed/proprietary and doesn't have an open alternative. Never rely on something you do not ( or at least cannot if you wanted ) control. Never. You WILL regret it one day.
I have a closed source proprietary database that I wish I did not have to manage. I pushed for the open source option, but the CTO didn't trust that there would not be a long development tail (there probably would have been), since there were few good options at the time, so he bought it, and I have regretted not fighting harder against it nearly every day since it went live.
The core product works fine and needs little maintenance, which is nice, but its cost scales badly and now takes every bit of my budget, and the fancy features never work or cannot actually be used in production. If I had that budget instead, we would have something far more suited to the company, but it seems few managers will take that risk unless it's the core function of the company.
Also the mostly relevant xkcd: https://m.xkcd.com/1205/
Google for how to do X in language Y. If 99% of the top answers on stackoverflow say use library Z then just do it. Usually there will be 3 different Z in which case you should take some time to decide on one of them.
If there is no Z, don't try to find the needle in the npm haystack of poorly maintained libraries with tons of needless sub-dependencies. Unless the problem domain is very beyond your capabilities, just roll your own.
Programming your own GUI in an "engineless" plain-OpenGL/DirectX graphical project or game. I rolled my own GUI for my C++ game because all existing examples were either very badly designed (think singletons everywhere), obscenely expensive, or just outright abandoned. Writing something that uses a hierarchical box model with style attributes isn't too bad.
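To give a feel for how small the core of a hierarchical box model can be, here is a minimal sketch in Python (not the commenter's C++ code). The Box class, its single padding style attribute, and the vertical-stacking rule are all assumptions made for illustration; a real GUI would add more style attributes, horizontal layout, and input handling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """A node in a hypothetical box tree; names are illustrative only."""
    name: str
    width: int = 0
    height: int = 0
    padding: int = 0  # style attribute: inner spacing on all four sides
    children: List["Box"] = field(default_factory=list)
    x: int = 0  # position, filled in by layout()
    y: int = 0


def layout(box: Box, x: int = 0, y: int = 0) -> None:
    """Place box at (x, y) and stack its children vertically inside the padding.

    Leaf boxes keep their own width and height; a box with children is
    resized to fit them.
    """
    box.x, box.y = x, y
    if not box.children:
        return
    cursor_y = y + box.padding
    widest = 0
    for child in box.children:
        layout(child, x + box.padding, cursor_y)
        cursor_y += child.height
        widest = max(widest, child.width)
    box.width = widest + 2 * box.padding
    box.height = (cursor_y - y) + box.padding


panel = Box("panel", padding=4,
            children=[Box("title", 100, 20), Box("button", 60, 30)])
layout(panel, 10, 10)
```

After layout, each box carries absolute coordinates that a renderer could turn into quads; that renderer is where most of the remaining work would be.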
Reasons for valid NIH:
- you are on your own and don't care about the future maintenance cost of your code
- you want to learn
- the available solutions are objectively crap
- job security
- you're a control freak
- you think all code you write is immediately better than other people's battle tested solutions, aka inflated ego.
- you want to be seen as the maintainer of some prestigious project rather than as the contributor to someone else's prestigious project.
I wrote an article that covers this from a DevOps perspective, among other things. Perhaps you'll get some value from it: https://calebfornari.com/2019/07/11/devops-decision-making/
Why would you make your own leftpad when someone already has spent the time and the energy perfecting and publishing leftpad?
Another good question to always ask is "If I'm getting X from Y, what do I do if Y stops providing X the way I need it?"
An example might be analytics. It's good to know that you can drop in a third-party library and just go, but you need to make sure to look at how you can move your data off of that platform and use it yourself. You also need to consider how much of a time investment switching off of Y would be. How much effort will rolling your own be? How much effort will rolling your own take after using Y? How hard will it be to swap Y out for Z?
Another classic case here is web hosting. I use GCP. I've used AWS. I really don't want to, but if Google or Amazon kicked me off/out/raised the prices too much, I'll host from my own hardware, and go to Fry's if I need to get more hardware than I have. I really don't want to, but I will.
Can we go over bad reasons to roll your own x?
1. You don’t want to take the time to learn an existing product
2. You’re not going to properly code and test your own solution
3. You think your product is too unique to use an existing solution
4. You’re going to leave features out that you would have gotten for free
> You don’t want to take the time to learn an existing product
Existing libraries can be very complex. Spending time on them is not guaranteed to pay off. When a library is marketed as a framework, it’s a red flag: such libraries often come with a strong opinion about how the software should be designed.
> You’re not going to properly code and test your own solution
This article is old, but still good: https://www.joelonsoftware.com/2002/05/06/five-worlds/ Different software needs different tradeoffs for quality, budget, and the rest of them. If you’re working on a tool you’ll run only once, grab the output data, and forget about the tool, you need neither proper coding nor much testing.
> You’re going to leave features out that you would have gotten for free
Are you going to need the features you left out? With many real-life libraries, like https://www.boost.org/, you’re only going to use a small subset. The features you’re not using aren’t exactly free: inflated binary size, slow compilation, tons of dependencies, complicated build setup, and complicated deployment are quite common issues.
Some context might help my comment.
A programmer who used to work here, wrote his own PHP micro-framework, instead of using one of the popular, standard ones.
Instead of just using one that already existed, and taking the time to learn it, he used his own. He may have done it partially for the experience or his own desires (he was probably actually bored because we have a bad manager, but that's another discussion)
We regularly find and fix bugs with this PHP framework. Had he chosen a mature, community-supported product, this wouldn't happen.
There are incomplete features you would have gotten for free with other frameworks. It has maybe, 20% of the features other frameworks have, even though it has much of what we 'need'. We had to write our own password-reset email code, for example.
Other reasons are possible, too.
1. When they started, pre-existing frameworks weren’t as popular or standard.
2. When they started, the functional requirements they had were very different from what the product eventually grew into. Many software projects start small and grow.
3. A couple of times in my career I worked on projects where management didn’t believe in functional specs, and the requirements changed dramatically in the middle of development. With large libraries, some changes in architecture are handled by replacing the library with another one, which is quite expensive to do. An in-house library can be more flexible, just because it’s less code.
I’m not saying making in-house libraries was justified in your case. Maybe it wasn’t and that developer did it mostly for fun. Just the trade off is quite complicated, and your reasons are not necessarily that bad.
In many cases, I think it’s technical management’s job to make the decision, i.e. evaluate alternatives, maybe allocate resources to make a prototype.
Also, from the management standpoint, if a developer can write their own frameworks, there’s very high probability they can use existing ones. Maybe it makes them slightly less happy i.e. need to pay them more to compensate, but they’ll do the job. The opposite is not true, there’re people skilled at combining libraries and copy-pasting third-party code who can’t write their own code.
It really depends on the task at hand and what the alternatives are.
For example I would much rather write 200 lines of my own code to deal with user authentication (leaning on my framework or a lower level library to deal with the gory bits of setting and deleting sessions), vs. pulling in some higher level authentication library that has 3,000 lines of code but tries to be so generic that it's a jumbled mess which is difficult to customize.
Basically the more important the feature is for my application, the more inclined I am to write my own code. Although I may at times draw inspiration from other libraries that don't quite solve my problem but helps me get closer to what I want to do.
Depends on the license. If it's permissive I'll pop open the code to see if it's any good. Maybe 50/50 it's good enough.
From scratch only if I'm doing something truly unique. New algorithm/protocol. Unpopular language I'm forced to use is missing something I need. Existing solutions too slow or buggy. Maybe a couple times a year I get to make wheels.
NIH is a huge problem at most (all?) software companies. At my current place I spend about 1/2 of my time dicking around with crap internal tools and libraries.
It's rare I write anything that's not glue code. Makes everything boring but I don't get paged often and we have relatively few bugs.
Almost never. If you have (for instance) scale that the canonical solution doesn't handle, consider it. If you'll be completely shafted if the canonical solution disappears, consider it.
Please don't make another build system. Please.
Polyfilling is also some kind of "roll-your-own X".
In web programming, it might sometimes be useful to write your own polyfill when you need to add a single feature for backward-compatibility reasons. A third-party polyfill library that provides the missing feature perfectly might be available, but it may be too big (with respect to file size) or add too much complexity to the build process (by pulling in lots of other dependencies).
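The pattern behind a hand-written polyfill is feature detection plus a small fallback. The comment is about browsers, but the same shape can be sketched in Python (used here only for illustration): use the built-in when the runtime has it, otherwise define the one function you need. The choice of math.prod, which was added in Python 3.8, is just an example.

```python
import math
import operator
from functools import reduce

if hasattr(math, "prod"):
    # Modern runtime: use the native implementation.
    prod = math.prod
else:
    # Older runtime: a few lines of fallback instead of a whole backport package.
    def prod(iterable, start=1):
        """Multiply all items together, beginning with start."""
        return reduce(operator.mul, iterable, start)
```

Callers import prod from this one module and never need to know which branch was taken.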
> Suppose you find a library that kinda does what you need, but not really. Do you adapt it to your needs or do you build a version from scratch that really suits your needs, but might take a bit of time to develop?
Which do you think will take more time to develop and maintain: the wrapper layer that adapts the existing library to your needs, or the ground-up homegrown implementation? Your estimate of the answer here (along with the risk calculus of the external dependency and the reliability of its maintainer) is the key factor in answering the question.
There are numerous articles out there that will tell you not to roll your own rule engine, and indeed there are lots of rule engines out there. But unless you are solving a well-known problem, I've always found the open source rule engines to be either opinionated or just too large/unwieldy. It's not THAT hard to make your own. You only ever get a handful of chances in your career to implement a rule engine anyway, since they are usually the core of the software that a company sells. Make it right!
Tend to agree. When I've seen Drools integrated in Java apps, it's usually a bloated mess and a simple DSL would have done much better.
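As a rough illustration of how little code a basic rule engine needs, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the Rule shape, the single-pass evaluation order, and the pricing rules. It is not how Drools or any other existing engine works, and it leaves out rule priorities, chaining and conflict resolution.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named condition/action pair over a dictionary of facts."""
    name: str
    condition: Callable[[Facts], bool]
    action: Callable[[Facts], None]


def run_rules(rules: List[Rule], facts: Facts) -> List[str]:
    """Evaluate each rule once, in order; return the names of rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(facts):
            rule.action(facts)
            fired.append(rule.name)
    return fired


# Hypothetical pricing rules, purely for illustration.
RULES = [
    Rule("bulk-discount",
         lambda f: f["quantity"] >= 10,
         lambda f: f.update(discount=0.10)),
    Rule("free-shipping",
         lambda f: f["total"] >= 50,
         lambda f: f.update(shipping=0)),
]

order = {"quantity": 12, "total": 30, "shipping": 5}
fired = run_rules(RULES, order)
```

The hard parts of a production rule engine tend to be the ones omitted here, which is where the "is it worth rolling my own" estimate has to be made.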
> adapt it to your needs
If it's a feature that can be cleanly added into the architecture - i.e. it's just filling in a gap.
But if you have to modify the whole thing in a way that doesn't quite fit... it's a nightmare. Even with your own code, it's simpler to start from scratch with a clean slate.
Also Alan Kay's extreme position: "those who can create their own [libraries], should"
In the "kind of does what you need" case, I'd write my own but maybe borrow ideas from the original.
The moment you fork something, you've created work for yourself when you want to upgrade it.
That may be fine for a prototype. But I've seen it done on software that lasts 5-10 years and those are always the things that turn into maintenance headaches and blockers to unrelated upgrades.
If you want to own governance and direction of an X.
As always, the answer is highly dependent on a number of factors:
- How complex is X?
- What expertise do you have available for X?
- How critical is X?
But in general, if X is not part of your core business, only as a last resort.
99% of the time, even if there is only a suboptimal library, as long as it's actively maintained, you're better off extending its functionality than re-implementing it.
When the total lifecycle cost to use existing solutions is higher than the cost to build and maintain, when the future direction of development is unlikely to match your requirements, when the existing solutions lack critical functionality and modification or extension are non-viable, or when you want to learn more about the implementation.
I frequently run across libs that are more complicated to configure and call than implementing the features myself. I'll often roll my own simply because the lib (even if it's really popular) takes more time to use than that "bit of time to develop".
I come from an era where free and open source software was quite rare in commercial ventures. I also did a fair amount of embedded work in my early career. As a result, my normal first reaction is build over "buy".
There are lots of very good reasons to prefer building something. First, the design of the code will fit your use case more or less perfectly. Unless you get it wrong, in which case you can change it relatively easily. If you use an off the shelf library/framework, you may need to jump through hoops to use it. Those hoops may result in other hoops, which result in other hoops, etc, etc. In the end, you may introduce nearly as much code complexity adapting your code to someone else's design as you save by using their library/framework. (NB: You could just use "convention over configuration" a la Rails, which is another way of saying, "Do everything my way and you'll never have any conflicts" ;-) ).
Second you have control over the code. If you use a library/framework, it may change over time in ways that don't fit your project. You are stuck making the choice of maintaining all of that change or eventually suffering from bitrot. A good example is if your library has another dependency. It replaces that dependency with a new one which isn't supported on your system.
Related to that, you may decide that you need new functionality that the library/framework does not give you. If you were working with your own code, it's easy to add that functionality. If you don't, you need to either maintain a series of patches, or try to get your changes merged upstream. Some projects are easier than others to work with.
When I've decided that I'd rather not roll my own, if possible, I look at the alternatives. The first thing I do is read the code. Can I understand it? Will I be able to fix bugs if I need to? How much of the code is related to what I'm doing and how much is unrelated? How will the structure of the code affect what I'm doing? Are there any controversial dependencies?
Then I look at the community. I look at open issues. I look at how discussions progress. I look at closed issues. Do people get yelled at for asking questions? Are suggestions for improvement valued? I look at open and closed PRs. How easy is it for an outsider to make contributions?
After that, I make an estimate in my head of the maintenance cost of code I wrote myself vs the maintenance cost of using one of the alternatives that I've researched. Usually if we are talking about 1 week of work or less, rolling your own wins out (there are exceptions, though). If I only need a handful of lines of code, I'll frequently make my own derived library with those lines of code (for example, I do a fair amount of Ruby code and there is useful code in Rails... But I'm not going to grab ActiveSupport just because I want stringify_keys or something similar).
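To show the scale of "a handful of lines", here is a Python analogue of the stringify_keys idea. It is a sketch for illustration, not the Rails implementation: it assumes a shallow conversion of top-level keys only, and Python's str() does not behave like Ruby's to_s for every type.

```python
from typing import Any, Dict, Mapping


def stringify_keys(mapping: Mapping[Any, Any]) -> Dict[str, Any]:
    """Return a new dict with every top-level key converted to a string.

    Nested dictionaries are left untouched (an assumption of this sketch).
    """
    return {str(key): value for key, value in mapping.items()}


converted = stringify_keys({1: "one", "two": 2, 3.5: {"nested": 4}})
```

Owning three lines like these costs far less than tracking a large dependency for the sake of one helper.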
Finally, if a library is important to my code, I'll usually make an adaptor for it. Instead of calling the functions of the library directly, I'll make another library that wraps it. That way if I run into problems with the dependency I can swap it out without much difficulty. Of course, for frameworks that doesn't make sense because part of what you are buying with the framework is the design.
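A minimal sketch of that adaptor approach in Python, under stated assumptions: the Serializer interface and the settings functions are hypothetical names, and the standard-library json module stands in for the third-party dependency being wrapped.

```python
import json
from typing import Any, Protocol


class Serializer(Protocol):
    """The only interface application code is allowed to see (hypothetical)."""

    def dumps(self, value: Any) -> str: ...

    def loads(self, text: str) -> Any: ...


class JsonSerializer:
    """Adaptor around the json module, standing in for a real dependency."""

    def dumps(self, value: Any) -> str:
        return json.dumps(value, sort_keys=True)

    def loads(self, text: str) -> Any:
        return json.loads(text)


def save_settings(settings: dict, serializer: Serializer) -> str:
    # Application code talks to the adaptor, never to json directly,
    # so replacing the dependency means writing one new adaptor class.
    return serializer.dumps(settings)


def load_settings(text: str, serializer: Serializer) -> dict:
    return serializer.loads(text)


text = save_settings({"theme": "dark", "font_size": 12}, JsonSerializer())
```

The adaptor is worth its cost only if the interface stays narrow; if application code needs most of the wrapped library's surface, the wrapper ends up mirroring it.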
I think part of the jumping through hoops consideration is whether the project developers as a group prefer jumping through their own hoops or someone else's.
On some teams it is easier to say "the wise developers of leftPad think these are the hoops we need to jump through" than it is to present your own design that has strengths and weaknesses. On others they will walk miles through flaming coals just to avoid the downsides of someone else's designs.
I honestly have no idea which group is right but devs seem to fall into one or the other. I myself started in the first camp but am wandering toward the second. Sounds like you went the other direction.
If you read through the code and determine the quality is beyond repair, or the project model itself does not lend itself to repair, it’s a fine time to chart your own course.
If you decide to open source it then kudos to you. If you don’t, I won’t think less of you.
If X=crypto, then never.
You’re likely being downvoted due to the pithiness of your comment, but you make an excellent point. I’ve always promoted the three Iron Laws of never implementing
1) Encryption
2) Date/time math
3) Distributed locking/concurrency primitives
All of these are of sufficient complexity to virtually guarantee that you’re going to miss a score of subtle edge cases, and the consequences thereof are likely simultaneously pervasive, silent and catastrophic.
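One small illustration of the date/time point, as a sketch using only Ruby's standard date library: treating "one month" as 30 days silently disagrees with calendar-aware arithmetic at month ends. The variable names are just for illustration.

```ruby
require "date"

start = Date.new(2023, 1, 31)

# Hand-rolled shortcut: treat "one month later" as 30 days later.
naive = start + 30

# Date#>> shifts by calendar months and clamps to the last valid day.
calendar = start >> 1

puts naive     # 2023-03-02
puts calendar  # 2023-02-28
```

Neither result raises an error, which is what makes this kind of bug silent.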
Some of the downvotes may be from crypto-programmers' frustrations, too.
I mean, obviously, someone has to code crypto, or else it wouldn't exist. But, whenever you try talking about the topic online -- whether you're doing it professionally or for self-education -- a probably well-meaning chorus of spam comments warns against doing anything.
I suspect that some crypto-folks have given up rebutting the don't-roll-your-own stuff and just down-vote reflexively.
My frustration with "don't write your own" is that you get strongly urged either to use libraries that are nothing but a Tinkertoy set of primitives, or to sudo apt-get install openssl.
Which open source distributed locking libraries would you recommend?
Whoever downvoted this deserves to have their license to code revoked. It's malpractice to roll your own crypto.
He got down-voted for posting a meme in a mostly unrelated topic, not for the validity of the advice.
... and how do you know the person rolling their own crypto doesn't have the skills? Aren't the crypto libraries out there written by humans? If you're rolling your own crypto because you believe in security by obscurity, then you're probably not smart enough to roll your own. But if you're willing to open it up to the world because you have a new trick up your sleeve, then sure, if you have the background.
I still think that ties into the original question, when does it make sense to roll your own? If it's not part of your core business and doesn't give you an edge, then it never makes sense, unless you're doing it for the fun of it.
If you have the skill, we already know who you are.
You have millions in the project and a dozen PhDs working with you, and you've already authored public papers demonstrating the weaknesses of things previously believed to be strong.
If you are not at least one of those things, it's malpractice for you to even consider letting your employer pay for your crypto "inventions". That's like a nurse building her own OR and insisting the hospital let her cut people open in it. No. Don't. Stop.
Amen. This is equivalent to an attorney strongly encouraging a client to represent themselves in court.
When you feel you could better implement X for your niche needs.
Depending on the time that needs to be spent on the fix ... max 4h
Always roll your own.
Almost never. People should even consider using SWIG or similar to use libraries written in other languages instead of wasting time duplicating effort and neglecting their core product/project.
The X window system is being dropped from dev by Red Hat. So yeah, you'll probably need to roll your own. As for the other X, non-institutional chemistry is generally frowned upon by governments.
Generally, existing implementations of X are going to be way too complex and heavy. Implementing your own allows you to avoid all that. But of course this only applies to individuals. Anyone stuck as part of a company or institution won't have the choice.