Hacker News

bquinlan said 7 months ago:

I created a word cloud as a Valentine's Day gift this year:

- https://raw.githubusercontent.com/brianquinlan/word-cloud-va...

- https://github.com/brianquinlan/word-cloud-valentine/blob/ma...

My implementation (https://github.com/brianquinlan/word-cloud-valentine) is a lot less sophisticated than stylecloud but I think that I had a few interesting ideas about text extraction.

I used nltk to extract only nouns and to do word stemming (e.g. so that "time", "times" and "timing" are only counted as one word).
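A rough sketch of the counting step, for illustration only. It does not use nltk: `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer, and the noun filtering is left out because it needs a part-of-speech tagger.

```python
import re
from collections import Counter


def crude_stem(word):
    """Toy suffix stripper (an assumption, not nltk's stemmer)."""
    for suffix in ("ing", "es", "s", "e"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(text):
    """Count words after collapsing each one to its crude stem."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(crude_stem(w) for w in words)


counts = count_stems("Time after time, the timing of good times matters.")
```

Here "time", "times" and "timing" all collapse to the same key, so they contribute to a single entry in `counts`.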

I also experimented a lot with various methods of determining word size, e.g. size proportional to frequency, to log(frequency), or to sqrt(frequency).
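To make the three options concrete, here is a minimal sketch of how they compare. The function name, the 10 to 100 size range, and the `log(f + 1)` offset (used so that a frequency of 1 does not map to zero) are my assumptions, not taken from the linked repository.

```python
import math

# Candidate transforms applied to the raw frequency before scaling.
TRANSFORMS = {
    "linear": lambda f: f,
    "log": lambda f: math.log(f + 1),  # +1 is an assumption, avoids log(1) == 0
    "sqrt": math.sqrt,
}


def font_size(freq, max_freq, scale="sqrt", min_size=10, max_size=100):
    """Map a frequency (>= 1) to a font size between min_size and max_size."""
    transform = TRANSFORMS[scale]
    ratio = transform(freq) / transform(max_freq)
    return min_size + (max_size - min_size) * ratio


# A word that appears 25 times when the most common word appears 100 times.
sizes = {name: font_size(25, 100, name) for name in TRANSFORMS}
```

With these numbers the linear scale gives the mid-frequency word the smallest size and the log scale the largest, which is the usual trade-off: linear lets a few very common words dominate, while log flattens the differences.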

cjauvin said 7 months ago:

It's funny, I did exactly the same, with my Hangouts Takeout extract, a couple of weeks ago, but didn't go as far, because I kept struggling with stopwords and some ways to filter out uninteresting stuff (my implementation was much more naive than yours). I'm still thinking about what other types of analysis I could perform on that interesting dataset though (because it's so personal after all).

amrrs said 7 months ago:

Super awesome. I just tried Obama's inaugural speech with the LinkedIn icon - https://github.com/amrrs/stylecloud-demo/blob/master/obama_s... It came out very well!

danso said 7 months ago:

This is a great tool! Ironically, I might use it someday to illustrate why word clouds are an absurd format. But it'll be great for joking around too :)

meej said 7 months ago:

I wrote a paper about word clouds for my Information Retrieval class when I was in Library School, when word clouds were still a trendy and popular content browsing UI.

I concluded that they do have some utility as "semantic cartograms" in certain contexts, but beyond that they're mostly just decoration. Especially the ones created by this tool. =)

paultopia said 7 months ago:

That looks really cool!

A while ago I tried to write some wordcloud code, but couldn't quite figure out how to do the layout. Does anyone know where one might find a good writeup of the algorithm that tends to be used for this?

dwyerm said 7 months ago:

You might look through the source code for Wordcram[1]. It is built under Processing, and is relatively easy to understand, I think.

[1] http://wordcram.org/

y42 said 7 months ago:

I once wrote a pretty simple and straightforward algorithm to create a WordCloud in VBA for PowerPoint:


It does not fill the white space with words, as Wordle et al. do. Also it's in German. But I guess the code itself is quite clear.

lgas said 7 months ago:

This stackoverflow answer from the creator of Wordle links to a free chapter from a book that describes the way Wordle does it: https://stackoverflow.com/questions/342687/algorithm-to-impl...
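For the gist of the general idea, here is a simplified sketch of greedy spiral placement: take words largest first, start each one at the center, and walk outward along a spiral until it no longer collides with anything already placed. This version tests plain bounding boxes only, and the function names and parameters are my own assumptions, so treat it as an illustration rather than Wordle's actual code.

```python
import math


def overlaps(a, b):
    """True if two (x, y, width, height) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_words(sizes, step=0.5, max_steps=10000):
    """Place (word, width, height) items, assumed sorted largest first.

    Returns a list of (word, (x, y, width, height)). A word that finds no
    free spot within max_steps is silently skipped.
    """
    placed = []
    for word, w, h in sizes:
        for i in range(max_steps):
            angle = step * i
            radius = step * angle  # Archimedean spiral: radius grows with angle
            # Center the box on the current spiral point.
            box = (radius * math.cos(angle) - w / 2,
                   radius * math.sin(angle) - h / 2, w, h)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((word, box))
                break
    return placed


layout = place_words([("love", 60, 20), ("time", 40, 15), ("home", 30, 10)])
```

Real implementations test the glyph shapes rather than rectangles, which is what lets small words tuck into the gaps inside larger ones.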

danso said 7 months ago:

The library used by the submitted package seems to be well-documented: https://github.com/amueller/word_cloud

chrisweekly said 7 months ago:

Beautiful! Thank you for creating and sharing this! :)

foobarbecue said 7 months ago:

Presumably this person means "stylish," not "stylistic"?

jedberg said 7 months ago:

Stylistic is probably the right word here: of or pertaining to style, especially to linguistic or literary style

foobarbecue said 7 months ago:

I know what the word means, and I don't see how it could possibly make sense here. If he doesn't mean stylish, he might mean stylized.

What could "word clouds pertaining to style" possibly mean? Maybe if you gave it a text and it spit out a cloud with things like: "using passive voice," "flowery," "strident," "long-winded," "plain."

klysm said 7 months ago:

I believe word clouds are one of the most useless widely used presentations of data.

calmworm said 7 months ago:

Very impressive! Well done!