A better ranking algorithm
Whether we like it or not, ranking algorithms affect how we see the world. They are one of the most important parts of informational websites (whether that be search engines, news aggregators, or social media) as they literally determine what people see, and therefore, experience and think.
I am the creator of Bear, a minimal blogging platform (mostly for tech people who are disillusioned with current state of the web). I’ve been thinking about (and experimenting with) ranking algorithms for the discover feed quite extensively over the past few months, and have some thoughts to share.
The anatomy of a good ranking algorithm
It prioritises good content
Quality content is the most important attribute of a ranking algorithm (obviously). We need to ensure the content which boils to the top is of a certain standard. Despite the simplicity of the statement, what (and who) defines “good”?
One way to address this is by having content curators who have authority (and hopefully good taste) to sort through the content. This is, however, a resource intensive process that few aggregators have the privilege of doing.
The most common way is to log the number of upvotes (or likes/downvotes/angry-faces/retweets/poop-emojis/etc) and algorithmically determine the quality of a post by consensus. This democracy obviously has a bias to the type of users on the platform, but this is mostly unavoidable without curation (and with curation, if you want to be pedantic).
It keeps content fresh
The second attribute of a good ranking algorithm is that it prioritises new content. If done well, this means each time a user goes to the platform they will have something fresh (and hopefully good) to read or interact with. The balance (as we’ll explore later) is the trade-off between “newness” and “quality”.
It adjusts for the “Katamari ball” effect
Once a post goes viral on Twitter, Hacker News, Reddit, or anywhere else off-platform, it has the potential to form a “Katamari ball” where it gets upvotes because it has upvotes (which means it gets more upvotes, because it has more upvotes, which means…well…you get it). This is also known as "the network effect", but I feel a Katamari ball better illustrates it. These posts generally have several orders of magnitude more upvotes than the rest of the submissions. Does this mean that is has several orders of magnitude more value? Perhaps, but for newness' sake we’ll need to degrade it disproportionally instead of sticking it to the top of the aggregator for 100 to 1000 times as long as other good posts.
It avoids false negatives
If a submission of a decent quality receives no attention and decays without anyone reading it, it is considered a "false negative". Hacker News is particularly bad with this since Newest doesn’t get much attention. Personally, whenever I complete an essay I post it to Hacker News to crickets, but then see my post trending on the front page a few days later posted by someone else. What do they know that I don’t? Is it a timing thing?
A good algorithm will try and reduce the number of false negatives while retaining quality. This is a difficult trade-off to make.
It is computationally friendly
Computationally friendly, in this context, means: it doesn’t crash your server. Essentially, the algorithm is constrained by the computing resources available. Facebook has a massive amount of expensive servers and can, therefore, run machine learning algorithms that optimise for your outrage and sell you shoes at the same time. In smaller cases (like Bear) it means that I’d preferably like to run this as a SQL query.
How’s it done elsewhere?
With those constraints in mind, let’s take a look at some popular algorithms (which I have, as you will see later, gerry-rigged into my own flavour of ranking algorithm). There may be some small errors in the explanation of these algorithms, but the overall context of the design remains. Don’t eat me.
The Hacker News algorithm
This algorithm is fairly straight forward (although there’s some magic going on under the surface when it comes to moderation, shadow banning, and post pinning).
This is slightly simplified, but in a nutshell:
Score = Upvotes / Time since submission ^ Gravity Where Gravity = 1.7
As upvotes accumulate the score rises but is counterbalanced by the time since submission. Gravity makes it exponentially more difficult to rank as time goes by.
This can be run as a simple SQL statement:
SELECT post.id, post.upvote_count, post.published_date ( ( post.upvote_count / ( EXTRACT( EPOCH FROM ( STATEMENT_TIMESTAMP() - post.published_date ) ) ^ 1.8 ) ) ) AS "score" FROM posts ORDER BY "score" DESC
This actually checks most of the boxes of a decent ranking algorithm in the context of Hacker News (bar the false negatives). It, however, doesn’t scale downwards, as we’ll see later with Bear.
The Reddit “hot” algorithm
Similarly, Reddit ranks posts with a democratic system (using both up and downvotes), however it is turned on its head when calculating time in the score. Instead of using Time since published, it uses the number of intervals of 12.5 hours since the oldest post datetime (we’ll dig into this). The algorithmic decay of posts over time is then handled by using a logarithm of the sum of upvotes and downvotes, which also corrects for the magnitudinal nature of upvotes due to the Katamari ball effect.
Again, I’ve simplified this for brevity:
Score = log(Upvotes - Downvotes) / (EpochSeconds - 1134028003) / 45000 Where: EpochSeconds = number of seconds since Jan 1, 1970 to the published date/time 1134028003 = Unix timestamp for the oldest post (essentially the birth of the platform) 45000 = 12.5 hour interval (originally this was 25 hours)
In a nutshell, a post has to have 10 times as many points as something 12.5 hours younger in order to outrank it.
The Reddit comment algorithm
Reddit ranks its comments using a different system than its posts. This makes sense considering comments shouldn’t be affected by time, but instead should be ranked by usefulness. I’m not going into detail here as it doesn’t apply to this essay, but I thought it was quite interesting. You can read about it here.
So, what would be an ideal algorithm for Bear, which has much less traffic than either Reddit or Hacker News, yet needs to account for huge, infrequent influxes of upvotes coming from those platforms?
The first attempt was a pure SQL implementation of the Hacker News algorithm with some slight modifications. It ran quite well for over a year. You can see it running here with the following algorithm:
Score = (Upvotes - 1) / (Time in hours + 4) ^ Gravity Where Gravity = 1.2
You’ll notice the problem with it right away. There are a few posts on the first 3 pages which have several orders of magnitude more upvotes than the other posts and have stuck there for up to a year. Increasing Gravity solves this at the expense of reducing overall quality of the posts as decent posts (which fall into the 10 - 20 upvote range) are also deprecated faster. This makes the tradeoff a bit of a pickle since we want those older posts off, while keeping lower-upvoted (yet above average) fresher posts on the page.
The second attempt was pretty straight forward, using the Reddit “hot” algorithm for inspiration I added a logarithm to the number of upvotes, getting the following:
Score = log(Upvotes) / (Time in hours + 4) ^ Gravity Where Gravity = 1.1
The issue here is that there are now 2 forms of post score decay, the gravity and the logarithm on the upvotes. This caused all posts to shoot off of the home page pretty fast (hence the reduced gravity). These values could have probably been tweaked to work well, but I ran into an issue while playing around with this: the queries were taking some time to complete, sometimes up to a few seconds.
This is more of a database optimisation issue but falls well within the “computational friendliness” constraint for this project as I have no interest in doing fancy database stuff (I’m an ORM bitch). Every time someone viewed the discover page the DB query had to collate all the upvotes from a separate table for every post, determine a score, and return the queryset. This is computationally expensive and I’m running on a cheap-ish server.
Third attempt (which I’m quite happy with)
The third attempt takes heavy inspiration from the Reddit algorithm. Each time a post is upvoted it has its score recalculated (instead of having all of them recalculate on viewing the discover feed). This means I don’t have access to the time since submission for the discovery feed and instead use the delta of the published time and some arbitrary date in the past. The algorithm is as follows:
Score = log(Upvotes) + (Seconds / Days modifier * 8600) Where, Seconds = Seconds since Jan 1st, 2020 Days modifier = 14
Here we can see that the logarithm mostly prevents the Katamari ball effect. I also opted for adding the number of fortnights since Jan 1st, 2020 (the year in which I started Bear) as the method of determining freshness, since newer posts will have a higher number of fortnights. Since this score is calculated and stored every time an upvote occurs the computational overhead is negligible and the algorithm works quite well. Essentially, every post that is 2 weeks old needs to have 10 times as many upvotes as something posted now to outrank it. As the platform becomes more popular I can reduce the Days modifier to shorter and shorter intervals to accommodate for the increased number of “good” posts.
Some future additions
Random new posts in Trending
I dislike the separation of Trending and Newest. This is one of the main reasons for false negatives as new articles don’t receive many (if any) views. I’m thinking about randomly interspersing new articles in the trending feed to give them the potential of getting their first few votes. This (as ever) has an effect on quality, so has to be done with care.
Potentially using views as downvotes
An interesting idea which was brought to my attention by a user (thanks Felix) was to use the number of views as a factor in determining the score of a post. Essentially using the “upvote rate” instead of pure upvote number.
Score = (Upvotes - Views) / (Time in hours + 4) ^ Gravity
Or removing time from consideration entirely, decaying posts the more they are read.
Score = Upvotes / Views ^ Gravity
These, however, have a bias against longer posts (although this is the case with the all these algorithms in general, but exacerbated here). Since longer posts (by virtue of them being long) may take time to read or are saved to read later, and may get a lot of views initially which actually degrade the score despite people finding the content valuable. It’s an interesting idea though and could be used for platforms where all posts are of a similar digestibility.
Another way of reducing false negatives for good content is to assign certain privileges to people who are consider as having “good taste”, whatever the devil that means. Essentially giving them 10x the number of upvotes (and potentially downvotes) to dramatically influence ranking. It would be important for these pseudo-moderators to spend most of their time in Newest instead of Trending since this is where the problem lies. If you’re keen to be a pseudo-mod for Bear, send me a mail.
Time agnostic scoring
Social media marketers (and some Hacker News dowsers) reckon there are the “best” and “worst” times to post in order to get the highest number of upvotes. One way to counter this is to round down the time element of each of these algorithms to the day. This won’t work for sites like Hacker News and certain subreddits since the frequency of posts is too high, but may be a viable option for Bear (at least until we start hitting over 100 posts a day).
During my experimentation I built out a few tools to test algorithms on the fly. They are not available to the general public (and can crash the server if used recklessly) but could be fun to publish in the future after some child-proofing, essentially allowing people to experiment with their own ranking algorithms.
Originally the feed, as well as the posts themselves had an upvote button. This was a classic case of “this is how everyone else is doing it, therefore it’s correct”. But, on closer inspection I realised a distinct difference. Twitter has all the content (at least for text tweets) available next to the upvote button. Same with other social media platforms. Reddit and Hacker News, however, need in-feed upvote (and downvote) buttons since they do not control the linked content pages. With this in mind, in order to prevent people from upvoting posts based purely on title (without reading the content) I removed the in-feed upvote button, making posts only up-votable at the bottom of the post itself. This increases the vote quality (if not the quantity).
This has been an incredibly fun project (even though it has a marginal effect on the platform as a whole). I think what makes it the most interesting is how each small factor can have a large effect, both positive and negative. It’s like algorithmic Factorio. I have still to figure out how to not have a short-form content bias. If you have any thoughts or suggestions, pop me an email or leave them in the inevitable Hacker News thread.