How Stream Started Building News Feeds

20 Mar 2018

Founder Interview

Disclosure: Stream, the API for scalable feeds, has previously sponsored Hacker Noon.

Today, we’re going to catch up with their CEO, Thierry Schellenbach, about the origin, scale, and direction of his company.

David: Let’s start with some numbers. What’s the scale and traction of your company?

Thierry: Tommaso and I started Stream about 3 years ago. At the start, it was just the two of us doing all the engineering work. Soon after, we were accepted into Techstars NYC, which is when things really took off for us. As of today, 600 companies rely on Stream. We also power the feeds of over 300 million end users and handle approximately 34 billion feed updates a month. It’s hard to believe we’ve grown this much in such a short time!

300 million end users. That’s a lot. In terms of distribution, are your larger customers at the top taking up most of that?

Yes, several of our largest customers have user counts in the hundreds of millions. Of course, not all of those are active, but our large customers drive a significant portion of that 300 million number.

Cool. So you really are an expert in news feeds. They’ve blown up in the last 10–15 years, going from something that didn’t really exist to something that supplies much of our news. Could you walk me through the technological milestones that made the news feeds we expect today possible, and the upcoming breakthroughs you’re excited about?

As you know, companies like Yahoo, Facebook, Twitter and LinkedIn have invested heavily in their feed technology. Back in early 2010, to keep up with their massive growth, they had to push the limits of what was available at the time. Many of us still remember Facebook’s outages in the early days or Twitter’s Fail Whale.

One huge breakthrough has been the creation of NoSQL databases. Operating news feeds at scale requires a very capable, high-throughput database.

Projects like Cassandra, Kafka, RocksDB, Twemproxy, FlockDB, Voldemort and Tornado were created by companies working on some of the largest news feeds, and they are the building blocks of many of the major news feeds we see today.

It’s also interesting to note that only 15% of our customers build social apps, which is what people initially picture when they think about feeds. These days, feeds power content discovery in a wide range of applications. Developers are leveraging Stream for B2B applications, transport systems, educational platforms, music, sports, e-commerce, etc.

And so do you think that NoSQL has reduced the amount of time to load stories and find the right stories to load?

Definitely. Stream, for instance, uses a combination of Go, RocksDB, and Raft.

(Read this StackShare post to learn about the technology that enables Stream to power the feed for over 300 million users.)

What are the common reasons companies plug into your API instead of building a news feed on their own?

The biggest reason we see companies integrating with Stream is time to market. Another big reason is cost. Think about building your own in-house feed solution and all the moving parts and components associated with it; it’s a daunting thought! Many feed deployments, for example, run around nine different clusters of servers:

  • API servers
  • Message brokers
  • Autoscaling worker clusters
  • Database clusters (Cassandra is a common option)
  • Real-time infrastructure
  • Redis as a backend for distributing real-time work & handling locks
  • Analytics infrastructure
  • Machine learning APIs
  • Machine learning worker infrastructure

It’s a headache to build, document, monitor and maintain.

Even the simplest feeds can become expensive as your user base grows. Typically, Stream is substantially cheaper than hosting an in-house solution.

We’ve even seen companies that had already built a feed in-house switch to Stream simply because it’s more affordable.

As it happens, this was a big reason we created Stream. I had this exact experience at my previous company: we needed to build feed technology in-house, and it quickly became a very expensive piece of our technology stack. This is a common theme we see.

Take companies like Instagram: even when they were small, they were spending huge amounts of money just hosting and maintaining their feeds.

Dubsmash also talks about their in-house vs Stream experience in a case study of ours.

I think you talked a little bit about this earlier, but what aspect of Stream is driving down the cost of hosting as a competitive advantage for you?

Our entire team is focused on building feed technology for over 600 companies. Because of this, we have to spend a lot of time and effort on high availability, monitoring, and optimizations. Part of the cost advantage comes from this focus and our ability to spread the work over 600 apps.

On the tech side, the cost advantage comes from three aspects of Stream:

  • We use an in-house database called Keevo, built on top of RocksDB & Raft, which is optimized specifically for feeds (LinkedIn and Instagram do something similar).
  • Stream uses a combination of fanout-on-read and fanout-on-write.
  • Our various services are powered by Go (except www.getstream.io and machine learning, which use Python).

Those are the technical aspects that stand out most, but there are also hundreds of smaller optimizations that have compounded over the last few years.
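Stream doesn't spell out how it splits work between the two fanout strategies. One common way to combine them, shown here purely as an illustration, is to push each new activity into followers' feeds at write time for most authors, and to pull at read time from authors with very large followings, so that a single post doesn't trigger millions of writes. The class, its method names, and the follower-count threshold below are all hypothetical; a real system would also persist data and backfill feeds when someone follows a new author.

```python
from collections import defaultdict


class HybridFeedStore:
    """Hypothetical in-memory store mixing fanout-on-write and fanout-on-read.

    Activities are (time, author, text) tuples; a larger time is newer.
    """

    def __init__(self, threshold=1000):
        self.threshold = threshold          # assumed cutoff for "popular" authors
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.outbox = defaultdict(list)     # author -> their own activities
        self.inbox = defaultdict(list)      # user -> activities pushed to them

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_popular(self, author):
        return len(self.followers[author]) > self.threshold

    def add_activity(self, author, time, text):
        activity = (time, author, text)
        self.outbox[author].append(activity)
        if not self.is_popular(author):
            # Fanout-on-write: copy the activity into every follower's inbox now.
            for follower in self.followers[author]:
                self.inbox[follower].append(activity)

    def read_feed(self, user, limit=10):
        items = set(self.inbox[user])
        # Fanout-on-read: pull popular authors' activities at request time.
        for author in self.following[user]:
            if self.is_popular(author):
                items.update(self.outbox[author])
        # The set drops duplicates left behind when an author crosses the threshold.
        return sorted(items, reverse=True)[:limit]
```

With this split, a post by an author with few followers costs one write per follower, while a post by a very popular author costs a single write and shifts the work to each reader's request.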

And what’s the simplest way to see the power of your API?

As with many things, I believe that when you’re a developer trying out a new solution, it really helps if it’s easy to use. With that in mind, we created a 5-minute interactive tutorial: getstream.io/try-the-api/. It allows you to try out Stream’s API right there in your browser.

Could you tell us about your previous company, Fashiolista?

We started that company around nine years ago now. It was a bit like Pinterest before there was Pinterest. We went through a period of crazy growth.

Where were your users coming from?

Our users mostly came from magazines. We had a few large magazines and bloggers. Influencers started pushing the app and sharing their profiles, which drove a ton of traffic. If you take a look at popular fashion bloggers, you’ll see they have a huge audience. That space is pretty big. We grew to a few million members in just a few months, which was an exciting experience.

However, once we had millions of members, everything seemed to fall apart. While we were able to easily fix most of the things that broke, the one thing we kept having difficulties with was the feed.

Where would it break and why would it break?

Feeds are particularly tricky to scale because the data is highly connected. It’s not a linear problem; it becomes substantially harder as the size of your network grows.

The first thing you’ll notice is that the feed starts to load slowly for some users, which is a very common problem. Twitter used to have its Fail Whale. In the beginning, Facebook had five-second loading times. The issue stems from all the data being connected.
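To see why loading slows down as the network grows, consider the simplest approach, fanout-on-read, where a feed is assembled when the user opens it by merging the recent activities of everyone they follow. The sketch below is a hypothetical illustration, not Fashiolista's or Stream's code: `read_feed`, `Activity`, and the dictionary layout are assumptions. Each request has to open every followed user's timeline, so the work per page load grows with the number of follows.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Activity:
    actor: str
    verb: str
    obj: str
    time: int  # hypothetical timestamp; larger means newer


def read_feed(user, follows, timelines, limit=10):
    """Fanout-on-read: build `user`'s feed at request time.

    `follows` maps a user to the users they follow; `timelines` maps each
    user to their own activities, sorted newest first.
    """
    sources = [timelines.get(f, []) for f in follows.get(user, ())]
    # heapq.merge lazily k-way merges the sorted timelines (newest first),
    # but it still has to open one iterator per followed user.
    merged = heapq.merge(*sources, key=lambda a: a.time, reverse=True)
    return list(islice(merged, limit))
```

Precomputing each user's feed when an activity is posted (fanout-on-write) moves this cost from read time to write time instead of removing it.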

So backing up, I read that you also created an open source solution for building feeds. How was that received by the community?

I started Stream-Framework about five years ago. At the time there were some good papers from Yahoo & Princeton about feed technology but no open source solutions. Stream-Framework quickly became the most widely used open source solution for building feeds.

Because of this, I ended up talking to many developers who were building feeds. While Stream-Framework was substantially easier than building a feed from scratch, it was still a huge hassle to set up, maintain, and build an API around. The minimum cost of deploying a scalable and highly available solution was also very high.

That was one of the reasons for building Stream. Stream’s tech stack is highly optimized and easy to access via an API.

It can be very hard to go from an open-source project to a company. What advice do you have for others trying to do the same? And when I say hard, I mean logistically.

My advice would be to take note of industry trends and give the people what they want. The most relevant and interesting trend we noticed was that developers were starting to leverage APIs for everything that’s not unique to their app. Companies started providing specific components of an app as a service. Think SendGrid for email, Algolia for search, Twilio for SMS, Stripe for payments, and Mapbox for maps. The list goes on and on. Stream is that Lego piece for adding feeds to your application.

I personally think this is a great time for developers to start a company. If you have deep domain expertise around a certain topic you can leverage that to build a company. It’s of course hard, but it’s becoming more and more of a possibility.

So, back to your customers. You’re in an interesting spot because you see news feed activity across different industries and use cases. I was curious about how your customers iterate on their news feeds. How are you seeing them learn from what Stream does and evolve what they want from Stream to give their own customers a better experience, and how has that feedback loop been working?

One feature I’m seeing quite often lately is aggregation. For instance, say a friend of yours opens up an app and adds a hundred items; the app’s developers and designers will aggregate those updates in whatever way makes sense for their platform. Aggregation is very helpful for reducing noise in the feed.
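Each app picks its own grouping rule. As an illustration only, the sketch below groups activities by actor, verb, and calendar day, so a friend adding a hundred items in one day becomes a single feed entry. The field names (`actor`, `verb`, `object`, `time` in seconds) and the grouping key are assumptions, not Stream's API.

```python
from datetime import datetime, timezone


def aggregation_key(activity):
    """Hypothetical grouping rule: same actor, same verb, same UTC day."""
    day = datetime.fromtimestamp(activity["time"], tz=timezone.utc).date()
    return (activity["actor"], activity["verb"], day.isoformat())


def aggregate(activities):
    """Collapse activities that share a key into one group entry.

    Groups keep the order in which their first activity appears
    (dicts preserve insertion order in Python 3.7+).
    """
    groups = {}
    for activity in activities:
        groups.setdefault(aggregation_key(activity), []).append(activity)
    return [
        {
            "actor": actor,
            "verb": verb,
            "day": day,
            "count": len(items),
            "objects": [a["object"] for a in items],
        }
        for (actor, verb, day), items in groups.items()
    ]
```

Changing `aggregation_key` (for example, dropping the day or grouping by object instead of actor) is how this sketch would adapt to "whatever makes sense" for a given platform.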

Another thing I see more and more apps doing is a “while you were away” type of experience. Take Facebook and Twitter, for example. While users come back to a few platforms like these daily, users of the majority of feed-based apps come back once a week, or even once a month.

You’d better win that moment when they come back, with good content.

Exactly. We not only see apps creating aggregated feeds, but also using personalization in addition to the basic chronological version of their feed. This way, users get tailored, relevant content when they return to the app. While those are the main things we see, real-time is of course still a big part of it as well. Many developers make sure their feed updates in real time. A few years ago, this was a very trendy topic. Nowadays it’s taken for granted: users expect things to update while they’re looking at them.
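A “while you were away” view can be sketched as two steps: keep only activities newer than the user's last visit, then rank them instead of listing them chronologically. The scoring function here (log-dampened likes with an exponential time decay) and the field names are entirely assumptions for illustration; production personalization typically relies on learned models and per-user signals.

```python
import math


def while_you_were_away(activities, last_seen, now, limit=5, half_life_hours=24.0):
    """Rank activities posted since the user's last visit.

    Hypothetical score: (1 + log(1 + likes)), halved for every
    `half_life_hours` of age. Times are in seconds.
    """

    def score(activity):
        age_hours = (now - activity["time"]) / 3600
        decay = 0.5 ** (age_hours / half_life_hours)
        # The leading 1 lets recency still order items with no likes.
        return (1 + math.log1p(activity["likes"])) * decay

    fresh = [a for a in activities if a["time"] > last_seen]
    return sorted(fresh, key=score, reverse=True)[:limit]
```

Because the likes term grows logarithmically while the decay halves the score every day, a very popular two-day-old post can still outrank a fresh post with no engagement.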

So what does Stream look like down the road, in, say, five or 10 years? What is the long-term vision?

I think the most interesting part about feeds is that they are the leading way we discover content online. Whether you’re following people on Instagram, RSS on Feedly or Winds, topics on Quora, music on Spotify, creators on YouTube or stocks in financial apps, the underlying tech is based on feeds. Apps still struggle with feeds, since the technology is relatively young. It’s exciting to see how Stream can help these apps improve content discovery.

Within the onboarding experience, how do you make sure you show the power of Stream and move people through quickly? I guess that’s an interesting project in and of itself.

As developers, we’ve researched and tested MANY products, so we know from experience what really works and what doesn’t. Once we created a version of ‘Try The API’ that we thought was powerful, we asked our developer friends to go through the tutorial while we sat next to them. That qualitative feedback was super helpful in making Stream easy to use.