
Managing 900+ Million Data Points, $5 Budget


Okay, the title is a bit much, but it’s true! I love data analytics, but I’m also passionate about gaming. With that in mind, I decided to mine lots of gameplay data from one of my favorite games, Brawl Stars. My initial goal was to analyze the data to give myself an advantage. I planned on making data-driven decisions! Once I started, I had to keep going...



I had a few goals in mind and a modest budget. The goals were:

  1. Obtain API access to the gaming data

  2. Mine and normalize as much data as possible

  3. Store the data on a budget

  4. Create analytics products based on the data

  5. Publicly publish the data in an easy-to-use format on a budget

I’ll detail how I achieved all of this below, but the TLDR version is –

  1. I did it. I also developed my own custom distributed file system in the process

  2. Over the course of 12 months I will have mined approximately 900 million data points

  3. The data has been published in an easy-to-use format for everyone to use at www.BrawlSmarts.com

  4. At the end of 12 months, this will have cost me $4.71


Obtaining API Access

This was actually quite easy. You only need to apply online for a fan-content developer license, and then you get access to the documentation and the API.


Mine and normalize as much data as possible

Once I had API access, I investigated and found the right endpoints for the data I was looking for. There are limits to the number of API calls any given developer can make, so I needed to configure my code to pause between requests and pull as much data as possible without exceeding those limits.
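A minimal sketch of that kind of request pacing, not the actual mining script: `throttled_fetch`, its parameters, and the `fetch` callable are hypothetical names, and the real rate limit is whatever the API documentation specifies.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled_fetch(
    keys: Iterable[str],
    fetch: Callable[[str], dict],
    max_requests_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Call fetch(key) for each key, spacing calls to stay under a rate limit.

    `fetch` stands in for whatever function performs one API request
    (an assumption; the post does not show its request code).
    """
    min_interval = 1.0 / max_requests_per_second
    last_call = None
    for key in keys:
        if last_call is not None:
            elapsed = clock() - last_call
            if elapsed < min_interval:
                # Wait only for the remainder of the interval.
                sleep(min_interval - elapsed)
        last_call = clock()
        yield fetch(key)
```

Passing `sleep` and `clock` as parameters keeps the pacing logic testable without real waiting. In normal use the defaults apply, for example `for result in throttled_fetch(player_tags, get_battle_log, 5.0): ...`, where `player_tags` and `get_battle_log` are placeholders.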


Store the data on a budget

Now, where could I store the data? A local database server was more maintenance than I wanted to handle. Online cloud data storage would eventually be cost prohibitive for me. So I did what any reasonable person would do. I developed my own custom distributed file system and used a spare hard drive to store the data. I stored the data as Parquet files and indexed them across my system for high performance. I also created compressed backups for redundant storage.
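The post does not describe the actual file layout, so the following is only one plausible way to arrange and index Parquet files on a local drive. The directory scheme, the function names, and the `gemGrab` example value are all assumptions for illustration, and writing the Parquet files themselves would need a separate library.

```python
from datetime import date
from pathlib import Path


def partition_path(root: Path, day: date, game_mode: str) -> Path:
    """Return the file path for one day of records for one game mode.

    Hypothetical layout: root/mode=<mode>/year=<yyyy>/month=<mm>/<date>.parquet
    """
    return (
        root
        / f"mode={game_mode}"
        / f"year={day.year}"
        / f"month={day.month:02d}"
        / f"{day.isoformat()}.parquet"
    )


def build_index(root: Path) -> dict[tuple[str, str], Path]:
    """Scan the tree once and map (game_mode, ISO date) to the file path."""
    index: dict[tuple[str, str], Path] = {}
    for path in sorted(root.rglob("*.parquet")):
        parts = path.relative_to(root).parts
        # The mode is encoded in the "mode=<value>" directory name.
        mode = next(p.split("=", 1)[1] for p in parts if p.startswith("mode="))
        index[(mode, path.stem)] = path
    return index
```

With a layout like this, a query for one mode and one date range only has to open the matching files, for example `build_index(Path("data"))[("gemGrab", "2024-03-05")]`, instead of scanning everything.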


Create Analytics Products

To make publishing aggregate data simpler, I added a process that pre-aggregates the data on a scheduled cadence and writes the results to a parallel file system. This made the data in the distributed file system even easier to access and more performant.
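As a rough illustration of what a pre-aggregation step can look like, here is a sketch that collapses raw battle rows into one summary row per group. The field names (`mode`, `character`, `won`) and the choice of win rate as the metric are assumptions, not the site's actual schema.

```python
from collections import defaultdict


def aggregate_win_rates(battles):
    """Collapse raw battle rows into one summary per (mode, character)."""
    totals = defaultdict(lambda: {"games": 0, "wins": 0})
    for row in battles:
        bucket = totals[(row["mode"], row["character"])]
        bucket["games"] += 1
        bucket["wins"] += 1 if row["won"] else 0
    # Add the derived win rate so readers of the summary never touch raw rows.
    return {
        key: {**counts, "win_rate": counts["wins"] / counts["games"]}
        for key, counts in totals.items()
    }
```

Run on a schedule, a step like this means the published pages only ever read the small summary output rather than the full raw history.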


Publicly publish the data in an easy to use format on a budget

A traditional web application wouldn’t work for me here because I’m storing the data locally. I also didn’t want to use a website builder because those services are quite expensive! I would go over budget in a month. I read that if you build a static website and store the objects on S3, you can publish that content as a website. You can then route it through the AWS CloudFront service for an SSL certificate, and you can choose your own DNS provider. The best part is that S3 is practically free for smaller data sets.


I went the S3 route. This meant that I had to code static HTML, CSS, and JavaScript files to support my website. This added development hours but it was free. Once I had the HTML, CSS, and JavaScript I needed, I configured my Python scripts to automatically refresh the HTML files with the most recent content on a regular basis. It's all published at www.BrawlSmarts.com.
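A small sketch of how a Python script can regenerate a static HTML page from fresh numbers. The template, the `render_page` function, and the table columns are illustrative assumptions rather than the site's real markup, and the upload to S3 is left out.

```python
from datetime import datetime, timezone
from html import escape
from string import Template

# Hypothetical page template; $title, $updated and $rows are filled in below.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>Last updated: $updated</p>
<table>
<tr><th>Name</th><th>Win rate</th></tr>
$rows
</table>
</body>
</html>
""")


def render_page(title, stats, updated):
    """Render a name -> win-rate mapping as a static HTML page, best first."""
    ordered = sorted(stats.items(), key=lambda item: item[1], reverse=True)
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>{rate:.1%}</td></tr>"
        for name, rate in ordered
    )
    return PAGE.substitute(
        title=escape(title),
        updated=updated.strftime("%Y-%m-%d %H:%M UTC"),
        rows=rows,
    )


if __name__ == "__main__":
    page = render_page("Win rates", {"Example": 0.5}, datetime.now(timezone.utc))
    print(page)
```

A scheduled job would write the returned string to an `.html` file and then copy that file to the bucket, so the site stays static while the numbers stay current.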


In the end it cost me:

  • $3.99 for the Domain

  • 61 cents for the first month of S3 because of a service I accidentally enabled

  • 1 penny for each of the following months

After 12 months, I expect to have paid $4.71 to publish my 900+ million data points.
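For anyone who wants to check the math, here is the 12-month total worked out from the three line items above, assuming the remaining 11 months each cost one cent.

```python
from decimal import Decimal

domain = Decimal("3.99")               # one-time domain cost
first_month_s3 = Decimal("0.61")       # month 1, including the accidental service
later_months_s3 = Decimal("0.01") * 11 # months 2 through 12 at one cent each

total = domain + first_month_s3 + later_months_s3
print(total)  # 4.71
```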


In terms of end-to-end development, this cost me a few thousand lines of code and roughly 60-100 hours (I lost count).

 

Thanks for checking out this blog post. If you found this post helpful please consider donating via PayPal. Any contribution is appreciated!


