Whenever I want to learn something new, it helps to think of a project or use case that I can work on rather than just learning snippets here and there. For a while now I’ve been tinkering with node.js and R. I’ve really begun to adore these two, especially when it comes to data. Sourcing data with node.js is a glorious thing and analyzing that data with R is even better. At Equation, we’re engulfed in data day in and day out, so ramping up on these two has been great.
I’ve been looking forward to the upcoming StartFEST, Utah’s very own startup festival. Since I’m focused on data, I wondered if I could get some context on the current startup ecosystem in Utah. I decided to take this opportunity to pull together some data with node.js and R. Below are the steps I took to gather and display that data.
Finding a Data Source
AngelList is a great place to pull data about startups, investors, and the ecosystem in general. They provide a great API with many helpful endpoints. You can review the API documentation at https://angel.co/api.
To get started with the AngelList API, you’ll need to create an AngelList account and register a new application. Both processes are easy and take only a few minutes.
Calling the AngelList API and Creating a CSV in Node.js
Once you have AngelList all set up, the next step is to create a node.js script to access the API and build a simple CSV of the data. Here is the script I put together doing just that:
You’ll notice a few packages included, most of which are pretty standard. There is a nice little AngelList package you can download from https://www.npmjs.com/package/angellist that does most of the work. You will need to modify its URL methods to pass the access_token parameter, since AngelList has updated their API since the package was created.
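As a rough illustration of that change, here is a minimal sketch of appending an access_token parameter to a request URL. The helper name withAccessToken and the example URL are hypothetical, made up for this example; the package’s actual URL-building code will look different, so treat this as the idea rather than a patch.

```javascript
// Hypothetical helper (not part of the angellist package): add an
// access_token query parameter to an API URL, keeping any existing parameters.
function withAccessToken(rawUrl, accessToken) {
  const url = new URL(rawUrl); // URL is a global in modern node.js
  url.searchParams.set('access_token', accessToken);
  return url.toString();
}

// Example with a placeholder URL and token (assumptions, not real values):
console.log(withAccessToken('https://api.example.com/1/startups?page=2', 'MY_TOKEN'));
```

Using searchParams.set rather than string concatenation means the token is URL-encoded for you and an existing query string is left intact.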
What’s nice about this script is that the calls are broken out by page, as AngelList requires, but run in parallel. This allows the CSV to be created very quickly.
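The pattern of paged calls run in parallel can be sketched roughly as follows. This is not the original script, just a minimal illustration under some assumptions: fetchPage is a hypothetical function you would supply (for example, a wrapper around the API client) that returns a Promise for one page, and the response shape { last_page, startups } and the column names are assumed for the example.

```javascript
// Quote a CSV field only when it contains a comma, quote, or line break.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Turn an array of objects into CSV text with a header row.
function toCsv(rows, columns) {
  const header = columns.join(',');
  const lines = rows.map(row => columns.map(col => csvField(row[col])).join(','));
  return [header, ...lines].join('\n') + '\n';
}

// fetchPage(page) is hypothetical: it should return a Promise resolving to
// { last_page: number, startups: Array<object> } (an assumed response shape).
async function fetchAllPages(fetchPage) {
  const first = await fetchPage(1); // the first page tells us how many pages exist
  const pending = [];
  for (let page = 2; page <= first.last_page; page++) {
    pending.push(fetchPage(page)); // start each request without awaiting it
  }
  const rest = await Promise.all(pending); // results come back in page order
  return [first, ...rest].flatMap(result => result.startups);
}
```

With those pieces, something like fetchAllPages(fetchPage).then(rows => require('fs').writeFileSync('startups.csv', toCsv(rows, ['name', 'location']))) would write the file, where the file name and columns are again placeholders.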
Analyzing Startup Data with R
Now that all of the AngelList data for Utah startups is in a CSV, we can run some quick analysis on it using R. I’d recommend downloading and using RStudio when you’re getting started with R. It’s a free GUI for R that makes it pretty easy to ramp up.
You can import the CSV file of startups into RStudio using the Import Dataset function.
Now that the startup dataset is in RStudio, we can do some quick analysis. I decided to look at startup distribution throughout the state. For this, I took the values in the dataset’s location column, counted the startups in each location, and stored the result in a new variable called locations.
locations <- table(utahStartups$location)  # count startups per location; works whether the column is a factor or character
With the location data now mapped to a variable, we can plot the data fairly easily with R using the following barplot() command:
barplot(locations, ylab = "Startups", main = "Utah Startups by Location", las=2, width=.4, xlim=c(0, 18), cex.names=.5)
With the bar plot we can see that the majority of startups on AngelList list a location of Salt Lake City. I think the data is skewed, because there is a tipping point where a city has enough visibility and credibility that founders start to claim it as their base. For example, Equation is based in Layton, but whenever we talk with people or list positions online, it’s easier to say Salt Lake City because people aren’t familiar with Layton. Provo, Orem, and Lehi have clearly reached that tipping point, and I think Ogden is coming up as well. It’s an interesting conversation to have about startup communities and how cities go about building economic growth.
It’s amazing how much data is available on the internet. You can pretty much find any data you want. The trick is knowing what to do with it. Once you have that, there are so many tools like node.js and R that make it relatively easy to bring insights to light.
As for the startup ecosystem in Utah, I’m incredibly excited about it. Utah has been an economic powerhouse for some time, and it’s growing into a significant place for startup growth.