A new tech recruitment project scraped user data from GitHub and other similar websites and inadvertently leaked it online through a misconfigured MongoDB database.
Australian security expert Troy Hunt, the owner of the Have I Been Pwned service, was recently provided with a 600 MB MongoDB backup file containing data from a tech recruitment website called GeekedIn.
A closer analysis revealed that the file contained information on more than 8 million GitHub profiles, including names, email addresses, locations and other data. However, just over one million of the exposed email addresses are valid, while the rest are represented as “email@example.com” and are associated with GitHub accounts with no public email address. The MongoDB database also included thousands of accounts apparently taken from BitBucket.
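The split between real and placeholder addresses described above can be illustrated with a minimal Python sketch. The record layout (`login` and `email` keys) and the sample profiles are assumptions made for illustration; the article does not describe the actual schema of the GeekedIn database. Only the placeholder value comes from the report.

```python
from collections import Counter

# Placeholder address reported for accounts with no public email.
PLACEHOLDER = "email@example.com"


def summarize_emails(profiles):
    """Count profiles with a public address versus those without one.

    `profiles` is an iterable of dicts; the "email" key is a hypothetical
    field name, not the confirmed schema of the leaked data.
    """
    counts = Counter()
    for profile in profiles:
        email = (profile.get("email") or "").strip().lower()
        if not email or email == PLACEHOLDER:
            counts["no_public_email"] += 1
        else:
            counts["public_email"] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"login": "alice", "email": "alice@example.org"},
    {"login": "bob", "email": "email@example.com"},
    {"login": "carol", "email": None},
]
summary = summarize_emails(sample)
print(summary["public_email"], summary["no_public_email"])
```

Applied to the full dump, a tally of this kind would separate the roughly one million profiles with a real address from the remainder that carry only the placeholder.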
GeekedIn, announced by its developer in June, is a service that crawls code hosting websites, such as GitHub and BitBucket, and creates profiles for open-source projects and developers. The goal of the service is to help recruiters find developers who match their needs and help developers “enrich their CV.”
The data harvested by GeekedIn is publicly available on GitHub and does not include sensitive information such as passwords. However, the project raises two problems. The first is that, while GitHub does allow users to scrape public data from its website, it prohibits the use of scraped information for commercial purposes. GeekedIn was planning to charge recruiters and companies hundreds of euros per month to use the harvested data.
The second problem is that the data was stored in a MongoDB database that was not protected and could have been accessed by anyone. These types of incidents are increasingly common, with some organizations exposing the details of hundreds of millions of individuals due to misconfigured databases.
“As someone in the data breach myself, I don't want my data being sold this way,” Hunt said. “And again, yes, you can go and pull this data publicly on a per-individual basis but the constant response I got from close confidants I shared this information with is that ‘it just feels wrong’. And it is wrong, not just the scraping of GitHub in the first place in order to commercialise our information, but then subsequently losing it via a MongoDB with no password and now having it float around the web in data breach trading circles.”
After being notified by Hunt, GeekedIn's developers promised to take measures to secure the data. They have also taken the website offline. Users affected by this incident can use the Have I Been Pwned service to find out exactly what information about them was leaked.