This approach shows how and where API secret keys and tokens are exposed in real-world settings, not only in code repositories.
Our data collection process was a significant undertaking, both in terms of scale and technical complexity. To manage this, we deployed our containerized web spider on a Kubernetes cluster. The cluster was capable of scaling up to 150 concurrent worker instances.
The collection spanned over 69 hours, with our system analyzing an average of 4 domains per second.
In total, our process led us to visit 189,466,870 URLs. This extensive coverage was key to ensuring that our analysis was as thorough and inclusive as possible.
By examining such a large number of URLs, we were able to gain deep insights into the current state of secret sprawl across a wide spectrum of the internet.