Ultrawide archaeology on Android native libraries
Luca Di Bartolomeo (cyanpencil), Rokhaya Fall
A bug in a scraper script led to us downloading every single native library in every single Android app ever published in any market (~8 million apps).
Instead of deleting this massive dataset and starting again, we foolishly decided to run some binary similarity algos to check if libraries and outdated and still vulnerable to old CVEs. No one told us we were opening Pandora's box.
A tragic story of scraping, IP-banning circumvention, love/hate relationships with machine learning, binary similarity party tricks, and an infinite sea of vulnerabilities.