38C3

Ultrawide archaeology on Android native libraries
2024-12-29, Saal GLITCH
Language: English

A bug in a scraper script led to us downloading every single native library in every single Android app ever published in any market (~8 million apps).
Instead of deleting this massive dataset and starting again, we foolishly decided to run some binary similarity algos to check whether libraries are outdated and still vulnerable to old CVEs. No one told us we were opening Pandora's box.
A tragic story of scraping, IP-banning circumvention, love/hate relationships with machine learning, binary similarity party tricks, and an infinite sea of vulnerabilities.


A rumor has been going around: Android developers are slow to update native dependencies, leaving vulnerabilities unpatched.
In this talk we will show how wrong this rumor is: Android developers are not slow to patch - they have never heard of the word "patching".
We conduct a massive study of every single app ever published on Android (more than 8 million!).
We explore trendy topics like Play Store scraping, Androzoo scraping, Maven repository scraping, the state of the Android ecosystem, binary similarity state-of-the-art methods vs binary similarity pre-historic methods, and the consequences of thinking you know how databases work when you actually don't.

PhD @ hexhive/EPFL
Capture the Flag w/ polygl0ts & 0rganizers!