
Pysa is extensible and can be used with different Python frameworks and libraries. Facebook uses the Python frameworks Django and Tornado, but Pysa can support other frameworks with “a few lines of configuration” to tell the tool where data enters the server. “Even with Pysa’s bias to avoid false negatives and our willingness to accept a good number of false positives, we still managed to cap false positives at 150 (45 percent) of the reported issues,” Facebook said.

To reduce the number of false positives to verify, Facebook introduced sanitizers and other features in Pysa to filter the results after analysis is done.
On top of being fast, Pysa has to find security vulnerabilities, so it is designed to “avoid false negatives and catch as many issues as possible.” False negatives are cases where a tool fails to detect a real security issue. That design decision meant a “trade off performance for precision and accuracy,” Facebook said. It also meant accepting a high rate of false positives, where the tool reports a security issue that isn’t really there, and some of the issues Pysa initially returned when scanning Instagram code were false positives.
A manual code review could take weeks or months. Pysa can go over millions of lines of code in anywhere from 30 minutes to hours, Facebook said.
Prioritizing Speed

A static code analyzer capable of scanning something as large as Instagram’s codebase has to be fast. If it takes too long to scan, the tool will be less likely to be used, because waiting for the analysis may mean missing code release windows or delaying when code is shipped.

Pysa looks for connections between sources, where important data originates, and sinks, where data from a source should not be able to end up. If Pysa uncovers a path where a source eventually connects to a sink, the tool flags it as an issue. Common kinds of sources are places where user-controlled data enters the application. Sinks can include APIs that execute code or access the file system.

Pysa can check that internal frameworks designed to prevent access to or exposure of user data are implemented properly. Pysa can also detect cross-site scripting and SQL injection flaws. For example, code used to load a user’s profile photo would not be an issue because the data it receives is restricted. If there were a way to go from user-controlled input to a SQL query, however, Pysa would flag the issue.

Pysa’s focus on data flows means there are limitations on the kinds of security issues it can find. Not all security or privacy issues are related to data flows. Pysa won’t be able to ensure that an authorization check was performed before launching a privileged operation, for example. The fact that Python is a dynamic language also makes things difficult for a static code analyzer: code can be dynamically imported at virtually any point, but the analyzer doesn’t know what that code is doing yet. “The dynamism of Python means there are endless pathological examples of flows of data that Pysa cannot detect,” Facebook said.
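To make the source-to-sink idea concrete, here is a minimal sketch of a toy static taint check built on Python’s standard `ast` module. It is not how Pysa is implemented and does not use Pysa’s configuration format. The names `get_user_input`, `run_sql`, and `escape` are hypothetical stand-ins for a source, a sink, and a sanitizer, and the check only follows straight-line, top-level assignments.

```python
import ast

# Hypothetical names used only for this illustration.
SOURCES = {"get_user_input"}   # where user-controlled data enters
SINKS = {"run_sql"}            # where that data should not arrive unchecked
SANITIZERS = {"escape"}        # calls assumed to make data safe


def find_flows(code):
    """Return line numbers of sink calls that receive tainted data."""
    tainted = set()    # variable names currently holding source data
    findings = []

    def is_tainted(node):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SOURCES:
                return True
            if node.func.id in SANITIZERS:
                return False
        if isinstance(node, ast.Name):
            return node.id in tainted
        # Any tainted sub-expression taints the whole expression.
        return any(is_tainted(child) for child in ast.iter_child_nodes(node))

    # Walk top-level statements in order (no branches, loops, or functions).
    for stmt in ast.parse(code).body:
        # Check sink calls before updating taint for this statement.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in SINKS
                    and any(is_tainted(arg) for arg in node.args)):
                findings.append(node.lineno)
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


EXAMPLE = '''\
name = get_user_input()
query = "SELECT * FROM users WHERE name = " + name
run_sql(query)
safe = escape(name)
run_sql("SELECT * FROM users WHERE name = " + safe)
photo_id = 42
run_sql("SELECT url FROM photos WHERE id = " + str(photo_id))
'''

print(find_flows(EXAMPLE))  # prints [3]
```

Only the call on line 3 of the example is reported, because it is the only one where data from the source reaches the sink without passing through the sanitizer. The sketch also hints at the limitations described above: it sees nothing about code that is imported or constructed at run time.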

A security or privacy flaw is frequently described as a situation where data flowed into a place it shouldn’t. The internally developed tool detected 44 percent of all security bugs in Instagram’s server-side Python code in the first half of 2020, Facebook said. Pysa detected 330 unique issues in proposed code changes, of which 15 percent (49) were categorized as “significant issues,” and 40 percent (131) were “real but had mitigating circumstances that made them less severe,” Facebook’s Graham Bleaney and Sinan Cepel said.
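As a quick sanity check on the reported figures, the short sketch below confirms that the three categories mentioned in this article (49 significant issues, 131 mitigated issues, and 150 false positives) add up to the 330 total and round to the quoted percentages. The dictionary labels are shorthand chosen for this example, not Facebook’s terminology.

```python
total = 330
breakdown = {
    "significant": 49,
    "real but mitigated": 131,
    "false positive": 150,
}

# The three categories should account for every reported issue.
assert sum(breakdown.values()) == total

# Share of each category, rounded to the nearest whole percent.
shares = {label: round(100 * count / total) for label, count in breakdown.items()}
print(shares)
```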


Facebook has open-sourced a static code analyzer for finding and fixing flaws in Python code. The Python Static Analyzer, or Pysa, scans Python source code and analyzes how data flows through the application to identify security vulnerabilities, Facebook said. Many attacks rely on figuring out a way to get user input to access the codebase in unexpected ways, or to return a result that was not intended.
