The growing importance that Internet users attach to the security of the information they exchange online has led to a massive increase in the use of HTTPS (HyperText Transfer Protocol Secure). HTTPS, which provides encrypted communication between an Internet user and the Internet server being accessed, thereby improving data security, has rendered obsolete the usual methods of managing network cybersecurity that are based on filtering by monitoring ports or inspecting the data packets exchanged.
To analyse traffic on a company’s or institution’s network, IT managers generally use a decryption proxy, which decrypts the traffic so that its content can be inspected. Data transmission remains encrypted upstream (between the Internet user and the proxy) and downstream (between the proxy and the Internet service being accessed). While monitoring with a decryption proxy is feasible in a professional context, the process does breach the confidentiality of these exchanges.
Analysing without decrypting
To keep exchanges encrypted end-to-end without that encryption standing in the way of identifying, and possibly intercepting, illegal online activities, Inria can now offer its H2Classifier technology. This technology is based on an artificial intelligence technique and is suitable for traffic under the version of the HTTPS protocol used since 2015 (HTTP/2 with TLS security).
The H2Classifier analysis process was described in the IEEE Transactions on Network and Service Management journal of September 2019 by Pierre-Olivier Brissaud, who is currently completing his CIFRE thesis (a specific type of PhD, jointly supervised by an academic and an industry partner) on this subject in Nancy, under the joint supervision of Jérôme François, Inria Nancy-Grand Est, and Olivier Bettan from the Thales Group.
“This innovative technology could be a very good replacement for a decryption proxy”, says Pierre-Olivier Brissaud. “This is because the H2Classifier algorithm does not monitor each request from each network user, but instead sends an alert (or even blocks communications, depending on the settings configured by the network manager) when a request violates certain pre-established rules.” It then becomes possible to block any attempt to make unauthorised use of an online service while protecting the confidentiality of exchanges and leaving the service accessible for use that the network manager considers “normal”. In fact, the algorithm can identify a suspicious search, previously defined as such using keywords, while respecting the purpose of the HTTPS protocol, since it does not “break” the encryption.
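The alert-or-block behaviour described above can be pictured with a minimal Python sketch. It is an illustration only, not H2Classifier's actual configuration format: the names `Action`, `Rule` and `decide`, and the keywords used, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALERT = "alert"   # notify the network manager, let the traffic through
    BLOCK = "block"   # cut the communication


@dataclass(frozen=True)
class Rule:
    keyword: str      # keyword the network manager wants to watch for
    action: Action    # what to do when that keyword is detected


def decide(detected_keyword, rules):
    """Return the configured action for a detected keyword, or None if no rule applies."""
    for rule in rules:
        if rule.keyword == detected_keyword:
            return rule.action
    return None


# Hypothetical rules; requests matching no rule are simply left alone.
rules = [
    Rule("forbidden-product", Action.BLOCK),
    Rule("sensitive-name", Action.ALERT),
]

print(decide("forbidden-product", rules))
print(decide("weather", rules))
```

In this sketch, only requests that match a pre-established rule trigger any action, which reflects the idea that ordinary traffic is not monitored request by request.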
An analysis of the content of the encrypted response
To do this, the H2Classifier algorithm relies solely on the variability of the sizes of the network messages that make up the responses given by a service to the keyword used, whether a product name, a word or a proper name. Although the encrypted data varies from one request to another for the same keyword, researchers have managed to infer certain common characteristics from the responses received.
In practical terms, for each keyword, researchers made several dozen identical requests to the same online service, then recorded and analysed the encrypted traffic received in response. In so doing, they were able to identify a response signature for a particular keyword, not by comparing the data exchanged but by analysing, for example, the size of the data blocks, which is itself linked to the information being transmitted.
For instance, if an Internet user searches for “Nancy” several times on an online mapping service using HTTPS, the website’s response will be slightly different with each new request due to encryption and variable metadata. However, the information transmitted in the response remains the same. This “reproducibility” of the response to a specific request, typically with the same type of exchange and the same amount of data, means that the request can be detected when it occurs.
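The idea of a size-based response signature can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the classifier described in the paper: the three features, the tolerance rule, the function names and the message sizes are all assumptions made for the example.

```python
from statistics import mean, pstdev


def features(sizes):
    """Summarise one encrypted response using size-only features."""
    return (len(sizes), sum(sizes), max(sizes))


def build_signature(traces):
    """Per-feature (mean, standard deviation) over repeated requests for one keyword."""
    columns = zip(*(features(t) for t in traces))
    return [(mean(c), pstdev(c)) for c in columns]


def matches(signature, sizes, k=3.0, rel_slack=0.01):
    """True if every feature lies within k deviations of its mean.

    rel_slack (assumed value) allows a small relative margin when the
    repeated requests happened to produce almost identical sizes.
    """
    return all(
        abs(value - mu) <= max(k * sigma, rel_slack * mu)
        for value, (mu, sigma) in zip(features(sizes), signature)
    )


# Made-up sizes (in bytes) of the encrypted messages observed for
# repeated searches of the same keyword on the same service.
nancy_traces = [
    [1200, 5400, 880],
    [1210, 5390, 884],
    [1195, 5410, 879],
]
signature = build_signature(nancy_traces)

print(matches(signature, [1205, 5402, 881]))      # response of similar shape
print(matches(signature, [640, 2100, 300, 150]))  # response of different shape
```

No payload is ever read here: the only inputs are message sizes, which remain observable even when the content is encrypted.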
“We have tested our algorithm on Google, Google Images, Google Maps, Amazon and Instagram traffic flows, with a few thousand keywords, and in 94 to 99% of cases, we actually get a valid alert”, explains Jérôme François. Although it has only been tested on these commonly used services, the algorithm can be used with any other service simply by working through the data collection and response analysis phase, so that it can “recognise” when the keyword appears in the data exchanged.
Work that reveals a weakness in HTTPS security
This first tool, which maintains data security while identifying potentially illegal activities, also reveals an interesting fact: even encrypted Internet traffic carries potentially exploitable information. H2Classifier is nonetheless a step in the right direction: it is less intrusive than a decryption proxy, since it only raises alerts for specific requests without decrypting the data exchanged between an Internet user and the service being accessed.
As for using it “excessively” or for monitoring on a massive scale, the risk is very low: “Even with only ten times as many keywords as we used, processing by the algorithm would inevitably take more time and be less accurate. The data collection and learning phase would also be much longer”, explains Pierre-Olivier Brissaud.
The article describing these results was published in the IEEE Transactions on Network and Service Management journal in September 2019. It is available via open access in the HAL-Inria open archive: Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic
The authors: Pierre-Olivier Brissaud; Jérôme François; Isabelle Chrisment; Thibault Cholez; Olivier Bettan