The task was to write a driver to look through the network traffic in search of HTTP connections (HTTP is detected on the content of the TCP-stream, not by port); each outgoing HTTP request is compared with the set of the configured rules and if a rule forbids a request to the resource, then, instead of server answer, the driver inserts a false answer with the page corresponding to the forbidding rule, into the TCP stream.
HTTP1.1 allows so called keep-alive connections. It means that within one TCP connection several HTTP requests can be processed, some of which can be allowed (and must be processed transparently for user), and others can be forbidden and the request must be cut from the stream, and certain html page must be inserted as an answer. The problem is that HTTP-standard does not provide any terminal characters, so the end of one HTTP message and the beginning of the next one can be detected only after full processing of all HTTP headers and message body.
Actually, an HTTP message can be described by context-sensitive grammar which in turn is a finite state machine. Programming a finite state machine by hand is a bulky routine task (for example, even such relatively small machine as machine for HTTP requests processing can have about a hundred of states, related to each other by transitions at certain events). There are a great number of utilities for user-mode, which allow automating this process, but naturally there are no such utilities for kernel-mode.
Variants of solution
The most obvious way of solution is to write a finite state machine by hand. The main advantage of this variant is that no additional development tools are required. But such solution has several important disadvantages:
- large code bulking that causes more bugs;
- the meaning is buried behind the implementation. There is a great deal of the same type code, behind which the essence of performed actions is not visible, and thus it’s difficult to support the code (to add new features and find bugs)
- impossibility of code reuse. Such code is useless for anything except for the task it was written for.
The second variant is to port one of the utilities intended for the automation of finite state machine programming. The main disadvantage is that considerable work on porting is required. The advantages of this variant are the following:
- the machine is described in separate language, all associated code is generated automatically. It leads both to decreasing of bugs number and greater transparency in machine semantics.
- produced result can be reused for other machines.
The third variant was the following: when getting TCP data, pass them to user-mode service which can be written by using user-mode development tools. The advantages here are basically the same as for porting of development tools to kernel, but we don’t have to port anything here. The disadvantage is that very big overhead takes place during data processing and this leads to significant decline of network performance.
At first we tried to use State Machine Compiler (smc), but it didn’t suit much to the considered task. That’s why we chose GNU Bison and wrote a skeleton file, producing a code suitable for using in kernel-mode. Main advantages of this variant are described higher, but positive results became apparent very quickly: within this same project the client decided to keep configuration files in XML. The simplified grammar of XML was written and kernel-mode xml-parser was developed without any additional efforts.
GNU Bison, M4 Macroprocessor (Bison skeleton-files are written in it), text editor.