Server - Behaviour Logging Tool

As part of the project we had to implement a centralised storage facility. We did this by installing Syslog-NG on an Ubuntu 12.04 server running in a virtual machine at GUC (Gjøvik University College). With Syslog-NG, the server acts as a centralised storage facility for our application and can output our data in the following formats:

• RAW²

• CSV

• XML

• Unindexed database

• Indexed database

There are pros and cons to all of these file formats, but some are worse than others when it comes to scalability, future implementations and performance. During our testing of the file formats (section 6.3), we found that continuously retrieving data from and inserting data into an indexed database is by far the worst option.

First and foremost we wanted a baseline test (RAW format), where we wrote the data received from the users directly to file without parsing or manipulating it. This format is not very usable without a lot of extra work, and it can’t be used for analysis or displayed in a good way. We’ve only used it to create a baseline when testing our server.

In CSV (Comma Separated Values) files, each value received from the user is separated from the next by a comma. In our file we escape ”,” with ”%2C” (its hexadecimal representation); all other values are written as is. This ensures that there is no confusion about where each field begins and ends. The biggest disadvantage of this file format is that you have to understand what each field means to analyse the data.

The big advantage of this file format is that it is very easily read, implemented and generated. It doesn’t require many resources, much storage space or any additional software to read or write it.
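The escaping rule described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names escapeCsvField and makeCsvLine are hypothetical, and only the rule itself (”,” becomes ”%2C”, everything else is written as is) comes from the text.

```cpp
#include <string>
#include <vector>

// Replace every comma in a value with "%2C" so it cannot be mistaken
// for a field separator. All other characters are written as is.
// (Hypothetical helper name; only the rule comes from the text.)
std::string escapeCsvField(const std::string& field) {
    std::string out;
    out.reserve(field.size());
    for (char c : field) {
        if (c == ',') {
            out += "%2C";
        } else {
            out += c;
        }
    }
    return out;
}

// Join the escaped values into one CSV line (hypothetical helper).
std::string makeCsvLine(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) {
            line += ',';
        }
        line += escapeCsvField(fields[i]);
    }
    return line;
}
```

Because ”%” itself is not escaped in this sketch, a value that already contains the literal text ”%2C” cannot be told apart from an escaped comma when the file is read back.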

² No filtering or manipulation of data; every event is dumped to file.

The biggest drawback of XML files is that they require a lot of padding around their values to be readable. Each value has to be encapsulated by a fixed tag inside ”<>” brackets, e.g. our numeric flag values have to be stored as ”<flag>1</flag>”. Storage space is thus wasted on padding around the actual values. Another problem with XML is that some symbols are not allowed in values; adding them results in a corrupted file. However, this error is easily fixed by escaping the invalid symbols. For example, the ”&” symbol is not allowed in XML, but by replacing it with its escaped value ”&amp;” we avoid corrupting the file.
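As a rough sketch of the tagging and escaping described above: the function names escapeXml and makeElement are hypothetical, and escaping ”<” and ”>” in addition to ”&” is an assumption based on the standard XML entities, since the text only mentions ”&”.

```cpp
#include <string>

// Escape the characters that would corrupt an XML value. The text only
// mentions "&"; handling "<" and ">" as well is an assumption.
std::string escapeXml(const std::string& value) {
    std::string out;
    for (char c : value) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Wrap an escaped value in an opening and a closing tag
// (hypothetical helper; the tag name itself is not validated).
std::string makeElement(const std::string& tag, const std::string& value) {
    return "<" + tag + ">" + escapeXml(value) + "</" + tag + ">";
}
```

The one-character flag value ”1” becomes the 14-character string ”<flag>1</flag>”, which illustrates the storage overhead of the padding.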

Though XML has a few serious drawbacks, it is still a very user-friendly file format. Because the padding contains specific names, we can easily read the file by querying for the value within a specific tag. As opposed to a CSV field, a tag does not necessarily have to exist, which makes the file easier to read and use later on. A very powerful feature is that XML files can be processed using XPath expressions and XML transformations: XPath expressions find and select values within the XML file, while XML transformations use a stylesheet to transform the entire document into another format.

Though the Syslog-NG configuration needed to generate XML is long and complex, this is not a problem in practice: just as with CSV, it causes no performance issues when parsing the received event.

The database format we used was MyISAM, which we chose because we needed a database without any indexes or keys. The unindexed database inserts the data without having to create an index entry for each indexed field. It also requires less storage space than an indexed database, since each index creates a relation that requires additional storage space. In the indexed database, several additional index entries had to be created for each received event and added to the existing indexes.

Because of this, it requires many more resources to add events to an indexed database than to an unindexed one. The advantage of an indexed database is that it is much faster to search than an unindexed one. This is because the indexes are ordered by value and each entry relates to its corresponding database field.

The most important drawback with Syslog-NG and databases is that Syslog-NG doesn’t have an internal method to import data into the database. This forces one to implement this format using external scripts and tools, which makes it very time-consuming.

5.2.1 CSV file format

In addition to storing our data as XML and in relational databases, we have also created a CSV file format. We’ve done this because it was the preferred file format of our employer.

Since we have several different event types (mouse, keyboard, software and hardware), we had a problem with CSV: these events don’t contain the same amount of information.

Because of this we have developed a file format where the first three values are the same throughout the format, and based on these three values one will know which fields come next. All events are identified by the following fields:

Event ID is an integer that represents the order in which the events occurred. Shown here as n (1, 2, 3, ...).

Event type is a letter that represents the type of event that occurred: B - BeLT, S - Software, K - Keyboard, M - Mouse and H - Hardware.

Action is a specific action or type within the event type. These codes are described in detail under the table for each event type. This further divides what type of event happened.

Other values that are presented in the same way throughout a line are:

Time is the timestamp with millisecond precision. When written directly from Syslog-NG it is in an ISO 8601 compatible format, for example 2013-05-15T12:13:14.0123+00:00, which includes full date, time with milliseconds and time zone. When exported to CSV, either from the database or from CSV, the timestamps of all events except BeLT messages are integers representing milliseconds since the start event (first line).

Relation says which event this is related to; it points to an Event ID. See section 5.4.2 for a discussion of relations between events.
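To illustrate how a reader of the file might use the three identifying fields, here is a minimal sketch. The names EventHeader, splitCsvLine and parseHeader are hypothetical and not taken from the BeLT source; the sketch assumes that commas inside values were written as ”%2C”, so a plain split on commas is enough.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// The three fields that identify every event (hypothetical struct).
struct EventHeader {
    long id;             // Event ID: order in which the event occurred
    char type;           // Event type: B, S, K, M or H
    std::string action;  // Action code within the event type
};

// Split one CSV line on commas. Commas inside values are assumed to
// have been written as "%2C", so no quoting rules are needed.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Read the identifying fields; returns false if the line is malformed.
bool parseHeader(const std::string& line, EventHeader& out) {
    const std::vector<std::string> fields = splitCsvLine(line);
    if (fields.size() < 3 || fields[1].size() != 1) {
        return false;
    }
    try {
        out.id = std::stol(fields[0]);
    } catch (const std::exception&) {
        return false;  // Event ID was not an integer
    }
    out.type = fields[1][0];
    out.action = fields[2];
    return true;
}
```

Once the header is known, the event type and action decide how the remaining fields should be interpreted, as laid out in the tables for each event type.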

In the tables below, all fields that are printed directly are marked in bold. Text that is not in bold should be replaced by something else, which is described in the list above or below the table.

BeLT system-events

Event ID  Event Type  Action  Time
n         B           start   T
n         B           pause   T
n         B           stop    T
n         B           resume  T

Table 5: CSV format for BeLT system-messages

BeLT only sends events when the application has been started, paused, resumed and stopped.

Mouse events

Event ID  Event Type  Action  Value  Time  Relation  Flag  Additional fields
n         M           M       X Y    T     Event ID  Int
n         M           U       X Y    T     Event ID  Int   Rectangle
n         M           D       X Y    T     Event ID  Int   Rectangle
n         M           W       Delta  T     Event ID  Int

Table 6: CSV format for mouse events

Action can be:

M Mouse move; the next field is where the mouse is now.

W Mouse wheel; the next field indicates the wheel delta.

U/D Mouse press: mouse up / mouse down. The next field says where it happened.

X Y Coordinates separated by a colon (:).

Delta If the event was a mouse wheel event, the value is the delta, which says how much it scrolls. A negative value means scrolling downwards, while a positive value means scrolling upwards.

Flag Indicates the mouse button on press (1 = left, 2 = middle, 3 = right). The flag is 4 on mouse wheel and 0 on mouse move.

Rectangle The last software rectangle we saw. On down events it will likely be wrong, so you should look at the value given in mouse up events. It still might be wrong. The format is the same as for the software events. See 5.4.3 for a discussion of weaknesses with our correlation here.
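A small sketch of how the value and flag fields of a mouse event could be interpreted. The names Point, parseCoordinates and mouseFlagName are hypothetical; the colon-separated coordinates and the flag values 0 to 4 are taken from the descriptions above.

```cpp
#include <exception>
#include <string>

struct Point {
    int x;
    int y;
};

// Parse a value of the form "X:Y" (coordinates separated by a colon).
// Returns false if the colon is missing or a part is not an integer.
bool parseCoordinates(const std::string& value, Point& out) {
    const std::size_t colon = value.find(':');
    if (colon == std::string::npos) {
        return false;
    }
    try {
        out.x = std::stoi(value.substr(0, colon));
        out.y = std::stoi(value.substr(colon + 1));
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// Translate the mouse flag into a readable name.
const char* mouseFlagName(int flag) {
    switch (flag) {
        case 0:  return "move";
        case 1:  return "left button";
        case 2:  return "middle button";
        case 3:  return "right button";
        case 4:  return "wheel";
        default: return "unknown";
    }
}
```

For wheel events the value field holds the delta rather than coordinates, so parseCoordinates should only be applied when the action is M, U or D.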

Software events

Event ID  Event Type  Action  Value         Time  Relation  Flag          Additional fields
n         S           FC      Process name  T     Event ID  Element type  Element desc., Element ID, Rectangle
n         S           MO      Process name  T     Event ID  Element type  Element desc., Element ID
n         S           MMS     Process name  T     Event ID  Element type  Element desc., Element ID
n         S           TC      Process name  T     Event ID  Element type  Element desc., Element ID, Extra description
n         S           OCS     Process name  T     Event ID  State         Element desc., Element ID, Rectangle
n         S           EI      Process name  T     Event ID  Element type  Element desc., Element ID, Rectangle
n         S           WO      Process name  T     Event ID  Element type  Element desc., Element ID
n         S           VC      Process name  T     Event ID  Element type  Element desc., Element ID, Flag

Table 7: CSV format for software events

Action can be:

OCS Object Change State, occurs when the state of an element changes, like checking a checkbox, pressing bold in Word and so on. Can also occur in elements that appear to be buttons.

FC Focus Change, occurs when the user shifts focus to a new element; this can be a new window, pressing a button, moving to a textbox and so on. Indicates which element receives input on further events.

EI Element Invoked, typically pressing a button.

MO Menu Opened, occurs when the user changes which menu he is looking at; also occurs the first time he starts looking at the menu.

TC Text Changed, occurs when the text in an element changes, like an edit box.

MMS Menu Mode Started, occurs the first time the user starts looking at the menu.

WO Window Opened, typically when the user starts a new program or opens a new window.

VC Visual Change, occurs when minimizing, maximizing or restoring a window. The extra description will indicate which of the three occurred. Also occurs as restored when the visual state of the window changes.

Process name This is the name of the process executable.

Flag This is always an integer. It can be:

state Says whether the element is pressed or unpressed. 0 = pressed and 1 = unpressed.

Element type Says what type of element it is (full list: http://msdn.microsoft.com/en-us/library/windows/desktop/ee671198%28v=vs.85%29.aspx). We subtract 50000 from each flag, so the button is actually number 0. A negative value indicates that something went wrong.

Element description A name that describes what the element is called; it should describe the purpose of the element. See UIA_NamePropertyId³ in the documentation for UIA. If the element type is a document or hyperlink, this field will be the URL; see UIA_ValueValuePropertyId⁴ in the documentation for UIA.

Element ID This is an identifier for the element; it should be unique among all its siblings. It does not tell you what the purpose of the element is, but it lets you correlate between elements and check whether it is the same element you saw before. See UIA_AutomationIdPropertyId⁵ in the documentation for UIA.

Extra description For some events this is an extra description of what happened. See UIA_ValueValuePropertyId in the documentation for UIA. This field is formatted as a string.

Rectangle Describes the area on the screen that the element occupies. Has the following format: <X coordinate of upper-left corner>, <Y coordinate of upper-left corner>, <X coordinate of lower-right corner>, <Y coordinate of lower-right corner>. An error is indicated by all four values being -1.

Flag2 As of now, only VC gives another flag. The following values are possible:

1 Restored

2 Maximized

3 Minimized

4 Unknown (should never happen)

³ http://msdn.microsoft.com/en-us/library/windows/desktop/ee684017(v=vs.85).aspx#UIA_NamePropertyId

⁴ http://msdn.microsoft.com/en-us/library/windows/desktop/ee671200(v=vs.85).aspx#UIA_ValueValuePropertyId

⁵ http://msdn.microsoft.com/en-us/library/windows/desktop/ee684017(v=vs.85).aspx#UIA_AutomationIdPropertyId
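The software event flags can be sketched as below. The helper names are hypothetical; the offset of 50000 for element types, the all -1 error rectangle and the four visual change values come from the field descriptions for software events.

```cpp
// Area on the screen occupied by an element (hypothetical struct).
struct Rect {
    int left;
    int top;
    int right;
    int bottom;
};

// An error is indicated by all four coordinates being -1.
bool isErrorRect(const Rect& r) {
    return r.left == -1 && r.top == -1 && r.right == -1 && r.bottom == -1;
}

// UIA control type identifiers start at 50000 (the button), so the
// stored flag is the identifier minus this base.
const int kControlTypeBase = 50000;

int toElementTypeFlag(int uiaControlTypeId) {
    return uiaControlTypeId - kControlTypeBase;
}

// Translate the second flag of a VC (Visual Change) event.
const char* visualChangeName(int flag2) {
    switch (flag2) {
        case 1:  return "Restored";
        case 2:  return "Maximized";
        case 3:  return "Minimized";
        default: return "Unknown";
    }
}
```

In this sketch every value outside 1 to 3 is reported as unknown, which is slightly broader than the single value 4 listed for that case.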

Keyboard events

Event ID  Event Type  Action  Value  Time  Relation  Flag  Additional fields
n         K           D       value  T     Event ID  flag
n         K           U       value  T     Event ID  flag  Count if > 1

Table 8: CSV format for key events

Action Indicates what type of key event it is; can be ”D” or ”U” for key down and key up.

Value Roughly what appears on the keyboard, if it is a UTF-8 value. Other keys we can get are system keys and keys that generate whitespace. See the source code documentation for all possible values.

Flag Indicates which system keys are active. If a bit is turned on it means the following:

1. bit Alt is pressed.

2. bit CTRL is pressed.

3. bit Shift is pressed.

4. bit Windows key is pressed.

5. bit Caps lock is active.

6. bit Num lock is active.

7. bit Scroll lock is active.

Count Indicates how many key presses were sent. It is only sent with the key up (KU) event, although it counts key down (KD) events. The reason is that we don’t know how many key down events were sent until we get a key up event. It is omitted if it is 1.
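Decoding the keyboard flag could look like the sketch below. It assumes that the ”1. bit” is the least significant bit of the integer, which the text does not state explicitly; the function name activeSystemKeys is hypothetical.

```cpp
#include <string>
#include <vector>

// Return the names of the system keys whose bits are set in the flag.
// Assumption: the "1. bit" in the list is the least significant bit.
std::vector<std::string> activeSystemKeys(unsigned int flag) {
    static const char* const names[] = {
        "Alt", "Ctrl", "Shift", "Windows",
        "Caps Lock", "Num Lock", "Scroll Lock"
    };
    std::vector<std::string> active;
    for (int bit = 0; bit < 7; ++bit) {
        if (flag & (1u << bit)) {
            active.push_back(names[bit]);
        }
    }
    return active;
}
```

Under this assumption a flag of 6 (binary 110) means that CTRL and Shift are held down, and a flag of 0 means that no system key is active.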

Hardware events

Event ID  Event Type  Action    Action-specific values  Time
n         H           KEY       Language, Type          T
n         H           RES       CPU, Memory             T
n         H           SCR_Info  Resolution, ID          T
n         H           SCR       ID                      T
n         H           DEV       action                  T

Table 9: CSV format for hardware messages

All hardware changes start with an event ID as before and then the letter ”H”. Actions can be:

KEY Indicates that this is information about the keyboard.

Language tells which language and sub-language is used. 16-bit integer formatted according to this link: http://msdn.microsoft.com/en-us/library/windows/desktop/dd318691%28v=vs.85%29.aspx

Type is an integer determining what type of keyboard it is. Values can be 1 to 7, formatted according to this link: http://msdn.microsoft.com/en-us/library/windows/desktop/ms724336%28v=vs.85%29.aspx (first table under remarks).

RES Indicates that this event shows resources used.

CPU average represented by a float value.

Memory shows the current memory usage, formatted as integer.

SCR / SCR_Info Indicates that this event is related to the screen. SCR_Info is used if this is the first time we have seen this screen.

Resolution If it is the first time we have seen this screen in this session, it will print out a rectangle, if we have seen it before, it will print out a rectangle representing the resolution.

ID is an integer identifying which screen we have changed to. This is unique throughout the rest of the session.

DEV indicates that a device has been inserted or removed.

Action 1 means that a device has been inserted, 2 means that it has been removed.

5.3 Development

5.3.1 Documentation

For our project, documentation is an extremely important task, since we are developing a prototype application that later on will be developed even further. But before this happens, our application will be used to collect information from users from a closed set of up to 50 people so our employer can start the task of correlating the collected information and develop an algorithm for validating users. So for this part we have to document our entire system so the system administrator will understand how it works, and what part of the system performs which task.

We decided that the best solution for documenting would be to write documentation for each user group.

For the administrators and developers we created a system manual (Appendix A). It contains information about how we’ve set up the system, how we configured the different services and how they work together. In addition, it describes how to make a new software release.

For the end users we created a manual (Appendix B). It contains all the information about how to use the application: what the different buttons do, what the system tray colors mean, etc.

The documentation of the source code needs to be easy to understand, maintain and scale. At first we thought of using the XML commenting standard for MS Visual Studio[33], but we found it too troublesome to work with, and there was no method for gathering all of the comments afterwards. Instead we opted for the standard that Doxygen[34] uses. When you start a comment with a forward slash and two stars and end it with a star and a forward slash, Doxygen will interpret it as a Doxygen comment. See Appendix E to see the entire Doxygen report. Here is an example of a Doxygen comment:

/**
 * \brief Initializes all the variables needed to start a session
 * \author Robin Stenvi
 * \param[in] bufTmp A string containing the current date and time, with the format %Y-%m-%d %H-%M-%S
 * \remark This function is not enough to start a new session, to start or stop a new session, you should
 * use writeTime with the appropriate parameter.
 * \returns Returns true on success, false on failure.
 */
bool handleData::startNewSession(std::string bufTmp)

Doxygen uses keywords within the comments of the source code. By declaring these keywords it can understand the information and correlate it to the corresponding function/class/member in the reference manual. This makes it easy to write into the source code, because all one has to do is write simple one-word keywords to add the specified information. The comments are processed and Doxygen automatically generates a formatted reference manual as defined in the configuration file. We configured Doxygen to use LaTeX, since we also use LaTeX in our report.

We have configured Doxygen to perform a recursive search of our source code directory and parse only ”.CPP” and ”.H” files. Because of this we can add any other directory if we expand our application and still use Doxygen with minimal additional configuration. This also enables any future work to use Doxygen without problems or conflicts. Since the source code for this project is about 10,000 lines, a factor to consider is that the documentation must be generated quickly, which Doxygen is able to do.

Doxygen also supports call graphs, which depict the functions a documented function calls at runtime. Doxygen can create many types of graphs, but the call graph is the most necessary, since it gives a very good overview of the flow within the application. To make the source code easy to understand, we have documented every function with, as a minimum, a brief description that states what the function does. We’ve documented all input values and return values of the functions, and where it is not absolutely elementary what is happening we’ve added a detailed description of what the function does. All this is in addition to the name of the author and normal comments, which are left out by Doxygen.
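A configuration along the lines described above might contain the following options. This is only a sketch: the option names are standard Doxygen settings, but the input path is hypothetical and the values are assumptions, not a copy of the project's actual configuration file.

```
# Hypothetical source directory; adjust to the real layout.
INPUT          = src
# Search the input directory recursively.
RECURSIVE      = YES
# Parse only C++ source and header files.
FILE_PATTERNS  = *.cpp *.h
# Generate LaTeX output for inclusion in a report.
GENERATE_LATEX = YES
# Call graphs are drawn with the dot tool from Graphviz.
HAVE_DOT       = YES
CALL_GRAPH     = YES
```

With RECURSIVE enabled, new subdirectories under the input directory are picked up without further changes to the configuration.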

5.3.2 Software distribution and Continuous integration

To implement CI we had to create a manual system of custom scripts that manage our need for CI. Since we had to manage CI and code analysis from the client side, the distribution of new software releases is also managed from the client side.
