
Algorithms

In document Behaviour Logging Tool - BeLT (pages 56-63)

5.4.1 Mouse compression

Storing every single mouse move event takes up a lot of space on the server, and possibly on the client, and the client application also has to send more data to the server. According to RUI [4], logging produced 22 KB per minute with "extensive mouse movements"7. Over a full workday, this is about 10 MB for mouse movements alone. Since we store a few more bytes for every mouse movement, our number would likely be higher.
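As a quick check of the estimate above, the sketch below multiplies RUI's rate by the length of a workday. The 8-hour workday is an assumption (the text does not state its length), and `mouse_log_kb` is a hypothetical helper.

```python
def mouse_log_kb(kb_per_minute: float, hours: float) -> float:
    """Data produced by mouse movements alone over the given number of hours."""
    return kb_per_minute * 60 * hours


kb = mouse_log_kb(22, 8)  # assumed 8-hour workday
print(kb, kb / 1000)      # 10560 10.56  (KB, and MB with 1 MB = 1000 KB)
```

About 10.6 MB per day is consistent with the "about 10 MB" figure quoted above.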

We wanted to limit this number while still being able to recreate the original path with good accuracy. The algorithm is based on two variables: the distance between two points and the change in direction, in degrees, between two points8.


7We assume this number is based on averages, since they give a separate number for continuous mouse movements.

8We only apply this to mouse movements; mouse presses and wheel scrolling events are always logged.

Figure 12: Mouse compression with 30 % of original dataset

Figure 13: Mouse compression with 19 % of original dataset

To calculate the distance between two points, we subtract the last logged x and y values from the current x and y values. Both results are squared and added together9. If this sum is higher than a predetermined value, we go on to the next test.
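The distance test can be sketched in a few lines. This is an illustrative sketch, not BeLT code: `squared_distance` and `passes_distance_test` are hypothetical names, and the default limit of 10 is the REAL_DISTANCE value from Code 5.1, which is compared with the squared distance.

```python
def squared_distance(a, b):
    """Sum of the squared coordinate differences; no square root (footnote 9)."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return dx * dx + dy * dy


def passes_distance_test(now, last_written, limit=10):
    # limit is compared with the squared distance, as REAL_DISTANCE is in Code 5.1
    return squared_distance(now, last_written) > limit


print(passes_distance_test((3, 1), (0, 0)))  # False: 9 + 1 = 10 is not above 10
print(passes_distance_test((3, 2), (0, 0)))  # True: 9 + 4 = 13
```

Skipping the square root is safe because both sides are non-negative: comparing the squared distance with 10 is the same as comparing the true distance with the square root of 10, roughly 3.16 pixels.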

The change in direction is measured by subtracting the last x, y coordinates we saw from the current x, y coordinates10. We then take the arc tangent of the y difference over the x difference, multiply it by 180 and divide by PI to convert the angle from radians to degrees. If the difference between this angle and the angle from our previous log entry is above a predetermined value, the event has passed the test.
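The angle calculation and the comparison against the threshold can be sketched as follows. The function names are hypothetical; following Code 5.1, the differences are taken as absolute values before calling the two-argument arc tangent, and the default limit of 5 is the DEGREE_CHANGE value used in the final program.

```python
import math


def direction_degrees(now, last_seen):
    """Direction of movement in degrees, using absolute differences as Code 5.1 does."""
    dx = abs(now[0] - last_seen[0])
    dy = abs(now[1] - last_seen[1])
    return math.atan2(dy, dx) * 180 / math.pi  # radians to degrees


def direction_changed(new_degrees, old_degrees, limit=5):
    # Same result as difference() in Code 5.1: true when the gap exceeds the limit
    return abs(new_degrees - old_degrees) > limit
```

Because the differences are made positive first, the angle always lies between 0 and 90 degrees, so for example a movement straight to the right and one straight to the left both give 0 degrees.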

Both of these tests have to pass for us to log the event.

Code 5.1 shows our implementation of the compression algorithm. We first check whether we have a previously stored point; if not, we log the current one.

After limited testing, we found that a distance change of 10 (compared with the squared distance in Code 5.1) and a degree change of 5 gave decent results, and these are the values used in the final program. We think higher values could be used, but we prefer to start with modest values that we know give us enough data, rather than risk losing data.

Code 5.1: Mouse compression algorithm

REAL_DISTANCE = 10      // compared with the squared distance (footnote 9)
DEGREE_CHANGE = 5

lastMove.x = -1         // last point seen, logged or not
lastMove.y = -1
oldDegrees = 0          // direction at the last logged point
// lastWritten holds the last point that was logged and sent to the server

bool procedure difference(a, b)
    // true if a and b differ by more than DEGREE_CHANGE degrees
    if (b < a) {
        c = a; a = b; b = c;
    }
    if ((a + DEGREE_CHANGE) < b)
        return true
    return false

bool procedure printThis(POINT now)
    // the first point is always logged
    if (lastMove.x == -1 and lastMove.y == -1)
        return true

    // clamp negative coordinates to 0
    if now.x < 0
        now.x = 0
    if now.y < 0
        now.y = 0

    // test 1: squared distance from the last logged point
    xSquare = square(now.x - lastWritten.x)
    ySquare = square(now.y - lastWritten.y)

    if (xSquare + ySquare) > REAL_DISTANCE
        // test 2: change of direction relative to the last point seen
        xAbsDistance = abs(now.x - lastMove.x)
        yAbsDistance = abs(now.y - lastMove.y)

        newDegrees = arctangent(yAbsDistance, xAbsDistance) * 180 / PI

        if difference(newDegrees, oldDegrees)
            oldDegrees = newDegrees
            return true

    return false

// for every mouse move event "now":
if (printThis(now) == true)
    lastWritten = now
    Send event to server
lastMove = now          // updated whether or not the event was logged

Figure 14: Mouse compression with 14 % of original dataset

Figure 15: Mouse compression with 11 % of original dataset

9To get the real distance we would have to take the square root of this sum, but that is not necessary for our purpose.

10The last x, y coordinates we saw might differ from the last x, y coordinates we logged.
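For experimenting with the thresholds, the pseudocode in Code 5.1 can be translated into runnable Python. This is a minimal sketch under stated assumptions, not the BeLT source and not the appendix D.2 script: the names `MouseCompressor`, `process` and `compress` are hypothetical, and the sketch clamps negative coordinates once before both tests and stores the clamped point.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]

REAL_DISTANCE = 10  # compared with the squared distance, as in Code 5.1
DEGREE_CHANGE = 5   # minimum change of direction in degrees


class MouseCompressor:
    """Decides, move by move, whether a mouse position should be logged."""

    def __init__(self) -> None:
        self.last_written: Optional[Point] = None  # last point sent to the server
        self.last_move: Optional[Point] = None     # last point seen, logged or not
        self.old_degrees = 0.0                     # direction at the last logged point

    def _should_log(self, now: Point) -> bool:
        if self.last_move is None or self.last_written is None:
            return True  # the first point is always logged
        dx = now[0] - self.last_written[0]
        dy = now[1] - self.last_written[1]
        if dx * dx + dy * dy <= REAL_DISTANCE:  # test 1: moved far enough?
            return False
        ax = abs(now[0] - self.last_move[0])
        ay = abs(now[1] - self.last_move[1])
        new_degrees = math.degrees(math.atan2(ay, ax))
        if abs(new_degrees - self.old_degrees) <= DEGREE_CHANGE:  # test 2
            return False
        self.old_degrees = new_degrees
        return True

    def process(self, x: int, y: int) -> bool:
        now = (max(x, 0), max(y, 0))  # clamp negative coordinates to 0
        logged = self._should_log(now)
        if logged:
            self.last_written = now
        self.last_move = now          # updated whether logged or not
        return logged


def compress(points: List[Point]) -> List[Point]:
    """Return the (clamped) points that would be sent to the server."""
    comp = MouseCompressor()
    return [(max(x, 0), max(y, 0)) for x, y in points if comp.process(x, y)]


print(compress([(0, 0), (1, 0), (5, 0), (10, 0), (10, 5), (10, 10)]))
# [(0, 0), (10, 5)]
```

In this example the horizontal stroke after the first point is dropped because its direction (0 degrees) never differs from the initial direction by more than 5 degrees; only the turn at (10, 5) passes both tests.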

Figures 12, 13, 14 and 15 show examples of the mouse compression algorithm.

The original path is painted in black, while the path we got from the compressed file is painted in red.

Figure 12 shows the values we used; the others show what kind of differences you can get, depending on how much accuracy you need. The complete dataset contained 440 mouse movements (without any compression). The Python script in appendix D.2 was used to compress each file, and another Python script in appendix D.3 was used to draw the graphs.

These graphs were generated from mouse movements made with a touchpad, which differ slightly from movements made with an external mouse. In our experience, an external mouse gives higher accuracy with fewer stored points. This is most likely because touchpad movements tend to start and stop a lot, while movements with an external mouse tend to be much smoother.

It is also important to note that there is no guarantee of how effective the mouse compression will be. We did some limited testing on several systems, trying to make it as realistic as possible. The only guarantee is that the result will be as accurate as the boundary values we have set and the mouse events we receive allow.

In our experiments, about 35 % of the logs consisted of mouse movements after compression. So in a file of 1 MB11, 350 KB will be mouse movements when compression is applied. Without compression, the mouse movements would take up about 1167 KB (350/30 × 100), assuming that we store 30 % of the original data. The file size would then be about 1.817 MB ((1000 - 350) + 1167).
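The estimate above can be written as a small function so that other percentages are easy to try. The helper name is hypothetical; the inputs (a 1000 KB file, 35 % mouse movements after compression, 30 % of the original movements kept) are the figures from the text.

```python
def uncompressed_estimate(total_kb: float, move_pct: float, kept_pct: float):
    """Estimate mouse-move size and total file size if compression were turned off."""
    moves_compressed = total_kb * move_pct / 100        # mouse moves in the compressed log
    moves_original = moves_compressed * 100 / kept_pct  # undo the compression ratio
    other = total_kb - moves_compressed                 # everything that is not a mouse move
    return moves_original, other + moves_original


moves, total = uncompressed_estimate(1000, 35, 30)
print(round(moves), round(total))  # 1167 1817
```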

When the application runs in debug mode, it generates log files for all mouse movements, both with and without compression. This makes it easy both to test the compression algorithm and to try new values. Read more about this in the manual in appendix A.

5.4.2 Relation between events

When we see a hardware event (a mouse or key event), it either belongs to something or is an extension of another event. A mouse down event, for example, belongs to a window, while a mouse up event is a regular extension of the last mouse down event. The same logic applies to key events. Software events work the other way around: they are caused by a mouse event or a key event, so their relation points to what caused the event.

The relation value is the ID of a previous event, so it always points backwards. All events are placed in one of the following categories:

1. User event tied to window
2. Indirect user event
3. Independent
4. Software event

The first category contains events that differ depending on which software element they belong to. It consists of the following events:

• Key down

• Mouse down

• Mouse wheel

The second category contains events that are a direct consequence of an event in the first category. It consists of the following events:

• Key up

• Mouse up

A key up or mouse up event should always follow the corresponding key down or mouse down event.

The third category contains only mouse move, since it is independent of other events and does not necessarily belong to any window.

The software category contains all the UI Automation events. Whenever we get a software event, we assume it happened because of a user event.12
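The four categories above, combined with the relation rules described in the rest of this section and summarized in Table 10, can be sketched in Python. This is a minimal sketch under stated assumptions, not BeLT's implementation: the event kind names, the `Event` fields and the `is_window` flag (marking a software event that activates a window, used here as "the last active window") are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Category(Enum):
    USER_TIED_TO_WINDOW = auto()  # key down, mouse down, mouse wheel
    INDIRECT_USER = auto()        # key up, mouse up
    INDEPENDENT = auto()          # mouse move
    SOFTWARE = auto()             # UI Automation events


CATEGORY = {
    "key_down": Category.USER_TIED_TO_WINDOW,
    "mouse_down": Category.USER_TIED_TO_WINDOW,
    "mouse_wheel": Category.USER_TIED_TO_WINDOW,
    "key_up": Category.INDIRECT_USER,
    "mouse_up": Category.INDIRECT_USER,
    "mouse_move": Category.INDEPENDENT,
    "software": Category.SOFTWARE,
}


@dataclass
class Event:
    event_id: int
    kind: str                  # one of the (hypothetical) keys in CATEGORY
    key: Optional[str] = None  # character, for key events
    is_window: bool = False    # software event that activates a window (assumption)


class RelationTracker:
    """Assigns a relation value (ID of an earlier event, or 0) to each event."""

    def __init__(self) -> None:
        self.last_window = 0
        self.last_user_event = 0
        self.last_mouse_down = 0
        self.button_held = False
        self.last_move = 0   # reset to 0 when other user input ends a series of moves
        self.key_downs = {}  # character -> ID of the last key down with that character

    def relate(self, ev: Event) -> int:
        cat = CATEGORY[ev.kind]
        if cat is Category.SOFTWARE:
            relation = self.last_user_event  # educated guess: the last user event
            if ev.is_window:
                self.last_window = ev.event_id
            return relation

        if cat is Category.USER_TIED_TO_WINDOW:
            relation = self.last_window
            if ev.kind == "key_down":
                self.key_downs[ev.key] = ev.event_id
            elif ev.kind == "mouse_down":
                self.last_mouse_down = ev.event_id
                self.button_held = True
        elif ev.kind == "key_up":
            relation = self.key_downs.get(ev.key, 0)
        elif ev.kind == "mouse_up":
            relation = self.last_mouse_down
            self.button_held = False
        else:  # mouse move: dragging, continuing a series, or starting a new one
            relation = self.last_mouse_down if self.button_held else self.last_move

        # Any user input other than a mouse move ends the current series of moves.
        self.last_move = ev.event_id if ev.kind == "mouse_move" else 0
        self.last_user_event = ev.event_id
        return relation
```

Software events neither start nor end a series of mouse moves here, since the text only lets user input reset the mouse move relation.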

11Whenever we talk about MB and KB in this paragraph, we mean 10^6 and 10^3 bytes, respectively.

12This is not always the case: the operating system does things at regular intervals, plugging in a USB stick may generate an event, and so on. We ignore that here for simplicity.

For all events in the first category, the relation flag points to the last active window.

For events in the second category, the relation flag points to their counterpart in the first category. A mouse move points to zero if it is the first in a series of mouse moves; otherwise it points to the last mouse move we saw. If a mouse button is held down, the relation flag points to that mouse down event, since the movement is a dragging action.

The relation flag on software events always points to the last event in category 1, 2 or 3 (all except software events). This is the flag we are least certain about, since we have multiple threads and there might be a delay before a new process starts or a new window opens.

Table 10 summarizes the discussion above. A key down event is tied to a software window. A key up event is tied to the last key down event with the same character. A mouse down event is tied to a software window, and a mouse up event is tied to the last mouse down event. A mouse wheel event is tied to the last active software window.

A mouse move is tied to the mouse down event if the mouse button is still pressed, otherwise to the last mouse move, or to nothing if other user input has happened since the last mouse move or mouse down.

For software events, we make an educated guess at what generated them, so the relation flag points to the last user event, which can be a mouse move, any mouse press or any key press. A guess is all we can do here; there is no reliable way, that we know of, to correlate software events with mouse and key events. After discussing this with our employer, this was considered satisfactory, because they would have to do the same.

Summary of possible relation values

Type       Action   Relation
Key        Down     Software
Key        Up       Key Down
Mouse      Wheel    Software
Mouse      Down     Software
Mouse      Up       Mouse Down
Mouse      Move     Mouse Down
Mouse      Move     Mouse Move
Mouse      Move     0
Software   All      Last user event

Table 10: Table of events and their corresponding relationship

5.4.3 Area around a mouseclick

Whenever you get a mouseclick, it belongs to something, typically a button, a checkbox or a similar element. All these elements are rectangles, so whenever you receive a mouse press, you know that it belongs to one of these rectangles. Useful properties to know are whether the user clicked in the middle of the rectangle, on the left side, and so on.

Every time we see a mouseclick, we try to find out which area it belongs to, see table 6. To do this, we look at the last rectangle we saw, which came in the form of a software event, and assume that the mouseclick belongs to that rectangle. This assumption has some uncertainties:

1. The rectangle on mouse down events will usually point to the wrong place, because you first get a mouse down, then a software event, then a mouse up event13. The rectangle from mouse down is therefore unreliable, and the rectangle from mouse up should generally be used, but only if there is a software event between them. Timestamps could be used to increase the likelihood of determining whether it is correct.

2. Normally we receive a mouse event and then the software event, but because we have multiple threads, events may arrive out of order. For example, if a mouse down event generates a software event, we would like to see mouse down, then software, then mouse up; if the software event arrives after the mouse up event, both mouse events will reference the wrong rectangle. Out-of-order events can probably be detected during analysis, because their timestamps will also be out of order.

3. Not all mouseclicks generate an event, and if an event is generated because of the mouseclick, it may affect a different part of the screen. If you click an item in the taskbar, for example, it restores a window somewhere else on the screen. In these cases the rectangle will be wrong, but in most cases you will be able to detect it, because the mouseclick is not inside the rectangle.
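The area lookup, the inside-the-rectangle check from point 3 and the rectangle choice from point 1 can be sketched as follows. The 3×3 split into top/middle/bottom and left/center/right is only an assumption for illustration and is not necessarily the set of areas in table 6; `Rect`, `click_area` and `rectangle_for_click` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rect:
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def click_area(rect: Rect, x: int, y: int) -> Optional[Tuple[str, str]]:
    """Return (row, column) of the click in an assumed 3x3 grid, or None if outside."""
    if not rect.contains(x, y):
        return None  # rectangle probably belongs to another element (point 3)
    w = rect.right - rect.left
    h = rect.bottom - rect.top
    col = 3 * (x - rect.left) // w  # 0, 1 or 2 because x < right
    row = 3 * (y - rect.top) // h
    return ("top", "middle", "bottom")[row], ("left", "center", "right")[col]


def rectangle_for_click(up_rect: Optional[Rect], software_event_between: bool) -> Optional[Rect]:
    """Point 1: trust the rectangle seen at mouse up only if a software event
    arrived between mouse down and mouse up; otherwise report none."""
    return up_rect if software_event_between else None


button = Rect(0, 0, 90, 30)
print(click_area(button, 45, 15))  # ('middle', 'center')
print(click_area(button, 100, 5))  # None
```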

13This is the general case; sometimes the event occurs after mouse up.
