Any given track T contains (by its definition) one or more elements. All elements have the properties of position, length and value. In addition, elements may have distance properties with regard to other elements. Thus, the challenge becomes to define measures which capture the available information afforded by the properties of T.

2.2.1 Capturing properties

The strategy of section 2.1.2 states that a sample $s_i$ is created for every position i in a track T, providing a value $s_{i,j}$ for every j-th measure in the generated matrix representation ${}^{|T|}S_n$ ($1 \le i \le n$, $1 \le j \le |T|$, $s \in S$). Ideally, all «available» information should be accessible at any position i in T. A list¹ of standard properties could be:

• The current position i.

• A condition, determining if the current position is open or occupied.

• A condition, determining if the current position is a point or a segment.

• A condition, determining if the current position is valued or unvalued.

• The assigned value v.

• The length of the element which occupies the current position i. If i is open, then no element is present and the element length is undefined.

• The distance to the preceding element. If no preceding element exists, the distance is undefined.

• The distance to the subsequent element. If no subsequent element exists, the distance is undefined.
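The listed properties can be gathered per position, as in the following minimal sketch. The names `Element`, `Properties` and `properties_at` are hypothetical and not taken from the text. The sketch also assumes that a point occupies exactly one position, that elements are sorted and non-overlapping, and that distances are counted in positions from the current position i. Undefined properties are represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical track element occupying positions start .. start + length - 1."""
    start: int
    length: int = 1                 # assumption: a point occupies exactly one position
    value: Optional[float] = None   # None means the element is unvalued


@dataclass(frozen=True)
class Properties:
    """The standard properties available at one position; None means undefined."""
    position: int
    occupied: bool
    is_point: Optional[bool]
    valued: bool
    value: Optional[float]
    length: Optional[int]
    dist_prev: Optional[int]
    dist_next: Optional[int]


def properties_at(i: int, elements: List[Element]) -> Properties:
    """Collect the properties for position i; elements must be sorted and non-overlapping."""
    current = prev_end = next_start = None
    for e in elements:
        end = e.start + e.length - 1
        if e.start <= i <= end:
            current = e
        elif end < i:
            prev_end = end          # the last match is the nearest preceding element
        elif next_start is None:
            next_start = e.start    # the first match is the nearest subsequent element
    return Properties(
        position=i,
        occupied=current is not None,
        is_point=None if current is None else current.length == 1,
        valued=current is not None and current.value is not None,
        value=None if current is None else current.value,
        length=None if current is None else current.length,
        dist_prev=None if prev_end is None else i - prev_end,
        dist_next=None if next_start is None else next_start - i,
    )


# A segment on positions 2..4 with value 0.5, followed by an unvalued point at position 8.
track = [Element(start=2, length=3, value=0.5), Element(start=8)]
print(properties_at(6, track))
```

Position 6 is open in this example, so its length and value are undefined while both distances are defined.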

2.2.2 Using properties to create measures

A measure can make use of as many of the available properties as it desires. Any measure is only limited by the available properties and the creativity of the user.

Designing «all» measures which might become useful is an overwhelming task.

However, enabling dynamic creation of measures on a case-to-case basis might be a more fruitful path to follow.
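One way to picture dynamic creation of measures is a small factory which combines named properties at the time a measure is needed. This is only a sketch: the helper `make_measure`, the property names and the rule that an undefined input gives an undefined output are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

Props = Dict[str, Optional[float]]            # property name -> value, None if undefined
Measure = Callable[[Props], Optional[float]]


def make_measure(combine: Callable[..., float], *names: str) -> Measure:
    """Build a measure which feeds the named properties into `combine`."""
    def measure(props: Props) -> Optional[float]:
        args = [props.get(name) for name in names]
        if any(a is None for a in args):
            return None                       # assumption: undefined in, undefined out
        return combine(*args)
    return measure


# Two measures created on a case-to-case basis from the same properties.
nearest = make_measure(min, "dist_prev", "dist_next")
value_per_length = make_measure(lambda v, n: v / n, "value", "length")

sample = {"dist_prev": 2, "dist_next": 5, "value": 6.0, "length": 3}
print(nearest(sample), value_per_length(sample))
```

New measures then require no change to the code which extracts the properties.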

2.2.3 Measurement utilization

Creating «good» features plays a key role in retrieving information from genomic data to learn concepts for classification. A good feature [18] collects unique data which is not yet captured by any other feature (low feature-feature inter-correlation), while having a high correlation to a concept (high feature-class correlation).
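The two kinds of correlation can be illustrated with a small numeric sketch. The text does not state which correlation measure [18] uses, so the Pearson coefficient below is an assumption, and the feature columns and class labels are made-up toy data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [2.0, 4.0, 6.0, 8.0]    # a rescaled copy of feature_a, so it adds nothing new
labels = [0.0, 0.0, 1.0, 1.0]       # toy class labels

print(pearson(feature_a, labels))     # feature-class correlation
print(pearson(feature_a, feature_b))  # feature-feature inter-correlation
```

In this toy case `feature_b` is fully correlated with `feature_a`, so keeping both would not add information about the class.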

¹ The list does not claim to be complete. More «available» properties may be added in the future.

The features are grouped by their nature: features within the same group behave in a common way. The groups are «distance», «value» and «condition».

Distance group

Features of the «distance» group deal with distances between genomic elements (points and segments), as well as any other range-related distances.

Value group

Features of the «value» group target function tracks, where all positions inside the track are connected, occupied and valued.

Condition group

Features of the «condition» group output only a discrete number of values. The theoretical minimum number of values is two, in order to represent the presence or absence of a property.

2.2.4 Transformations

A transformation t adds an (optional) extra layer of flexibility and reuse to a measure m. It is a function which enables the output of m to be changed dynamically, rather than statically (programmatically). It is optional, because not all measures need to change their output. The purpose of t is to «map» a single real number $x_{\text{unmapped}}$ to a corresponding real number $x_{\text{mapped}}$, using a mapper. Concretely, if t is added to m, then any output $x_{\text{unmapped}}$ of m is mapped into $x_{\text{mapped}}$ before it is stored in the dataset S.

$x_{\text{mapped}} = T(x_{\text{unmapped}}) \mid x_{\text{unmapped}}, x_{\text{mapped}} \in \mathbb{R}$

A mapper can be either static or dynamic, depending on whether its inner (mapper) variables are regulated by m. Variables used by the mapper of t may be set either dynamically or statically. A transformation is static only if all its variables are static; otherwise, it is dynamic. A dynamic variable is a variable which may be set by m at runtime², whereas a static variable cannot and must be set before runtime. A dynamic variable must always have a specified default value, in case it is not set at runtime.

Regardless of whether a mapper is static or dynamic, the output must be a consistent equivalence relation to be reliable. All flexibility must be achieved by using a combination of the available structures, and not left to a single structure.

Thus, once a dynamic variable has been used at runtime, it may not be changed in a way which alters this relation.
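The distinction between static and dynamic variables can be sketched as follows. The class `Transformation` and its parameters are hypothetical. The sketch assumes that static variables are fixed at construction, that dynamic variables carry a mandatory default, and that a measure passes runtime values as keyword arguments. It does not enforce the rule that a used dynamic variable may not change.

```python
from typing import Callable, Dict, Optional


class Transformation:
    """Hypothetical transformation: a mapper plus its static and dynamic variables."""

    def __init__(self, mapper: Callable[..., float],
                 static: Optional[Dict[str, float]] = None,
                 dynamic_defaults: Optional[Dict[str, float]] = None) -> None:
        self.mapper = mapper
        self.static = dict(static or {})             # set before runtime only
        self.dynamic = dict(dynamic_defaults or {})  # every dynamic variable has a default

    @property
    def is_static(self) -> bool:
        """A transformation is static only if it has no dynamic variables."""
        return not self.dynamic

    def __call__(self, x: float, **runtime: float) -> float:
        unknown = set(runtime) - set(self.dynamic)
        if unknown:
            raise KeyError(f"not dynamic variables: {sorted(unknown)}")
        # Runtime values override defaults; static variables cannot be overridden.
        return self.mapper(x, **{**self.static, **self.dynamic, **runtime})


double = Transformation(lambda x, factor: x * factor, static={"factor": 2.0})
relative = Transformation(lambda x, span: x / span, dynamic_defaults={"span": 1.0})

print(double(3.0), relative(5.0), relative(5.0, span=10.0))
```

Here `double` is static, while `relative` is dynamic and falls back to its default `span` when the measure supplies none.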

² Runtime is when the program executes.

There is no theoretical upper limit to how many transformations can be added to m. Every transformation which is added to m is chained, so the output of one transformation simply becomes the input of the next. By keeping track of the order in which the n transformations are added, a mapping «pipeline» of all transformations would look like:

$x_{\text{mapped}} = T_n(T_{n-1}(\ldots T_1(x_{\text{unmapped}}))), \quad \text{where } |T| = n$
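The chaining of transformations in the order they were added can be sketched with a left fold. The helper name `pipeline` and the two example mappers are assumptions made for illustration.

```python
from functools import reduce
from typing import Callable

Mapper = Callable[[float], float]


def pipeline(*transformations: Mapper) -> Mapper:
    """Chain transformations so that the first one added is applied first."""
    def run(x_unmapped: float) -> float:
        # reduce feeds the output of one transformation into the next one.
        return reduce(lambda x, t: t(x), transformations, x_unmapped)
    return run


add_one = lambda x: x + 1.0    # plays the role of T_1
times_two = lambda x: x * 2.0  # plays the role of T_2

mapped = pipeline(add_one, times_two)
print(mapped(3.0))  # T_2(T_1(3.0)) = (3.0 + 1.0) * 2.0 = 8.0
```

An empty pipeline returns its input unchanged, which matches a measure without any transformations.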

A transformation t could also be configured to respond differently to various situations and still uphold the equivalence relation, which is yet another way of affording flexibility. A situation s is a conditional evaluation offered by m for every output as a dynamic variable. By its configuration, t may respond by either executing or skipping its mapping, so transformations can be configured to execute only in certain situations. However, t should execute in all situations by default if not explicitly configured otherwise.
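A situation-aware transformation might be sketched as below. The class name, the string-valued situations and the choice to pass the value through unchanged when the mapping is skipped are all assumptions; the text only states that t either executes or skips, and executes by default.

```python
from typing import Callable, Iterable, Optional


class SituationalTransformation:
    """Hypothetical transformation which executes only in configured situations."""

    def __init__(self, mapper: Callable[[float], float],
                 execute_in: Optional[Iterable[str]] = None) -> None:
        self.mapper = mapper
        # None means "not explicitly configured": execute in all situations.
        self.execute_in = None if execute_in is None else frozenset(execute_in)

    def __call__(self, x: float, situation: str) -> float:
        if self.execute_in is not None and situation not in self.execute_in:
            return x               # skip: assumed to leave the value unchanged
        return self.mapper(x)      # execute the mapping


always = SituationalTransformation(lambda x: -x)
only_upstream = SituationalTransformation(lambda x: -x, execute_in={"upstream"})

print(always(4.0, "downstream"))
print(only_upstream(4.0, "upstream"), only_upstream(4.0, "downstream"))
```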

2.2.5 Transformation utilization

Transformations may be generalized into groups by the nature of their operations, e.g. exponential, polynomial, etc. Operations should aim to enhance certain genomic behaviour, in a generic and flexible way.

Favouring transformation

A favouring transformation aims to favour one part of a genomic range over another part of the same range. The favouring may be beneficial when there are insights into how certain areas should be weighted over other areas.

Relativity transformation

A relativity transformation makes use of the mathematical similitude property to compare values found in one situation with those of another, similar situation on a relatively equal scale.

Condition transformation

A condition transformation «clamps» a measurement range into a discrete number of values. Thresholds are used to select among the discrete values.
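Threshold-based clamping can be sketched with a sorted threshold list and a binary search. The function name, the convention that a value equal to a threshold falls into the upper bin, and the example thresholds are assumptions.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def condition_transformation(thresholds: Sequence[float],
                             outputs: Sequence[float]) -> Callable[[float], float]:
    """Map a real number to one of len(thresholds) + 1 discrete output values."""
    cuts = list(thresholds)
    if cuts != sorted(cuts):
        raise ValueError("thresholds must be sorted in ascending order")
    if len(outputs) != len(cuts) + 1:
        raise ValueError("need exactly one output per interval")

    def mapper(x: float) -> float:
        # bisect_right counts the thresholds which are <= x, which is the bin index.
        return outputs[bisect_right(cuts, x)]
    return mapper


# Two discrete values: 1.0 below the threshold, 0.0 at or above it.
near = condition_transformation([100.0], [1.0, 0.0])
# Three discrete values selected by two thresholds.
level = condition_transformation([10.0, 100.0], [0.0, 1.0, 2.0])

print(near(50.0), near(250.0), level(42.0))
```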

Logarithmic transformation

A logarithmic transformation aims to help distinguish between distinct values which lie relatively close to each other. It aims to make the distance between relatively close elements smaller, while keeping relatively distant elements relatively further away.
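One possible form of such a mapper is sketched below. The text does not specify the base or any offset, so the natural logarithm of 1 + x (which maps zero to zero) and the restriction to non-negative input are assumptions. The mapping is monotonic, so the ordering of values is preserved while large magnitudes are compressed.

```python
import math


def log_transformation(x: float) -> float:
    """Map a non-negative value x to ln(1 + x); assumed form, not taken from the text."""
    if x < 0:
        raise ValueError("this sketch is only defined for non-negative values")
    return math.log1p(x)


# Hypothetical distances spanning several orders of magnitude.
for distance in (0.0, 9.0, 99.0, 999.0):
    print(distance, log_transformation(distance))
```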