

– Already implemented, just requires some planning and data modelling.

– Fast

– Transparent to the user

– If used with an HSM it can be made certifiably secure.

– Maturing technology, unlike a from-scratch implementation.

6.8.4 Cons

– Relies on the policy engine being fast enough

– On HDFS it cannot offer granularity finer than per-file.

6.8.5 Viability

“Very small” and “Small” data

Full-disk encryption schemes like Microsoft BitLocker [Kor09] (used by Windows) or Apple FileVault [CGM13] can use this to encrypt user home directories with keys other than the main system's. In a sense, this kind of access control is already in use at a very coarse level.

“Big” data

This should be obvious. As it is a solution developed specifically for the big data field, it must be suitable to some degree, at least in the eyes of its designers. Its advantage is that it intervenes at a very early stage and performs simple access control checks. The only problem that may arise is that the single policy node becomes a bottleneck. The counterargument is that HDFS has been doing just fine with a single active NameNode.

6.9 Finding a balance

The problem with all of these solutions is to find a balance between practicality and … The solution using AES scores high on speed, but the number of keys generated is so high that key management becomes a big data problem in its own right.

The solution using ABE scores low on speed because it takes a long time to decrypt each record. For a quick test on the Core i5 used in the preliminary tests (mentioned in section 6.5), the command “openssl speed -evp aes-128-cbc” gave “650820.84kB/s (16B blocks)”. For a 40 ms en/decryption to take up half the crypto-time, each record would have to be about 650820 · 40 = 26032800 bytes (roughly 26 MB) long, since 1 kB/s corresponds to one byte per millisecond.
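The break-even record size can be checked with a few lines of Python. This is only a sketch of the arithmetic: the throughput and the 40 ms figure are the values quoted in the text, the function name is made up for illustration, and it assumes the OpenSSL convention that "k" means 1000 bytes.

```python
# Throughput reported by `openssl speed -evp aes-128-cbc` (16-byte blocks),
# as quoted in the text. OpenSSL reports sizes in units of 1000 bytes.
AES_THROUGHPUT_KB_PER_S = 650820.84
# The 40 ms en/decryption cost quoted in the text.
FIXED_COST_SECONDS = 0.040


def break_even_record_bytes(throughput_kb_per_s: float, fixed_cost_s: float) -> float:
    """Record size at which the AES time equals the fixed per-record cost.

    At that size the fixed cost makes up half of the total crypto-time.
    (Hypothetical helper, not part of any library.)
    """
    bytes_per_second = throughput_kb_per_s * 1000
    return bytes_per_second * fixed_cost_s


size = break_even_record_bytes(AES_THROUGHPUT_KB_PER_S, FIXED_COST_SECONDS)
print(f"break-even record size: {size / 1e6:.1f} MB")
```

Under these assumptions a record has to be on the order of 26 MB before the AES work matches the fixed 40 ms cost, which is far larger than a typical record.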

There is no reason to grant row-level access to all data, so the easiest solution to this problem is to reduce the number of keys in use. For instance with an ABE scheme the same AES key can be used for a number of records with the same attribute combination. The same goes for an AES solution. The problem is then going to be the number of access control combinations possible, because the unit doing the encryption needs to keep track of a key for every combination it has encrypted recently.
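The bookkeeping described above can be sketched as a small cache that hands out one AES key per attribute combination and forgets combinations that have not been used recently. This is a minimal illustration, not the design used in this work: the class name, the least-recently-used eviction policy, and the capacity are all assumptions, and a real system would also have to wrap each key (for instance under ABE or RSA) before storing it with the data.

```python
import secrets
from collections import OrderedDict


class KeyCache:
    """One AES key per attribute combination, with least-recently-used eviction.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity: int, key_bytes: int = 16) -> None:
        self._capacity = capacity
        self._key_bytes = key_bytes  # 16 bytes = AES-128
        self._keys = OrderedDict()   # frozenset of attributes -> key

    def key_for(self, attributes) -> bytes:
        # The order of attributes must not matter, so use a frozenset as the lookup key.
        combo = frozenset(attributes)
        if combo in self._keys:
            self._keys.move_to_end(combo)  # mark as recently used
            return self._keys[combo]
        key = secrets.token_bytes(self._key_bytes)
        self._keys[combo] = key
        if len(self._keys) > self._capacity:
            # Drop the combination that has gone unused the longest.
            # If it shows up again, a fresh key is generated for it.
            self._keys.popitem(last=False)
        return key

    def __len__(self) -> int:
        return len(self._keys)


cache = KeyCache(capacity=2)
first = cache.key_for({"hr", "finance"})
again = cache.key_for(["finance", "hr"])  # same combination, same key
print(first == again, len(cache))
```

The memory needed by such a cache grows with the number of distinct combinations seen recently, which is why the size of the attribute space matters.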

As an example for ABE: if the master key has 50 attributes, the number of possible combinations is $2^{50}$. This is not feasible. If we know a bit about the attribute space, it is possible to reduce this number. For instance, if we knew that about three attributes were present on average, the number of combinations could be estimated using a binomial coefficient to be approximately:

\[ \binom{50}{3} = 19600 \]

Within an order of magnitude, worst case scenario. If these 50 attributes actually belong to more distinct attribute spaces it turns out that:

25

So it is beneficial to split up these spaces. In the case of ABE the master key size may also impact encryption/decryption performance in some schemes, even though the GPSW tests in section 5.3 did not indicate this.
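The effect of restricting and splitting the attribute space can be illustrated numerically. The first two numbers follow directly from the text (all subsets of 50 attributes, and exactly three attributes out of 50). The split case is an illustrative assumption, not a figure from the text: it supposes two disjoint spaces of 25 attributes each, with every record drawing its three attributes from a single space.

```python
from math import comb

N_ATTRIBUTES = 50

# Every subset of the 50 attributes is a possible combination.
all_subsets = 2 ** N_ATTRIBUTES

# Exactly three attributes chosen out of 50, as in the estimate in the text.
three_of_fifty = comb(N_ATTRIBUTES, 3)

# Assumption for illustration: two disjoint spaces of 25 attributes,
# each record using three attributes from one space only.
three_of_split = 2 * comb(25, 3)

print(all_subsets)     # 1125899906842624
print(three_of_fifty)  # 19600
print(three_of_split)  # 4600
```

Under that assumption the number of combinations to track drops by roughly a factor of four compared to a single space of 50 attributes.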

In the following sections several ways of optimising these solutions are presented and analysed for strengths and weaknesses. For good measure the same analysis is applied to the existing Apache Ranger, which turns out to be the best solution for a lot of use cases.

6.10 Conclusion

Although the prospect of using cryptography as an access control mechanism may be appealing in principle, it would require a whole new generation of data processing tools to be developed in order to take advantage of it. It offers no tangible advantages over the tools available today apart from a higher level of security. For these reasons it is unlikely that cryptographic access control mechanisms like the ones described here will ever be commonly deployed. If they do get implemented, it is going to be because of a spread of mistrust in Platform as a Service (PaaS) providers, or stricter regulations.

All access control schemes require a certain level of planning. This defies the principle of a data lake, but it cannot be helped: as long as the data is legally considered sensitive, there is a legal requirement for planning.

Modern cryptographic schemes like ABE offer new features that can be used in new ways, but at the cost of processing power. In a course on cryptography one might be told that RSA is a slow cipher, but compared to the GPSW algorithm in chapter 5 it is extremely fast.


Table 6.1: Summary of the schemes described in this chapter

| Scheme | Pros | Cons |
|---|---|---|
| All AES | Fast encryption/decryption. | A lot of keys, thus key management/mapping problems. Encryptor must have the decryption key. |
| All ABE | One decryption key per “user”. | Still a lot of keys to keep track of. |
| Key Reuse (RSA) | Encryptor only has encryption key for extensive time periods. Can be scaled to a good performance compromise. | Still a lot of keys to keep track of. |
| Key Reuse (ABE) | One decryption key per “user”. | |

Chapter