Implementing Custom Field Encryption with Entity Framework Core
In this article, we explore the intricacies of developing a customized field encryption strategy leveraging Entity Framework Core, focusing on the rationale, design choices, and performance enhancements involved.
Database field encryption is a vital way of safeguarding both our data and that of our clients. Many database systems, like SQL Server, offer built-in field encryption capabilities such as ‘Always Encrypted,’ but there is no standard across database systems, and these features are often not customizable enough for complex requirements.
This article discusses our specific requirements that prompted us to create a bespoke encryption solution for our .NET application. We will delve into the design choices made, the coding process, and the performance optimizations implemented. The complete code is available on my GitHub.
Sample Application
We will demonstrate the field encryption implementation using a sample application that incorporates a FinancialEntity class, which contains sensitive financial information stored in a dictionary format.
While SQL Server will serve as the demonstration platform, the approach we present is applicable to any database that supports Entity Framework. The sensitive financial amounts are serialized as JSON prior to being stored.
Before diving into the coding aspect, it is essential to outline the encryption requirements that led us to devise our own field encryption solution.
Encryption Requirements
- Only specific columns/fields necessitate encryption, all of which are text-based fields containing confidential financial information in JSON format.
- Diverse logical groups/rows of data should utilize different encryption keys, accommodating scenarios like multi-tenant environments or ‘bring your own key’ provisions.
- Key rotation should be both quick and straightforward.
- The solution must support various encryption methods and algorithms across different data rows.
- Compatibility across multiple Entity Framework database providers, such as SQL Server, PostgreSQL, and SQLite, is essential.
Envelope Encryption
We want key management and rotation to be as simple as possible. If all of the data has to be decrypted and re-encrypted every time a key is rotated, rotation becomes slow and unreliable. Many cloud providers use envelope encryption to avoid this.
In envelope encryption, we categorize keys as follows:
- Data Encryption Key (DEK) – This key encrypts the data itself and does not change when keys are rotated.
- Key Encryption Key (KEK) – This key is used to encrypt the DEK, a process referred to as wrapping.
The wrapped DEK can be securely stored alongside the encrypted data, while the KEK must be kept secure and separate. Ideally, a Hardware Security Module (HSM) or a key management solution like Azure Key Vault or AWS KMS should safeguard your KEKs, providing the ability to wrap and unwrap keys without exposing the underlying KEK, thereby enhancing overall security.
When key rotation is needed, we simply unwrap the DEK using the old KEK and wrap it with the new KEK, without modifying our data. This approach allows for various methods of segregating DEKs and KEKs based on tenant requirements and chosen encryption algorithms.
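The rotation step described above can be sketched as follows. This is a minimal illustration, not the article's implementation: `LocalKeyWrapper` is a hypothetical class that holds the KEK in memory and wraps the DEK with AES-GCM, whereas a production system would delegate wrapping to an HSM or a key service. It assumes .NET 8 or later for the `AesGcm` constructor that takes a tag size.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical in-memory key wrapper, for illustration only.
// In production the KEK would never leave an HSM or key vault.
public static class LocalKeyWrapper
{
    private const int NonceSize = 12; // AES-GCM nonce length in bytes
    private const int TagSize = 16;   // AES-GCM authentication tag length in bytes

    // Wrapped layout: nonce | tag | encrypted DEK
    public static byte[] Wrap(byte[] kek, byte[] dek)
    {
        var nonce = RandomNumberGenerator.GetBytes(NonceSize);
        var cipher = new byte[dek.Length];
        var tag = new byte[TagSize];
        using var gcm = new AesGcm(kek, TagSize);
        gcm.Encrypt(nonce, dek, cipher, tag);

        var wrapped = new byte[NonceSize + TagSize + cipher.Length];
        nonce.CopyTo(wrapped, 0);
        tag.CopyTo(wrapped, NonceSize);
        cipher.CopyTo(wrapped, NonceSize + TagSize);
        return wrapped;
    }

    public static byte[] Unwrap(byte[] kek, byte[] wrapped)
    {
        ReadOnlySpan<byte> span = wrapped;
        var dek = new byte[wrapped.Length - NonceSize - TagSize];
        using var gcm = new AesGcm(kek, TagSize);
        gcm.Decrypt(span[..NonceSize], span[(NonceSize + TagSize)..], span.Slice(NonceSize, TagSize), dek);
        return dek;
    }

    // Key rotation: only the small wrapped DEK is rewritten, the encrypted data is untouched.
    public static byte[] Rotate(byte[] oldKek, byte[] newKek, byte[] wrapped) =>
        Wrap(newKek, Unwrap(oldKek, wrapped));
}
```

Because `Rotate` only touches the wrapped DEK, its cost does not depend on how many rows that DEK protects.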
Entity Framework Value Converters
Custom Entity Framework ValueConverters enable us to manage the serialization of data to and from the database for individual properties within our entities. This is an optimal way to address our encryption needs without requiring significant modifications to the rest of our application. A ValueConverter facilitates compatibility across any Entity Framework database provider.
Initially, our ValueConverter was employed to serialize our dictionary of financial data into JSON format for storage as text in the database. Any necessary ValueConverters can be configured using an Entity Framework ‘code-first’ IEntityTypeConfiguration.
Typically, when employing a custom ValueConverter, it is also necessary to instruct Entity Framework on how to compare instances for equality using a ValueComparer.
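As a rough sketch of that starting point, the following shows a JSON ValueConverter, a matching ValueComparer, and the configuration class that registers them. The class and property names (`AmountsJsonConverter`, `AmountsComparer`, `Amounts`) are assumptions made for illustration and may not match the sample repository. The conversion logic lives in static helper methods because the converter and comparer are built from expression trees, which cannot contain calls that rely on optional arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.ChangeTracking;
using Microsoft.EntityFrameworkCore.Metadata.Builders;
using Microsoft.EntityFrameworkCore.Storage.ValueConversion;

// Hypothetical shape of the sample entity.
public class FinancialEntity
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public Dictionary<string, decimal> Amounts { get; set; } = new();
}

// Stores the dictionary as a JSON string column.
public class AmountsJsonConverter : ValueConverter<Dictionary<string, decimal>, string>
{
    public AmountsJsonConverter() : base(v => ToJson(v), v => FromJson(v)) { }

    public static string ToJson(Dictionary<string, decimal> value) =>
        JsonSerializer.Serialize(value);

    public static Dictionary<string, decimal> FromJson(string json) =>
        JsonSerializer.Deserialize<Dictionary<string, decimal>>(json)
        ?? new Dictionary<string, decimal>();
}

// Tells the change tracker how to compare, hash and snapshot the dictionary.
public class AmountsComparer : ValueComparer<Dictionary<string, decimal>>
{
    public AmountsComparer() : base(
        (a, b) => AreEqual(a, b),
        v => GetHash(v),
        v => new Dictionary<string, decimal>(v)) { }

    private static bool AreEqual(Dictionary<string, decimal>? a, Dictionary<string, decimal>? b) =>
        a is null || b is null
            ? ReferenceEquals(a, b)
            : a.Count == b.Count && !a.Except(b).Any();

    // Order-independent hash so equal dictionaries always hash the same.
    private static int GetHash(Dictionary<string, decimal> value) =>
        value.Aggregate(0, (hash, pair) => hash ^ HashCode.Combine(pair.Key, pair.Value));
}

public class FinancialEntityConfiguration : IEntityTypeConfiguration<FinancialEntity>
{
    public void Configure(EntityTypeBuilder<FinancialEntity> builder)
    {
        builder.Property(e => e.Amounts)
            .HasConversion(new AmountsJsonConverter(), new AmountsComparer());
    }
}
```

The snapshot expression copies the dictionary, so in-place changes to `Amounts` are detected when the tracked entity is saved.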
If we used a single encryption key for all data, switching to encrypted JSON would be straightforward. However, our requirements call for different Data Encryption Keys (DEKs) within the same column, as well as support for various algorithms. A query might select only a single field, so all of the metadata required to decrypt a property must be stored in the same column as the property itself. We cannot depend on looking up the DEK from another column, since that column may not be included in the query results.
Our updated ValueConverter will serialize data as a byte[], allowing it to be stored as binary in the database. Given that the data we are handling is JSON, it compresses very efficiently. Since we already require a byte[], it makes sense to also implement GZip compression.
Encryption Metadata
We understand that our encrypted data will be stored as a byte[] via a ValueConverter. We can leverage this structure to prepend any necessary metadata for decryption at the beginning of the byte[]. Upon decryption, we can simply remove the initial bytes, allowing us to handle the decryption process effectively.
The first byte is allocated to store a CompressionType Enum that indicates the type of encryption and compression utilized. This design allows us to flexibly accommodate additional compression types in the future without disrupting historical data.
We also consistently store the byte count of the original JSON. This is crucial for performance optimization, which we will discuss further. An integer is used for the byte count, requiring 4 bytes.
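The header layout described above can be sketched like this. The enum values, the `PayloadHeader` name and the little-endian byte order are assumptions for illustration; the source only states that one byte holds the CompressionType and four bytes hold the original byte count.

```csharp
using System;
using System.Buffers.Binary;

// Only GZip and GZipWithAes256 are named in the article; the numeric values are assumed.
public enum CompressionType : byte
{
    GZip = 1,
    GZipWithAes256 = 2
}

// Hypothetical helper for the 5-byte header: [type: 1 byte][original length: 4 bytes]
public static class PayloadHeader
{
    public const int Size = 1 + sizeof(int);

    public static byte[] Prepend(CompressionType type, int originalLength, ReadOnlySpan<byte> body)
    {
        var output = new byte[Size + body.Length];
        output[0] = (byte)type;
        BinaryPrimitives.WriteInt32LittleEndian(output.AsSpan(1, sizeof(int)), originalLength);
        body.CopyTo(output.AsSpan(Size));
        return output;
    }

    public static (CompressionType Type, int OriginalLength, byte[] Body) Read(byte[] stored)
    {
        var type = (CompressionType)stored[0];
        var length = BinaryPrimitives.ReadInt32LittleEndian(stored.AsSpan(1, sizeof(int)));
        return (type, length, stored[Size..]); // strip the header, keep the payload
    }
}
```

Fixing the byte order explicitly keeps stored values readable regardless of the architecture of the machine that wrote them.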
Advanced Encryption Standard (AES)
We will implement AES for encrypting our data using a 256-bit key. AES is a well-established symmetric encryption algorithm.
AES requires an Initialization Vector (IV), which plays a role similar to a salt in hashing. Using a unique IV for each encryption operation ensures that identical values yield different encrypted outputs. This ‘non-deterministic’ encryption makes it harder for an attacker to deduce the values of individual rows by comparing them with other rows in the database.
Both the DEK and IV are essential for decrypting data. Naturally, we cannot store the actual key alongside the data, so we store the DEK's ID and retrieve the key during the decryption process. If GZipWithAes256 is designated as the CompressionType, the DEK ID and IV are stored as the subsequent bytes in the byte[].
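A minimal sketch of this step, under several assumptions: the key ID is a `Guid`, AES runs in CBC mode with PKCS7 padding, and `AesFieldCipher` is a hypothetical name. It requires .NET 6 or later for `EncryptCbc` and `DecryptCbc`. The header bytes for the CompressionType and byte count are left out to keep the example focused on the key ID and IV.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
// Stored layout: key id (16 bytes) | IV (16 bytes) | ciphertext
public static class AesFieldCipher
{
    private const int KeyIdSize = 16; // size of a Guid
    private const int IvSize = 16;    // AES block size

    public static byte[] Encrypt(Guid keyId, byte[] dek, byte[] plaintext)
    {
        using var aes = Aes.Create();
        aes.Key = dek; // 32 bytes for AES-256
        var iv = RandomNumberGenerator.GetBytes(IvSize); // fresh IV for every value
        var cipher = aes.EncryptCbc(plaintext, iv);

        var output = new byte[KeyIdSize + IvSize + cipher.Length];
        keyId.TryWriteBytes(output);
        iv.CopyTo(output, KeyIdSize);
        cipher.CopyTo(output, KeyIdSize + IvSize);
        return output;
    }

    // resolveDek stands in for the keychain lookup by key id.
    public static byte[] Decrypt(byte[] stored, Func<Guid, byte[]> resolveDek)
    {
        ReadOnlySpan<byte> span = stored;
        var keyId = new Guid(span[..KeyIdSize]);
        using var aes = Aes.Create();
        aes.Key = resolveDek(keyId);
        return aes.DecryptCbc(span[(KeyIdSize + IvSize)..], span.Slice(KeyIdSize, IvSize));
    }
}
```

Encrypting the same plaintext twice with the same DEK produces outputs that share the first 16 bytes (the key ID) and differ afterwards, because each call generates a new IV.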
Encryption Components
Several classes are involved in managing encryption throughout the application. The core functionalities are primarily handled by the ValueConverter and Compressor classes.
- ValueConverter - Converts entity properties to and from byte[] for database storage, including JSON serialization.
- Compressor - Handles compression and encryption of JSON using specified CompressionType.
- Keychain - A singleton that caches DEKs and manages their lifecycle and expiry.
- Key - Represents the unwrapped DEK along with metadata regarding recent usage and expiry.
- EntityFrameworkKeyStore - Implements IKeyStore for storing wrapped DEKs in Entity Framework.
- TestKeyWrapper - A testing implementation of IKeyWrapper for wrapping/unwrapping DEKs using a KEK. In a production environment, this would involve an HSM or secure key service like Azure Key Vault or AWS KMS.
- KeyManager - A BackgroundService that initializes the Keychain singleton and refreshes in-use DEKs prior to their expiry.
Most additional components focus on managing the lifetime of keys, which may be critical depending on the use case. If you permit clients to ‘bring their own key (BYOK),’ this management layer becomes essential for expiring keys after a designated period, ensuring data access is revoked if a client withdraws access to their KEK.
Unwrapping and retrieving DEKs can be time-consuming, based on the implementations of IKeyStore and IKeyWrapper you choose. The KeyManager and Keychain work together to automatically refresh keys currently in use, functioning similarly to a cache with a sliding expiry.
Encryption Interactions
The primary interactions among the encryption classes are illustrated in the following diagram.
The Keychain caches DEKs for a specified duration, reducing the need to look them up from the database and unwrap them for every operation. It also incorporates a circuit breaker for each Key instance held in memory, limiting requests if access to the relevant KEK or Key Store is interrupted.
Encryption and Compression
Now, let's examine some code! Our FinancialEntity class requires minimal modifications. We only need to wrap any encrypted properties with an EncryptedData structure.
The EncryptedData wrapper is necessary to track any metadata required for compression and encryption, such as the CompressionType and DataEncryptionKeyId.
We must ensure our ValueConverter has all the essential metadata for decryption when reading the column from the database.
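One possible shape for this wrapper is sketched below. The generic parameter, the `Guid?` key ID and the property names are assumptions; the source only says that an EncryptedData structure tracks the CompressionType and DataEncryptionKeyId. The `record struct` syntax requires C# 10 or later.

```csharp
using System;
using System.Collections.Generic;

// Only these two values are named in the article; the numeric values are assumed.
public enum CompressionType : byte
{
    GZip = 1,
    GZipWithAes256 = 2
}

// Hypothetical wrapper: the value plus the metadata the converter needs to write it.
public readonly record struct EncryptedData<T>(
    T Value,
    CompressionType CompressionType,
    Guid? DataEncryptionKeyId = null);

public class EncryptedFinancialEntity
{
    public Guid Id { get; set; } = Guid.NewGuid();

    // The sensitive dictionary is wrapped rather than exposed directly.
    public EncryptedData<Dictionary<string, decimal>> Amounts { get; set; }
}
```

With this shape, each row can choose its own DEK and algorithm simply by constructing the wrapper with different metadata.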
Here is how our data appears in the database now.
- Rows 1–10 utilized GZipWithAes256 with alternating DEKs. The initial bytes are consistent across alternate rows, while subsequent bytes differ due to unique IVs.
- Rows 11–20 employed GZip, resulting in identical values across rows, as all JSON data was the same.
Value Converter
As mentioned earlier, an Entity Framework ValueConverter will facilitate our field encryption and decryption, minimizing the need for extensive alterations to the rest of our application.
Our new ValueConverter now serializes our financial JSON data into a byte[].
Unfortunately, ValueConverters do not support dependency injection, so we must employ static classes for compression and decompression operations.
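The pattern can be sketched as a converter whose expressions simply call static methods. To keep the example short it only applies GZip compression and omits the metadata header and encryption; `JsonGZip` and `CompressedAmountsConverter` are hypothetical names, not the classes from the repository.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text.Json;
using Microsoft.EntityFrameworkCore.Storage.ValueConversion;

// Hypothetical static helper: static because the converter cannot receive injected services.
public static class JsonGZip
{
    public static byte[] Compress(Dictionary<string, decimal> value)
    {
        var json = JsonSerializer.SerializeToUtf8Bytes(value);
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
        {
            gzip.Write(json, 0, json.Length);
        } // disposing the GZipStream flushes the final compressed block
        return output.ToArray();
    }

    public static Dictionary<string, decimal> Decompress(byte[] stored)
    {
        using var input = new MemoryStream(stored);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        return JsonSerializer.Deserialize<Dictionary<string, decimal>>(gzip)
            ?? new Dictionary<string, decimal>();
    }
}

// Maps the dictionary property to a binary column.
public class CompressedAmountsConverter : ValueConverter<Dictionary<string, decimal>, byte[]>
{
    public CompressedAmountsConverter()
        : base(v => JsonGZip.Compress(v), v => JsonGZip.Decompress(v)) { }
}
```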
Compressor
Here is a streamlined version of the original Compressor implementation. The necessary metadata for encryption is prepended to the resulting byte[] and extracted during decompression. Due to the lack of dependency injection with the ValueConverter, a singleton class is employed to manage the Keychain.
Given the frequent usage of this class throughout the application, it is crucial that it operates with maximum efficiency. We achieved significant performance enhancements for the compressor by optimizing memory utilization through stack allocation and spans whenever feasible.
This led to a 12% speed improvement and a substantial 60% memory reduction during encryption.
Storing the byte count of the original string allowed us to allocate an output MemoryStream of the exact length required for decompression, greatly enhancing decompression performance. This resulted in a 7% speed improvement and a 65% memory reduction during decryption.
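The byte-count optimization can be illustrated with the sketch below, which is not the repository's Compressor. `SizedGZip` is a hypothetical name, the 4-byte length is assumed to be little-endian, and encryption is omitted. The length header is written from a stack-allocated span, and decompression creates its output MemoryStream with exactly the recorded capacity so the buffer never has to grow.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper. Stored layout: original UTF-8 byte count (4 bytes) | gzip bytes
public static class SizedGZip
{
    public static byte[] Compress(string json)
    {
        var raw = Encoding.UTF8.GetBytes(json);
        using var output = new MemoryStream();

        // Small fixed-size header: build it on the stack instead of the heap.
        Span<byte> header = stackalloc byte[sizeof(int)];
        BinaryPrimitives.WriteInt32LittleEndian(header, raw.Length);
        output.Write(header);

        using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
        {
            gzip.Write(raw);
        }
        return output.ToArray();
    }

    public static string Decompress(byte[] stored)
    {
        var length = BinaryPrimitives.ReadInt32LittleEndian(stored.AsSpan(0, sizeof(int)));

        using var input = new MemoryStream(stored, sizeof(int), stored.Length - sizeof(int));
        using var gzip = new GZipStream(input, CompressionMode.Decompress);

        // Capacity is known up front, so the stream never reallocates while copying.
        using var output = new MemoryStream(length);
        gzip.CopyTo(output);
        return Encoding.UTF8.GetString(output.GetBuffer(), 0, (int)output.Length);
    }
}
```

Reading from `GetBuffer()` avoids the extra copy that `ToArray()` would make before decoding the string.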
Keychain and Keys
The Keychain class is utilized for caching keys in memory, ensuring that encryption and decryption processes are as rapid as possible. A singleton instance of the Keychain must be initialized with an IKeyStore by the application. While dependency injection would facilitate this process, it is not feasible through Entity Framework Value Converters.
The Key class retains an unwrapped DEK in memory and includes logic for retrieving and refreshing the key. Each key necessitates its own lock and circuit breaker for managing failures and ensuring thread safety. The LastOperationDate is monitored for each key, allowing for periodic access refresh if the KEK remains valid.
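A heavily simplified sketch of these two classes follows. It is an assumption-laden illustration rather than the repository code: a `Func` delegate stands in for the IKeyStore lookup and IKeyWrapper unwrap, the circuit breaker is reduced to a single "blocked until" timestamp, the durations are arbitrary, and the background refresh performed by the KeyManager is not shown.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical, simplified Key: caches one unwrapped DEK with an expiry.
public sealed class Key
{
    private readonly object _lock = new();
    private readonly Func<byte[]> _loadDek;   // stands in for key store lookup + unwrap
    private readonly TimeSpan _lifetime;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _utcNow;
    private byte[]? _dek;
    private DateTime _expiresAt;
    private DateTime _blockedUntil;

    public DateTime LastOperationDate { get; private set; }

    public Key(Func<byte[]> loadDek, TimeSpan lifetime, TimeSpan breakDuration, Func<DateTime>? utcNow = null)
    {
        _loadDek = loadDek;
        _lifetime = lifetime;
        _breakDuration = breakDuration;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public byte[] GetDek()
    {
        lock (_lock) // one lock per key, so a slow unwrap only blocks callers of that key
        {
            var now = _utcNow();
            if (_dek is null || now >= _expiresAt)
            {
                if (now < _blockedUntil)
                    throw new InvalidOperationException("Circuit open: key is temporarily unavailable.");
                try
                {
                    _dek = _loadDek();
                    _expiresAt = now + _lifetime;
                }
                catch
                {
                    _dek = null;                          // expired key must not be served
                    _blockedUntil = now + _breakDuration; // back off from the key service
                    throw;
                }
            }
            LastOperationDate = now;
            return _dek;
        }
    }
}

// Hypothetical singleton, initialized once at startup because converters cannot use DI.
public sealed class Keychain
{
    private readonly ConcurrentDictionary<Guid, Key> _keys = new();
    private readonly Func<Guid, byte[]> _loadDek;

    private Keychain(Func<Guid, byte[]> loadDek) => _loadDek = loadDek;

    public static Keychain? Instance { get; private set; }

    public static void Initialize(Func<Guid, byte[]> loadDek) => Instance = new Keychain(loadDek);

    public byte[] GetDek(Guid keyId) =>
        _keys.GetOrAdd(keyId, id => new Key(() => _loadDek(id), TimeSpan.FromMinutes(30), TimeSpan.FromSeconds(30)))
             .GetDek();
}
```

Injecting the clock through `utcNow` is a testing convenience in this sketch; it lets expiry behaviour be exercised without waiting for real time to pass.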
Encrypted vs Non-Encrypted Benchmark
The newly implemented EncryptedFinancialEntity was benchmarked against the non-encrypted FinancialEntity by inserting 100,000 rows and updating 20,000 rows of data.
The results reveal that the GZip compressed version outperforms the original version. The GZipWithAes256 encrypted rows also exhibited faster performance compared to the original, though there was approximately a 1% performance penalty between GZip and GZipWithAes256, as anticipated.
You can find all the code and benchmarks on my GitHub page linked below.
GitHub - matt-bentley/EntityFramework.FieldEncryption: .NET Field Encryption solution using Entity…