Privacy By Design: Integrating Privacy into the Software Development Life Cycle

As data breaches and privacy violations continue to make headlines, it is evident that mere reactive measures are not enough to protect personal data. Therefore, behind every privacy-aware organization lies an established software engineering process that systematically includes privacy engineering activities. Such activities include the selection of privacy-enhancing technologies, the analysis of potential privacy threats, as well as the continuous re-evaluation of privacy risks at runtime. In this blog post, we give an overview of some of these activities which help your organization to build and operate privacy-friendly software by design. In doing so, we focus on risk-based privacy engineering as the driver for »Privacy by Design«.

4 June 2024

1. Introduction – Privacy Engineering
2. The Privacy-Aware Development Life Cycle
2.1. Privacy Threat Modeling with LINDDUN
3. Privacy Engineering in the Envision, Plan and Design Phases
3.1. Plan and Design: Systematic technology selection
3.2. Plan and Design: Architecture Evaluation
4. Privacy Engineering in the Build and Stabilize Phases
5. Privacy Engineering in the Support and Retire Phases
6. Conclusions
7. Bibliography

1. Introduction – Privacy Engineering

The concept of Privacy by Design has gained considerable traction in recent years. This is no surprise, as integrating privacy as a dedicated software quality brings considerable advantages: First, users’ trust in the software increases. Second, trust in a software’s privacy qualities may motivate users to provide more data (and in higher quality). Third, a high level of privacy can be a competitive advantage. Fourth, it can reduce liabilities: privacy-friendly software processes less sensitive data and is thus exposed to fewer risks. Processing less personal data also reduces red tape and the risk of data breaches. Furthermore, collecting personal data requires costly consent management and data flow tracking mechanisms, so collecting less data also lowers these costs.

An efficient and effective way to achieve a high level of privacy is to integrate a risk-based privacy engineering approach into the complete software development life cycle. While security risk assessment is an established field of research and widely used in practice, its application in the privacy domain is more recent. Privacy risk assessment pursues goals like anonymity, unlinkability and user awareness, and thus goes beyond security goals and regulatory data protection requirements.

In this blog post, we show how comprehensive and systematic privacy engineering can be approached. To do so, we build on different existing privacy engineering methods like architectural design strategies and the privacy risk assessment framework LINDDUN (see Section 2.1). This way, privacy risks can be uncovered and mitigated early in the software development life cycle (SDLC), and privacy becomes an inherent quality of the software.

First, we briefly review the definitions of security, privacy and data protection. In Section 2, we introduce privacy risk assessment as the basis for privacy engineering. In Sections 3-5, we introduce different privacy tools and methods that can support privacy engineers during different phases of the SDLC.

The Basics: Security, Privacy, Data Protection

First, we briefly look at the differences between privacy, security and data protection. These concepts go hand in hand and have considerable overlaps. Still, they also have their peculiarities:

Security focuses on protection against external attacks. Therefore, it primarily focuses on protecting the confidentiality, integrity and availability of systems and data.
Privacy focuses on minimizing the use of personal data and maximizing the control users have over their personal data (after the data has been collected). It thus refers to achieving the protection goals of confidentiality, anonymity, unlinkability, repudiation (i.e., plausible deniability), undetectability, awareness and policy compliance (which are explained in Section 2.1 in more detail).
Data protection refers to the general practice of protecting sensitive data against intentional and unintentional damage. It combines aspects of both security and privacy and adds other special requirements regarding the handling of data, for example portability and reporting obligations. Data protection, therefore, overlaps with security and privacy but is usually tied to regulatory practice.

2. The Privacy-Aware Development Life Cycle

Privacy should be considered throughout the complete SDLC. Adding privacy features late in the life cycle is difficult and costly because such features often touch on fundamental architectural design decisions that are expensive to change at a later stage.

In the security domain, Secure Software Development Life Cycles (SSDLCs) have become an established model to structure security activities along the phases of an SDLC. Many organizations use this approach to create company-wide standards and to guarantee a certain level of security in their products. Figure 1 shows an overview of an SSDLC, including some typical security activities.

While an (S)SDLC provides a useful structure for organizing development activities, in practice, some of these phases are often conducted concurrently – and for good reasons. For example, some (re-)design activities may be done alongside the implementation and testing.

Similar to security activities, numerous privacy engineering activities exist. Figure 2 shows a selection of them and broadly assigns them to the phases introduced in Figure 1.

Figure 1: Overview of commonly used SDLC phases with example security activities. Note that we do not assume a strict waterfall-like development in this blog post but use these phases as a simple structure to organize development activities.

Figure 2: A selection of privacy engineering activities along the SDLC phases. Note that some activities are not always clearly assignable to a certain phase which is why they are only loosely assigned in this figure.

Some SSDLCs already consider privacy to some degree. Often, however, they subsume privacy under security or conflate it with regulatory data protection. The Microsoft Security Development Lifecycle (SDL), for example, explicitly integrates privacy documentation, testing and risk assessment. Yet, it treats »privacy« as a set of regulatory requirements rather than as a property of a system design that provides the desired functionality using a minimal amount of personal data by design.

Overall, privacy as a software quality (beyond regulatory data protection) is often neglected in today’s development processes. Privacy-aware organizations, however, require a consistent approach to integrating privacy activities. One way to achieve this is risk-based engineering, which continuously identifies, prioritizes and mitigates privacy risks in the architecture, in the source code, and in the actual data flows at runtime. An established method for conducting a privacy risk assessment is LINDDUN, which we look at in the following section.

2.1. Privacy Threat Modeling with LINDDUN

Privacy threat modeling is a foundational element of privacy engineering. LINDDUN is a framework for privacy threat modeling and risk assessment. It is the privacy counterpart of the security threat modeling framework STRIDE. The latter is an acronym for the security threats Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege. LINDDUN, in contrast, is an acronym that refers to the following threats:

Linkability: Multiple data items can be linked to the same person, for example multiple messages or multiple devices.
Identifiability: A person can be identified, e.g., via the data that the person submits to the service.
Non-repudiation: The user’s actions, like the submission of sensitive data, cannot be denied later, for example, because the submission requires strong authentication methods, and logs are created. Thus, this point often presents a conflict between privacy and security goals.
Detectability: The communication between the person and the service may be observable by attackers. The existence of communication alone can reveal sensitive information, for example, if the service targets patients with a specific disease.
Disclosure of Information: This category overlaps with STRIDE’s Information Disclosure, i.e., the violation of confidentiality of sensitive data.
Unawareness: The person is not aware of the data collection and processing practices of the service, for example, which data is collected and for how long it is stored.
Policy Non-Compliance: The service does not comply with certain policies, e.g., regulatory requirements and principles like data minimization.

The identification of these threat types is embedded in a process which comprises the following (simplified) steps: First, a data flow diagram (DFD) is created that illustrates the data flows, actors and elements of the target system. The DFD can be used in a second step to identify potentially problematic data flows according to the threat types listed above. If there are multiple flows between user and service or between the service and a third party, for example, they may be linkable. Third, the impacts and risks related to the threats can be assessed and fourth, potential mitigations can be determined. This process ensures that the most important threats are prioritized, and it provides a baseline for risk-based engineering during the development life cycle as we will see in the following.
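The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of LINDDUN itself: the Flow class, the example flows, the heuristic "several flows between the same two elements" and the numeric likelihood and impact values are all assumptions made for this example. A match is only a hint that an analyst still has to confirm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One edge of a data flow diagram (DFD): who sends which data to whom."""
    source: str
    target: str
    data: str


def linkability_candidates(flows):
    """Step 2: return (source, target) pairs that carry more than one flow.

    Several flows between the same two elements are only a hint that the
    transmitted items might be linkable.
    """
    grouped = defaultdict(list)
    for flow in flows:
        grouped[(flow.source, flow.target)].append(flow.data)
    return {pair: items for pair, items in grouped.items() if len(items) > 1}


def prioritize(threats):
    """Step 3: sort threats by a simple score (likelihood * impact), highest first."""
    return sorted(threats, key=lambda t: t["likelihood"] * t["impact"], reverse=True)


# Step 1: a hypothetical DFD of a mobile app, a service and a third party.
flows = [
    Flow("user", "service", "location"),
    Flow("user", "service", "device_id"),
    Flow("service", "analytics_provider", "usage_statistics"),
]

candidates = linkability_candidates(flows)

# The scores below are made up; in practice they come from the risk assessment.
threats = [
    {"type": "Linkability", "flow": pair, "likelihood": 3, "impact": 2}
    for pair in candidates
]
threats.append(
    {"type": "Unawareness", "flow": ("service", "analytics_provider"),
     "likelihood": 2, "impact": 2}
)

# Step 4 (choosing mitigations) would start with the first entry of this list.
ranked = prioritize(threats)
print(ranked[0]["type"], ranked[0]["flow"])
```

Keeping the DFD as plain data makes it easy to rerun the same checks whenever the diagram changes.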

3. Privacy Engineering in the Envision, Plan and Design Phases

In the first phases of the SDLC, the software is envisioned, planned and designed (see Figure 3). This includes requirements engineering, architectural drafts and more detailed design concepts. These activities are essential as they set the course for achieving the functional requirements and non-functional software qualities. To create software that is privacy-aware, privacy engineering should start alongside the first engineering phases: A risk assessment, for instance, can reveal relevant privacy risks, even if only high-level elements, actors and data flows are known.

Figure 3: A selection of privacy engineering activities along the SDLC phases. The two activities that are described in this section are highlighted in red.

In the following, we look at two privacy engineering steps that can be included in these phases: The selection of privacy-enhancing technologies (PETs) and the evaluation of design candidates.

3.1. Plan and Design: Systematic technology selection

One important but challenging step in designing software is the selection of the PETs to be used. Translating (privacy) requirements into PETs is a complex task, as the selected technologies must align with the overall functional and non-functional design goals.

Numerous conditions can be considered when selecting a PET, depending on the requirement and the development scenario. For example, some technologies significantly influence how an architecture can be designed and which qualities it can achieve: Resource-intensive technologies like Trusted Execution Environments (TEEs) or homomorphic encryption, for instance, have a considerable impact on the performance of the system. The usage of PETs can also influence other qualities like security and data utility, as well as the maintenance effort that the technology entails.

At Fraunhofer AISEC, we design and develop PETs in the areas of Self-Sovereign Identity (like reclaimID and DISSENS), secure naming and advanced cryptography in general (like libpabc and librabe). Furthermore, we research methods to support software engineers in selecting PETs as described in the following (see also [1] and [2] for more details).

The following steps should be considered when selecting PETs:

Step 0 – Perform threat modeling to elicit the threat category for PET selection
When creating a threat model of a software system, multiple (LINDDUN-based) threats may be discovered. Knowing the exact type of threat that should be mitigated is a useful prerequisite for selecting a suitable PET.

Step 1 – Threat-based PET elicitation
To elicit suitable PETs that can mitigate a certain threat, we can first filter existing technologies for the threat type at hand. In a publication [1], we proposed a classification of PETs according to LINDDUN threats and other criteria, like their impact on the overall architecture and performance penalties.

Step 2 – Service-driven PET elicitation
In the second step, potentially useful PETs can further be filtered using service-driven requirements. These requirements may, for instance, refer to the necessary accuracy of values and the existence of inherent attribute dependencies.

Step 3 – PET selection
Finally, a PET can be selected from the ones elicited in step 2. If multiple technologies are deemed suitable, their impact on the architecture design should be evaluated and compared, which leads us to the evaluation of the architecture in the next section.

Overall, the careful selection of suitable PETs is essential in building software that protects user privacy.
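The filtering in steps 1 to 3 can be sketched as follows. The catalog entries, their attributes and the ratings are illustrative assumptions and are not taken from the classification in [1]. In particular, choosing the candidate with the lowest performance cost is only a stand-in for the architecture comparison that step 3 calls for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pet:
    name: str
    mitigates: frozenset   # LINDDUN threat types the PET addresses
    exact_values: bool     # True if the service can still work with exact values
    performance_cost: int  # 1 = low, 3 = high (made-up scale)


# Illustrative catalog; entries and ratings are assumptions for this example.
CATALOG = [
    Pet("homomorphic encryption", frozenset({"Disclosure of Information"}), True, 3),
    Pet("trusted execution environment", frozenset({"Disclosure of Information"}), True, 2),
    Pet("differential privacy", frozenset({"Identifiability", "Linkability"}), False, 1),
    Pet("pseudonymization", frozenset({"Identifiability"}), True, 1),
]


def elicit_by_threat(catalog, threat):
    """Step 1: keep the PETs that address the given threat type."""
    return [pet for pet in catalog if threat in pet.mitigates]


def elicit_by_service(candidates, needs_exact_values):
    """Step 2: drop PETs that conflict with a service-driven requirement."""
    return [pet for pet in candidates if pet.exact_values or not needs_exact_values]


def select(candidates):
    """Step 3 (simplified): pick the candidate with the lowest performance cost."""
    return min(candidates, key=lambda pet: pet.performance_cost, default=None)


# Step 0 is assumed to have yielded an Identifiability threat.
step1 = elicit_by_threat(CATALOG, "Identifiability")
step2 = elicit_by_service(step1, needs_exact_values=True)
chosen = select(step2)
print(chosen.name if chosen else "no suitable PET found")
```

If step 2 leaves several candidates with similar cost, the sketch would pick the first one; in practice this is where the architecture evaluation of the next section comes in.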

3.2. Plan and Design: Architecture Evaluation

After the technology selection, the next step is to evaluate and compare architectural designs in order to decide which one best achieves the desired goals.

Typically, architects use views to understand, document and illustrate software architectures. In [3], we developed an architectural view that reveals privacy properties in a software architecture. One such property is the control users retain over their personal data even after it has been sent to the service. Depending on the PETs used, different levels of control can be preserved for the user. For example, using confidential computing technologies, like homomorphic encryption or a Trusted Execution Environment, users can keep control of their data even when the data is processed on resources that are controlled by another entity.

For more details and examples on how privacy properties can be revealed in an architectural design, see our publication [3].

4. Privacy Engineering in the Build and Stabilize Phases

Once an initial design has been created, the implementation activities can start (see again Figure 1). While writing source code is not necessarily an activity that is relevant for privacy engineering, the testing of the created code is. Different types of tests can be considered here, for example:

Testing the outputs of privacy-relevant functions. For such tests, synthetic or real personal data may be used in accordance with data protection regulations and internal policies.
Testing of data flows, retention and deletion mechanisms, recovery systems, etc.
Testing for potential threats in the source code, which we take a closer look at in the following.
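As an illustration of the second test type, the following sketch tests a deletion and a retention mechanism with synthetic data. The UserStore class is a hypothetical in-memory stand-in for the system under test, and the 30-day retention period is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone


class UserStore:
    """Toy in-memory store standing in for the system under test (hypothetical)."""

    def __init__(self, retention):
        self.retention = retention
        self._records = {}

    def save(self, user_id, payload, stored_at):
        self._records[user_id] = (payload, stored_at)

    def delete(self, user_id):
        self._records.pop(user_id, None)

    def purge_expired(self, now):
        """Delete every record older than the retention period; return their ids."""
        expired = [
            user_id
            for user_id, (_, stored_at) in self._records.items()
            if now - stored_at > self.retention
        ]
        for user_id in expired:
            del self._records[user_id]
        return expired

    def contains(self, user_id):
        return user_id in self._records


def test_deletion_removes_record():
    store = UserStore(retention=timedelta(days=30))
    stored_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    store.save("synthetic-1", {"email": "a@example.org"}, stored_at)
    store.delete("synthetic-1")
    assert not store.contains("synthetic-1")


def test_retention_purges_only_expired_records():
    store = UserStore(retention=timedelta(days=30))
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    store.save("old", {"email": "old@example.org"}, now - timedelta(days=31))
    store.save("fresh", {"email": "fresh@example.org"}, now - timedelta(days=1))
    assert store.purge_expired(now) == ["old"]
    assert store.contains("fresh") and not store.contains("old")


test_deletion_removes_record()
test_retention_purges_only_expired_records()
```

Against a real system, the same assertions would also have to cover backups and replicas, since a record that is deleted in the primary storage may survive elsewhere.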

Figure 4: A selection of privacy engineering activities along the SDLC phases. The two activities that are described in this section are highlighted in red.

Privacy threat modeling and risk assessment (as introduced in Section 2.1) can be performed as soon as a data flow diagram of the application can be created. Having source code available, however, opens new possibilities for threat modeling, as it reveals a lot more information about the data flows and other functionalities that actually exist in the application. We therefore take a closer look at identifying threats in source code.

When developing code, organizations often follow an iterative approach: After an initial design has been created, features are implemented and tested, and for the next iteration, changes (possibly based on user feedback) can be introduced. This procedure should also include software testing for security and privacy weaknesses. At Fraunhofer AISEC, we have extensive experience in software analysis. Our analysis tool cpg, for example, builds a code property graph that makes it possible to analyze code for potential security weaknesses. We have also extended the cpg for cloud systems [4] and for privacy analysis. The latter extension, called the Privacy Property Graph (PPG) [5], supports engineers in conducting privacy threat modeling efficiently.

As described in Section 2.1, the first step of threat modeling is to create a data flow diagram that shows which elements and actors exchange which types of data. When the source code of the software is available, such a diagram can be created automatically using the PPG (see [5]). Engineers can then analyze the graph manually, e.g., by expanding certain nodes and following data flow paths.

To enable the automatic detection of privacy weaknesses using the PPG, we first translated the LINDDUN threats to specific code properties, like data transmissions or database read and write operations. The PPG adds these properties to the graph, where they can then be detected automatically as well.

Figure 5: A Cypher query and the resulting graph excerpt: The query detects paths (p) in the graph that start at a PseudoIdentifier and lead via an arbitrary number of data flow edges (DFG*) and an HTTP endpoint to a database operation.

With a code property graph that includes all the privacy properties related to the LINDDUN threats, we can now write queries to detect these threats automatically. Figure 5 includes such a query (see the input field at the top). Note that the query is application-independent, reusable, and automatically executable.
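The path pattern described for Figure 5 can also be illustrated outside of a graph database. The following Python sketch is not the Cypher query from the figure and does not use the PPG or cpg API; it runs the same kind of search on a small hand-made graph. The node names and the labels HttpEndpoint, DatabaseOperation and Variable are assumptions made for this example; only PseudoIdentifier and the data flow (DFG) edges are taken from the caption.

```python
def find_paths(nodes, dfg_edges, start_label, via_label, end_label):
    """Return data flow paths from a start-labelled node to an end-labelled node.

    Only paths that pass through at least one via-labelled node are returned.
    nodes maps node names to labels, dfg_edges maps node names to successors.
    """
    results = []

    def walk(node, path):
        reached_end = nodes[node] == end_label
        passed_via = any(nodes[n] == via_label for n in path)
        if reached_end and passed_via:
            results.append(path)
        for successor in dfg_edges.get(node, []):
            if successor not in path:  # do not follow cycles
                walk(successor, path + [successor])

    for node, label in nodes.items():
        if label == start_label:
            walk(node, [node])
    return results


# Hypothetical excerpt of a graph: a pseudo-identifier is sent via HTTP and stored.
nodes = {
    "user_id": "PseudoIdentifier",
    "request_body": "Variable",
    "POST /profile": "HttpEndpoint",
    "db.insert": "DatabaseOperation",
    "log_message": "Variable",
}
dfg_edges = {
    "user_id": ["request_body", "log_message"],
    "request_body": ["POST /profile"],
    "POST /profile": ["db.insert"],
}

paths = find_paths(nodes, dfg_edges, "PseudoIdentifier", "HttpEndpoint", "DatabaseOperation")
for path in paths:
    print(" -> ".join(path))
```

In a graph database, this traversal is expressed declaratively as a path pattern, which is what makes such queries reusable across applications.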

Staying on top of privacy threats to mitigate them in a timely manner is crucial for developing and operating software that processes personal data. Tools like the PPG support this endeavor throughout the entire life cycle and reduce the workload of engineers.

The PPG can also be used to detect some privacy smells which we look at in the next section.

5. Privacy Engineering in the Support and Retire Phases

When software is designed, many assumptions are made regarding the privacy of its future users: Which and how much data will they submit? How sensitive will the data be? How and how often will they want to access, modify and delete their data? Will the protective measures, like privacy-enhancing technologies, hold up? Will all these criteria change over time?

Figure 6: A selection of privacy engineering activities along the SDLC phases. The three activities that are described in this section are highlighted in red.

Once software has been deployed, the assumptions that have been made about user privacy should be (re-)tested: Risks can change as, for example, the user base of the service, the legal requirements or the threat context change. Furthermore, the effectiveness of the countermeasures should be tested (see [6]).

Tracking Data Flows

A first prerequisite for testing the assumptions mentioned above is to track flows of personal data, so they can be analyzed.

Today, more and more organizations outsource their infrastructure management to the cloud, which makes it even more difficult to operate privacy-aware software.

Especially in cloud systems, it can be difficult to collect meaningful measurements about data flows. To collect and analyze information about how sensitive data moves through a cloud system, cloud-native services can be used, e.g., Microsoft Azure Monitor or Amazon Web Services (AWS) CloudWatch. Alternatively, customized data flow tracking systems can be created as we have shown in a previous publication [7]. Building such systems, however, is laborious.

In this blog post, we do not go into the details of data flow tracking in the cloud. It is important to note, however, that thorough data flow tracking is useful for measuring privacy threat indicators (as we will see in the following) and essential for the retirement of a system, since all personal data needs to be removed.

Measure Privacy Indicators: Privacy Smells

Two useful indicators for privacy issues in a (cloud-based) system are the access requests to personal data as well as the design of the authorization system (which is often a role-based access control system, RBAC).

Code smells are indicators for deeper-rooted problems in the code base. If, for example, methods become very large, nested or complex, ensuing problems can be expected. To keep track of such indicators, certain measurements can be used. We have developed similar smells in the context of privacy, i.e., indicators for potential privacy problems in cloud systems [8]. The smells are partly derived from the privacy design strategies by Hoepman [9].

One of Hoepman’s design strategies, for example, is the Minimise Strategy. It states that the amount of personal data that is processed and stored should be limited to the required minimum. This is a common requirement which can also be found in data protection regulations and other privacy design principles. A smell that can be derived from this strategy is called Data Hoards. Data Hoards are storages that hold excessive amounts of personal (or otherwise sensitive) data. For example, a Data Hoard could be a storage that collects usage statistics of a mobile application which are never actually used.

Measuring privacy smells is one example of integrating privacy engineering into the software life cycle. The measured smells can, for example, be used as an input to a continuous risk assessment process.
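A Data Hoard check along these lines can be sketched as follows. The heuristic (a storage that is written to but never read) and the shape of the access log are assumptions made for this example; they follow the usage statistics example above rather than the exact detection method in [8].

```python
from collections import Counter


def find_data_hoards(access_log, min_writes=1):
    """Return storages that receive writes but are never read.

    access_log is an iterable of (storage, operation) pairs, where operation
    is "write" or "read". A hit is an indicator, not proof of a privacy issue.
    """
    writes = Counter()
    reads = Counter()
    for storage, operation in access_log:
        if operation == "write":
            writes[storage] += 1
        elif operation == "read":
            reads[storage] += 1
    return sorted(
        storage
        for storage, count in writes.items()
        if count >= min_writes and reads[storage] == 0
    )


# Hypothetical access log collected through data flow tracking.
access_log = [
    ("usage_stats", "write"),
    ("usage_stats", "write"),
    ("profiles", "write"),
    ("profiles", "read"),
]

hoards = find_data_hoards(access_log)
print(hoards)
```

The observation window matters: a storage that is read only once a quarter looks like a Data Hoard in a one-week log, so the threshold and window should be chosen per system.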

Continuously Assessing Privacy Risks

Once software is in operation, its development does not stop. Rather, the iterative work on possible bugs, new features and security updates continues. As software evolves, it also deviates from its initial design and from the assumptions about the privacy (and security) risks it is exposed to. Thus, threat modeling should continue in order to identify threats and risks as soon as they appear.

We have previously proposed a continuous risk assessment process in [10]. In such a continuous process, it is crucial to combine manual activities, such as experts analyzing a system for potential attack vectors and privacy issues, with automated tools, such as an automated measurement of privacy threat indicators in cloud systems and source code.

For the retirement of a service, and thus the end of the SDLC, at least two aspects should be considered: First, thorough data flow tracking (see above) makes it possible to retire software properly, including the personal data that has been stored, possibly in multiple locations and backups. Second, regulations on the retention times of personal data differ, so different retention times may apply in different regions.
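Both considerations can be combined in a simple retirement checklist. The sketch below assumes that data flow tracking has produced an inventory of all locations holding personal data, and it treats a retention time as the longest period data may be kept. The region names, the limits and the inventory entries are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical retention limits per region; real limits depend on the
# regulations that apply to the service.
RETENTION_LIMITS = {
    "region_a": timedelta(days=365),
    "region_b": timedelta(days=90),
}


def retirement_checklist(inventory, today):
    """List every location holding personal data and flag overdue deletions."""
    checklist = []
    for entry in inventory:
        age = today - entry["stored_on"]
        overdue = age > RETENTION_LIMITS[entry["region"]]
        status = "overdue, delete immediately" if overdue else "delete at retirement"
        checklist.append((entry["location"], status))
    return checklist


# Hypothetical inventory, including a backup location.
inventory = [
    {"location": "primary-db", "region": "region_a", "stored_on": date(2023, 1, 1)},
    {"location": "backup-bucket", "region": "region_b", "stored_on": date(2024, 5, 1)},
]

checklist = retirement_checklist(inventory, today=date(2024, 6, 4))
for location, status in checklist:
    print(location, "->", status)
```

Where a regulation instead requires data to be kept for a minimum period, the same inventory can be used, but the check would have to postpone deletion rather than flag it.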

6. Conclusions

Organizations that acknowledge and integrate privacy as an important software quality benefit from numerous advantages: Reduced liability risks, increased user trust, higher data quality, competitive advantages, and others.

The integration of privacy is not easy, but it is feasible if done systematically, throughout the whole life cycle and with the right tools.

Privacy threats are diverse and detecting them requires systematic processes and experience.

In the plan and design phases, existing privacy-enhancing technologies need to be selected carefully, considering their impacts on other qualities. Design candidates need to be evaluated and compared regarding their privacy properties.
During the build and stabilize phases, privacy threat modeling is a central activity to make sure that privacy risks can be identified and mitigated as early as possible.
Finally, during operations, flows of (personal) data should be tracked throughout the system. The collected data also allows us to measure indicators for privacy issues in the system (using privacy smells). All this information can then be used in a continuous risk assessment process that ensures privacy-awareness throughout the complete life cycle.

At Fraunhofer AISEC, we develop tools and methods that support organizations in achieving this systematic privacy engineering – be it in the design, implementation, or operations phase. Contact us if you need support in the engineering of privacy in your organizational processes or contexts.

7. Bibliography

[1] Kunz, I. & Binder, A. (2022, May). Application-Oriented Selection of Privacy Enhancing Technologies. In Annual Privacy Forum (pp. 75-87). Cham: Springer International Publishing.

[2] Kunz, I., Banse, C., & Stephanow, P. (2020). Selecting privacy enhancing technologies for IoT-based services. In Security and Privacy in Communication Networks: 16th EAI International Conference, SecureComm 2020, Washington, DC, USA, October 21-23, 2020, Proceedings, Part II 16 (pp. 455-474). Springer International Publishing.

[3] Kunz, I. & Xu, S. (2023, July). Privacy as an Architectural Quality: A Definition and an Architectural View. In 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (pp. 125-132). IEEE.

[4] Banse, C., Kunz, I., Schneider, A., & Weiss, K. (2021, September). Cloud property graph: Connecting cloud security assessments with static code analysis. In 2021 IEEE 14th International Conference on Cloud Computing (CLOUD) (pp. 13-19). IEEE.

[5] Kunz, I., Weiss, K., Schneider, A., & Banse, C. (2023). Privacy Property Graph: Towards Automated Privacy Threat Modeling via Static Graph-based Analysis. Proceedings on Privacy Enhancing Technologies, 2, 171-187.

[6] Sion, L., Van Landuyt, D., & Joosen, W. (2020, September). The never-ending story: On the need for continuous privacy impact assessment. In 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (pp. 314-317). IEEE.

[7] Kunz, I., Casola, V., Schneider, A., Banse, C., & Schütte, J. (2020, October). Towards tracking data flows in cloud architectures. In 2020 IEEE 13th International Conference on Cloud Computing (CLOUD) (pp. 445-452). IEEE.

[8] Kunz, I., Schneider, A., & Banse, C. (2020, December). Privacy smells: Detecting privacy problems in cloud architectures. In 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (pp. 1324-1331). IEEE.

[9] Hoepman, J. H. (2014, June). Privacy design strategies. In IFIP International Information Security Conference (pp. 446-459). Berlin, Heidelberg: Springer Berlin Heidelberg.

[10] Kunz, I., Schneider, A., & Banse, C. (2022, May). A Continuous Risk Assessment Methodology for Cloud Infrastructures. In 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 1042-1051). IEEE.

Author

Immanuel Kunz

Immanuel Kunz is a researcher in the field of security and privacy engineering with a focus on cloud environments. He has been working at Fraunhofer AISEC since 2019 in various industry and research projects, like EU-SEC and MEDINA, and he is part of the Applied Privacy Technologies research group.

Contact: immanuel.kunz@aisec.fraunhofer.de
