Last year, I worked on a project that required a robust audit log to track user activities as they accessed secure, premium content. This included capturing information about user devices and approximate locations as part of a fraud detection system.
Most users were in jurisdictions that enforce rigorous privacy laws, such as GDPR and CCPA.
The challenge was to balance tracking user activities with a privacy-by-design approach, emphasizing:
- Collecting only the necessary personal data (PII), ensuring it's protected and accessible only to those with proper clearance.
- Respecting users' rights regarding their personal data, particularly their right to be forgotten.
In this article, I’ll focus on the first point and explore how masking and redaction techniques play a crucial role. The second point will be the focus of a follow-up article.
Different Levels of Masking Based on Data Sensitivity
Keeping personal data safe would be much simpler if it stayed in one place. But as systems grow, this information inevitably spreads across different services, vendors, and storage systems, each with its own level of security and access requirements.
That’s why limiting access to personal data is essential, following the data minimization and least privilege principles. Masking and redaction support this goal by providing flexible ways to restrict the exposure of sensitive information.
The sensitivity of PII varies between data types, making it logical to define categories so that appropriate masking can be applied to each case and context. For example:
- IP Address: An IP address may require complete redaction in certain situations, whereas in others, partially masking it (e.g., hiding the lower bits) may be sufficient.
- Email Address: In many cases, it's sufficient to partially mask an email address by hiding the local part (before the "@").
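To make this concrete, here's a minimal sketch of what such partial masks might look like in Go. The maskEmail and maskIPv4 helpers below are purely illustrative, not taken from any particular library:

import "strings"

// Illustrative partial-masking helpers (not from any specific library).

// maskEmail hides the local part of an email address (everything before the "@").
func maskEmail(email string) string {
	at := strings.LastIndex(email, "@")
	if at < 0 {
		return strings.Repeat("*", len(email)) // not a valid email: redact fully
	}
	return strings.Repeat("*", at) + email[at:]
}

// maskIPv4 hides the lower bits of an IPv4 address, keeping only the /16 prefix.
func maskIPv4(addr string) string {
	parts := strings.Split(addr, ".")
	if len(parts) != 4 {
		return "" // unexpected format: redact fully
	}
	return parts[0] + "." + parts[1] + ".x.x"
}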
Practically speaking, to achieve this in Go, we can leverage struct tags to identify and categorize sensitive data types at the field level, allowing for tailored masking rules. Here’s an example of how this might look:
type User struct {
	ID       string
	Email    string `pii:"data,kind=email"`
	Fullname string `pii:"data"`
}
For now, let’s assume we have a sensitive package that offers a Mask function that receives a pointer to the struct and masks its tagged fields accordingly.
user := User{
	ID:       "10010",
	Email:    "john.smith@example.com",
	Fullname: "John Smith",
}
_ = sensitive.Mask(&user)
// Output:
// User{
//   ID:       "10010",
//   Email:    "**********@example.com",
//   Fullname: "**********",
// }
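As a rough idea of how such a function can work under the hood, here's a simplified, reflection-based sketch. Unlike a complete implementation, it fully masks every tagged string field, ignores the kind option (which drives the partial email mask above), and doesn't recurse into nested structs:

import (
	"reflect"
	"strings"
)

// Mask overwrites every exported string field tagged with `pii:"data"`.
// Simplified sketch: a real implementation would honor kind=... options
// and dive into nested structs.
func Mask(ptr any) error {
	v := reflect.ValueOf(ptr).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("pii")
		if !strings.HasPrefix(tag, "data") {
			continue
		}
		f := v.Field(i)
		if f.Kind() == reflect.String && f.CanSet() {
			f.SetString(strings.Repeat("*", len(f.String())))
		}
	}
	return nil
}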
Now that we understand the importance of masking and redaction and how they work, let's discuss scenarios where they can be applied effectively.
System Logs vs. Audit Logs
It’s important to clarify the difference between system logs and audit logs, as they serve distinct purposes:
- System Logs (a.k.a. event logs) capture general operational events for applications and infrastructure. They typically track performance metrics, errors, and other issues, supporting system health monitoring, troubleshooting, and performance optimization.
- Audit Logs (a.k.a. audit trails) record security and administrative events tied to critical user or system actions. They support regulatory, security, and accountability requirements and are often retained longer than system logs.
These differences influence architectural decisions for log storage and the handling of sensitive data protection.
Masking Sensitive Data at the Point of Logging System Logs
Given the primary purpose of system logs, it often makes sense to redact or mask sensitive data before logging it. System administrators or developers typically don’t need full access to personal details like a user’s email, name, age, or approximate device location to troubleshoot and debug the system effectively.
Additionally, since system logs are frequently propagated across various storage systems (e.g., file systems and monitoring tools) with differing levels of security and compliance, masking sensitive data reduces the risk of unauthorized exposure.
If developers do require access to sensitive data, it’s best to log it in a separate, secure location where access requests and authorizations are better controlled.
A practical example in Go would involve creating a masked copy of the sensitive struct, revealing only what’s needed in the system logging context.
To achieve this, we can use a sensitive.MaskedCopy function (assumed to exist for now), which generates a masked version of a struct, making it compatible with logging libraries:
maskedUser := sensitive.MaskedCopy(user)
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Info("structured log", "user", maskedUser)
// Output in structured JSON format:
// {
//   "time": "2024-11-12T10:31:55.621344+01:00",
//   "level": "INFO",
//   "msg": "structured log",
//   "user": {
//     "ID": "10010",
//     "Email": "************@example.com",
//     "Fullname": "************"
//   }
// }
log.Println("standard log", maskedUser)
// Output in standard log format:
// 2024/11/12 10:31:55 standard log {ID:10010 Email:************@example.com Fullname:************}
Under the hood, sensitive.MaskedCopy returns a generic wrapper type that contains the masked copy:
type Masked[T any] struct {
	// value holds the masked copy
	value T
}
The Masked[T any] type also implements the necessary interfaces, allowing it to integrate seamlessly with logging libraries:
// slog
type LogValuer interface {
	LogValue() Value
}

// fmt
type Stringer interface {
	String() string
}
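One plausible wiring is to have Masked[T] delegate to its masked copy. This is only a sketch; the actual implementation may differ:

import (
	"fmt"
	"log/slog"
)

// Sketch only: the actual implementation may differ.

// String implements fmt.Stringer, so fmt-based loggers print the masked copy.
func (m Masked[T]) String() string {
	return fmt.Sprintf("%+v", m.value)
}

// LogValue implements slog.LogValuer, so slog handlers serialize the
// masked copy instead of the original struct.
func (m Masked[T]) LogValue() slog.Value {
	return slog.AnyValue(m.value)
}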
Disclaimer: While I find this generic approach elegant, it may not be optimal for performance. You might consider a more explicit, straightforward approach where logging and masking are handled directly at the specific struct type level.
Masking Sensitive Data at the Point of Accessing Audit Logs
Unlike system logs, audit logs often need to contain PII and sensitive data. While this might sound like a bold statement, and one that some developers might disagree with, here’s the reasoning behind it:
As discussed earlier, audit logs serve as proof of specific critical actions within a system, often for regulatory or security purposes. For audit logs to fulfill this role, they must capture a reliable snapshot of the event. If audit logs relied on other sources to complete this picture—sources that might not be as immutable—this could undermine their value as verifiable records.
In contexts like tracking user activities or fraud detection, it’s acceptable for audit logs to contain PII and sensitive data, as long as they are securely stored with strict access controls in place.
While securely storing sensitive data in immutable storage is essential, I'll save an in-depth discussion of how we handle it for a future article.
For now, let’s focus on controlling access to audit logs containing sensitive data. This requires a careful approach to limit exposure based on user roles and responsibilities. For example:
- A customer success team member may have access to a user's DocumentViewed events, where they can see data fields related to document interactions, but not fields related to approximate location (latitude, longitude) or IP address.
- Conversely, a fraud detection system may need access to the IP address in order to detect suspicious activities.
This calls for a role-based access control (RBAC) policy, but there’s a unique challenge here given the nature of audit logs:
In our case, the audit log is structured as a polymorphic stream of events, with each event type having its own structure. Implementing RBAC attribute filtering at the data layer becomes difficult with this setup, often requiring us to apply it at the application layer instead.
This is where masking and redaction come into play. The application layer loads audit logs and then selectively redacts or partially masks fields based on user roles and contextual information before displaying the data. This approach ensures that only the necessary information is revealed to each role.
In Go, the DocumentViewed event might look like this:
type DocumentViewed struct {
	Document  string // document ID
	Viewer    string // viewer ID
	Device    Device   `pii:"dive"`
	Location  Location `pii:"dive"`
	TimeSpent int64
	At        int64
}

type Location struct {
	Lat     string `pii:"data" json:",omitempty"`
	Lng     string `pii:"data" json:",omitempty"`
	City    string `pii:"data" json:",omitempty"`
	Country string
}

type Device struct {
	IPAddr   string `pii:"data,kind=ipv4_addr" json:",omitempty"`
	Platform string
}
Next, we need to modify our sensitive.Mask function to optionally accept a closure. This closure will consider the execution context (e.g., the authorized user's role) and decide the masking logic to apply to each sensitive struct field value.
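One plausible shape for this API, inside the sensitive package, uses Go's functional options pattern. The FieldReplace and RedactConfig names mirror the usage below, but the details are assumptions:

// Sketch only: assumed shapes for the sensitive package's option types.

// FieldReplace describes the tagged field currently being processed.
type FieldReplace struct {
	Kind string // e.g., "email", "ipv4_addr"
}

// RedactConfig carries the masking options.
type RedactConfig struct {
	// RedactFunc, if set, overrides the default masking logic per field.
	RedactFunc func(fr FieldReplace, val string) (string, error)
}

// Mask applies the default masks unless an option overrides them.
func Mask(v any, opts ...func(*RedactConfig)) error {
	cfg := RedactConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}
	// walk the tagged fields, calling cfg.RedactFunc when set (elided)
	return nil
}

With that in place, the call site can pass a role-aware closure: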
// authorized user role
authUser := ctx.Value(AuthUserKey).(AuthUser)

redactFunc := func(fr sensitive.FieldReplace, val string) (string, error) {
	// The `FraudDetectionAgent` role has access to all sensitive data.
	if authUser.Role == "FraudDetectionAgent" {
		return val, nil
	}
	// The `CustomerSuccessAgent` role doesn't have access to any sensitive data.
	if authUser.Role == "CustomerSuccessAgent" {
		return "", nil
	}
	// Otherwise, apply the default mask for the sensitive data type.
	maskFn, ok := mask.Of(fr.Kind)
	if !ok {
		return "", nil
	}
	return maskFn(val)
}
// var event DocumentViewed
err := sensitive.Mask(&event, func(rc *sensitive.RedactConfig) {
	rc.RedactFunc = redactFunc
})
if err != nil {
	log.Fatal(err)
}
As a result, the event looks as follows, depending on the role. Note that fields redacted to an empty string are omitted from the output entirely, thanks to the json:",omitempty" tags.
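For illustration, the JSON views below can be produced by marshaling the masked event with the standard encoding/json package:

// render the masked event as indented JSON
b, err := json.MarshalIndent(event, "", "  ")
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(b))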
// Case of 'FraudDetectionAgent' role:
{
  "Document": "23080",
  "Viewer": "v540103",
  "Device": {
    "IPAddr": "151.117.33.152",
    "Platform": "Computer"
  },
  "Location": {
    "Lat": "51.5074",
    "Lng": "-0.1278",
    "City": "London",
    "Country": "UK"
  },
  "TimeSpent": 100,
  "At": 1731437023
}

// Case of 'CustomerSuccessAgent' role:
{
  "Document": "23080",
  "Viewer": "v540103",
  "Device": {
    "Platform": "Computer"
  },
  "Location": {
    "Country": "UK"
  },
  "TimeSpent": 100,
  "At": 1731436948
}
A Go Library That Masks Sensitive Data at the Struct-Field-Level
After a few iterations, I refined the masking logic demonstrated in the previous examples and decided to open-source it as a separate library. The library comes with a set of predefined masks and is easily extendable to support masking custom sensitive data types.
https://github.com/ln80/struct-sensitive
I hope this library will help simplify your GDPR/CCPA compliance efforts while building your Go applications.
Feel free to try it out and contribute.
This article is the first in a series of posts I plan to write to share my experience building a privacy-by-design application in Go.