If you need to share sensitive data such as PII with a third party, you need to change or hide portions of that data. For example, you might need to share data with a third-party analysis company or auditor without sharing customers' credit card details.
About this Task
You want to mask sensitive fields in a collection of credit card payments. In this example, you need to make the following changes to the payment document fields:
Partially obfuscate the card holder's name.
Obfuscate the first 12 digits of the card's number, and show only the final four digits.
Adjust the card's expiry date by adding or subtracting a random amount of time (up to 30 days).
Replace the card's three-digit security code with three random digits.
If the embedded
customer_info
sub-document's category field is set toRESTRICTED
, exclude thecustomer_info
sub-document from the results.
Steps
Create a collection that enforces schema validation
Create a payments
collection with the following schema
validation rules:
The
customer_info.category
field value must be eitherNORMAL
orRESTRICTED
.The
transaction_date
field value must have a valid date format.
Write an aggregation pipeline to mask sensitive fields
Write a pipeline to transform the data based on the changes
outlined in the About this Task section. The pipeline writes
the results to the payments_masked
collection.
Query the output collection
Output:
[ { card_name: 'Mx. Xxx Doe', card_num: 'XXXXXXXXXXXX3456', card_expiry: ISODate('2023-09-16T08:10:47.393Z'), card_sec_code: '039', card_type: 'CREDIT', transaction_id: 'eb1bd77836e8713656d9bf2debba8900', transaction_date: ISODate('2021-01-13T09:32:07.000Z'), transaction_amount: Decimal128('501.98'), reported: false }, { card_name: 'Mx. Xxx Smith', card_num: 'XXXXXXXXXXXX7654', card_expiry: ISODate('2022-12-25T06:12:26.288Z'), card_sec_code: '732', card_type: 'DEBIT', transaction_id: '634c416a6fbcf060bb0ba90c4ad94f60', transaction_date: ISODate('2020-11-24T19:25:57.000Z'), transaction_amount: Decimal128('64.01'), reported: true, customer_info: { category: 'NORMAL', rating: 78, risk: 55 } } ]
The output documents are modified based on the requirements for the third party. The following table shows the original values and the corresponding masked values:
Field | Original Value | Masked Value |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Export the output collection
In this scenario, the third-party analysis company needs you to
supply data in CSV format. Use mongoexport
to export the
masked data as a CSV.
Important
Run mongoexport
from your terminal, not the shell or
driver.
Before you run the mongoexport
command, replace the
--out
value with the file location you want to output the
results.
mongoexport --uri=<connection string uri> --db=test --collection=payments_masked \ --fields=card_name,card_num,card_expiry,card_sec_code,card_type,transaction_id,transaction_date,transaction_amount,reported,customer_info.category,customer_info.rating,customer_info.risk \ --type=csv --out=<output-location>
The output file resembles:
card_name,card_num,card_expiry,card_sec_code,card_type,transaction_id,transaction_date,transaction_amount,reported,customer_info.category,customer_info.rating,customer_info.risk Mx. Xxx Doe,XXXXXXXXXXXX3456,2023-08-28T14:18:20.562Z,301,CREDIT,eb1bd77836e8713656d9bf2debba8900,2021-01-13T09:32:07.000Z,501.98,false,,, Mx. Xxx Smith,XXXXXXXXXXXX7654,2023-01-08T21:05:52.706Z,420,DEBIT,634c416a6fbcf060bb0ba90c4ad94f60,2020-11-24T19:25:57.000Z,64.01,true,NORMAL,78,55
Results
The pipeline uses the
$cond
operator to return the$$REMOVE
variable if thecustomer_info.category
field equalsRESTRICTED
. When this happens, the aggregation engine excludes the entirecustomer_info
sub-document from the output for the document.To mask the
card_name
field, the pipeline uses a regular expression operator to extract the last word of the field's original value.$regexFind
indicates if the match succeeded and what the matched value is. If$regexFind
finds a match, the operator appends the matched value (the customer's last name) to the stringMx. Xxx
.