Understanding Sensitive Information Types (SITs)

25 mins

Understanding the Concept

Sensitive Information Types (SITs) are pattern definitions used to identify sensitive data like credit card numbers, social security numbers, or custom organizational data patterns.

Microsoft provides over 300 built-in SITs covering various data types across regions and industries. These include financial data (credit cards, bank accounts), personal identifiers (SSN, passport numbers), and health information.

SITs use a combination of patterns (regex), keywords, checksums, and proximity rules to accurately identify sensitive data while minimizing false positives.

Key Points

  • SITs are the foundation of DLP and auto-labeling
  • 300+ built-in SITs available out of the box
  • Use regex patterns, keywords, and checksums
  • Confidence levels (low, medium, high) indicate how much supporting evidence backs a match
  • Custom SITs can be created for organization-specific data

SIT Detection Components

Step 1

Primary Pattern

Regex pattern matching the core data format (e.g., 16-digit number)

Step 2

Supporting Keywords

Corroborating keywords near the pattern (e.g., 'credit card', 'expiry')

Step 3

Checksum Validation

Mathematical validation (e.g., Luhn algorithm for credit cards)

Step 4

Proximity Rules

Supporting keywords must appear within a specified character distance of the primary pattern match

Step 5

Confidence Score

The combination of elements that matched determines whether the result is reported at low, medium, or high confidence
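
The five components above can be sketched in a few lines of Python. This is only an illustration of the multi-factor idea, not how Microsoft Purview implements its built-in credit card SIT: the regex, the keyword list, the 300-character window, and the rule that maps keyword hits to low/medium/high are all assumptions chosen for the example.

```python
import re

# Hypothetical values for illustration; not Microsoft's actual definitions.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")  # 16 digits, optional separators
KEYWORDS = ("credit card", "card number", "expiry", "cvv")
PROXIMITY = 300  # characters searched on either side of the match


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str) -> list[tuple[str, str]]:
    """Return (digits, confidence) for each candidate that passes the checksum."""
    results = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):          # Step 1: primary pattern
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):                 # Step 3: checksum validation
            continue
        # Steps 2 and 4: supporting keywords inside the proximity window
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        hits = sum(kw in window for kw in KEYWORDS)
        # Step 5: assumed mapping from evidence to a confidence level
        if hits >= 2:
            confidence = "high"
        elif hits == 1:
            confidence = "medium"
        else:
            confidence = "low"
        results.append((digits, confidence))
    return results
```

Calling `detect_cards("Please charge credit card 4111 1111 1111 1111, expiry 12/27.")` reports the well-known test number 4111111111111111 at high confidence, because two keywords fall inside the window. The same number with no nearby keywords is reported at low confidence, and a 16-digit string that fails the Luhn check is dropped entirely.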

Why This Matters in Real Organizations

Accurate data classification is the foundation of all compliance activities. If you can't identify sensitive data, you can't protect it. Poor SIT configuration leads to either missed detections (compliance risk) or excessive false positives (user frustration).

Common Mistakes to Avoid

Using only pattern matching without keywords
Not testing SITs before deploying in production
Setting the confidence threshold too low in policies, causing false positives
Ignoring regional variations in data formats
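
The first two mistakes in this list can be made concrete with a small test harness. The sketch below compares a pattern-only detector with a pattern-plus-keyword detector on a handful of labeled samples. Everything in it is hypothetical: the six-digit employee ID format, the keyword list, and the sample texts are invented for illustration, and a real evaluation would use the testing features of your compliance tooling with representative data.

```python
import re

# Hypothetical custom format: a six-digit employee ID.
EMP_ID = re.compile(r"\b\d{6}\b")
KEYWORDS = ("employee id", "staff number")  # assumed supporting keywords


def pattern_only(text: str) -> bool:
    """Flag any text containing a six-digit number."""
    return bool(EMP_ID.search(text))


def pattern_with_keywords(text: str) -> bool:
    """Flag text only when a six-digit number and a keyword both appear."""
    lowered = text.lower()
    return bool(EMP_ID.search(text)) and any(kw in lowered for kw in KEYWORDS)


# (text, contains_a_real_employee_id) pairs; invented samples.
SAMPLES = [
    ("Employee ID 204817 requested leave.", True),
    ("Staff number: 118230", True),
    ("Invoice total 150000 due Friday.", False),
    ("Ticket 998877 closed.", False),
    ("Badge 204817 issued.", True),  # real ID, but no supporting keyword
]


def evaluate(detector, samples) -> dict[str, int]:
    """Count false positives and false negatives for a detector."""
    fp = sum(1 for text, label in samples if detector(text) and not label)
    fn = sum(1 for text, label in samples if not detector(text) and label)
    return {"false_positives": fp, "false_negatives": fn}


if __name__ == "__main__":
    print("pattern only:    ", evaluate(pattern_only, SAMPLES))
    print("pattern+keywords:", evaluate(pattern_with_keywords, SAMPLES))
```

On these samples the pattern-only detector flags the invoice total and the ticket number, while the keyword-backed detector avoids both but misses the badge line. Running this kind of comparison before deployment shows which side of the trade-off a definition leans toward.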

Interview Tips

  • Explain the multi-factor approach to SIT detection
  • Discuss confidence levels and their implications
  • Describe how you would tune SITs to reduce false positives

Exam Tips (SC-401)

  • Know the components of a SIT definition
  • Understand confidence level implications for policies
  • Be familiar with common built-in SITs
