Splunk in Plain English — A Practical SOC Guide

Imagine you are a detective, and every device on your network — servers, laptops, firewalls, cloud systems — is leaving footprints everywhere. The problem is there are millions of footprints every single day, scattered across thousands of different files. Your job is to find the one set of footprints that does not belong. That is exactly the problem Splunk solves. It is the platform that collects every footprint from every device, puts them in one place, and gives you the tools to find the suspicious ones — fast.

In this blog, I will take you through Splunk from absolute scratch — what it is, how it works under the hood, how to write SPL queries like a pro, how to build dashboards and alerts, how to set up a SOC lab, and most importantly, the interview questions you will definitely face if you are going for a SOC analyst role. I have completed the TryHackMe Advanced Splunk rooms including SPL exploration, SOC lab setup, dashboards and reports, data manipulation, and the Fixit room — this is everything I learned, explained as simply as possible.

1. What is Splunk and Why Does Every SOC Use It?

Before we talk about Splunk specifically, we need to understand what problem it solves. Every device on a network generates logs: authentication events, network connections, process executions, file accesses, errors. A single enterprise network can generate billions of log entries per day. You cannot manually read those. You need a system that collects them all, stores them searchably, and lets you query them in real time. That system is called a SIEM, a Security Information and Event Management platform. Splunk is one of the most widely deployed platforms for this job.

Think of a SIEM like the control tower at a busy airport. Every aircraft (device), gate (application), and runway (network path) sends its status to the tower constantly. The tower (SIEM) sees everything, correlates events across all of them, and immediately alerts controllers when something is wrong. Without the tower, each aircraft is isolated and a collision could happen undetected.

Splunk specifically does three things: it collects data from any source, indexes it in a compressed and searchable format, and lets you search and visualize it using SPL — Search Processing Language. On top of that, you build dashboards, schedule reports, and configure alerts that fire automatically when suspicious patterns appear.

Splunk Products You Will Encounter in a SOC

Product	What It Does	Common in SOC?
Splunk Enterprise	The core platform — on-premises deployment for collecting, indexing, and searching data	Yes — the most common
Splunk Cloud	The SaaS version of Enterprise — Splunk manages the infrastructure	Yes — growing fast in enterprise
Splunk ES (Enterprise Security)	A SIEM application built on top of Enterprise — adds correlation searches, incident review, risk scoring	Yes — the SOC standard
Splunk SOAR (Phantom)	Security Orchestration, Automation, and Response — automates analyst actions like blocking IPs	Yes — used by Tier 2 and 3
Splunk Free	500 MB/day limit, single user, no alerting — good for home labs and learning	For learning only

Interview Tip: Interviewers love asking "What is the difference between Splunk and Splunk ES?" The answer is: Splunk is the platform. Splunk Enterprise Security is the SIEM application that runs on top of it. ES adds the Notable Events framework, correlation searches, risk-based alerting, and the incident review workflow that SOC analysts use daily. ES is a premium app, so you need a Splunk Enterprise or Splunk Cloud deployment to run it.

2. Splunk Architecture — How the Data Flows

Understanding Splunk's architecture is not just academic — you will be asked about it in interviews, and more importantly, when something breaks in production, knowing the architecture is the only way to diagnose it. There are three core components.

Component 1: Forwarder — The Data Collector

Forwarders are lightweight agents installed on the machines you want to monitor — Windows servers, Linux endpoints, firewalls, anything. Their only job is to collect log data and send it to the indexer. There are two types you need to know:

Universal Forwarder (UF) — A tiny, dedicated agent (~20 MB) that collects and forwards data without doing any processing locally. Uses minimal CPU and RAM. This is what you install on every endpoint in the environment — hundreds or thousands of them. It cannot index data on its own.
Heavy Forwarder (HF) — A full Splunk instance that can parse, filter, mask, and route data before forwarding it. Used at aggregation points — for example, a syslog server receiving logs from 500 network devices would run an HF to normalise and route that data before it reaches the indexer.

Component 2: Indexer — The Storage Engine

The indexer receives raw data from forwarders, breaks it into individual events, extracts timestamps and fields, compresses it, and stores it on disk in time-ordered buckets. Each bucket goes through a lifecycle: Hot (actively being written), Warm (recent, read-only), Cold (older, may move to slower storage), and Frozen (archived or deleted based on your retention policy).

Data is stored inside indexes, logical containers you create (like index=windows or index=firewall). You search by specifying which index to look in. Always scope your searches to a specific index: without one, Splunk searches only your role's default indexes (often just main), and reaching for index=* instead scans everything and is extremely slow.

Component 3: Search Head — The Interface You Use

The Search Head is the web interface you log into at http://localhost:8000. It is where you write SPL queries, build dashboards, create alerts, and view results. When you run a search, the Search Head distributes it across all indexers in parallel, collects the results, and presents them to you. In enterprise environments, multiple Search Heads form a cluster for high availability.

Supporting Components

Component	What It Does
Deployment Server	Centrally pushes configurations and Splunk apps to hundreds or thousands of Universal Forwarders — you configure once and it distributes everywhere
Cluster Manager (Master Node)	Manages the indexer cluster — controls replication of data across indexers for high availability and no data loss
License Manager	Controls how much data (in GB per day) your deployment is allowed to index. Exceeding it generates license warnings, and under enforced licenses repeated violations block searching

Interview Tip: "What is the difference between a Universal Forwarder and a Heavy Forwarder?" — Universal Forwarder is lightweight, forwards raw data only, cannot index locally. Heavy Forwarder is a full Splunk binary that can parse, filter, mask sensitive data, and route to multiple destinations before forwarding. UFs go on every endpoint; HFs sit in front of indexers for data normalisation.

3. Splunk UI Walkthrough — Your First Login

When you log in to Splunk for the first time (default URL is http://localhost:8000), the interface can feel overwhelming. Let me walk you through the key areas so you can navigate confidently from day one.

The Navigation Bar

Section	What It Does	Used For
Search & Reporting	The primary search interface where you write SPL	Ad-hoc investigations, threat hunting, testing queries
Dashboards	Visual panels populated by saved searches	SOC monitoring screens, executive reporting
Alerts	Automated searches that trigger actions when conditions are met	Threat detection, automated notifications
Reports	Saved searches that can be scheduled and emailed	Daily/weekly compliance and summary reports
Settings > Indexes	Create and manage data indexes	Admin: set up new log buckets, manage retention
Settings > Data Inputs	Configure how data enters Splunk	Admin: add new log sources, file monitors, syslog
Settings > Lookups	Manage reference tables for data enrichment	Upload threat intel lists, asset tables, user directories

The Search Bar — Four Things to Always Check First

Time Picker (top right) — Controls the time window for your search. Always verify this before running. Leaving it on "All time" on a production system will time out or take minutes.
Search Mode — Fast mode (only event count and fields used in the query), Smart mode (Splunk decides), Verbose mode (all fields). Use Fast for large datasets, Verbose when investigating individual events.
Field Sidebar (left panel) — After running a search, this shows all extracted fields. Click any field to see its top values and add them to your search — great for building queries interactively.
Event Inspector — Click any event row to expand it and see every field Splunk extracted. This is how you discover what field names to use in your queries.

Tip: Before running any search, always specify at minimum an index and a time range. The pattern index=windows earliest=-1h latest=now scopes your search to just the last hour of Windows events — this prevents you from accidentally running a query over all time and all data, which on a real deployment can take many minutes and consume significant resources.

4. SPL Fundamentals — The Language of Splunk

SPL — Search Processing Language — is how you talk to Splunk. If you have used SQL before, SPL will feel familiar in some ways, but there is one critical difference: SPL is a pipeline language. You start with a search that retrieves events, and then you chain commands together using the pipe character |. Each command transforms the data and passes the result to the next one. This is incredibly powerful once it clicks.

Basic Search Syntax

Every SPL search follows the same basic structure:

index=indexname sourcetype=sourcetype keyword | command1 | command2 | command3

The part before the first pipe is the search expression — it retrieves raw events. Everything after the first pipe is the pipeline — it transforms those events into results.

# Simple keyword search
index=main failed

# Field-value search
index=windows EventCode=4625

# Multiple field-value conditions (AND is implied between terms)
index=windows EventCode=4625 host=dc-01

# OR and NOT
index=main (error OR failed OR denied)
index=main login NOT success

# Wildcard — asterisk matches any characters
index=main fail*                    # matches fail, failed, failure
index=firewall src_ip=192.168.*     # any IP starting with 192.168
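
To make the pipeline idea concrete, here is a rough Python analogy of a search followed by two piped commands. It is only a mental model, not how Splunk is implemented, and the sample events and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical events standing in for what the search expression retrieves.
EVENTS = [
    {"EventCode": 4625, "src_ip": "10.0.0.5"},
    {"EventCode": 4625, "src_ip": "10.0.0.5"},
    {"EventCode": 4624, "src_ip": "10.0.0.5"},
    {"EventCode": 4625, "src_ip": "10.0.0.9"},
]


def search(events, **criteria):
    """Search expression: keep events whose fields match every criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def stats_count_by(rows, field):
    """Rough analogue of `| stats count by <field>`."""
    counts = Counter(r[field] for r in rows)
    return [{field: value, "count": n} for value, n in counts.items()]


def where(rows, predicate):
    """Rough analogue of `| where <expression>`."""
    return [r for r in rows if predicate(r)]


def run_pipeline():
    # EventCode=4625 | stats count by src_ip | where count > 1
    rows = search(EVENTS, EventCode=4625)
    rows = stats_count_by(rows, "src_ip")
    return where(rows, lambda r: r["count"] > 1)


if __name__ == "__main__":
    print(run_pipeline())
```

Each stage only sees the output of the stage before it, which is why the order of commands in a pipeline matters.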

Comparison Operators

index=web status=404               # equals
index=web bytes>1000000            # greater than
index=web status!=200              # not equal
index=web status>=400              # greater than or equal
index=windows EventCode IN (4624, 4625, 4648)   # multiple values

The Default Fields: Always Present on Every Event

Every event in Splunk automatically has these six fields. You do not need to extract them, they are always there:

Field	What It Contains	Example Value
`index`	Which index the data lives in	`windows`, `firewall`
`sourcetype`	The format/parser applied to this data — determines which fields get extracted	`WinEventLog:Security`, `cisco:asa`
`source`	The file path or input the data came from	`/var/log/auth.log`
`host`	The device that generated the event	`dc-01`, `web-server-02`
`_time`	The event timestamp as a Unix epoch number	`1748736000`
`_raw`	The original raw log text exactly as it was received	The full unprocessed log line

Time Modifiers — Scoping Your Search Window

You can set the time range in the UI picker or directly in your SPL. Knowing the time syntax is important for alerts and reports where you cannot click a dropdown.

# Relative time — format: [+/-][number][unit]
# Units: s=seconds, m=minutes, h=hours, d=days, w=weeks, mon=months

index=main earliest=-1h latest=now         # last 1 hour
index=main earliest=-24h latest=-1h        # from 24 hours ago to 1 hour ago
index=main earliest=-7d                    # last 7 days

# Snapping with @: round DOWN to the start of the given unit
earliest=-1d@d     # start of yesterday (midnight)
earliest=-1w@w     # start of last week (Sunday midnight; use @w1 to snap to Monday)
earliest=-1mon@mon # start of last month

# Absolute timestamps
earliest="06/01/2026:00:00:00"
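
If the offset-then-snap order is hard to picture, the following Python sketch resolves a small subset of relative time modifiers against a reference time. It is an illustration under assumptions: the `resolve` and `snap` helpers are hypothetical, only the s, m, h, d, and w units are handled, and time zones and daylight saving are ignored. It follows Splunk's convention that @w snaps to Sunday.

```python
import re
from datetime import datetime, timedelta

# Fixed-length units only; months are left out to keep the sketch short.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
MODIFIER = re.compile(r"([+-])(\d+)([smhdw])(?:@([smhdw]))?")


def snap(moment, unit):
    """Round a datetime DOWN to the start of the given unit."""
    if unit == "s":
        return moment.replace(microsecond=0)
    if unit == "m":
        return moment.replace(second=0, microsecond=0)
    if unit == "h":
        return moment.replace(minute=0, second=0, microsecond=0)
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return midnight
    # unit == "w": snap to Sunday; weekday() is Monday=0 ... Sunday=6.
    return midnight - timedelta(days=(moment.weekday() + 1) % 7)


def resolve(modifier, now):
    """Resolve 'now', '-1h', '-1d@d' and similar against a reference time."""
    if modifier == "now":
        return now
    match = MODIFIER.fullmatch(modifier)
    if match is None:
        raise ValueError(f"unsupported time modifier: {modifier}")
    sign, amount, unit, snap_unit = match.groups()
    offset = timedelta(seconds=int(amount) * UNIT_SECONDS[unit])
    # Apply the offset first, then snap the result down.
    moment = now + offset if sign == "+" else now - offset
    return snap(moment, snap_unit) if snap_unit else moment


if __name__ == "__main__":
    reference = datetime(2026, 6, 3, 14, 37, 20)  # a Wednesday afternoon
    for modifier in ("-1h", "-1d@d", "-1w@w"):
        print(modifier, "->", resolve(modifier, reference))
```

The point to notice is that -1d@d first steps back one day and only then rounds down to midnight, so it lands on the start of yesterday rather than the start of today.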

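Interview Tip: Interviewers sometimes ask "How do you search only within business hours?" One answer uses the date_hour and date_wday fields: index=windows | where date_hour>=9 AND date_hour<17 AND date_wday!="saturday" AND date_wday!="sunday". This keeps events that occurred from 9am up to 5pm on weekdays, whatever time range you searched. Note that date_hour<=17 would also include 5:00pm to 5:59pm. The date_* fields are derived from the timestamp text in the raw event, so they are not present for every sourcetype; when they are missing, compute the hour yourself with eval hour=tonumber(strftime(_time, "%H")).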

5. Core SPL Commands — The 20 You Will Use 95% of the Time

Splunk has over 140 SPL commands. In a real SOC, you will reach for the same 20 commands repeatedly. I am going to cover every important one with real security examples so that you can immediately apply them.

A. stats — Aggregate and Count Events

The single most important SPL command. stats takes many events and computes statistics — counts, sums, averages, distinct counts — grouped by any field. If you only learn one command deeply, make it this one.

# Count events by a field value
index=windows EventCode=4625 | stats count by src_ip
# Answer: How many failed logins came from each source IP?

# Multiple statistics at once
index=web | stats count, avg(bytes), max(bytes) by src_ip

# Count distinct values (dc = distinct count)
index=windows | stats dc(user) as unique_users by host
# Answer: How many unique users logged into each server?

# List all values of a field per group
index=windows EventCode=4624 | stats values(src_ip) as source_ips by user
# Answer: What IP addresses did each user log in from?

# Earliest and latest event times per group
index=windows | stats earliest(_time) as first_seen, latest(_time) as last_seen by src_ip

stats functions to memorise: count — total events | dc(field) — distinct count | sum(field) — sum of a numeric field | avg/min/max(field) — basic statistics | values(field) — list unique values | list(field) — list all values including duplicates | earliest/latest(_time) — first and last event time.

B. timechart — Plot Events Over Time

timechart is like stats but adds a time dimension. The result is a table where the first column is a time bucket and subsequent columns are your statistics. This is what powers line charts and area charts on dashboards — perfect for spotting anomalous spikes.

# Count events per hour
index=windows EventCode=4625 | timechart span=1h count

# Count per 30-minute window, split by user (top 10 users)
index=windows | timechart span=30m count by user limit=10

# Average response time over 5-minute windows
index=web | timechart span=5m avg(response_time) by uri

C. table — Format Results as a Clean Table

# Show specific fields as a readable table
index=windows EventCode=4625
| table _time, host, user, src_ip, Failure_Reason

D. where — Filter After Aggregation

Use where to filter the results of a previous pipeline stage — especially after stats. It accepts eval-style expressions, null checks, and string functions.

# Find IPs with more than 50 failed logins
index=windows EventCode=4625 | stats count by src_ip | where count > 50

# Filter where a field contains a substring
index=main | where like(user, "%admin%")

# Filter null / not null fields
index=main | where isnull(user)
index=main | where isnotnull(dest_ip)

E. eval — Create or Modify Fields

eval is the most versatile command in SPL. It creates new computed fields or modifies existing ones using expressions, math, string functions, and conditional logic. You use it constantly.

# Basic math
index=web | eval size_kb = bytes / 1024

# Conditional logic with if()
index=windows EventCode=4625 | stats count by src_ip
| eval threat_level = if(count > 100, "HIGH", if(count > 20, "MEDIUM", "LOW"))

# case() — like a switch/case for multiple conditions
| eval event_type = case(
    EventCode=4624, "Successful Login",
    EventCode=4625, "Failed Login",
    EventCode=4648, "Explicit Credential Login",
    1=1, "Other"
  )

# String operations
| eval domain = lower(mvindex(split(user, "@"), 1))

# Format epoch timestamp as readable string
| eval readable_time = strftime(_time, "%Y-%m-%d %H:%M:%S")

# Round a number to 2 decimal places
| eval pct = round(failed / total * 100, 2)

F. rex — Extract Fields With Regular Expressions

When Splunk's automatic field extraction does not get what you need, rex lets you define your own regex with named capture groups. The group names automatically become Splunk field names.

# Extract username and IP from an SSH failure log
index=syslog | rex field=_raw "Failed password for (?<user>\S+) from (?<src_ip>\d+\.\d+\.\d+\.\d+)"

# Extract a port number
index=firewall | rex "dst_port=(?<dest_port>\d+)"

# rex mode=sed — substitute/redact a pattern
index=main | rex mode=sed "s/password=[^ ]*/password=REDACTED/g"

G. dedup — Remove Duplicate Events

# Keep only the first event per unique field value
index=windows | dedup user
index=windows | dedup user, src_ip     # unique combination of both fields
index=main | dedup 3 host              # keep the first 3 events per host

H. sort — Order Your Results

index=main | stats count by src_ip | sort - count   # descending (highest first)
index=main | stats count by src_ip | sort + count   # ascending (lowest first)
index=main | stats count by src_ip | sort 10 - count  # top 10 only (the limit goes before the sort fields)

I. rename — Rename Fields for Readability

index=main | rename src_ip as "Source IP", dest_ip as "Destination IP"
# Useful for dashboard panels and emailed reports where technical field names look bad

J. fields — Include or Exclude Fields

# Keep only these fields (faster queries — Splunk skips extracting the rest)
index=windows | fields host, user, EventCode, src_ip

# Remove fields you do not need
index=windows | fields - _raw, punct

K. head and tail — Limit Results

index=main | head 20     # first 20 events (most recent if sorted by time)
index=main | tail 10     # last 10 events

L. search — Filter in the Middle of a Pipeline

search is like adding a new keyword filter mid-pipeline. Unlike where, it accepts the same syntax as the initial search bar — wildcards, field comparisons, and keyword matching against _raw.

index=main | stats count by user | search count>10
index=windows | table host, user, src_ip | search src_ip="10.0.*"

Interview Tip — Classic Question: "Write a SPL query to detect a brute force attack." The answer: index=windows EventCode=4625 | stats count by src_ip, Account_Name | where count > 10 | sort - count. Walk through it: Event 4625 is failed logon, I group by source IP and targeted username, filter for more than 10 failures, then sort by worst offenders. In production I would add a 5-minute time window and throttle the resulting alert to avoid repeat notifications for the same IP.
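
The 5-minute window mentioned in that answer can be sketched in Python as follows. This is an illustration, not Splunk's implementation: it assumes fixed 5-minute buckets (similar in spirit to binning _time with a 5-minute span), and the `detect_brute_force` function and its tuple layout are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 300  # fixed 5-minute buckets
THRESHOLD = 10        # flag more than 10 failures, matching `where count > 10`


def detect_brute_force(events):
    """events: iterable of (epoch_seconds, src_ip, account) for failed logons."""
    counts = Counter()
    for ts, src_ip, account in events:
        bucket_start = ts - (ts % WINDOW_SECONDS)
        counts[(bucket_start, src_ip, account)] += 1
    hits = [(key, n) for key, n in counts.items() if n > THRESHOLD]
    # Worst offenders first, like `sort - count`.
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    base = 1_500_000_000  # divisible by 300, so it starts a bucket
    noisy = [(base + i * 10, "203.0.113.7", "alice") for i in range(12)]
    quiet = [(base + i * 10, "198.51.100.9", "bob") for i in range(3)]
    print(detect_brute_force(noisy + quiet))
```

Fixed buckets can split a burst that straddles a boundary into two smaller counts, which is one reason production detections often run on overlapping or sliding windows instead.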

6. Data Manipulation — Shaping Raw Logs Into Useful Data

Real-world logs are messy. They contain inconsistent field names, multi-value fields, missing values, custom timestamp formats, and encoded data. Data manipulation commands let you clean, normalise, and reshape that raw data into something that is actually useful for analysis.

transaction — Group Related Events Into Sessions

transaction groups individual events that belong to the same logical sequence into a single result. This is extremely useful for session tracking — grouping all activity by the same user session, or tracking login-to-logout sequences.

# Group all web events by session ID into one row per session
index=web | transaction session_id maxspan=30m

# Track login to logout as one session per user
index=windows EventCode IN (4624, 4634)
| transaction user startswith=(EventCode=4624) endswith=(EventCode=4634) maxspan=8h
| eval session_minutes = round(duration / 60, 1)
# transaction auto-creates: eventcount (how many events), duration (seconds), _time (start time)

Performance Warning: transaction is memory-intensive and slow on large datasets. For simple session tracking, prefer stats earliest(_time) as login_time, latest(_time) as logout_time by user combined with eval duration = logout_time - login_time. When each user has a single session in the search window this gives the same duration, and it usually runs far faster because stats can be distributed across the indexers.

mvexpand — Explode Multi-Value Fields

Sometimes a single field contains multiple values — for example, a list of blocked IPs, or multiple group memberships. mvexpand creates one row per value so you can count or filter them individually.

# If a firewall event has multiple blocked IPs in one field
index=firewall | mvexpand blocked_ips
# Now each IP is its own row — you can stats count by blocked_ips

mvcount, mvindex, mvfilter — Work With Multi-Value Fields

# Count how many values are in a multi-value field
index=main | eval ip_count = mvcount(src_ip)

# Get only the first value (index 0)
index=main | eval first_ip = mvindex(src_ip, 0)

# Filter a multi-value field to only matching values
index=main | eval internal_ips = mvfilter(match(src_ip, "^10\."))

fillnull — Handle Missing Values

# Replace null values with a default string
index=main | fillnull value="unknown" user, src_ip

# With no field list, fillnull fills every null field in the results (0 is also the default value)
index=main | fillnull value=0

streamstats — Running Statistics as Events Flow

streamstats is like stats but it computes a running calculation for each event in sequence — the result is added as a new field on every row rather than collapsing events into a summary table.

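# Calculate time between consecutive events from the same source IP
# sort 0 lifts the default 10,000-result cap on sort
# current=f excludes the current event, so prev_time is the previous event's time
index=firewall | sort 0 src_ip, _time
| streamstats current=f last(_time) as prev_time by src_ip
| eval interval_seconds = _time - prev_time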
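
The intent of this calculation, the gap between each event and the previous one from the same source IP, can be sketched in plain Python. The `add_intervals` function and the event dictionaries are hypothetical; the sketch only shows the running-state idea behind streamstats.

```python
def add_intervals(events):
    """events: list of dicts with 'src_ip' and '_time' (epoch seconds)."""
    previous_time = {}  # running state: last timestamp seen per source IP
    results = []
    for event in sorted(events, key=lambda e: (e["src_ip"], e["_time"])):
        prev = previous_time.get(event["src_ip"])
        row = dict(event)
        # The first event for an IP has no predecessor, so the interval is None.
        row["interval_seconds"] = None if prev is None else event["_time"] - prev
        previous_time[event["src_ip"]] = event["_time"]
        results.append(row)
    return results


if __name__ == "__main__":
    sample = [
        {"src_ip": "a", "_time": 100},
        {"src_ip": "a", "_time": 130},
        {"src_ip": "b", "_time": 50},
        {"src_ip": "a", "_time": 190},
    ]
    for row in add_intervals(sample):
        print(row)
```

Unlike a stats-style aggregation, every input row survives and simply gains a new field, which is the defining behaviour of streamstats.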

Complete Data Manipulation Example — Classifying Failed Logins

# Full pipeline: extract, classify, rank, and label threat level
index=windows EventCode=4625
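# The first "Account Name:" in a 4625 event belongs to the Subject, so anchor on the failed-logon section
| rex "(?s)Account For Which Logon Failed:.*?Account Name:\s+(?<target_user>\S+)"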
| rex "Source Network Address:\s+(?<src_ip>[\d\.]+)"
| eval targeting_admin = if(match(target_user, "(?i)admin"), "YES", "NO")
| stats count as attempts, dc(target_user) as accounts_targeted, values(target_user) as accounts by src_ip, targeting_admin
| where attempts > 5
| eval risk = case(
    targeting_admin="YES" AND attempts>20, "CRITICAL",
    attempts>50, "HIGH",
    1=1, "MEDIUM"
  )
| sort - attempts
| table src_ip, attempts, accounts_targeted, accounts, targeting_admin, risk

7. Lookups, Macros, and Field Extractions

These three features transform Splunk from a log aggregator into a true threat intelligence platform. Lookups enrich your data with external context, macros make your searches reusable, and field extractions teach Splunk how to understand custom log formats.

Lookups — Enriching Events With External Data

A lookup joins your search results with an external reference table — typically a CSV file. This is how you add threat intel context (is this IP known malicious?), asset data (which team owns this server?), or user directory information (is this a privileged account?) to raw log events.

Step 1: Create your reference CSV. For example, a threat intelligence file:

ip,reputation,country,threat_type
185.220.101.5,malicious,RU,Tor exit node
45.142.212.100,malicious,DE,C2 server
198.51.100.0,suspicious,CN,Scanner

Step 2: Upload it at Settings → Lookups → Lookup table files, then define it at Settings → Lookups → Lookup definitions.

Step 3: Use it in SPL:

# Enrich firewall events with threat intel
index=firewall
| lookup threat_intel ip as src_ip OUTPUT reputation, country, threat_type
| where reputation="malicious"
| table _time, src_ip, dest_ip, action, reputation, country, threat_type

# inputlookup — search the lookup table itself like a regular index
| inputlookup threat_intel.csv | where threat_type="C2 server"

# outputlookup — write your search results to a CSV file
index=windows EventCode=4625 | stats count by src_ip | outputlookup brute_force_ips.csv
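
Conceptually, a lookup behaves like a left join keyed on one field. The Python sketch below mimics that with the sample CSV from Step 1; the `load_lookup` and `enrich` helpers are hypothetical names used only for illustration.

```python
import csv
import io

THREAT_INTEL_CSV = """ip,reputation,country,threat_type
185.220.101.5,malicious,RU,Tor exit node
45.142.212.100,malicious,DE,C2 server
198.51.100.0,suspicious,CN,Scanner
"""


def load_lookup(text, key_field):
    """Index the CSV rows by the key column for fast matching."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_field]: row for row in reader}


def enrich(events, table, event_field, output_fields):
    """Rough analogue of `| lookup threat_intel ip as src_ip OUTPUT ...`."""
    enriched = []
    for event in events:
        match = table.get(event.get(event_field), {})
        row = dict(event)
        for field in output_fields:
            row[field] = match.get(field)  # stays None when there is no match
        enriched.append(row)
    return enriched


if __name__ == "__main__":
    table = load_lookup(THREAT_INTEL_CSV, "ip")
    events = [{"src_ip": "45.142.212.100"}, {"src_ip": "10.0.0.1"}]
    for row in enrich(events, table, "src_ip", ["reputation", "threat_type"]):
        print(row)
```

Events without a match keep flowing through with empty enrichment fields, which is why the SPL example filters on reputation afterwards.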

Tip: In Settings → Lookups → Automatic Lookups, you can configure Splunk to run a lookup automatically every time a specific sourcetype is searched — no need to add | lookup to every query. This is how enterprise deployments enrich events transparently without making analysts remember to add enrichment commands.

Macros — Reusable SPL Snippets

A macro is a saved SPL string that you can call by name in any search using backtick syntax. Think of it like a function in programming — define complex logic once, reuse it everywhere, and update it in one place when something changes.

# Define a macro called "brute_force_check" that expands to "count > 50"
# Set up at Settings → Advanced Search → Search Macros

# Call it by wrapping the macro name in single backticks, one on each side:
index=windows EventCode=4625 | stats count by src_ip | where `brute_force_check`

# Macros with arguments — define "high_count(1)" where $threshold$ is the argument
# Macro body: count > $threshold$
index=windows | stats count by src_ip | where `high_count(100)`
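
Macro expansion is essentially text substitution that happens before the search runs. The Python sketch below imitates it for the two macros above; the `MACROS` table and `expand_macros` function are hypothetical and far simpler than Splunk's real macro handling.

```python
import re

# Hypothetical macro definitions: name -> (argument names, body).
MACROS = {
    "brute_force_check": ([], "count > 50"),
    "high_count": (["threshold"], "count > $threshold$"),
}

# Matches `name` or `name(arg1, arg2)` wrapped in single backticks.
MACRO_CALL = re.compile(r"`(\w+)(?:\(([^)]*)\))?`")


def expand_macros(spl):
    """Replace every macro call in the SPL string with its expanded body."""

    def substitute(match):
        name, raw_args = match.group(1), match.group(2)
        arg_names, body = MACROS[name]
        values = [a.strip() for a in raw_args.split(",")] if raw_args else []
        if len(values) != len(arg_names):
            raise ValueError(f"macro {name} expects {len(arg_names)} argument(s)")
        for arg_name, value in zip(arg_names, values):
            body = body.replace(f"${arg_name}$", value)
        return body

    return MACRO_CALL.sub(substitute, spl)


if __name__ == "__main__":
    print(expand_macros("index=windows | stats count by src_ip | where `high_count(100)`"))
```

Because the expansion is plain text, changing the macro body in one place changes every search that calls it.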

Field Extractions — Teaching Splunk Your Log Format

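For common structured formats (JSON, key=value pairs), Splunk extracts fields automatically at search time, and for sources such as Windows Event Logs, CEF, LEEF, or vendor syslog the matching Technology Add-on usually supplies the extractions. For custom application logs, you need to teach it.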

Using the Field Extractor (easiest way): Run a search, click on any event, then click "Extract New Fields." Highlight the value you want to extract and name the field — Splunk generates the regex for you automatically and saves it as a permanent extraction.

Using transforms.conf (production-grade):

# transforms.conf — define the regex extraction
[extract_ssh_failure]
REGEX = Failed password for (?<ssh_user>\S+) from (?<ssh_src_ip>[\d\.]+) port (?<ssh_port>\d+)
SOURCE_KEY = _raw

# props.conf — apply it to the right sourcetype
[linux_secure]
REPORT-ssh = extract_ssh_failure
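
Before deploying an extraction like this, it can help to try the regex against a few sample lines outside Splunk. The Python sketch below does that; note that Python needs the (?P<name>...) form for named groups, and the sample log lines are made-up examples in a typical sshd format.

```python
import re

# Same pattern as the transforms.conf stanza, using Python's named-group syntax.
SSH_FAILURE = re.compile(
    r"Failed password for (?P<ssh_user>\S+) from (?P<ssh_src_ip>[\d\.]+) port (?P<ssh_port>\d+)"
)


def extract_ssh_failure(raw):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = SSH_FAILURE.search(raw)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "Jun  1 10:15:02 web-01 sshd[2211]: Failed password for root from 203.0.113.7 port 51234 ssh2",
        "Jun  1 10:15:09 web-01 sshd[2214]: Failed password for invalid user admin from 203.0.113.7 port 51240 ssh2",
    ]
    for line in samples:
        print(extract_ssh_failure(line))
```

In this sketch the second sample does not match, because the words "invalid user" sit between "for" and the username. That is the kind of gap worth catching with real log samples before the extraction goes into production.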

Subsearches — Nested Queries

A subsearch runs inside square brackets, executes first, and feeds its results as a filter into the outer search. This lets you answer questions like "show me logins from IPs that also appeared in our threat feed today."

# Show logins from IPs that were also detected port-scanning us
index=windows EventCode=4624
    [search index=nids alert.category="Port Scan" | return 100 src_ip]

# The subsearch runs first, returns up to 100 src_ip values
# The outer search then filters 4624 events to only those source IPs

Subsearch Limitations: By default, subsearches return a maximum of 10,000 results and timeout after 60 seconds. For large threat intelligence comparisons (millions of IOCs), use lookups instead — no row limit, much faster execution, and no timeout risk.

8. Dashboards and Reports — Visualising the Threat Landscape

A well-built SOC dashboard is what lets an analyst understand the current threat posture in under 30 seconds — without running a single search manually. Dashboards are collections of panels, each powered by a saved SPL search that runs on a refresh schedule.

Building a SOC Dashboard — Step by Step

In Splunk, go to Dashboards → Create New Dashboard. Give it a name, choose Classic Dashboard (XML-based, more widely documented) or Dashboard Studio (modern, JSON-based, drag-and-drop). Then add panels.

Every good SOC overview dashboard needs these panels at minimum:

Event volume over time — a line or area chart using timechart to show total events per hour. Spikes are your first indicator something unusual happened.
Failed login count — a single-value panel showing total 4625 events in the last hour. Configure colour thresholds: green under 20, yellow 20–100, red above 100.
Top 10 source IPs generating blocked traffic — a bar chart of external IPs hitting your firewall deny rules.
High-severity alert table — the last 50 high or critical events with time, host, source IP, and description.
Alert breakdown by category — a pie or bar chart showing which alert types are firing most.

SPL for Common Dashboard Panels

# Panel 1: Event volume per hour (last 24 hours)
index=main earliest=-24h | timechart span=1h count by index

# Panel 2: Failed login count — single value
index=windows EventCode=4625 earliest=-1h | stats count

# Panel 3: Top 10 denied source IPs
index=firewall action=deny | stats count by src_ip | sort - count | head 10

# Panel 4: Recent high-severity events table
index=main severity=high earliest=-4h
| table _time, host, src_ip, dest_ip, signature, severity
| sort - _time

# Panel 5: Alert category breakdown
index=nids | stats count by alert_category | sort - count

Input Tokens — Making Dashboards Interactive

Tokens are variables in a dashboard that users can change via dropdowns or text boxes, dynamically updating all panels. For example, a host selector dropdown where choosing a different machine re-runs all searches filtered to that host.

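# In dashboard XML, a dropdown input can set a token such as $host_tok$
# Reference it in panel searches, and Splunk substitutes the selected value:
index=windows EventCode=4625 host=$host_tok$ | stats count by src_ip
# A time input exposes $time_tok.earliest$ and $time_tok.latest$, which go in the panel's <earliest> and <latest> elements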

Reports — Scheduled and Emailed Searches

Reports are saved searches that run automatically on a schedule and optionally email results to a distribution list. This is how you deliver daily security summaries to management without manual effort.

Run your search → Save As → Report → give it a name
Edit the report → Schedule → Enable scheduling → set cron (e.g., 0 8 * * 1-5 = 8am Monday–Friday)
Under Actions → Send email → add recipients, choose format (CSV, PDF, inline HTML)
For expensive reports run frequently, enable Report Acceleration — Splunk pre-computes results using indexed summaries, dramatically reducing search time

Interview Tip: "How would you set up a daily failed login report for the security manager?" Answer: Save the query index=windows EventCode=4625 earliest=-24h | stats count by src_ip, Account_Name | sort -count as a report. Schedule it to run at 7am daily. Enable PDF generation. Add the security manager's email to the delivery list. If the dataset is large, enable report acceleration. This shows you understand the full workflow — not just querying, but operationalising it.

9. Alerts — Making Splunk Your Automated Watchdog

Alerts are scheduled searches that automatically take action when a condition is met. This is the core of how a SIEM does threat detection — instead of analysts watching dashboards 24/7, Splunk watches for you and fires notifications when something needs attention.

Alert Types

Type	When It Runs	Best For
Scheduled	On a cron schedule you define (e.g., every 5 minutes)	Most SOC detections — reliable and controllable
Real-time	Continuously, as each event arrives	Immediate detection for critical events — use sparingly, high system load
Rolling window	Real-time but evaluated over a sliding time window	"More than X events in the last Y minutes" type conditions

Building a Brute Force Alert — Full Configuration

# Alert query: fire when an IP generates 10+ failed logins against an account in 5 minutes
index=windows EventCode=4625 earliest=-5m@m latest=@m
| stats count by src_ip, Account_Name
| where count >= 10

# Alert settings:
# Schedule: every 5 minutes
# Trigger when: Number of Results is greater than 0
# Throttle: suppress for 60 minutes per src_ip (prevents 100 duplicate alerts from one attacker)
# Actions: Send email to soc-team@company.com + Create Notable Event in Splunk ES

Alert Actions

Email — Send alert details to the SOC team or a ticket queue
Webhook — POST a JSON payload to Slack, PagerDuty, Jira, or your SOAR platform for automated response
Run a script — Execute a Python or bash script to block an IP, create an incident ticket, or isolate an endpoint
Create Notable Event (ES) — Generates an incident in Splunk ES's Incident Review workflow for SOC triage
Add risk score — Contribute risk points to a user or host in Splunk ES's Risk-Based Alerting framework

SOC Alert Reference — Build These From Day One

Alert Name	Detection Logic	Severity
Brute Force Login	EventCode=4625, count > 10 by src_ip in 5 minutes	🔴 High
Password Spray	EventCode=4625, many unique users targeted from one IP, low count per user	🔴 High
Account Lockout Storm	EventCode=4740, count > 5 unique accounts in 10 minutes	🔴 High
New Admin Account Created	EventCode=4720 (account created) followed by EventCode=4728 or 4732 adding that account to an administrators group	🔴 High
Security Log Cleared	EventCode=1102	🚨 Critical
Audit Policy Changed	EventCode=4719 (system audit policy was changed)	🚨 Critical
Suspicious Service Installed	EventCode=7045, service path contains %TEMP% or %APPDATA%	🔴 High
Large Data Exfiltration	Sum of bytes_out > 500 MB to external IP from single host	🔴 High
Encoded PowerShell Execution	Sysmon Event 1, CommandLine contains -enc or -EncodedCommand	🔴 High

Tip — Throttling: Always configure throttle on your alerts. Without it, a per-result or real-time alert can turn a single attacker generating 10,000 failed logins into thousands of emails, and even a scheduled alert re-fires on every run while the attack continues, which is itself a denial-of-service against your SOC. Set the throttle field to src_ip and the suppression period to 60 minutes so the first alert fires and repeats for the same IP are suppressed until the period expires.
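
To make the throttling idea concrete, here is a minimal Python sketch of per-key suppression. It is only an illustration of the logic, not how Splunk implements throttling internally; the Throttle class and the sample IPs are hypothetical.

```python
from typing import Dict, Hashable


class Throttle:
    """Suppress repeat alerts for the same key within a fixed period."""

    def __init__(self, period_seconds: int) -> None:
        self.period_seconds = period_seconds
        self._last_fired: Dict[Hashable, float] = {}

    def should_fire(self, key: Hashable, now: float) -> bool:
        last = self._last_fired.get(key)
        if last is not None and now - last < self.period_seconds:
            return False  # still inside the suppression window for this key
        self._last_fired[key] = now
        return True


throttle = Throttle(period_seconds=3600)  # 60-minute suppression per src_ip
decisions = [
    throttle.should_fire("203.0.113.7", now=0),     # first alert for this IP
    throttle.should_fire("203.0.113.7", now=300),   # same IP, 5 minutes later
    throttle.should_fire("198.51.100.9", now=300),  # a different IP
    throttle.should_fire("203.0.113.7", now=3600),  # suppression period over
]
print(decisions)
```

The key point is that suppression is tracked per throttle field value, so one noisy attacker never hides a second, unrelated source.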

10. SOC Lab Setup — Ingesting Real Logs

Setting up your own Splunk SOC lab is the most effective way to cement everything you have learned. Here is the exact process I followed, from installing Splunk to having real Windows and Linux events flowing in with all fields properly extracted.

Lab Architecture

You need at minimum: one Splunk instance (your "server") and one or two virtual machines to monitor. A typical home lab setup:

Splunk Free installed on a Linux VM or your host machine — access at http://localhost:8000
Windows VM with Universal Forwarder + Sysmon installed — sends Windows Event Logs and Sysmon telemetry
Linux VM with Universal Forwarder — sends auth.log, syslog, and application logs

Step 1: Install Splunk

# Linux (RPM-based):
sudo rpm -i splunk-9.x.x-linux-2.6-x86_64.rpm
sudo /opt/splunk/bin/splunk start --accept-license

# Linux (tar.gz):
sudo tar -xvf splunk-9.x.x-Linux-x86_64.tgz -C /opt
sudo /opt/splunk/bin/splunk start --accept-license

# Set it to start on boot:
sudo /opt/splunk/bin/splunk enable boot-start

# Access at http://localhost:8000 — default user: admin, with the password you create on first start

Step 2: Install Sysmon on Your Windows VM

Sysmon (System Monitor by Microsoft Sysinternals) is a free tool that dramatically enriches Windows event logging. Without Sysmon, Windows gives you about 15 useful event types. With Sysmon, you get around 30 additional event types including process creation with command lines and file hashes, network connections, DLL loads, file creation, and DNS queries, which is the telemetry you need for real threat detection.

# Download Sysmon from Microsoft Sysinternals
# Download SwiftOnSecurity's sysmonconfig.xml (best community configuration)
# Install:
sysmon64.exe -accepteula -i sysmonconfig.xml

# Key Sysmon Event IDs:
# Event 1  - Process Create (with command line and file hash)
# Event 3  - Network Connection (outbound from a process)
# Event 7  - Image Loaded (DLL loaded by a process)
# Event 8  - CreateRemoteThread (classic process injection indicator)
# Event 11 - File Created
# Event 22 - DNS Query (what domains is this process resolving?)

Step 3: Install and Configure Universal Forwarder

# Windows — silent install with basic configuration
msiexec.exe /i splunkforwarder-9.x.x-x64-release.msi ^
    AGREETOLICENSE=Yes ^
    RECEIVING_INDEXER="your-splunk-ip:9997" ^
    WINEVENTLOG_SEC_ENABLE=1 ^
    WINEVENTLOG_SYS_ENABLE=1 ^
    /quiet

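# After install, edit %SPLUNK_HOME%\etc\system\local\inputs.conf
# to add Sysmon and to route the Security log to the windows index used by the searches in this guide
# (events land in the default index, main, unless you set one):
[WinEventLog://Microsoft-Windows-Sysmon/Operational]
disabled = 0
index = sysmon

[WinEventLog://Security]
disabled = 0
index = windows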

# Linux — edit /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/auth.log]
index = linux_auth
sourcetype = linux_secure

[monitor:///var/log/syslog]
index = linux_syslog

Step 4: Open the Receiving Port on Splunk

# In the Splunk UI: Settings → Forwarding and receiving → Configure receiving → New → Port 9997
# Or via CLI:
/opt/splunk/bin/splunk enable listen 9997 -auth admin:yourpassword

Step 5: Create Indexes for Each Log Type

# In UI: Settings → Indexes → New Index
# Or via CLI:
/opt/splunk/bin/splunk add index windows
/opt/splunk/bin/splunk add index sysmon
/opt/splunk/bin/splunk add index linux_auth
/opt/splunk/bin/splunk add index linux_syslog
/opt/splunk/bin/splunk add index firewall

Step 6: Verify Data Is Flowing

# Check which indexes have data and how much
| tstats count where index=* by index, sourcetype

# Check last event time per host — detect dead forwarders
# Hosts not heard from recently appear first
| tstats latest(_time) as last_seen_epoch where index=windows by host
| eval hours_ago = round((now() - last_seen_epoch) / 3600, 1)
| eval last_seen = strftime(last_seen_epoch, "%Y-%m-%d %H:%M")
| sort + last_seen_epoch
| table host, last_seen, hours_ago
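
The dead-forwarder check boils down to simple epoch arithmetic: keep the raw epoch for maths and sorting, and format it only for display. A small Python sketch of the same calculation, using hypothetical host names and a made-up "current" time:

```python
from datetime import datetime, timezone


def hours_since(last_seen_epoch: float, now_epoch: float) -> float:
    """Hours elapsed between a host's last event and now, to one decimal."""
    return round((now_epoch - last_seen_epoch) / 3600, 1)


def format_epoch(epoch: float) -> str:
    """Render an epoch as a readable UTC timestamp (display only)."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


now = 1_700_000_000  # hypothetical "current" epoch
last_seen = {"WIN-LAB-01": now - 600, "LINUX-LAB-01": now - 5400}  # hypothetical hosts

# Sort on the numeric epoch so the quietest host comes first
report = [
    (host, format_epoch(epoch), hours_since(epoch, now))
    for host, epoch in sorted(last_seen.items(), key=lambda item: item[1])
]
for row in report:
    print(row)
```

Once a timestamp has been turned into a string, subtracting it from now() no longer works, which is why the numeric value is kept in its own variable.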

Step 7: Install These Splunk Apps in Your Lab

App Name	Why You Need It
Sysmon App for Splunk	Pre-built dashboards for all Sysmon event types
TA-Sysmon	Technology Add-on that extracts Sysmon fields correctly
Splunk Security Essentials	Library of 200+ pre-built detections mapped to MITRE ATT&CK
MITRE ATT&CK App for Splunk	Visual matrix showing which techniques your detections cover
Splunk App for Windows Infrastructure	Windows-specific monitoring dashboards

11. Fixit Room — Troubleshooting Splunk Like a Pro

The TryHackMe Fixit room teaches something most tutorials skip: what to do when Splunk is broken. These troubleshooting skills are what separate a junior from a senior Splunk user — and they come up in interviews more often than you would expect.

Problem 1: No Data Appearing in Search

This is the most common problem. Before assuming the data is lost, check these in order:

# 1. Does ANY data exist in ANY index?
| tstats count where index=* by index

# 2. Check internal Splunk logs for errors
index=_internal sourcetype=splunkd ERROR

# 3. Check forwarder connectivity — are hosts sending data?
index=_internal source=*metrics.log group=tcpin_connections
| stats latest(_time) as last_contact by hostname
| eval last_contact = strftime(last_contact, "%Y-%m-%d %H:%M")

# 4. Check time picker — is it set to a time range with no data?
# 5. Check index name — a typo in inputs.conf sends data to the wrong index
# 6. Check sourcetype — wrong sourcetype = wrong parser = events may be malformed

Problem 2: Timestamp is Wrong on Events

# Check splunkd.log for timestamp parsing warnings
index=_internal sourcetype=splunkd (timestamp OR strptime)

# Fix by specifying the format explicitly in props.conf:
[my_custom_sourcetype]
TIME_PREFIX = timestamp=
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
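
To see what these three settings mean, here is a rough Python sketch of the idea: find the prefix, look only a limited number of characters past it, and parse with an explicit format. This is my own illustration, not Splunk's actual timestamp processor, and the sample events are made up.

```python
import re
from datetime import datetime
from typing import Optional

TIME_PREFIX = r"timestamp="
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 25

# Length of a rendered timestamp in this format (19 characters here)
FORMATTED_LENGTH = len(datetime(2000, 1, 1).strftime(TIME_FORMAT))


def extract_timestamp(raw_event: str) -> Optional[datetime]:
    """Return the parsed timestamp, or None if the prefix or format does not match."""
    prefix = re.search(TIME_PREFIX, raw_event)
    if prefix is None:
        return None
    # Only look a limited distance past the prefix, like the lookahead setting
    window = raw_event[prefix.end(): prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    try:
        return datetime.strptime(window[:FORMATTED_LENGTH], TIME_FORMAT)
    except ValueError:
        return None


good = extract_timestamp("level=INFO timestamp=2024-03-05T14:22:31 user=alice")
bad = extract_timestamp("level=INFO timestamp=05/03/2024 14:22 user=alice")
print(good, bad)
```

The second event fails to parse because its date is in a different layout, which is exactly the situation where an explicit TIME_FORMAT for the right sourcetype fixes wrong event times.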

Problem 3: Searches Are Very Slow

Always start with index= — without it Splunk searches only your role's default indexes (often just main), and the usual workaround, index=*, scans everything
Always add a time constraint — earliest=-24h at minimum
Replace stats with tstats for simple counts on indexed fields — often one or two orders of magnitude faster on large indexes
Open the Job Inspector after a search (Job → Inspect Job) — it shows which stage is slow and how many events were scanned at each step
Enable Report Acceleration for frequently scheduled reports

# tstats is dramatically faster than stats for volume metrics
# It works on indexed fields (index, sourcetype, host, source, _time) and on accelerated data models
| tstats count where index=windows sourcetype=WinEventLog:Security by host

Problem 4: License Warning or Exceeded

# Check daily ingestion volume
index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) as bytes_indexed
| eval GB = round(bytes_indexed / 1073741824, 2)

# Find which sourcetype is consuming the most license
# (st = sourcetype in the license log)
index=_internal source=*license_usage.log type=Usage
| stats sum(b) as bytes by st
| sort - bytes
| eval MB = round(bytes / 1048576, 1)
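
The unit conversions here are just powers of 1024: 1048576 bytes per MB and 1073741824 bytes per GB. A short Python sketch of the per-sourcetype roll-up, using made-up records shaped like the st and b fields above:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BYTES_PER_MB = 1024 ** 2  # 1048576


def usage_by_sourcetype(records: List[dict]) -> List[Tuple[str, float]]:
    """Sum bytes per sourcetype and return (sourcetype, MB) sorted by volume, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["st"]] += record["b"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(st, round(total / BYTES_PER_MB, 1)) for st, total in ranked]


# Hypothetical license_usage.log rows
records = [
    {"st": "WinEventLog:Security", "b": 524288000},
    {"st": "linux_secure", "b": 10485760},
    {"st": "WinEventLog:Security", "b": 524288000},
]
usage = usage_by_sourcetype(records)
print(usage)
```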

Splunk Internal Indexes — Your Diagnostic Toolkit

Index	Contains	Use It For
_internal	Splunk's own operational logs and performance metrics	Errors, license usage, forwarder status, search performance
_audit	Splunk user activity — who searched what, who logged in	Compliance auditing, insider threat monitoring of Splunk itself
_introspection	Splunk process resource usage — CPU, memory, disk I/O	Performance tuning, capacity planning

Tip — Splunk on Splunk: Build a "Splunk Health" dashboard monitoring index=_internal. Include panels for: daily license usage trend, forwarder last-seen heartbeat, index queue fill percentage (when queues fill, ingestion blocks and inputs such as UDP syslog can lose data without warning), and search concurrency. Monitoring Splunk itself is a mark of operational maturity.

12. Real-World SOC Threat Hunting Queries

This is where everything comes together. These are production-ready SPL queries for the threat scenarios you will encounter most frequently as a SOC analyst. Each one is something I have either used directly or built based on real attack patterns.

Detecting a Password Spray Attack

Password spray is different from brute force. Brute force = many passwords tried against one account. Password spray = one password tried against many accounts to avoid lockout policies. The SPL pattern to detect it is: one source IP, many unique targeted users, but low attempt count per user.

index=windows EventCode=4625
| stats dc(Account_Name) as targets, count as total_attempts by src_ip
| where targets > 10 AND (total_attempts / targets) < 5
| sort - targets
# Spray pattern: hits many accounts but only 1-4 times each
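
If the ratio logic feels abstract, this Python sketch applies the same two conditions (many distinct accounts, few attempts per account) to a list of failed logins. The function name, thresholds as defaults, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_spray_sources(
    failed_logins: Iterable[Tuple[str, str]],
    min_targets: int = 10,
    max_avg_attempts: float = 5,
) -> List[Tuple[str, int, int]]:
    """Return (src_ip, unique_accounts, total_attempts) for spray-like sources."""
    accounts_by_ip = defaultdict(set)
    attempts_by_ip = defaultdict(int)
    for src_ip, account in failed_logins:
        accounts_by_ip[src_ip].add(account)
        attempts_by_ip[src_ip] += 1

    flagged = []
    for src_ip, accounts in accounts_by_ip.items():
        targets = len(accounts)
        total = attempts_by_ip[src_ip]
        # Same test as the SPL: targets > 10 AND (total_attempts / targets) < 5
        if targets > min_targets and total / targets < max_avg_attempts:
            flagged.append((src_ip, targets, total))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


spray = [("203.0.113.7", f"user{i}") for i in range(25)] * 2  # 25 accounts, 2 tries each
brute = [("198.51.100.9", "administrator")] * 200             # 1 account, 200 tries
result = find_spray_sources(spray + brute)
print(result)
```

The brute-force source is ignored here on purpose: it fails the "many accounts" test, which is why you need a separate brute force detection alongside this one.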

Detecting Lateral Movement

# Internal workstations connecting to many other internal machines on admin ports
index=firewall action=allow dest_port IN (445, 3389, 5985, 5986, 135, 139)
| where cidrmatch("10.0.0.0/8", src_ip) OR cidrmatch("172.16.0.0/12", src_ip) OR cidrmatch("192.168.0.0/16", src_ip)
| stats dc(dest_ip) as unique_targets, values(dest_port) as ports_used by src_ip
| where unique_targets > 5
| sort - unique_targets
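
The "is this an internal address?" test is a CIDR membership check against the RFC 1918 private ranges. Note that the 172.16.0.0/12 block runs from 172.16.x.x through 172.31.x.x, so a prefix check on "172.16." alone is too narrow. A minimal Python sketch using the standard ipaddress module (the helper name is my own):

```python
import ipaddress

# RFC 1918 private ranges
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(ip: str) -> bool:
    """True if the address falls inside one of the RFC 1918 private ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in RFC1918)


for ip in ("10.1.2.3", "172.20.5.9", "172.32.0.1", "192.168.1.10", "8.8.8.8"):
    print(ip, is_internal(ip))
```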

Detecting C2 Beacon Behaviour (Regular Timing Intervals)

Malware communicating with a command-and-control server beacons at suspiciously regular intervals. Legitimate user traffic is irregular — humans do not click at exactly the same second every 60 seconds. The lower the standard deviation of intervals, the more machine-like the traffic.

index=proxy action=allowed
| sort 0 dest_host, _time
| streamstats current=f window=1 last(_time) as prev_time by dest_host
| eval interval = _time - prev_time
| stats count, avg(interval) as avg_interval, stdev(interval) as stdev_interval by dest_host
| where count > 20 AND stdev_interval < 30
| sort + stdev_interval
# Very low stdev = very regular = suspicious beaconing
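
Here is the same interval maths as a small Python sketch, so you can see why regular traffic produces a standard deviation near zero. The timestamps are invented, and the 30-second cut-off is only the example threshold from the query above, not a universal value.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple


def interval_stats(timestamps: List[float]) -> Optional[Tuple[float, float]]:
    """Return (average gap, standard deviation of gaps) between consecutive events."""
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(intervals) < 2:
        return None  # not enough gaps to measure variation
    return mean(intervals), stdev(intervals)


beacon = [i * 60 for i in range(30)]  # a request exactly every 60 seconds
human = [0, 5, 7, 90, 400, 410, 1300, 1301, 2500, 4000]  # bursty browsing

beacon_avg, beacon_stdev = interval_stats(beacon)
human_avg, human_stdev = interval_stats(human)
print(beacon_avg, beacon_stdev)
print(round(human_avg, 1), round(human_stdev, 1))
```

Real beacons usually add jitter, so expect a small but non-zero deviation rather than a perfect zero.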

Detecting Pass-the-Hash

# Machine accounts (names ending in $) are excluded
index=windows EventCode=4624
    Logon_Type=3
    Authentication_Package=NTLM
| where NOT match(Account_Name, "\$$")
| table _time, ComputerName, Account_Name, Source_Network_Address, Workstation_Name
# PTH: an NTLM network logon (type 3) deserves a look because domain authentication normally uses Kerberos. NTLM still occurs legitimately (for example, connections made by IP address), so baseline your environment before alerting

Detecting Suspicious PowerShell

index=sysmon EventCode=1
    (Image=*powershell* OR Image=*pwsh*)
    (CommandLine=*-enc* OR CommandLine=*-EncodedCommand* OR CommandLine=*IEX* OR CommandLine=*DownloadString*)
| table _time, host, User, ParentImage, CommandLine
| sort - _time

Detecting Data Exfiltration

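# External destinations only: drop anything inside the RFC 1918 private ranges
index=firewall action=allow
| where NOT (cidrmatch("10.0.0.0/8", dest_ip) OR cidrmatch("172.16.0.0/12", dest_ip) OR cidrmatch("192.168.0.0/16", dest_ip))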
| stats sum(bytes_out) as total_bytes by src_ip, dest_ip
| eval MB = round(total_bytes / 1048576, 2)
| where MB > 500
| sort - MB

Detecting Kerberoasting

# 0x17 = RC4-HMAC, the weak encryption type targeted by Kerberoasting tools
# Machine accounts (names ending in $) are excluded
index=windows EventCode=4769
    Ticket_Options=0x40810000
    Ticket_Encryption_Type=0x17
| where NOT match(Service_Name, "\$$")
| stats count by Account_Name, Client_Address, Service_Name
| where count > 3
| sort - count

13. Interview Q&A — 20 Questions You Will Actually Be Asked

Architecture and Concepts

Q: What is Splunk and what problem does it solve in a SOC?

A: Splunk is a SIEM and data analytics platform. It solves the problem of scale — a SOC cannot manually monitor billions of log events per day across thousands of devices. Splunk collects all that data in one place, indexes it for fast searching, and lets analysts write SPL queries to find malicious patterns. On top of that, automated alerts fire when those patterns are detected, so the SOC is notified without anyone watching a screen 24/7.

Q: Explain Splunk's architecture — what are the three main components?

A: Forwarder, Indexer, and Search Head. Forwarders (Universal Forwarders) are lightweight agents installed on monitored systems — they collect data and ship it to the Indexer. The Indexer parses, compresses, and stores data in time-ordered indexes on disk. The Search Head is the web interface analysts use to write SPL queries, view results, build dashboards, and create alerts. In large deployments these components are separate machines; in a small lab they can all run on one.

Q: What is the difference between a sourcetype and an index?

A: An index is the storage container — the bucket on disk where data lives. A sourcetype is the parser — it tells Splunk what format the data is in and which field extraction rules to apply. For example, index=windows stores all Windows logs, but within that index you might have sourcetype=WinEventLog:Security for Security events and sourcetype=WinEventLog:System for System events — each gets different field extractions applied.

Q: What is Splunk CIM and why is it important?

A: CIM — Common Information Model — is a standardised schema that normalises field names across different data sources. Source IP might be called src, src_ip, SourceAddress, or c-ip depending on the vendor. CIM maps all of these to a single normalised field name so you can write one SPL query that works across Cisco firewall, Palo Alto, and Windows logs simultaneously. CIM is the foundation for Splunk ES correlation searches and makes cross-source threat hunting possible.

Q: What are hot, warm, cold, and frozen buckets?

A: These are the lifecycle stages for indexed data. Hot = actively being written to, most recent data. Warm = closed but searchable on fast storage. Cold = older data, may move to slower/cheaper storage. Frozen = archived or deleted — Splunk removes the search index metadata. You can configure a frozen archive script to copy data to cheap storage before Splunk removes it. Retention policies control when data transitions between stages, which is important for compliance (e.g., keep firewall logs for 1 year).

SPL Questions

Q: What is the pipe character in SPL and how does the pipeline work?

A: The pipe | passes the output of one command as the input to the next — it makes SPL a pipeline language. Each command transforms the dataset. For example: retrieve events with the initial search → | stats to aggregate → | where to filter the aggregated results → | sort to order → | table to format output. This composability means complex analysis is built from simple, chainable steps.

Q: What is the difference between stats and tstats?

A: stats searches through raw event data — flexible, works on any field, but slower on large datasets. tstats uses pre-built tsidx (time-series index) metadata — extremely fast but limited to indexed fields only (index, sourcetype, host, source, _time, and fields from accelerated data models). I use tstats for volume dashboards and forwarder monitoring where I just need counts, and stats for deep event-level analysis.

Q: When would you use transaction vs. stats?

A: I default to stats with earliest()/latest() and eval duration = last - first because it is much faster. I use transaction only when I need to see the full raw text of grouped events together (like viewing an entire HTTP session log), or when I need the eventcount field that transaction automatically provides, or when grouping by complex start/end conditions like "from EventCode=4624 to EventCode=4634".

Q: How do you use eval to handle conditional logic?

A: For binary conditions I use if(condition, value_if_true, value_if_false). For multiple conditions I use case(condition1, val1, condition2, val2, ..., 1=1, default) — the 1=1 at the end acts as a catch-all default (it is always true). For example: | eval risk = case(count>100 AND targeting_admin="YES", "CRITICAL", count>50, "HIGH", count>10, "MEDIUM", 1=1, "LOW").
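
The important property of case() is that it returns the value for the first condition that is true, so the order of the conditions matters. A Python sketch of that first-match behaviour, mirroring the example above (the function name is my own):

```python
def risk_level(count: int, targeting_admin: str) -> str:
    """Mimic SPL case(): return the label of the first condition that is true."""
    rules = [
        (count > 100 and targeting_admin == "YES", "CRITICAL"),
        (count > 50, "HIGH"),
        (count > 10, "MEDIUM"),
        (True, "LOW"),  # plays the role of the 1=1 catch-all
    ]
    return next(label for condition, label in rules if condition)


examples = [
    risk_level(150, "YES"),
    risk_level(150, "NO"),
    risk_level(20, "NO"),
    risk_level(3, "YES"),
]
print(examples)
```

If you put count>10 first, every count above 10 would come back as MEDIUM and the stricter rules would never be reached.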

Q: What is a subsearch and what are its limitations?

A: A subsearch is a Splunk search nested inside square brackets that runs first and feeds its results as a filter to the outer search. Limitation 1: returns maximum 10,000 results by default. Limitation 2: 60-second timeout. Limitation 3: the results are converted to a search filter, so only returned field values are usable, not full events. For large threat intel lists with millions of IOCs, use a lookup table instead — no row limit, no timeout, and orders of magnitude faster.

Q: Write SPL to detect impossible travel — same user logging in from two countries within one hour.

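A: Compare each login with the same user's previous login and flag a country change inside 3600 seconds:

index=windows EventCode=4624
| iplocation src_ip
| where isnotnull(Country) AND Country != ""
| sort 0 user, _time
| streamstats current=f window=1 last(Country) as prev_country, last(_time) as prev_time by user
| where Country != prev_country AND (_time - prev_time) < 3600
| eval time_diff_min = round((_time - prev_time) / 60, 1)
| table _time, user, prev_country, Country, time_diff_min

iplocation is Splunk's built-in GeoIP command and adds a Country field. Comparing consecutive logins matters: grouping all of a user's logins with stats and checking last_login - first_login < 3600 misses any user whose logins span more than an hour in the search window.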
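
The core of impossible-travel detection is comparing each login with the same user's previous login. A minimal Python sketch of that comparison, assuming the country has already been resolved for each event; the users, countries, and timestamps are invented:

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def impossible_travel(
    logins: Iterable[Tuple[int, str, str]],
    max_gap_seconds: int = 3600,
) -> List[Tuple[str, str, str, float]]:
    """logins: (epoch_seconds, user, country). Returns (user, from, to, minutes apart)."""
    by_user = defaultdict(list)
    for timestamp, user, country in logins:
        by_user[user].append((timestamp, country))

    hits = []
    for user, events in by_user.items():
        events.sort()  # chronological order per user
        for (prev_ts, prev_country), (ts, country) in zip(events, events[1:]):
            if country != prev_country and ts - prev_ts < max_gap_seconds:
                hits.append((user, prev_country, country, round((ts - prev_ts) / 60, 1)))
    return hits


logins = [
    (0, "alice", "India"),
    (1500, "alice", "Germany"),  # 25 minutes later, different country
    (0, "bob", "India"),
    (90000, "bob", "Germany"),   # more than a day later, plausible travel
]
hits = impossible_travel(logins)
print(hits)
```

In practice you would also allowlist VPN exit nodes and cloud proxies, since those are the most common source of false positives for this detection.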

SOC Operations Questions

Q: What is the difference between brute force and password spray — and how do you detect each in Splunk?

A: Brute force = many passwords tried against one account. Detect with: many 4625 events with the same Account_Name. Password spray = one password tried against many accounts to avoid lockout. Detect with: many 4625 events from the same src_ip but many different Account_Names, with low count per account. The SPL key is looking at dc(Account_Name) vs count / dc(Account_Name) ratio — spray shows high unique accounts, low attempts per account.

Q: What is alert fatigue and how does Splunk help solve it?

A: Alert fatigue is when analysts receive so many alerts that they start ignoring them — making even real threats go unnoticed. Splunk ES addresses this with Risk-Based Alerting (RBA): instead of alerting on every individual suspicious event, the system assigns risk scores to users and hosts as suspicious behaviours accumulate. An alert only fires when the cumulative risk score for an entity crosses a defined threshold. This dramatically reduces alert volume while maintaining detection coverage — one alert per compromised user, not one per event.
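
A tiny Python sketch of the accumulate-then-alert idea behind Risk-Based Alerting. The point values and the threshold of 100 are made up for illustration and are not Splunk ES defaults.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def entities_over_threshold(
    risk_events: Iterable[Tuple[str, int]],
    threshold: int,
) -> Dict[str, int]:
    """Sum risk points per entity and return only those at or above the threshold."""
    scores: Dict[str, int] = defaultdict(int)
    for entity, points in risk_events:
        scores[entity] += points
    return {entity: score for entity, score in scores.items() if score >= threshold}


# Hypothetical risk contributions from individual detections
risk_events = [
    ("alice", 20),    # e.g. login outside business hours
    ("alice", 40),    # e.g. encoded PowerShell
    ("host-17", 10),
    ("alice", 50),    # e.g. connection to a rare external domain
    ("host-17", 15),
]
alerts = entities_over_threshold(risk_events, threshold=100)
print(alerts)
```

Five low-confidence events produce one alert for one entity, which is the whole point: analysts triage the user, not each individual event.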

Q: Walk me through how you would investigate a potential compromised endpoint using Splunk.

A: I start with the initial indicator — confirm it is a true positive by reviewing the raw event. Then I scope the timeline: what did this host do in the 24 hours before and after the indicator? I check process creation (Sysmon Event 1) for suspicious parent-child relationships, network connections (Sysmon Event 3) for unexpected outbound connections, file creation (Sysmon Event 11) for dropped malware, and Windows Event Logs for new accounts or service installations. Then I check for lateral movement — did this host initiate SMB, RDP, or WMI connections to other internal hosts? I document everything in a timeline and escalate with the evidence gathered.

Q: What is Kerberoasting and which Event ID detects it?

A: Kerberoasting is an attack where an authenticated domain user requests Kerberos service tickets for service accounts (which anyone can do in AD), then attempts to crack the tickets offline to obtain the service account's password. The key indicator is Event ID 4769 (Kerberos service ticket requested) where the Ticket Encryption Type is 0x17 (RC4-HMAC). Legitimate modern environments use AES encryption (0x12, 0x11). Multiple 4769 events with RC4 encryption in a short period from one account is a strong Kerberoasting indicator.

Q: How do you detect Pass-the-Hash in Splunk?

A: Pass-the-Hash (PTH) uses an NTLM hash instead of a password to authenticate, bypassing the need to know the plaintext password. In Windows Event Logs, this appears as Event ID 4624 with Logon Type 3 (network) and Authentication Package = NTLM. Legitimate Windows environments increasingly use Kerberos for domain authentication, so NTLM network logons — especially from workstations to other workstations — are suspicious. I would exclude machine accounts (Account_Name ending in $) and look for NTLM Type 3 logons that do not match known service account patterns.

Q: How would you build a Splunk alert that minimises false positives?

A: Five steps: (1) Start with a high threshold and lower it gradually as you understand the environment's baseline. (2) Use a lookup whitelist of known-good IPs, users, or hosts and exclude them with NOT or lookup. (3) Add context with eval — only alert if source is external AND target is admin account AND count exceeds threshold. (4) Configure throttle per the key entity (src_ip, user) so one attacker generates one alert, not thousands. (5) Set a review schedule — check the alert monthly to tune thresholds as the environment changes. False positives come from one of two sources: wrong threshold, or missing whitelist context.

Troubleshooting and Admin Questions

Q: A forwarder stopped sending data three hours ago. How do you diagnose it?

A: Step 1 (confirm in Splunk): | tstats latest(_time) as last_seen where index=windows by host | sort + last_seen | eval last_seen=strftime(last_seen,"%Y-%m-%d %H:%M"). Sorting on the raw epoch before formatting puts the hosts that stopped sending at the top. Step 2 (check network): can the forwarder reach port 9997 on the indexer? Step 3: on the forwarder host, check the Splunk service is running. Step 4: check $SPLUNK_HOME/var/log/splunk/splunkd.log for connection errors or certificate failures. Step 5: check disk space on the monitored path and on the forwarder itself, because full disks stop data collection. Step 6: check inputs.conf. Did the monitored file path change?

Q: What is props.conf and transforms.conf and how do they work together?

A: props.conf applies settings to sourcetypes, hosts, or sources. It is where you define timestamp format, line breaking rules, character encoding, inline EXTRACT regexes, and the REPORT/TRANSFORMS directives that point to transforms.conf. transforms.conf is where the referenced field extraction regexes and lookup definitions live. They work together: props.conf says "for data with sourcetype X, use the extraction rule named Y," and transforms.conf defines what rule Y actually does. Props is the assignment; transforms is the implementation.

Q: What is the Splunk Job Inspector and when do you use it?

A: The Job Inspector is a diagnostic panel accessible after any search via Job → Inspect Job. It breaks down how long each phase of the search took: events scanned, time to retrieve from disk, time spent in each SPL command. I use it when a search runs slower than expected — it immediately shows whether the bottleneck is the initial retrieval (fix: narrow the index and time range) or a specific pipeline command (fix: optimise or replace the slow command). It also shows the result count at each pipeline stage, which is invaluable when results disappear unexpectedly mid-pipeline.

14. Quick Reference Cheat Sheet

=== CORE SPL COMMANDS ===

RETRIEVAL
index=name                          Search a specific index
sourcetype=name                     Filter by data format/parser
host=hostname                       Filter by originating device
earliest=-1h latest=now             Time scoping (always include!)

TRANSFORMING
| stats count by field              Count events per field value
| stats dc(field) as n by grp       Count distinct values
| stats values(field) as f by grp  List unique field values per group
| timechart span=1h count           Event count per time bucket
| timechart span=30m count by user  Trend split by field (top 10 default)

FILTERING
| where count > 50                  Filter on computed/aggregated values
| where isnull(field)               Find events with missing field
| search field=value*               Keyword/wildcard filter mid-pipeline

TRANSFORMATION
| eval newfield = expression        Create or modify a field
| eval x = if(cond, val1, val2)     Conditional field
| eval x = case(c1,v1, c2,v2, 1=1, default)  Multi-condition
| eval t = strftime(_time, "%Y-%m-%d %H:%M")  Format timestamp
| rex "(?<fieldname>pattern)"       Extract field with regex

FORMATTING
| table field1, field2, field3      Clean tabular output
| fields field1, field2            Keep only these fields
| fields - field1                  Remove a field
| rename oldname as "New Name"     Rename field
| sort - count                     Sort descending
| sort + field                     Sort ascending
| head 10 / tail 10                Limit results
| dedup field                      Remove duplicates

ENRICHMENT
| lookup lookupname key_field OUTPUT f1, f2   Enrich with CSV data
| inputlookup lookupname.csv       Search the lookup table itself

=== SPL FOR SOC — READY TO USE ===

# Brute Force Detection
index=windows EventCode=4625 | stats count by src_ip | where count > 10 | sort -count

# Password Spray Detection
index=windows EventCode=4625
| stats dc(Account_Name) as targets, count as attempts by src_ip
| where targets > 10 AND (attempts/targets) < 5

# User Account Created (4720 covers any new account; pair with 4732/4728 group adds to spot new admins)
index=windows EventCode=4720 | table _time, host, TargetUserName, SubjectUserName

# Security Log Cleared
index=windows EventCode=1102 | table _time, host, SubjectUserName

# Audit Policy Changed
index=windows EventCode=4719 | table _time, host, SubjectUserName, AuditPolicyChanges

# Encoded PowerShell
index=sysmon EventCode=1 (Image=*powershell* OR Image=*pwsh*)
    (CommandLine=*-enc* OR CommandLine=*IEX* OR CommandLine=*DownloadString*)
| table _time, host, User, ParentImage, CommandLine

# Pass-the-Hash Indicator
index=windows EventCode=4624 Logon_Type=3 Authentication_Package=NTLM
| where NOT match(Account_Name, "\$$")
| table _time, ComputerName, Account_Name, Source_Network_Address

# Kerberoasting Indicator
index=windows EventCode=4769 Ticket_Encryption_Type=0x17
| stats count by Account_Name, Client_Address, Service_Name | where count > 3

# Check License Usage
index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) as bytes | eval GB=round(bytes/1073741824,2)

# Check Forwarder Health
| tstats latest(_time) as last_seen where index=* by host
| eval last_seen=strftime(last_seen,"%Y-%m-%d %H:%M") | sort +last_seen

=== SPLUNK ARCHITECTURE QUICK REFERENCE ===

Universal Forwarder  — lightweight agent on endpoints, forwards only, ~20MB
Heavy Forwarder      — full Splunk binary, can parse/filter/route before forwarding
Indexer              — receives, parses, stores data in time-ordered buckets
Search Head          — web UI for querying, dashboards, alerts
Deployment Server    — centrally pushes configs to hundreds of forwarders
License Manager      — enforces daily ingestion volume limit
Bucket lifecycle     — Hot → Warm → Cold → Frozen (archive/delete)

=== KEY SPLUNK INTERNAL INDEXES ===

index=_internal    — Splunk's own logs, errors, license, metrics
index=_audit       — who searched what, user login activity in Splunk
index=_introspection — Splunk process CPU/memory/disk usage

Final Thoughts

When I started with Splunk, it felt like being handed an incredibly powerful tool with no instruction manual. After completing the TryHackMe Advanced Splunk rooms and spending hours building real queries in my lab, it clicked — Splunk is not complicated, it is just layered. Learn the search syntax, then the core commands, then the transformations, then build something real. Each layer makes the previous one more powerful.

The most important advice I can give: always have a question in mind before you open the search bar. "Which IPs generated the most failed logins today?" "Did any internal host connect to an external IP on port 4444?" "Are there any accounts that logged in outside business hours?" SPL is just the language you use to answer those questions. The sharper your security instinct for what to ask, the more effective your SPL will be.

For interviews: know your architecture cold, be able to write a brute force detection query from memory, understand the difference between tstats and stats, and be able to explain how you would investigate a compromised endpoint step by step. If you have a home lab with real logs flowing, you can talk about it with genuine experience — and that is what gets you hired.

Keep building in your lab. Keep practising on TryHackMe. And the next time an interviewer asks "Have you worked with a SIEM?" — you will have a lot more than a one-word answer.
