Dhruv Kumar Sharma | Cybersecurity, Embedded Systems, and Edge AI

DFIR

Problem Architecture Artifacts Models Tech Stack Workflow Roadmap

Discuss

SYSTEM: ACTIVE

NIST SP 800-86

AI-ASSISTEDDIGITALFORENSICS

Volatile MemoryTimeline ReconstructionAnomaly DetectionExplainable AI

DFIR_PORTAL_v3.6

SYS_OK: 100%ALERTS: 0

CPU INTEL32.4%

RAM VOLATILE MATRIX112.5 GB / 512 GB

TRAFFIC VELOCITY4,890 Pkt/s

PHASE 1: ACQUIRING DISK TELEMETRY0%

EVIDENCE PARSED24,500

DIAGNOSTIC_LOGGER

Scroll to explore

The Problem Space

The Crisis in Cybercrime Investigation

Traditional forensic triage cannot cope with the sheer volume of data generated during security breaches. Let's look at the numbers.

Global Cybercrime Loss

The Investigator's Dilemma

01.Thousands of logs are generated every single minute across active compromised hosts.
02.Gigabytes of network telemetry must be captured, parsed, and searched in real time.
03.Volatile RAM artifacts decay quickly or are lost completely during emergency system reboots.
04.Timeline correlation requires connecting subtle indicators across independent, dispersed environments.

Why AI Changes Everything

→Sub-second Detection: Machine learning baselines identify complex outliers in milliseconds instead of days.
→Campaign Mapping: Instantly clusters similar threat indicators using multi-dimensional distance metrics.
→Bayesian Correlating: Computes the mathematically sound likelihood of intrusion, mitigating alert fatigue by 70%.
→Explainable Integrity: Generates structured, court-admissible audit summaries mapping back to raw bytes.

Live Global Security Portal

Active Cyber Threat Index Map

Interactive cyber threat monitoring illustrating real-time malware waves, intrusions, and security telemetry feeds across global networks.

System Architecture

Modular 5-Layer Stack

End-to-end framework layers stacking from raw system volatile hardware collection up to incident reporting.

Forensic Analyst Portal

Flask Dashboard · Kibana Timelines · PDF Exporter

Interactive web user interface displaying parsed cases, dynamic event search parameters, and MITRE ATT&CK mapping reports.

AI Intelligence Engine

Gaussian Model · Bayesian Probability · Shannon Entropy

Runs GMM anomaly scans, Bayesian posterior updates, Shannon entropy byte checks, and Time-Series decompositions.

Correlation & Threat Scoring

Bayesian Threat Confidence Score (TCS) Module

Weights diverse artifacts across hosts and calculates an automated threat score ranking most compromised targets.

Forensic Analysis Modules

Volatility 3 · Autopsy CLI · Scapy PCAP parsing · YARA

Triggers volatility RAM decoders, parses disk registry hives, and parses network stream histories.

Artifact Collection Engine

Logs · Memory dumps · Reg hives · Sysmon · Browser SQLite

Performs parallel automated extraction of all volatile and non-volatile evidence segments across target machines.

Target User Profiles

Forensic Investigators

Automated triage, memory carving, and timeline reconstruction.

Incident Response Teams

Active C2 beacon identification and lateral movement tracking.

SOC Security Analysts

Bayesian-driven telemetry correlation to reduce false positives.

Law Enforcement Units

Cryptographically checked reports aligned with legal admissibility.

Forensic Evidence Gathering

Comprehensive Collection Engine

The framework automates collection and parsing across nine distinct digital forensic artifact segments.

System Logs

Parses Windows Event Logs (.evtx), Linux syslog and auth logs using Python log normalizers and Elasticsearch ingestion pipelines.

Forensic Targets

Brute-force login signatures, privilege escalations, scheduled task creation, process spawning patterns.

Browser Web History

Extracts local browser profiles (Chrome, Firefox, Edge, Safari) using direct SQLite database decoders to reconstruct timeline traces.

Forensic Targets

Attacker reconnaissance history, phishing access vectors, cache structures, Cached Credential SQLite tables.

Windows Registry

Decrypts NTUSER.DAT, SYSTEM, SOFTWARE, SAM hives using Python python-registry modules to find persistent structures.

Forensic Targets

Persistence registries (Run/RunOnce keys), USB connection traces, shellbags, UserAssist executing timestamps.

Memory RAM Dumps

Automates Volatility 3 command plugin analysis to parse raw physical RAM dumps, recovering transient and fileless malware traces.

Forensic Targets

Active process listings (pslist/pstree), network sockets (netscan), injected DLL modules (malfind).

Network Packets (PCAP)

Inspects live packet structures or raw PCAPs utilizing Wireshark/tshark pipelines and Python Scapy decoders.

Forensic Targets

Command & Control beacon timing anomalies, DNS tunneling channels, large outbound exfiltrations.

File System Metadata

Performs file system integrity and MACB metadata scans using Autopsy pipelines and disk writing blockers.

Forensic Targets

Timestomping identification, files generated in %TEMP%/AppData, high-entropy packed directory segments.

Process Execution Traces

Decodes Windows Prefetch (.pf) files, AppCompatCache (Shimcache), Amcache registries, and Linux audit logs.

Forensic Targets

Historical process executions, execution path mismatches, program signatures run prior to automated deletion.

PowerShell & Shell Logs

Parses historical powershell scripts, transcript logs, and Linux bash/zsh shell histories.

Forensic Targets

Base64 encoded arguments, download cradles (IEX/Invoke-WebRequest), LOLBAS executions, mimikatz commands.

USB Device Registers

Queries system setupapi logs, udev properties, and Windows USBSTOR registry structures.

Forensic Targets

Removable drives, mounting serial numbers, timestamps, correlated file modifications in active windows.

AI Processing Core

Machine Learning Intelligence Engine

The framework utilizes six custom analytical methods. Toggle between the underlying LaTeX mathematics and real Python script implementations.

Evidence Evaluation

Bayesian Probability Network

Constructs probabilistic graphical models linking digital evidence — system logs, file modifications, network traffic — to investigation hypotheses. Calculates Likelihood Ratios (LR) quantifying the strength of evidence under prosecution vs. defense hypotheses. Integrates multi-source evidence and updates posterior threat scores in real-time.

BAYES_THEOREM

P(A | B) =

P(B | A) × P(A)P(B)

PERFORMANCE SCORESupports real-time posterior updates across 50+ evidence nodes

VALIDATION BASELINEPublished methodology: DFRWS 2024, IEEE S&P Workshop

Network Anomaly Detection

Gaussian Mixture Models

Models normal network behavior as a mixture of K Gaussian components, each representing a legitimate traffic cluster (DNS queries, HTTP sessions, SSH tunnels). Data points falling into low-probability density regions — unusual packet sizes, abnormal connection intervals, or rogue port usage — are flagged as anomalies.

GAUSSIAN_PDF

f(x) =

1σ √(2π)

e ^{- (x - μ)² / 2σ²}

PERFORMANCE SCOREK=8 components · AUC-ROC: 0.964 on CICIDS2017

VALIDATION BASELINEValidated against UNSW-NB15 and CIC-IDS2017 benchmarks

Behavioral Similarity

Euclidean Distance Metrics

Converts forensic activity records into multidimensional vectors (representing parameters like process count, network connections, file access rate). Matches observed behavior vectors against known attack campaign vectors using Euclidean Distance metrics to identify campaign matches.

EUCLIDEAN_METRIC

d = √ ∑ _i=1..n (x_i - y_i)²

PERFORMANCE SCOREClassifies 14 ATT&CK techniques with <12% distance error

VALIDATION BASELINEFeature set aligned with EMBER malware dataset schema

Threat Classification

Logistic Regression Classifier

Binary classification model extracting features from PE headers (entropy, section count, import table size), API call sequences, and behavioral traces. Outputs calibrated probability scores for malicious classification. Explainable AI weights provide feature-level interpretability.

LOGISTIC_SIGMOID

P(Y = 1) =

11 + e ^{- (b₀ + b₁x₁ + b₂x₂ + ...)}

PERFORMANCE SCORE96.8% accuracy · F1: 0.971 · FPR: 0.023

VALIDATION BASELINEHyperparameter optimized on 250,000 malware specimens

Malware & Ransomware Identification

Shannon Entropy Analysis

Analyzes the statistical randomness of files and memory segments by mapping byte distributions. Encrypted, compressed, or packed malware payloads exhibit high Shannon Entropy, allowing detection of ransomware file actions and packed packers in memory.

SHANNON_ENTROPY

H(X) = - ∑ _{x ∈ X} p(x) log₂ p(x)

PERFORMANCE SCOREShannon Range: 0.0 (structured text) to 8.0 (pure encrypted)

VALIDATION BASELINEResearch: High-entropy detection matches Ransomware within 3 blocks

Timeline Reconstruction

Time-Series Decomposition

Aggregates all timestamps from normalized forensic data logs (logs, system modifications, network PCAPs) and applies additive time-series decomposition to isolate trend, seasonal, and residual components.

TIME_SERIES_DECOMP

X_t = T_t + S_t + R_t

PERFORMANCE SCOREFilters daily noise, leaving raw residual spikes indicating attack windows

VALIDATION BASELINEAligned with DFRWS USA 2025 event reconstruction models

Framework Capabilities

Core System Features

Deep dive into the operational algorithms, scoring criteria, and threat taxonomies.

Bayesian Correlation

Threat Confidence Score (TCS)

A unified anomaly calculation summarizing observed anomalies across hosts using weighted threat probabilities.

0.0 – 0.3 · LOW RISKLEGITIMATE

0.3 – 0.6 · MEDIUM RISKTRIAGE REQUIRED

0.6 – 0.8 · HIGH RISKSENIOR ESCALATE

0.8 – 1.0 · CRITICALINCIDENT RESPONSE

TCS_CALCULUSTCS = ∑ (Evidence_Weight_i × Bayesian_Posterior_i) / Total_Evidence_Count

Security Standard Alignments

MITRE ATT&CK Tactic Detections

How detected system modifications map directly to standard MITRE Enterprise threat techniques.

ATT&CK Tactic	Forensic Detection	Framework Action
Initial Access (TA0001)	Phishing URL found in browser SQLite history	Flag domain + query mail IP
Execution (TA0002)	PowerShell Base64 commands + YARA match	Kill PID + RAM dump Volatility
Persistence (TA0003)	New registry Run/RunOnce keys generated	Registry snapshot restore
Privilege Escalation (TA0004)	LSASS memory dump process patterns	Isolate process + trigger RAM lock
Lateral Movement (TA0008)	Atypical internal SMB/RDP socket sequences	Quarantine local gateway endpoint

Framework Ecosystem

Tools & Technologies

Industry-standard forensic suites integrated seamlessly with modern data engines and AI libraries.

Autopsy

Disk forensics & deleted file recovery

Volatility 3

Memory forensics & RAM extraction

Wireshark

PCAP deep network protocol analyzer

Scapy

Python automated packet parsing

YARA

Malware pattern matching rule engine

Suricata

Real-time network intrusion IDS

Nmap

Active host discovery & service mapping

TensorFlow

LSTM deep learning anomaly detection

Scikit-learn

XGBoost, KNN, Isolation Forest tools

Pandas

Log dataframe Normalizations & analytics

NumPy

Mathematical calculations & entropy scales

Elasticsearch

Multi-source log indexing & fast search

Kibana

Forensic dashboard timeline graphs

SQLite

Case files & Browser DB parses

Docker

Isolated forensic sandbox pipelines

Flask / Django

Forensic Portal REST APIs

Python

Core pipeline execution script engine

Kali Linux

Primary virtual forensic OS suite

7-Phase Stepper Workflow

How an Investigation Works

The lifecycle of a digital forensic analysis mapped out phase-by-phase through our automated pipeline.

PHASE DETAIL MONITOR

Phase 1: Identification

The framework is initialized using case parameters. The incident alert is evaluated (via SIEM logs, firewall events, or manual administrator trigger) to assess the scope of compromised systems, timestamp windows, and initial indicators.

PHASE STATUS: ACTIVE1 / 7 COMPLETED

Academic Context

Research Foundation

The AI-DFIR framework stands on published research, integrating AI tools with strict forensic standards.

ML & NLP Forensic Data

AI in Digital Forensics (2024)

Rashmi Mandayam

Demonstrated that machine learning models and NLP workflows allow security analysts to parse enormous data volumes and compile threat timeline insights rapidly.

Best Paper Framework

LLM-Assisted Forensics (EAI 2025)

ICDF2C Best Paper Award

Proposes structural integration of large language models across 4 strategic stages: evidence discovery, pattern recognition, case evaluation, and court presentation.

SOAR Orchestration

SOAR Incident Automation (2025)

DFIR Automation Review

Concludes that automated threat orchestration methods accelerate breach incident handling and lessen mean-time-to-respond (MTTR) by up to 90%.

Platform Vision

Future Expansion Roadmap

The expansion milestones planned to scale the AI-DFIR framework across automated operations.

Phase 1CURRENT

Enterprise Integrations

SIEM Split Connectors (Splunk, Sentinel)
AWS / GCP cloud log parsers

Phase 2PLANNED

Autonomous Containment

Sub-second network isolation playbooks
Ransomware active behavior kills

Phase 3PLANNED

Deep Memory Automation

Volatility 3 automatic carving loops
Dark Web IOC intelligence enrichment

Phase 4PLANNED

Post-Quantum Forensics

Quantum-resistant evidence hashing
National law enforcement nodes

Outcomes & Benchmarks

Traditional vs. AI-DFIR Impact

Quantified expected improvements comparing standard manual forensic methods against automated AI-DFIR pipelines.

Evidence Collection Time[1]95% Faster

Traditional Forensic Triage8 hours

AI-DFIR Orchestrated Pipeline20 mins

Log Review (10k entries)[2]99% Faster

Traditional Forensic Triage12 hours

AI-DFIR Orchestrated Pipeline5 mins

Timeline Reconstruction[3]95% Faster

Traditional Forensic Triage3 days

AI-DFIR Orchestrated Pipeline1 hour

False Positive Alert Rates[1]75% Drop

Traditional Forensic Triage40%

AI-DFIR Orchestrated Pipeline9%

Forensic Report Compiles[2]96% Faster

Traditional Forensic Triage8 hours

AI-DFIR Orchestrated Pipeline15 mins

References & Academic Sources

1. Rashmi Mandayam (2024) — 'AI in Digital Forensics: Machine Learning and NLP for Forensic Data Analysis.' Published findings detailing accelerated evidence workflows.2. EAI ICDF2C 2025 Best Paper Award — 'LLM-Assisted Digital Forensics Framework.' Structural integrations at 4 core timeline layers.3. DFRWS USA 2025 — 'SoK: Timeline-based Event Reconstruction for Digital Forensics.' Forensic Science International: Digital Investigation.4. FBI Internet Crime Complaint Center (IC3) 2024 Annual Cybercrime Reports.5. IBM Security Cost of a Data Breach Report 2025 edition.6. World Economic Forum — Global Cybersecurity Outlook 2024 reports.7. NIST SP 800-86 — Guide to Integrating Forensic Techniques into Incident Response.

Interested in this
research?

Let's discuss the forensic framework, the ML models behind it, or how AI-driven investigation can be applied to your DFIR workflow. Open to research collaborations, speaking engagements, and consulting.

Discuss this project Back to all projects

AI-ASSISTEDDIGITALFORENSICS

The Crisis in Cybercrime Investigation

0

0

0

0

The Investigator's Dilemma

Why AI Changes Everything

Active Cyber Threat Index Map

Modular 5-Layer Stack

Forensic Analyst Portal

AI Intelligence Engine

Correlation & Threat Scoring

Forensic Analysis Modules

Artifact Collection Engine

Target User Profiles

Forensic Investigators

Incident Response Teams

SOC Security Analysts

Law Enforcement Units

Comprehensive Collection Engine

System Logs

Browser Web History

Windows Registry

Memory RAM Dumps

Network Packets (PCAP)

File System Metadata

Process Execution Traces

PowerShell & Shell Logs

USB Device Registers

Machine Learning Intelligence Engine

Bayesian Probability Network

Gaussian Mixture Models

Euclidean Distance Metrics

Logistic Regression Classifier

Shannon Entropy Analysis

Time-Series Decomposition

Core System Features

Threat Confidence Score (TCS)

MITRE ATT&CK Tactic Detections

Tools & Technologies

Autopsy

Volatility 3

Wireshark

Scapy

YARA

Suricata

Nmap

TensorFlow

Scikit-learn

Pandas

NumPy

Elasticsearch

Kibana

SQLite

Docker

Flask / Django

Python

Kali Linux

How an Investigation Works

Identification

Preservation

Collection

Examination

Analysis

Presentation

Response

Phase 1: Identification

Research Foundation

AI in Digital Forensics (2024)

LLM-Assisted Forensics (EAI 2025)

SOAR Incident Automation (2025)

Future Expansion Roadmap

Enterprise Integrations

Autonomous Containment

Deep Memory Automation

Post-Quantum Forensics

Traditional vs. AI-DFIR Impact

References & Academic Sources

Interested in thisresearch?

Interested in this
research?