How to test centralised logging
This post touches on the inter-relationship between operational monitoring, logging, and file integrity monitoring. These are pooled together within a single post to illustrate these controls working together as a security system. Any information security framework (e.g. NIST CSF, PCI, ISO27000) will require each of these to be in place.
Sometimes we see these controls implemented independently without due consideration to benefits from interdependencies. Rather than looking at these controls in isolation, in what follows, we outline how they can be configured to work together and improve security assurance and visibility.
To illustrate the point, we’re looking at the following: operational monitoring, File Integrity Monitoring (FIM), and centralised logging.
At the end of each of the controls sections, I’ve included reference to relevant requirements from the PCI DSS and/or NIST Cybersecurity Framework.
The monitoring system tracks the availability of business and system services as well as other operational data. It should also be used to verify that security controls are enabled and running, based on how they are represented on the monitored system.
For example, depending on how your environment and systems are configured, your monitoring system may be providing visibility and/or assurance that some, if not all, of the following are in place on all systems, and that they are active and running:
- Centralised logging agent
- Host-Based IDS/IPS
- File Integrity Monitoring
Often the above are not monitored, resulting in holes in visibility of the environment. If some antivirus and/or log agents and/or other critical components have stopped working, you lose security controls, and associated protection and visibility, in parts of the environment. Orchestration software that periodically verifies a baseline does not know if the service it has enabled stops after 1 second. Monitoring that these security controls are in place provides some level of assurance.
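As a rough illustration of that kind of check, the sketch below compares a snapshot of running process names against the controls that are expected to be present. The process names (`log-agent`, `hids-agent`, `fim-agent`) are hypothetical placeholders, and in practice the snapshot would come from your monitoring agent rather than a hard-coded list.

```python
from typing import Iterable

# Hypothetical process names for the controls listed above; substitute
# the real agent names used in your environment.
EXPECTED_CONTROLS = {
    "centralised logging agent": "log-agent",
    "host-based IDS/IPS": "hids-agent",
    "file integrity monitoring": "fim-agent",
}


def missing_controls(running_processes: Iterable[str],
                     expected: dict[str, str] = EXPECTED_CONTROLS) -> list[str]:
    """Return the controls whose process is absent from the running set."""
    running = set(running_processes)
    return sorted(control for control, process in expected.items()
                  if process not in running)


if __name__ == "__main__":
    # Example snapshot from one host (illustrative data only).
    snapshot = ["sshd", "log-agent", "fim-agent"]
    for control in missing_controls(snapshot):
        print(f"ALERT: {control} is not running")
```

Running a check like this on a short interval, rather than only at deployment time, is what catches the agent that starts and then stops shortly afterwards.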
When performing an assessment, we commonly ask responsible personnel to enumerate technical security controls in place and demonstrate how these are monitored.
In terms of compliance, here is a requirement from the PCI DSS in relation to the monitoring of security controls:
10.8.a Examine documented policies and procedures to verify that processes are defined for the timely detection and reporting of failures of critical security control systems, including but not limited to failure of:
- Physical access controls
- Logical access controls
- Audit logging mechanisms
- Segmentation controls (if used)
File Integrity Monitoring
FIM is a tool used to monitor the integrity of system and application files and configurations. This is different to operational monitoring software which is monitoring the availability of systems and services. FIM can monitor system and application changes.
Imagine an application directory changes because a malicious entity has, through some vulnerability in your application, installed a root shell. If FIM is monitoring the directory for changes, it will generate an alert based on the directory or file modification. If there are files or folders within the directory that do, and should, change frequently, these can be excluded from the monitoring profile. If they change in a predictable manner, such as new images or documents, then this should also be mirrored in the configuration to catch deviations.
We often see FIM software installed with a default configuration. While this may be of some use for monitoring changes to a given platform (Windows, Linux etc), it needs to be configured to monitor business-relevant functions for it to be useful. If it does not monitor custom configuration and parameter files or application changes on an application server, it will provide little value in detecting potential threats.
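To make the idea concrete, here is a minimal sketch of the baseline-and-compare approach that FIM tools are built around. It is not how any particular product works; the function names and the exclusion patterns are assumptions for illustration. Paths that legitimately change often can be excluded with glob-style patterns.

```python
import fnmatch
import hashlib
import os


def snapshot(root, exclude=()):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Skip paths that are expected to change frequently.
            if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
                continue
            with open(full, "rb") as handle:
                digests[rel] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def compare(baseline, current):
    """Return added, removed and modified paths between two snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(path for path in baseline.keys() & current.keys()
                      if baseline[path] != current[path])
    return {"added": added, "removed": removed, "modified": modified}
```

A typical use would be to take `snapshot(app_dir, exclude=["uploads/*"])` after an approved deployment, store it, and alert whenever a later `compare` returns anything non-empty. Note that a blanket exclusion also hides an unexpected file type dropped into the excluded folder, which is why predictable changes are better modelled than ignored.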
In terms of industry standards, here are relevant requirements from the PCI DSS and NIST CSF for FIM:
11.5 Deploy a change-detection mechanism (for example, file-integrity monitoring tools) to alert personnel to unauthorized modification (including changes, additions and deletions) of critical system files, configuration files, or content files; and configure the software to perform critical file comparisons at least weekly.
PR.DS-6: Integrity checking mechanisms are used to verify software, firmware, and information integrity
Centralised logging provides visibility of the environment without needing to log onto various systems to identify, collate, and correlate logs to reconstruct a sequence of events. It provides us a single interface to identify who, or what, did what, and when it happened. Depending on functionality and configuration, the interface can provide dashboards and alerts on various criteria.
The way in which centralised log management is configured may be different per platform. For appliances, there is generally a configuration to enable syslog and configure a destination. For servers there can be agent or agentless configuration. Regardless of the client configuration, the centralised log server and interface should provide a security analyst the appropriate information to facilitate incident detection and investigation.
Rather than detailing how to configure logging, it may be easier to look at this from another perspective and consider how we review the efficacy of logging in place.
When performing an assessment, we often begin by reviewing the network and server infrastructure hardening, which requires administrators to log on to those systems and walk us through the settings. This provides us with a series of known events. As an example:
- Alice logged onto firewall X at 13h33 on Monday
- Bob logged onto Windows server Y at 15h02 on Monday
- Charlie logged into Linux server Z at 10h23 on Tuesday
- Charlie failed login to Linux server Z2 at 10h26 on Tuesday
A simple exercise, given these known events, is to identify them in the centralised log interface. This is the most basic test we can perform and is a useful first step for validating centralised logging.
If we cannot identify and correlate events we know occurred, there’s little chance of being able to identify or respond to indicators of potential or actual compromise.
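The known-events test can be sketched in a few lines once the log records are exported from the central system. The record fields (`user`, `host`, `outcome`, `time`), the two-minute tolerance and the dates are all assumptions for illustration; real log schemas will differ.

```python
from datetime import datetime, timedelta


def find_missing(known_events, log_records, tolerance=timedelta(minutes=2)):
    """Return the known events that have no matching record in the logs."""
    missing = []
    for event in known_events:
        matched = any(
            rec["user"] == event["user"]
            and rec["host"] == event["host"]
            and rec["outcome"] == event["outcome"]
            and abs(rec["time"] - event["time"]) <= tolerance
            for rec in log_records
        )
        if not matched:
            missing.append(event)
    return missing


if __name__ == "__main__":
    # Hypothetical dates standing in for "Monday" in the example above.
    known = [
        {"user": "alice", "host": "firewall-x", "outcome": "success",
         "time": datetime(2024, 1, 1, 13, 33)},
        {"user": "bob", "host": "windows-y", "outcome": "success",
         "time": datetime(2024, 1, 1, 15, 2)},
    ]
    logs = [
        {"user": "alice", "host": "firewall-x", "outcome": "success",
         "time": datetime(2024, 1, 1, 13, 33, 20)},
    ]
    for event in find_missing(known, logs):
        print("Not found in central logs:", event["user"], event["host"])
```

Anything returned by `find_missing` points at a host, platform or event type that is not reaching the central log server, or is arriving with fields that cannot be matched.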
To validate processes beyond simple authentication events, we can select some changes from the change control management system and attempt to correlate these with logs. Again, we’re verifying that known events are captured and identifiable within the log system. Here are a couple of examples:
CHG0001 – update ACL on FW01 to accept inbound comms on port 22
It should be possible to identify, in the log system, that a network administrator logged onto the firewall and modified the ruleset. Furthermore, the detail of the modification should be available without an analyst needing to diff current and previous versions of the ACL.
CHG0002 – install monitoring software agent on ADMSVR01
It should be possible to identify in the log system that a system administrator logged onto the server and installed a software package.
CHG0003 – deploy application v1.2.3 to APPSVR01
Depending on the mechanism for deploying application changes, this may not appear as a single log entry. However, this should be identified by File Integrity Monitoring, and should be feeding information into your centralised log solution.
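The same idea extends to change records: for each approved change, look for log entries on the affected host inside the change window. The sketch below assumes each change record carries a `host`, `start` and `end`, and that log records carry a `host` and `time`; these field names and the example times are hypothetical.

```python
from datetime import datetime


def logs_for_change(change, log_records):
    """Return log records on the change's host within its change window."""
    return [rec for rec in log_records
            if rec["host"] == change["host"]
            and change["start"] <= rec["time"] <= change["end"]]


def uncorroborated(changes, log_records):
    """Return IDs of changes that left no trace in the central logs."""
    return [change["id"] for change in changes
            if not logs_for_change(change, log_records)]


if __name__ == "__main__":
    changes = [
        {"id": "CHG0002", "host": "ADMSVR01",
         "start": datetime(2024, 1, 3, 9, 0), "end": datetime(2024, 1, 3, 10, 0)},
        {"id": "CHG0003", "host": "APPSVR01",
         "start": datetime(2024, 1, 3, 14, 0), "end": datetime(2024, 1, 3, 15, 0)},
    ]
    logs = [
        {"host": "ADMSVR01", "time": datetime(2024, 1, 3, 9, 12),
         "message": "package installed"},
    ]
    print(uncorroborated(changes, logs))
```

A change that appears in the output is worth investigating: either it was not carried out in its window, or the system involved is not logging what it should.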
Industry compliance references for logging from PCI DSS and NIST CSF respectively:
Requirement 10: Track and monitor all access to network resources and cardholder data
DE.AE-2: Detected events are analyzed to understand attack targets and methods
DE.AE-3: Event data are collected and correlated from multiple sources and sensors
Application changes and logging
Let’s go back to CHG0003 above - deploy application v1.2.3 to APPSVR01
Applications are often updated through a Continuous Integration/Continuous Deployment (CI/CD) methodology, so we don’t necessarily see that Bob logged onto the server and made a change; rather, we see that an application was deployed. To know what changed, we may be able to see a diff provided by the FIM software, or we may need to track back through the software development lifecycle to identify the feature requests, code review and the application package which was approved for deployment.
In the Software Development LifeCycle and in environments where we consider Infrastructure as Code, the facility of backwards traceability becomes very important. There is a significant difference in transparency between a log entry which states ‘Alice modified FW01 ACL for IP x.x.x.x to accept traffic on port 22’ and a log which states ‘RC1.2.3 deployed to APPSVR01’. Security controls which produce an audit trail must be in place at each stage of the development lifecycle to enable transparency and traceability.
How can you test and improve your logging?
Identify or create a sequence of events and identify these in the logging system. If they cannot be identified, check configurations and repeat the process until you can identify the full sequence. Then, perform similar tests using a range of systems – network devices, load balancers, server platforms, virtualization hosts, database servers etc.
Create dashboards to easily identify different categories of information which might warrant further investigation – here are some examples:
- Logons using generic accounts – admin, administrator, root, super, superuser, manager, dba, sa etc
- VPN connections
- VPN connection sources
- Two factor authentication success/failure
- Failed logins above a defined threshold – this could be further subdivided by data centre, domain, IP range
- New user accounts, account modifications, deletions
- IDS/IPS alerts above a defined threshold
- Anti-Virus alerts
- Web Application Firewall alerts
- Outbound connections to new domains
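Two of the dashboard categories above reduce to very simple queries. The sketch below shows the logic in Python against exported authentication events; the event fields (`user`, `source_ip`, `outcome`) are assumptions, and in practice this would be written in your log platform's own query language.

```python
from collections import Counter

# Generic account names taken from the list above.
GENERIC_ACCOUNTS = {"admin", "administrator", "root", "super",
                    "superuser", "manager", "dba", "sa"}


def failed_logins_over_threshold(events, threshold, key="source_ip"):
    """Count failed logins per key and keep those above the threshold."""
    counts = Counter(e[key] for e in events if e["outcome"] == "failure")
    return {value: n for value, n in counts.items() if n > threshold}


def generic_account_logons(events):
    """Return successful logons that used a generic account name."""
    return [e for e in events
            if e["outcome"] == "success"
            and e["user"].lower() in GENERIC_ACCOUNTS]
```

Changing `key` to a field such as a data centre or domain name (if your events carry one) gives the subdivided view mentioned above.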
The purpose of all the above is not just to have lots of logs but to be able to derive intelligence from the noise and ultimately enable Security Analysts to identify Indicators of Compromise. In the event of a compromise, these logs are also an invaluable resource during a forensic investigation.
Once you’re getting logs from all systems and you’ve built a bunch of dashboards which provide you useful insight to your environment, you can move to the next step in improvement!
Depending on internal capacity and expertise, execute attack playbooks and verify the Tactics, Techniques and Procedures can be identified and alerted. If internal capacity is not available, this could be done with an appropriate service provider. This exercise could include a mixture of known and blind (known to the attacking party only) attack scripts so the defending team can practice security analysis during an attack simulation.
During and following such exercises, the security team can identify blind spots and work to improve logs, custom queries and detections.
Here are 2 exercises you could try yourself, one fairly simple and another with a degree of complexity.
Have Alice do the following:
- Remote logon via VPN using two-factor authentication
- Logon to jump server
- Logon to web server in DMZ
- Logon to application server in internal VLAN using incorrect password
- Logon to application server in internal VLAN
- Create file in application directory
- Delete file in application directory
- Logoff each system
- Disconnect the VPN
Identify the above sequence of events in your centralised log server (note time synchronisation is important here, especially if operating across multiple time zones using centralised log servers).
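The time synchronisation point is easy to underestimate. As a sketch, assuming each exported record carries an ISO 8601 timestamp with a UTC offset and a hypothetical `action` label, the records can be normalised to UTC before checking that the expected steps appear in order:

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # A timestamp without an offset cannot be ordered reliably.
        raise ValueError(f"timestamp has no UTC offset: {timestamp}")
    return parsed.astimezone(timezone.utc)


def sequence_found(expected_steps, log_records):
    """True if the expected steps occur in order after sorting by UTC time."""
    ordered = sorted(log_records, key=lambda rec: to_utc(rec["time"]))
    actions = iter(rec["action"] for rec in ordered)
    # Each 'in' consumes the iterator, so steps must appear in sequence.
    return all(step in actions for step in expected_steps)


if __name__ == "__main__":
    # Illustrative records from systems logging in three different time zones.
    records = [
        {"action": "web_logon", "time": "2024-01-01T04:02:00-05:00"},
        {"action": "vpn_connect", "time": "2024-01-01T09:00:00+00:00"},
        {"action": "jump_logon", "time": "2024-01-01T11:01:00+02:00"},
    ]
    print(sequence_found(["vpn_connect", "jump_logon", "web_logon"], records))
```

Sorting on the raw local timestamps in this example would place the web server logon first, which is exactly the kind of confusion that unsynchronised or offset-free timestamps introduce during an investigation.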
Once you can perform the above, try different variations, such as having Alice create or modify user accounts, change firewall rules, or install and remove software.
If you are a user of a cloud platform you could integrate logs generated by that cloud platform in your centralised logging system. You need to ensure you can correlate cloud platform management events with events on your own instances.
A more complex exercise involves creating specific attack playbooks containing Tactics, Techniques and Procedures used by sophisticated (or not) adversaries in breaches. The most common breach avenue is:
- Email phishing
- Code execution
- Command and Control establishment
- Lateral movement
- Privilege escalation
So a way forward in this scenario would be to try and mimic actions that fall into the above categories.
- Email phishing: Use publicly available tools (e.g. the Social-Engineer Toolkit, SET) to create weaponised files to send to a controlled recipient within your organisation. What security events, if any, did your email filtering platform/provider log? Is it configured to forward events to your centralised monitoring platform? Were they forwarded to your centralised monitoring?
- Code execution: For all the weaponised files that reached the controlled user, what was the outcome when executed/acted upon? Was it successful? Was it caught by the antivirus? Is this visible within your centralised monitoring platform?
- Command and Control establishment: Use one of the available Command and Control frameworks to initiate an outgoing callback to an internet server you control. Ideally this should be logged in either a proxy, a DNS server, an IPS, a firewall or a combination of them. Since you have control of the callback channel and its parameters, try to identify it within your centralised logging platform.
- Lateral movement: Lateral movement is a bit tricky to simulate without tainting the environment in one way or another.
  - As a simple test, one can (ab)use their organisation’s remote management software, Remote Desktop Protocol (RDP), Secure Shell (SSH), etc. to access additional endpoints. Create both successful and failed attempts. For multi-region organisations consider performing cross-region tests, i.e. a US-based user accessing APAC-based resources.
  - Introduce an unauthorised remote access management tool/protocol e.g. PsExec, WMI, WinRM. Perform a similar assortment of tests.
  All the tests above will likely generate a mixture of log entries depending on the configuration of the endpoints and/or network components. Since you know what you are looking for, try and identify all related logs. Is it possible to correlate all the different log entries in a single high-confidence alert?
- Privilege escalation: Privilege escalation is another one that is tricky to emulate without tainting the environment. Simple tests that can be executed are:
  - Password brute forcing
  - Password spraying
  - Successful password spraying (i.e. a password spraying attack with a successful login at the end).
  - Successful and unsuccessful login attempts by Domain Administrator accounts on domain controllers.
  - Successful and unsuccessful login attempts by Domain Administrator accounts on endpoints.
  - Successful and unsuccessful login attempts by local Administrator accounts on endpoints.
  - Creation/modification of local and Domain Administrator accounts.
  - Privileged group modifications.
- Exfiltration: Use the established C2 channel or a different one to move control files outside of the organisation. If there is a Data Loss Prevention (DLP) solution deployed, use a mixture of monitored and non-monitored files and formats. DLP logs including the same indicators as in point 3 above (Command and Control establishment) should be visible.
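As one example of turning these tests into a detection, the password spraying cases from the privilege escalation tests can be expressed as "one source failing against many distinct accounts, optionally followed by a success". The sketch below is an assumption-laden illustration: the event fields (`user`, `source_ip`, `outcome`) and the default of five accounts are hypothetical, and a real detection would also bound the time window.

```python
from collections import defaultdict


def detect_spraying(events, min_accounts=5):
    """Flag sources with failed logins against many distinct accounts."""
    failed = defaultdict(set)
    succeeded = defaultdict(set)
    for event in events:
        bucket = failed if event["outcome"] == "failure" else succeeded
        bucket[event["source_ip"]].add(event["user"])

    findings = {}
    for source, accounts in failed.items():
        if len(accounts) >= min_accounts:
            findings[source] = {
                "accounts_tried": len(accounts),
                # Any success from the same source suggests the spray worked.
                "successes": sorted(succeeded.get(source, set())),
            }
    return findings
```

Counting distinct accounts rather than total failures is what separates spraying from brute forcing a single account, which the simple failed-login threshold already covers.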
The intent of this post has been to illustrate the functions and inter-relationships between monitoring, logging, and FIM; and how these complement each other in an information security management system. Hopefully the exercises provide you guidance on how to improve and self assess these controls in your own environment. Given the changing nature of any environment and the threat landscape, appropriate and adequate security is always a moving target so these exercises should be performed and reviewed regularly. As per previous posts, you can’t detect and respond to threats if you’re not capturing the right information.