The Wget command-line utility is used to download files from the internet. It is commonly used in shell scripts and comes installed by default on many flavors of Linux. Wget can download over HTTPS and is HTTP Strict Transport Security (HSTS) compliant. This last point is interesting from a forensics standpoint, because Wget keeps its HSTS cache in a plaintext file. The file lists the HSTS-enabled hostnames that Wget has accessed, along with a timestamp of the latest activity for each. In this post, I will examine the forensic potential of the wget-hsts file and show how to use this artifact for triage during an investigation.
On Linux, the default location is a hidden (dot) file within the user's home directory:
The file’s location is configurable with the command line flag:
Using an Ubuntu 18.04 VM as a test platform, I observed the following file structure.
Comments / header
The lines beginning with a ‘#’ character are comments and are ignored by Wget. They show the following information:
- A header showing version information.
- A warning against manually editing the file.
- A comment showing the fields for each entry.
The remaining three lines are the hosts I visited to demonstrate how the file works. Each entry consists of the following fields:
- Hostname: The HSTS-enabled hostname that Wget connected to.
- Port: Used only for a non-default port, in which case Wget checks both the hostname and the port. The man page describes this feature as intended only for testing and development.
- Incl. subdomains: Whether subdomains are included in the HSTS policy.
- Created: A timestamp in Unix epoch time, showing when Wget last connected to the host.
- Max-age: The duration, in seconds, for which the HSTS policy is valid.
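To make the field layout concrete, the following minimal Python sketch parses a single entry into named fields and derives when the cached policy lapses (created + max-age). It assumes the five fields are whitespace-separated in the order given by the header comment. HstsEntry and parse_line are hypothetical names for illustration, not part of Wget.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HstsEntry:
    """One host line from a wget-hsts file (hypothetical helper type)."""
    hostname: str
    port: int
    include_subdomains: bool
    created: datetime  # epoch "created" field converted to an aware UTC datetime
    max_age: int       # policy lifetime in seconds

    @property
    def expires(self) -> datetime:
        """When the cached policy lapses: created + max-age."""
        return self.created + timedelta(seconds=self.max_age)


def parse_line(line: str) -> Optional[HstsEntry]:
    """Parse one wget-hsts line; None for comments/blank lines, ValueError if malformed."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Assumption: five whitespace-separated fields, in the order listed in the
    # header comment: hostname, port, incl. subdomains, created, max-age.
    hostname, port, subdomains, created, max_age = stripped.split()
    return HstsEntry(
        hostname=hostname,
        port=int(port),
        include_subdomains=subdomains == "1",
        created=datetime.fromtimestamp(int(created), tz=timezone.utc),
        max_age=int(max_age),
    )


# Usage with a made-up entry (example.com and the numbers are illustrative only):
entry = parse_line("example.com\t0\t1\t1570000000\t31536000")
print(entry.hostname, entry.created.isoformat(), entry.expires.date())
```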
From a DFIR point of view, only the hostname and the created time really matter: we are not concerned with a web server's HSTS configuration, but we do want to know which hosts were accessed and when.
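As a rough illustration of extracting just those two fields, here is a minimal Python sketch that builds a chronological UTC timeline from one or more wget-hsts files. It is an alternative written for illustration, not the Perl one-liner discussed in the next section. It assumes the default per-user file is ~/.wget-hsts and only looks under /home/*/ and /root/; adjust the candidate paths if the location was changed on the command line or home directories live elsewhere. The function names are hypothetical.

```python
import glob
from datetime import datetime, timezone
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (UTC time, hostname, source file)


def timeline_rows(lines: Iterable[str], source: str) -> List[Row]:
    """Extract (UTC time, hostname, source) from wget-hsts lines, skipping comments."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0].startswith("#"):
            continue  # comment, blank, or truncated line
        try:
            created = datetime.fromtimestamp(int(fields[3]), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            continue  # non-numeric or out-of-range timestamp
        rows.append((created.strftime("%Y-%m-%d %H:%M:%S"), fields[0], source))
    return rows


def build_timeline(paths: Iterable[str]) -> List[Row]:
    rows: List[Row] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            rows.extend(timeline_rows(fh, path))
    return sorted(rows)  # fixed-width timestamps sort chronologically as strings


if __name__ == "__main__":
    # Assumption: default per-user location ~/.wget-hsts; extend for other homes.
    candidates = glob.glob("/home/*/.wget-hsts") + glob.glob("/root/.wget-hsts")
    for when, host, src in build_timeline(candidates):
        print(f"{when} UTC  {host}  ({src})")
```

Keeping the source path in each row matters once several users' files are merged into one timeline.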
Artifact analysis / tools
During this investigation, I wrote the following Perl one-liner to quickly triage a system, provided the file is in the default location. It shows the information most useful to an investigator: the UTC timestamp of the most recent interaction with each host, followed by the hostname. Optionally, the output can be piped to sort to view the activity in chronological order.
When run on my Ubuntu test VM, it displays the following:
Forensic HSTS analyzer
While checking for existing research in this space, I came across a GitHub project that can analyze the HSTS caches of all major browsers as well as command-line utilities such as Wget and cURL. I have not examined it closely, but it may be a good starting point for further reading and for checking whether it meets your analysis needs.
Limitations
The file only records hosts that are configured with HSTS. Unencrypted downloads, and hosts that do not return an HSTS header, will not appear in it. In other words, this is not a Wget history file.
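For context on why only some hosts appear: an entry exists only if a host sent a valid Strict-Transport-Security header over HTTPS, and RFC 6797 requires user agents to ignore that header on plain HTTP. The sketch below is a simplified, hypothetical parser for the header value, not Wget's own code. It returns the max-age and includeSubDomains values that correspond to the Max-age and Incl. subdomains fields, or None when there is no usable policy. Per the RFC, a max-age of 0 tells the client to drop its cached entry.

```python
import re
from typing import Optional, Tuple


def parse_sts_header(value: Optional[str]) -> Optional[Tuple[int, bool]]:
    """Return (max_age, include_subdomains), or None if there is no usable policy.

    Simplified from RFC 6797 section 6.1: directives are ';'-separated, names are
    case-insensitive, values may be quoted, and max-age is mandatory. Duplicate
    directives and other edge cases are not handled.
    """
    if not value:
        return None  # no header at all: nothing would be cached
    max_age = None
    include_subdomains = False
    for directive in value.split(";"):
        name, _, arg = directive.partition("=")
        name = name.strip().lower()
        if name == "max-age":
            arg = arg.strip().strip('"')
            if not re.fullmatch(r"[0-9]+", arg):
                return None  # malformed max-age invalidates the header
            max_age = int(arg)
        elif name == "includesubdomains":
            include_subdomains = True
    return None if max_age is None else (max_age, include_subdomains)


print(parse_sts_header("max-age=31536000; includeSubDomains"))
```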
Because it is a plaintext file, it is open to tampering and anti-forensics techniques, such as removing entries tied to specific threat actor activity or deleting the file altogether.
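Since tampering is a realistic concern, a few cheap consistency checks can help decide whether a file deserves closer scrutiny. The following Python sketch flags malformed lines, duplicate hostname/port pairs, created timestamps in the future, and a missing comment header. These are heuristics built on assumptions (that Wget always writes its comment header, as seen on the test VM, and keeps one entry per hostname and port); tamper_indicators is a hypothetical name, an empty result proves nothing, and a careful actor can produce a file that passes every check.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def tamper_indicators(lines: Iterable[str], now: Optional[datetime] = None) -> List[str]:
    """Heuristic red flags in a wget-hsts file; an empty list proves nothing."""
    now = now or datetime.now(timezone.utc)
    findings: List[str] = []
    seen: Counter = Counter()
    saw_comment = False
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            saw_comment = True
            continue
        fields = stripped.split()
        if len(fields) != 5 or not all(f.isdigit() for f in fields[1:]):
            findings.append(f"line {lineno}: malformed entry")
            continue
        try:
            created = datetime.fromtimestamp(int(fields[3]), tz=timezone.utc)
        except (OverflowError, OSError, ValueError):
            findings.append(f"line {lineno}: unusable created time")
            continue
        if created > now:
            findings.append(f"line {lineno}: created time in the future ({created:%Y-%m-%d})")
        # Assumption: Wget keeps at most one entry per (hostname, port) pair.
        seen[(fields[0], fields[1])] += 1
    if not saw_comment:
        # Assumption: Wget always writes its comment header, as on the test VM.
        findings.append("no comment header found")
    for (host, port), count in seen.items():
        if count > 1:
            findings.append(f"duplicate entries for {host} (port {port})")
    return findings
```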
As documented in the man page, the --no-hsts command-line flag disables HSTS support and makes Wget act as a non-HSTS-compliant user agent (UA). In this case, Wget ignores HSTS headers, does not create the wget-hsts file if it does not already exist, and ignores any existing copy of the file.
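Because --no-hsts leaves no trace in wget-hsts, and --hsts-file (Wget's option for a non-default cache location) sends entries to a different file that is then worth collecting, it can pay to scan shell history for either option. The Python sketch below is a hypothetical helper that does this. It tokenizes each line with shlex, treats any token whose basename is wget as a Wget invocation, and drops bash timestamp lines (#<epoch>) as comments. It will miss aliases, scripts, and obfuscated commands.

```python
import os
import shlex
from typing import Iterable, List, Tuple

# --no-hsts disables the cache entirely; --hsts-file redirects it elsewhere.
HSTS_OPTIONS = ("--no-hsts", "--hsts-file")


def flag_hsts_options(history_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, command) for wget invocations using an HSTS option."""
    hits = []
    for lineno, line in enumerate(history_lines, start=1):
        try:
            tokens = shlex.split(line, comments=True)  # "#<epoch>" lines become []
        except ValueError:
            tokens = line.split()  # unbalanced quotes: fall back to a naive split
        if not any(os.path.basename(t) == "wget" for t in tokens):
            continue
        if any(t == opt or t.startswith(opt + "=") for t in tokens for opt in HSTS_OPTIONS):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```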
During testing, I found that Wget is not installed by default on CentOS 7. Furthermore, at the time of writing, the version in the default repo is 10 years old and predates the HSTS RFC (RFC 6797). As a result, unless your threat actor brings a more up-to-date Wget along with them (or the sysadmin installs one), you're SOL. Current Debian-based distros, such as Debian and Ubuntu, ship with far more modern versions of Wget.
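On an unfamiliar system, checking the installed Wget version tells you whether an HSTS cache could exist at all. The sketch below parses the first line of wget --version output. It assumes HSTS support first appeared in GNU Wget 1.17; verify that against the NEWS file for the build you are examining. The function names and the sample banners in the usage notes are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Assumption: HSTS support first shipped in GNU Wget 1.17.
HSTS_MIN_VERSION = (1, 17)


def parse_wget_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor[, patch]) from a 'GNU Wget X.Y[.Z]' banner."""
    m = re.search(r"GNU Wget (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if not m:
        return None
    return tuple(int(g) for g in m.groups() if g is not None)


def supports_hsts(version: Optional[Tuple[int, ...]]) -> bool:
    return version is not None and version[:2] >= HSTS_MIN_VERSION


if __name__ == "__main__":
    try:
        banner = subprocess.run(["wget", "--version"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        print("wget not installed")
    else:
        v = parse_wget_version(banner)
        print(v, "HSTS-capable" if supports_hsts(v) else "no HSTS support")
```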
The wget-hsts file can reveal some Wget activity, but the information becomes more valuable when correlated with other sources:
- Shell history (.bash_history, etc.) can show which other commands were run around Wget invocations.
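Cross-referencing the two sources can also highlight gaps. The following Python sketch, with hypothetical function names, compares hostnames taken from http(s) URLs on wget command lines in shell history against hostnames in a wget-hsts file. It only recognizes URLs that include a scheme, so a command such as wget example.com/file is missed. The explanations in the comments are possibilities to investigate, not conclusions.

```python
import os
import shlex
from typing import Dict, Iterable, Set
from urllib.parse import urlsplit


def hosts_from_history(history_lines: Iterable[str]) -> Set[str]:
    """Hostnames from http(s) URL arguments of wget commands in shell history."""
    hosts = set()
    for line in history_lines:
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:
            tokens = line.split()
        if not any(os.path.basename(t) == "wget" for t in tokens):
            continue
        for t in tokens:
            if t.startswith(("http://", "https://")):
                host = urlsplit(t).hostname  # lowercased by urllib
                if host:
                    hosts.add(host)
    return hosts


def hosts_from_hsts(hsts_lines: Iterable[str]) -> Set[str]:
    """First field of every non-comment, non-blank wget-hsts line."""
    return {ln.split()[0] for ln in hsts_lines if ln.strip() and not ln.lstrip().startswith("#")}


def correlate(history_lines: Iterable[str], hsts_lines: Iterable[str]) -> Dict[str, Set[str]]:
    hist = hosts_from_history(history_lines)
    hsts = hosts_from_hsts(hsts_lines)
    return {
        "both": hist & hsts,
        # e.g. plain HTTP, no HSTS header, --no-hsts, or an edited/deleted cache
        "history_only": hist - hsts,
        # e.g. cleared history, scripts or cron jobs, or redirects to another host
        "hsts_only": hsts - hist,
    }
```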