War Room

Find Sensitive Data with Bulk Extractor

June 29, 2015 By Mark Wolters

Bulk Extractor is a great tool for searching a file system for sensitive data. Rather than parsing the file system, it scans the raw data linearly from start to finish. This, in combination with parallel processing, makes the tool very fast. The trade-off is that it can miss data in fragmented files, because their pieces are not contiguous on disk, but files typically aren’t fragmented.

For installation, follow the directions on the bulk_extractor wiki (https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor).

Using BEViewer, the Bulk Extractor GUI

While you may prefer the command line, in my opinion it is easier to get a basic understanding of the tool by starting with the GUI. The layout shows the default settings and how the options fit together. It also generates the equivalent command line so that you can get a feel for the syntax.

Click on the Tools menu and then choose Run bulk_extractor (Ctrl+R)…

[Screenshot: BEViewer Tools menu, Run bulk_extractor (Ctrl+R)]

…and you will be presented with a large selection of options!

[Screenshot: bulk_extractor options dialog, with many options available]

Image

In the Required Parameters section, we can see that three types of input can be targeted: an image file (e.g., E01), a raw device, or a directory of files.

[Screenshot: Required Parameters options]

Next, select the image file, device, or directory that you would like bulk_extractor to search (this field changes based on the input type you chose). The output feature directory is where the results will be written.

Scanners

Scanners are another very important option. When you first open this view, some are enabled by default while others are disabled. Most scanners write to a file that matches their name (e.g., the elf scanner writes to elf.txt). Below is a description of each scanner, grouped by its default state.

[Screenshot: scanner selection]

Enabled:

  • Accts searches for credit card numbers, track data, phone numbers, and other numbers
  • AES finds AES keys
  • Base64 searches for Base64-encoded text
  • Elf searches for ELF files
  • Email searches for headers, cookies, hostnames, IPs, emails, and URLs
  • Exif finds images and their metadata
  • Find searches for specific regular expressions
  • GPS finds Garmin-formatted XML containing GPS coordinates
  • Gzip finds gzip-compressed files
  • Hiberfile finds the Windows hibernation file
  • Httplogs finds HTTP log files
  • Json searches for JSON files
  • Kml finds KML files
  • Msxml searches for Microsoft XML Core Services
  • Net finds packets in memory
  • Pdf searches for text from PDF files
  • Rar searches for RAR-compressed files
  • Sqlite finds SQLite3 database files
  • Vcard finds vCard files
  • Windirs searches for Windows directories
  • Winlnk finds Windows LNK files
  • Winpe searches for Windows executables and DLLs
  • Winprefetch searches for prefetch files
  • Zip searches for ZIP-compressed files

Disabled:

  • Base16 searches for hex-encoded (Base16) data
  • Facebook finds Facebook HTML
  • Outlook finds Outlook Compressible Encryption
  • Sceadan stands for Systematic Classification Engine for Advanced Data ANalysis. I am unsure what this scanner does.
  • Wordlist finds words, which is potentially useful for building password lists
  • Xor searches for data hidden by XOR encoding
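
Since most scanners write their hits to a file named after the scanner, a quick way to triage results is to count how often each hit appears. The sketch below assumes the usual feature-file layout (tab-separated offset, feature, and context, with comment lines starting with #). The name count_features is a hypothetical helper, not part of bulk_extractor.

```shell
# Count how often each feature appears in a bulk_extractor feature file.
# Assumes tab-separated lines: offset <TAB> feature <TAB> context.
# count_features is a hypothetical helper, not a bulk_extractor command.
count_features() {
    # $1 = feature file, e.g. email.txt in the output directory
    awk -F '\t' '
        !/^#/ && NF >= 2 { n[$2]++ }                  # skip comments, tally column 2
        END { for (f in n) print n[f] "\t" f }        # print count <TAB> feature
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2   # most frequent first
}
```

For example, count_features be_out/email.txt (with be_out as a placeholder output directory) would list the most frequently seen email addresses first.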

General Options

[Screenshot: General Options]
  • Banner file puts the contents of a banner file at the beginning of each output file.
  • Alert list creates an alert file that records the specified terms when they are found.
  • Stop list specifies a whitelist of terms; matches are written to a separate file instead of the normal output.
  • Regex text and Regex text file search for the specified regular expressions.
  • Random sample scans a random sample of the data rather than all of it.

Tuning Parameters

These relate primarily to how the scanners perform their scan.

[Screenshot: Tuning Parameters options]
  • Context window size specifies how much surrounding context the scanners record with each match.
  • Page size is how many bytes bulk_extractor processes at a time.
  • Margin size determines how much each page overlaps the next, to avoid missing data that spans a page boundary.
  • Block size
  • Number of threads determines how many threads bulk_extractor uses; it defaults to the number of processors on the computer.
  • Maximum recursion depth is how deep bulk_extractor will search through nested data (for example, zipped files).
  • Wait time is how long bulk_extractor will wait for scanners to finish after all the data has been read.
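
To make the page and margin idea concrete, here is a small sketch of how a linear scan might be carved up: each page starts one page size after the previous one, and each read extends one margin past the end of its page. The function name page_ranges and the tiny sizes in the example are made up for illustration. It follows the description above rather than bulk_extractor's actual code.

```shell
# Print the byte range covered by each page read, one "start end" pair per line.
# page_ranges is a hypothetical helper; it only illustrates the overlap idea.
page_ranges() {
    size=$1; page=$2; margin=$3
    start=0
    while [ "$start" -lt "$size" ]; do
        end=$(( start + page + margin ))     # page plus the overlap into the next page
        if [ "$end" -gt "$size" ]; then
            end=$size                        # the last read stops at the end of the data
        fi
        echo "$start $end"
        start=$(( start + page ))            # the next page begins one page size later
    done
}

# Toy example: 40 bytes of data, 16-byte pages, 4-byte margin.
page_ranges 40 16 4
# 0 20
# 16 36
# 32 40
```

The 4 bytes shared by neighboring ranges are what let a scanner catch a feature that straddles a page boundary.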

Parallelizing

[Screenshot: Parallelizing options]
  • Start processing at a specific offset
  • Process only the range between two offsets in the file/directory
  • Add a value to the reported offsets

Debugging Options

[Screenshot: Debugging options]
  • Start at a specific page
  • Select a debug mode (see the source code for the different debugging options)
  • Erase the existing output directory

Scanner Controls

[Screenshot: Scanner Controls options]
  • Plugin directories specifies the directories to search for plugins (defaults: /usr/local/lib/bulk_extractor and /usr/lib/bulk_extractor).
  • Use settable options lets you set scanner options, which are listed on page 24 of the user manual.

Command Line

The basic syntax for using bulk_extractor from the command line is as follows:

bulk_extractor -o <out_dir> <image>

Or

bulk_extractor -o <out_dir> -R <dir>
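
As a rough sketch, the two forms can be wrapped in one small function that picks -R when the target is a directory. The function name run_scan and the paths in the usage line are placeholders, and the existence check is just a precaution so that results from different runs do not get mixed.

```shell
# Run bulk_extractor against an image/device, or recursively against a directory.
# run_scan is a hypothetical wrapper; only -o and -R are bulk_extractor flags.
run_scan() {
    target=$1; out_dir=$2
    if [ -e "$out_dir" ]; then
        echo "output directory already exists: $out_dir" >&2
        return 1
    fi
    if [ -d "$target" ]; then
        bulk_extractor -o "$out_dir" -R "$target"   # directory: scan it recursively
    else
        bulk_extractor -o "$out_dir" "$target"      # image file or raw device
    fi
}
```

A call such as run_scan evidence.E01 be_out would then scan the image and write the results to be_out.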

Required Parameters

[Screenshot: Running bulk_extractor from Terminal]
  • -o <dir> – puts the results in the <dir> directory.
  • -R <dir> – scans a directory recursively.

Scanners

  • -E <scanner> – disables all scanners except <scanner>.
  • -e <scanner> – enables <scanner> (typically for disabled scanners).
  • -x <scanner> – disables <scanner>.
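
These scanner flags can be combined on one command line. As a small sketch, the hypothetical helper below turns a list of scanners to enable and a list to disable into the matching -e and -x flags. The lowercase scanner names follow the output file naming (e.g. elf.txt), and the image and output paths in the usage line are placeholders.

```shell
# Build -e/-x flags from two space-separated scanner lists.
# scanner_flags is a hypothetical helper, not part of bulk_extractor.
scanner_flags() {
    # $1 = scanners to enable, $2 = scanners to disable
    flags=""
    for s in $1; do flags="$flags -e $s"; done
    for s in $2; do flags="$flags -x $s"; done
    echo "${flags# }"                      # drop the leading space
}

scanner_flags "wordlist xor" "gzip"
# -e wordlist -e xor -x gzip
```

The result can be dropped straight into a command, for example bulk_extractor $(scanner_flags "wordlist xor" "gzip") -o be_out image.raw.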

General Options

  • -b <file> – sets banner file to <file>.
  • -r <file> – sets alert list to <file>.
  • -w <file> – sets stop list to <file>.
  • -f <regex> – searches for <regex>.
  • -F <file> – searches for the regular expressions listed in <file>.
  • -W<num1>:<num2> – only extracts words between <num1> and <num2> characters in length.
  • -s <frac>[:<num>] – sets random sampling values.
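
For the -F option, a pattern file might look like the sketch below, assuming one regular expression per line. The helper name write_patterns, the two patterns, and the file names are all made up for illustration, and which regex constructs are accepted depends on the regex library bulk_extractor was built with.

```shell
# Write a small pattern file for use with -F (one regular expression per line).
# write_patterns and both patterns are hypothetical examples.
write_patterns() {
    # $1 = destination file
    cat > "$1" <<'EOF'
[Pp]assword[=:][^ ]+
[Ss]ecret[_-]?[Kk]ey
EOF
}
```

After write_patterns patterns.txt, a run such as bulk_extractor -F patterns.txt -o be_out image.raw would search for both expressions. Going by the file naming described earlier, the hits should land in find.txt.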

Tuning Parameters

  • -C <num> – sets the context window to <num> (default 16).
  • -S fr:<name>:[window=<num>|window_before=<num>|window_after=<num>] – sets the context window for feature recorder <name> to <num>, either on both sides of a feature or only before or after it.
  • -G <num> – sets the page size to <num>.
  • -g <num> – sets the margin to <num>.
  • -j <num> – sets number of threads to <num>.
  • -M <num> – sets max recursion depth to <num>.
  • -m <num> – sets the maximum number of minutes to wait for scanners to finish to <num>.

Parallelizing

  • -Y <offset1>[-<offset2>] – starts at <offset1> and goes to <offset2> if specified.
  • -A <num> – adds <num> to the reported offset.
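
The -Y flag makes it possible to divide one image among several bulk_extractor runs (for example, on different machines). The sketch below only does the arithmetic: it splits a size into ranges aligned to a page size and prints them in the <offset1>-<offset2> form. The helper name split_ranges is hypothetical, aligning to the page size is a design choice of this sketch, and whether <offset2> is inclusive is something to confirm in the user manual.

```shell
# Split SIZE bytes into at most PARTS ranges, each a multiple of PAGE bytes
# (except possibly the last), printed as "start-end" for use with -Y.
# split_ranges is a hypothetical helper, not part of bulk_extractor.
split_ranges() {
    size=$1; parts=$2; page=$3
    pages=$(( (size + page - 1) / page ))      # total pages, rounded up
    per=$(( (pages + parts - 1) / parts ))     # pages per range, rounded up
    start=0
    while [ "$start" -lt "$size" ]; do
        end=$(( start + per * page ))
        if [ "$end" -gt "$size" ]; then
            end=$size                          # clamp the final range to the data size
        fi
        echo "${start}-${end}"
        start=$end
    done
}

# Toy example: 100 bytes, 3 parts, 16-byte pages.
split_ranges 100 3 16
# 0-48
# 48-96
# 96-100
```

For a real image, the size could come from wc -c < image.raw, and each printed range would go to its own run, for example bulk_extractor -Y 0-48 -o out_part1 image.raw. I am assuming here that every run gets a separate output directory.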

Debugging Options

  • -V – prints the version.
  • -H – prints detailed info on the scanners.
  • -z <num> – starts on page <num>.
  • -d<num> – uses debug mode <num> (note the lack of space).
  • -Z – erases the existing output directory.

Scanner Controls

  • -P <dir> – specifies the plugin directory.
  • -S <option>=<value> – method for setting settable options (e.g. word_min=6 for minimum size of words to report).

 

And that is Bulk Extractor. It’s quick and quite useful. Hopefully you agree! As always, keep hacking.

 

References

http://digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf

https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor

http://www.getcreditcardnumbers.com/

http://forensicswiki.org/wiki/Bulk_extractor

http://wiki.bitcurator.net/index.php?title=Using_Bulk_Extractor_Viewer_to_Find_Potentially_Sensitive_Information_on_a_Disk_Image
