Google Dork: Finding the Information You Don’t Know Exists
Reconnaissance. It’s a technique not unknown to most teenagers, and if we’re honest, we’ve all done it ourselves too – Googling the person you just met at the bar, Facebook stalking the new person at work, we all know the drill. This is the age of social media and data breaches, so we all know there’s a ton of our information out there. And what’s true about our own personal information is true about an organization’s information as well.
Reconnaissance of an organization is, in fact, one of the key elements of a successful penetration test (and attack). During reconnaissance, the penetration tester (or attacker) gathers open source intelligence (OSINT), meaning information collected from publicly available sources, about the target. If you think teens on their phones are good at this, just imagine what a dedicated attacker might be able to find.
What You Can Find
For example, by searching LinkedIn, you can find positions of current employees, and sometimes even their email addresses. By searching the company’s website, you might locate names, contact information, current events/projects, etc. By collecting documents the company has published, you might discover metadata that reveals the kind of software in use or usernames of those who created the document.
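To show what that metadata can look like, here is a minimal Python sketch that reads it from a Word document. It assumes the file is in the modern .docx format, which is a ZIP archive whose `docProps/core.xml` part stores the author and last editor and whose `docProps/app.xml` part stores the authoring application. The function name `docx_metadata` is hypothetical. PDFs and older .doc files store metadata differently and would need other tools.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def docx_metadata(data: bytes) -> dict:
    """Return author, last editor and authoring application found in .docx bytes."""
    meta = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        # Core properties: who created and who last saved the document.
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for key, path in (("creator", "dc:creator"),
                              ("last_modified_by", "cp:lastModifiedBy")):
                el = core.find(path, NS)
                if el is not None and el.text:
                    meta[key] = el.text
        # Extended properties: which software produced the file.
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            el = app.find("ep:Application", NS)
            if el is not None and el.text:
                meta["application"] = el.text
    return meta
```

To check a downloaded file, pass its raw bytes, for example `docx_metadata(open("report.docx", "rb").read())`, where `report.docx` is a placeholder name. Usernames in `creator` and `last_modified_by` are exactly the kind of detail an attacker could feed into a phishing or password-guessing list.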
All of this information is extremely useful for harvesting targets for an attack, such as a phishing campaign or a password guessing attack. It can also provide insight into the type of systems in use, which can allow the attacker to better target the environment.
How to Find It
One of the best ways to conduct this reconnaissance is with Google dork searches. A Google dork is a search query that uses Google’s advanced search operators to surface information tucked away in the deep, forgotten corners of the indexed web. These searches find information that is publicly available, but in most cases, organizations do not intend the information to be public (or even realize that it’s there). Google dorking retrieves public-facing documents that your organization may have published at some point in the past; it also helps you discover what metadata may have been left in those documents, and what that data reveals.
Google dorking goes beyond the average, curiosity-driven search. But it’s important to note that this is not a complex technique either; you don’t need specialized tools. You just need to know how to search and have access to the internet. Even a novice hacker can Google dork effectively. Here are some of the search operators used to retrieve very specific types of information:
- site: when followed by a domain, this restricts results to pages and files on that website or domain.
- filetype: when followed by a file extension, this returns files of a specific type, such as DOC, PDF, XLS and INI.
- inurl: when followed by a particular string, this returns results with that sequence of characters in the URL.
- intext: when followed by a specific word or phrase, this returns pages and documents that contain that word or phrase in their text.
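These operators can be combined in a single query. The Python sketch below assembles them into a query string; `build_dork` is a hypothetical helper and `example.com` is a placeholder domain. It wraps multi-word values in double quotes, which is how Google matches an exact phrase.

```python
from typing import Optional


def build_dork(site: Optional[str] = None, filetype: Optional[str] = None,
               inurl: Optional[str] = None, intext: Optional[str] = None) -> str:
    """Combine the site:, filetype:, inurl: and intext: operators into one query."""

    def quote(value: str) -> str:
        # Quote multi-word values so the operator applies to the whole phrase.
        return f'"{value}"' if " " in value else value

    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{quote(inurl)}")
    if intext:
        parts.append(f"intext:{quote(intext)}")
    return " ".join(parts)
```

For example, `build_dork(site="example.com", filetype="pdf", intext="confidential")` returns `site:example.com filetype:pdf intext:confidential`, which asks Google for PDFs on that domain containing the word "confidential".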
Using these operators, attackers can pull up the documents Google has indexed for a domain and examine whatever metadata wasn’t scrubbed from them. They can search for documents marked “confidential,” which may reveal sensitive internal information. Google dorking is a free, relatively easy way to conduct reconnaissance.
How to Combat It
Reconnaissance is so effective because there’s virtually no way for organizations to prevent someone from conducting reconnaissance on them. You can’t very well stop someone from Googling your company, after all.
What you can do is perform reconnaissance on yourself. This way, you can see exactly what the attackers can see, and in many cases, you can find ways to limit your exposure.
What we recommend is devoting about 15 minutes, once a month, to using Google dorks to discover what may be hiding on the internet. If you find a PDF an employee published about your company picnic eight years ago, take it down. If you find a document detailing how to VPN into your network that somehow became public, definitely take it down. If you find all sorts of metadata in documents meant for your customers, take them down, scrub the metadata, and republish if necessary. If there’s no reason for documents to be available externally, say it with me: Take Them Down.
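One way to keep the monthly review consistent is to run the same set of queries every time. The Python sketch below, using only the standard library, generates Google search URLs for your own domain to open by hand in a browser; it does not fetch or parse any results. The file types, phrases, and function names are illustrative assumptions, so adjust the lists to match what your organization actually publishes.

```python
from urllib.parse import urlencode

# File types to check each month (illustrative; tailor to your organization).
FILETYPES = ["pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "ini"]
# Phrases that often mark documents never meant to be public (not exhaustive).
PHRASES = ["confidential", "internal use only", "vpn"]


def audit_queries(domain: str) -> list:
    """Build the list of self-reconnaissance queries for one domain."""
    queries = [f"site:{domain} filetype:{ft}" for ft in FILETYPES]
    for phrase in PHRASES:
        # Quote multi-word phrases so Google matches them exactly.
        term = f'"{phrase}"' if " " in phrase else phrase
        queries.append(f"site:{domain} intext:{term}")
    return queries


def search_urls(domain: str) -> list:
    """Turn each query into a Google search URL to open manually."""
    return ["https://www.google.com/search?" + urlencode({"q": q})
            for q in audit_queries(domain)]


if __name__ == "__main__":
    # Replace example.com with your organization's domain.
    for url in search_urls("example.com"):
        print(url)
```

Keeping the lists in the script means each month’s review covers the same ground, and adding a new file type or phrase after an incident is a one-line change.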
This process is especially pertinent for healthcare and professional services organizations that publish a lot of documents online. If you don’t have a defined, consistent process for reviewing and removing these documents, you will forget what’s out there.
There might be valid business reasons for listing names and email addresses on your company website. And there may be no way to tightly control what your users post to social media. But public documents and metadata are things you can, and should, control. Don’t make it any easier on attackers than you have to. A simple Google dorking review once a month is a great way to cut down on your information disclosure.