*All images in this post were found using publicly available sources and should be used for educational purposes only
One of the best things in the IT community is open source software: a company develops a piece of software and then makes the source code publicly available, allowing anyone to inspect and modify it. This has yielded projects such as Git, which has made developers’ lives easier with version control. However, when I hear the term ‘open source’, something very different from software comes to mind: an attack vector. Another form of open source is Open Source Intelligence (OSINT). OSINT is the practice of gathering data, or intelligence, from publicly available sources. These sources range from simple Google searches to public records pulled from the government. There is a wide variety of ways to obtain this data, and in this post I will cover the tools we use to pull back data in order to gather usernames for a potential phish, perform recon on a network’s footprint, or find potential targets. OSINT is also highly useful for physical assessments, but that will be covered in a later post. I won’t be going into a great deal of depth on how to use the tools, just a quick overview of what they do and what kind of information they can give us.
Dorks Can be Cool Too
Google is a great source of information, and it’s easy to use. But sometimes you will face the dreaded choice of whether or not to go to that second page of results to find the answer you are looking for. Luckily for us, there is something called Google dorks. Google dorks allow you to target your query toward more specific results. Say you are looking for a PDF file from a specific domain, we’ll say example.org; in the Google search bar simply type: filetype:pdf site:example.org. Now your search will only yield results that are PDF documents from example.org. Tailoring your search to what you need is a good way to get back valuable metadata or even login pages.
Here is a short list of Google Dorks and what they do:
| Dork | Description |
| --- | --- |
| `site:url` | Restricts results to the given site |
| `filetype:png` | Searches for the given file type, in this case a ‘png’ |
| `inurl:admin` | Matches URLs containing the given string, in this case ‘admin’ |
There are many more dorks available, but these are just a few that come in handy when performing recon and planning an attack. There is even a nifty Exploit-DB page, the Google Hacking Database, with some good dorks already made up.
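When you start chaining operators together, it can help to build the query programmatically. Here is a minimal sketch in Python; the operators are real Google search operators, but the target domain example.org is just a placeholder.

```python
# Build a Google dork query string from operator/value pairs, then turn
# it into a search URL. Only standard-library functions are used.
from urllib.parse import quote_plus

def build_dork(operators, keywords=""):
    """Join operator:value pairs and optional free-text keywords."""
    parts = [f"{op}:{val}" for op, val in operators]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)

query = build_dork([("site", "example.org"), ("filetype", "pdf")])
url = "https://www.google.com/search?q=" + quote_plus(query)
print(query)  # site:example.org filetype:pdf
print(url)
```

From here you could swap in `inurl:admin` or other operators from the table above without touching the URL-encoding logic.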
So if we search for a PDF and get all these awesome results, what can we do with them? The answer is metadata! PDFs are a treasure trove of metadata, and if there is one thing an attacker cannot get enough of, it’s information on their target. So with all these PDFs, we can download them, look at their properties in Adobe Reader, and see what juicy intel we can pull back.
Cool! We can see the user who created the document, in this case their full name, the application that created it, and even the OS version. We’re well on our way to forming a list of users for a phishing attack, but this process may take a while; there has to be a better way.
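For a taste of what’s under the hood, the same fields can be grepped straight out of a PDF’s Info dictionary. This is only a sketch: real-world PDFs often store metadata in compressed XMP streams, so a dedicated tool (exiftool, pypdf, or FOCA below) is far more reliable, and the sample bytes here are synthetic.

```python
# Minimal sketch: scan raw PDF bytes for plaintext Info-dictionary
# entries like /Author (Jane Doe). Will miss compressed/XMP metadata.
import re

def pdf_info_fields(data: bytes):
    """Return {key: value} for plaintext Info-dictionary entries."""
    fields = {}
    for key in (b"Author", b"Creator", b"Producer", b"Title"):
        m = re.search(rb"/" + key + rb"\s*\((.*?)\)", data)
        if m:
            fields[key.decode()] = m.group(1).decode("latin-1")
    return fields

# Synthetic fragment of a PDF trailer for demonstration:
sample = b"... /Author (Jane Doe) /Creator (Microsoft Word 2016) ..."
print(pdf_info_fields(sample))
```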
Retrieve All the Metadata!
One of the tools we use for retrieving a lot of metadata is FOCA. FOCA is a tool developed by the nice people over at Eleven Paths, and its main purpose is to retrieve metadata. It is fairly simple to use, too. When creating a project, just specify the domain you want to target, and from there you can begin pulling back metadata. You can search through all kinds of files, from DOC to PDF to XLS. It will even utilize different search engines like Google and Bing.
Once FOCA finds all the files, we can extract the metadata and analyze it. This will show us users, operating systems, software, and even full email addresses. All of the documents FOCA finds can be easily downloaded and further examined if need be. This is a great way to obtain a good number of usernames and email addresses quickly to formulate a phishing attack. With full names we can branch out to other sources of information such as LinkedIn and Facebook. People have a nasty habit of posting sensitive data about their work on social media, and once we have names we can begin digging through it to see if we find anything worth pursuing. On a recent engagement I found the company’s Facebook page, and after going through several pictures I found a few banners with employees’ names on them. We used this information and combined it with a confirmed email schema to formulate a successful phishing attack.
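Once you have full names and a confirmed schema, turning them into candidate addresses is mechanical. The sketch below is illustrative: the names, domain, and schema formats are made up, and real engagements should verify each guess before sending anything.

```python
# Generate candidate email addresses from full names and a schema
# template. Schema placeholders: {first}, {last}, {f}, {l}.
def candidate_email(full_name, domain, schema="{first}.{last}"):
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]  # ignore middle names/initials
    local = schema.format(first=first, last=last, f=first[0], l=last[0])
    return f"{local}@{domain}"

# Hypothetical names harvested from metadata or social media:
for name in ["Jane Doe", "John Q Smith"]:
    print(candidate_email(name, "example.org", schema="{f}{last}"))
```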
iRobots
If there is one thing we’ve learned so far, it’s that Google is a powerful tool for the average pentester, and that we can never have too much intel. Well, for web application assessments we can take it one step further and find directories that do not show up through normal queries. There is something known as the Robots Exclusion Protocol. Web developers, wanting to hide certain pages and directories from normal web traffic, will list the ones they want hidden in a robots.txt file and put it at the root of the site’s hierarchy. So simply putting /robots.txt at the end of a URL can yield some potentially juicy information.
Bingo! This site is disallowing /admin/, meaning if we put that on the end of the URL as the directory instead of robots.txt, we should be greeted with an administrator login page! If you have an IP address for the target URL, you can use the robots.txt auxiliary module in Metasploit.
If It’s Not Broke…
The oldies are sometimes the best; they’ve been around for a while, and they still give us some good information. Tools like Whois, Nslookup, and Nmap are still very good for conducting recon on a target. With a known domain name or IP address, whois will return a wealth of information that’s perfect for an attacker.
Here we can see this whois query returned the address of the company that owns the domain, two contacts in their IT department, and their IP address. With this we could attempt spear phishing on their IT admins, vishing with the contact numbers, or use Nmap to begin port scanning their domain. Using Nmap we can scan the whole range of IP addresses under the one we got from the whois query. This will reveal open ports and services, and can even reveal the OS the systems are running. Once we’ve mapped the network we can begin targeting specific systems and begin the three-step process of 1. Compromise, 2. Escalate, 3. Profit.
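Whois output is plain text, so the interesting fields can be scraped with a few regexes. Field labels vary by registrar, so the labels and the sample record below are illustrative; in practice you would run `whois example.org` and feed its stdout into a parser like this.

```python
# Pull labeled contact fields out of raw whois output. The labels and
# sample record are assumptions -- real registrars differ.
import re

def whois_fields(text, labels=("Registrant Organization",
                               "Tech Email", "Admin Phone")):
    found = {}
    for label in labels:
        m = re.search(rf"^{re.escape(label)}:\s*(.+)$", text, re.MULTILINE)
        if m:
            found[label] = m.group(1).strip()
    return found

sample = """Registrant Organization: Example Corp
Tech Email: it-admin@example.org
Admin Phone: +1.5555550123
"""
print(whois_fields(sample))
```

The extracted email and phone feed directly into the spear-phishing and vishing options mentioned above.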
Conclusion
These are by no means the only OSINT tools out there, just a select few we use on our typical engagements. There are some OSINT tools that function as browser plugins. While conducting some research I came across some pretty cool Firefox plugins and Chrome extensions that will passively gather information on the website you are currently visiting. On Firefox, check out Passive Recon, which has all sorts of recon tools built into it, like whois and DNS mappings. For Chrome there is an extension called Email Extractor, which will pull out any email addresses found on the particular page you are on so you don’t have to dig through the page; however, it will not pull back emails in the metadata of documents.

IMINT is another form of OSINT that was covered in a previous blog in relation to a physical pentest; I suggest checking it out for a more in-depth look. It is important for a pentester to spend time on recon and intel gathering; when given enough time, formulating a plan of attack based on intel will usually yield a smoother engagement. Attempting to ‘Leeroy Jenkins‘ an attack could very well have the same result as getting mauled by dragons and orcs… OK, maybe not that bad, but it could bring your assessment to a halt and lead to frustration. Take your time and use the tools available to you; easily accessed data can sometimes be the most valuable.
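As a parting sketch, the core of an email-extractor extension boils down to a regex sweep over the page source. The pattern below is a simplified approximation, not a full RFC 5322 matcher, and the page snippet is made up.

```python
# Find unique email addresses in a blob of HTML/text with a simple
# (deliberately approximate) regex.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(html: str):
    return sorted(set(EMAIL_RE.findall(html)))

page = '<a href="mailto:press@example.org">Press</a> or hr@example.org'
print(extract_emails(page))  # ['hr@example.org', 'press@example.org']
```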
Happy hunting!