During RSM’s 2016 Capture the Flag (CTF) event, the Web Application Security category took the format of a full-blown web application penetration test. Participants could complete the 100-point challenge simply by exploring and mapping out the web application. By the time participants reached the 500-point level, they had performed password guessing and SQL injection, bypassed file upload restrictions, and executed OS commands on the compromised server.
Though some of these exploits (such as SQL injection and bypassing file upload restrictions) benefited from technical knowledge and expertise, others (such as the application mapping or password guessing) required only basic knowledge and common sense. As with any CTF, the most important attributes any participant can bring to the table are a willingness to research and the patience to keep working at something until they figure it out.
A thirst for knowledge and an inquisitive nature are fundamentally more important to the field of information security than straightforward technical know-how, which is why RSM deliberately constructs challenges to encourage and educate participants who may have less experience in the field. For example, entire categories (such as Cryptography and Physical) require no previous hacking knowledge or computer programming skills, while other categories, such as Web Apps, work to bridge that gap.
Take, for example, the previously mentioned 100-point web challenge from the 2016 CTF. The series of web challenges stepped participants through a web application penetration test for a (hypothetical) local widget manufacturer whose website had just gone live. The level 100 scenario, entitled “These are not the droids you are looking for,” read:
Having heard from industry insiders that the Internet is the “next big thing,” Akron Widgets, Inc. has just launched a new website.
The company has wisely decided to retain your assistance in identifying any vulnerabilities on their website. Keep in mind that at this stage portions of the site are still in development, and the Akron Widgets amateur development team doesn’t want everyone to see them.
The Key for this challenge can be found on one of those “hidden” pages.
To web developers, the solution may be straightforward. For others, the challenge should set them on the road to some basic research. As so often seems to be the case, Google has a funny way of knowing what I’m really asking: when I typed “prevent website from being seen” into the search bar, four of the top five results mentioned “indexing” or “crawling,” and those four also mentioned the “robots.txt” file. The robots.txt file is a simple text file manifestation of the “Robots Exclusion Protocol,” which websites use to tell web crawlers which portions of a site not to index. At a high level, web crawlers are automated scanners that traverse the Internet, and indexing is the process of collecting website content and metadata to reference for search results or some other purpose.
When organizations or site administrators don’t want portions of a site to be indexed (or want to control which crawlers index which portions), they can put those requests in a robots.txt file located in the root web directory. In the case of our widget company, the robots.txt file looked like this:
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /documents/
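As an aside that goes beyond the original challenge, it’s worth seeing how a well-behaved crawler actually consumes these rules. Python’s standard library ships a Robots Exclusion Protocol parser, and the short sketch below (with a few made-up paths for illustration) feeds it the Akron Widgets rules and asks what a polite crawler may fetch:

from urllib import robotparser

# The Akron Widgets rules from above, parsed directly as text
# so the sketch runs without a live server.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /documents/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A polite crawler checks each URL against the rules before requesting it.
for path in ("/", "/products.html", "/admin/", "/documents/plans.pdf"):
    verdict = "fetch" if rp.can_fetch("*", path) else "skip (disallowed)"
    print(path, "->", verdict)

Running it shows /admin/ and /documents/plans.pdf coming back as disallowed, which is exactly the behavior the protocol trusts crawlers to volunteer.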
Nothing super tantalizing in and of itself, but it alerted participants to directories on the site that they might not have found otherwise (for example, directories that weren’t linked from other portions of the site). Within one of those directories was the key to the 100-point challenge, as well as clues that set participants on the path to the 200-point answer and the eventual compromise of the site.
Checking for the robots.txt file is a basic, preliminary step a penetration tester may take as he or she spiders a target site. Knowing the entire attack surface is essential to a comprehensive test, and robots.txt can help map it.
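To make that step concrete, here is a minimal sketch (mine, not part of the CTF) that pulls a target’s robots.txt and collects its Disallow entries as candidate paths to feed back into the spider. The host name is hypothetical, and of course this belongs only in an authorized engagement:

from urllib.request import urlopen

# Hypothetical target; substitute a host you are authorized to test.
target = "http://akronwidgets.example"

with urlopen(target + "/robots.txt", timeout=10) as resp:
    lines = resp.read().decode("utf-8", errors="replace").splitlines()

# Each Disallow entry is a path the site would rather not have indexed,
# which makes every one of them a candidate for the spidering queue.
candidates = [
    line.split(":", 1)[1].strip()
    for line in lines
    if line.lower().startswith("disallow:")
]

for path in candidates:
    print("worth a look:", target + path)

Against the Akron Widgets file above, that prints the /admin/, /login/, and /documents/ directories, exactly the breadcrumbs the 100-point challenge wanted participants to follow.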
Of course, crawlers don’t have to honor robots.txt; compliance is purely voluntary. When we think of web crawlers, we tend to think of those indexing for search engines such as Google, but there are others out there looking to scrape email addresses for spammers or identify vulnerabilities for malicious purposes. Those crawlers obviously aren’t going to honor robots.txt, so administrators need to understand the voluntary nature of the Robots Exclusion Protocol. In the 2016 CTF scenario, the relative inexperience of the hypothetical widget company explained its use of robots.txt as a means of hiding portions of the site that were under construction, an attempt at “security through obscurity.” Though robots.txt shouldn’t be used as a security measure, it sometimes still is.
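If the point needs driving home, nothing stops a client from requesting a disallowed path directly; the file is a request, not an access control. A tiny illustration against the same hypothetical host (assuming the page exists):

from urllib.request import urlopen

# robots.txt asked crawlers to skip /admin/, but the web server itself
# enforces nothing; the request succeeds like any other HTTP request.
print(urlopen("http://akronwidgets.example/admin/", timeout=10).status)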
For a good example of robots.txt use, check out Wikipedia’s. Not surprisingly, Wikipedia is a frequent target of crawlers and scrapers because of all the content it contains. The site’s very detailed, well-commented robots.txt shows how the file can be put to good use.
Another excellent example is tindeck.com’s robots.txt, which features a giant ASCII-art drawing of Bender from Futurama, rendered entirely in comment lines and complete with the annotation “-- KILL ALL HUMANS.”
If robots.txt isn’t effective as a security measure, at the very least it’s an excellent opportunity to reference Bender or some other beloved pop-culture robot.
So all it really took to get started on the web application security category was a little Googling. If you learned something from this post, then just think of how much you could learn by participating in RSM’s CTF. Conversely, if you already knew everything, then why not compete? The CTF has something for everyone. Robots welcome.