Exploring the File System of a Webserver
Have you ever wondered how to map out the files and directories of a webserver that you don’t own? In this article, we will explore different methods to discover files and directories on a remote host.
Using Web Archiving Tools
Web archiving tools, also known as ‘spiders’, start with an HTML document and then follow every link they find in it. By doing so, they can find any file on the webserver that is linked from somewhere on the same domain. However, it’s important to note that the rel="nofollow" attribute on a hyperlink is only a request for spiders not to follow the link, and not all spiders respect it.
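The link-following step can be sketched with Python's standard library. This is a minimal illustration, not how any particular archiving tool works: the names LinkExtractor and same_domain_links and the example.com URLs are made up for this example, and the sketch handles a single page that has already been downloaded rather than crawling recursively.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML document."""

    def __init__(self, base_url, respect_nofollow=True):
        super().__init__()
        self.base_url = base_url
        self.respect_nofollow = respect_nofollow
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if self.respect_nofollow and "nofollow" in rel_tokens:
            return  # a polite spider skips this link; a rude one would not
        # Resolve relative links against the URL of the page they came from.
        self.links.append(urljoin(self.base_url, href))


def same_domain_links(html, base_url, respect_nofollow=True):
    """Return the absolute links in html that stay on base_url's host."""
    parser = LinkExtractor(base_url, respect_nofollow)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return [url for url in parser.links if urlparse(url).netloc == host]


if __name__ == "__main__":
    sample = (
        '<a href="/docs/">Docs</a>'
        '<a rel="nofollow" href="/private/">Private</a>'
        '<a href="https://other.example/page">Elsewhere</a>'
    )
    print(same_domain_links(sample, "https://example.com/index.html"))
```

A full spider would fetch each returned URL, run the same extraction on it, and keep a set of visited URLs to avoid loops. Passing respect_nofollow=False shows what a spider that ignores the attribute would see.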
Search Engines and Robots.txt
Search engines do not limit themselves to a single domain; they also follow links that lead to other domains. This means they can sometimes discover files that are not linked from anywhere on their own domain but are linked from other domains. However, you can ask search engines not to crawl directories you don’t want to appear in search results by using a robots.txt file. Although this is just a polite request, most search engines will respect it.
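To see how a well-behaved crawler might apply such a request, here is a small sketch using Python's urllib.robotparser module. The robots.txt content, the helper name build_policy, the user agent ExampleBot and the example.com URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site owner might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a policy object that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    policy = build_policy(ROBOTS_TXT)
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    ):
        allowed = policy.can_fetch("ExampleBot", url)
        print(url, "allowed" if allowed else "disallowed")
```

The check happens entirely on the crawler's side: nothing on the server enforces it, which is why the file is only a request.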
Configurations and Guessing Filenames
Some webservers are configured to generate a listing of all files and directories when a directory is requested. However, this is becoming less common because it is usually not the desired behavior; most webservers now return a 403 or 404 error instead of a directory listing. When a file is not explicitly linked anywhere and the webserver doesn’t provide a directory listing, the only option is to guess filenames. Penetration testing tools can automatically try filenames that might be interesting to an attacker, but brute-forcing every possible filename is too slow to be practical against a live server.
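The guessing approach can be sketched as a wordlist combined with a few extensions, followed by a rough reading of the status codes that come back. This is an illustrative outline only and does not reproduce any specific penetration testing tool: candidate_urls, classify and probe are hypothetical names, the wordlist is made up, and the function that performs the HTTP request is passed in as a parameter so the logic can be tried against canned responses instead of a live host.

```python
from itertools import product
from urllib.parse import urljoin


def candidate_urls(base_url, names, extensions):
    """Yield one URL per (name, extension) pair from a small wordlist."""
    for name, extension in product(names, extensions):
        yield urljoin(base_url, name + extension)


def classify(status):
    """Rough interpretation of an HTTP status code (a heuristic, not a rule)."""
    if 200 <= status < 300:
        return "found"
    if status in (401, 403):
        return "access denied (may exist)"
    if status == 404:
        return "missing"
    return "other"


def probe(urls, fetch_status):
    """Map each URL to a classification.

    fetch_status is any callable that takes a URL and returns the HTTP
    status code; a real tool would issue a request here.
    """
    return {url: classify(fetch_status(url)) for url in urls}


if __name__ == "__main__":
    # Canned responses standing in for a server, so nothing is sent anywhere.
    canned = {
        "https://example.com/backup.zip": 200,
        "https://example.com/admin": 403,
    }
    urls = candidate_urls("https://example.com/", ["backup", "admin"], ["", ".zip"])
    for url, verdict in probe(urls, lambda u: canned.get(u, 404)).items():
        print(url, "->", verdict)
```

The number of candidates is the wordlist size multiplied by the number of extensions, which is why such tools rely on short lists of likely names rather than every possible string.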
By understanding these methods, you can gain insight into the file system and directory structure of a webserver that you don’t own. However, it’s important to remember to always respect the privacy and security of others and only access files and directories that are intended to be publicly available.