Keeping a Secret
In the early days of the Internet, it was possible to publish pages on the web that would be seen only by the people they were intended for, simply by putting them in a web directory that wasn't public knowledge. The assumption that no one will find the "hidden" pages of a web site no longer holds true; a few simple techniques can betray the existence of these pages or data files, and you should take care, now more than ever, to protect any information that could be regarded as private or confidential.
Roaming Robots
The quest to find new information on the web is a huge, ongoing task that is highly automated; it would be impossible to expect anyone to look through and assess the billions of pages out there based on their content. Programs that automate the task of data gathering are known as bots or spiders and are becoming more commonplace. Bots are useful for identifying broken links, finding new content or checking for updates. Some bots are more sinister than others and may probe a server for known weaknesses, but most, Googlebot for example, are designed to make surfing the web much easier.
Most bots used for legitimate purposes will read and respect any instructions (as defined by the Robots Exclusion Standard) that they find with regard to your site content. Usually, they will look for a single file in the root directory (robots.txt) for the general list of where they can and can't go on a server, and they will also adhere to any instructions given in each file as they read it.
Having said that, the Robots Exclusion Standard offers no protection from bots with malicious intentions, as they will simply ignore it. You should also consider the implications of listing your most sensitive files and folders in the robots.txt file, since it is easily found and readable by anyone and can act as a signpost pointing to data you'd rather people did not see.
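To make both points concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the paths and the bot name "ExampleBot" are invented for illustration. It shows how a well-behaved bot decides whether it may fetch a URL, and how anyone can list the very paths the file was meant to keep quiet.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, agent="ExampleBot"):
    """What a well-behaved bot asks itself before requesting a URL."""
    return parser.can_fetch(agent, url)


def advertised_paths(robots_text):
    """What a curious visitor sees: every path the site owner asked bots to skip."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_text.splitlines()
        if line.lower().startswith("disallow:")
    ]


print(may_fetch("http://example.com/index.html"))
print(may_fetch("http://example.com/private/report.html"))
print(advertised_paths(ROBOTS_TXT))
```

Nothing in this check is enforced by the server; a rogue bot simply never calls anything like may_fetch, and the list returned by advertised_paths is available to it as well.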
If you want to keep the well-behaved robots out of "hidden" areas of your site, then it would be a good idea to add the instructions to the files you wish to protect rather than to robots.txt. This will make it more difficult for anyone simply browsing your site to find them through your robots.txt file, but will do nothing to stop rogue robots from looking through your directories and stealing your files. If you want to make sure you keep all unauthorized bots and people from accessing certain areas of your site, then the safest way, short of making them inaccessible from the web, is to password protect the directories.
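Per-file instructions are commonly written as a robots meta tag in the page's head section; that this is the mechanism meant here is an assumption. The sketch below uses Python's standard html.parser module to show how a polite bot might read such a tag. The sample page, the RobotsMetaParser class and the robots_directives helper are hypothetical names used only for this example.

```python
from html.parser import HTMLParser

# Hypothetical page carrying its own instructions for robots.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Members only</title>
</head><body>...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared inside a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives(PAGE))
```

A bot has to download the page before it can see the tag, so this only keeps the page out of a polite bot's index; it does not stop the page from being read.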
Password Protection
There are various ways to password protect a directory or document, each with its own strengths and weaknesses. The type of password protection you employ will depend on the nature of the information you want to protect.
- Client Side JavaScript
- Probably the easiest to find a workaround for, since it relies on the visitor's browser running JavaScript to enforce the protection. The simplest scripts will write the contents to the browser if the correct password is used, but without encryption of some sort, the contents, or the location of the page the script would redirect to, can be read in the page source. Offers no protection against bots.
- Server Side Scripting
- When server password protection is unavailable, it is possible to protect scripts and directories with a bit of well-thought-out server-side scripting. ASP, PHP and Perl can all pull data from directories that are not web accessible, so it is possible to make a gateway script that uses several layers of authentication.
- Server Authentication
- The most watertight solution, since the server will challenge every attempt to read data from a protected directory and refuse any unauthorised requests.
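Both server-based options come down to checking credentials before any data is sent. As a rough sketch, assuming HTTP Basic authentication and a hypothetical in-memory user table, the check might look like the following. USERS, check_basic_auth, respond and the realm name are invented for this example, and a real server or framework would normally do this work for you.

```python
import base64
import hashlib
import hmac

# Hypothetical user table: username -> SHA-256 digest of the password.
# Simplified for the example; a real store should use a slow, salted hash.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}


def check_basic_auth(header):
    """Return True if an Authorization header carries valid Basic credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:  # malformed base64 or invalid UTF-8
        return False
    username, separator, password = decoded.partition(":")
    if not separator:
        return False
    expected = USERS.get(username)
    if expected is None:
        return False
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied, expected)


def respond(header):
    """Return (status, headers): 200 for valid credentials, otherwise a 401 challenge."""
    if check_basic_auth(header):
        return 200, {}
    return 401, {"WWW-Authenticate": 'Basic realm="Members"'}


good = "Basic " + base64.b64encode(b"alice:correct horse").decode("ascii")
print(respond(good)[0])
print(respond(None)[0])
```

Basic authentication sends the password in an easily decoded form, so it is only sensible over an encrypted connection.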
Password protection is better than simply hiding your files since it creates more work for the person trying to get access to information that they know is there. However, there are a few issues that can crop up, particularly if visitors pay to access the information that you protect.
Password sharing is perhaps more common than you might realize, and there is little to stop users with legitimate access from letting other users in through the back door. Using scripts that create single-use passwords or limit simultaneous accesses can help reduce the chances that a password will be abused, but sometimes such techniques can be more trouble to implement than they are worth.
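A single-use password scheme can be sketched in a few lines. The example below is a minimal in-memory version, assuming one server process and no persistence; the SingleUsePasswords class and its method names are hypothetical.

```python
import secrets


class SingleUsePasswords:
    """Issues random passwords that stop working after the first successful use."""

    def __init__(self):
        self._unused = set()

    def issue(self):
        """Create a new password and remember that it has not been used yet."""
        password = secrets.token_urlsafe(16)
        self._unused.add(password)
        return password

    def redeem(self, password):
        """Accept the password once; any later attempt with it is rejected."""
        if password in self._unused:
            self._unused.remove(password)
            return True
        return False


store = SingleUsePasswords()
ticket = store.issue()
print(store.redeem(ticket))  # first use
print(store.redeem(ticket))  # second use of the same password
```

A shared password of this kind is only good for whoever uses it first, which is what makes sharing unattractive. The cost is that every visit needs a fresh password to be issued and delivered somehow.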
Using "Offline" Directories
Another way you can protect data from prying eyes is to keep it offline by putting it in a directory that is not directly accessible via the web. Scripts can be written to access data from directories created outside the web root; because the data is not in a directory that has been mapped to a domain or subdomain, the information stored in it cannot be read directly by bots or browsers.
Here is an example of a typical Windows setup. When I FTP to my web site I see four folders: db, logs, special and www. Anything I put in the www directory is accessible to anyone who knows the URL because that folder has been mapped to my domain. The other three folders store my databases, site logs and mail settings, but as they are not mapped to my domain I have to FTP into my account to see them.
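A gateway script for such a layout might look like the sketch below. It assumes the caller has already decided whether the visitor is authorized, and it refuses any file name that would resolve to somewhere outside the private directory. The function name read_private_file and its parameters are hypothetical.

```python
from pathlib import Path


def read_private_file(private_dir, requested_name, authorized):
    """Return the bytes of a file kept outside the web root, if access is allowed."""
    if not authorized:
        raise PermissionError("visitor is not authorized")
    base = Path(private_dir).resolve()
    target = (base / requested_name).resolve()
    # Reject names such as "../www/index.html" that climb out of the directory.
    if base not in target.parents:
        raise PermissionError("requested path is outside the private directory")
    return target.read_bytes()
```

A script living in the www folder could then call something like read_private_file("../special", "report.txt", authorized=user_is_logged_in), where the folder name follows the layout described above and the file name and user_is_logged_in are placeholders.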
Encrypting Data Sources
There may be a time when a determined individual gets access to your otherwise inaccessible directories; if this ever happens, then there is nothing to stop them downloading your databases and reading the real contents of your scripts. If those files contain sensitive information such as email addresses or credit card numbers then the last line of defense is to encrypt the data.
Logic suggests that you would want to encrypt in a way that can easily be decoded so that you can retrieve values from the database, but in most cases this is not a good idea and is usually unnecessary. Say, for example, we identified a person by asking for their credit card number; the encrypted form of the number is stored in a database, and we would simply encrypt the input in the same way and check that it matches the stored value. We do not need the actual number from the database since an authorized person would already know what it is and would not benefit from seeing the number on the screen.
One-way encryption of selected data such as this can provide that extra layer of security for details that even you as the site owner do not need to know. Passwords are often protected in this way so that even if someone got hold of the database they would be unable to use most of the information to do much harm to an individual's account or the server, assuming, that is, that the problem has been recognised, identified and addressed.
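One-way protection of this kind is usually implemented as a salted hash rather than as encryption in the strict sense. The sketch below uses PBKDF2 from Python's standard hashlib module as one possible choice; the iteration count and the names hash_secret and verify_secret are assumptions made for this example, not a recommendation tuned to any particular server.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower for attackers and for you


def hash_secret(secret, salt=None):
    """Return (salt, digest); the original secret cannot be read back from either."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each stored value
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_secret(candidate, salt, stored_digest):
    """Hash the visitor's input the same way and compare it with the stored digest."""
    _, candidate_digest = hash_secret(candidate, salt)
    return hmac.compare_digest(candidate_digest, stored_digest)


# Store only the salt and digest in the database, never the secret itself.
salt, stored = hash_secret("hunter2")
print(verify_secret("hunter2", salt, stored))
print(verify_secret("guess", salt, stored))
```

The salt means that two people with the same password get different stored values, so a stolen database cannot be searched with a single precomputed table.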
Something to Hide?
The more you try to hide the existence of anything, the more desirable and intriguing it becomes; curiosity can drive people to do desperate things to learn exactly what it is that they shouldn't be seeing. Perhaps then you should expect some curious people or bots to test your site to see what they might find of interest.
Of course, the best way of protecting your most sensitive data is to store it on a computer that is not connected to the Internet or any other network; however, this makes sharing and using the information more difficult. Uploading to a server is the easiest way to distribute such data while allowing a certain degree of interactivity when necessary, but it is not without its risks.
Remember, you can no longer simply upload the files and hope that no one else finds them; given time they will be found, and there are better ways of ensuring that confidential information stays private!
