Find Unused HTML Files on Website

Ryan Dube

One of the more difficult aspects of building and maintaining a website is finding unused HTML files. The difficulty comes from how easily a web designer can remove a link from a page without realizing that the linked HTML file has just become an "orphan."

How to Find Unused HTML Files on Website

Many HTML reference manuals refer to the problem of "orphan" web pages. An orphan web page is an HTML file that exists in your public web folder but that no other page links to. In other words, unless a visitor to your website knows the name of the HTML file, they have no way of navigating to that page from your main "index.html" file or from any page that it links to. As a website grows and web designers edit content by adding and removing links, it's very common for such HTML files to remain on the web server and be forgotten.

The problems with unused HTML files on your website include:

  • Large HTML files or a great number of such files consume valuable website space.
  • The public can still access the content in that HTML file, even if the web designer removed the link.
  • Too many unused files on a web server clutter the directories, and make maintenance and updates more difficult.

Cleaning Up Unused Files Manually

There are a number of methods an experienced web designer can use to find and delete unused HTML files. To narrow down which HTML files might be orphaned, the designer can list all files in the public web directory sorted by "date last accessed" or "date last modified." The oldest files in that listing are the most likely candidates, although an old date alone does not prove that a file is unused. The webmaster would then need to systematically move or rename each candidate file and check every web page to make sure all links still work correctly. While this procedure is time-consuming, it isn't too difficult to accomplish if you only have 10 or 20 HTML files. However, once a website grows to hundreds of HTML pages, the task becomes impractical to do manually.
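The listing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete tool: the function name html_files_by_age is hypothetical, and it assumes you have a local copy of the public web directory and that "HTML file" means any file ending in .html or .htm.

```python
from datetime import datetime
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}


def html_files_by_age(web_root):
    """Return (path, last-modified time) pairs for HTML files, oldest first."""
    root = Path(web_root)
    # Walk the whole directory tree and keep only HTML files.
    files = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    ]
    # Sort by modification time so the oldest files come first.
    files.sort(key=lambda p: p.stat().st_mtime)
    return [(p, datetime.fromtimestamp(p.stat().st_mtime)) for p in files]


# Example use (the folder name "public_html" is only an assumption):
# for path, modified in html_files_by_age("public_html"):
#     print(f"{modified:%Y-%m-%d}  {path}")
```

The files at the top of the output are only candidates. Each one still has to be checked against the rest of the site before it is moved or renamed.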

Software Cleanup is Much More Efficient

Since the process of locating unused HTML files only requires searching through the HTML of existing web pages for a link to each file, web programmers were quick to develop software to perform the task. Software can sift through thousands of files in less than a minute and determine whether or not particular HTML files are orphans. Many web hosts offer utilities as part of a web hosting package that will perform this routine maintenance check. If your web host doesn't have such a utility, you can download software that will find those files for you. Some software packages will even delete the orphan files automatically if you configure them to do so.
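To make the idea concrete, here is a rough Python sketch of the kind of search such software performs. It is not the method used by any particular product. It assumes a local copy of the site with "index.html" as the entry point, follows only static href and src attributes, and the names LinkCollector, local_target, and find_orphans are hypothetical. Links created by scripts or by server-side code are not detected, so treat the result as a list of candidates.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse

HTML_SUFFIXES = {".html", ".htm"}


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in one HTML document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def local_target(page, target, root):
    """Resolve a link to a file under root, or return None if it is not local."""
    target, _fragment = urldefrag(target)
    parsed = urlparse(target)
    # Skip external links, mailto: links, and pure "#fragment" links.
    if parsed.scheme or parsed.netloc or not parsed.path:
        return None
    path = unquote(parsed.path)
    # A leading slash is relative to the web root, anything else to the page.
    base = root if path.startswith("/") else page.parent
    resolved = (base / path.lstrip("/")).resolve()
    if resolved.is_dir():
        resolved = resolved / "index.html"
    # Ignore links that leave the web root or point at missing files.
    if root not in resolved.parents or not resolved.is_file():
        return None
    return resolved


def find_orphans(web_root, start="index.html"):
    """Return HTML files under web_root that cannot be reached from start."""
    root = Path(web_root).resolve()
    start_page = root / start
    seen = {start_page}
    queue = [start_page]
    while queue:
        page = queue.pop()
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for target in collector.targets:
            resolved = local_target(page, target, root)
            if resolved is not None and resolved not in seen:
                seen.add(resolved)
                # Only HTML pages can contain further links to follow.
                if resolved.suffix.lower() in HTML_SUFFIXES:
                    queue.append(resolved)
    all_html = {
        p.resolve() for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    }
    return sorted(all_html - seen)
```

Starting from the entry page instead of scanning every file for links matters: an orphan page can still link to other pages, and those links should not count as evidence that its targets are in use.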

The following software products are useful tools for webmasters who want to keep their web space organized and well maintained.

  • Inspyder OrFind will "crawl your site" and identify HTML pages and images that are no longer used. The software can be configured to automatically delete those files, and it will even email you a report if any files are modified. This can provide a webmaster with peace of mind, knowing that if their webspace gets hacked, they'll be alerted. There is a free trial available. Afterward, you need to pay for the software to continue using it.
  • HTML Link Validator, released in 2007 by Lithops Software, will check your website for any broken links, regardless of the number of pages your site has. This software is another kind of "web spider" that sifts through your web pages and identifies orphan website files. The software is shareware, so you need to register and pay to use the full version.
  • Linx Explorer from NOVOSIB software will verify all links on your website to check for broken links, and it will also detect any unlinked files in your web directory and provide you with a report.
  • SiteCleaner is Mac software created by Sideburn Studios that will take all of the files and code for your web project and "clean it" of all orphaned HTML or media files. Removed files are placed in a "removed content" folder.
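Moving suspected orphans into a holding folder, rather than deleting them outright, is easy to reproduce in a script of your own. The Python sketch below illustrates that general idea only and is not how any of the products above are implemented. The function name quarantine and its parameters are hypothetical, and it assumes the holding folder sits outside the public web directory so the moved files are no longer served.

```python
import shutil
from pathlib import Path


def quarantine(files, web_root, holding_dir):
    """Move each file out of web_root into holding_dir, keeping its relative path."""
    root = Path(web_root).resolve()
    holding = Path(holding_dir).resolve()
    moved = []
    for file in files:
        source = Path(file).resolve()
        # relative_to raises ValueError if the file is not inside web_root,
        # which guards against moving files from anywhere else.
        destination = holding / source.relative_to(root)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved
```

Because the relative path is preserved, a file that turns out to be needed can be restored by moving it back to the same location under the web folder.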

Checking For Orphan Files with Dreamweaver

The well-known Dreamweaver web development software also has a built-in feature that will verify the links within the website that you've created, and it will identify any unused files in your web folder. This feature is built into Dreamweaver's "Check Links" utility in the Site menu. After the software checks links, it also provides you with a report of all unused files, which you can either delete or re-link.

Final Words

At first glance, cleaning up your website of orphan HTML and media files can seem daunting. When your website has thousands of pages, the task is impractical to do manually. But with the tools that are available, either as standalone software downloads or as part of existing web design software, cleaning your website of broken links and unused files can be as simple as a few mouse clicks.
