This is for those of you who enjoy high-level problem solving and engineering. If you’d like, feel free to play with this and send me your answers at firstname.lastname@example.org. (Just to warn you: if your answers are interesting, expect me to reach out to you about joining our Core Development team.)
It would be a great benefit if we could categorize each site when we visit it, so that we know what kind of information we should be looking for. For example, a typical company site will have a contact page, a management page, press releases, and a description of the company. If our goal were to find information about people and companies, it could be argued that we’d be smart to visit every page of these sites. Other websites exist to sell you something. These are shopping sites or online marketplaces, like Amazon.com, which contain products and prices and should be pruned so that we visit only the sections that are interesting to us. Still other sites provide information about a particular topic (like www.cancer.org), and a different approach might have to be taken to find the kinds of pages we’re looking for.
The purpose of this project is to identify the types of sites we want to be able to recognize. Once we have a set of categories, we can work on developing the criteria that put a site into one of them.
So, take the sites on the spreadsheet and go through as many as you can to determine what you think the site categories ought to be. You only need to go through as many sites as it takes to come up with 5 to 10 categories. Next, think about which characteristics of the sites identify them as belonging to their specific category. And then, please explain what your methodology or approach would be if you were told to write software to automatically assign a type to other websites. Just a few paragraphs of your ideas, with some examples, would be fine. We’re not looking for a full, working prototype or anything. Good luck!
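To make the idea of "criteria that put a site into a category" a bit more concrete, here is one possible starting point, offered purely as a sketch and not as the expected answer. It assumes the text of a site's pages has already been fetched, and it scores each category by counting cue words. The category names, the cue words, and the function names (`tokenize`, `score_site`, `classify_site`) are all made up for illustration; a real approach would likely also look at URL structure, link patterns, and page layout.

```python
import re
from collections import Counter

# Hypothetical categories and cue words, invented for illustration only.
# A real set would come out of reviewing the sites on the spreadsheet.
CATEGORY_CUES = {
    "company": {"contact", "management", "press", "about", "careers"},
    "shopping": {"cart", "price", "checkout", "buy", "shipping"},
    "informational": {"learn", "research", "guide", "resources", "faq"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_site(page_texts):
    """Count how often each category's cue words appear across all pages."""
    counts = Counter()
    for text in page_texts:
        counts.update(tokenize(text))
    return {
        category: sum(counts[word] for word in cues)
        for category, cues in CATEGORY_CUES.items()
    }


def classify_site(page_texts):
    """Return the highest-scoring category, or "unknown" if no cue word matched."""
    scores = score_site(page_texts)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    pages = ["Contact us", "Management team", "Press releases"]
    print(classify_site(pages))
```

Counting words like this is deliberately crude. Its main use is to show where the hand-picked characteristics would plug in, and it could be swapped for a trained text classifier once enough labeled sites exist.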