home / contact / Established on 9/1/2007 --- Please send us suggestions for sites to add to this list -- and suggestions on how we might improve the site / Tell your friends about this site: we have no ad budget and depend on happy visitors to spread the word. Please report dead links and errors / Martin R. Carbone / 5123 Don Rodolfo Drive / Carlsbad, CA 92010 / Telephone: 760-603-1910 / FAX: 760-603-1930 / Website <<http://www.alphabeticalist.com>>
This message was sent to Google. It proposes
the generation of a controlled vocabulary for
classifying and searching for web pages on
any topic. (It was not answered.)
Does it make any sense to anyone but the writer?
Send comments to the contact address above.
1) One of the main problems with searching for information on web pages
is that the originator of the web page rarely makes a serious attempt to
classify the page by topic or categories. Therefore, I believe, the search
engine classifies the page based on the common language on the page.
That can lead to errors.
2) Perhaps this problem could be reduced over time if the page
originators (PO) would assign classifications and topics based on a
controlled vocabulary.
3) Controlled vocabularies are generally regarded as being difficult
to generate precisely. However, Google has already generated a
comprehensive, de-facto controlled vocabulary and presents it at the
"Google Directory page" <<http://www.google.com/dirhp>>. That page
shows 15 "categories" -- each leading to "topics". I am skipping the
category "World" for now. "World" essentially deals with the creation
of directories in all the world's languages.
4) Here is a list of categories and the number of topics in each
category --
Arts (52) / Business (53) / Computers (53) / Games (32) / Health
(46) / Home (25) / Kids and Teens (140) / News (23) / Recreation
(43) / Reference (27) / Regional (11) / Science (30) / Shopping
(44) / Society (38) / Sports (94).
That is 711 topics in all.
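The total in #4 can be checked with a few lines of Python. This is only a sketch: the name `topic_counts` is made up for the illustration, and the counts are copied by hand from the list above.

```python
# Topic counts per category, copied from the list in #4 ("World" is skipped).
topic_counts = {
    "Arts": 52, "Business": 53, "Computers": 53, "Games": 32, "Health": 46,
    "Home": 25, "Kids and Teens": 140, "News": 23, "Recreation": 43,
    "Reference": 27, "Regional": 11, "Science": 30, "Shopping": 44,
    "Society": 38, "Sports": 94,
}

# Add up the topics across all listed categories.
total_topics = sum(topic_counts.values())
print(len(topic_counts), "categories,", total_topics, "topics")
```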
5) The topics lead to subtopics of various quantities. Quantities of
the subtopics are shown below in parentheses.
For instance, the category
"Reference" leads to Almanacs (61) / Archives (448) / Ask an Expert
(115) / Bibliography (189) / Biography (12396) / Books (47) /
Dictionaries (1726) / Directories (312) / Education (61082) /
Encyclopedias (90) / Flags (1026) / Geographic (279) / Journals (57) /
Knots (89) / Knowledge Management (1492) / Libraries (3801) / Maps
(279) / Museums (6554) / Open Access Resources (5) / Parliamentary
Procedure (31) / Questions and Answers (86) / Quotations (377) /
Scientific Reference (560) / Style Guides (158) / Thesauri (37) / Time
(94) / World Records (13).
The total for all these subtopics in the category "Reference" is about
91,400, or something near 100,000. A casual glance at some of the
categories leads me to the conclusion that there are at least
1,500,000 subtopics.
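The rough total for "Reference" in #5 can be checked the same way. The sketch below assumes nothing beyond the counts quoted above; the variable names are invented for the example.

```python
# Subtopic counts for the topics under "Reference", copied from #5.
reference_subtopics = {
    "Almanacs": 61, "Archives": 448, "Ask an Expert": 115,
    "Bibliography": 189, "Biography": 12396, "Books": 47,
    "Dictionaries": 1726, "Directories": 312, "Education": 61082,
    "Encyclopedias": 90, "Flags": 1026, "Geographic": 279, "Journals": 57,
    "Knots": 89, "Knowledge Management": 1492, "Libraries": 3801,
    "Maps": 279, "Museums": 6554, "Open Access Resources": 5,
    "Parliamentary Procedure": 31, "Questions and Answers": 86,
    "Quotations": 377, "Scientific Reference": 560, "Style Guides": 158,
    "Thesauri": 37, "Time": 94, "World Records": 13,
}

# Total subtopics reachable from the 27 "Reference" topics.
reference_total = sum(reference_subtopics.values())
print(len(reference_subtopics), "topics,", reference_total, "subtopics")
```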
6) It would be a simple matter to put the 711 topics on an
alphabetical list and use that list as a "Controlled Vocabulary of
Topics". It would be impractical to use the 1,500,000 subtopics as a
controlled vocabulary.
7) Here is a "search string" through one category and one high-volume
string of topics.
Business leads to 53 subtopics; one of the largest is
"Industrial Goods and Services". That leads to 22 subtopics; the
largest is "Machinery and Tools". That leads to 47 subtopics; the
largest of these is "Textile Machinery". That leads to 36 subtopics;
the largest of these is "Auxiliary Equipment". That leads to 15
subtopics; the largest of these is "Splitting Machinery". That leads
to 3 subtopics; the largest is "Cutting Equipment". "Cutting
Equipment" is the end of that search string. Along that search string,
we came across 176 subtopics (53 + 22 + 47 + 36 + 15 + 3) at 7 levels.
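The search string in #7 can be written down as a short list and tallied. This is a sketch only; the list layout and names are assumptions made for the illustration, and the counts are the ones quoted in #7.

```python
# Each entry: (topic on the search string, number of subtopics it leads to).
search_string = [
    ("Business", 53),
    ("Industrial Goods and Services", 22),
    ("Machinery and Tools", 47),
    ("Textile Machinery", 36),
    ("Auxiliary Equipment", 15),
    ("Splitting Machinery", 3),
    ("Cutting Equipment", 0),  # end of the string: nothing further below it
]

levels = len(search_string)
subtopics_seen = sum(count for _, count in search_string)

# Show the path in the ">>" notation, then the tallies.
print(" >> ".join(name for name, _ in search_string))
print(levels, "levels,", subtopics_seen, "subtopics")
```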
8) It would be easy to combine those 176 subtopics with the 711
topics in #6 above into one alphabetical list.
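Combining the lists as described in #8 might look like the sketch below. The function name `build_vocabulary` is hypothetical, and the two short sample lists stand in for the full 711 topics and 176 subtopics; "Maps" is repeated on purpose to show that duplicates collapse.

```python
def build_vocabulary(*term_lists):
    """Merge term lists into one alphabetical list without duplicates."""
    unique_terms = set()
    for terms in term_lists:
        unique_terms.update(term.strip() for term in terms)
    # casefold gives a case-insensitive alphabetical order.
    return sorted(unique_terms, key=str.casefold)


# Small sample lists only, not the real directory contents.
topics = ["Maps", "Almanacs", "Libraries"]
subtopics = ["Textile Machinery", "Cutting Equipment", "Maps",
             "Machinery and Tools"]

vocabulary = build_vocabulary(topics, subtopics)
for term in vocabulary:
    print(term)
```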
9) If we assume there are 40 topics in every category and we create
all the possible search strings for the 15 categories, we would have
600 search strings (15 x 40). Each search string would likely traverse
300 or so subtopics (based on #7 above, where we found 176 subtopics
while searching through large topics).
10) The total number of subtopics for all the search strings would
probably be about 180,000 (600 x 300 -- from #9 above).
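The estimate in #9 and #10 is simple multiplication, sketched here with the assumed figures spelled out. All three inputs are the guesses made above, not measured values.

```python
categories = 15              # categories listed in #4, "World" skipped
topics_per_category = 40     # assumption from #9
subtopics_per_string = 300   # rough guess from #9

# One search string per topic, then the subtopics along every string.
search_strings = categories * topics_per_category
estimated_subtopics = search_strings * subtopics_per_string
print(search_strings, "search strings,", estimated_subtopics, "subtopics")
```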
11) Google should be able to put together an alphabetical list of
these subtopics easily.
12) I believe that alphabetical list of subtopics could be easily
used by any viewer. It would be a wonderful "controlled vocabulary"
for anyone assigning categories to any web page.
13) An author would logically start by reducing the size of the
list, removing any topic on which he would not be writing. I, for
instance, write a lot -- but I can't imagine that I will ever write on
"aquariums", "calculus", "volcanoes", "zebras" or thousands of other
topics. If I eliminated all the topics I would not be writing on, I
would probably wind up with a list of 3,000 topics or so. I could
easily assign a topic from the resulting controlled vocabulary.
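The pruning step in #13 could be sketched as follows. The function name `prune_vocabulary` and the six-term sample vocabulary are hypothetical; a real list would hold many thousands of terms.

```python
def prune_vocabulary(vocabulary, excluded):
    """Return the vocabulary without the terms an author will never use."""
    excluded_keys = {term.casefold() for term in excluded}
    return [term for term in vocabulary
            if term.casefold() not in excluded_keys]


# Sample vocabulary for illustration only.
vocabulary = ["Aquariums", "Calculus", "Dictionaries", "Libraries",
              "Volcanoes", "Zebras"]
never_written_on = ["aquariums", "calculus", "volcanoes", "zebras"]

my_vocabulary = prune_vocabulary(vocabulary, never_written_on)
print(my_vocabulary)
```

Matching on `casefold()` means the author does not have to type the excluded terms with the same capitalization as the list.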
14) Searchers would simply search the "controlled vocabulary" list
for a logical subtopic. If, for instance, I wanted to find out
something about moccasins, I would search for that word. If that
failed, I'd try shoes, footwear, protective clothing and so on. I
think I'd find the most likely subtopic in short order.
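The search in #14, trying one word after another until one is on the list, might be sketched like this. The function name `find_subtopic` and the three-term sample vocabulary are assumptions for the example; only exact (case-insensitive) matches are considered.

```python
def find_subtopic(vocabulary, candidates):
    """Return the first candidate found in the vocabulary, or None."""
    # Map a case-insensitive key back to the term as it appears on the list.
    index = {term.casefold(): term for term in vocabulary}
    for candidate in candidates:
        match = index.get(candidate.casefold())
        if match is not None:
            return match
    return None


# Sample vocabulary for illustration only.
vocabulary = ["Footwear", "Protective Clothing", "Textile Machinery"]
found = find_subtopic(
    vocabulary, ["moccasins", "shoes", "footwear", "protective clothing"]
)
print(found)
```

Here "moccasins" and "shoes" are not on the sample list, so the search falls through to "Footwear".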
15) I would be happy to help Google generate the controlled vocabulary
on a non-paid volunteer basis. We could start by working on one of the
15 categories now listed at <<http://www.google.com/dirhp>>.
Let me know if this is of interest to Google, or anyone who reads this message.
In the meantime, I think this website will start to use the "New York Times list
of topics" as our controlled vocabulary. We know that list has a certain validity
and proven usefulness.
Marty Carbone / 2/24/08
martycarbone@yahoo.com
760-603-1910
http://www.alphabeticalist.com