Project
“New Search Engine”
Executive Summary
www.milionovastranka.net/en
Initial Position - My Insertion - Finance - History - Procedure - Estimate of the Price - Features of the Project - Idea, New Principle of Searching - Algorithm, Short Description - Mini Search Engine - Method of Evaluating of the Search Results - Microsoft and Bing - Short CV - Contacts
1. Initial Position
Google is the biggest company on the Internet and one of the richest companies in
the world. But about 30 percent of WWW links are ranked incorrectly in Google
(my earlier estimate, later confirmed by a study by the City Group). Relevant
WWW pages (excellent in both quality and quantity) do not appear at the top, while less
important WWW pages rank high in the sequence of found WWW pages. It is therefore
possible to create a search engine even better than Google. I think I know
the way that leads to this goal.
2. My Insertion
I bring the following into the project:
- The idea: a new principle of searching (ready).
- The algorithm of searching (the system of criteria and the initial weights between
the criteria are ready).
- The algorithm of the mini search engine (ready).
- The prototype, a mini search engine (to be contributed once it is written; it still
has to be developed, and the initial investment is intended mainly for this purpose).
- The method of evaluating the search results (ready).
More detailed information on these items is given below.
3. Finance
I already have the initial investment of 60 thousand USD. This investment is
sufficient to build the prototype (mini search engine).
I am seeking an investment of 2 million USD. The conditions of the investment are
defined here: Investment into the New Czech Search Engine.
With this investment, the real search engine will be built
and operated in one country (the Czech Republic).
See also Budget (Financial consideration).
4. History
The present status of the project is the result of 4 years of my work.
Four years ago I discovered (like most people who search) that search
engines (including the best one, Google) do not return the results
one expects. I did a small study, which I later refined
using 100 keywords, and found that roughly 30 percent of found WWW
pages are ranked incorrectly in Google. For about 3 years I experimented
with the search engines using 100 keywords, observing the placement of found
links and its relation to the properties of the corresponding WWW pages. I
also changed the properties of my own WWW pages and watched how the search
engines reacted to these changes. For a long time I could not answer the
question of why the sequence of found WWW pages is not optimal, with relevant
WWW pages ranked low and unimportant WWW pages (from the point of view
of searching) ranked high. I found the answer after 3 years, when I managed to
look at searching from a different point of view (angle) than present search
engines do. Put simply, I discovered a principle for ranking relevant WWW pages
correctly at the top, which, in my opinion, the present search engines
do not use.
The following year I concentrated on constructing the algorithm of searching,
into which I built this principle (criteria and weights between the criteria);
on designing the algorithm of the mini search engine (so that it can
be built in a relatively short time); on the method of comparing the search
results of various search engines; and on the documentation and WWW pages of the
project.
I now offer you this project for your kind consideration and, if you like it, for
investment.
Remark:
There is some analogy between the history of “my search engine” and the
early history of other search engines.
The authors of Google invented their algorithm of searching as early as
1995. They claimed that their algorithm was better than the algorithms of that
time, and they wanted to sell it. They failed in this attempt for
about 3 years. Only then did they decide to develop the whole search engine, and
they obtained their first larger investment (100 thousand USD from a co-founder of Sun Microsystems).
Two years ago, experts looked at the inventors of real-time searching
(searching in social networks and news, e.g. on Facebook and Twitter) as if they
“were crazy”; now it is nearly the number one Internet sensation in the world. Note:
this is specialized searching, not general searching.
5. Procedure
- the design of the algorithm of searching and its theoretical verification
on my system of 21 WWW servers are ready
- the prototype (mini search engine) will be programmed for practical
verification of the algorithm
- my principle and algorithm of searching will be offered to Microsoft for the
search engine Bing, or to other alternative search engines (there are about 100
such start-ups)
- if it is sold, the proceeds will be divided between me and the investors and the
project will end
- if it is not sold, the “New Czech Search Engine” will be built and operated for
several years
- after several years, once it has gained a substantial number of users, the offer to
Microsoft (or to other search engines) will be repeated
6. Estimate of the Price
Basis:
The founders of Google (Larry Page and Sergey Brin) originally wanted to sell
their algorithm of searching for 1.6 billion USD.
Microsoft attempted to buy Yahoo, at first as a whole (for about 44
billion USD), then only the “search part” (for about 19 billion USD). Estimated
breakdown of the Yahoo price: 10 billion brand, 10 billion portal, 5 billion
hardware, 5 billion network, 5 billion software, 5 billion algorithm of
searching.
Derived:
If the project succeeds and my algorithm of searching is sold, its selling
price could thus be around 2 billion USD (the primary financial purpose
of this project is to sell the algorithm of searching, so it is
appropriate to set the price somewhat below the real price
derived from the Yahoo price). The 2 billion USD would be divided between
me and the investors in this project. An investor who invests 2 million USD
for 30 percent of the project would receive about 600 million USD. For this
investor, that would mean a total income of 300 times the investment:
income 600 million USD / investment 2 million USD = 300 times.
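The arithmetic above can be checked with a few lines of Python. The figures are the ones stated in this section; the variable names are only illustrative.

```python
# Figures taken from the text above; the variable names are illustrative.
sale_price_usd = 2_000_000_000    # estimated selling price of the algorithm
investor_share_percent = 30       # investor's share of the project
investment_usd = 2_000_000        # amount invested

# Integer arithmetic keeps the result exact.
investor_income_usd = sale_price_usd * investor_share_percent // 100
multiple = investor_income_usd // investment_usd

print(f"Investor income: {investor_income_usd:,} USD")
print(f"Multiple of the investment: {multiple}x")
```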
7. Features of the Project
Probability 70 percent: my searching will be better than the searching of
Google, and my algorithm of searching will be sold to Microsoft for Bing or to
another suitable interested company.
Probability 20 percent: my searching will be comparable with the searching of
Google, and the real “New Czech Search Engine” will be built and operated in
the Czech Republic; once this search engine has gained a reputation and a
substantial number of users, the solution will again be offered to Microsoft or to
another suitable interested company.
Probability 5 percent: I will continue developing
my algorithm, especially optimizing the weights of the criteria,
and afterwards the solution will return to one of the two options stated above.
Probability 5 percent: none of the variants above will succeed, and the project
will end.
So the total probability of success (gain) is 95 percent, and the probability of
failure (loss) is 5 percent. The first results of the prototype (mini
search engine) will be available in about 6 months; the final result (the
quality of the algorithm of searching compared with Google and/or Bing, business
negotiations) will be available in about 12 months. In case of full success the
investors will receive about 300 times their investment (individual investors
would receive portions corresponding to their shares of the
investment).
8. Idea – New Principle of Searching
The principle of searching that I have invented is characterized here.
8.1.
My principle of searching rests on three points:
- I evaluate components of the Internet: WWW pages, documents, images (drawings,
photos, maps…), audio, video, scripts ...
- from these components I construct “other objects” = thematically connected sets
- as for the links between WWW pages, I use a dynamic Rank, which depends on the
searched keyword.
The differences between Google (and similar search engines) and my approach are as follows. Primary
difference: existing search engines evaluate separate WWW pages, while I
evaluate sets of thematically connected Internet components. Secondary
difference: existing search engines use a static PageRank, which considers the
links between WWW pages regardless of the searched keywords, while I use a dynamic
Rank, which also takes the searched keyword into account. As far as
the links between WWW pages are concerned, this is more accurate.
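The contrast between a static and a keyword-dependent rank can be illustrated with a small sketch. It is not the dynamic Rank of this project, whose construction is not disclosed here. It only assumes one simple interpretation: the usual link-based rank is computed over the pages that mention the searched keyword. The function names, the tiny link graph and the page texts are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {page: [linked pages]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


def keyword_rank(links, texts, keyword):
    """Assumed 'dynamic' variant: rank only the pages that mention the keyword."""
    relevant = {p for p in links if keyword in texts[p].lower()}
    sub_links = {p: [t for t in links[p] if t in relevant] for p in relevant}
    return pagerank(sub_links) if sub_links else {}


# Hypothetical four-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
texts = {"a": "cars and engines", "b": "cooking recipes",
         "c": "used cars", "d": "cars for sale"}

static = pagerank(links)                      # the same for every query
dynamic = keyword_rank(links, texts, "cars")  # depends on the keyword
print(sorted(static, key=static.get, reverse=True))
print(sorted(dynamic, key=dynamic.get, reverse=True))
```

In this sketch the page about cooking takes part in the static rank but drops out of the rank computed for the keyword "cars".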
Most WWW pages are very similar to each other in their properties, e.g. size,
occurrence of keywords, WWW links etc. The differences between WWW pages are
minimal; the distinctive space is very compressed. The sequence of found WWW
pages is often decided by a single occurrence of a keyword, by 1-2 WWW links,
or even by chance (the location of keywords). Not even the best search engine
(and Google is reasonably good) can do anything about this; it follows from the
very principle of evaluating single WWW pages. Using this principle, the
algorithm of searching of existing search engines performs a very hard task. In
contrast, I evaluate sets of thematically connected components of the
Internet. These sets are much bigger than single WWW pages and differ from each
other much more than single WWW pages do. So these sets can be distinguished
quite well (in terms of their properties: size, occurrence of keywords, WWW
links etc.). Put simply, my distinctive space is comfortably stretched, so my
algorithm based on this principle works better than existing algorithms of
searching. Only from the sequence of such sets do I derive the sequence of the
WWW pages contained in them.
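The last step, deriving the sequence of pages from the sequence of sets, can be sketched as follows. The construction of the sets is not described in this document, so the sketch takes the grouping as given input. It also assumes, purely for illustration, that a set is scored by the total number of keyword occurrences in its pages. All names and data are hypothetical.

```python
from collections import defaultdict


def rank_pages_by_sets(pages, set_of, keyword):
    """Order pages by the score of their set first, then by their own score.

    pages:  {url: text}
    set_of: {url: set identifier} (hypothetical grouping supplied by the caller)
    """
    page_score = {url: text.lower().count(keyword) for url, text in pages.items()}
    set_score = defaultdict(int)
    for url, score in page_score.items():
        set_score[set_of[url]] += score
    # Higher set score first, then higher page score, then URL for a stable order.
    return sorted(pages, key=lambda u: (-set_score[set_of[u]], -page_score[u], u))


pages = {
    "site-a/1": "cars cars cars",
    "site-b/1": "cars cars",
    "site-b/2": "cars cars",
    "site-b/3": "cars engines",
}
set_of = {"site-a/1": "A", "site-b/1": "B", "site-b/2": "B", "site-b/3": "B"}

order = rank_pages_by_sets(pages, set_of, "cars")
print(order)
```

Here the single page with the most keyword occurrences ends up last, because the set it belongs to carries less information about the keyword than the other set does.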
In other words: on one side there is the person searching and the searched
keyword; on the other side there is the total set of all information on the
Internet concerning this keyword. Single WWW pages bite off only very
small pieces of this total set, roughly one percent or even much less. One WWW
page about cars contains, let us say, 1 percent of all the information
concerning cars on the Internet, the second WWW page contains 0.9 percent, the
third 0.8 percent... The differences between WWW pages are relatively small. In
contrast, my sets of Internet components bite off much bigger portions of the
total set, i.e. of the sum of information concerning the searched keyword. One
such set of thematically connected Internet components may contain 10 percent
of all the information concerning cars on the Internet, the second set 9
percent, the third set 8 percent… The differences between such sets are
relatively large.
So the differences between my sets of Internet components are much bigger than
the differences between single WWW pages. One can estimate that my
distinctive space is 10 times bigger than the distinctive space of Google (or
other existing search engines). Because it is based on this principle, my
algorithm is more robust and the sequence of WWW pages it computes
is better.
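The factor of 10 can be reproduced from the example percentages given above. The sketch compares the absolute gaps, in percentage points, between neighbouring pages and between neighbouring sets. The list and function names are illustrative.

```python
# Example shares from the text, in percent of all information on a keyword.
page_shares = [1.0, 0.9, 0.8]    # single WWW pages
set_shares = [10.0, 9.0, 8.0]    # sets of thematically connected components


def gaps(shares):
    """Absolute gaps between neighbouring items, in percentage points."""
    return [round(a - b, 6) for a, b in zip(shares, shares[1:])]


page_gaps = gaps(page_shares)
set_gaps = gaps(set_shares)
ratio = set_gaps[0] / page_gaps[0]

print(page_gaps, set_gaps, ratio)
```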
I state publicly that sets of components, instead of single WWW pages, should
be used for Internet searching; in this respect, I was the first person in
the world to discover and publish this. But this is only the theoretical point A of
my solution. The practical point B is how to construct these sets. I have invented
this construction as well; it took me about 6 months.
8.2.
The basic property of my principle is that, in searching, it correctly puts the
relevant (i.e. qualitatively and quantitatively good) WWW pages for the
searched keywords at the top. At the top are those WWW pages that match the
searched keyword(s), not pages that are more general, less general, or about something else.
8.3.
The idea is aimed at basic (common, classical) searching, not at
specialized branches of searching such as Internet shopping or real-time
searching, e.g. searching for persons (Facebook) or searching news and
microblogs (Twitter).
8.4.
My principle is not expressed in the search algorithm by a single criterion. On
the contrary, it is built into practically all the criteria of the search
algorithm, and it influences the main criteria in a fundamental way. It runs
through the algorithm like a “red thread”. One can say that in my search
algorithm the concept of “WWW pages” is replaced by my concept of “other objects”
(in addition, some criteria are added or changed in other ways).
8.5.
My concept is not an “abstract noun”; on the contrary, it is a well-known computer
term with a fixed meaning, which I use in a different way in searching. I simply
look at searching differently, from a different point of view (angle).
8.6.
My principle of searching is not “artificial (computer) intelligence” of the kind
used by, e.g., the search engine Wolfram Alpha. Such artificial intelligence has its
chance only in the remote future, not now. I also do not use neural networks or
similar ideas. My principle and algorithm of searching are a combination of common
sense, graph theory, fuzzy sets, probability and statistics.
8.7.
My idea has so far not been used by existing search engines. This follows from my
study of publicly accessible articles and documents concerning search
algorithms, as well as from my practical verification, in which I observed the
sequence of found WWW pages in existing search engines. If the present search
engines used this idea, they would behave differently, and the sequence of found
WWW pages would be different from what it is at present.
8.8.
My algorithm has no special criterion against “SEO
spamming” or “black SEO”, i.e. against artificially (formally) pushing some
WWW pages up in the search results, which is a big problem for the present search
engines. But part of the magic of my idea and algorithm is
that it eliminates this “SEO spamming” naturally; the
elimination simply follows from my search algorithm. It is a side effect of my
algorithm.
8.9.
My principle and/or algorithm is (in my opinion) patentable (I have
also worked in intellectual property protection, in patenting and in the
protection of designations of origin in the EU).
But I do not want to patent it, for the following reasons:
- it is something like “family silver”, like the recipe for a whisky or for Becherovka
liqueur, which is likewise neither revealed nor patented
- a truly worldwide patent (search engines are worldwide) would
cost about half a million USD; especially in the initial stage
of the project such a cost is unthinkable and would be wasted
- most important: if something is patented, the patent application (patent
text) is published, i.e. publicly accessible; if my idea were used by
another party, it would be difficult to prove it (cost of court proceedings;
how to find one's way through hundreds of thousands of lines of source code of
a foreign search engine; the infringing person or company could submit to
the court a program other than the one actually in use, and to prove this directly
it would be necessary to build practically a complete duplicate of
their search engine elsewhere…).
It is up to the buyer of my principle and/or algorithm of
searching to decide whether to patent these things or protect them in
another way.
8.10.
See also:
Illuminating explanation of the difference between Google and me
Graphical explanation of my principle of searching.
9. Algorithm – Short Description
The principle of searching that I have invented is built into the
criteria of my algorithm of searching, which computes the sequence of found WWW
pages. My algorithm of searching consists of about 30 criteria,
some of which are new or modified. Finding the correct weights
between the criteria is also important. I have the initial
settings of these weights; the weights will be optimized during the development
of the mini search engine.
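How criteria and weights might combine into one score can be sketched as below. The document does not state how the roughly 30 criteria are combined, so the weighted sum is an assumption. The three criterion names, their values and their weights are hypothetical.

```python
def score_page(criteria_values, weights):
    """Weighted sum of criterion values, normalised by the total weight.

    Both arguments are dicts keyed by criterion name; values lie in 0..1.
    A weighted sum is an assumption, not the method of this project.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * criteria_values.get(name, 0.0) for name in weights)
    return weighted / total_weight


# Hypothetical criteria and initial weights.
weights = {"keyword_in_title": 3.0, "keyword_density": 2.0, "incoming_links": 5.0}
page = {"keyword_in_title": 1.0, "keyword_density": 0.5, "incoming_links": 0.2}

score = score_page(page, weights)
print(score)
```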
10. Prototype - Mini Search Engine
Usually, verifying an algorithm of searching requires constructing the
whole search engine (work for several, or many, people for several years).
I have found a way to reduce this to the work of about 3 people for about 1
year.
The mini search engine will store and process nearly the whole Czech Internet
(Czech WWW pages) and about 50 keywords from the world Internet (English WWW
pages). For each of these English keywords, the mini search engine will retrieve
the top 100-1000 WWW pages. It will compute the sequence of
these WWW pages according to my algorithm of searching. Afterwards, I will
optimize the weights of the criteria of searching: I will change
these weights and observe the effect on the sequence of found WWW pages. At the
end of this procedure I will choose what is, in my opinion, the best ratio
of the weights. The results of this optimized algorithm of searching will be
compared with the search results of Google and Bing.
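The tuning step described above, changing a weight and observing the effect on the sequence of pages, can be sketched in a few lines. The scoring is again assumed to be a weighted sum. The two pages, two criteria and all values are hypothetical.

```python
def rank(pages, weights):
    """Return page identifiers ordered by descending weighted score."""
    def score(values):
        return sum(weights[c] * values.get(c, 0.0) for c in weights)
    return sorted(pages, key=lambda url: (-score(pages[url]), url))


# Hypothetical criterion values for two pages.
pages = {
    "p1": {"keyword_density": 0.9, "incoming_links": 0.1},
    "p2": {"keyword_density": 0.4, "incoming_links": 0.8},
}

before = rank(pages, {"keyword_density": 1.0, "incoming_links": 1.0})
after = rank(pages, {"keyword_density": 3.0, "incoming_links": 1.0})

print(before)  # sequence with the initial weights
print(after)   # sequence after raising one weight
```

Repeating this over many keywords and weight settings gives the kind of before-and-after comparison the procedure relies on.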
11. Method of Evaluating of the Search Results
I have my own method of evaluating the search results, i.e. of
comparing the search results of two search engines.
There are at least two other methods of evaluating
search results. One belongs to the City Group, the other to Microsoft (according to a
comment by Mr. Steven Ballmer).
If the buyer of my algorithm (e.g. Microsoft for Bing) wishes,
they may choose their own test keywords; I will process
these keywords for them and generate the sequence of found links. After that,
they will be able to compare my results with the search results
of Google, Bing or another search engine, using their own
method.
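The evaluation method of this project is not described in this document. As a generic illustration only, two result lists for the same keyword can be compared with simple measures such as the overlap of the top links and the average shift in position. The function names and the sample result lists are hypothetical.

```python
def overlap_at_k(results_a, results_b, k=10):
    """Share of the top-k links that both engines return."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    return len(top_a & top_b) / k


def mean_rank_shift(results_a, results_b):
    """Average absolute difference in position for links present in both lists."""
    pos_b = {url: i for i, url in enumerate(results_b)}
    shifts = [abs(i - pos_b[url]) for i, url in enumerate(results_a) if url in pos_b]
    return sum(shifts) / len(shifts) if shifts else None


# Hypothetical result lists of two engines for one keyword.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u2", "u1", "u5", "u3"]

overlap = overlap_at_k(engine_a, engine_b, k=3)
shift = mean_rank_shift(engine_a, engine_b)
print(overlap, shift)
```

Such measures only show how far two sequences differ. Deciding which sequence is better still needs a judgment of relevance for each link.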
12. Microsoft - Bing
Microsoft has been trying to break into Internet searching for about 10
years; this history includes Inktomi, Netscape, MSN Search, Live Search, Yahoo
and Bing. Microsoft introduced the new search engine Bing on the Internet in May
2009, so far without substantial success. The ratio of Google usage to
Bing usage is around 30:1 worldwide and around 10:1 in the USA (according
to www.statcounter.com).
Analysts expect this ratio to persist unless the quality
of Bing’s searching changes substantially.
After the agreement between Microsoft and Yahoo was signed, the Chief Executive
Officer of Microsoft, Steven A. Ballmer, said that he believes in the “future of
searching”. He also said that Microsoft intends to invest about 8 billion USD
in Internet searching during the next 5 years.
That is why it makes sense to develop a new algorithm of searching and to
offer it to Microsoft.
13. Short CV
I studied at the Czech Technical University in Prague, in the field of computers. I hold
the titles Ing (Engineer) and CSc (Candidate of Science), the latter for work
on structured programming. For about 15 years I was a programmer and
programmed a large laboratory information system. For about 16 years I have been an
independent expert running a small business in programming and the
Internet. I invented and implemented the programming language Visual Pascal. I
worked for 5 years for a Canadian telecommunication company, where I developed a
connection between computers and mobile phones. I now operate 21 WWW servers
(tourism, presentations, and corporate actions). I have been engaged in Internet
directories (catalogues) and search engines for about 10 years, 6 years theoretically and 4
years practically.
Brno, Czech Republic, February 10th, 2011.
Ing. Petr Hejl, CSc.
Ondrouskova 15, 63500 Brno, Czech Republic
tel.: (+420) 608 374 535
email: phejl@lednice.org