Project “New Search Engine”
Executive Summary
www.milionovastranka.net/en


Initial Position - My Insertion - Finance - History - Procedure - Estimate of the Price - Features of the Project
Idea, New Principle of Searching - Algorithm, Short Description - Mini Search Engine - Method of Evaluating of the Search Results
Microsoft and Bing - Short CV - Contacts

1. Initial Position
Google is the biggest company on the Internet and one of the richest companies in the world. Yet about 30 percent of WWW links are ranked incorrectly in Google (my earlier estimate, later confirmed by a study of the City Group). Relevant WWW pages (qualitatively and quantitatively excellent ones) do not appear at the top of the results, while less important WWW pages do. It is therefore possible to create a search engine even better than Google. I think I know the way that leads to this goal.

2. My Insertion
I contribute the following to the project:
- The idea, a new principle of searching (ready).
- The search algorithm (the system of criteria and the initial weights between the criteria are ready).
- The algorithm of the mini search engine (ready).
- The prototype, a mini search engine (to be contributed once it is written; it still has to be developed, which is the main purpose of the initial investment).
- The method of evaluating the search results (ready).
More detailed information on these contributions is given below.

3. Finance
I already have an initial investment of 60 thousand USD. This investment is sufficient to build the prototype (mini search engine).
I am seeking an investment of 2 million USD. The conditions of the investment are defined here: Investment into the New Czech Search Engine.
With this investment, the real search engine will be built and operated in one country (the Czech Republic).
See also Budget (Financial consideration).

4. History
The present status of the project is the result of my 4 years of work.
Four years ago I realized (like most people who search) that the search engines, including the best one, Google, do not return the results one expects. I carried out a small study, later refined using 100 keywords, and found that roughly 30 percent of the found WWW pages are ranked incorrectly in Google. For about 3 years I experimented with the search engines using 100 keywords, observing the placement of the found links and its relation to the properties of the corresponding WWW pages. I also changed the properties of my own WWW pages and followed how the search engines reacted to these changes. For a long time I could not answer the question of why the sequence of found WWW pages is not optimal, with relevant WWW pages placed low and unimportant WWW pages (from the point of view of the search) placed high. I found the answer after 3 years, when I managed to look at searching from a different angle than present search engines do. I simply discovered a principle for correctly placing the relevant WWW pages at the top, a principle which, in my opinion, the present search engines do not use.
In the following year I worked on the construction of the search algorithm, into which I projected this principle (criteria and weights between the criteria); on the design of the algorithm of the mini search engine (so that it can be built in a relatively short time); on the method of comparing the search results of various search engines; and on the documentation and WWW pages of the project.
I now offer you this project for your kind consideration and, if you like it, for investment.
Remark:
There is some analogy between the history of “my search engine” and the early history of other search engines.
The authors of Google invented their search algorithm as early as 1995. They claimed that it was better than the algorithms of the time and wanted to sell it. They failed in this attempt for about 3 years. Only then did they decide to develop the whole search engine, and they obtained their first larger investment (100 thousand USD from a co-founder of Sun Microsystems).
Two years ago, experts looked at the inventors of real-time searching (searching in social networks and news, e.g. on Facebook and Twitter) as if they “were crazy”; now it is nearly the number one Internet sensation in the world. Note: this is specialized searching, not general searching.

5. Procedure
- the design of the search algorithm and its theoretical verification on my system of 21 WWW servers are ready
- the prototype (mini search engine) will be programmed for practical verification of the algorithm
- my principle and search algorithm will be offered to Microsoft for the search engine Bing, or to other alternative search engines (there are about 100 such start-ups)
- if sold, the gain will be divided between me and the investors and the project will be finished
- if not sold, the “New Czech Search Engine” will be built and operated for several years
- after several years, once a substantial number of users has been gained, the offer to Microsoft (or to other search engines) will be repeated

6. Estimate of the Price
Basis:
The founders of Google (Larry Page and Sergey Brin) originally wanted to sell their search algorithm for 1.6 billion USD.
Microsoft attempted to buy Yahoo, at first as a whole (for about 44 billion USD), then only the “search part” (for about 19 billion USD). Estimated division of the Yahoo price: 10 billion brand, 10 billion portal, 5 billion hardware, 5 billion network, 5 billion software, 5 billion search algorithm.
Derived:
In case of success, the selling price of my search algorithm could thus be around 2 billion USD. (The primary financial purpose of this project is the sale of the search algorithm, so it is appropriate to set the price somewhat lower than the value derived from the Yahoo price.) The price of 2 billion USD would be divided between me and the investors in this project. An investor who invests 2 million USD for 30 percent of the project would receive about 600 million USD. For this investor, that is a total income of 300 times the investment: income 600 million USD / investment 2 million USD = 300 times.
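The arithmetic of this estimate can be reproduced with a short Python sketch. The figures are the ones stated above; the variable and function names are chosen only for illustration, and the result holds only under the assumed selling price of 2 billion USD.

```python
# Figures from the estimate above; all names are illustrative assumptions.
SALE_PRICE_USD = 2_000_000_000   # assumed selling price of the search algorithm
INVESTMENT_USD = 2_000_000       # investment sought
INVESTOR_SHARE_PERCENT = 30      # investor's share of the project


def investor_payout(sale_price_usd: int, share_percent: int) -> int:
    """Part of the selling price that goes to the investor (integer USD)."""
    return sale_price_usd * share_percent // 100


def return_multiple(payout_usd: int, investment_usd: int) -> float:
    """How many times the investment is returned."""
    return payout_usd / investment_usd


payout = investor_payout(SALE_PRICE_USD, INVESTOR_SHARE_PERCENT)
multiple = return_multiple(payout, INVESTMENT_USD)
print(f"payout: {payout} USD, multiple: {multiple:.0f}x")
```

Changing `SALE_PRICE_USD` shows directly how sensitive the multiple is to the assumed selling price.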

7. Features of the Project
Probability 70 percent: my searching will be better than Google's, and my search algorithm will be sold to Microsoft for Bing or to another suitable interested company.
Probability 20 percent: my searching will be comparable to Google's, and the real “New Czech Search Engine” will be built and operated in the Czech Republic; once this search engine has gained a reputation and a substantial number of users, the solution will again be offered to Microsoft or to another suitable interested company.
Probability 5 percent: I will continue developing my algorithm, especially optimizing the weights of the criteria, and the solution will then return to one of the two points stated above.
Probability 5 percent: none of the variants above will succeed, and the project will be terminated.
So the total probability of success (gain) is 95 percent, and the probability of failure (loss) is 5 percent. The first results of the prototype (mini search engine) will be available in about 6 months; the final result (the quality of the search algorithm compared to Google and/or Bing, business negotiations) will be available in about 12 months. In case of full success, the investors will receive about 300 times their investment (individual investors would receive portions proportional to their share of the investment).


8. Idea – New Principle of Searching
The principle of searching, which I have invented, is characterized here.
8.1.
My principle of searching rests on three points:
- I evaluate components of the Internet: WWW pages, documents, images (drawings, photos, maps…), audio, video, scripts ...
- from these components I construct “other objects” = thematically connected sets
- as for the links between WWW pages, I use a dynamic Rank, which depends on the searched keyword.
The differences between Google (and the like) and my approach are as follows. Primary difference: existing search engines evaluate separate WWW pages, while I evaluate sets of thematically connected Internet components. Secondary difference: existing search engines use a static PageRank, which considers the links between WWW pages without regard to the searched keywords; I use a dynamic Rank, which also takes the searched keyword into account. As far as the links between WWW pages are concerned, this is more accurate.
Most WWW pages are very similar to each other in their properties, e.g. size, occurrence of keywords, WWW links etc. The differences between WWW pages are minimal; the distinctive space is very compressed. The order of the found WWW pages is often decided by a single occurrence of a keyword, by 1-2 WWW links, or even by chance (the location of keywords). Not even the best search engine (and Google is a good one) can do anything about it; this follows from the principle of evaluating single WWW pages. Under this principle, the search algorithm of existing search engines faces a very hard task. In contrast, I evaluate sets of thematically connected components of the Internet. These sets are much bigger than single WWW pages and differ from each other much more than single WWW pages do. So these sets can be distinguished quite well (in terms of their properties: size, occurrence of keywords, WWW links etc.). Simply put, my distinctive space is comfortably stretched, so my algorithm based on this principle works better than existing search algorithms. Only from the sequence of such sets do I derive the sequence of the WWW pages contained in them.
In other words: on one side, there is the searching person and the searched keyword; on the other side, there is the total set of all information on the Internet concerning this keyword. Single WWW pages bite off only very small pieces of this total set, roughly one percent or even much less. One WWW page about cars contains, let's say, 1 percent of all the information concerning cars on the Internet, the second WWW page contains 0.9 percent, the third 0.8 percent... The differences between WWW pages are relatively small. In contrast, my sets of Internet components bite off much bigger portions of the total set, i.e. of the summary information concerning the searched keyword. One such set of thematically connected Internet components may contain 10 percent of all the information concerning cars on the Internet, the second set 9 percent, the third set 8 percent… The differences between such sets are relatively large.
So the differences between my sets of Internet components are much bigger than the differences between single WWW pages. One can estimate that my distinctive space is about 10 times bigger than that of Google (or other existing search engines). Based on this principle, my algorithm is more robust, and the sequence of WWW pages it computes is better.
I state publicly that for Internet searching, sets of components should be used instead of single WWW pages; as far as I know, I was the first person in the world to discover and publish this. But this is only the theoretical point A of my solution. The practical point B is how to construct these sets. I have invented this construction as well; it took me about 6 months.
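The two-step ordering described above (first rank the sets, then derive the order of the pages inside them) can be illustrated with a minimal Python sketch. It is not the actual algorithm: the real construction of the sets is not disclosed here, so grouping pages by host name is used purely as a stand-in assumption, and a plain keyword count stands in for the real criteria. All names and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_pages_by_host(pages):
    """Build sets of pages. Grouping by host is only a placeholder assumption
    for the undisclosed construction of thematically connected sets."""
    groups = defaultdict(list)
    for page in pages:
        groups[urlparse(page["url"]).netloc].append(page)
    return dict(groups)


def keyword_count(page, keyword):
    """Number of whole-word occurrences of the keyword in the page text."""
    return page["text"].lower().split().count(keyword.lower())


def rank_pages_via_sets(pages, keyword):
    """Rank the sets first, then list the pages of each set in turn."""
    groups = group_pages_by_host(pages)
    # Score of a set = total keyword occurrences over all its pages.
    set_scores = {
        host: sum(keyword_count(p, keyword) for p in members)
        for host, members in groups.items()
    }
    ranked_urls = []
    for host in sorted(groups, key=lambda h: set_scores[h], reverse=True):
        members = sorted(
            groups[host], key=lambda p: keyword_count(p, keyword), reverse=True
        )
        ranked_urls.extend(p["url"] for p in members)
    return ranked_urls


pages = [
    {"url": "http://a.example/1", "text": "cars cars cars"},
    {"url": "http://b.example/1", "text": "cars engines"},
    {"url": "http://b.example/2", "text": "cars tyres cars"},
    {"url": "http://b.example/3", "text": "cars cars brakes"},
]
ranking = rank_pages_via_sets(pages, "cars")
print(ranking)
```

In this toy data the single page on `a.example` has the highest individual keyword count, yet the pages of `b.example` are listed first because their set as a whole covers more of the topic. That is the effect the principle aims for.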
8.2.
The basic property of my principle is that, while searching, it correctly puts the relevant (i.e. qualitatively and quantitatively good) WWW pages at the top for the searched keywords. At the top are those WWW pages which match the searched keyword(s), not WWW pages that are more general, less general or about something else.
8.3.
The idea is oriented towards basic (common, classical) searching, not towards specialized branches of searching such as Internet shopping or real-time searching, e.g. searching for persons (Facebook) or searching news and miniblogs (Twitter).
8.4.
My principle is not expressed in the search algorithm by a single criterion. On the contrary, it is projected into practically all the criteria of the search algorithm, and it influences the main criteria in a fundamental way. It runs through the algorithm like a “red thread”. One can say that in my search algorithm my concept of “other objects” replaces the concept of “WWW pages” (in addition, some criteria are added or changed in other ways).
8.5.
My concept is no “abstract noun”; on the contrary, it is a well-known computer term with a fixed meaning, which I use in a different way in searching. I simply look at searching differently, from another angle.
8.6.
My principle of searching is not “artificial (computer) intelligence” of the kind used by e.g. the search engine WolframAlpha. Such artificial intelligence has its chance only in the remote future, not now. I also do not use neural networks or similar ideas. My principle and algorithm of searching combine common sense, graph theory, fuzzy sets, probability and statistics.
8.7.
My idea is not yet used by existing search engines. This follows from my study of publicly accessible articles and documents on search algorithms, as well as from my practical verification, observing the sequence of found WWW pages in existing search engines. If the present search engines used this idea, they would behave differently, and the sequence of found WWW pages would differ from what it is at present.
8.8.
My algorithm has no special criterion against “SEO spamming” or “black SEO”, i.e. against artificially (formally) pushing some WWW pages up in the results, which is a big problem for the present search engines. But part of the magic of my idea and algorithm is that it eliminates this “SEO spamming” naturally; the elimination simply follows from my search algorithm. It is a side effect of the algorithm.
8.9.
My principle and/or algorithm is, in my opinion, patentable (I have also worked in intellectual property protection, in patenting and in the protection of designations of origin in the EU).
But I do not want to patent it, for the following reasons:
- it is something like the “family silver”, like the recipe for a whisky or for the Becherovka liqueur, which is likewise neither revealed nor patented
- obtaining a truly worldwide patent (search engines are worldwide) costs about half a million USD; especially in the initial stage of the project, such a cost is unthinkable and would be wasted
- most important: if something is patented, the patent application (patent text) is published, i.e. publicly accessible; if another party used my idea, it would be difficult to prove it (cost of court proceedings; how to find one's way in hundreds of thousands of lines of source code of someone else's search engine; the infringing person or company could submit to the court a different program than the one actually in use, and to prove this directly it would be necessary to build practically a complete duplicate of their search engine elsewhere…).
It is up to the buyer of my principle and/or search algorithm to decide whether to patent these things or to protect them in another way.
8.10.
See also:
Illuminating explanation of the difference between Google and me
Graphical explanation of my principle of searching.


9. Algorithm – Short Description
The principle of searching which I have invented is projected into the criteria of my search algorithm, which computes the sequence of the found WWW pages. My search algorithm consists of about 30 criteria, some of which are new or modified. Finding the correct weights between the criteria is also important. I have initial settings for these weights; they will be optimized during the development of the mini search engine.
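How criteria and weights can interact may be illustrated by a small Python sketch. It assumes, purely for illustration, that the criteria are combined by a weighted sum; the actual combination rule, the roughly 30 real criteria and their initial weights are not disclosed here. The three criterion names and all numbers are hypothetical.

```python
def weighted_score(criteria_values, weights):
    """Combine per-criterion values (each in 0..1) into one score.
    A weighted sum is an assumption, not the disclosed algorithm."""
    return sum(weights[name] * value for name, value in criteria_values.items())


def rank(candidates, weights):
    """Return candidate names ordered from best to worst score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical criterion values for two candidate sets.
candidates = {
    "set_A": {"keyword_occurrence": 0.9, "size": 0.2, "links": 0.4},
    "set_B": {"keyword_occurrence": 0.5, "size": 0.9, "links": 0.8},
}

# Two hypothetical weight settings.
initial_weights = {"keyword_occurrence": 0.5, "size": 0.2, "links": 0.3}
adjusted_weights = {"keyword_occurrence": 0.8, "size": 0.1, "links": 0.1}

print(rank(candidates, initial_weights))   # set_B scores 0.67, set_A 0.61
print(rank(candidates, adjusted_weights))  # set_A scores 0.78, set_B 0.57
```

The same candidates swap places when the weights change, which is why the choice of weights matters as much as the choice of criteria.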


10. Prototype - Mini Search Engine
Usually, to verify a search algorithm, it is necessary to build the whole search engine (work for several, or many, people for several years). I have found a way to reduce this to work for about 3 people for about 1 year.
The mini search engine will store and process nearly the whole Czech Internet (Czech WWW pages) and about 50 keywords from the world Internet (English WWW pages). For each of these English keywords, the mini search engine will retrieve the top 100-1000 WWW pages. It will compute the sequence of these WWW pages according to my search algorithm. Afterwards, I will optimize the weights of the search criteria: I will change the weights and follow the effect on the sequence of found WWW pages. At the end of this procedure I will choose the combination of weights that is, in my opinion, the best. The results of this optimized search algorithm will be compared to the search results of Google and Bing.

11. Method of Evaluating of the Search Results
I have my own method of evaluating the search results, i.e. of comparing the search results of two search engines.
There are at least two other methods of evaluating search results. One belongs to the City Group, another to Microsoft (according to a comment by Mr. Steven Ballmer).
If the buyer of my algorithm (e.g. Microsoft for Bing) wishes, they may choose their own test keywords; I will process these keywords for them and generate the sequence of found links. They will then be able to compare my results with the search results of Google, Bing or another search engine, using their own method.
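The evaluation methods mentioned above are not described in this summary. Only to illustrate what such a comparison could look like, the following Python sketch uses a generic, widely known measure, precision at k: the share of the first k results that a human judge has marked as relevant. It is not the author's method, nor that of the City Group or Microsoft; the engine names, URLs and relevance judgments are hypothetical.

```python
def precision_at_k(ranked_urls, relevant_urls, k):
    """Share of the first k results that were judged relevant."""
    top = ranked_urls[:k]
    if not top:
        return 0.0
    return sum(1 for url in top if url in relevant_urls) / len(top)


def compare_engines(results_by_engine, relevant_urls, k):
    """Compute precision at k for every engine on one test keyword."""
    return {
        engine: precision_at_k(urls, relevant_urls, k)
        for engine, urls in results_by_engine.items()
    }


# Hypothetical human relevance judgments for one test keyword.
relevant = {"u1", "u2", "u3"}

# Hypothetical result lists of two engines for the same keyword.
results = {
    "engine_x": ["u1", "u9", "u2", "u8"],
    "engine_y": ["u1", "u2", "u3", "u9"],
}

scores = compare_engines(results, relevant, k=4)
print(scores)
```

In practice such a measure would be averaged over many test keywords, for example the keywords chosen by the buyer.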

12. Microsoft - Bing
Microsoft has been trying to penetrate Internet searching for about 10 years; this history includes Inktomi, Netscape, MSN Search, Live Search, Yahoo and Bing. Microsoft introduced the new search engine Bing in May 2009, so far without substantial success. The ratio between the use of Google and Bing is about 30:1 worldwide and about 10:1 in the USA (according to www.statcounter.com). Analysts expect that this ratio will persist unless the quality of Bing's searching changes substantially.
After the agreement between Microsoft and Yahoo was signed, the Chief Executive Officer of Microsoft, Steven A. Ballmer, said that he believes in the “future of searching”. He also said that Microsoft intends to invest about 8 billion USD in Internet searching during the next 5 years.
That is why it makes sense to develop a new search algorithm and to offer it to Microsoft.

13. Short CV
I studied computers at the Czech Technical University in Prague. I hold the titles Ing. (Engineer) and CSc. (Candidate of Science), the latter for work on structured programming. For about 15 years I was a programmer and developed a large laboratory information system. For about 16 years I have been an independent expert running a small business in programming and the Internet. I invented and implemented the programming language Visual Pascal. I worked for 5 years for a Canadian telecommunications company, where I developed a connection between computers and mobile phones. I now run 21 WWW servers (tourism, presentations, and corporate actions). I have been engaged in Internet lists (catalogues) and search engines for about 10 years, 6 years theoretically and 4 years practically.

Brno, Czech Republic, February 10th, 2011.

Ing. Petr Hejl, CSc.
Ondrouskova 15, 63500 Brno, Czech Republic
tel.: (+420) 608 374 535
email: phejl@lednice.org