LINUX FOR SUITS

DOC SEARLS

There are three primary roles within Atlas:

Factory—responsible to the content.

Collector—responsible to the keyword.

Broker—responsible to the Searcher.

Each of these actors must interact with the others to complete any search request. Any two roles could be performed by a single entity (whereas if all three are performed by one entity, the result would be a traditional, monolithic search engine).

A Factory is akin to a crawler in today’s search engines. An Atlas Factory must fetch and process the content as intelligently as possible, performing analysis (such as Natural Language Processing) and normaliz-ing it into distinct units. A Factory shares its highly refined and pro-cessed output with one or more Collectors based on who they believe is best utilizing it.

A Collector absorbs and indexes output from one or more Factories, with one primary goal: ranking.

An Atlas Collector must provide the most intelligent ranking and relationship analysis possible. A Collector has to compete for the output of a Factory, as well as compete to provide the best ranking quality for Brokers.

A Broker must provide a Searcher with the best possible results. It does so by combining diverse ranking results from Collectors and also by retrieving content from the original Factories. This last step, a Broker interacting with a Factory, is critical to maintaining a balanced ecosys-tem. All Factories must be aware of and approve how their results are being used and by whom.

Reputation and reward is bi-directional between all parties (Factory-Collector, Collector-Broker and Broker-Factory).

Each entity may choose to interact on principle (free, Commons), attribution

(results provided by), or commercially (as a paid service). The Atlas protocol is purely a facilitator and does not restrict how the relationships between any entities are formed. In considering these motives for the various entities, it’s likely that the free-based networks will tend to become more specialized, commercial ones will compete on quality, and attribution-based networks will mature in both directions.

This simple yet powerful division of roles, responsibilities and relation-ships will result in a distributed economic foundation for an Internet Search Infrastructure. The wire protocol and further definition of the interactions between these entities is openly evolving; anyone interested is welcomed to join the discussions and see the initial proposals at lists.wikia.com/mailman/listinfo/

atlas-l over the coming weeks.

As a kind of gauntlet, Jeremie threw down a summary challenge, “Nobody will beat Google, but EVERYBODY will.”

Vigorous discussion ensued, as a rapidly growing community of developers began getting into what Jeremie calls “the dirty work of building it openly now”. As that work began, I asked him if it would be cool to flow some of our conversation over here to Linux Journal. He said sure, so here it is.

DS: There is always this tug between monolith and polylith. The irony of the Net as a Giant Zero (world of ends) is that it is entirely polylithic—or wants to be.

Centralizing a future polylithic protocol into a monolithic service is one way it starts. But the end state is polylithic.

JM: I agree, it’s inevitable. It’s the being of the Net itself that ultimately demands it, but Google is fighting to be a monolith for as long as possible...and that’s fine, they’ll embrace Atlas when they see it providing value.

DS: In your announcement [above] I see the seeds of a credit-where-due-based economic system—one in which we might obtain finders fees, or something like that, as value is given for service performed.

JM: The attribution-based model, yes.

Absolutely the middle man needs to be

involved in the transaction. Atlas doesn’t flow the money, but it does flow the information and provide a framework.

The same with advertising. Really, contextual ads are very helpful. I rely on them as a tool when using Google search. And in fact, that model is the best form of fighting Web spam.

How the system works, and who is involved in the flow of information, is completely transparent, so the three actors—a Factory, a Collector and a Broker—are all involved in providing a search result. A Broker works on behalf of the Searcher. They have relationships with the appropriate Collectors, plural, and perform the queries—

assembling all the relevant “Knuggets” they get back from the Collectors, valuing them based on whatever metrics they want, including talking to a “sponsored” Collector who serves only commercial results, any “local” Collectors for regional areas, and so on.

Competition is fierce when anyone can be a Broker for almost no cost other than relationships. So, a Collector has one job, provides relevant results, and has to compete with anyone else to do so. And, it can judge the relevancy via any algorithmic, human-reputation-based, or combination.

DS: Open source at the production end has always been a meritocracy. Seems to me the equivalent with Atlas to “show me the code” is “show me the results”. Or at least, “show me the relevancy”. No?

JM: The Factory is managing commodity access to the refined content, doing all the work of normalizing the Web. Search results are just merit: who has the best. And a Broker going to many sources, many Collectors (there can be lots of them) gains lots of merits in different forms. So foxmarks has a great database of deep links that are very important to people. That way Mitch can serve high-quality results but only for certain categories of queries. And Wikipedia can serve another category of queries with high relevancy—as can local yellow-page-style systems, as can social networks for people queries.

DS: I see implicit in this a respect for the snowballing nature of knowledge, both for individuals and for groups. To be human is to grow what one knows.

Authority is the right we grant certain others to contribute to what we know—and to change us in the process. Knowing more makes me different.

As has been said elsewhere, “we are all authors of each other”.

JM: Yes, the fundamental unit of Atlas is a “Knugget”, a Knowledge Nugget essentially, a search result. A Factory adds value based on what it knows about the content;

a Collector adds value based on what it knows about keywords and ranking Knuggets; and a Broker adds value based on what Collectors it knows and what value they provide in aggregate.

DS: Wikipedia, in growing its own relevance, is an

interesting example. I’ve been looking at radio stations and Webcasters. Wikipedia on the whole is a great source of info, but it’s far from complete.

And, it needs a better way to stay complete than just relying on narrow subject obsessives to stay on top of the current narrative. Search results that feed into better Knuggets that turn into better Wikipedia entries should be a Good Thing, no?

JM: Yes.

DS: Is a Knugget “something somebody wants to know”? I like the word. How about if it’s a combination of keywords that may change over time?

JM: A Knugget is one unit of context, as I define it. It may be a title and a link; it may be a sentence saying some-thing about a noun; it might be a row from a table of things. It’s human-defined and, therefore, very fuzzy by its very nature. It’s “What would a human recognize and make some sense of, out of context of anything other than what’s contained inside of it?” The Web is human, not machine, and Atlas reflects that.

DS: I like contextuality. The summit of Mt. Everest can be an elevation, a sum of climbers, a single fact (such as, it is marine limestone).

JM: Yes. The very nature of Atlas is to demand that a Factory produces the best Knuggets, that a Factory

“understands” the content as best as it can. It is a model that rewards human understanding and value first. All derivative knowledge is built atop that foundation, and we can reduce all of these inflated DB/schema disasters, which all serve the machine first. Atlas works in the same way that the Internet served people first, and servers/data centers second—and the Web content serves people first and software second. Search-as-Internet-infrastructure must serve people first.

DS: Yes. I think there is a Static (traditional Google) vs. Live (human, now, evolving) distinction here.

Allow me to quote myself at a bit of length: “And as for live feeds of Knuggets as they are produced, any content provider can get the most value by generat-ing these feeds of Knuggets into Collectors they trust, search results can be instantly rewarded, a Searcher can find that breaking-news article immediately.”

One becomes an Atlas Broker just by being involved, I would gather.

JM: Just by searching, and knowing whom to ask for results.

DS: How do you want to seed this thing? Where does actual use start first?

JM: An Atlas Broker should be living here in my IM client, using all this great context, LOCALLY, to search my IM his-tory as well as provide the best relevant search results from any content store I want it to.

DS: Who is the Broker?

ABERDEEN

In document + TrolltechGreenphone Program the (Page 42-45)