Is Google Objective? Manual Edits in Search Results
Thanks to BLOG OUTER COURT, a THINKER

Google claims that their search results “are generated completely objectively
and are independent of the beliefs and preferences of those who work at Google.”
Elizabeth Van Couvering, at the Berlin search engines workshop on Monday, argued
however that Google didn’t find their search engine “on the road.” It’s true
that every ranking algorithm for a search engine was at one time written by a
necessarily subjective human engineer, so how objective can a search ranking be?
I think we can break up this question into four different levels of
“objectivity.” (These levels may overlap more than they are clearly separate
entities.)
Perceived results relevance without manual edits
By “perceived relevance” I mean that an engineer at Google, or at any other
search engine, is trying to rank the “best” pages on top for a particular search
query (the more relevant a search result is to its query, the better for the
user), and that what’s relevant or best is a highly subjective issue to begin
with. An engineer, programmer or developer must come up with a basic concept for
ranking pages, like “let’s think of web links as votes on pages, and call this
thing PageRank.” An engineer must then evaluate the search
results for different queries (with the help of feedback by external quality
testers, actual usage data and so on), and fine-tune the algo again, for example
to battle search result spam.
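To make the “links as votes” idea concrete, here is a minimal sketch of a PageRank-style calculation. It is not Google’s implementation: the function name `pagerank`, the damping factor of 0.85, the iteration count, the handling of pages without outgoing links and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by treating every outgoing link as a vote.

    links maps each page to the list of pages it links to.
    All parameter values are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small base score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: two pages vote for c.example.
toy_web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
ranks = pagerank(toy_web)
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

Even in this tiny version, someone has to pick the damping factor and decide what happens to pages that link nowhere, and each of those choices shifts the resulting order.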
Now, I don’t think there’s a way to get around the subjectivity on this level,
because there is no such thing as a truly “objective” result ranking. Any
ranking must reflect the human values of the team who came up with the ranking
algos (or of those who judged the results through feedback polls), unless indeed
we find the source code on the road, which wouldn’t be too desirable even if
it were realistic. At this level, we can however argue that to some extent,
“all individual pages and search queries are equal.”
Perceived results relevance through manual edits
Beyond ranking pages purely automatically and algorithmically, manual edits
treat certain pages or search queries as “unequal,” meaning they receive special
treatment (we can still talk about algorithms, but these algorithms are peppered
with hand-picked data):
- A Google engineer might try to create an adult filter for search
rankings, removing adult websites from the listings; for this,
some manual evaluation, or a database of adult words or sites is
needed to seed the system (e.g. “sex” or “sex.com”). Adult filters
may help to avoid “shocking” results from turning up for kids. At
the same time, an adult filter must make sure all the
“porn-googling adults” still find their stuff, so there’s usually
a setting to turn it off.
- A Google engineer might code an annotation in reaction to
protests related to a specific search result. This happened when people
started to complain about an anti-Semite website being on top for the search
query “jew.” After Google saw these protests (and the accompanying Googlebombs)
escalate, they put a little link on top of organic results reading “We’re
disturbed about these results as well. Please read our note.” I’m not arguing
that these annotations are bad, I’m arguing that this is a manual edit that’s
not objective, for better or worse (if you think this is objective, then think
about all the search results you feel are harmful but for which there is
no such disclaimer).
Also, one might think of ads next to search results as a type of paid
annotation. Another type of future annotation system might include putting an
“unsafe” icon next to spyware sites, as the SiteAdvisor plugin does today.
- A Google engineer might determine a set of seed sites
which are deemed positive. This way, you can algorithmically generate all
sorts of other data about the web based on this initial value of what’s a good
seed site. E.g. if you choose Slashdot.org as a “cool” site, then you can
create an algorithm saying “the further away a site is from Slashdot in terms
of degrees of linking, the more uncool it may be.” (I actually don’t believe
Google uses seed sites, but of course no one outside of Google knows.) The
obvious subjectivity included in this approach is the selection of seed sites;
who’s to say Slashdot is really so cool?
(The concept of seed sites works in reverse
too, seeding the system with bad apples to then write algos to
down-rank sites in that neighborhood; it’s likely Google does
indeed use penalized spam sites as bad seeds, the so-called “bad
neighborhood.”)
- The Google death penalty “googleaxes” spam sites. Or to
put it less colorfully, sometimes websites trying to game their own search
engine ranking are removed from results (or they are ranked at the very bottom
of results where almost no one ever sees them). This happened, for example, to
BMW.de at one time because they used keyword-stuffed doorway pages to lure
searchers onto their site. Obviously removing spam from search results is a
good thing, and as search spam tries to skew results, battling it is
actually a step towards objectivity or neutrality.
Now, the Google death penalty may be exercised manually or automatically. In
the case of the German BMW site, the edit was obviously manual, as the site’s
doorway pages stayed untouched for years and were only removed after a public
discussion about them erupted on blogs. We can imagine that Google engineers in
general prefer to determine automatically what’s spam and what’s not, simply
because this is more efficient given the galactic number of websites.
- Copyright laws defend a web author’s work from being
copied all over the place, unless the webmaster freely shares the content.
With the Digital Millennium Copyright Act, you can snail mail Google Inc when
you feel your copyrights have been violated. As an example, try searching for
site:xenu.net... on the bottom of the result page in Google.com you’ll
see: “In response to a complaint we received under the US Digital Millennium
Copyright Act, we have removed 1 result(s) from this page.” (Who pressured
this particular site out of Google’s results? The Church of Scientology did.)
Whether or not a search result is the right place to “manually edit” copyright
infringements is another issue; you may argue Google only makes “fair use” even
when it links to unfair-use websites (the republished Google cache feature aside
for the sake of argument).
- A search engine of the future may also have its creators manually edit
results based on privacy issues. When the world becomes more
and more digital and transparent, there may be a stronger concept of “fair
privacy” and not just “fair use.” At the moment, when you google the name of a
stranger you may dig up old stuff about them that even they had forgotten about,
from rants made in a newsgroup in 1994 to images posted on some exotic web forum
in 2002.
- Engineers may manually adjust results for popular searches.
E.g. search engine creators might decide that their engine’s results for the
frequent query “dating” aren’t too good, and that they can’t figure out a way to
improve this algorithmically at the time, so they just inject a list of
manually selected URLs to rank on top. From what we know, Google doesn’t do any
of this “short-tail” manual reorganizing of rankings (other search engines
might), but they do have semi-automated “onebox” results on top (semi-automated
as they’re only triggered for certain searches, and as they’re also often
restricted to certain content providers). For example, when you enter age
of george bush the top “result” will be a onebox reading “George Bush –
Date of Birth: 12 June 1924.”
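The seed-site idea from the list above (“the further away a site is from Slashdot in terms of degrees of linking, the more uncool it may be”) can be sketched as a breadth-first walk over the link graph. This is only an illustration under stated assumptions: the names `link_distance` and `coolness`, the `decay` value, the choice to follow outgoing links from the seed, and the toy graph are all hypothetical, and nothing here is known to match what any search engine actually does.

```python
from collections import deque


def link_distance(links, seeds):
    """Return the fewest link hops from any seed site to each reachable site."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            # The first visit in a breadth-first walk is the shortest path.
            if target not in distance:
                distance[target] = distance[site] + 1
                queue.append(target)
    return distance


def coolness(links, seeds, decay=0.5):
    """Score sites: seeds get 1.0, and every extra hop multiplies by decay."""
    hops_by_site = link_distance(links, seeds)
    return {site: decay ** hops for site, hops in hops_by_site.items()}


# Hypothetical link graph; the subjective step is choosing the seed.
toy_web = {
    "slashdot.org": ["geek.example"],
    "geek.example": ["friend.example"],
    "friend.example": [],
    "island.example": ["slashdot.org"],
}
scores = coolness(toy_web, seeds=["slashdot.org"])
```

Sites that cannot be reached from a seed (here `island.example`) get no score at all. Seeding the same walk with penalized spam sites and reading the score as “badness” would give the “bad neighborhood” variant.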
Perceived results irrelevance through manual edits (for a perceived larger overall relevance)
In the previous examples, we can see that while we can’t always tell if manual
removals and such were fair, we can always argue that at least the
search engine creators deemed them fair. E.g. removing spam sites makes
the search engine return more relevant results on top. However, there’s another
type of manual edit: the one where even the search engineers agree that results
are made worse. I’m thinking of the thing we stop calling “filter” and start
calling “censorship.”
For example, when Google agreed to self-censor German search results based on
a manual blacklist of sites (e.g. those containing Neo-Nazi material), they did
so voluntarily, but one might argue they didn’t really like doing it.
They made the decision in reaction to semi-official German regulations, possibly
trying to prevent further, stronger censorship, or at least trying not to stay
out of the German market on principle. This was a very clear clash with Google’s
principles; you just had to read their help files at the time, where they said
they don’t censor*. This also made results, taken on their own, less relevant;
clearly entering stormfront.org and getting no results (on Google.de) is worse
than getting the actual Stormfront.org site as a result (on Google.com), at
least measured by relevance.
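Mechanically, a market-specific blacklist like this is just a filter applied after ranking. The sketch below assumes a hypothetical `BLACKLISTS` table, a hypothetical `filter_results` function and placeholder hostnames; the real lists and the way they were applied were never public.

```python
from urllib.parse import urlparse

# Hypothetical per-market blacklists, keyed by market code.
BLACKLISTS = {
    "de": {"blocked.example"},
}


def filter_results(urls, market):
    """Drop results whose host is on the blacklist for the given market."""
    blocked = BLACKLISTS.get(market, set())
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Treat www.blocked.example and blocked.example as the same site.
        if host.startswith("www."):
            host = host[4:]
        if host not in blocked:
            kept.append(url)
    return kept


# The same ranked list, shown to two different markets.
results = ["http://www.blocked.example/", "http://other.example/page"]
german = filter_results(results, "de")
international = filter_results(results, "com")
```

The ranking algorithm itself is untouched; the two markets differ only in which hand-maintained list is consulted.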
Why might there be a potentially larger “overall relevance” for search results
on this level? Well, for example, if Google were to leave the German market on
principle, as they oppose censorship at least by their old standards, they might
leave Germans with what they may deem less relevant results**. Yeah, Yahoo might
be up to par with Google relevance, but I bet
Google engineers think differently – it’s sort of their job to do so. So from
their unique subjective perspective, any market without Google is a market with
less relevant search results***, even when that market may have other search
engines available****.
Results relevance not a top priority
Well, and then there’s the point when search engine creators don’t even have
results relevance as their top priority, mostly replacing it with money-makin’
priorities; we could title this level “plain vanilla evil” or “let’s care about
the money instead of the user”***** or “Dilbert cartoon boss doing random stuff.”
For example:
- Mix organic results with paid results, and don’t label paid
results as such.
- Just take whatever the local gov’t gives you in terms of
blacklists so you can enter the market, or maintain your position in
it (Yahoo used to do this in China from what we can tell). But the line
between “we want to do a little bad to do overall good” vs “we just want
to make money and we don’t care” is blurred; who’s gonna decide if Google
is more noble than Yahoo in China? Certainly we can’t leave that
decision up to Google.
- Push sites of your own company higher up in the organic rankings.
- Clutter search engine results with popup ads and a bunch of other
“features,” or turn the whole thing into a portal (AltaVista made
that error a couple of years ago before they lost their “geek
approval” crown to Google).
- Accept payments from webmasters to let them rank
higher (or be included faster) in search results.