ontolog-forum

Re: [ontolog-forum] the data mining craze

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Stephen Young <steve@xxxxxxxxxxxxxxxx>
Date: Tue, 1 Mar 2011 14:42:05 +1100
Message-id: <AANLkTingxKapo4KZgHSin4fkdNM=kUTUMwG6Vd6j6_9z@xxxxxxxxxxxxxx>

> This is very interesting, can you go into details? There is also an interesting short
> paper by Jain, Hitzler, and others called 'Linked Data is Merely More Data' (available
> at knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf). It highlights
> the need for ontologies to make the data more useful.

I don't think there's much to be gained by an analysis of the quality of data in dbPedia - it is what it is.
To my mind, it's all about usefulness - and we found it very difficult to "graph" most of the triples in dbPedia
in a consistent manner.  The paper you've cited addresses this issue of consistency by noting the problems of
"Schema heterogeneity" and "Entity Disambiguation" across triple repositories - but I'd say these are also issues
*within* repositories.  Certainly they are for dbPedia - the infobox triples especially. The heavy use of literals
in place of Resources and the lack of any grounding for the predicates were particular problems.
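
To make the literal-versus-Resource problem concrete, here is a minimal sketch of how one might separate triples whose object is a Resource (a URI) from those whose object is a bare literal. It is an illustration only, not the filter used for wik.me: the `Triple` type, the `is_resource` test, and the two sample triples are all assumptions made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # either a URI (a Resource) or a plain literal string


def is_resource(term: str) -> bool:
    """Assumption: anything that looks like an http(s) URI counts as a Resource."""
    return term.startswith(("http://", "https://"))


def split_by_object_kind(triples):
    """Return (linked, literal): triples with a Resource object vs. a literal object."""
    linked, literal = [], []
    for t in triples:
        (linked if is_resource(t.obj) else literal).append(t)
    return linked, literal


# Hypothetical sample data: the same fact stated once with a Resource, once with a literal.
sample = [
    Triple("http://dbpedia.org/resource/Berlin",
           "http://dbpedia.org/property/country",
           "http://dbpedia.org/resource/Germany"),
    Triple("http://dbpedia.org/resource/Munich",
           "http://dbpedia.org/property/country",
           "Germany"),
]

linked, literal = split_by_object_kind(sample)
```

Only the first triple can be followed to another node in a graph; the second ends in the string "Germany", which would need disambiguation before it could be linked to anything.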

> I think Linked Data gives us this great opportunity to test our Semantic Web ontologies and technologies
> using real and massive data sets.

It sounds great, in theory - but I have real reservations about the initiative.  It's interesting that you should bring this up.
I'm attaching a post that I sent to the semantic-web@xxxxxx list a few days ago.  For some reason it doesn't seem to have made it through moderation - so perhaps the heresy of it was a bit much ;-)


=========================


As the http://wik.me project gathers more interest, I'm getting some valuable feedback and engaging in some interesting discussions.  I thought this excerpt re. Linked Data might be of particular interest to this list:

The context for it is this: We've used many of the triples generated by the DBPedia initiative to build the semantic graph that backs the wik.me site. Despite the fact that the graph largely "rewires" the DBPedia data, it's been suggested that we should link back to the relevant DBPedia resource from every page.  I've argued that such linking doesn't serve the wider interest and I'd be keen to hear the opinions of a wider audience.

....

> That's not the point. Accuracy and other quality factors re. Data are inherently subjective.

So... if I create "data" that says Obama is a Republican, from some "subjective" position this has value?  And even if you can make a case for such a position, how much does this "data" subtract from Linked Data's usefulness for everyone else?

> You've used DBpedia data, you've added value to it, so please keep the URIs in scope of User
> Agents, at the very least re. proper attribution.

Sure.  We'll modify the FAQ - but I'm not accepting the assertion that blind data linking is to everyone's benefit.  Someone will have to make that case.  I've not seen it - from TBL or anyone else.

> I don't know what you mean by first person reference to DBpedia team. I suspect you mean the folks that deal
> with the Wikipedia extraction? DBpedia is much more than extraction from Wikipedia.

Of course. <http://wiki.dbpedia.org/Team> ;-) I wonder if you can see the inconsistency in your views that your statement here demonstrates - you're effectively saying "accuracy in your English statements is important but it doesn't matter what you put in your triples".

I don't mean to be confrontational here and I thank you for helping me clarify my position on Linked Data.  Facts and data are quite different from web pages in that veracity is more important than freedom of speech.  The "throw everything in" ethos of Linked Data is a problem, and the initiative is likely to flail around like a beached whale until some kind of semantic Google comes along to sort through the dross.

As people who care about making knowledge and data universally accessible we have to ask ourselves whether a corporate Semantic Google is what we want - particularly since it's likely to be some kind of large devolved Upper Ontology that becomes a hub for everyone's data.

....

I think "Linked Data" could benefit from its own SIG - and certainly a Mission Statement.  Something that says more than just "let's link everything up".

If anyone wants to discuss this issue in more detail than is appropriate here, I've also posted this excerpt to http://knowledgerights.org/group/access.  


=================================================


Steve



On 1 March 2011 11:40, Krzysztof Janowicz <jano@xxxxxxx> wrote:
Steve,

thanks for your detailed reply.


> The connection to Ogden is pretty tenuous, I'll admit and it's an example of a dbPedia
> triple that shouldn't have made it past the filter.  I might add that more than half
> of DBPedia's triples were dumped because of their quality.

This is very interesting, can you go into details? There is also an interesting short paper by Jain, Hitzler, and others called 'Linked Data is Merely More Data' (available at knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf). It highlights the need for ontologies to make the data more useful. I think Linked Data gives us this great opportunity to test our Semantic Web ontologies and technologies using real and massive data sets.

Best,
Krzysztof



On 02/28/2011 07:29 PM, Stephen Young wrote:

Thanks for your feedback Krzysztof

> Frankly speaking I am having a hard time making sense out of wik.me. For instance,
> I typed in Germany and got about 30 results in the 'Germany may include' list. Most
> of them were about German soccer teams, heavy metal bands, or the city Ogden in Utah.

The connection to Ogden is pretty tenuous, I'll admit and it's an example of a dbPedia
triple that shouldn't have made it past the filter.  I might add that more than half
of DBPedia's triples were dumped because of their quality. 

This is what you get when you ask a machine to read human language.  But wik.me *does*
give you the power to change these errors - which I've just done on the Ogden page with the
comment "@bot, this is not connected to Germany." 

> From my perspective this list does not follow any order nor does it contain relevant
> links about Germany in general (leaving the first link aside).

I guess you're not a heavy metal fan. Some of us are ;-).  Seriously though - the keyword
search list is ranked by popularity.  The site wik.me is a *discovery* tool, not an exercise in
presenting structure.  

> After following the first link from the 'Germany may include' list, I get a single sentence
> definition of Germany and then a category called 'Contains'. This category seems to be spatial
> containment at first but it is not. It lists some German cities, some German states, the Oktoberfest,
> the German term 'Schadenfreude' and so forth. All this comes without any further *structure* and is
> rather confusing.

This is a forum about ontology, so I shouldn't be surprised that everything is looked at through "structure"
glasses.  You will be confused, because the internal structure is not apparent, and the "structure"
presented is created for wik.me by inference.

You're expecting too much from the site - think of it as a search engine with just a little more structure.
 
> After clicking on the city Duesseldorf, I don't get any additional results but just the 'Around the Web' section.

The hope is that people like you, who know something about Dusseldorf, will add some facts about it.  I note that
no-one on this forum has tried that yet so perhaps we need to make the mechanism more obvious.

> Powerset.com, for instance,  followed a similar but way more advanced approach in 2008 before they got bought
> and closed by Microsoft. The same is true for Freebase and other projects that provide structured data.

No.  This I take issue with.  Powerset may have developed some advanced NLP, but the scope, intent and execution of the project were very different.  Scope and intent for Freebase are similar, but wik.me is fundamentally different
in its structures and in the way it allows people to interact with data.  Ask your eight-year-old daughter to add a
fact about Dusseldorf at Freebase and see how you go ;-)

> Again, maybe I just used the wrong terms for testing, but I do not see where wik.me is going or who will use it.

It's a proof-of-concept for us.  Until we can get more data into it, I'll concede that for most it has little more than curiosity value.

Steve




On 1 March 2011 02:20, Krzysztof Janowicz <jano@xxxxxxx> wrote:
Frankly speaking I am having a hard time making sense out of wik.me. For instance, I typed in Germany and got about 30 results in the 'Germany may include' list. Most of them were about German soccer teams, heavy metal bands, or the city Ogden in Utah. From my perspective this list does not follow any order nor does it contain relevant links about Germany in general (leaving the first link aside). The 'Around the Web' section lists some links also contained on the first page of a Google search for the same term. 

After following the first link from the 'Germany may include' list, I get a single sentence definition of Germany and then a category called 'Contains'. This category seems to be spatial containment at first but it is not. It lists some German cities, some German states, the Oktoberfest, the German term 'Schadenfreude' and so forth. All this comes without any further *structure* and is rather confusing. After clicking on the city Duesseldorf, I don't get any additional results but just the 'Around the Web' section.

Powerset.com, for instance,  followed a similar but way more advanced approach in 2008 before they got bought and closed by Microsoft. The same is true for Freebase and other projects that provide structured data.

Again, maybe I just used the wrong terms for testing, but I do not see where wik.me is going or who will use it.

Best,
Krzysztof




On 02/28/2011 12:36 AM, ZENG, MARCIA wrote:
@Doug: Great analysis. Steve already explained what they have used. I am forwarding it just in case. --Marcia

 Forwarded Message
From: Stephen Young <steve@xxxxxxxxxxxxxxxx>
Reply-To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Date: Sun, 27 Feb 2011 18:55:19 -0500
To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Subject: Re: [ontolog-forum] the data mining craze


> It would be interesting to see the taxonomy, for example, ‘shape’ is the first under ‘people’.
> Thanks for sharing this interesting service!

Our pleasure, Marcia :-)

What you found is a basic categorisation that wik.me <http://wik.me>  uses to group concepts - mainly for page presentation purposes.  wik.me/1 <http://wik.me/1>  is what you get when it can't find any concept that closely matches your search.

The real "taxonomy" is derived from WordNet - the top level concepts can be traced directly to WordNet noun synsets.  WordNet is a fantastic resource, and this has been a common strategy.  Root is "entity" at http://wik.me/2s .

I mentioned in my first post to this forum that our aim was to create a structure that could serve as a kind of devolved universal ontology/universal data schema. The challenge has been to find a structure that maintains this universality, but still offers some usefulness.  What we have at the moment has even fewer axioms than WordNet - and I'm sure we could introduce more.  It's a work-in-progress, and I'd certainly value the input of anyone on this forum who is interested.

Steve


On 2/27/11 11:16 PM, "doug  foxvog" <doug@xxxxxxxxxx> wrote:

On Sun, February 27, 2011 11:44, ZENG, MARCIA said:
> I happen to find the taxonomy behind wik.me, starting from the high level:
>
>  *   organization
>  *   person
>  *   production
>  *   location
>  *   event

Very general concepts included in wik.me are not subclasses of anything
in this list.  A top-level concept that SHOULD include all of these is
"Entity", defined as "That which is perceived or known or inferred to
have its own distinct existence (living or nonliving)."  I suppose
things like "corner" would not be entities, since they have no
independent existence.

I don't see that wik.me has a taxonomy.  It has concepts, which are
specified as "of" one or two other concepts.  This "of" can sometimes
mean a subclass relation, sometimes mean an instance of relation, and
other times have other meanings.

For example, the "Corner" which is defined as "an interior angle
formed by two meeting walls" is "of" both "building" and "area".

The listed set leaves things such as organisms out.  One would think
that they all should be subclass of "Entity".  If you follow some type
of animal up the hierarchy, you find at many levels it is "of" both
the next more general taxon AND "of" the current taxon type.  E.g.,
the concept "Chordata" is "of" both "phylum" and "Animalia".  However,
"Animalia" is only "of" "kingdom", it is not "of" "Organism".  The
concept "animal" is "of" both "Animalia" and "Organism", but no chain
of "of"s links concepts for most types of animals to the concept "animal".
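
The missing chain can be checked mechanically. The sketch below encodes only the "of" links quoted in the paragraph above and searches upward from a concept; the dictionary layout and the `reachable` helper are assumptions for illustration, not wik.me's actual data model.

```python
from collections import deque

# "of" links exactly as described above: concept -> concepts it is "of".
OF = {
    "Chordata": ["phylum", "Animalia"],
    "Animalia": ["kingdom"],
    "animal": ["Animalia", "Organism"],
}


def reachable(start, target, of=OF):
    """Breadth-first search: can `target` be reached from `start` by following "of" links?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for parent in of.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


print(reachable("Chordata", "kingdom"))   # True: Chordata -> Animalia -> kingdom
print(reachable("Chordata", "animal"))    # False: no "of" chain reaches "animal"
print(reachable("Chordata", "Organism"))  # False: "Animalia" is not "of" "Organism"
```

With these links a chordate can be traced to "kingdom", which is a taxon type, but never to "animal" or "Organism", which is what one would expect a subclass hierarchy to deliver.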

Wik.me provides some interesting results, but it is no taxonomy.

-- doug foxvog

> http://wik.me/1#foundPages

> At each 'category' there is also a synonym ring, for example:
>
> Person
>
> Of people, organism and causal agent     May also be referred to as
> individual, mortal, somebody, someone and soul.
>
> A human being; "there was too much for one person to do".
>
> It would be interesting to see the taxonomy, for example, 'shape' is the
> first under 'people'.
> Thanks for sharing this interesting service!
> Marcia
>
> On 2/27/11 4:12 AM, "Stephen Young" <steve@xxxxxxxxxxxxxxxx> wrote:
>
>
> Pavithra, I think you must have misspelled "Einstein".
> http://search.wik.me/search.htm?words=Albert+Einstein  returns 20+
> concepts named for Albert Einstein - and the topmost result is the man
> himself.  And that list is something you CANNOT get from Google.
>
> Clicking the top result http://wik.me/lfn2 ("Albert Einstein") also gives
> you something you can't get from Google - a self-organised presentation of
> what wik.me <http://wik.me>  "knows" about Einstein.  Google knows
> *nothing* about Einstein but where to find pages that contain the string
> "Albert Einstein".
>
> Structured data is always going to permit greater functionality than
> keyword indexing.  If it didn't, you and I wouldn't have a job ;-)
>
> But of course Google is more robust - it would have detected your spelling
> mistake and given you the most-likely valid alternative.  So it should be
> with 2000 engineers and over a decade of refinement.
>
> wik.me <http://wik.me>  can also only return results based on the data it
> has mapped, which means it's a valid alternative to Google for only a
> minority of searches.  Our estimates suggest that with all organisations,
> products and services in, we should give a much better experience for
> around 65% of all searches currently made against Google.  That's next.
>
> Steve
>
>
>
> On 26 February 2011 23:07, Pavithra <pavithra_kenjige@xxxxxxxxx> wrote:
> wik.me <http://wik.me/>  is another search tool with a LIST of results ..
> does not provide anything more than Google would.  Google is more robust .
>   This  uses information from answers.com <http://answers.com>  etc..
>  The word "Albert Einstein" did not get a result at all, but a list of
> names that started with Albert and did not include Einstein.
>
> Qwiki.com actually provides information on what is typed in.  When it can
> not find the actual information ( NOT A LIST)  it simply says it did not
> find it.  For example, if you type world's tallest building, it did not
> find any information.  They need to include a lot more data sets.
>
> Qwiki.com seems to be more in the direction of web 3.0 mobile apps with
> plenty of room to grow.    But the audio is very mechanical, unlike
> Watson's voice. (  :-) )!
>
> Pavithra
>
> --- On Fri, 2/25/11, Stephen Young <steve@xxxxxxxxxxxxxxxx> wrote:
>
> From: Stephen Young <steve@xxxxxxxxxxxxxxxx>
> Subject: Re: [ontolog-forum] the data mining craze
> To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
> Date: Friday, February 25, 2011, 6:28 PM
>
>
> We actually characterised qwiki as the reverse wik.me <http://wik.me>
> when we first saw it ;-)  All style no substance.  There may have been
> some bitterness ;-) - we both applied to launch at TechCrunch Disrupt last
> year.  They got in, we didn't.
>
> I'm frequently amazed by what captures (and fails to capture) the
> imagination of the technology pundit.  We presented a site/app that is a
> quantum improvement over Web 2.0 structured data plays like Freebase and
> Factual.  Among other things our video demonstrated that anyone could
> change complex structured data with simple twitter-like comments - and yet
> we didn't make the cut.  Qwiki went on to win the TechCrunch Disrupt prize
> - followed soon after by some serious venture funding.
>
> Mind you, this forum is little different.  I've just announced the
> ontological equivalent of a flying car here and received no more interest
> than a few private messages ;-)
>
>
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>


=============================================================
doug foxvog    doug@xxxxxxxxxx   http://ProgressiveAustin.org

"I speak as an American to the leaders of my own nation. The great
initiative in this war is ours. The initiative to stop it must be ours."
    - Dr. Martin Luther King Jr.
=============================================================



-- 
Krzysztof Janowicz

GeoVISTA Center, Department of Geography, 302 Walker Building
Pennsylvania State University, University Park, PA 16802, USA

Email: jano@xxxxxxx
Webpage: http://www.personal.psu.edu/kuj13/
Semantic Web Journal: http://www.semantic-web-journal.net


--
Stephen Young
CEO @ factnexus.com
Architect @ wik.me
Founding member @ knowledgerights.org
