
Re: [ontolog-forum] PDF and the semantic web

To: Pat Hayes <phayes@xxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Alexander Garcia Castro <alexgarciac@xxxxxxxxx>
Date: Thu, 12 Feb 2009 01:26:03 +0100
Message-id: <1ba2d5730902111626y3f784fbl78d9dea03118fba5@xxxxxxxxxxxxxx>
Thanks for your replies; interesting points. The core of my request has to do with tagging *atomic content* of the PDF (not tagging the whole document).

As for XMP, here is a portion of an email I received on another mailing list:

"XMP being a single separate component of the document, I don't see how it helps, unless there is an obvious way to refer to any element within the document."

Duane, you are an Adobe person: is this possible? Not by using Adobe Reader, but by "opening the file".
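One way to tag content inside a PDF without Adobe tooling is to keep the annotations outside the file and point into it by URI (stand-off annotation). Below is a minimal sketch under stated assumptions.

- The `example.org` URIs, the `ex:` vocabulary and the helper names are hypothetical.
- The `#page=N` fragment (a common PDF open parameter) addresses only a whole page. It does not reach a single word or image, which is exactly the gap in question.

```python
# Hypothetical sketch: stand-off RDF annotations (N-Triples) that point *into* a PDF.
# EX and the example.org URIs are made up for illustration; "#page=N" addresses a
# whole page, so finer targets (a word, an image) would need an agreed-upon scheme.

EX = "http://example.org/vocab#"  # hypothetical annotation vocabulary
ANNOTATIONS = "http://example.org/annotations/"  # hypothetical base for annotation IRIs


def iri(value: str) -> str:
    """Wrap an IRI in the angle brackets N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslashes, double quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def tag_pdf_page(pdf_url: str, page: int, annotation_id: str, tag: str) -> list[str]:
    """Return N-Triples linking a tag to one page of a PDF via a fragment URI."""
    subject = iri(ANNOTATIONS + annotation_id)
    target = iri(f"{pdf_url}#page={page}")
    return [
        f"{subject} {iri(EX + 'target')} {target} .",
        f"{subject} {iri(EX + 'tag')} {literal(tag)} .",
    ]


if __name__ == "__main__":
    for line in tag_pdf_page("http://example.org/paper.pdf", 3, "a1", "gene expression"):
        print(line)
```

The PDF itself is never modified here. The open question is whether there is a stable way to address elements finer than a page.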

Again, thanks. Cheers.

On Thu, Feb 12, 2009 at 12:57 AM, Pat Hayes <phayes@xxxxxxx> wrote:

On Feb 11, 2009, at 4:47 PM, Duane Nickull wrote:


Most of what you wrote below is inaccurate. PDF is not closed.

I didn't say it was. I think you need to respond to Alexander rather than to me. 

Also, PDF, while not concerned with semantic declarations, uses a secondary format called XMP, which it shares with the Creative Suite products; XMP is based on RDF and expressed in XML.
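Because XMP is RDF serialized as XML, an ordinary XML parser can read it. Here is a minimal sketch using only Python's standard library, with these assumptions:

- It parses a hand-written sample packet.
- It handles only element-form Dublin Core properties; attribute shorthand and nested structures are ignored.
- The helper name `dublin_core` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the XMP, RDF and Dublin Core specifications.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample packet for illustration (not taken from a real file).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">PDF and the semantic web</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Alexander Garcia Castro</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def dublin_core(xmp: str) -> dict[str, list[str]]:
    """Collect element-form dc:* properties from every rdf:Description."""
    root = ET.fromstring(xmp)
    dc_prefix = "{" + NS["dc"] + "}"
    result: dict[str, list[str]] = {}
    for desc in root.iterfind(".//rdf:Description", NS):
        for prop in desc:
            if not prop.tag.startswith(dc_prefix):
                continue
            name = prop.tag[len(dc_prefix):]
            # Values sit either in rdf:Alt/Seq/Bag items or directly as element text.
            items = [li.text or "" for li in prop.iterfind(".//rdf:li", NS)]
            if not items and prop.text and prop.text.strip():
                items = [prop.text.strip()]
            result.setdefault(name, []).extend(items)
    return result


if __name__ == "__main__":
    print(dublin_core(SAMPLE_XMP))
```

This reads document-level metadata only. It does not address the question of pointing at individual elements inside the PDF.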


Ah, I did not know this. Thanks for the information. 


There are multiple libraries, most open source, for dealing with XMP so metadata can be extracted easily in multiple languages for free.
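To show how little is needed when the metadata stream is stored uncompressed, the sketch below uses XMP "packet scanning". It searches the raw bytes for the `<?xpacket begin=...?>` / `<?xpacket end=...?>` wrapper. This is a rough shortcut resting on several assumptions, not a substitute for the libraries mentioned above:

- It misses packets inside compressed (e.g. FlateDecode) streams.
- It assumes UTF-8 encoding.
- It may also return stale packets left behind by incremental updates.
- The function name is hypothetical.

```python
import re
import sys

# XMP packet wrapper: <?xpacket begin="..." id="..."?> payload <?xpacket end="w"?>
PACKET_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL)


def find_xmp_packets(data: bytes) -> list[str]:
    """Return the payload of every uncompressed XMP packet found in raw file bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace").strip()
        for match in PACKET_RE.finditer(data)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_xmp.py some.pdf
    with open(sys.argv[1], "rb") as handle:
        for packet in find_xmp_packets(handle.read()):
            print(packet)
```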



PDF is, as you noted, concerned with document fidelity and layout for consistent, cross-platform rendering. While it does have structure, it is similar to HTML in terms of semantics (headers, body, paragraph, etc.).

There are roughly 40-50 vendors with various libraries to access PDF. True, we think we offer the best, but the others are based on the ISO standard as well.

This needed to be corrected.



On 11/02/09 2:23 PM, "Pat Hayes" <phayes@xxxxxxx> wrote:

On Feb 11, 2009, at 9:45 AM, Alexander Garcia Castro wrote:

Sorry if this is not the right venue; I decided to send this email because in the past I have seen some semantic web issues being discussed here.

I think one of the W3C mailing lists might be more suitable for this topic. Try semantic-web@xxxxxx

I would like to know how applicable the PDF format could be within the context of the Semantic Web.

Not very: the primary purpose of PDF is making visually accurate documents, rather than conveying semantic information.

The PDF format is closed; annotating PDFs (tagging not the file itself but the information within it) is not possible by means other than those provided by Adobe. For instance, if I wanted to tag a word or an image inside a PDF, I would have to do it with my Acrobat Reader (the latest version). If I wanted to support such an operation via the Web, I could do so only if I had the XSLT to transform the PDF into XML.


This limitation is, IMHO, a huge one within the context of the semantic web where we should be able to define links and use them.

I don't quite see why you feel this is a SWeb problem.

Furthermore, being forced to have a third party application just for displaying a file that should be displayed directly by the browser is not a nice feature.

That is an issue for browser implementations.

If PDF were open, it could be rendered by the browser. Aren't closed formats such as PDF viable within the context of the SW?

I'm not sure what exactly you are asking here.

After all, PDF was a solution within the context of portability and exchange of information

... for human readers, yes. But not for software inference agents, which is the point of the SWeb.

; the main problem it was solving was a simple one: "I want my document to look the same everywhere, on screen and in print" and "I want people to be able to read my documents without losing the formatting and without having to consider the OS". Isn't PDF obsolete within this context?

What context? PDF seems to work well for its intended purpose. (Maybe I haven't understood your point.)

Seriously, I suggest re-sending your message to the semantic-web@xxxxxx mailing list.

Pat Hayes

Duane Nickull, Vancouver, BC Canada
Senior Technical Evangelist - Adobe LiveCycle ES and Enterprise
Duane's World TV Show - http://tv.adobe.com/#pg+1537
Blog - http://technoracle.blogspot.com
Twitter - http://twitter.com/duanechaos
Community Open Source Music - http://www.mixmatchmusic.com
My Band - http://www.myspace.com/22ndcentury

IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Alexander Garcia

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
