<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Reaching at least 600 000 people with 19 contacts</title>
	<atom:link href="http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/feed/" rel="self" type="application/rss+xml" />
	<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 13:27:45 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Fred</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-645</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Fri, 23 Feb 2007 12:52:11 +0000</pubDate>
		<guid isPermaLink="false">#comment-645</guid>
		<description>Hi Nishad,&lt;br /&gt;
&lt;br /&gt;
Well, Talk Digger is not a traditional search engine, even if you can search for keywords. In fact, the main goal of this search engine is to find who links to a specific webpage (your blog?).&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t find any Malayalam blogs, I would suggest putting the URL of one of them into Talk Digger and then checking who links to it. Then you can start browsing from that seed blog.&lt;br /&gt;
&lt;br /&gt;
For example, here are the people linking to your blog: &lt;br /&gt;
&lt;br /&gt;
http://www.talkdigger.com/conversations/mallu-ungle.blogspot.com&lt;br /&gt;
&lt;br /&gt;
I hope you will find what you are searching for.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Take care,&lt;br /&gt;
&lt;br /&gt;
Fred</description>
		<content:encoded><![CDATA[<p>Hi Nishad,</p>
<p>Well, Talk Digger is not a traditional search engine, even if you can search for keywords. In fact, the main goal of this search engine is to find who links to a specific webpage (your blog?).</p>
<p>If you don&#8217;t find any Malayalam blogs, I would suggest putting the URL of one of them into Talk Digger and then checking who links to it. Then you can start browsing from that seed blog.</p>
<p>For example, here are the people linking to your blog: </p>
<p><a href="http://www.talkdigger.com/conversations/mallu-ungle.blogspot.com" rel="nofollow">http://www.talkdigger.com/conversations/mallu-ungle.blogspot.com</a></p>
<p>I hope you will find what you are searching for.</p>
<p>
Take care,</p>
<p>Fred</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nishad H. kaippally</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-646</link>
		<dc:creator>Nishad H. kaippally</dc:creator>
		<pubDate>Fri, 23 Feb 2007 07:29:49 +0000</pubDate>
		<guid isPermaLink="false">#comment-646</guid>
		<description>I was trying out Talk Digger. I presume you have worked on it. It does not find any Malayalam blogs. There are at least a thousand blogs written in this language.&lt;br /&gt;
&lt;br /&gt;
Please let me know why a large majority of Asian languages (which include 9 Indian languages) are not part of your search results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers&lt;br /&gt;
</description>
		<content:encoded><![CDATA[<p>I was trying out Talk Digger. I presume you have worked on it. It does not find any Malayalam blogs. There are at least a thousand blogs written in this language.</p>
<p>Please let me know why a large majority of Asian languages (which include 9 Indian languages) are not part of your search results. </p>
<p>
Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-644</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Fri, 26 Jan 2007 02:16:14 +0000</pubDate>
		<guid isPermaLink="false">#comment-644</guid>
		<description>Hi Dan!&lt;br /&gt;
&lt;br /&gt;
Wow, this is great! That was fast :)&lt;br /&gt;
&lt;br /&gt;
This is not a problem for the current list of profiles. In fact, the simplest way for me is to get a list of web pages and then crawl them (PTSW already tries to follow elements in HTML documents to RDF documents). &lt;br /&gt;
&lt;br /&gt;
So if you could generate this list for me as a list of URLs separated by carriage returns, I could start crawling them tomorrow or over the weekend. That way I would only have to feed them to it. (In fact, it would be a small agent that reads the list and then pings PTSW with each URL; anybody could do that.)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thanks!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Salutations,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Fred</description>
		<content:encoded><![CDATA[<p>Hi Dan!</p>
<p>Wow, this is great! That was fast <img src='http://fgiasson.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>This is not a problem for the current list of profiles. In fact, the simplest way for me is to get a list of web pages and then crawl them (PTSW already tries to follow elements in HTML documents to RDF documents). </p>
<p>So if you could generate this list for me as a list of URLs separated by carriage returns, I could start crawling them tomorrow or over the weekend. That way I would only have to feed them to it. (In fact, it would be a small agent that reads the list and then pings PTSW with each URL; anybody could do that.)</p>
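<p>A minimal sketch of such a small agent, in Python. The endpoint address and the "url" query parameter are placeholders (the actual PTSW ping address is not given in this thread), and all function names are hypothetical. It reads a list with one URL per line and issues one ping request per URL.</p>

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint: an assumption, not the real ping address.
PING_ENDPOINT = "http://example.org/ping"


def parse_url_list(text):
    """Return the non-empty lines of a line-separated URL list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_ping_url(target, endpoint=PING_ENDPOINT):
    """Build a ping request carrying the target URL as a query parameter."""
    return endpoint + "?" + urlencode({"url": target})


def ping_all(text, fetch=None, endpoint=PING_ENDPOINT):
    """Ping once per listed URL and return the request URLs that were used."""
    if fetch is None:
        # Default: perform a real HTTP GET for each ping.
        fetch = lambda ping_url: urlopen(ping_url).read()
    requested = []
    for target in parse_url_list(text):
        ping_url = build_ping_url(target, endpoint)
        fetch(ping_url)
        requested.append(ping_url)
    return requested
```

<p>The fetch argument can be replaced with any callable, so the loop can be tried on a list without sending real requests.</p>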
<p>
Thanks!</p>
<p>
Salutations,</p>
<p>
Fred</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Libby</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-643</link>
		<dc:creator>Dan Libby</dc:creator>
		<pubDate>Thu, 25 Jan 2007 23:03:14 +0000</pubDate>
		<guid isPermaLink="false">#comment-643</guid>
		<description>add tag should be &quot;add a &lt;link&gt; tag&quot;.</description>
		<content:encoded><![CDATA[<p>add tag should be &#8220;add a &lt;link&gt; tag&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Libby</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-642</link>
		<dc:creator>Dan Libby</dc:creator>
		<pubDate>Thu, 25 Jan 2007 23:01:40 +0000</pubDate>
		<guid isPermaLink="false">#comment-642</guid>
		<description>Fred, I&#039;ve begun pinging &lt;a href=&quot;http://Pingthesemanticweb.com&quot; rel=&quot;nofollow&quot;&gt;Pingthesemanticweb.com&lt;/a&gt; whenever a profile is added/updated on &lt;a href=&quot;http://videntity.org&quot; rel=&quot;nofollow&quot;&gt;Videntity.org&lt;/a&gt;.  &lt;br /&gt;
&lt;br /&gt;
A full dump into FOAF format would be a bit more coding work, as the files are programmatically generated at this time, not real on-disk files.  I did however add tag in each profile page to aid with discoverability of the FOAF files, so they could pretty easily be scraped/spidered starting with this  &lt;a href=&quot;http://videntity.org/directory&quot; rel=&quot;nofollow&quot;&gt;directory page&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Fred, I&#8217;ve begun pinging <a href="http://Pingthesemanticweb.com" rel="nofollow">Pingthesemanticweb.com</a> whenever a profile is added/updated on <a href="http://videntity.org" rel="nofollow">Videntity.org</a>.  </p>
<p>A full dump into FOAF format would be a bit more coding work, as the files are programmatically generated at this time, not real on-disk files.  I did however add tag in each profile page to aid with discoverability of the FOAF files, so they could pretty easily be scraped/spidered starting with this  <a href="http://videntity.org/directory" rel="nofollow">directory page</a>.</p>
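<p>A rough sketch of how a spider might pick up such a &lt;link&gt; tag from a profile page, in Python. The exact attributes used on the profile pages are not stated here, so matching on type="application/rdf+xml" is an assumption, and the class and function names are hypothetical.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect href values of <link> tags that advertise an RDF document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Assumption: the FOAF file is advertised with this MIME type.
        is_rdf = (attributes.get("type") or "").lower() == "application/rdf+xml"
        if is_rdf and attributes.get("href"):
            # Resolve relative hrefs against the page address.
            self.found.append(urljoin(self.base_url, attributes["href"]))


def find_foaf_links(html, base_url):
    """Return absolute URLs of RDF documents linked from one HTML page."""
    finder = FoafLinkFinder(base_url)
    finder.feed(html)
    return finder.found
```

<p>Run over each page reachable from the directory page, this would yield a list of FOAF URLs that could then be crawled or pinged.</p>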
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-640</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Thu, 25 Jan 2007 17:19:34 +0000</pubDate>
		<guid isPermaLink="false">#comment-640</guid>
		<description>Hi Vaclav,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yeah, I was talking about the public Semantic Bank. I will take a closer look at it later.&lt;br /&gt;
&lt;br /&gt;
Yeah well, putting Longwell over PTSW could be a good idea, but I have other plans that will roll out later in February (so keep checking this blog ;) ). In fact, Longwell is nice, but my mother, my friends, etc. don&#039;t like it: too complex, needs too much knowledge to use, etc. So this is why I have other plans.&lt;br /&gt;
&lt;br /&gt;
Thanks,&lt;br /&gt;
&lt;br /&gt;
Take care,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Fred</description>
		<content:encoded><![CDATA[<p>Hi Vaclav,</p>
<p>
Yeah, I was talking about the public Semantic Bank. I will take a closer look at it later.</p>
<p>Yeah well, putting Longwell over PTSW could be a good idea, but I have other plans that will roll out later in February (so keep checking this blog <img src='http://fgiasson.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  ). In fact, Longwell is nice, but my mother, my friends, etc. don&#8217;t like it: too complex, needs too much knowledge to use, etc. So this is why I have other plans.</p>
<p>Thanks,</p>
<p>Take care,</p>
<p>
Fred</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vaclav Synacek</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-641</link>
		<dc:creator>Vaclav Synacek</dc:creator>
		<pubDate>Wed, 24 Jan 2007 10:01:09 +0000</pubDate>
		<guid isPermaLink="false">#comment-641</guid>
		<description>&lt;p&gt;Hi Fred,&lt;br /&gt;
As to your question about PTSW accessing Piggy Bank/Semantic Bank:&lt;br /&gt;
When data is scraped by any scraper, it is saved to Piggy Bank&#039;s database, which runs inside one&#039;s browser and is accessible over HTTP on some high port. This data is on users&#039; computers, so it would be quite hard for PTSW spiders to access. However, users may also publish some of the data from their Piggy Banks to public Semantic Banks where they have accounts. These Semantic Banks are installations of the Longwell project (http://simile.mit.edu/wiki/Longwell). The fairly short list of them is at http://simile.mit.edu/wiki/List_of_banks . The general &lt;i&gt;free for everybody to use&lt;/i&gt; bank (http://simile.mit.edu/bank/) contains nearly 500 FOAF People and hundreds of other data items.&lt;/p&gt;

&lt;p&gt;Getting the RDF from the banks is trivial: just follow the alternate link. I don&#039;t know about pinging PTSW on data change. Ask the SIMILE developers about that.&lt;/p&gt;

&lt;p&gt;It might be very interesting if you set up a Semantic Bank yourself and promoted publishing Piggy Bank data to your bank. This way you might get a lot of RDF data scraped from across the Internet by Piggy Bank users for your project.&lt;/p&gt;

&lt;p&gt;Even better would be if all the data indexed by PTSW were accessible through the Longwell faceted browser interface. This would be a semantic web killer app. But this is more of a dream than a near-future project, I guess.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Hi Fred,<br />
As to your question about PTSW accessing Piggy Bank/Semantic Bank:<br />
When data is scraped by any scraper, it is saved to Piggy Bank&#8217;s database, which runs inside one&#8217;s browser and is accessible over HTTP on some high port. This data is on users&#8217; computers, so it would be quite hard for PTSW spiders to access. However, users may also publish some of the data from their Piggy Banks to public Semantic Banks where they have accounts. These Semantic Banks are installations of the Longwell project (<a href="http://simile.mit.edu/wiki/Longwell" rel="nofollow">http://simile.mit.edu/wiki/Longwell</a>). The fairly short list of them is at <a href="http://simile.mit.edu/wiki/List_of_banks" rel="nofollow">http://simile.mit.edu/wiki/List_of_banks</a> . The general <i>free for everybody to use</i> bank (<a href="http://simile.mit.edu/bank/" rel="nofollow">http://simile.mit.edu/bank/</a>) contains nearly 500 FOAF People and hundreds of other data items.</p>
<p>Getting the RDF from the banks is trivial: just follow the alternate link. I don&#8217;t know about pinging PTSW on data change. Ask the SIMILE developers about that.</p>
<p>It might be very interesting if you set up a Semantic Bank yourself and promoted publishing Piggy Bank data to your bank. This way you might get a lot of RDF data scraped from across the Internet by Piggy Bank users for your project.</p>
<p>Even better would be if all the data indexed by PTSW were accessible through the Longwell faceted browser interface. This would be a semantic web killer app. But this is more of a dream than a near-future project, I guess.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-638</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Wed, 24 Jan 2007 01:07:31 +0000</pubDate>
		<guid isPermaLink="false">#comment-638</guid>
		<description></description>
		<content:encoded><![CDATA[<p>Dan Libby: I just took a look at videntity.org: it seems great! What interests me most here is that each profile is exported using FOAF. So, would it be possible to get a list of the FOAF files from videntity? That way I could include them in Pingthesemanticweb.com. Also, would it be possible for you to ping it each time a new user creates an account, or each time a user updates their profile? That way, other people could do cool things with the data of your users.</p>
<p>I am not sure that I will support OpenID with this version of the blog, since I would have to put too much time into it and there is no OpenID plugin available for this version of b2Evolution <img src='http://fgiasson.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>
Vaclav Synacek: Yeah well, it is clear that if it is written in JS, then it couldn’t be that useful to such a project <img src='http://fgiasson.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>By the way, I was wondering, without having had the time to investigate further: is the data in the Semantic Bank available to the public? If so, it would be great if the data could be indexed by PTSW and if the Semantic Bank could ping it each time a new or updated file is indexed into the bank.</p>
<p>
Danny: yeah, you are right. But there are specialized databases of information (normally in universities) that filter all that information for them. It is certain that if you try to find all your information on Google, you will have to spend a lot of time filtering out all the crap <img src='http://fgiasson.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>
Take care all,</p>
<p>Salutations,</p>
<p>
Fred</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Danny</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-639</link>
		<dc:creator>Danny</dc:creator>
		<pubDate>Wed, 24 Jan 2007 00:03:04 +0000</pubDate>
		<guid isPermaLink="false">#comment-639</guid>
		<description>I&#039;m not sure if this relates, but another case I can think of where an online community wouldn&#039;t want to let users outside of the community interact is scientific research. What I mean is, a few years ago, I heard on the radio that the scientific community wasn&#039;t happy with the Internet now that it had become hugely popular. It made their research difficult because scientists had to spend a lot of time filtering out the information from non-scientific people (for example, ads, conspiracy theories, forums, flame wars, 13-year-old chatspeak, etc.). The program said that they were planning to launch an exclusive type of Internet for the sole purpose of scientific research. The original use of the Internet! I don&#039;t know if it actually happened.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure if this relates, but another case I can think of where an online community wouldn&#8217;t want to let users outside of the community interact is scientific research. What I mean is, a few years ago, I heard on the radio that the scientific community wasn&#8217;t happy with the Internet now that it had become hugely popular. It made their research difficult because scientists had to spend a lot of time filtering out the information from non-scientific people (for example, ads, conspiracy theories, forums, flame wars, 13-year-old chatspeak, etc.). The program said that they were planning to launch an exclusive type of Internet for the sole purpose of scientific research. The original use of the Internet! I don&#8217;t know if it actually happened.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vaclav Synacek</title>
		<link>http://fgiasson.com/blog/index.php/2007/01/21/reaching_at_least_600_000_people_with_19/comment-page-1/#comment-637</link>
		<dc:creator>Vaclav Synacek</dc:creator>
		<pubDate>Tue, 23 Jan 2007 10:42:11 +0000</pubDate>
		<guid isPermaLink="false">#comment-637</guid>
		<description>&lt;p&gt;Hi everybody!&lt;/p&gt;

&lt;p&gt;To Fred: I&#039;m the original author of the LinkedIn scraper. The Orkut scraper was done by Ben Hyde. I&#039;m not an Orkut user and thus I have never seen the Orkut scraper in action.&lt;/p&gt;

&lt;p&gt;The answer to your question of whether these scrapers can be adapted to work outside of Piggy Bank is a bit more difficult. Generally it should be possible - they are standard JavaScript scripts with dependencies on the Firefox XPath processor and some Piggy Bank specific RDF processing calls (not really hard to replace); the code is open source and anybody is free to do it. However, both scripts are meant to be personal tools that work inside your browser; they only work after you log in to the specific social network, and they can only scrape what you as a user are allowed to see. So in the case of LinkedIn: every user can see as far as his friends&#039; friends&#039; profiles, no further, so Piggy Bank can also scrape only this far. The users of LinkedIn agreed to share their profiles with direct friends and their friends, but nobody else - this policy is hard-coded into the web interface of LinkedIn, and Piggy Bank, being a browser plugin, cannot break out of these policy rules. No magic here. This is what I meant by &lt;cite&gt;while respecting contacts&#039; privacy&lt;/cite&gt;.&lt;/p&gt;

&lt;p&gt;Conclusion: while porting some scrapers outside of the Piggy Bank environment might be a possible and interesting thing to do, I don&#039;t see a point in porting these two particular scrapers, as they rely on logging in to the social networks and thus will remain personal tools anyway.&lt;/p&gt;

&lt;p&gt;To Dan Libby: videntity is an interesting beginning of a project. I&#039;m not sure I got the whole idea, but I think of it as an &lt;i&gt;&#039;OpenID provider with XFN/FOAF file hosting and web hosting the MySpace way&#039;&lt;/i&gt;. This might be a more open alternative to MySpace and the like, but I don&#039;t see that it offers much to people who have proper web hosting where they can get an OpenID and make their own FOAF or XFN files. So I can&#039;t wait for the &#039;future plans&#039; to be implemented.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Hi everybody!</p>
<p>To Fred: I&#8217;m the original author of the LinkedIn scraper. The Orkut scraper was done by Ben Hyde. I&#8217;m not an Orkut user and thus I have never seen the Orkut scraper in action.</p>
<p>The answer to your question of whether these scrapers can be adapted to work outside of Piggy Bank is a bit more difficult. Generally it should be possible &#8211; they are standard JavaScript scripts with dependencies on the Firefox XPath processor and some Piggy Bank specific RDF processing calls (not really hard to replace); the code is open source and anybody is free to do it. However, both scripts are meant to be personal tools that work inside your browser; they only work after you log in to the specific social network, and they can only scrape what you as a user are allowed to see. So in the case of LinkedIn: every user can see as far as his friends&#8217; friends&#8217; profiles, no further, so Piggy Bank can also scrape only this far. The users of LinkedIn agreed to share their profiles with direct friends and their friends, but nobody else &#8211; this policy is hard-coded into the web interface of LinkedIn, and Piggy Bank, being a browser plugin, cannot break out of these policy rules. No magic here. This is what I meant by <cite>while respecting contacts&#8217; privacy</cite>.</p>
<p>Conclusion: while porting some scrapers outside of the Piggy Bank environment might be a possible and interesting thing to do, I don&#8217;t see a point in porting these two particular scrapers, as they rely on logging in to the social networks and thus will remain personal tools anyway.</p>
<p>To Dan Libby: videntity is an interesting beginning of a project. I&#8217;m not sure I got the whole idea, but I think of it as an <i>&#8216;OpenID provider with XFN/FOAF file hosting and web hosting the MySpace way&#8217;</i>. This might be a more open alternative to MySpace and the like, but I don&#8217;t see that it offers much to people who have proper web hosting where they can get an OpenID and make their own FOAF or XFN files. So I can&#8217;t wait for the &#8216;future plans&#8217; to be implemented.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

