<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>"Self-plagiarism is style"</title>
	<atom:link href="http://www.daveyp.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.daveyp.com/blog</link>
	<description>Dave Pattern's weblog</description>
	<pubDate>Mon, 05 Jan 2009 23:06:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>The &#034;Harry Potter Effect&#034;</title>
		<link>http://www.daveyp.com/blog/archives/611</link>
		<comments>http://www.daveyp.com/blog/archives/611#comments</comments>
		<pubDate>Mon, 05 Jan 2009 23:06:46 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=611</guid>
		<description><![CDATA[If you look at the overall keyword cloud for HotStuff 2.0, you can see librar* bloggers like to talk about libraries, books, reading, books and libraries.
When some things are more popular than others, this gives rise to Tim Spalding&#039;s &#034;Harry Potter Effect&#034; &#8212; everyone&#039;s got the HP books on their shelves, so, if you&#039;re not [...]]]></description>
			<content:encoded><![CDATA[<p>If you look at the <a href="http://www.daveyp.com/hotstuff/words/">overall keyword cloud</a> for HotStuff 2.0, you can see librar* bloggers like to talk about libraries, books, reading, books and libraries.</p>
<p>When some things are more popular than others, this gives rise to Tim Spalding&#039;s &#034;<a href="http://www.librarything.com/blog/2005/10/schedule-maintenance-and-books-you.php">Harry Potter Effect</a>&#034; &#8212; everyone&#039;s got the HP books on their shelves, so, if you&#039;re not careful, they end up becoming the top recommendations/suggestions for almost any type of book.</p>
<p>In our case, in many of the keyword clouds, &#034;library&#034; and &#034;book&#034; keep on coming out as the largest words.   Whilst this is an accurate reflection of what the blogs are talking about, it does hide some of the more interesting and relevant keywords.</p>
<p>In honour of Mr Spalding, and at the risk of getting sued silly by Mrs Rowling, I&#039;ve added a bit of JavaScript to toggle between a full version of the cloud (&#034;incrementum!&#034;) and one that can sometimes bring out more interesting/relevant keywords (&#034;redactum!&#034;).*</p>
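<p>The post doesn't say how "redactum!" decides which words to hide. Here is a minimal Python sketch of one possible approach. It treats any word whose count exceeds a multiple (<code>factor</code>) of the median count as a "Harry Potter" word and leaves it out before the cloud is drawn. The function name, the median-based cut-off and the sample counts are all illustrative assumptions, not the HotStuff code.</p>

```python
from statistics import median


def redact_dominant(counts, factor=3.0):
    """Return a copy of counts without words that dwarf the rest of the cloud.

    counts: dict mapping word -> frequency.
    factor: hypothetical cut-off; a word is dropped when its count is more
            than factor times the median count.
    """
    if not counts:
        return {}
    cutoff = factor * median(counts.values())
    return {word: n for word, n in counts.items() if cutoff >= n}


# Illustrative counts only.
cloud = {"library": 120, "book": 95, "audiences": 12, "interaction": 10, "slides": 8}
print(redact_dominant(cloud))  # {'audiences': 12, 'interaction': 10, 'slides': 8}
```

<p>A relative cut-off like this adapts to each cloud. In a cloud where nothing dominates, nothing is removed.</p>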
<p>As an example, the full keyword cloud for <a href="http://www.daveyp.com/hotstuff/words/presentation">presentation</a> has &#034;library&#034; as the largest word&#8230;</p>
<p><img src="http://www.daveyp.com/blog/wp-content/uploads/2009/01/cloud1.jpg" /></p>
<p>&#8230;click on &#034;redactum!&#034; and you get a cloud with some more interesting words such as &#034;audiences&#034; and &#034;interaction&#034;&#8230;</p>
<p><img src="http://www.daveyp.com/blog/wp-content/uploads/2009/01/cloud2.jpg" /></p>
<p><i>* apologies for the cod Latin!</i></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/611/feed</wfw:commentRss>
		</item>
		<item>
		<title>Another day, another bit of code&#8230;</title>
		<link>http://www.daveyp.com/blog/archives/607</link>
		<comments>http://www.daveyp.com/blog/archives/607#comments</comments>
		<pubDate>Sun, 04 Jan 2009 17:48:15 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=607</guid>
		<description><![CDATA[I&#039;ve added another bit of code to HotStuff 2.0 to try and locate blogs with similar content.  In theory, the suggestions should improve as more posts are consumed (the good ol&#039; Network Effect) as this gives the code more data to find matches on.

For those interested in such things, the code compares the word [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve added another bit of code to <a href="http://www.daveyp.com/hotstuff/">HotStuff 2.0</a> to try and locate blogs with similar content.  In theory, the suggestions should improve as more posts are consumed (the good ol&#039; Network Effect) as this gives the code more data to find matches on.</p>
<p><img src="http://www.daveyp.com/blog/wp-content/uploads/2009/01/similar.jpg" /></p>
<p>For those interested in such things, the code compares the word frequencies of the blog in question with those of all the other blogs to try and locate those whose content is similar.</p>
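<p>The post doesn't name the measure used to compare word frequencies. Cosine similarity between word-frequency vectors is a common choice, so this Python sketch assumes it. The blog names and counts are made up for illustration.</p>

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency Counters (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, blogs, top=3):
    """Rank the other blogs by how closely their word frequencies match target's.

    blogs: dict mapping blog name -> Counter of word frequencies.
    """
    scores = [
        (name, cosine_similarity(blogs[target], freqs))
        for name, freqs in blogs.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


blogs = {
    "blog_a": Counter({"catalogue": 5, "metadata": 4, "opac": 3}),
    "blog_b": Counter({"metadata": 6, "catalogue": 2, "marc": 2}),
    "blog_c": Counter({"twitter": 7, "blogging": 3}),
}
print(most_similar("blog_a", blogs))
```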
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/607/feed</wfw:commentRss>
		</item>
		<item>
		<title>HotStuff 2.0 widgets</title>
		<link>http://www.daveyp.com/blog/archives/587</link>
		<comments>http://www.daveyp.com/blog/archives/587#comments</comments>
		<pubDate>Sat, 03 Jan 2009 20:06:18 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=587</guid>
		<description><![CDATA[For anyone who&#039;s interested, I&#039;ve just posted a couple of HotStuff widgets: www.daveyp.com/hotstuff/widgets/
If you&#039;ve got a blog which is listed, you can add a widget to show your current &#034;Hot or Not&#034; rating&#8230;

The second widget allows you to add a word cloud (based on either all words, words used by a specific blog, or for [...]]]></description>
			<content:encoded><![CDATA[<p>For anyone who&#039;s interested, I&#039;ve just posted a couple of HotStuff widgets: <a href="http://www.daveyp.com/hotstuff/widgets/">www.daveyp.com/hotstuff/widgets/</a></p>
<p>If you&#039;ve got a blog which is <a href="http://www.daveyp.com/hotstuff/blogs/">listed</a>, you can add a widget to show your current &#034;<a href="http://www.daveyp.com/hotstuff/?page_id=2">Hot or Not</a>&#034; rating&#8230;</p>
<p><img src="http://chart.apis.google.com/chart?chs=460x230&#038;cht=gom&#038;chd=t:56&#038;chco=00FFFF,FFFF00,FF0000" /></p>
<p>The second widget allows you to add a word cloud (based on either all words, words used by a specific blog, or for a specific word)&#8230;</p>
<div align="center" style="line-height:1.6; padding:0px 50px;" id="cloudexample"></div>
<p><script src="http://www.daveyp.com/hotstuff/widgets/wordcloud/count=50&#038;min=90&#038;max=250&#038;extras=logscale,sqrtscale,groupstem&#038;cloud=_all&#038;wrap=js&#038;id=cloudexample"></script></p>
<p>Both widgets are available as either WordPress sidebar widgets or as embeddable JavaScript.</p>
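<p>The word cloud widget URL takes <code>count</code>, <code>min</code>, <code>max</code> and <code>extras=logscale,...</code> parameters, but the post doesn't document what they mean. This Python sketch of the sizing step makes two assumptions. First, <code>min</code> and <code>max</code> are font sizes in percent. Second, <code>logscale</code> means frequencies are log-transformed before being spread across that range. The function and its argument names are hypothetical.</p>

```python
import math


def font_sizes(counts, count=50, min_size=90, max_size=250, logscale=True):
    """Map word frequencies to font sizes (assumed to be percentages).

    Keeps the `count` most frequent words, then linearly interpolates between
    min_size and max_size, optionally on log-transformed frequencies.
    """
    top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:count]
    if not top:
        return {}
    weight = (lambda n: math.log(n + 1)) if logscale else (lambda n: n)
    weights = {word: weight(n) for word, n in top}
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    sizes = {}
    for word, w in weights.items():
        # When every weight is identical, all words get the minimum size.
        fraction = (w - lo) / span if span else 0.0
        sizes[word] = round(min_size + fraction * (max_size - min_size))
    return sizes


print(font_sizes({"library": 1000, "books": 100, "skills": 10}))
# {'library': 250, 'books': 169, 'skills': 90}
```

<p>Without the log step, a single very common word would squash everything else down to the minimum size. That is the same "Harry Potter" problem seen in the clouds.</p>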
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/587/feed</wfw:commentRss>
		</item>
		<item>
		<title>HotStuff 2.0 - new features</title>
		<link>http://www.daveyp.com/blog/archives/579</link>
		<comments>http://www.daveyp.com/blog/archives/579#comments</comments>
		<pubDate>Tue, 30 Dec 2008 20:48:48 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=579</guid>
		<description><![CDATA[I&#039;ve added a couple of new features to HotStuff 2.0 today&#8230;
1) &#034;Top blogs&#034; for specific words &#8212; this locates the blogs which contain the highest ratio of posts containing that word (matching on the common word stem).  For example, currently The Kept-Up Academic Librarian is the top blog for universities and Phil Bradley is [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve added a couple of new features to <a href="http://www.daveyp.com/hotstuff/">HotStuff 2.0</a> today&#8230;</p>
<p>1) <b>&#034;Top blogs&#034; for specific words</b> &#8212; this locates the blogs which contain the highest ratio of posts containing that word (matching on the common word stem).  For example, currently <a href="http://www.daveyp.com/hotstuff/blogs/2953690">The Kept-Up Academic Librarian</a> is the top blog for <a href="http://www.daveyp.com/hotstuff/words/university">universities</a> and <a href="http://www.daveyp.com/hotstuff/blogs/3375187">Phil Bradley</a> is top for <a href="http://www.daveyp.com/hotstuff/words/search">searching</a>.</p>
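<p>Here is a rough Python sketch of the "top blogs" ratio. For each blog, it finds the fraction of posts that contain the word's stem. The post doesn't say which stemmer is used, so <code>crude_stem</code> below is a deliberately simple, hypothetical stand-in for a real stemmer such as Porter's. The sample posts are invented.</p>

```python
import re


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    word = word.lower()
    for suffix in ("ities", "ity", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def post_stems(text):
    """Set of stems for the alphabetic words in a post."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def top_blogs(word, posts_by_blog, top=3):
    """Rank blogs by the fraction of their posts containing the word's stem.

    posts_by_blog: dict mapping blog name -> list of post texts.
    """
    target = crude_stem(word)
    ratios = []
    for blog, posts in posts_by_blog.items():
        if posts:
            hits = sum(1 for text in posts if target in post_stems(text))
            ratios.append((blog, hits / len(posts)))
    ratios.sort(key=lambda pair: pair[1], reverse=True)
    return ratios[:top]


posts = {
    "blog_a": ["Universities and budgets", "A university library", "Conference notes"],
    "blog_b": ["Searching tips", "University news"],
}
print(top_blogs("universities", posts))
```

<p>Using a ratio rather than a raw count stops prolific bloggers from topping every list simply by posting more.</p>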
<p>2) <b>&#034;Hot or not&#034; score for each blog</b> &#8212; using a top secret formula (which I might patent as &#034;BiblioBlogRank&#034;!), for each day&#039;s blog posts, points are added to or subtracted from the overall score for that blog.  Points are gained for using words which have seen a recent increase in usage, but are lost for using words that are declining in usage.  For reasons that even I&#039;m not too sure about, <a href="http://www.daveyp.com/hotstuff/blogs/5306227">Slaw</a> is today&#039;s hottest blog and <a href="http://www.daveyp.com/hotstuff/blogs/5510786">TangognaT</a> is the least hot!</p>
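<p>The actual formula is secret, so this is only a guess at its general shape, in Python. Each distinct word in a day's posts earns +1 if its usage is rising and -1 if it is falling. The running score accumulates day by day. The <code>trend</code> mapping, and counting each word once per day, are assumptions.</p>

```python
def day_points(post_words, trend):
    """Points for one blog's posts on one day.

    post_words: words used in that day's posts; each distinct word counts once.
    trend: hypothetical dict mapping word -> recent change in usage
           (positive = rising, negative = declining, missing = no change).
    """
    points = 0
    for word in set(post_words):
        change = trend.get(word, 0)
        if change > 0:
            points += 1  # rising word: gain a point
        elif change < 0:
            points -= 1  # declining word: lose a point
    return points


def update_score(score, post_words, trend):
    """Add one day's points to a blog's running hot-or-not score."""
    return score + day_points(post_words, trend)


trend = {"skills": 4, "twitter": 2, "web2.0": -3}
score = update_score(0, ["skills", "twitter", "web2.0", "skills"], trend)
print(score)  # 1
```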
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/579/feed</wfw:commentRss>
		</item>
		<item>
		<title>HotStuff 2.0 - live and kicking</title>
		<link>http://www.daveyp.com/blog/archives/572</link>
		<comments>http://www.daveyp.com/blog/archives/572#comments</comments>
		<pubDate>Mon, 29 Dec 2008 21:52:28 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=572</guid>
		<description><![CDATA[As promised/threatened just before Christmas, the new version of HotStuff is now up and running: www.daveyp.com/hotstuff/
It&#039;s still early days, so it&#039;ll be a week or two before it really starts to pick up on the hot new topics in the biblioblogosphere.  So far, it&#039;s sucked in just under 1,000 blog posts and found nearly [...]]]></description>
			<content:encoded><![CDATA[<p>As promised/threatened just before Christmas, the new version of HotStuff is now up and running: <a href="http://www.daveyp.com/hotstuff/">www.daveyp.com/hotstuff/</a></p>
<p>It&#039;s still early days, so it&#039;ll be a week or two before it really starts to pick up on the hot new topics in the biblioblogosphere.  So far, it&#039;s sucked in just under 1,000 blog posts and found nearly 17,000 unique words.</p>
<p>Each day, it&#039;ll create a new <a href="http://www.daveyp.com/hotstuff/?cat=3">Word of the Day</a> blog post using a word that&#039;s seen a sizeable increase in usage in the previous few days.  Today&#039;s word was &#034;<a href="http://www.daveyp.com/hotstuff/?p=42">skills</a>&#034;.</p>
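<p>The post doesn't give the exact rule for spotting a "sizeable increase". Here is one plausible approach, sketched in Python. For each word, compare its average daily count over the last few days with its average over the earlier days, and pick the word with the biggest jump. The window size, the minimum-usage floor and the sample counts are hypothetical.</p>

```python
def word_of_the_day(daily_counts, recent_days=3, min_recent=5):
    """Pick the word whose recent usage rose most relative to its baseline.

    daily_counts: dict mapping word -> list of daily counts, oldest first.
    recent_days: hypothetical size of the "previous few days" window.
    min_recent: hypothetical floor so that rare words don't win on noise.
    """
    best_word, best_ratio = None, 0.0
    for word, counts in daily_counts.items():
        recent = counts[-recent_days:]
        baseline = counts[:-recent_days]
        if not baseline or sum(recent) < min_recent:
            continue
        recent_rate = sum(recent) / len(recent)
        # Adding one to the baseline total avoids dividing by zero for new words.
        baseline_rate = (sum(baseline) + 1) / len(baseline)
        ratio = recent_rate / baseline_rate
        if ratio > best_ratio:
            best_word, best_ratio = word, ratio
    return best_word


counts = {
    "skills": [1, 0, 1, 1, 6, 7, 5],
    "library": [20, 22, 19, 21, 20, 23, 22],
    "dewey": [0, 0, 0, 0, 1, 1, 0],
}
print(word_of_the_day(counts))  # skills
```

<p>Comparing rates rather than raw counts stops perennially popular words like "library" from winning every day.</p>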
<p>You can also search for specific words (e.g. <a href="http://www.daveyp.com/hotstuff/?s=dewey">Dewey</a>, <a href="http://www.daveyp.com/hotstuff/?s=lcsh">LCSH</a> or <a href="http://www.daveyp.com/hotstuff/?s=cool">cool</a>) or view keyword clouds for specific blogs (e.g. &#034;<a href="http://www.daveyp.com/hotstuff/blogs/9514219">Walt at Random</a>&#034; or &#034;<a href="http://www.daveyp.com/hotstuff/blogs/14799089">Tame the Web</a>&#034;).  There&#039;s also a <a href="http://www.daveyp.com/hotstuff/words/">keyword cloud</a> that pulls everything together to show the most frequently used words from all the blogs.</p>
<p>Once again &#8212; if you&#039;d like your RSS/Atom feed adding, just leave a comment (same goes for if you&#039;d like your feed removing!).  You can see a list of the current feeds on Bloglines: <a href="http://www.bloglines.com/public/liblogs">www.bloglines.com/public/liblogs</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/572/feed</wfw:commentRss>
		</item>
		<item>
		<title>HotStuff 2.0</title>
		<link>http://www.daveyp.com/blog/archives/564</link>
		<comments>http://www.daveyp.com/blog/archives/564#comments</comments>
		<pubDate>Mon, 22 Dec 2008 16:47:20 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[HotStuff]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=564</guid>
		<description><![CDATA[After killing off Hot Stuff due to a server upgrade, I find that I&#039;m kinda missing it!
So, I&#039;ve decided to have a second stab at the problem and this time the code is much cleaner and faster.  In particular, I&#039;m using Bloglines to handle fetching all of the feeds and then grabbing the new [...]]]></description>
			<content:encoded><![CDATA[<p>After killing off <a href="http://www.daveyp.com/blog/archives/324">Hot Stuff</a> due to a server upgrade, I find that I&#039;m kinda missing it!</p>
<p>So, I&#039;ve decided to have a second stab at the problem and this time the code is much cleaner and faster.  In particular, I&#039;m using Bloglines to handle fetching all of the feeds and then grabbing the new posts via the Bloglines API.</p>
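<p>The post doesn't show the Bloglines API calls, so this sketch doesn't guess at them. Instead it is a generic Python version of the "grab the new posts" step. It uses the standard library's ElementTree on a plain RSS 2.0 feed like this one. The <code>seen_guids</code> set is a hypothetical way of remembering which posts have already been processed.</p>

```python
import xml.etree.ElementTree as ET


def new_posts(rss_bytes, seen_guids):
    """Return (guid, title, link) for RSS 2.0 items not already in seen_guids.

    rss_bytes: the raw feed document, e.g. open("feed.xml", "rb").read().
    Items without a <guid> fall back to their <link> as an identifier.
    """
    root = ET.fromstring(rss_bytes)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            fresh.append((guid, item.findtext("title"), item.findtext("link")))
    return fresh
```

<p>After each run, the caller would add the returned GUIDs to <code>seen_guids</code>, so the next fetch only yields posts that have appeared since.</p>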
<p>It&#039;s too early for the code to start spotting new keywords and topics yet, so it&#039;ll be early in the new year before it launches fully.  In the meantime, feel free to check that your favourite library/librarian blogs are included in the list of sites I&#039;m pulling content from: <a href="http://www.bloglines.com/public/liblogs">http://www.bloglines.com/public/liblogs</a>.</p>
<p>Please post a comment with the URL of any blogs you&#039;d like including!</p>
<p>I&#039;m hoping to make the new code a little more visual, so expect to see things like these&#8230;</p>
<p><a href="http://www.flickr.com/photos/davepattern/3127815853/" title="final6_50_1 by Dave &amp; Bry, on Flickr"><img src="http://farm4.static.flickr.com/3076/3127815853_4008c1614d_m.jpg" width="240" height="240" alt="final6_50_1" /></a> <a href="http://www.flickr.com/photos/davepattern/3127816773/" title="final_015 by Dave &amp; Bry, on Flickr"><img src="http://farm4.static.flickr.com/3248/3127816773_db162e710c_m.jpg" width="240" height="240" alt="final_015" /></a></p>
<p>[edit] HotStuff 2.0 is gradually appearing here: <a href="http://www.daveyp.com/hotstuff/">http://www.daveyp.com/hotstuff/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/564/feed</wfw:commentRss>
		</item>
		<item>
		<title>Presentation to the TILE Project meeting in London</title>
		<link>http://www.daveyp.com/blog/archives/550</link>
		<comments>http://www.daveyp.com/blog/archives/550#comments</comments>
		<pubDate>Fri, 12 Dec 2008 14:46:57 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[misc]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=550</guid>
		<description><![CDATA[About 90 minutes ago, I had the pleasure of doing a short presentation to the JISC TILE Project&#039;s &#034;Sitting on a gold mine&#034; workshop in London.  Unfortunately I wasn&#039;t able to present in person, so we had a go doing it all via a video conferencing link.  As far as I can tell, [...]]]></description>
			<content:encoded><![CDATA[<p>About 90 minutes ago, I had the pleasure of doing a short presentation to the JISC TILE Project&#039;s &#034;<a href="http://www.alt.ac.uk/workshop_detail.php?e=318">Sitting on a gold mine</a>&#034; workshop in London.  Unfortunately I wasn&#039;t able to present in person, so we had a go doing it all via a video conferencing link.  As far as I can tell, it seemed to go okay!</p>
<p>The presentation was an opportunity to formally announce the <a href="http://www.daveyp.com/blog/archives/528">release of the usage data</a>.</p>
<p>Our Repository Manager was keen to try putting something non-standard into the repository and twisted my arm into recording the audio&#8230; and I&#039;d forgotten how much I hate hearing my own voice!!!</p>
<p>Anyway, as soon as SlideShare starts playing ball, I&#039;ll have a go at uploading and sync&#039;ing the audio track.  In the meantime, here&#039;s a copy of the PowerPoint: &#034;<a href="http://library.hud.ac.uk/ppt/CanYouDigIt.ppt">Can You Dig It?: A Systems Perspective</a>&#034; and you can listen to the <a href="http://library.hud.ac.uk/ppt/CanYouDigIt.mp3">audio recording (MP3)</a>.</p>
<p>The workshop had a copy of the PowerPoint that they were running locally, so every now and then you&#039;ll hear me say &#034;next slide&#034;.</p>
<p>I haven&#039;t listened to much of the audio, so I&#039;ve got my fingers crossed I didn&#039;t say anything too stupid!!!</p>
<p>[edit]</p>
<p>Well, here&#039;s my first attempt at SlideCasting&#8230;</p>
<div align="center">
<div style="width:425px;" id="__ss_842320"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/daveyp/can-you-dig-it-presentation?type=powerpoint" title="Can You Dig It">Can You Dig It</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=canyoudigit-1229155136125222-1&#038;stripped_title=can-you-dig-it-presentation" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=canyoudigit-1229155136125222-1&#038;stripped_title=can-you-dig-it-presentation" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
</div>
</div>
<p>&#8230;I had no idea how much I go &#034;erm&#034; when presenting! :-S</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/550/feed</wfw:commentRss>
<enclosure url="http://library.hud.ac.uk/ppt/CanYouDigIt.mp3" length="6838438" type="audio/mpeg" />
		</item>
		<item>
		<title>Free book usage data from the University of Huddersfield</title>
		<link>http://www.daveyp.com/blog/archives/528</link>
		<comments>http://www.daveyp.com/blog/archives/528#comments</comments>
		<pubDate>Fri, 12 Dec 2008 12:11:01 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[misc]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=528</guid>
		<description><![CDATA[I&#039;m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we&#039;ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence.  In total, there&#039;s data for over 80,000 titles [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we&#039;ve just released a major portion of our book circulation and recommendation data under an <a href="http://www.opendatacommons.org/">Open Data Commons</a>/<a href="http://wiki.creativecommons.org/CCZero">CC0</a> licence.  In total, there&#039;s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.</p>
<p><span style="font-size:150%"><a href="http://library.hud.ac.uk/usagedata/">http://library.hud.ac.uk/usagedata/</a></span></p>
<p>I would like to lay down a challenge to every other library in the world to consider doing the same.</p>
<p>This isn&#039;t about breaching borrower/patron privacy &#8212; the data we&#039;ve released is thoroughly aggregated and anonymised.  This is about sharing potentially useful data with a much wider community and attaching as few strings as possible.</p>
<p>I&#039;m guessing some of you are thinking: &#034;what use is the data to me?&#034;.  Well, possibly of very little use &#8212; it&#039;s just a droplet in the ocean of library transactions and it&#039;s only data from one medium-sized new University, somewhere in the north of England.  However, if just a small number of other libraries were to release their data as well, we&#039;d be able to begin seeing the wider trends in borrowing.</p>
<p>The data we&#039;ve released essentially comes in two big chunks:</p>
<p>1) Circulation Data</p>
<p style="padding-left:50px">This breaks down the loans by year, by academic school, and by individual academic courses.  This data will primarily be of interest to other academic libraries.  UK academic libraries may be able to directly compare borrowing by matching up their courses against ours (using the UCAS course codes).</p>
<p>2) Recommendation Data</p>
<p style="padding-left:50px">This is the data which drives the &#034;people who borrowed this, also borrowed&#8230;&#034; suggestions in our OPAC.  This data had previously been exposed as a web service with a non-commercial licence, but is now freely available for you to download.  We&#039;ve also included data about the number of times the suggested title was borrowed before, at the same time, or afterwards.</p>
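<p>The post describes the recommendation data but not the code that produces it. This Python sketch shows one way to get "people who borrowed X also borrowed Y" counts, including the before/same time/after split. It assumes loans arrive as hypothetical <code>(borrower_id, isbn, loan_date)</code> tuples, that "same time" means the same day, and that each pair is counted once per borrower using their first loan of each title.</p>

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import permutations


def also_borrowed(loans):
    """Count "people who borrowed X also borrowed Y" pairs.

    loans: iterable of (borrower_id, isbn, loan_date) tuples.
    Returns a Counter keyed by (x, y, timing): timing records whether Y was
    first borrowed "before", on the "same" day as, or "after" X.
    """
    first_loan = defaultdict(dict)  # borrower -> {isbn: earliest loan date}
    for borrower, isbn, when in loans:
        current = first_loan[borrower].get(isbn)
        if current is None or current > when:
            first_loan[borrower][isbn] = when
    pairs = Counter()
    for titles in first_loan.values():
        for x, y in permutations(titles, 2):
            if titles[x] > titles[y]:
                timing = "before"
            elif titles[x] == titles[y]:
                timing = "same"
            else:
                timing = "after"
            pairs[(x, y, timing)] += 1
    return pairs


loans = [
    ("b1", "A", date(2008, 1, 10)),
    ("b1", "B", date(2008, 2, 1)),
    ("b2", "A", date(2008, 3, 5)),
    ("b2", "B", date(2008, 3, 5)),
]
result = also_borrowed(loans)
print(result[("A", "B", "after")], result[("A", "B", "same")])  # 1 1
```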
<p>Smaller data files provide further details about our courses, the relevant UCAS course codes, and expanded ISBN lookup indexes (many thanks to Tim Spalding for allowing the use of <a href="http://www.librarything.com/thingology/2006/06/introducing-thingisbn_14.php">thingISBN</a> data to enable this!).</p>
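<p>thingISBN groups together the ISBNs of different editions of the same work. This Python sketch shows how such groups could expand a lookup index, so that searching on any edition finds the title that appears in the usage data. Fetching the groups from thingISBN is not shown. The ISBN strings are placeholders.</p>

```python
def build_isbn_index(edition_groups, dataset_isbns):
    """Map every edition ISBN in a group to the group's ISBNs found in the data.

    edition_groups: iterable of sets of ISBNs believed to be the same work
                    (e.g. the result of a thingISBN-style lookup).
    dataset_isbns: set of ISBNs that actually appear in the usage data.
    """
    index = {}
    for group in edition_groups:
        present = sorted(group & dataset_isbns)
        if not present:
            continue  # no edition of this work is in the data
        for isbn in group:
            index[isbn] = present
    return index


groups = [{"ISBN_A1", "ISBN_A2", "ISBN_A3"}, {"ISBN_B1"}]
index = build_isbn_index(groups, dataset_isbns={"ISBN_A1"})
print(index["ISBN_A3"])  # ['ISBN_A1']
```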
<p>All of the data is in XML format and, in the coming weeks, I&#039;m intending to create a number of web services and APIs which can be used to fetch subsets of the data.</p>
<p>The clock has been ticking to get all of this done in time for the &#034;<a href="http://www.alt.ac.uk/workshop_detail.php?e=318">Sitting on a gold mine: improving provision and services for learners by aggregating and using learner behaviour data</a>&#034; event, organised by the <a href="http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/tile.aspx">JISC TILE Project</a>.  Therefore, the XML format is fairly simplistic.  If you have any comments about the structuring of the data, please let me know.</p>
<p>I mentioned that the data is a subset of our entire circulation data &#8212; the criteria for inclusion were that the relevant MARC record must contain an ISBN and borrowing must have been significant.  So, you won&#039;t find any titles without ISBNs in the data, nor any books which have only been borrowed a couple of times.</p>
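<p>The two inclusion rules amount to a simple filter, sketched here in Python. The post doesn't say what counted as "significant" borrowing, so <code>min_loans</code> is an illustrative threshold. The record fields are hypothetical too.</p>

```python
def eligible_titles(records, min_loans=5):
    """Keep titles that have an ISBN and were borrowed at least min_loans times.

    records: iterable of dicts with hypothetical keys "isbn" (str or None)
             and "loans" (int).
    """
    return [r for r in records if r.get("isbn") and r.get("loans", 0) >= min_loans]


records = [
    {"title": "Title A", "isbn": "ISBN_A", "loans": 40},
    {"title": "Title B", "isbn": None, "loans": 120},
    {"title": "Title C", "isbn": "ISBN_C", "loans": 2},
]
print([r["title"] for r in eligible_titles(records)])  # ['Title A']
```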
<p>So, this data is just a droplet &#8212; a single pixel in a much larger picture.</p>
<p>Now it&#039;s up to you to think about whether or not you can augment this with data from your own library.  If you can&#039;t, I want to know what the barriers to sharing are.  Then I want to know how we can break down those barriers.</p>
<p>I want you to imagine a world where a first year undergraduate psychology student can run a search on your OPAC and have the results ranked by the most popular titles as borrowed by their peers on similar courses around the globe.</p>
<p>I want you to imagine a book recommendation service that makes Amazon&#039;s look amateurish.</p>
<p>I want you to imagine a collection development tool that can tap into the latest borrowing trends at a regional, national and international level.</p>
<p>Sounds good?  Let&#039;s start talking about how we can achieve it.</p>
<hr />
<p>FAQ (OK, I&#039;m trying to anticipate some of your questions!)</p>
<p>Q. <i>Why are you doing this?</i><br />
A. We&#039;ve been actively mining circulation data for the benefit of our students since 2005.  The &#034;people who borrowed this, also borrowed&#8230;&#034; feature in our OPAC has been one of the most successful and popular additions (second only to adding a spellchecker).  The JISC TILE Project has been debating the benefits of larger-scale aggregations of usage data and we believe such aggregation would greatly increase the end benefit to our users.  We hope that the release of the data will stimulate a wider debate about the advantages and disadvantages of aggregating usage data.</p>
<p>Q. <i>Why Open Data Commons / CC0?</i><br />
A. We believe this is currently the most suitable licence to release the data under.  Restrictions limit (re)use and we&#039;re keen to see this data used in imaginative ways.  In an ideal world, there would be services to harvest the data, crunch it, and then expose it back to the community, but we&#039;re not there yet.  </p>
<p>Q. <i>What about borrower privacy?</i><br />
A. There&#039;s a balance to be struck between safeguarding privacy and allowing usage data to improve our services.  It <b>is</b> possible to have both.  Data mining is typically about looking for trends &#8212; it&#039;s about identifying sizeable groups of users who exhibit similar behaviour, rather than looking for unique combinations of borrowing that might relate to just one individual.  Setting a suitable threshold on the minimum group size ensures anonymity.</p>
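<p>The minimum group size idea is a simple form of small-cell suppression. Here is a Python sketch of it. It counts distinct borrowers per (course, title) group and releases only the groups that reach the threshold. The tuple layout, the course code and the <code>min_group</code> value are hypothetical.</p>

```python
def aggregate_with_threshold(loans, min_group=10):
    """Count distinct borrowers per (course, isbn), dropping groups below min_group.

    loans: iterable of (borrower_id, course_code, isbn) tuples.
    min_group: hypothetical minimum number of distinct borrowers a group
               needs before its count is released.
    """
    borrowers = {}
    for borrower, course, isbn in loans:
        borrowers.setdefault((course, isbn), set()).add(borrower)
    return {
        key: len(people)
        for key, people in borrowers.items()
        if len(people) >= min_group
    }


loans = [(f"b{i}", "X100", "ISBN_A") for i in range(12)] + [("b99", "Y200", "ISBN_A")]
print(aggregate_with_threshold(loans))  # {('X100', 'ISBN_A'): 12}
```

<p>The lone borrower on course Y200 never appears in the output. The released figures therefore always describe a group, never an individual.</p>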
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/528/feed</wfw:commentRss>
		</item>
		<item>
		<title>Coming soon, to a blog near here&#8230;</title>
		<link>http://www.daveyp.com/blog/archives/515</link>
		<comments>http://www.daveyp.com/blog/archives/515#comments</comments>
		<pubDate>Mon, 08 Dec 2008 20:17:58 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[misc]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=515</guid>
		<description><![CDATA[Okay &#8212; I&#039;m the first to admit I don&#039;t blog enough&#8230;  I still haven&#039;t even blogged about how great Mashed Library 2008 was (luckily other attendees have already blogged about it!)
Anyway, unless I get run over by a bus, later on this week I&#039;m going to post something fairly big &#8212; well, it&#039;s about [...]]]></description>
			<content:encoded><![CDATA[<p>Okay &#8212; I&#039;m the first to admit I don&#039;t blog enough&#8230;  I still haven&#039;t even blogged about how great <a href="http://mashedlibrary.ning.com/">Mashed Library 2008</a> was (<a href="http://blog.paulwalk.net/2008/11/28/library-hackers-ftw/">luckily</a> <a href="http://infteam.jiscinvolve.org/2008/11/28/mashing-thingisbn-and-library-lookup-using-yahoo-pipes-courtesy-of-mashed-libraries-2008/">other</a> <a href="http://www.joeyanne.co.uk/index.php/2008/11/28/mashed-library-unconference-2008/">attendees</a> <a href="http://clarihunt.wordpress.com/2008/11/28/mashed-libraries-08/">have</a> <a href="http://blogs.talis.com/panlibus/archives/2008/11/mashed-libraries.php">already</a> <a href="http://ouseful.wordpress.com/2008/11/30/speedmash-and-mashalong/">blogged</a> <a href="http://multifaceted.wordpress.com/2008/12/01/mashed-libraries-2008/">about</a> <a href="http://www.nostuff.org/words/2008/mashed-libraries/">it</a>!)</p>
<p>Anyway, unless I get run over by a bus, later on this week I&#039;m going to post something fairly big &#8212; well, it&#039;s about 90MB which perhaps isn&#039;t that &#034;big&#034; these days &#8212; that I&#039;m hoping will get a lot of people in the library world talking.  What I&#039;ll be posting will just be a little droplet, but I&#039;m hoping one day it&#039;ll be part of a small stream &#8230;or perhaps even a little river.</p>
<p><a href="http://www.flickr.com/photos/davepattern/sets/72157610239690039/show/"><img src="http://farm4.static.flickr.com/3202/3064693903_7cce62785b.jpg?v=0" /></a><br />(<a href="http://www.flickr.com/photos/davepattern/sets/72157610239690039/show/">view slideshow of Mashed Library 2008</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/515/feed</wfw:commentRss>
		</item>
		<item>
		<title>Dewey friend wheel</title>
		<link>http://www.daveyp.com/blog/archives/509</link>
		<comments>http://www.daveyp.com/blog/archives/509#comments</comments>
		<pubDate>Tue, 18 Nov 2008 19:44:12 +0000</pubDate>
		<dc:creator>Dave Pattern</dc:creator>
		
		<category><![CDATA[misc]]></category>

		<guid isPermaLink="false">http://www.daveyp.com/blog/?p=509</guid>
		<description><![CDATA[I&#039;ve been meaning to have a stab at creating something similar to a friend wheel, but using library data, for a while now.  Here&#039;s a prototype which uses our &#034;people who borrowed this, also borrowed&#8230;&#034; data to try to find strong borrowing relationships&#8230;

I picked three random Dewey numbers and hacked together a quick PerlMagick script [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve been meaning to have a stab at creating something similar to a <a href="http://thomas-fletcher.com/friendwheel/">friend wheel</a>, but using library data, for a while now.  Here&#039;s a prototype which uses our &#034;<a href="http://www.daveyp.com/blog/archives/49">people who borrowed this, also borrowed&#8230;</a>&#034; data to try to find strong borrowing relationships&#8230;</p>
<p><a href="http://www.flickr.com/photos/davepattern/3040577483/" title="Dewey friends by Dave &amp; Bry, on Flickr"><img src="http://farm4.static.flickr.com/3285/3040577483_1cd241638c_m.jpg" width="240" height="240" alt="Dewey friends" /></a></p>
<p>I picked three random Dewey numbers and hacked together a quick <a href="http://www.imagemagick.org/script/perl-magick.php">PerlMagick</a> script to draw the wheel:</p>
<ul>
<li>169 - <i>Logic -> Analogy</i> (orange)</li>
<li>822 - <i>English &#038; Old English literatures -> Drama</i> (purple)</li>
<li>941 - <i>General history of Europe -> British Isles</i> (light blue)</li>
</ul>
<p>The thickness and brightness of the line indicates the strength of the relationship between the two classifications.  For example, for people who borrowed items from 941, we also see heavy borrowing in the 260s (<i>Christian social theology</i>), 270s (<i>Christian church history</i>), and the 320s (<i>Political science</i>).</p>
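<p>The wheel itself was drawn with PerlMagick. This Python sketch covers only the data step. It rolls title-level co-borrowing counts up to three-digit Dewey classes and turns each link's strength into a line width. The input format, the linear width scale and the sample counts are illustrative assumptions.</p>

```python
from collections import Counter


def dewey_links(pairs, max_width=8):
    """Roll title-level co-borrowing up to Dewey-class links with line widths.

    pairs: iterable of (class_number_x, class_number_y, count).
    Returns {(class_a, class_b): width}. Classes are the first three digits;
    width scales linearly from 1 up to max_width (a hypothetical choice).
    """
    totals = Counter()
    for x, y, count in pairs:
        a, b = sorted((x[:3], y[:3]))
        if a != b:  # ignore borrowing within the same class
            totals[(a, b)] += count
    if not totals:
        return {}
    strongest = max(totals.values())
    return {link: max(1, round(max_width * n / strongest)) for link, n in totals.items()}


# Illustrative counts only.
pairs = [("941.08", "320.941", 30), ("941.5", "270.9", 15), ("169", "822.33", 3), ("941.1", "941.5", 9)]
print(dewey_links(pairs))  # {('320', '941'): 8, ('270', '941'): 4, ('169', '822'): 1}
```

<p>The same normalised strength could also drive line brightness, as in the prototype image.</p>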
<p>The next step will be to churn through all of the thousand Dewey numbers and draw a relationship wheel for our entire book stock.  I&#039;ve left my work PC on to crunch through the raw data overnight, so hopefully I&#039;ll be able to post the image tomorrow.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daveyp.com/blog/archives/509/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
