Biweekly links for 03/13/2009

  • Polymath1 and open collaborative mathematics « Gowers’s Weblog
    • “… for me personally this has been one of the most exciting six weeks of my mathematical life. That is partly because it is always exciting to solve a problem, but a much more important reason is the way this problem was solved, with people chipping in with their thoughts, provoking other people to have other thoughts (sometimes almost accidentally, and sometimes more logically), and ideas gradually emerging as a result. Incidentally, many of these ideas are still to be properly explored: at some point the main collaboration will probably be declared to be over (though I suppose in theory it could just go on and on, since its seems almost impossible to clear up every interesting question that emerges) and then I hope that the comments will be a useful resource…

      The sheer speed at which all this happened contributed to the excitement. In my own case it led to my becoming fairly obsessed with the project and working on it to the exclusion of almost everything else… “

  • Twenty years of the world wide web | The Economist
    • “Science inspired the world wide web. Two decades on, the web has repaid the compliment by changing science”
  • Get our full university data | guardian.co.uk
  • Paul Graham: Be Relentlessly Resourceful
    • One of my favourites of pg’s essays: “A couple days ago I finally got being a good startup founder down to two words: relentlessly resourceful.”

Click here for all of my del.icio.us bookmarks.

Introduction to Statistical Machine Translation

How do computers translate texts from one language to another? Human translators use a great deal of detailed knowledge about how the world works to correctly translate all the different meanings the same word or phrase can have in different contexts. This makes automated translation seem like it might be a hard problem to solve, the kind of problem that may require genuine artificial intelligence. Yet although translators like Google Translate and Yahoo!’s Babel Fish are far from perfect, they do a surprisingly good job. How is that possible?

In this post, I describe the basic ideas behind the most successful approach to automated machine translation, an approach known as statistical machine translation. Statistical machine translation starts with a very large data set of good translations, that is, a corpus of texts (e.g., United Nations documents) which have already been translated into multiple languages, and then uses those texts to automatically infer a statistical model of translation. That statistical model is then applied to new texts to make a guess as to a reasonable translation.

I’m not an expert in statistical machine translation. I’ve written these notes to help me internalize the basic ideas of the area. However, I hope the notes are useful to others as an introductory overview of statistical machine translation, and as a starting place to learn more.

Formulating the problem

Imagine you’re given a French text, [tex]f[/tex], and you’d like to find a good English translation, [tex]e[/tex]. There are many possible translations of [tex]f[/tex] into English, of course, and different translators will have different opinions about what the best translation, [tex]e[/tex], is. We can model these differences of opinion with a probability distribution [tex]\mbox{pr}(e|f)[/tex] over possible translations, [tex]e[/tex], given that the French text was [tex]f[/tex]. A reasonable way of choosing the “best” translation is to choose [tex]e[/tex] which maximizes the conditional probability [tex]\mbox{pr}(e|f)[/tex].

The problem with this strategy is that we don’t know the conditional probability [tex]\mbox{pr}(e|f)[/tex]. To solve this problem, suppose we’re in possession of an initial corpus of documents that are in both French and English, e.g., United Nations documents, or the Canadian parliamentary proceedings. We’ll use that corpus to infer a model estimating the conditional probabilities [tex]\mbox{pr}(e|f)[/tex]. The model we’ll construct is far from perfect, but with a large and high-quality initial corpus it yields pretty good translations. To simplify the discussion we assume [tex]e[/tex] and [tex]f[/tex] are single sentences, and we’ll ignore punctuation; obviously, the translator can be applied serially to a text containing many sentences.

Now, how do we start from the corpus and infer a model for [tex]\mbox{pr}(e|f)[/tex]? The standard approach is to use Bayes’ theorem to first rewrite [tex]\mbox{pr}(e|f)[/tex] as

[tex] \mbox{pr}(e|f) = \frac{\mbox{pr}(f|e) \mbox{pr}(e)}{\mbox{pr}(f)}. [/tex]

Because [tex]f[/tex] is fixed, maximizing over [tex]e[/tex] is equivalent to maximizing

[tex] \mbox{pr}(f|e) \mbox{pr}(e). [/tex]

What we’re going to do is to use our data set to infer models of [tex]\mbox{pr}(f|e)[/tex] and [tex]\mbox{pr}(e)[/tex], and then use those models to search for [tex]e[/tex] maximizing [tex]\mbox{pr}(f|e) \mbox{pr}(e)[/tex].

The procedure outlined in the last paragraph seems a priori like a strange thing to do. Why not save time and difficulty by inferring a model of [tex]\mbox{pr}(e|f)[/tex] and then maximizing it directly? It turns out that the procedure I described actually works quite a bit better. Now, to be blunt, I don’t entirely understand the explanations of this I’ve read in the published literature. But what I think is going on is as follows. It turns out that if we work directly with [tex]\mbox{pr}(e|f)[/tex], then many malformed English sentences [tex]e[/tex] are given a high probability by our model. You’ll see why this is the case later – it’s obvious from the construction – but for now just accept that it is. If we worked directly with [tex]\mbox{pr}(e|f)[/tex], those sentences would often be issued as “translations”, which is no good. Now, why should maximizing [tex]\mbox{pr}(f|e) \mbox{pr}(e)[/tex] work any better? After all, both [tex]\mbox{pr}(f|e)[/tex] and [tex]\mbox{pr}(e)[/tex] may give high probabilistic weight to sentences [tex]e[/tex] which are not well-formed. So what gives? I think there are two things going on. First, although [tex]\mbox{pr}(f|e)[/tex] and [tex]\mbox{pr}(e)[/tex] may each give high weight to malformed sentences [tex]e[/tex], the two models are constructed very differently, so it’s quite unlikely that a malformed [tex]e[/tex] will be given high weight by both simultaneously. Second, we can build grammatical checks directly into the construction of [tex]\mbox{pr}(e)[/tex], so very few malformed sentences have high values for [tex]\mbox{pr}(e)[/tex]. We won’t actually do that in this post, but it’s relatively straightforward to do.

We’ve broken machine translation up into three problems: (1) build a language model which allows us to estimate [tex]\mbox{pr}(e)[/tex]; (2) build a translation model which allows us to estimate [tex]\mbox{pr}(f|e)[/tex]; and (3) search for [tex]e[/tex] maximizing the product [tex]\mbox{pr}(f|e)\mbox{pr}(e)[/tex]. Each of these problems is itself a rich problem which can be solved in many different ways. In the next three sections I’ll describe simple approaches to solving each of the three problems.
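To make the decomposition concrete, here is a minimal sketch in Python of how the three pieces fit together. The function names are my own, not standard ones, and the brute-force loop over candidate sentences simply stands in for the search procedure of the final section:

    def translate(f, candidate_sentences, translation_model, language_model):
        # Noisy-channel decoding: choose the English sentence e maximizing
        # pr(f|e) * pr(e).  translation_model(f, e) estimates pr(f|e) and
        # language_model(e) estimates pr(e); both are constructed in the
        # sections below.
        best_e, best_score = None, 0.0
        for e in candidate_sentences:
            score = translation_model(f, e) * language_model(e)
            if score > best_score:
                best_e, best_score = e, score
        return best_e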

The language model

Suppose we break an English sentence [tex]e[/tex] up into words [tex]e = e_1e_2\ldots e_m[/tex]. Then we can write the probability for [tex]e[/tex] as a product of conditional probabilities:

[tex] \mbox{pr}(e) = \prod_{j=1}^m \mbox{pr}(e_j|e_1,\ldots,e_{j-1}) [/tex]

So, for example, in the sentence fragment “To be or not to…”, you would probably assign a very high conditional probability to the final word being “be”, certainly much higher than the probability of its occurrence at a random point in a piece of text.

The challenge in building a good language model [tex]\mbox{pr}(e)[/tex] is that there are so many distinct conditional probabilities that need to be estimated. To simplify the problem we’ll make some assumptions about the form of the probabilities. The most drastic assumption we could make is to assume that the probability of seeing a word is independent of what came before it, i.e.,

[tex] \mbox{pr}(e_j|e_1,\ldots,e_{j-1}) = \mbox{pr}(e_j), [/tex]

so

[tex] \mbox{pr}(e) = \prod_{j=1}^m \mbox{pr}(e_j). [/tex]

We could estimate the probabilities [tex]\mbox{pr}(e_j)[/tex] by taking a very large corpus of English text, and counting words. The problem is that this model is not very realistic. A more realistic model is the bigram model, which assumes that the probability of a word occurring depends only on the word immediately before it:

[tex] \mbox{pr}(e_j|e_1,\ldots,e_{j-1}) = \mbox{pr}(e_j|e_{j-1}). [/tex]

More realistic still is the trigram model, which assumes that the probability of a word occurring depends only on the two words immediately before it:

[tex] \mbox{pr}(e_j|e_1,\ldots,e_{j-1}) = \mbox{pr}(e_j|e_{j-2},e_{j-1}). [/tex]

To make the discussion which follows as concrete as possible, let’s concentrate on the problem of how to estimate conditional probabilities in the bigram model. Similar remarks can be made for the trigram model. The obvious way to proceed is to take a large corpus of English text, count the number of occurrences [tex]\#(e_1,e_2)[/tex] of a particular word pair in that corpus, and then set [tex]\mbox{pr}(e_2|e_1) = \#(e_1,e_2)/\#(e_1)[/tex].
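As a concrete illustration, here is a minimal Python sketch of that counting estimate. It assumes the corpus has already been split into sentences and tokenized into words, which glosses over some messy practical details:

    from collections import Counter

    def bigram_probabilities(corpus):
        # Estimate pr(e2|e1) = #(e1, e2) / #(e1) by counting over a corpus,
        # given as a list of sentences, each a list of words.
        unigram_counts = Counter()
        bigram_counts = Counter()
        for sentence in corpus:
            unigram_counts.update(sentence)
            bigram_counts.update(zip(sentence, sentence[1:]))

        def pr(e2, e1):
            if unigram_counts[e1] == 0:
                return 0.0
            return bigram_counts[(e1, e2)] / unigram_counts[e1]

        return pr

    # A (far too small) example corpus.
    pr = bigram_probabilities([["to", "be", "or", "not", "to", "be"]])
    print(pr("be", "to"))  # 1.0: every "to" in this corpus is followed by "be"

Note that any bigram which never occurs in the corpus is assigned probability zero, which is exactly the problem discussed next.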

The problem with this procedure is that there are a lot of bigrams. If we assume that there are 50,000 words in English, say, then there are 2.5 billion possible bigrams. Even if you take a large corpus of training data (say, a billion words), it’s reasonably likely that there will be some bigrams which don’t appear in your corpus, and thus are assigned zero probability, yet which you would a priori like to appear in translations of some French sentences. That is, this kind of training procedure is likely to underestimate the probability of bigrams which don’t appear in the training set, and overestimate the probability of those which do. The problem is even worse for trigrams.

There’s no obvious best solution to this problem. Many different ad hoc solutions have been tried, and my quick survey of the literature suggests that there’s as yet no broad agreement about the best way of solving the problem. To give you the flavour of the solutions people use, though, let me describe two basic approaches, adapted from the survey in section 2.3 of Philip Clarkson’s PhD thesis.

The first approach is to move away from a pure bigram model, and instead to use linear interpolation between the unigram and bigram models. A large and diverse enough corpus of text is likely to give pretty good estimates for nearly all single-word probabilities [tex]\mbox{pr}(e_2)[/tex]. So one way of estimating [tex]\mbox{pr}(e_2|e_1)[/tex] is as:

[tex] \mbox{pr}(e_2|e_1) = \lambda \frac{\#(e_2)}{N} + (1-\lambda) \frac{\#(e_1,e_2)}{\#(e_1)}, [/tex]

where [tex]N[/tex] is the total number of words in the corpus. [tex]\lambda[/tex] is a parameter in the range [tex]0 < \lambda < 1[/tex] which needs to be determined. This can be done using a second corpus of text, and setting [tex]\lambda[/tex] so that the average probability of the bigrams in that corpus is maximized.

A second approach is to apply a discount factor to the conditional probabilities for the bigrams which appear in the training corpus, essentially reducing their probability. The easiest way to proceed is simply to multiply them all by some constant [tex]c[/tex] in the range [tex]0 < c < 1[/tex], and then to spread the remaining probability [tex]1-c[/tex] uniformly over all bigrams [tex](e_1,e_2)[/tex] which don't appear in the corpus.
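Here is how the first (interpolation) approach might look, as a sketch reusing the counting idea above. The value of [tex]\lambda[/tex] is simply passed in; in practice it would be tuned on a second, held-out corpus as just described:

    from collections import Counter

    def interpolated_bigram_probabilities(corpus, lam):
        # pr(e2|e1) = lam * #(e2)/N + (1 - lam) * #(e1, e2)/#(e1), i.e.
        # a linear interpolation of the unigram and bigram estimates.
        unigram_counts = Counter()
        bigram_counts = Counter()
        total_words = 0
        for sentence in corpus:
            unigram_counts.update(sentence)
            bigram_counts.update(zip(sentence, sentence[1:]))
            total_words += len(sentence)

        def pr(e2, e1):
            unigram_part = unigram_counts[e2] / total_words
            if unigram_counts[e1] == 0:
                bigram_part = 0.0
            else:
                bigram_part = bigram_counts[(e1, e2)] / unigram_counts[e1]
            return lam * unigram_part + (1 - lam) * bigram_part

        return pr

Unlike the raw bigram estimate, this never assigns probability zero to a bigram whose second word appears somewhere in the corpus (provided [tex]\lambda > 0[/tex]).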

Exercises

  • The constant [tex]c[/tex] can be viewed as an estimate of the fraction of all English bigrams that appear in the training corpus. How might you go about obtaining an estimate for [tex]c[/tex]?

To conclude this section, let me mention that the problem of building a good language model is also of great interest to people working on speech recognition software. In fact, much of the research literature on language models can be applied equally to either speech recognition or statistical machine translation.

The translation model

Note: English is my only language, which makes it hard for me to construct translation examples! All my examples are taken from a paper by Brown et al.

In this section we’ll construct a simple translation model allowing us to compute [tex]\mbox{pr}(f|e)[/tex]. Intuitively, when we translate a sentence, words in the source text generate (possibly in a context-dependent way) words in the target language. In the sentence pair (Jean aime Marie | John loves Mary) we intuitively feel that John corresponds to Jean, loves to aime, and Mary to Marie. Of course, there is no need for the word correspondence to be one-to-one, nor for the ordering of words to be preserved. Sometimes, a word in English may generate two or more words in French; sometimes it may generate no word at all.

Despite these complications, the notion of a correspondence between words in the source language and in the target language is so useful that we’ll formalize it through what is called an alignment. Rather than give a precise definition, let me explain alignments through an example, the sentence pair (Le chien est battu par Jean | John (6) does beat (3,4) the (1) dog (2)). In this example alignment, the numbers in parentheses tell us that John corresponds to the 6th word in the French sentence, i.e., Jean. The word does has no trailing parentheses, and so doesn’t correspond to any of the words in the French sentence. However, beat corresponds to not one, but two separate words in the French sentence, the 3rd and 4th words, est and battu. And so on.

Two notions derived from alignments are particularly useful in building up our translation model. The first is fertility, defined as the number of French words generated by a given English word. So, in the example above, does has fertility [tex]0[/tex], since it doesn’t generate anything in the French sentence. On the other hand, beat has fertility [tex]2[/tex], since it generates two separate words in the French sentence.

The second notion is distortion. In many sentences, the English word and its corresponding French word or words appear in the same part of the sentence – near the beginning, perhaps, or near the end. We say that such words are translated with little distortion, while words which move a great deal under translation have high distortion. We’ll encode this notion more formally shortly.

We’ll build up our translation model using some simple parameters related to fertility and distortion:

  • The fertility probability [tex]\mbox{pr}(n|e)[/tex], the probability that the English word [tex]e[/tex] has fertility [tex]n[/tex].
  • The distortion probability [tex]\mbox{pr}(t|s,l)[/tex], which is the probability that an English word at position [tex]s[/tex] corresponds to a French word at position [tex]t[/tex] in a French sentence of length [tex]l[/tex].
  • The translation probability [tex]\mbox{pr}(f|e)[/tex], one for each French word [tex]f[/tex] and English word [tex]e[/tex]. This should not be confused with the case when [tex]f[/tex] and [tex]e[/tex] are sentences!

I’ve described the translation model as a way of computing [tex]\mbox{pr}(f|e)[/tex], where [tex]e[/tex] is an English sentence, and [tex]f[/tex] is a French sentence. In fact, we’re going to modify that definition a little, defining the translation model as the probability [tex]\mbox{pr}(f,a|e)[/tex] that the French sentence [tex]f[/tex] is the correct translation of [tex]e[/tex], with a particular alignment, which we’ll denote by [tex]a[/tex]. I’ll return to the question of how this change in definition affects translation in the next section. Rather than specify the probability for our translation model formally, I’ll simply show how it works for the example alignment (Le chien est battu par Jean | John (6) does beat (3,4) the (1) dog (2)):

[tex]\mbox{pr}(1|John) \times \mbox{pr}(Jean|John) \times \mbox{pr}(6|1,6) \times [/tex]

[tex]\mbox{pr}(0|does) \times [/tex]

[tex]\mbox{pr}(2|beat) \times \mbox{pr}(est|beat) \times \mbox{pr}(3|3,6) \times \mbox{pr}(battu|beat) \times \mbox{pr}(4|3,6) \times [/tex]

[tex]\mbox{pr}(1|the) \times \mbox{pr}(Le|the) \times \mbox{pr}(1|4,6) \times [/tex]

[tex]\mbox{pr}(1|dog) \times \mbox{pr}(chien|dog) \times \mbox{pr}(2|5,6) \times [/tex]

[tex]\mbox{pr}(1|null) \times \mbox{pr}(par|null) [/tex]

This should be self-explanatory except for the final line, for the French word par. This word has no corresponding word in the English sentence, which we model using a special null word (written simply as null in the expression above) that we imagine every English sentence implicitly contains.
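To make the structure of this product explicit, here is a sketch in Python of how [tex]\mbox{pr}(f,a|e)[/tex] could be computed from the three parameter tables. The function and table names are mine, and the tables themselves are assumed to come from the training procedure sketched next:

    def alignment_probability(f, e, alignment, fertility, distortion, translation):
        # Sketch of pr(f, a | e) as a product of fertility, translation and
        # distortion probabilities, following the worked example above.
        #
        # f, e        -- lists of words (the French and English sentences)
        # alignment   -- alignment[s] is the list of French positions (1-based)
        #                generated by the English word at position s (1-based);
        #                alignment[0] lists the positions generated by "null"
        # fertility   -- fertility[(n, e_word)] estimates pr(n | e_word)
        # distortion  -- distortion[(t, s, l)] estimates pr(t | s, l)
        # translation -- translation[(f_word, e_word)] estimates pr(f_word | e_word)
        l = len(f)
        p = 1.0
        for s, e_word in enumerate(["null"] + e):
            targets = alignment.get(s, [])
            p *= fertility[(len(targets), e_word)]
            for t in targets:
                p *= translation[(f[t - 1], e_word)]
                if s > 0:  # the null word contributes no distortion factor
                    p *= distortion[(t, s, l)]
        return p

For the example above, f would be ["Le", "chien", "est", "battu", "par", "Jean"], e would be ["John", "does", "beat", "the", "dog"], and alignment would be {0: [5], 1: [6], 2: [], 3: [3, 4], 4: [1], 5: [2]}.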

What remains to be done is to estimate the parameters used in constructing the translation model – the fertility, distortion and translation probabilities. I’ll briefly outline an iterative procedure that can be used to do this estimation. It starts with a simple guess of the parameters. E.g., we might guess that the distortion probabilities [tex]\mbox{pr}(t|s,l) = 1/l[/tex] are uniform across the sentence. Similar guesses could be made for the other parameters. For each pair [tex](e,f)[/tex] of sentences in our corpus we can use this guess to compute the probability of every possible alignment between the sentences. We then estimate the “true” alignment to be whichever alignment has highest probability. Applying this procedure to the entire corpus gives us many estimated alignments. We can then use those estimated alignments to compute new estimates for all the parameters in our model. E.g., if we find that [tex]1/10[/tex]th of the estimated alignments for French sentences of length [tex]25[/tex] map the first English word to the first French word, then our new estimate is [tex]\mbox{pr}(1|1,25) = 1/10[/tex]. This gives us a procedure for iteratively updating the parameters of our model, which can be repeated many times. Empirically we find (and it can be proved) that the parameters gradually converge to a fixed point.
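To show the shape of this iterative procedure in code, here is a sketch in Python. To keep it short it estimates only the word-translation probabilities, assumes each French word is generated by exactly one English word, and keeps only the single most probable alignment at each step, as described above; real systems typically use the EM algorithm, which weights every alignment by its probability, and update the fertility and distortion parameters in the same loop. All names here are mine:

    import itertools
    from collections import defaultdict

    def alignment_score(f, e, alignment, t):
        # Probability of f given e under one alignment, using only the
        # word-translation probabilities t[(f_word, e_word)].  alignment[j]
        # is the index of the English word generating the j-th French word.
        p = 1.0
        for j, i in enumerate(alignment):
            p *= t[(f[j], e[i])]
        return p

    def estimate_translation_probabilities(sentence_pairs, iterations=5):
        # Repeatedly (1) pick the most probable alignment of each sentence
        # pair under the current parameters, then (2) re-estimate the
        # parameters by counting over those alignments.
        f_vocab = {w for f, _ in sentence_pairs for w in f}
        e_vocab = {w for _, e in sentence_pairs for w in e}

        # Initial guess: uniform translation probabilities.
        t = {(fw, ew): 1.0 / len(f_vocab) for fw in f_vocab for ew in e_vocab}

        for _ in range(iterations):
            counts = defaultdict(float)
            totals = defaultdict(float)
            for f, e in sentence_pairs:
                # Exhaustive enumeration of alignments; exponential in the
                # sentence length, so only feasible for a toy illustration.
                alignments = itertools.product(range(len(e)), repeat=len(f))
                best = max(alignments, key=lambda a: alignment_score(f, e, a, t))
                for j, i in enumerate(best):
                    counts[(f[j], e[i])] += 1.0
                    totals[e[i]] += 1.0
            # New estimate of pr(f_word | e_word) from the counted alignments.
            t = {(fw, ew): counts[(fw, ew)] / totals[ew] if totals[ew] else t[(fw, ew)]
                 for fw in f_vocab for ew in e_vocab}
        return t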

Exercises

  • A problem I’ve glossed over above is figuring out which sentences in the English corpus correspond to which sentences in the French corpus. How could you figure out which sentences correspond to which? What would be the effect of mistakes?

Searching for the maximizing [tex]e[/tex]

In my introductory remarks I explained how translation from French to English can be viewed as the problem of finding [tex]e[/tex] which maximizes [tex]\mbox{pr}(e|f)[/tex], and that this was equivalent to maximizing [tex]\mbox{pr}(f|e)\mbox{pr}(e)[/tex]. Let’s revisit the reasoning which led to that framing of the problem, and recast it in a form more amenable to our translation model based on alignments. Instead of searching for [tex]e[/tex] which maximizes [tex]\mbox{pr}(e|f)[/tex], let’s instead search for a pair [tex](e,a)[/tex] which maximizes [tex]\mbox{pr}(e,a|f)[/tex]. Using Bayes’ theorem this is equivalent to finding [tex](e,a)[/tex] which maximizes

[tex] \mbox{pr}(e,a|f) = \frac{\mbox{pr}(f,a|e) \mbox{pr}(e)} {\mbox{pr}(f)}. [/tex]

As earlier, we can omit the factor [tex]\mbox{pr}(f)[/tex] in the denominator, and so our search is equivalent to finding a pair [tex](e,a)[/tex] which maximizes

[tex] \mbox{pr}(f,a|e) \mbox{pr}(e). [/tex]

Our translation problem is to find [tex](e,a)[/tex] maximizing this expression. Unfortunately, there are far too many possibilities to do an exhaustive search. But there are some heuristics which work pretty well. One is to use a greedy algorithm which gradually builds up partial translations and alignments. Consider the following partial translations and alignments:

(Jean aime Marie | John (1) *)

(Jean aime Marie | * loves (2) *)

(Jean aime Marie | Petroleum (1) *)

Using the translation and language models described earlier we can compute the probability of each of these partial translations and alignments, i.e., the product [tex]\mbox{pr}(f,a|e) \mbox{pr}(e)[/tex]. It would be very surprising if the first two candidates weren’t assigned a high probability, and the third a very low one. Our search heuristic is thus to maintain a stack of promising partial translations and alignments, and to keep extending them until we find a complete translation and alignment with a significantly higher probability than any of the remaining partial candidates.
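Here is a skeleton of that heuristic in Python. The hypothesis representation, the extend function (which grows a partial translation and its alignment) and the score function (which evaluates [tex]\mbox{pr}(f,a|e) \mbox{pr}(e)[/tex] using the models above) are all placeholders of my own; the skeleton only shows the bookkeeping around the stack:

    import heapq

    def stack_search(f, extend, score, stack_size=100, max_steps=50):
        # Maintain a stack of promising partial hypotheses, repeatedly extend
        # them, and return the best complete translation found.
        #
        # A hypothesis is a dict with keys "e" (the partial English sentence),
        # "alignment", and "complete" (a boolean flag).
        stack = [{"e": [], "alignment": {}, "complete": False}]
        best, best_score = None, 0.0

        for _ in range(max_steps):
            candidates = []
            for h in stack:
                for h2 in extend(h, f):
                    s = score(h2, f)
                    if h2["complete"]:
                        if s > best_score:
                            best, best_score = h2, s
                    else:
                        candidates.append((s, h2))
            if not candidates:
                break
            # Keep only the most promising partial hypotheses.
            stack = [h for _, h in heapq.nlargest(stack_size, candidates,
                                                  key=lambda c: c[0])]
            # Stop once a complete translation beats every surviving partial
            # hypothesis (a margin could be required here instead).
            if best is not None and all(s <= best_score for s, _ in candidates):
                break

        return best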

Commentary

Rather than argue about whether this algorithm is better than that algorithm, all you have to do is get ten times more training data. And now all of a sudden, the worst algorithm … is performing better than the best algorithm on less training data. Worry about the data first before you worry about the algorithm.

Peter Norvig, Director of Research at Google, as reported by Greg Linden.

That completes our description of a simple statistical machine translation system. The system I’ve described is essentially the same as the system described in 1990 by Brown et al at IBM research, a system which translated roughly half of all sentences in an acceptable way! Personally, I find it remarkable that such simple ideas can be used to make so much progress on what appears to be an enormously difficult problem. (I’ll describe other examples in future posts.) Furthermore, although the basic ideas remain the same, many improvements to these ideas have been made since 1990. The biggest single advance seems to have been a movement away from words as the unit of language, and towards phrase-based models, which give greatly improved performance.

If you’re interested in understanding more, I strongly suggest checking out an accessible and very influential 1993 follow-up paper from the IBM group, which describes a whole plethora of translation models. I don’t have a single canonical reference to give on phrase-based statistical translation models, but found several of the top hits at Google Scholar quite informative. Finally, and just for fun, I recommend Peter Norvig’s excellent short introduction to automatic spelling correction, which uses similar ideas in a much (!) simpler setting, but has the big advantage that it is illustrated by real code.

About this post: This is one in a series of posts about the Google Technology Stack – PageRank, MapReduce, and so on. The posts are summarized here, and there is FriendFeed room for the series here. Subscribe to my blog to follow future posts in the series.

Categorized as GTS

Biweekly links for 03/09/2009

Click here for all of my del.icio.us bookmarks.

Biweekly links for 03/06/2009

Click here for all of my del.icio.us bookmarks.

Biweekly links for 03/02/2009

Click here for all of my del.icio.us bookmarks.

Biweekly links for 02/27/2009

  • Paul Graham: What I’ve Learned from Hacker News
    • “When a technology is this young, the existing solutions are usually terrible; which means it must be possible to do much better; which means many problems that seem insoluble aren’t. Including, I hope, the problem that has afflicted so many previous communities: being ruined by growth.”
  • All Consuming: Giving up the Gun: Japan’s Reversion to the Sword, 1543-1879
    • “Guns were introduced to Japan early on with the arrival of the first Europeans in 1543. And yet in 1854 when Cmdr Perry “opened up Japan” in his “black ships”, it seemed like nobody knew much about them. In fact, between those years, Japan had taken up the gun, improved it vastly (initially swordsmen triumphed over gunners because guns were susceptible to wet weather and took a long time to prime and reload), and then gave up on them. This book seeks to find out why.”
  • Geeking with Greg: Jeff Dean keynote at WSDM 2009
    • “Jeff gave several examples of how Google has grown from 1999 to 2009. They have x1000 the number of queries now. They have x1000 the processing power (# machines * speed of the machines). They went from query latency normally under 1000ms to normally under 200ms. And, they dropped the update latency by a factor of x10000, going from months to detect a changed web page and update their search results to just minutes.

      The last of those is very impressive. Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing. There are several hard problems there — frequency and importance of recrawling, fast approximations to PageRank, and an architecture that allows rapid updates to the index — that they appear to have solved.”

Click here for all of my del.icio.us bookmarks.

Biweekly links for 02/23/2009

  • Cathemeral Thinking: Blurts: The value of short, rapid, open communication to collective creativity
  • Parent of gamer asks his son to honor the Geneva Conventions – Boing Boing
    • “I asked Evan to google the Geneva Convention. Then he had to read it and then we had to discuss it. This we did. So the deal is that Evan has to fight according to the rules of the Geneva Convention. If his team-mates violate the Convention then play stops and Call of Duty goes away for a while.

      We’ll see how it goes, but Evan keeps his word. Especially about his games. “

  • Coding Horror: The Bad Apple: Group Poison
    • “Groups of four college students were organized into teams and given a task to complete some basic management decisions in 45 minutes. To motivate the teams, they’re told that whichever team performs best will be awarded $100 per person. What they don’t know, however, is that in some of the groups, the fourth member of their team isn’t a student. He’s an actor hired to play a bad apple… Invariably, groups that had the bad apple would perform worse. And this despite the fact that were people in some groups that were very talented, very smart, very likeable. Phelps found that the bad apple’s behavior had a profound effect — groups with bad apples performed 30 to 40 percent worse than other groups.”

Click here for all of my del.icio.us bookmarks.

Biweekly links for 02/20/2009

  • Even more open science? « Tobias J. Osborne’s research notes
    • “Whether a theoretical weblog can be truly allowed to be called open notebook science has been questioned recently. I’m not sure where I stand here. Wikipedia’s definition reads: “Open Notebook Science is the practice of making the entire primary record of a research project publicly available online as it is recorded.” This is roughly what I’m trying to do here: I have notebooks containing the records of my research projects and instead of letting them collect dust on my filing cabinet I’m typing them up as I go and sharing them here. (So, naturally, this means you’ll get a lot of dead ends and half explored ideas…) If enough people feel passionately that my weblog doesn’t count then I’m happy to go with the flow and accept whatever definition is deemed more appropriate by those more involved in this kind of thing (open theoretical brain dump, or open theoretical posturing perhaps?) “
  • Disco
    • “Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.

      The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. This means that you can quickly write scripts to process massive amounts of data. “

  • Wikipedia in academic studies – Wikipedia

Click here for all of my del.icio.us bookmarks.

Biweekly links for 02/13/2009

  • New Kindle Audio Feature Causes a Stir – WSJ.com
    • The Authors Guild, in their great wisdom, on the “read-aloud” feature of the Kindle: “They don’t have the right to read a book out loud,” said Paul Aiken, executive director of the Authors Guild. “That’s an audio right, which is derivative under copyright law.”
  • Continuous Deployment at IMVU: Doing the impossible fifty times a day. « Timothy Fitz
    • Deploying code to production 50 times a day: uses very aggressive and complete testing, and (of course) a fully automated deployment cycle.
  • i9606: non-anonymous peer review
    • Benjamin Good: “I spent this afternoon acting as a voluntarily non-anonymous peer reviewer – its scary. I ended up advocating rejection of the article I was reading and I have to say that Vince Smith … was absolutely right that the act of signing your review “keeps you in check”. Knowing from the outset that your words are going to be linked to your name can really change what you have to say – it certainly makes you think about it for a while longer. It is scary though – I hope that I managed to convey enough of my reasoning and suggestions for ways to improve the article that the authors don’t despise me and attempt to ruin my life… I also hope that the editors of the journal manage to acquire at least one additional reviewer for this manuscript – safety in numbers! Or perhaps the editors will strip my name from my comments? Time will tell I guess.”
  • Wiki Research Bibliography
    • A bibliography of publications about wikis and Wikipedia. Many very interesting looking papers, many of which are new to me.
  • digitalresearchtools
    • “This wiki collects information about tools and resources that can help scholars (particularly in the humanities and social sciences) conduct research more efficiently or creatively. Whether you need software to help you manage citations, author a multimedia work, or analyze texts, Digital Research Tools will help you find what you’re looking for. We provide a directory of tools organized by research activity, as well as reviews of select tools in which we not only describe the tool’s features, but also explore how it might be employed most effectively by researchers.”
  • Seb’s Open Research
    • Seb was an early blogger who ran an excellent blog called “Seb’s Open Research”. He’s recently started up blogging again.

Click here for all of my del.icio.us bookmarks.

Writing to be taken out of context

In a world where it’s becoming easier and easier to reshare information in other forums, where it can be a sign of authorial success to have your words reshared and then reshared again, I’ve noticed that the way I think about what I write is gradually changing.

Increasingly, I write with an eye not only to the immediate way my words will be received, but also with an eye to how they might be read, in the event that they are taken and reshared in other places, perhaps stripped of the context from which they were taken.

This effect has been strongest, by far, in my stream of delicious bookmarks. These are automatically shared not just with people who follow my delicious bookmarks, but also on my friendfeed, and on my blog.

Each of these forums has a slightly different tone and constraints. I find myself sometimes thinking “oh, that description won’t quite work on friendfeed”, when I know that it’s just fine on delicious and on my blog. And I sometimes pause to think about how things might read, when taken and put in a context over which I have no control.

None of this is new, of course – people have always quoted both themselves and each other. Still, I can’t help believing that the frequency and ease of doing this has increased so dramatically over the past few years that it must gradually be coming to strongly affect all those who write on the web.
