One of the topics I have been working on today for the book is the area of blog and buzz mining. One element that needs consideration is the ethical framework for this sort of research.
I have had a go at pulling some notes together from different sources, and I would really appreciate any thoughts or suggestions you might have.
Here is the current text:
The Ethics of Blog and Buzz Mining
Guidance and views about ethics in blog and buzz mining are evolving as researchers and their representative organisations try to come to grips with what is a dynamic field. In addition to guidelines from organisations such as ESOMAR and local legislation, market researchers should pay attention to the terms and conditions that apply to material that has been posted on the Internet. For example, before mining a closed network for information, the researcher should check whether doing so would breach the terms and conditions of that network.
When market researchers find themselves without any firm guidance as to what is appropriate, a good guiding principle is to consider what the general public would say if their project were to be made public. Remember that, on the Internet, most things do become public at some point.
In the absence of clear guidelines, the following (drawn from a variety of existing sources including ESOMAR Guidelines on Passive Data Collection and the American Psychological Association) should be of some assistance.
- Don’t pretend to be somebody or something that you are not, and when you join a group, announce your presence (i.e. don’t try to lurk under the radar).
- Check whether the data constitutes personally identifiable information. If the data does qualify as personally identifiable information, ensure that the relevant data protection and security legislation is complied with.
- Check whether any personal information gathered is sensitive. The EU definition of sensitive is fairly useful here and includes data relating to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and data concerning health or sexual preference.
- The processing of personally identifiable data requires informed consent. In the case of blog mining and web scraping, where a very large number of people are involved, this is likely to hinge on whether the researcher can rely on deemed (implied) consent.
- Data for research purposes should be made anonymous as early as possible, and in any case before being passed on to a third party.
- Information that has been put in the public domain, for example posted on a public website (i.e. a site that can be read without joining, registering, or going through any other gate), is available to the researcher.
- Information that is published in a ‘walled garden’ or in a ‘members only’ part of the Internet is not in the public domain. Researchers should announce their presence and seek co-operation, either from the moderator of the community/forum/network or from the members.
- Ensure that people are not exposed to harm, or at least not to any greater harm than they had already exposed themselves to.
For further reading on this topic the reader is referred to an excellent paper by Neil Hair and Moira Clark in Issue 6 of the 2007 IJMR (Hair & Clark 2007). One of the problems highlighted by Hair & Clark relates to Finn and Lavitt’s study of computer-based support groups for sexual abuse survivors (1994). Finn and Lavitt changed the names of the people they quoted, but they used the exact names of the forums the quotes came from and the exact dates, making it trivial for anybody to identify the individuals.
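For readers who handle the scraped data themselves, the Finn and Lavitt problem can be illustrated with a small Python sketch. It is only an illustration under my own assumptions, not a method taken from Hair & Clark or from any guideline: the `Post` and `ReportablePost` records, the `pseudonym` and `generalise` helpers, and the salt value are all hypothetical names. The idea is that changing the author name is not enough, so the forum name is replaced by a broad category and the exact date by the year. It does not deal with verbatim quotes, which may still be traceable through a search engine.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    """A scraped post as collected (hypothetical record layout)."""
    author: str
    forum: str
    posted_on: date
    text: str


@dataclass(frozen=True)
class ReportablePost:
    """The version of a post that is safe to pass on or quote from."""
    author_pseudonym: str
    forum_category: str
    period: str
    text: str


def pseudonym(author: str, salt: str) -> str:
    """Replace an author name with a stable label derived from a salted hash.

    The salt must be kept private by the researcher; without it the label
    could be recomputed from a guessed user name.
    """
    digest = hashlib.sha256((salt + ":" + author).encode("utf-8")).hexdigest()
    return "participant-" + digest[:8]


def generalise(post: Post, forum_category: str, salt: str) -> ReportablePost:
    """Drop the details that made the Finn and Lavitt quotes traceable.

    The exact forum name is replaced by a broad category chosen by the
    researcher, and the exact date is reduced to the year.
    """
    return ReportablePost(
        author_pseudonym=pseudonym(post.author, salt),
        forum_category=forum_category,
        period=str(post.posted_on.year),
        text=post.text,
    )
```

A post from a named forum dated 14 March 1994 would, for example, be reported only as coming from an "online support group" in 1994, with the author shown as a label such as `participant-` followed by eight characters.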
The ethics of blog and buzz mining is an evolving field and researchers need to consider how their actions will appear to others, especially the owners of the views they are mining.