Our data consist of video and sound recordings of people talking together in Danish. As far as possible, these are recorded in people’s “natural” environment. By “natural” we mean that we have obtained permission to record people in their daily lives, at work or in private. This means that the conversations are not arranged for the sake of the recordings, they do not take place in special laboratory surroundings, and people have not been given special tasks by the data collectors.
The purpose of this collection method is to gain insight into the way people actually talk when they interact. Interactants have their real-life purposes, and they say and do things because these things mean something to them. In this way we have been granted a peek into the lives of conversation participants.
All conversation participants have given their consent to the recordings and are aware that they are being recorded. This awareness can of course affect their behavior. But it does not stop people from saying and doing the things they need to do in the interaction, and therefore we do not see the awareness of being recorded as a fundamental problem.
The data we used for our own research are from two sources:
(1) Samtalebanken, which is a publicly available collection of conversations. The people participating in these conversations have given permission for the conversations to be made publicly available online. On the Samtalebanken webpage you can watch and listen to the recordings and read accurate transcriptions of them. When we use extracts from Samtalebanken, we link to the conversations and transcriptions on this webpage.
(2) AULing, which is our own collection of recordings and transcriptions of conversations. The people participating in these conversations have given permission for the original data to be seen by researchers and students who have signed a declaration of confidentiality. For this reason we cannot link to these data.
Every time we use excerpts of conversations, we have listened to and watched the original recordings and re-transcribed the excerpts to fit our transcription system, so that we can vouch for the accuracy of the transcripts. In the transcripts, any information that could lead to the recognition of the participants has been changed (anonymized). This means that names, places, etc. are fictitious.
Clarin.dk has built an infrastructure which, among other things, includes corpora of spoken language, among them Samtalebanken.
Nielsen & Nielsen (2005) has a Danish section on data collection and treatment.
Samtalebanken has a publicly accessible online corpus of conversations with sound, video and accurate transcriptions.
Steensig (2001) is a Danish book that includes sections on principles in data collection and transcription.
Steensig (2005) accounts for the principles in conversation analytic transcription and compares different transcription systems.
Steensig (2010) briefly accounts for some of the principles behind conversation researchers’ collection and use of data.
Wagner (2003) is an accessible introduction to the use of interactional data.