Sunday, September 28 • 14:36 - 14:55
"Do pseudonyms enable or unleash? An empirical investigation on YouTube and Twitter"

Context: Thirty years have now passed since Kiesler, Siegel and McGuire’s (1984) landmark paper demonstrating an association between anonymity and antisocial behaviour in computer-mediated communication. It is now a well-established cultural myth that anonymity breeds trolling, flaming, lowered inhibitions and ultimately problematic content. Yet anonymity and pseudonymity persist in computer-mediated communication. As Hogan (2012) notes, the reasons are partly functional, such as a limit on characters or the need for distinct addresses. They may also be a response to online content being persistent and searchable. As Marwick and boyd (2011) note, collapsed contexts on public social media lead to complex ways of articulating positions that may be at odds with part of one’s audience. Pseudonymity is one response to the challenges of multiple audiences. These challenges concern both social graces and more contentious topics such as Mexican gangs (Bernstein et al., 2011) and coming out (Burgess and Vivienne, 2012).

Thus, despite evidence that individuals in laboratories display different behaviour under fleeting conditions, there is both a dearth of evidence from field studies and a need to strike a balance that protects free expression.

Objective: In this paper, we examine the use of identity markers on the popular online sites YouTube and Twitter. We examine comments on videos submitted by official political parties and Premier League football clubs. Both YouTube and Twitter encourage real names but permit pseudonyms. Both also have an API for harvesting extensive data on comments and users. By classifying both the content and the identity markers of the individuals (such as having a photo, an old/new account, many friends and followers) we can move beyond a binary distinction in which real names are equated with good behaviour and pseudonyms with bad behaviour.

Methods: Both the classification of identities as real or pseudonymous and the classification of content as offensive or innocuous are non-trivial tasks. We take different approaches to these two tasks. For content, we use voting scores on Reddit.com’s political and football subreddits as a baseline for positive and negative content and perform a naïve Bayes classification on comments to official videos and tweets from major figures. Reddit comments are useful for training because up/down voting is very common, leading to strong signals (even if there are obvious limits to this approach). For the classification of identities we employ crowd labour through Crowdflower, with multiple coders focusing on features of accounts that suggest the account holder is resolvable (she is who she says she is) and findable (she is available in person through this account). Verification photos are the most resolvable, while telephone numbers and home addresses make people the most findable. Using this approach we can investigate identity as a set of features rather than using a mere binary distinction of ‘real’ or ‘fake’. We model content using logistic regression with identity features as the independent variables and offensiveness drawn from the classification as the dependent variable.
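
To make the content-classification step concrete, the following is a minimal sketch of a naïve Bayes text classifier trained on vote-labelled comments. It is an illustration only, not the authors' pipeline: the toy comments, the rule that a positive vote score means a "positive" label, the whitespace tokenizer, the add-one smoothing, and all function names are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    return text.lower().split()


def train_naive_bayes(labelled_comments):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_comments:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = word_counts[label][token] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical (comment, vote score) pairs; not real Reddit data.
reddit_comments = [
    ("great goal what a match", 12),
    ("thoughtful point about the policy", 8),
    ("you are an idiot", -7),
    ("shut up idiot nobody cares", -15),
]

# Assumed labelling rule: the sign of the vote score gives the class.
training = [
    (text, "positive" if score > 0 else "negative")
    for text, score in reddit_comments
]
model = train_naive_bayes(training)

print(classify("what an idiot", model))  # negative
print(classify("great match", model))    # positive
```

In a setup like this, the predicted label for each YouTube comment or tweet would then serve as the dependent variable in the logistic regression on identity features.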

Results: Preliminary results suggest that age of the account and the implied gender of the account holder are stronger features than whether a given name is used. Verification of these results will be included in the final paper.

Conclusions: Anonymity is an entrenched part of contemporary democracy, especially through the secret ballot. Yet democracies also accept that it must be permitted within a coherent framework that minimizes abuse (such as the use of poll cards). Such bounded identity practices make sense online as well. Here, we can consider features such as account age, number of friends and linkages to other accounts as part of a risk model for offensive content. Thus, individuals need not reveal their name in order to be considered legitimate, so long as they invest in the identity they have chosen. As a methodological contribution, we also demonstrate a means of articulating identity on a gradient of identifiability rather than as a mere binary distinction.

Bernie Hogan

University of Oxford

Vyacheslav Polonski

University of Oxford

TRS 1-149 Ted Rogers School of Management

