Research Interests

I have conducted research in several fields using a range of methodologies, focusing on social aspects of the use of the Internet, especially as a site of contact among people of varying backgrounds, with the intent of understanding how the technical systems of the Internet are coupled with contemporary social dynamics. This research has several manifestations, described below, and on various linked pages on this site.

  • Social Network Analysis of Information and Communication Technologies
  • Internet Multilingualism and Language Diversity
  • Emergent Semantics
  • Social Dynamics of New Media
  • Probabilistic, Formal and Computational Models of Language

Social Network Analysis of Information and Communication Technologies

Computer-Mediated Communication (CMC) has rapidly evolved into a primary form of communication in a broad range of interpersonal, organizational, and mass communication functions. Understanding CMC is now a key enterprise of the social sciences and humanities. Among the things we most urgently need to understand are the large-scale patterns of CMC, whether these involve technology adoption, information diffusion, the creation of new forms and genres of communication, or more complex social dynamics.

Social Network Analysis offers an important perspective on CMC use because it allows one to directly connect observation of individual-level actions with global trends. It is applicable to a broad range of questions about information and communication technologies and their social effects, and permits us to compare the effects of different CMC types, to each other and to face-to-face communication. On the theoretical level, SNA is especially useful in developing understandings of the social dynamics of CMC use.

Internet Multilingualism and Language Diversity

As I noted in my earliest research on CMC, the Internet is a theater of language contact on a scale larger than any previously observed (Paolillo 1996). It is commonplace for people to discuss the Internet in terms of its global reach, bringing countries and cultures into contact. It is less common for people to note that this contact also means contact among people with different language backgrounds, and less common still for people to recognize that language contact at Internet scale has unprecedented consequences for the nature, vitality, and diversity of the world's languages. In any language contact situation, the status of languages and their speakers is negotiated, most often through overt conflict. Societal majorities, whether in a local or global sense, have a decided advantage in these negotiations. Hence, language contact often exacts a cost over one or more generations from minority languages and their speakers. Language contact on the Internet offers many opportunities to observe and better understand these social dynamics and their consequences. In addition, Internet language contact involves technical considerations, such as the encoding of scripts and the handling of multilingual text, that do not occur in other language contact situations.

Emergent Semantics

Linguistic expressions, unlike those of formal languages, do not have fixed meanings, but rather have meaning that is shaped by context in various ways. The processes under which linguistic expressions evolve new meaning are sometimes known as emergent semantics, where "emergence" is understood in terms of the expression of complex behavior through the interaction of a mass of units following simple rules, as in the interactions of people in a larger social context. One locus of attention in emergent semantics is the notion of "tagging", i.e. the employment of user-defined metadata for the purpose of categorization and retrieval. In such applications, the collective semantics of a set of resources, tags or users is said to "emerge" through the simple practice of users applying their own tags, and using the tags that others have applied. CMC provides further examples, as when ideas are contested in public discourse on Usenet newsgroups or in weblogs. Semantic emergence may be studied through vector-space modeling and social network analysis. When viewed particularly under the lens of the latter, semantic emergence is revealed to be highly heterogeneous, depending on the particulars of the application, the linguistic nature of the tags applied, and the social milieu in which it is undertaken. Moreover, semantic emergence may involve conflicting representations which are not necessarily resolved.
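
As a rough illustration of the vector-space view of tagging mentioned above, the sketch below represents each tag as a vector of counts over the resources it has been applied to, and compares tags by cosine similarity. The (user, resource, tag) assignments are invented for the example, and the function names are hypothetical; this is one simple way to set up such a model, not a description of any particular study.

```python
import math
from collections import defaultdict

# Hypothetical (user, resource, tag) assignments, invented for illustration.
assignments = [
    ("ann", "r1", "python"),
    ("bob", "r1", "programming"),
    ("ann", "r2", "python"),
    ("cat", "r2", "programming"),
    ("bob", "r3", "snake"),
    ("cat", "r3", "python"),
]

def tag_vectors(assignments):
    """Map each tag to a sparse vector of counts over resources."""
    vectors = defaultdict(lambda: defaultdict(int))
    for _user, resource, tag in assignments:
        vectors[tag][resource] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vecs = tag_vectors(assignments)
sim_python_programming = cosine(vecs["python"], vecs["programming"])
sim_python_snake = cosine(vecs["python"], vecs["snake"])
sim_programming_snake = cosine(vecs["programming"], vecs["snake"])
```

In this toy data the tag "python" is close to both "programming" and "snake", while those two tags share no resources at all, a small instance of the unresolved, conflicting representations described above.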

Social Dynamics of New Media

New Media is a label that refers to primarily digital forms of communication that may serve mass-communications functions, such as interactive applications for the world-wide web, video blogs, streaming internet video and audio, and viral marketing via weblogs. Apart from the employment of the Internet, many forms of new media have reduced entry barriers and lower production costs. Furthermore, new media forms are shared among Internet users by means not available to traditional mass media: there is no television-age equivalent to sending an email with a link to an online video clip. Consequently, participation in new media is open to a broader sector of society, and new forces are brought to bear on the forms and uses of new media. These changes have given rise to new genres of communication, such as the video weblog and the Internet animation, while demand for new media creates pressure for platforms promoting new consumption modes, and even new industries. This dynamic environment provides many opportunities to examine evolving relationships among technologies, social processes and institutions.

Probabilistic, Formal and Computational Models of Language

The analysis of language corpora is undertaken in different forms in the research of Computational Linguistics, Corpus Linguistics, and Information Retrieval (IR), among others. Common to all three approaches is the notion of a quantitative model for the distribution of linguistic elements. As might be expected, the different approaches have different notions of what constitutes a language model. The notion used in IR is sometimes referred to as the bag of words model, because it does not make use of sequence information. In Computational Linguistics, language models are generally intended to account for syntax, which includes word order, so the notion used there may be based on n-gram statistics and/or phrase structure. Corpus Linguistics tends not to use the term "language model", although some branches of it use quantitative models that are structurally similar to the language models of IR, but which may or may not incorporate syntactic information. Few if any of these models are articulated as fully statistical models, with a consistent functional form, a theory of error distribution, and a means for evaluating the adequacy of model fit. This situation greatly complicates comparison and interpretation of language models across the different fields in which they are used. My research in this area focuses on the application of Generalized Linear Models and Latent Space models to language data. These models permit one to systematize empirical linguistic analysis so that rigorous and sound interpretations of the language phenomena can be offered.
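
The contrast drawn above between the bag of words model and n-gram statistics can be made concrete with a minimal sketch. The two sentences and the function names below are assumptions chosen for illustration: the sentences contain the same words in a different order, so a model without sequence information cannot tell them apart, while bigram counts can.

```python
from collections import Counter

def bag_of_words(tokens):
    """Unigram counts: word order is discarded."""
    return Counter(tokens)

def ngrams(tokens, n=2):
    """Counts of contiguous n-token sequences: local word order is kept."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Two illustrative sentences with identical words in different order.
a = "the dog bit the man".split()
b = "the man bit the dog".split()

same_bag = bag_of_words(a) == bag_of_words(b)    # identical unigram counts
same_bigrams = ngrams(a, 2) == ngrams(b, 2)      # bigram counts differ
```

Here the unigram counts coincide, while the bigram counts differ in ("dog", "bit") versus ("man", "bit"), which is the sequence information that a bag of words model leaves out.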

