1 Introduction

With network access now widely available, Internet-based online communities have become a popular and effective platform for seeking and sharing knowledge in recent years. Online communities intended for knowledge sharing and problem solving are known as electronic networks of practice (Wasko and Faraj 2005) or online Communities of Practice (Wenger et al. 2002), in which participants share a concern or passion about a topic and deepen their knowledge and expertise by interacting on an ongoing basis. They differ from online communities where discussions are more open-ended, less goal-oriented, and often have no formal termination (e.g., political discussion communities and dating communities). In the rest of this paper, we use the term “online communities” in the sense of “online Communities of Practice”. Several studies show that online communities have become a popular external knowledge source for organizations because of increasing knowledge demands and the limited expertise and knowledge available within an organization’s own knowledge repository (Constant et al. 1996; Wasko and Faraj 2005; Zhang and Watts 2003). Besides organizations and companies, individuals also use online communities to seek knowledge and solve problems. For example, the end-users of products and services often use online communities to share their experiences and learn from each other (Lee et al. 2006). Over time, successful online communities accumulate a valuable knowledge repository, an active user base, and a dynamic community. It is therefore important to understand how knowledge is shared and leveraged in online communities in order to maximize its value.

Online communities are crowd-sourcing knowledge-intensive business services (KIBS) that function as a facilitator, carrier or source of knowledge exchange and innovation (Hertog 2000). Compared to traditional KIBS providers, online communities serve a much more diverse client base, made up of the knowledge seekers in the community, and rely on community members rather than employees to provide services. A community member can be a client at one time and a knowledge provider at another. The business goal here is to facilitate and support knowledge sharing among community members. Table 1 lists several online community examples that support knowledge sharing. The business processes are the knowledge sharing processes embedded in online discussions. These processes can be very different from traditional business processes such as tech support. For example, organizational hierarchies usually do not exist in open online communities. In addition, the knowledge-sharing processes in online communities contain only a very limited number of tasks (mostly problem-solving tasks), activities (posting and replying), and participant roles (knowledge seeker and knowledge sharer). This limits the specificity (Goedertier et al. 2011) of mined process models, which would reveal little about the processes beyond the sequences of posting and replying activities. Therefore, it is difficult to apply existing business process modeling techniques to understand and analyze knowledge sharing processes in online communities.

Table 1 Examples of online communities supporting knowledge sharing

Despite the increasing number of online communities, little is known about how to facilitate or support knowledge sharing in online discussions. Existing studies on knowledge sharing in online communities have primarily focused on system design factors (Sharratt and Usoro 2003) or motivations behind knowledge sharing participation (Chiu et al. 2006; Setia et al. 2012; Wasko and Faraj 2005). Recent studies have examined the contents of online knowledge sharing and used content analysis to find experts in online communities (Wang et al. 2013) or to predict the helpfulness of online knowledge sharing (Wang et al. 2011) or reviews (Cao et al. 2011). However, very few studies have examined the communication process of online knowledge sharing, in which dynamic communication patterns can be an indicator of effective knowledge sharing discussions. Among them, Yu et al. (2010) examined the communication patterns of an online mailing list and found two-way communication important for supporting effective knowledge exchange, reuse, and construction. Their study shows that communication patterns do affect the effectiveness of knowledge sharing. It is necessary to examine communication patterns in online communities further and to develop a deeper understanding of their impact on knowledge sharing.

In this study we aim to understand and analyze online knowledge sharing discussions from a communication process perspective. The contribution of this research is three-fold. Firstly, online community practitioners will benefit from this research by promoting those communication patterns that improve effective knowledge sharing experiences. Secondly, our findings can help online community users identify and engage in those discussions that are more likely to yield effective knowledge sharing outcomes. Lastly, our research will help better understand the knowledge sharing processes in online communities and call forth more research that focuses on those processes.

The rest of the paper is organized as follows. Section 2 reviews related work on knowledge sharing in online communities. Section 3 introduces a computational framework for analyzing individual knowledge sharing processes in online communities. Section 4 describes our empirical evaluations and reports our research findings. Section 5 concludes with a discussion of the practical implications of our findings and the limitations of our research.

2 Related work

Past research has used communication networks and structural network analysis to reveal aggregated community-level communication patterns in online communities. Very few studies have focused on how communication patterns may affect the effectiveness of individual knowledge sharing discussions. Yu et al. (2010) showed that communication patterns were indeed important for supporting knowledge exchange, reuse, and construction. In this section, we review community-level communication network studies and structural network analysis methods. These motivate our focus on communication patterns in individual knowledge sharing discussions rather than on aggregated patterns at the community level.

2.1 Communication networks in online communities

An online community is often viewed as a social network where users of similar interests share information or knowledge within a knowledge domain. In this network, members of the community engage in continuous social exchanges online with diverse motivations. Social cognitive theory (Bandura 1986) has been used to explain motivations behind user participation in online communities. Some participate out of pure intrinsic motivation such as friendship and loyalty, while others participate more out of extrinsic factors such as recognition within the communities, rewards, increased reputations, or a feeling of obligation to share (Constant et al. 1996; von Hippel and von Krogh 2003; Peddibhotla and Subramani 2007; Preece 2000; Wasko and Faraj 2005). Those motivation studies remain focused on individual-level analysis, addressing the question of when and why individuals participate in online communities. However, due to individual cognitive differences, some findings on participation motivation lead to conflicting explanations (Faraj and Johnson 2011). Communication networks are used to examine aggregated individual behaviors in online communities at the community level. This perspective complements the understanding of user participation behaviors and motivations in online communities by abstracting away from individual differences.

Communication networks have been used to identify aggregated communication patterns in online communities at the community level. For example, Nolker and Lina (2005) built a communication network based on discussions in an online community and used network centrality measures to distinguish different user roles (e.g., leaders, motivators, and chatters). De Moor (2006) used a communication network to identify frequent communication patterns, which are sets of communicative workflows and norms describing acceptable and desired communicative interactions within a community. Faraj and Johnson (2011) studied frequent communication exchange patterns in online communities from the communication network perspective. Their study builds on the dual aspect of online interactions: they are social exchanges, and they take place between participants within a network context. In other words, regardless of the resources exchanged (facts, know-how, answers to questions, or social niceties), the interactions within online communities are social in nature. Moreover, online community interactions take place within the context of a social network and are mediated by the communication network. All user posts are visible to all other participants and are organized in discussion threads. Guided by both social exchange theory (Cook and Rice 2001) and network exchange theory (Burke 1997), Faraj and Johnson proposed three network exchange patterns that they claimed to be visible in all types of online communities: direct reciprocity (user A replies to user B’s post in return for user B’s prior help to user A), indirect reciprocity (after user B has helped user A, user A helps user C), and preferential attachment (new actors choose to interact with already well-connected others: both user A and user C choose to interact with a prominent user B). Using five online discussion forums, they found that the pattern of ties is consistent with norms of direct and indirect reciprocity and shows a tendency away from preferential attachment. The five communities exhibit the same network exchange tendencies, but in different magnitudes.

Those studies show that communication networks are a useful tool for finding frequent communication patterns without addressing individuals’ motivational differences.

2.2 Network structural analysis methods

Interesting network structural characteristics such as power-law and scale-free properties are frequently observed for online social networks such as Flickr, YouTube, and LiveJournal (Cheng et al. 2008; Kumar et al. 2006; Mislove et al. 2007), and for blogs (Chau and Xu 2012). When analyzing online communities from a network perspective, network structural measures can reveal the importance of users in the social network, the characteristics of communication patterns, and social structures that emerge from online discussions (Wassermann and Faust 1994). Adamic et al. (2008) calculated network structural measures for a communication network constructed from a group of online discussions. Their intention was to find those users who were more likely to provide answers to questions. These network structural characteristics can provide meaningful guidance for efficient information dissemination and effective knowledge sharing in network-based systems (Mislove et al. 2007).

2.3 Summary

Online community studies based on communication networks mostly focus on network-level or community-level analysis, which is built upon aggregated knowledge sharing processes in an online community. There are very few micro-level studies that focus on individual knowledge sharing processes (i.e., individual discussion threads). To the best of our knowledge, the only exceptions are Gómez et al. (2008) and Laniado et al. (2011). Both studies used tree graphs and network graphs to represent individual knowledge sharing discussions and observed discussion-level structural properties such as self-similarity. A tree graph is a special network type that captures the post-reply relationships in a discussion. In a tree graph, each node represents a discussion participant while each edge denotes a replying relationship between two participants. A tree graph can be transformed into a network graph by merging participant nodes. Although those discussion-level analyses revealed interesting discussion patterns, they did not use network structural measures and did not show how discussion-level communication patterns relate to effective knowledge sharing in online communities.

3 Examining online knowledge sharing processes: A micro-level analysis

In this study, we propose a new computational framework for analyzing individual knowledge sharing processes in online communities. The objective is to find communication patterns that are related to effective knowledge sharing in online communities. Our proposed framework, illustrated in Fig. 1, consists of five steps. Steps 1–2 form the data processing stage, which downloads and extracts the data needed for modeling online knowledge sharing processes. Steps 3–5 form the stage of communication process modeling and structural analysis, which seeks to uncover insights about online knowledge sharing patterns that are useful for online community practitioners. The crawling step downloads online discussion threads from an online knowledge community. An introduction to Web crawling can be found in Baeza-Yates and Ribeiro-Neto (1999). In the rest of this section, we describe metadata extraction, communication process modeling, network structural analysis, and communication pattern mining in detail.

Fig. 1 A micro-level computational framework for online knowledge sharing processes

3.1 Metadata extraction

Online communities for knowledge sharing are often designed in the format of a discussion forum. According to Big Boards (http://www.big-boards.com/statistics/), the majority of discussion forums use one of three service providers, namely vBulletin, Invision, and phpBB. All three forum providers organize online discussions into discussion threads, each of which has a starting post followed by replying posts. Each post contains not only text content but also metadata, including an author identifier (e.g., a user name), a time stamp, a post ID, a thread ID, the post ID of the previous post being replied to, and sometimes a user feedback indicator. Most advanced online communities such as the Apple Support Communities and Oracle’s Java Programming Forum allow users to provide helpfulness feedback for reply posts. Each online community may have a different way of indicating users’ helpfulness feedback. In this research we consider three types of user feedback: solved (a satisfactory solution is obtained in a discussion), helpful (helpful information is obtained but not an actual solution), and unhelpful. After crawling all web pages from an online discussion forum, we acquire the metadata of online discussion threads by parsing the retrieved pages. Table 2 shows an example of online discussion metadata.

Table 2 An example of metadata extracted from online discussion threads

3.2 Communication process modeling

Using the metadata extracted as described in the previous section, we create post trees and communication networks for individual knowledge sharing processes from the posts, their authors, and the reply relationships between posts.

In most online discussion forums, the posts of a thread are organized chronologically, from oldest to latest, as shown in Fig. 2a. Given a problem-solving oriented discussion thread, the knowledge seeker makes the first post describing the problem. Knowledge sharers, as well as the seeker, then engage in the discussion by posting to the same thread. We can then build a post tree that captures the replying relationships among the posts using a tree-like view, as shown in Fig. 2b. That view helps us understand the conversational structure of knowledge sharing discussions. Furthermore, based on the post authorship and replying relationships, we can convert the post tree into a micro-level or discussion-level communication network, which illustrates the communication interactions among the discussion participants. The network view helps us identify important communication patterns that may improve the effectiveness of knowledge sharing. We now provide formal definitions for a post tree and a micro-level communication network based on graph theory.

Fig. 2 (a) Posts in a discussion thread with arrows representing reply relationships; (b) a post tree; and (c) a communication network. Squares denote posts in a thread. Circles denote post authors (A, B, …, E). Arrow lines represent the reply direction between two posts or users

Definition 1. (Post tree)

A post tree is a directed graph T = (P, E) with no cycles. The tree node set P contains all posts in a discussion thread. The root node of T denotes the starting post that contains a question. The tree edge set E consists of all reply relationships among the posts in the thread. A directed edge e = (u, v) indicates that post u replies to post v.

Definition 2. (Micro-level communication network)

A micro-level communication network is a directed graph G = (U, E) created for a discussion thread. Each node u_i ∈ U corresponds to a user, while each edge e = (u_i, u_j) in this graph represents a reply relationship between two participants in the discussion thread.
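Definitions 1 and 2 can be illustrated with a minimal Python sketch. The `Post` record and its field names are hypothetical stand-ins for the extracted metadata, not the schema used in this study, and dropping self-replies from the communication network is an assumption made here for simplicity.

```python
from collections import namedtuple

# Hypothetical record layout; field names are illustrative only.
Post = namedtuple("Post", ["post_id", "author", "reply_to"])


def build_post_tree(posts):
    """Return (root_id, edges); an edge (u, v) means post u replies to post v."""
    roots = [p.post_id for p in posts if p.reply_to is None]
    if len(roots) != 1:
        raise ValueError("a thread must have exactly one starting post")
    edges = [(p.post_id, p.reply_to) for p in posts if p.reply_to is not None]
    return roots[0], edges


def build_communication_network(posts):
    """Merge posts by author; an edge (a, b) means user a replied to user b."""
    author_of = {p.post_id: p.author for p in posts}
    users = set(author_of.values())
    edges = set()
    for p in posts:
        if p.reply_to is None:
            continue
        source, target = p.author, author_of[p.reply_to]
        if source != target:  # assumption: self-replies are ignored
            edges.add((source, target))
    return users, edges


# A toy thread: A asks, B answers, A follows up, C answers the question too.
thread = [
    Post(1, "A", None),
    Post(2, "B", 1),
    Post(3, "A", 2),
    Post(4, "C", 1),
]
root, tree_edges = build_post_tree(thread)
users, net_edges = build_communication_network(thread)
```

In this toy thread the post tree has three edges, while the communication network has the mutual pair A and B plus a one-way reply from C to A.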

3.3 Communication pattern mining

The post tree and micro-level communication network are used to capture an online knowledge sharing process in two different aspects. Our aim is to find the communication patterns embedded in the two process representations that may be associated with effective knowledge sharing leading to helpful knowledge or problem solutions. Network analysis methods have been commonly used to extract structural characteristics and patterns for graph-based representations. In the rest of this section, we describe several commonly used network analysis measures such as network structural characteristics and sub-graph mining for detecting important communication patterns.

3.3.1 Network structural analysis

Depth and width

Given a post tree T, its depth is the length of its path from its root to the deepest node of the tree. Its width is simply defined as the number of branches (i.e., degree) that T’s root node has. For example, the depth and width of the post tree in Fig. 2b are both 3.
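As a rough illustration, the two measures can be computed as follows. The sketch assumes the post tree is given as a root ID plus a list of (reply, replied-to) edges, and that depth counts edges with the root at depth 0; the toy tree is invented for the example and is not the tree in Fig. 2b.

```python
from collections import defaultdict


def children_map(edges):
    """Map each post to the posts that reply to it; edge (u, v): u replies to v."""
    kids = defaultdict(list)
    for child, parent in edges:
        kids[parent].append(child)
    return kids


def tree_depth(root, edges):
    """Number of edges on the longest path from the root to any post."""
    kids = children_map(edges)
    deepest = 0
    stack = [(root, 0)]
    while stack:  # iterative traversal avoids recursion limits on long threads
        post, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in kids.get(post, []):
            stack.append((child, depth + 1))
    return deepest


def tree_width(root, edges):
    """Number of direct replies to the starting post."""
    return len(children_map(edges).get(root, []))


# Toy tree: posts 2 and 4 reply to the root, 3 replies to 2, 5 replies to 3.
toy_edges = [(2, 1), (3, 2), (4, 1), (5, 3)]
print(tree_depth(1, toy_edges), tree_width(1, toy_edges))  # prints "3 2"
```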

h-index

To overcome the bias of post quantity, Gómez et al. (2008) proposed a modified version of the h-index to measure the controversy of a thread discussion. The h-index was originally invented to measure the scientific contribution of a researcher based on his/her publication quantity and the number of citations each publication received. The h-index of a discussion thread is defined similarly: given the post tree of a thread, its h-index is the maximum depth level h that contains i posts with i ≥ h. For example, the h-index of the post tree in Fig. 2b is 3 because depth level 3 (h) has three posts (i).
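A small sketch of this thread-level h-index follows. It assumes the starting post sits at depth level 0 and replies are counted from level 1, and it returns 0 for a thread with no qualifying level; both conventions are assumptions of this example.

```python
from collections import Counter, defaultdict


def posts_per_level(root, edges):
    """Count the posts at each depth level >= 1; edge (u, v): u replies to v."""
    kids = defaultdict(list)
    for child, parent in edges:
        kids[parent].append(child)
    counts = Counter()
    stack = [(root, 0)]
    while stack:
        post, level = stack.pop()
        if level > 0:
            counts[level] += 1
        for child in kids.get(post, []):
            stack.append((child, level + 1))
    return counts


def thread_h_index(root, edges):
    """Largest level h that holds at least h posts (0 if there is none)."""
    counts = posts_per_level(root, edges)
    return max((h for h, i in counts.items() if i >= h), default=0)


# Toy tree: three posts at level 1, two at level 2, one at level 3.
toy_edges = [(2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 5)]
print(thread_h_index(1, toy_edges))  # prints 2
```

Level 3 holds only one post here, so it does not qualify, and the h-index is 2.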

Dialogue length

This is a heuristic-based measure. We argue that a thread would be more likely to reach a solution if the knowledge seeker and sharers have a long conversation between them. Therefore, we define a dialogue as a series of continuous replies between a knowledge seeker and a sharer. The dialogue length is the number of replies in the longest conversation. For example, the dialogue length of the post tree in Fig. 2b is 4.
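One possible reading of this heuristic is sketched below. The sketch interprets a dialogue as a chain of consecutive replies along one branch of the post tree in which every reply links the seeker with the same sharer; this interpretation, the function name, and the toy data are assumptions of the example rather than the exact procedure used in the study.

```python
from collections import defaultdict


def dialogue_length(root, edges, author_of):
    """Longest chain of consecutive replies between the seeker and one sharer."""
    seeker = author_of[root]
    kids = defaultdict(list)
    for child, parent in edges:  # edge (u, v): post u replies to post v
        kids[parent].append(child)
    longest = 0
    # Each stack entry: (post, length of the dialogue ending at post, its sharer).
    stack = [(root, 0, None)]
    while stack:
        post, run, sharer = stack.pop()
        for child in kids.get(post, []):
            pair = {author_of[post], author_of[child]}
            if seeker in pair and len(pair) == 2:
                other = (pair - {seeker}).pop()
                # Continue the dialogue only if it is with the same sharer.
                new_run = run + 1 if other == sharer else 1
                new_sharer = other
            else:
                new_run, new_sharer = 0, None
            longest = max(longest, new_run)
            stack.append((child, new_run, new_sharer))
    return longest


# Toy thread: A (seeker) and B exchange three replies; C replies once.
authors = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
toy_edges = [(2, 1), (3, 2), (4, 3), (5, 1)]
print(dialogue_length(1, toy_edges, authors))  # prints 3
```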

Reciprocity

Reciprocity measures the tendency to form mutual connections between vertex pairs in a directed network. In other words, it captures how often bi-directional links occur between any two nodes. It is important to study two-way communications because they have been shown to be important in knowledge exchange, reuse, and construction (Yu et al. 2010). Traditionally, the reciprocity r is defined as the ratio of the number of bi-directional edges \( \overleftrightarrow{L} \) to the total number of edges L in a directed network (Wassermann and Faust 1994).

$$ r=\frac{\overleftrightarrow{L}}{L} $$

When r = 1, all connections in the network are bi-directional. When r = 0, the network contains no bi-directional links at all. Because of some conceptual problems with r, Garlaschelli and Loffredo (2004) defined a reciprocity coefficient ρ as the correlation coefficient between the entries of the adjacency matrix of a directed graph,

$$ \rho = \frac{\sum_{i \ne j} \left(a_{ij} - \overline{a}\right)\left(a_{ji} - \overline{a}\right)}{\sum_{i \ne j} \left(a_{ij} - \overline{a}\right)^2} = \frac{r - \overline{a}}{1 - \overline{a}} $$

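where a_ij = 1 if a link from i to j exists and 0 otherwise, and \( \overline{a} \) is the mean of a_ij over all ordered pairs i ≠ j (i.e., the link density of the network). Both reciprocity measures are used in this study.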
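Both measures can be computed directly from an edge list, as in the following sketch. It assumes the network is given as a set of users and a set of directed (replier, replied-to) pairs, ignores self-loops in line with the i ≠ j restriction, and returns NaN for ρ in the degenerate case of a complete network.

```python
def reciprocity(users, edges):
    """Return (r, rho) for a directed network given as users and (u, v) edges."""
    links = {(u, v) for u, v in edges if u != v}  # sums run over i != j only
    total = len(links)
    if total == 0:
        raise ValueError("reciprocity is undefined for a network with no links")
    # Count links whose reverse link also exists (each direction counts once).
    mutual = sum(1 for u, v in links if (v, u) in links)
    r = mutual / total
    n = len(users)
    density = total / (n * (n - 1))  # mean of a_ij over ordered pairs i != j
    rho = (r - density) / (1 - density) if density < 1 else float("nan")
    return r, rho


# Toy network: A and B reply to each other; C replies to A only.
r, rho = reciprocity({"A", "B", "C"}, {("B", "A"), ("A", "B"), ("C", "A")})
```

For this toy network, two of the three links are reciprocated, so r = 2/3; the density is 3/6 = 1/2, which gives ρ = (2/3 − 1/2)/(1 − 1/2) = 1/3.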

Cluster coefficient

The local cluster coefficient of a vertex in a graph measures how close its neighbors are to forming a clique, in which every two neighbors are connected. Consider a graph G with vertices i = 1, …, n. Let k_i be the number of neighbors that vertex i has and n_i be the number of edges between its neighbors. The local cluster coefficient for vertex i is defined as

$$ C_i = \frac{n_i}{\binom{k_i}{2}} $$

The global cluster coefficient for the whole graph is calculated as the average of the local cluster coefficients of all n vertices (Watts and Strogatz 1998),

$$ \overline{C} = \frac{1}{n} \sum_{i=1}^{n} C_i. $$
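The following sketch evaluates both coefficients for a small network. Two conventions here are assumptions of the example: edge directions are ignored when determining neighbors, and a vertex with fewer than two neighbors is assigned a local coefficient of 0.

```python
from itertools import combinations


def cluster_coefficients(users, edges):
    """Return (local coefficients by user, their average over all users)."""
    neighbors = {u: set() for u in users}
    for u, v in edges:
        if u != v:  # assumption: direction is ignored, self-loops dropped
            neighbors[u].add(v)
            neighbors[v].add(u)
    local = {}
    for u in users:
        k = len(neighbors[u])
        if k < 2:
            local[u] = 0.0  # assumption: no neighbor pairs means C_i = 0
            continue
        # n_i: number of connected pairs among the neighbors of u.
        linked = sum(1 for a, b in combinations(neighbors[u], 2)
                     if b in neighbors[a])
        local[u] = linked / (k * (k - 1) / 2)
    return local, sum(local.values()) / len(local)


# Toy network: A, B, C form a triangle; D is attached to A only.
local, average = cluster_coefficients(
    {"A", "B", "C", "D"},
    {("B", "A"), ("C", "A"), ("C", "B"), ("D", "A")},
)
```

Here A has three neighbors with one connected pair (C_A = 1/3), B and C each have coefficient 1, D has coefficient 0, and the average is 7/12.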

3.3.2 Network sub-graph mining

We use the sub-graph mining technique to identify frequent communication patterns embedded in post trees and micro-level communication networks that are more likely to lead to effective knowledge sharing. Sub-graphs are basic structural elements of many types of natural networks such as chemical compounds (Yan and Han 2002), protein structures (Huan et al. 2004), and social networks (Lahiri and Berger-Wolf 2008). Studies have demonstrated that mining common substructures in graphs is crucial to understanding the interactions and dynamics at work in those graphs. Generally, the problem of sub-graph mining can be formalized as follows. Given a database of n graphs G = {G_1, …, G_n}, let δ_G(g) denote the set of graphs in G that contain the sub-graph g. The support level of g, sup(g), is the fraction of graphs in G that contain g,

$$ \sup (g)=\frac{\left|{\delta}_G(g)\right|}{n}. $$

Sub-graph mining algorithms can then discover all sub-graphs with the minimum support (σ) where sup(g) ≥ σ. The pseudocode of the sub-graph mining algorithm is presented in Fig. 3. The most computationally intensive parts in the algorithm are testing sub-graph isomorphism in the subroutine subgraph-isomorphism() and generating candidate sub-graphs from size k to k + 1 in the subroutine candidate-generate(). Existing sub-graph mining algorithms employ various strategies to speed up these two processes.

Fig. 3 Sub-graph mining algorithm

We propose to use an efficient sub-graph mining algorithm named gSpan (Yan and Han 2002) for frequent communication pattern discovery in online discussions. gSpan is a DFS (Depth-First-Search) based algorithm. In a comparative survey of algorithms for frequent sub-graph discovery, DFS-based algorithms were found to be much more efficient than other algorithms (Krishna et al. 2011). In their experiment, the search for sub-graphs was completed within minutes given a dataset with more than 42,000 graphs.

Sub-Graph Mining Algorithm

Input: a graph database G = {G_1, …, G_n}, a minimum support σ

Output: g = {g_1, …, g_k}, a set of frequent sub-graphs of cardinality 1 to k

C_1 ← all sub-graphs of cardinality 1

g_1 ← {c ∈ C_1 | sup(c) = c.count/n ≥ σ}

for (k = 2; g_{k−1} ≠ ∅; k++) do

      C_k ← candidate-generate(g_{k−1})

      for each graph G_i ∈ G do

          for each candidate c ∈ C_k do

               if subgraph-isomorphism(c, G_i) then

                     c.count++

               end

          end

      end

      g_k ← {c ∈ C_k | sup(c) = c.count/n ≥ σ}

end

return g ← ∪ g_k
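The counting and filtering steps of the algorithm can be made concrete with a deliberately naive sketch. It replaces subgraph-isomorphism() with a brute-force search over node mappings, ignores node labels such as seeker and sharer, and takes the candidate patterns as given rather than generating them; it is meant only to illustrate the definition of sup(g) and is not how gSpan works internally. All names and the toy database are hypothetical.

```python
from itertools import permutations


def contains_pattern(pattern, graph):
    """True if some injective node mapping sends every pattern edge to a graph edge."""
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    g_edges = set(g_edges)
    p_nodes = list(p_nodes)
    # Try every way of placing the pattern nodes on distinct graph nodes.
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all((mapping[u], mapping[v]) in g_edges for u, v in p_edges):
            return True
    return False


def support(pattern, database):
    """Fraction of graphs in the database that contain the pattern."""
    hits = sum(1 for graph in database if contains_pattern(pattern, graph))
    return hits / len(database)


def frequent_patterns(candidates, database, min_support):
    """Keep the named candidates whose support reaches the minimum support."""
    return {name for name, pattern in candidates.items()
            if support(pattern, database) >= min_support}


# Toy database of three communication networks (nodes, directed edges).
database = [
    (["A", "B"], [("B", "A"), ("A", "B")]),
    (["A", "B", "C"], [("B", "A"), ("C", "A")]),
    (["A", "B", "C"], [("B", "A"), ("A", "B"), ("C", "B")]),
]
candidates = {
    "mutual": (["x", "y"], [("x", "y"), ("y", "x")]),
    "two-to-one": (["x", "y", "z"], [("y", "x"), ("z", "x")]),
    "chain": (["x", "y", "z"], [("x", "y"), ("y", "z"), ("z", "x")]),
}
print(sorted(frequent_patterns(candidates, database, 0.5)))
# prints ['mutual', 'two-to-one']
```

The brute-force test is exponential in the pattern size, which is why practical miners such as gSpan rely on canonical DFS codes and pruning instead.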

4 Empirical evaluations and results

4.1 Data collection

We evaluated our computational framework using real data downloaded from two large online communities: the Apple Support Communities for consumer product support and the Sun’s Forums for software development support. These two communities represent two major types of online communities: consumer communities and developer communities. Both communities are IT related, problem-solving oriented, and large in size. The Apple Support Communities, which attract a large number of Apple product consumers, have a much more diverse user base than the Sun’s Forums in terms of the users’ technical background and technology literacy. Users in the Sun’s Forums are mainly IT developers who are familiar with technology and share a relatively focused interest.

To limit the scope of our study, we focused on one of the most active sub-forums from each of the two communities. Specifically, we crawled 49,343 online discussion threads published between June 29, 2007 and April 6, 2010 from the Using iPhone forum in the Apple Support Communities, and 70,488 threads published between June 19, 2001 and June 6, 2009 from the Java programming forum in the Sun’s Forums. We removed 6702 and 6979 discussion threads from the iPhone and Java data sets, respectively, because they did not have any replies.

A unique feature provided by these two online communities is the ability to let knowledge seekers rate replies provided by knowledge sharers. The seeker can rate a reply with either a “This helped me” (helpful) tag or a “This solved my question” (solved) tag. Table 3 shows the total number of threads with at least one reply and the number of threads with different user feedback labels in each dataset. A thread without any user feedback does not necessarily lack helpful knowledge or solutions. Past research has found that user feedback participation is often very low in virtual communities (Cao et al. 2011). Because of users’ negligence or unwillingness to rate, many discussion threads in the two forums that carry no user feedback may nevertheless contain helpful knowledge or even problem solutions. We therefore decided to manually examine threads without feedback and identify those that failed to be productive in knowledge sharing.

Table 3 Statistics of discussions threads with different user feedback labels

From each data set, we randomly selected unlabeled discussion threads without replacement and asked two Computer Science graduate students to manually examine whether they contained solutions or helpful information. From each dataset we identified 1017 threads that both students agreed did not contain any knowledge helpful to the question being discussed. We then randomly selected 500 solved discussion threads, 500 helpful threads, and 500 “unhelpful” threads from each dataset. During the selection process, we made sure that each selected thread was problem-solving oriented. Table 4 shows some basic statistics of the three types of discussion threads.

Table 4 Basic statistics of the three types of discussion threads

We can observe that effective knowledge sharing discussions, i.e., the discussion threads labeled with a solved or helpful tag, have significantly more posts and participants than unhelpful discussions (i.e., threads manually identified as unhelpful). The observation is intuitive. A discussion thread with many posts indicates an active knowledge sharing process, which often leads to a positive outcome. Similarly, if a discussion attracts more participants, it is more likely to have an active sharing process as well as diverse knowledge sources.

We also observe differences between the two datasets. The iPhone forum has slightly more users participating in each discussion thread on average. However, the average number of posts per thread in the iPhone forum is lower than that in the Java forum. This suggests that Java forum participants, who have a less diverse technical background and share a more focused interest, are more committed to discussions and contribute more than iPhone forum participants.

4.2 Results

We parsed each selected discussion thread to get its metadata before we constructed a post tree and a micro-level communication network. We then calculated the seven network structural metrics described in Section 3.3.1 for all 3000 threads in our sample datasets. We noticed that the number of posts in a thread may significantly affect most metrics. Therefore, discussion threads with different lengths are not directly comparable. We grouped the selected threads by their thread type (solved, helpful, and unhelpful) and length (i.e., number of posts). Discussion threads with 12 or more posts were discarded because there were too few of them to estimate accurate means. We then calculated the mean values of those metrics for the threads with the same length in each thread group. Finally, we computed the average network structural metrics across thread groups with different lengths for each thread type.
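The length-controlled averaging described above can be sketched as follows. The record layout is hypothetical, and the sketch assumes that the final step is an unweighted average of the per-length means, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

MAX_POSTS = 11  # threads with 12 or more posts are discarded


def length_controlled_means(records):
    """records: iterable of (thread_type, n_posts, metric_value) tuples."""
    # Step 1: group metric values by (thread type, thread length).
    groups = defaultdict(list)
    for thread_type, n_posts, value in records:
        if n_posts <= MAX_POSTS:
            groups[(thread_type, n_posts)].append(value)
    # Step 2: mean per (type, length) group.
    per_type = defaultdict(list)
    for (thread_type, _n_posts), values in groups.items():
        per_type[thread_type].append(mean(values))
    # Step 3: average of the per-length means for each thread type
    # (assumption: every length group carries equal weight).
    return {thread_type: mean(means) for thread_type, means in per_type.items()}


# Toy records for a single metric, e.g. depth.
records = [
    ("solved", 3, 2.0), ("solved", 3, 4.0), ("solved", 5, 5.0),
    ("solved", 12, 9.0),  # discarded: 12 or more posts
    ("unhelpful", 3, 1.0), ("unhelpful", 5, 3.0),
]
print(length_controlled_means(records))
# prints {'solved': 4.0, 'unhelpful': 2.0}
```

Averaging within each length first keeps a thread type with many long threads from dominating the comparison.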

4.2.1 Differences in network structural characteristics

Table 5 shows the comparison on the seven network structural metrics across the three thread types. The significance level α = 0.05 was used to determine the statistical significance of the differences between solved/helpful threads and unhelpful threads. All structural metrics except h-index showed significant differences when comparing solved or helpful threads to unhelpful threads.

Table 5 The differences in network structural characteristics across different thread types and datasets

The results show that effective knowledge sharing discussions do have distinct structural characteristics compared to unhelpful discussions. An effective knowledge sharing discussion tends to have more depth and less width. It suggests that a knowledge sharing process with a limited number of engaged dialogues is more likely to yield productive knowledge sharing. The dialogue length values suggest that helpful or solved discussions are more likely to have a long conversation between the knowledge seeker and at least one particular knowledge sharer than unhelpful discussions. Results in reciprocity and clustering coefficient show that effective knowledge sharing processes have higher reciprocity (two-way communications) and more cohesive discussions among different discussion participants than unhelpful processes. The above findings are consistent across the two datasets.

The differences in the metrics between solved and helpful threads were statistically insignificant on the Java dataset. For the same comparisons on the iPhone data set, only the dialogue length, reciprocity, and cluster coefficient metrics had significant p-values. Therefore, there is no conclusive evidence that these structural metrics can be used to distinguish solved threads from helpful threads.

Comparing the metrics between the two datasets, solved and helpful threads in the iPhone forum have significantly smaller depth, larger width, shorter dialogue, less reciprocity, and lower clustering coefficient than those in the Java forum. That can be explained by the fact that users in the Java community, compared to those in the iPhone forum, have a focused interest and are more committed to problem-solving discussions.

4.2.2 Differences in communication network patterns

We used the gSpan algorithm (Yan and Han 2002) to discover sub-graph patterns from the communication networks with a minimum support level of 10 %, which is the default threshold used by gSpan. The minimum support level of 10 % means that a sub-graph is considered a frequent pattern if at least 10 % of the communication networks contain the sub-graph. Table 6 lists support values for those sub-graph patterns in the three types of discussion threads. We compared the frequencies of patterns in “unhelpful” threads with those that appeared in “helpful” and “solved” threads.

Table 6 Support values of sub-graph patterns (a light-colored node represents a knowledge seeker, a dark-colored node represents one who replied)

First, we found that all patterns with at least four nodes had higher support in helpful and solved threads than in unhelpful threads. This suggests that having diverse participants (at least three different knowledge sharers) improves the likelihood of having a helpful discussion.

Second, increased interactions among knowledge sharers can improve the chance of having a helpful discussion. Among 3-node patterns, patterns 3–1 and 3–3 had almost the same level of support across the three thread groups. However, patterns 3–2 and 3–4, both of which show interactions between knowledge sharers, had much higher support in helpful and solved threads than in unhelpful threads. A similar trend exists in 4-node patterns. The support values of patterns with more interactions between sharers (e.g., patterns 4–2 and 4–4) were significantly higher in helpful and solved threads than in unhelpful threads. This finding suggests that interactions among knowledge sharers are important in problem-solving oriented discussions. It is consistent with the group learning literature, which reports that fully-connected or cliquish communication patterns improve group learning performance (Chen et al. 2003).

Lastly, unhelpful threads do not exhibit distinct communication patterns. Patterns 3–1 and 3–2 are common in all three types of discussion threads. This suggests that unhelpful threads are often caused by insufficient user participation and insufficient engagement in discussion dialogues. Both the difficulty level of the question and a lack of communication can contribute to insufficient participation. In this research, we focus on the problem of lack of communication. To avoid unhelpful discussions, we call on online community practitioners to design mechanisms that encourage user participation.

5 Conclusions and discussions

Online communities designed to support knowledge sharing have become an important knowledge source. While most existing studies on online communities focus on system design factors or motivations behind user participation, we aim to decipher the online knowledge sharing processes and identify communication patterns associated with effective knowledge sharing. In this paper we proposed a computational framework to examine individual knowledge sharing processes in online communities from a communication process perspective. Our empirical evaluations showed that, compared to unhelpful knowledge sharing processes, effective knowledge sharing processes exhibit distinct structural characteristics using post tree and communication network representations.

Our research findings on knowledge sharing processes in online communities have practical implications for online community practitioners. Practitioners should encourage the structural characteristics and communication network patterns that we found more likely to appear in effective knowledge sharing processes. For example, our results suggest that knowledge sharing processes with a limited number of engaged discussion dialogues are more likely to be effective. Online community practitioners could select up to two direct replies to the original question post based on relevance and filter out the remaining, less relevant direct replies, so that discussion dialogues focus on the two selected reply posts. Our results also suggest that effective knowledge sharing has a longer dialogue length. Online communities may develop a reputation mechanism that rewards activities that contribute to long dialogues (e.g., with at least three continuous replies in one dialogue). Similarly, we found that effective knowledge sharing had higher reciprocity and more cohesive discussions among participants than unhelpful discussions. A reputation mechanism can be used to reward reciprocal postings and those posts that improve the clustering coefficient of a discussion process (i.e., increase the number of interactions among discussion participants in the same thread).
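As one possible reading of the reputation mechanism for long dialogues, the Python sketch below awards a point to the author of each post that belongs to a reply chain with at least three continuous replies. This is a hypothetical illustration rather than a mechanism evaluated in this study. The function names, the point value, and the treatment of a dialogue as a chain of replies running from a direct reply to the question down to a leaf post are all assumptions.

```python
def reply_chains(parent):
    """Return every chain of replies from a direct reply down to a leaf post.

    parent maps post id -> parent post id (None for the original question).
    The original question itself is not part of any chain.
    """
    has_child = {p for p in parent.values() if p is not None}
    chains = []
    for post in parent:
        if parent[post] is None or post in has_child:
            continue  # skip the question and any post that has replies
        chain = []
        while parent[post] is not None:
            chain.append(post)
            post = parent[post]
        chains.append(chain[::-1])  # order from first reply to leaf
    return chains


def dialogue_rewards(parent, author, min_replies=3, points=1):
    """Give `points` per post to authors of posts in sufficiently long chains."""
    rewards = {}
    counted = set()  # a post shared by several chains is rewarded once
    for chain in reply_chains(parent):
        if len(chain) < min_replies:
            continue
        for post in chain:
            if post not in counted:
                counted.add(post)
                rewards[author[post]] = rewards.get(author[post], 0) + points
    return rewards


# Hypothetical thread: one chain of three replies and one chain of two.
parent = {"q": None, "r1": "q", "r2": "r1", "r3": "r2", "r4": "q", "r5": "r4"}
author = {"q": "S", "r1": "A", "r2": "S", "r3": "A", "r4": "B", "r5": "A"}
print(dialogue_rewards(parent, author))
```

Whether the knowledge seeker should also earn points for sustaining a dialogue, as happens here, is a design choice that a community would need to decide for itself.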

Our study has several limitations and can be extended in a number of directions. First, our computational framework only examines the structural characteristics and communication patterns of knowledge sharing processes without directly considering the content and information quality of individual posts. Although user feedback ratings can be considered as quality inputs, they are available for very few posts in a discussion thread, if any. Some posts may be irrelevant to the problem raised in the original posting or may have low information quality. Those posts contribute little to problem solving and knowledge sharing, but potentially introduce biases into the structural metrics calculation and communication patterns. Second, we currently do not consider the nature of the problem in a discussion, specifically the difficulty level of the problem. A knowledge sharing process may be unproductive because the problem is too difficult or complex to resolve, rather than because of the patterns in the knowledge sharing process. Removing those problems would further improve the validity of our findings. Lastly, although we conducted empirical evaluations using two different online communities in two different domains, we could further improve the external validity of our research by evaluating more knowledge sharing processes in other online communities. We expect that similar results would be obtained from other tech support forums. However, online communities that have unique characteristics should be examined further. For example, communication patterns may change in online communities where users are identifiable and offline relationships may affect online communication patterns (e.g., corporate intranet communities (Leshed 2009)).