
12.1 Research Background

According to Nielsen’s 2012 survey (Footnote 1), the growth rates of common social networking sites such as Facebook, Myspace, and LinkedIn dropped in 2012. In contrast, the fastest-growing social services over the same period included Pinterest, Blogger, Twitter, Tumblr, and Wikia, which provide content sharing and collaborative authoring. In this research, we refer to all of them as ‘crowdsourcing’ services, a term whose definition has been somewhat restricted over the past decade. Howe [3] defined ‘crowdsourcing’ as ‘…represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call’. This definition merely extends the traditional outsourcing process to online environments and processes. Brabham [1], on the other hand, adopted a broader definition, in which ‘crowdsourcing is an online, distributed problem-solving and production model’. This broader view fits our use of the term, which covers all services that incorporate collaborative content creation and task completion.

Most contemporary crowdsourcing services provide platforms for task announcement and worker recruitment. A well-known example is Amazon’s Mechanical Turk (Footnote 2). Besides such task-worker matchmaking platforms, crowdsourcing also enables other services, such as crowdvoting and crowdfunding. Crowdsourcing depends on message exchange, and various types of messages are produced throughout the process, e.g. usage patterns, user profiles, link data, tags, and text messages. The manipulation and mining of such messages have seldom been discussed. It is therefore unclear how feasible and effective data mining techniques, especially text mining, are on the data produced during crowdsourcing processes.

12.2 Research Goals

In this research, we will try to achieve the following goals regarding crowdsourcing data mining:

  1. To establish schemes for crowdsourcing data management and visualization.

  2. To develop kernel techniques for crowdsourcing data mining, such as topic detection and relation discovery.

  3. To establish a platform to demonstrate the effectiveness of proposed methods.

We expect that our research will provide a uniform scheme for improving the use of data in crowdsourcing.

12.3 Research Methods

The major steps of this research are described below.

Data collection and processing. The volume of data of various types produced in a crowdsourcing process is usually large. In this research we focus only on textual data. Two types of textual data will be collected: messages and profiles. Messages are the major source for the mining process, while profiles provide demographic and social attributes for the messages. We will develop several approaches to clean, reduce, and normalize these messages and to attach attributes to them.
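As a minimal sketch of the cleaning, reduction, and attribute-attachment step, the following Python fragment lowercases each message, strips URLs and user mentions, removes stopwords, drops empty and duplicate messages, and copies the author's profile onto the message. The `Message` class, the `normalize` and `prepare` functions, the tiny stopword list, and the profile layout are all hypothetical choices made for illustration; they are not the approaches this research will develop.

```python
import re
from dataclasses import dataclass, field

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+")
# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "to", "of",
                       "and", "in", "on", "at", "for"})


@dataclass
class Message:
    user_id: str
    text: str
    tokens: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def normalize(text):
    """Lowercase, strip URLs and @mentions, tokenize, drop stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]


def prepare(raw_messages, profiles):
    """Clean (user_id, text) pairs and attach profile attributes.

    Messages that are empty after normalization, or that repeat an
    earlier message by the same user, are dropped (data reduction).
    """
    prepared = []
    seen = set()
    for user_id, text in raw_messages:
        tokens = normalize(text)
        key = (user_id, tuple(tokens))
        if not tokens or key in seen:
            continue
        seen.add(key)
        attributes = dict(profiles.get(user_id, {}))
        prepared.append(Message(user_id, text, tokens, attributes))
    return prepared
```

For example, `prepare([("u1", "Flood at the Main St bridge!")], {"u1": {"region": "north"}})` would yield one message with tokens `["flood", "main", "st", "bridge"]` and the attribute `region` set to `north`.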

Data clustering and classification. We will apply the self-organizing map (SOM) algorithm to cluster messages and discover the relations among them. Several SOM variants, such as the classical SOM [4], the growing hierarchical SOM [2], and the topic-oriented SOM [5], will be compared to verify their effectiveness. We will also cluster the profile data to obtain a demographic clustering of messages.
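To make the clustering step concrete, here is a minimal sketch of a classical SOM in plain Python. It assumes each message has already been turned into a fixed-length numeric vector (for instance a term-frequency vector); the function names, the linear decay schedules, and the default parameters are assumptions made for this illustration, and the growing hierarchical and topic-oriented variants are not covered.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length numeric vectors.

    The learning rate and the neighborhood radius both decay linearly;
    this schedule is an illustrative choice, not a prescribed one.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.1)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood around the best matching unit.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, each message can be assigned to the node returned by `best_matching_unit`, and messages that share a node (or sit on neighboring nodes) can be treated as one cluster. Messages with similar vocabularies tend to end up close together on the map, although this depends on the data and the parameters.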

Topic detection. To further investigate the relationships among messages, the topics of messages will be discovered through a topic detection process. Here a topic is a set of keywords that describes the main idea of a message. We will develop a detection scheme that uses the message clustering result to discover semantic terms. These topical terms will then be used for thematic categorization of both messages and profiles.
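One simple way to derive keywords from a clustering result is sketched below. It scores each term by its frequency inside a cluster, weighted by how few clusters contain it (a TF-IDF-style weighting applied at cluster level). This weighting is a stand-in chosen for illustration, not the detection scheme this research will develop, and the function name `cluster_topics` and its input layout are hypothetical.

```python
import math
from collections import Counter


def cluster_topics(clusters, top_k=3):
    """Pick top_k keywords per cluster.

    clusters maps a cluster id to a list of tokenized messages.
    Score = term frequency in the cluster * (log(N / cf) + 1), where N is
    the number of clusters and cf the number of clusters containing the term.
    """
    term_counts = {
        cid: Counter(token for message in messages for token in message)
        for cid, messages in clusters.items()
    }
    n_clusters = len(term_counts)
    cluster_freq = Counter()
    for counts in term_counts.values():
        cluster_freq.update(counts.keys())

    topics = {}
    for cid, counts in term_counts.items():
        scores = {
            term: tf * (math.log(n_clusters / cluster_freq[term]) + 1.0)
            for term, tf in counts.items()
        }
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        topics[cid] = ranked[:top_k]
    return topics
```

Terms that appear in every cluster keep only their raw frequency as a score, so words that are distinctive for one cluster rank above words shared by all of them at equal frequency.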

Association discovery. The purpose of this step is to discover the relations among messages, users, and topics. Since the clustering process should already reveal relations within each of these groups (among messages, among users, and among topics), the goal of this step is to find associations across the three.
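A cross-group association of this kind can be illustrated with a small sketch that relates one profile attribute of the author to the topic of the message. It uses lift, a standard association-rule measure, as one possible strength score; the choice of measure, the function name `attribute_topic_lift`, and the `min_support` threshold are assumptions for illustration only.

```python
from collections import Counter


def attribute_topic_lift(records, min_support=2):
    """Compute lift for (attribute value, topic) pairs.

    records is an iterable of (attribute_value, topic) pairs, one per
    message. Lift > 1 means the pair co-occurs more often than expected
    if attribute and topic were independent. Pairs seen fewer than
    min_support times are ignored.
    """
    records = list(records)
    n = len(records)
    pair_counts = Counter(records)
    attr_counts = Counter(attr for attr, _ in records)
    topic_counts = Counter(topic for _, topic in records)

    lifts = {}
    for (attr, topic), count in pair_counts.items():
        if count < min_support:
            continue
        expected = (attr_counts[attr] / n) * (topic_counts[topic] / n)
        lifts[(attr, topic)] = (count / n) / expected
    return lifts
```

With eight messages in which three of four "north" authors write about floods and three of four "south" authors write about earthquakes, the pairs ("north", "flood") and ("south", "quake") each receive a lift of 1.5, while the two pairs that occur only once fall below the support threshold.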

Application platform implementation. In the final stage of this research, we will implement a platform to demonstrate the usage and applicability of our proposed crowdsourcing mining process. We plan to establish a disaster information coordination platform that incorporates real-time reports from users. Trends, associations, events, and other useful knowledge regarding disasters could then be discovered and disseminated through this platform.

12.4 Expected Result

We expect to achieve the following results in this research:

  1. Gather and process crowdsourcing data from various platforms for further research.

  2. Complete the development of topic detection and association discovery algorithms, as well as derived algorithms such as event detection, automatic summarization, spam detection, and content recommendation.

  3. Establish an experimental platform for disaster information coordination.

12.5 Conclusion

Crowdsourcing is a new approach to problem solving in the Web era. However, data management and usage in crowdsourcing processes are seldom discussed, let alone knowledge discovery from such data. In this work, we propose to establish a framework for mining crowdsourcing data, based mainly on text mining techniques. Several techniques for mining crowdsourcing data will be developed. We expect the results of this research to benefit applications of and research on crowdsourcing and to broaden its use.