Toward Crowdsourcing Data Mining

Yang, Hsin-Chang; Lee, Chung-Hong

doi:10.1007/978-94-007-7293-9_12


Part of the book series: Springer Proceedings in Complexity (SPCOM)


Abstract

Crowdsourcing has emerged as a popular and important problem-solving approach. The major difference between crowdsourcing and traditional outsourcing lies in the people to whom tasks are outsourced: those involved in crowdsourcing generally vary widely in knowledge, demographic properties, and number. Many applications and services have been developed to solve various types of tasks. However, these applications and services focus on providing platforms for outsourcing to the crowd. Little attention has been paid so far to the management and use of the information produced during the crowdsourcing process. As an emerging kind of social network application and service, crowdsourcing creates data and social interactions that should carry important and valuable knowledge. Uncovering this knowledge calls for techniques for mining the messages and information of the crowdsourcing process. In this work, we propose several approaches for discovering useful knowledge from data created for and during the crowdsourcing process. We hope the outcome of this research will help discover usable knowledge from such emerging social network services and benefit the construction of crowdsourcing services.


12.1 Research Background

According to Nielsen’s 2012 survey, the growth rates of common social activity sites such as Facebook, Myspace, and LinkedIn dropped in 2012. In contrast, the fastest-growing social services in the last year include Pinterest, Blogger, Twitter, Tumblr, and Wikia, which provide content sharing and collaborative authoring. In this research, we refer to all of them as ‘crowdsourcing’ services, although the term has been defined somewhat restrictively over the past decade. Howe [3] defined ‘crowdsourcing’ as ‘…represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call’. This definition merely widens the traditional outsourcing process to incorporate online environments and processes. Brabham [1], on the other hand, adopted a broader definition, in which ‘crowdsourcing is an online, distributed problem-solving and production model’. This broader definition fits our use of the term above, which includes all services incorporating collaborative content creation and task completion.

Most contemporary crowdsourcing services provide platforms for task announcement and worker recruitment. A famous example is Amazon’s Mechanical Turk. Besides such task-worker matchmaking platforms, crowdsourcing also enables other services, such as crowdvoting and crowdfunding. Message exchange is necessary for crowdsourcing, and various types of messages are available throughout the crowdsourcing process, e.g. usage patterns, user profiles, link data, tags, and text messages. The manipulation and mining of such messages have seldom been discussed. It therefore remains unclear how plausible and effective data mining techniques, especially text mining techniques, are on the data produced during crowdsourcing processes.

12.2 Research Goals

In this research, we will try to achieve the following goals regarding crowdsourcing data mining:

1. To establish schemes for crowdsourcing data management and visualization.
2. To develop kernel techniques for crowdsourcing data mining, such as topic detection and relation discovery.
3. To establish a platform to demonstrate the effectiveness of the proposed methods.

We expect that our research will provide a uniform scheme to elevate data usage in crowdsourcing.

12.3 Research Methods

The major steps of this research are as follows:

Data collection and processing. The volume of data, of various types, produced in the crowdsourcing process is usually large. We focus only on textual data in this research. Two types of textual data will be collected, namely messages and profiles. Messages are the major source for the mining process, and profiles provide the demographic and social attributes of those messages. We will develop several approaches to clean, reduce, and normalize these messages, and to attach attributes to them.
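As an illustration only, the cleaning, reduction, and attribute-attachment step could look like the following minimal sketch. The record types `Message` and `Profile`, their field names, the tiny stopword list, and the choice of duplicate removal as the reduction step are all assumptions made for this example; the proposal itself does not specify them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record types; the field names are assumptions for illustration.
@dataclass
class Profile:
    user_id: str
    attributes: dict

@dataclass
class Message:
    user_id: str
    text: str
    tokens: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9]+")
# A deliberately tiny stopword list, used only to keep the sketch short.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at"}

def normalize(text):
    """Lowercase, drop URLs, tokenize on alphanumerics, remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]

def preprocess(messages, profiles):
    """Clean each message, drop empty and duplicate ones, attach profile attributes."""
    by_user = {p.user_id: p.attributes for p in profiles}
    seen = set()
    result = []
    for m in messages:
        tokens = normalize(m.text)
        key = tuple(tokens)
        if not tokens or key in seen:
            continue  # reduction: skip messages with no content or repeated content
        seen.add(key)
        attrs = dict(by_user.get(m.user_id, {}))  # empty if the user has no profile
        result.append(Message(m.user_id, m.text, tokens, attrs))
    return result
```

Keeping the original text next to the normalized tokens lets later steps display the raw message while clustering on the cleaned form.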

Data clustering and classification. We will apply the self-organizing map (SOM) algorithm to cluster messages and discover the relations among them. Various SOM implementations, such as the classical SOM [4], the growing hierarchical SOM [2], and the topic-oriented SOM [5], will be compared to verify their effectiveness. We will also cluster the profile data to obtain a demographic clustering of messages.
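To make the clustering step concrete, the following is a minimal sketch of a classical SOM trained on bag-of-words message vectors. It is a simplified illustration, not the growing hierarchical or topic-oriented variants cited above. The `vectorize` helper, the round-robin weight initialization, the linear decay schedules, and the default parameters are assumptions chosen for brevity.

```python
import math

def vectorize(token_lists):
    """Turn tokenized messages into term-count vectors over a sorted vocabulary."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        v = [0.0] * len(vocab)
        for t in tokens:
            v[index[t]] += 1.0
        vectors.append(v)
    return vocab, vectors

class SOM:
    """Minimal classical self-organizing map on a rows x cols grid."""

    def __init__(self, rows, cols, data):
        self.positions = [(r, c) for r in range(rows) for c in range(cols)]
        # Deterministic initialization: copy training vectors in round-robin order.
        self.weights = [list(data[i % len(data)]) for i in range(len(self.positions))]

    def bmu(self, x):
        """Index of the best-matching unit (smallest squared Euclidean distance)."""
        dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in self.weights]
        return dists.index(min(dists))

    def train(self, data, epochs=20, lr0=0.5, sigma0=1.0):
        total = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in data:
                frac = 1.0 - step / total          # decays linearly from 1 toward 0
                lr = lr0 * frac                    # learning rate
                sigma = max(sigma0 * frac, 1e-3)   # neighborhood radius
                br, bc = self.positions[self.bmu(x)]
                for (r, c), w in zip(self.positions, self.weights):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
                step += 1
```

After training, messages that share a best-matching unit can be treated as one cluster, and neighboring units on the grid indicate related clusters.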

Topic detection. To further investigate the relationships among messages, the topics of messages will be discovered through a topic detection process. Here a topic is a set of keywords that could describe the main idea of a message. We will develop a detection scheme that uses the message clustering result to discover semantic terms. These topical terms will then be used to perform thematic categorization of both messages and profiles.
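The detection scheme itself is left open in this proposal. One simple possibility, shown here purely as an assumed stand-in, is to score each term by its frequency within a cluster, weighted by how few clusters contain it (a cluster-level TF-IDF), and to keep the top-ranked terms as the cluster's topic. The function name `cluster_topics` and the scoring formula are hypothetical.

```python
import math
from collections import Counter

def cluster_topics(token_lists, labels, top_k=3):
    """Return {cluster label: top_k keywords} using a cluster-level TF-IDF score."""
    term_counts = {}
    for tokens, label in zip(token_lists, labels):
        term_counts.setdefault(label, Counter()).update(tokens)
    n_clusters = len(term_counts)
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter(t for counts in term_counts.values() for t in counts)
    topics = {}
    for label, counts in term_counts.items():
        scored = {
            t: c * math.log(1.0 + n_clusters / cluster_freq[t])
            for t, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically for reproducibility.
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        topics[label] = ranked[:top_k]
    return topics
```

Here `labels` could be the best-matching unit of each message from the clustering step, so that each map unit receives its own keyword set.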

Association discovery. The purpose of this process is to discover the relations among messages, users, and topics. Since the clustering process should already discover relations within messages, within users, and within topics, respectively, the goal of this step is to find the associations across messages, users, and topics.
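As a rough sketch of what a cross-entity association might look like, the example below counts how often each user posts on each topic and keeps the user-topic pairs that pass support and confidence thresholds, in the spirit of association rule mining. The input format, the function name `user_topic_associations`, and the threshold values are assumptions; the proposal does not commit to a particular association measure.

```python
from collections import Counter

def user_topic_associations(records, min_support=2, min_confidence=0.5):
    """Find user -> topic associations.

    records: iterable of (user_id, topic) pairs, one pair per message.
    Returns a list of (user, topic, support, confidence) tuples, where
    support is the number of messages by that user on that topic and
    confidence is the fraction of that user's messages on that topic.
    """
    records = list(records)
    pair_counts = Counter(records)
    user_counts = Counter(user for user, _ in records)
    rules = []
    for (user, topic), support in pair_counts.items():
        confidence = support / user_counts[user]
        if support >= min_support and confidence >= min_confidence:
            rules.append((user, topic, support, confidence))
    # Strongest associations first; remaining ties ordered by user and topic.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules
```

The same counting pattern extends to message-topic or user-user pairs by changing what each record contains.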

Application platform implementation. In the final stage of this research, we will implement a platform to demonstrate the use and applicability of the proposed crowdsourcing mining process. We plan to establish a disaster information coordination platform that incorporates real-time reports from users. Trends, associations, events, and other useful knowledge regarding disasters could be discovered and disseminated using this platform.

12.4 Expected Result

We expect to achieve the following results in this research:

1. Gather and process crowdsourcing data from various platforms for further research.
2. Complete the development of topic detection and association discovery algorithms, as well as derived algorithms such as event detection, automatic summarization, spam detection, and content recommendation.
3. Establish an experimental platform for disaster information coordination.

12.5 Conclusion

Crowdsourcing is a new way of solving problems in the Web era. However, data management and usage are seldom discussed for this process, let alone knowledge discovery from such data. In this work, we propose a framework for mining crowdsourcing data, based mainly on text mining techniques. Several techniques for mining crowdsourcing data will be developed. We expect the results of this research to benefit applications of and research on crowdsourcing and to broaden its use.


References

1. Brabham D (2008) Crowdsourcing as a model for problem solving: an introduction and cases. Convergence: Int J Res New Media Technol 14(1):75–90
2. Dittenbach M, Merkl D, Rauber A (2000) Using growing hierarchical self-organizing maps for document classification. In: Proceedings of the 8th European symposium on artificial neural networks (ESANN’2000), Bruges, 7–12
3. Howe J (2006) Crowdsourcing: a definition. http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html
4. Kohonen T (2001) Self-organizing maps. Springer, Berlin
5. Yang HC, Lee CH, Ke KL (2010) TOSOM: a topic-oriented self-organizing map for text organization. World Acad Sci Eng Technol 41:1100–1104

Author information

Authors and Affiliations

Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan
Hsin-Chang Yang
Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
Chung-Hong Lee


Corresponding author

Correspondence to Hsin-Chang Yang.

Editor information

Editors and Affiliations

School of Computing, Staffordshire University, Stafford, United Kingdom
Lorna Uden
Dean of College of Management, National University of Kaohsiung, Kaohsiung, Taiwan
Leon S.L. Wang
Dept. of Computer Science & Information Engineering, National University of Kaohsiung, Nan-Tzu District, Kaohsiung, Taiwan
Tzung-Pei Hong
National University of Kaohsiung, Kaohsiung, Taiwan
Hsin-Chang Yang
Dept. of Information Management, National University of Kaohsiung, Nan-Tzu District, Kaohsiung, Taiwan
I-Hsien Ting


About this paper

Cite this paper

Yang, HC., Lee, CH. (2013). Toward Crowdsourcing Data Mining. In: Uden, L., Wang, L., Hong, TP., Yang, HC., Ting, IH. (eds) The 3rd International Workshop on Intelligent Data Analysis and Management. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7293-9_12


DOI: https://doi.org/10.1007/978-94-007-7293-9_12
Published: 06 August 2013
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7292-2
Online ISBN: 978-94-007-7293-9
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
