Abstract
In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing; others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe components including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also provide a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint on the importance of trend detection tools and an overview of how such tools are used.
Copyright information
© 2004 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kontostathis, A., Galitsky, L.M., Pottenger, W.M., Roy, S., Phelps, D.J. (2004). A Survey of Emerging Trend Detection in Textual Data Mining. In: Berry, M.W. (eds) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_9
DOI: https://doi.org/10.1007/978-1-4757-4305-0_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-3057-6
Online ISBN: 978-1-4757-4305-0
eBook Packages: Springer Book Archive