Abstract
This chapter studies the use of textual features based on systemic functional linguistics for genre-based text categorization. We describe feature sets that represent different types of conjunctions and modal assessment, which together can partially indicate how different genres structure text and which classes of attitudes towards propositions they prefer. Examining which features are important for classification enables analysis of large-scale rhetorical differences between genres. The specific domain we studied comprises scientific articles in historical and experimental sciences (paleontology and physical chemistry, respectively). We applied the SMO learning algorithm, which with our feature set achieved over 83% accuracy in classifying articles by field, even though no field-specific terms were used as features. The most highly weighted features for each field were consistent with hypothesized methodological differences between historical and experimental sciences, thus lending empirical support to the recent philosophical claim of multiple scientific methods.
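To make the feature representation concrete, the following is a minimal sketch of how a text might be mapped to relative frequencies of conjunction and modality classes. The lexicon, the class names, and the function `extract_features` are hypothetical illustrations; the chapter's actual feature sets are not listed in this abstract. Vectors of this kind could then be passed to a linear support vector learner such as SMO.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated lexicons (an assumption for illustration);
# the chapter's real systemic-functional feature sets are not given here.
FEATURE_LEXICON = {
    "conjunction:extension": {"and", "but", "or"},
    "conjunction:enhancement": {"because", "so", "then"},
    "modality:probability": {"may", "might", "probably", "possibly"},
    "modality:obligation": {"must", "should"},
}


def extract_features(text):
    """Return the relative frequency of each feature class in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for feature, words in FEATURE_LEXICON.items():
            if token in words:
                counts[feature] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {feature: counts[feature] / total for feature in FEATURE_LEXICON}


sample = "The strata may reflect flooding, but erosion probably removed the upper layers."
features = extract_features(sample)
```

Because no field-specific vocabulary appears in the lexicon, any separation between fields obtained from such vectors would have to come from rhetorical choices rather than topic, which is the property the abstract emphasizes.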
© 2006 Springer
About this chapter
Cite this chapter
Argamon, S., Dodick, J. (2006). Corpus-Based Study of Scientific Methodology: Comparing the Historical and Experimental Sciences. In: Shanahan, J.G., Qu, Y., Wiebe, J. (eds) Computing Attitude and Affect in Text: Theory and Applications. The Information Retrieval Series, vol 20. Springer, Dordrecht. https://doi.org/10.1007/1-4020-4102-0_17
DOI: https://doi.org/10.1007/1-4020-4102-0_17
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4026-9
Online ISBN: 978-1-4020-4102-0
eBook Packages: Computer Science, Computer Science (R0)