A Comparison of Unstructured and Structured Principal Component Analyses and their Interpretation

Brandsegg, Kristian Bjarnøe; Hammer, Erik; Sinding-Larsen, Richard

doi:10.1007/s11053-010-9110-4

Published: 04 February 2010

Volume 19, pages 45–62, (2010)

Natural Resources Research

Abstract

Multivariate analysis is employed to investigate the structure of variations within highly heterogeneous data. Traditionally, principal component analysis (PCA) is run by analyzing the entire wireline log and using PCA scores to characterize variability within and between lithologies. In this paper, we propose a technique using only specific subsets of all well records to quantify reservoir heterogeneity due to second-order lithological variability. These subsets are chosen from uniform lithofacies parts of the wireline log in order to reduce the variability in the correlation matrix that lithological changes would otherwise cause. The purpose is to assess the efficiency of structured PCA in analyzing small-scale heterogeneity that is captured by wireline logs but often masked by traditional PCA approaches. This paper shows that a structured PCA procedure based upon special lithological units is superior to an unstructured PCA when the focus is on within-lithology variations. This structured procedure is applied to data from the Heidrun field, offshore mid-Norway. The results demonstrate clear benefits from added insight into the variability of a complex fluviodeltaic heterolithic sequence that poses great challenges to hydrocarbon development.

Introduction

The quantification of heterogeneity in sandstone reservoirs is often challenging as the magnitude and type of heterogeneity are normally not known beforehand. Multivariate data analysis can be applied to study and visualize the data in a more comprehensive way and thereby ease the interpretation of heterogeneity. One common goal of multivariate data analysis is to reduce the dimensions of a specific dataset without losing information. Linear combinations of the original variables created through principal component analysis (PCA) define a smaller set of variables that successively extract the maximum of the remaining variability (Jolliffe, 2002). Another goal can be to seek the most representative multidimensional structure according to a given problem. This implies seeking an appropriate variance–covariance matrix as input to the PCA. The general objectives can therefore be twofold: data reduction and interpretation (Davis, 2002). PCA has the potential to show relationships not previously suspected, and thereby uncover associations that are not readily seen.

Standard PCA, investigating the entire dataset to indicate gross variability, has been applied to many disciplines. In geosciences, PCA has been widely used to evaluate geological processes using satellite images (cf. Petrovic, Khan, and Chafetz, 2008) or outline different lithological types from wireline logs (cf. Gupta and Johnson, 2001). Zhang and others (2007) show an example where PCA has successfully been applied to detect hydrocarbon-bearing sands from satellite images. In Zhang’s study, the residual principal components (PCs) that are not disturbed by the non-hydrocarbon influence of the first PCs outlined different hydrocarbon-bearing zones that could be a target for exploration.

Stanley and Sinclair (1988) introduced the term structured PCA for an analysis that only uses a specific subset of variables in a geochemical survey in order to better outline mineralized zones. Their results showed that the PCA using only the trace elements that were related to rock forming minerals outlined the major lithological units, whereas the mineralization was not delineated. Different modes in the frequency distribution of the polymodally distributed trace elements from the study area were identified and used to select the variables that in the structured approach pointed to the mineralized zones. This interpretation philosophy is in this paper extended to wireline log interpretation with some important modifications.

Normally in wireline log analysis the PCs are computed from the correlation matrix, where each input variable is equally weighted. The correlation matrix is chosen due to differences in scale of each wireline log variable (Doveton, 1994). Although PCA is a robust and powerful method for both visualizing and manipulating the multidimensional representation of wireline data, it cannot be used as a black box and should be carefully designed in order to obtain significant results. A major limitation of PCA is that the first few principal component axes extract the maximum variability, but this does not guarantee the best subset of features (Nadler and Smith, 1993). This is because PCA uncovers feature combinations that model the variance of a data set, but these may not be the same features that separate the different lithological changes. However, a structured PCA approach, analyzing a specific lithological unit or interval, includes only relevant variability, and each of the PCs will therefore explain lithological effects that are not indicated when analyzing the entire data set, which contains both relevant and irrelevant heterogeneity (Stanley and Sinclair, 1988).
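
As an illustration of the correlation-matrix PCA described above, the following minimal Python sketch standardizes each log, eigendecomposes the correlation matrix, and returns loadings, scores, and explained variability. It is not the authors' R code: the function name `correlation_pca`, the three synthetic GR-, NPHI-, and RHOB-like columns, and the choice to treat unit-length eigenvectors as loadings are all assumptions made for the example.

```python
import numpy as np

def correlation_pca(X):
    """PCA of the correlation matrix; rows are well records, columns are logs."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # standardized logs, equal weight each
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the logs
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Z @ loadings                      # PC scores for every record
    explained = eigvals / eigvals.sum()        # fraction of total variability per PC
    return loadings, scores, explained

# Synthetic stand-in data (assumption): one hidden "shaliness" factor drives
# three logs that are measured on very different scales.
rng = np.random.default_rng(0)
shaliness = rng.normal(size=500)
logs = np.column_stack([
    60 + 30 * shaliness + rng.normal(scale=5, size=500),         # GR-like
    0.25 + 0.05 * shaliness + rng.normal(scale=0.01, size=500),  # NPHI-like
    2.4 + 0.08 * shaliness + rng.normal(scale=0.03, size=500),   # RHOB-like
])
loadings, scores, explained = correlation_pca(logs)
```

Because the logs are standardized first, the large numeric range of the GR-like column does not dominate the first component, which is the reason given in the text for preferring the correlation matrix.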

Several methods have been proposed for the classification and grouping of lithologies (Davis, 2002). One widely used method is lithofacies analysis, introduced in the 1980s under the name electrofacies, to characterize collective associations of wireline log responses that are linked to geological attributes (Serra and Abbott, 1982). It has been used in its standard form to characterize sequence stratigraphy (Eichenseer and Leduc, 1996), study heterolithic reservoirs (Gupta and Johnson, 2001), and for enhanced reservoir description (Pereira and others, 1990). Several studies of tidal and fluvial deposits have used PCA to enhance the understanding of heterolithic deposits separated into specific lithofacies (Avseth, Mukerji, and Mavko, 2005; Bourquin, Rigollet, and Bourges, 1998; Bridge and Tye, 2000; Hohn and others, 1997; Moline and Bahr, 1995; Singh, 2007). A common challenge in wireline log interpretation and petrophysics is to determine the relation and reliability of measurements of rock properties made at the borehole scale with the same property at the reservoir scale (Corbett, Jensen, and Sorbie, 1998). This challenge is particularly apparent in heterogeneous fluviodeltaic deposits (Martinius and others, 2005).

The present study uses PCA in a non-standard form in both unstructured and structured mode to characterize fluviodeltaic reservoir heterogeneity from wireline logs. The wireline logs represent separate measurements of different physical properties of the rock-fluid system and do not pose any simplex space constraints that, in the case of compositional data, violate the use of standard PCA. The objective is to describe and evaluate the benefits of using a modified structured PCA approach to show how petrophysical wireline log responses can be decomposed to reflect different orders of variability and how these can be differentially interpreted to provide additional insight into fluviodeltaic heterogeneity and its lithological complexity.

Method

Study Area, Wireline Data, and Software

The present study was carried out over a 300 m zone of the Upper Triassic to Lower Jurassic fluviodeltaic Åre Fm. (Dalland and others, 1988) from the Heidrun Field, offshore mid-Norway. A vertical water saturated well was selected where both core and petrophysical parameters have been thoroughly studied relative to five wireline logs (gamma ray, neutron porosity, bulk density, resistivity, and sonic logs). The computations have been performed within the R language, a free and open source software, which facilitates data manipulation, calculation, and graphical display (Dalgaard, 2008).

Univariate Analysis

The first step of a univariate analysis is to interpret the shape of the frequency distribution. The most prominent populations revealed by the five wireline logs can be identified by cumulative probability plots with the percentiles of the normal distribution as x-axis. Any polymodality in the five distributions might be caused by specific lithological processes. The number of modes or populations is determined from the inflection points in each of the probability plots (Stanley and Sinclair, 1988). The overlap between the neighboring populations is calculated separately for data with more than two populations and summed to portray the total magnitude of population overlap.
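
The coordinates of such a probability plot can be sketched in a few lines of Python. This is only an illustration: the function name `probability_plot_coords`, the plotting position (i + 0.5)/n, and the synthetic bimodal gamma-ray values are assumptions, and the inflection points are still picked by eye on the resulting curve rather than detected automatically.

```python
import random
from statistics import NormalDist

def probability_plot_coords(values):
    """Return (normal quantiles, sorted values) for a cumulative probability plot."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    # Plotting position (i + 0.5) / n keeps every probability strictly inside (0, 1).
    quantiles = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return quantiles, ordered

# Synthetic bimodal log (assumption): a clean-sand mode and a shale mode.
gen = random.Random(0)
gr_values = [gen.gauss(40, 6) for _ in range(300)] + [gen.gauss(110, 10) for _ in range(200)]
quantiles, ordered = probability_plot_coords(gr_values)
```

Plotting `ordered` against `quantiles` gives a straight line for a single normal population; a change of slope between two straight segments marks the transition between populations.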

Unstructured and Structured PCA

Multivariate analysis of the five independent wireline log responses needs to be performed in order to supplement the univariate analysis. Eigenvectors representing orthogonal directions in space permit the viewing of data from a variety of perspectives. The aim of the modified structured PCA used in this study is not to reduce the dimensionality of the data, but to work on subsets of the wireline log and only include those samples in the correlation matrices that capture particular heterogeneity effects related to specific lithological units. PC loadings and scores are, according to this procedure, calculated from (1) a total unstructured analysis of all well records from all wireline log variables and (2) a structured subset of separate well records from specific lithological units. The analysis of the entire 300 m interval has been named TPC due to the use of the total number of records, whereas the structured approach is named according to the lithological units covered (sandstone, shale, coal, and cemented layers). A lithofacies classification is used to outline the geologic variation in rock types. The total wireline log interval was manually classified into four lithofacies based upon core analysis and wireline log responses according to the following rock types: sandstone (ss), shale (sh), coal (co), and cemented layers (cc). The choice of which samples to include in a subset was made on the basis of this facies interpretation in order to obtain apparently homogeneous lithological samples that could unmask the internal heterogeneity that is otherwise obscured by inter-lithological variability. The loadings from the structured subsets are used to calculate PC scores that can be used to extrapolate the specific lithological signatures to the totality of well records. This calculation makes it possible to compare the log responses from the unstructured and the structured approach for the complete well.
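
The two modes of analysis can be sketched as follows in Python. This is a hedged illustration rather than the authors' implementation: the helper names `fit_pca` and `project`, the synthetic two-facies data, and in particular the choice to standardize all records with the subset's own mean and standard deviation before applying the structured loadings are assumptions; standardizing with total-interval statistics would be an equally plausible reading.

```python
import numpy as np

def fit_pca(X):
    """Correlation-matrix PCA; returns the statistics needed to project new records."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # descending eigenvalues
    return mean, std, eigvals[order], eigvecs[:, order]

def project(X, mean, std, loadings):
    """Standardize records with the given statistics and compute PC scores."""
    return ((X - mean) / std) @ loadings

# Synthetic stand-in for a classified well (assumption): three logs, two facies.
rng = np.random.default_rng(1)
n_ss, n_sh = 300, 200
ss = rng.normal([40, 0.20, 2.30], [8, 0.03, 0.05], size=(n_ss, 3))
sh = rng.normal([110, 0.35, 2.50], [10, 0.03, 0.05], size=(n_sh, 3))
logs = np.vstack([ss, sh])
facies = np.array(["ss"] * n_ss + ["sh"] * n_sh)

# (1) Unstructured analysis: all records enter the correlation matrix (TPC).
t_mean, t_std, t_eig, t_load = fit_pca(logs)
tpc_scores = project(logs, t_mean, t_std, t_load)

# (2) Structured analysis: only sandstone records enter the correlation matrix,
# then the sandstone loadings are applied to every record in the interval.
is_ss = facies == "ss"
s_mean, s_std, s_eig, s_load = fit_pca(logs[is_ss])
pc_ss_all = project(logs, s_mean, s_std, s_load)
```

Crossplotting the first two columns of `pc_ss_all` gives the "sandstone view" of the whole interval, which can then be compared with the first two columns of `tpc_scores`.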

Stability in the Eigenvectors

In order to ensure the representativeness of the computed eigenvectors for the sandstone (ss) subset, the following procedure was chosen: The subset was divided randomly into two groups and each group was analyzed. The loadings for the two subset groups were compared and expressed as a percentage difference between the initial subset and the two subset groups. The size of this percentage reflects the stability of the eigenvectors.
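
A split-half check of this kind might look like the Python sketch below. The function names, the synthetic four-log subset, and the sign-alignment step are assumptions introduced for the example: eigenvectors are only defined up to sign, so each half's eigenvectors are flipped to agree with the full-subset ones before the percentage difference is taken.

```python
import numpy as np

def pca_loadings(X):
    """Eigenvectors of the correlation matrix, ordered by descending eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1]]

def align_signs(V, ref):
    """Flip each column of V so that it points the same way as the reference column."""
    signs = np.sign(np.sum(V * ref, axis=0))
    signs[signs == 0] = 1.0
    return V * signs

def split_half_stability(X, rng):
    """Percentage difference between full-subset loadings and two random halves."""
    idx = rng.permutation(len(X))
    half = len(X) // 2
    ref = pca_loadings(X)
    diffs = []
    for part in (idx[:half], idx[half:]):
        V = align_signs(pca_loadings(X[part]), ref)
        diffs.append(100.0 * np.abs(V - ref) / np.abs(ref))
    return ref, diffs

# Synthetic subset (assumption): four logs sharing one strong common factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(800, 1))
subset = factor + rng.normal(scale=0.3, size=(800, 4))
ref, diffs = split_half_stability(subset, rng)
```

Because the difference is expressed relative to the reference loading, loadings close to zero can show large percentages even when the absolute change is small, so such entries need to be read with care.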

Visualization and Interpretation Methods

The populations defined by the PC scores, both resulting from unstructured and structured PCA, are evaluated by probability plots and histograms. Crossplots are used to visualize the relationship between components, both for the original data and for PC scores. Polymodal distributions are identified by selecting inflection points on each of the empirical cumulative frequency distributions, indicating a transition from one to the other population.

Standard PCA is often applied without interpreting the weighting (loadings) of each PC (Moline and Bahr, 1995). The interpretation of the loading values is, however, crucial, as the loading signature represents a linear combination of variables that may or may not represent a process that makes sense from a geological point of view. A comparison of the unstructured and structured loadings and their explained variability permits a detailed understanding of the relationship between geologic processes and the PC loadings and scores. Traditionally, loadings are only displayed in table form. In this study, two additional visualizations are carried out to enhance multidimensional similarity. The first uses visualization as star diagrams (Wegman, 1990) and the second is a glyph representation of Chernoff faces (Chernoff, 1973) mimicking human faces according to loading values.

Results

Univariate Analysis

Initially, the statistical treatment of wireline data followed the procedure indicated by Stanley and Sinclair (1988) with the separation by univariate analysis of different populations of wireline log responses. Based upon these petrophysical responses, three major lithological units, sandstone (ss), shale (sh), and coal (co), in addition to cemented layers (cc), were manually identified on the basis of 15 cm well record intervals (Fig. 1 and Table 1). Probability plots of each wireline log were evaluated to identify the polymodality of the cumulative frequency distributions, which in most cases reflects specific lithological populations. All five wireline logs exhibited polymodal distributions. The GR, RHOB, NPHI, and DT logs are visualized in Figure 2. The probability plot of the GR log indicates six populations, where the A population is interpreted to represent clean channel sands or cemented sandstone zones, B represents bayfill sands, F represents GR-rich spikes, and the C–E populations reflect coal and shale influenced sandstone intervals (Fig. 2a). Four populations are indicated on the RHOB log: A representing coal, B sand, C shale, and D cement (Fig. 2b). The largest population of the NPHI log holds 93.5% of the total records (Fig. 2c) and reflects a range of sand/shale mixtures. The remaining three populations are interpreted to represent end-members with extreme low and high NPHI values related to cement and organic-rich coal, respectively. Interpretation of the DT log shows that the B–D populations reflect sand and shale intervals, whereas A indicates cement and E indicates organic-rich coal (Fig. 2d). The two most exotic populations, organic-rich coal and cemented intervals, are clearly differentiated by these specific wireline responses. On the other hand, sand, shale, and impure coal intervals are found to be less distinguishable on the basis of only univariate analysis.
An additional plot, describing the initial wireline log response distributions in relation to each lithological unit, indicates large variations of population overlap between the lithological units (Fig. 3).

Table 1 Summary of the Lithofacies Description of the Studied Fluviodeltaic Well Interval

Unstructured and Structured PCA

An unstructured PCA based on the total number of well records was used to observe the major variability from all lithological units. The calculations are computed from standardized wireline log values so all variables have equal variability (Table 2). Separate analysis of the probability plot of the first two unstructured PCs (TPC1 and TPC2) indicates four populations each (Fig. 4). The TPC1 does not allow for a clear distinction between all the different lithological population types (Fig. 4a). This is especially evident for the sand-shale population overlap. The second PC, TPC2, identifies the major cemented layer with extreme low TPC2 scores, as well as the difference between sand and shale units (Fig. 4b). The TPC1-TPC2 crossplot (Fig. 4c), combining the principal two unstructured PCs, allows for only a rough discrimination of the principal lithological units. However, this unstructured PCA crossplot still permits a more precise separation of the lithological units than can be obtained from the crossplot of the RHOB and NPHI wireline logs (Fig. 4d).

Table 2 The First Three PC Loadings for the Total Unstructured PCA (TPC) and the Structured PCA of Each of the Lithofacies Groupings are Outlined

The structured PCA that is based on the correlation matrix calculated from only a subset of well records highlights the internal variations within each of the interpreted lithological units, named PC_ss, PC_sh, PC_co, and PC_cc for sandstone, shale, coal, and cement, respectively (Table 2). All the probability plots for each separate lithological unit (Fig. 5), where the inter-lithological effect has been removed, still indicate polymodal distributions. However, the polymodality is caused by different intra-lithological populations characterized by the specificities of the structured loadings that now reflect the higher order variability once the inter-lithological variability has been removed. The principal sandstone lithological component, PC1_ss, can be separated into four subpopulations, where the A and B sub-populations comprise 3.0% of the total records. These populations are interpreted as GR-enriched sandstone and coal influenced sandstone, and the C and D sub-populations represent bay fill sand and channel fill deposited sandstone, respectively (Fig. 5a). Four sub-populations are also indicated by the PC2_ss (Fig. 5b): A represents a specific 4 m sandstone interval with low GR and NPHI values and higher RHOB values interpreted to be channel sands, B bay fill sand, C channel fill deposited sandstone, and D coal influenced sandstone. The principal shale component, PC1_sh (Fig. 5c), can be divided into two dominant sub-populations, assumed to be pure shale (B) and sand influenced shale (C). The dominant sub-population of PC2_sh (B), comprising 97% of the shale records, explains internal variations within the shale assumed to be related to porosity variations in contrast to the A and C sub-populations that respectively represent coal and cement influenced records (Fig. 5d). For the coal intervals, four sub-populations are indicated both for PC1_co and PC2_co. 
The A sub-population in PC1_co is pure coal and the remaining three populations are assumed to be related to the degree of impurities (Fig. 5e). The PC2_co also contains four sub-populations, indicating coal–sand (B) and coal–shale (C) relations (Fig. 5f). The cemented interval outlines four sub-populations of PC1_cc (Fig. 5g), where the A population represents records from the middle part of a 2 m cemented interval and the B population is related to the rim of this interval. The C–D populations are related to cement records influenced by nearby lithology types. For the PC2_cc (Fig. 5h), also interpreted to have four sub-populations, the A population is related to siderite cement, whereas the D population represents the middle part of the 2 m cemented interval. The B and C populations are assumed to be influenced by the nearby lithology types.

As the different PCs within a specific PCA are independent of each other, crossplots are introduced to show how the sub-populations of the PC scores interact. The crossplot of the structured PCs PC1_ss and PC2_ss shows the internal variations of the 867 records representing the sandstone lithological unit, where clean sand is plotted in the right part of the diagram, while GR-rich sand, silt and coal influenced records are plotted in the left, lower left, and upper parts, respectively (Fig. 6a). The trend lines illustrate the intra-lithology variations. The populations of the PC1_ss and the PC2_ss generate a more precise description of the within-sandstone lithological variations than the unstructured PCA can provide. The crossplot of the shale lithological units indicates that PC1_sh separates sand–shale variations and that the low PC2_sh scores outline coal influenced shale (Fig. 6b). The interpretation of the crossplot of the coal records shows that low PC1_co scores represent organic-rich coal, whereas the PC2_co discriminates between sandstone without impurities and shale influenced sandstone (Fig. 7a). The crossplot of the cemented interval separates both cement types and the thickness of the cement interval, which is not discovered by the unstructured PCA (Fig. 7b). The internal variations within the specific lithological units give a more precise picture of the intra-lithological variability than the unstructured PCA.

The loadings of the first two PCs of each separate lithological unit are separately applied to calculate new wireline records to visualize how the specific score values used to explain within-lithology variations will perform when applied to the entire study interval. These new PC scores covering all records of the study interval help to determine the variations. The crossplot of PC1_ss and PC2_ss scores includes all lithologic units using the sandstone lithological loadings. This sandstone view allows us to differentiate between cemented intervals and variations in coal and shale records, in addition to the shale and within-sandstone variations (Fig. 6c). A similar crossplot using shale loadings (Fig. 6d) illustrates how sandstone can be discriminated from shale as well as displaying gradations of shale variation including cemented and coal intervals. The crossplot of PC1_co and PC2_co differentiates between coal and other lithologies, including coal quality along the x-axis and sand influence along the y-axis (Fig. 7c). A similar crossplot of cemented loadings (Fig. 7d) indicates that the 2 m thick cemented interval has its own signature compared to the other cemented records plotted along the sand–shale–coal line. This result shows that applying structured PCA and later using these specific PC loadings to include all study interval records can go beyond the interpretation of both univariate and unstructured PCA when the separation of petrophysical variations is in focus. In order to ensure the representativeness of the computed eigenvectors, the sandstone (ss) subset was divided randomly into two groups. The loadings for these two subset groups of eigenvectors were compared with the initial sandstone subset and the results show that only loadings between −0.1 and 0.1 give percentage variation above 10% (Table 3). This test shows that there is stability in the eigenvectors.

Table 3 The Stability in Eigenvectors was Tested by Selecting at Random Half of the Samples within the Sandstone Lithofacies

Comparison of Unstructured and Structured PCA Loadings

The difference in loading values, including their ability to explain the total data variability, is distinct when comparing unstructured PCA and the four separate structured PCAs (Table 2). The bar plot (Fig. 8a) shows that the first two PCs of the unstructured PCA explain less of the total variability than those of the structured PCAs. This implies that unstructured PCA uses a correlation matrix that has weaker correlations due to a larger part of heterogeneity from inter-lithological variations. The separate structured PCA, which analyzes specific lithological units, avoids interactions from inter-lithological variations.

The star diagrams (Fig. 8b) visualize the relation between the unstructured and structured PC loadings expressed in Table 2. The PC1_ss has nearly identical loadings to TPC2, indicating that TPC2 represents the residual sandstone variability due to internal sandstone variations once the major lithofacies variability has been removed by TPC1. The similarity between PC2_ss and TPC1 shows that the residual variability once the intra-lithological sand variability is removed contains much of the same heterogeneity as shown in the totality of the well records. This indicates a sort of fractal behavior of the lithological mix at the Åre Fm. scale (300 m) and the scale of the combined sandstone layers (130 m). The TPC3 signature is related to the PC1_cc, indicating variation due to the cemented records. A second graphical visualization of the PC loadings in Table 2 is represented by Chernoff faces (Fig. 8c); these faces, which mimic human faces, are drawn based upon the loading values of the five wireline variables, can discriminate similar PC loading patterns, and correspond to the results of the star diagrams.

Comparison of Unstructured and Structured PCA Scores

PCA can be regarded as a data-driven method because of the dependency between the position of the eigenvector and the gravity field of the samples. PCA can therefore give different results according to a specific selection of input variables and/or samples. It is therefore important to ensure that as much as possible of the unwanted heterogeneity is removed by a proper choice of samples representative of each lithological subset. Similarities in PCA loadings of unstructured and structured PCA can either be related to pure luck or, if properly designed, driven by specific geologic phenomena. In the following, the relationship between unstructured and structured PCA is illustrated by plotting the associations in crossplots. In this paper, only differences in eigenvector loadings between the structured and unstructured approach for each subset are considered. However, Figure 9 portrays how the individual score values of the unstructured PCA in the sandstone subset match the scores calculated with a structured correlation matrix based only on the subset samples. The similarity of the sandstone records of TPC1 and PC2_ss (Fig. 9a) could give the impression that the total unstructured analysis is as good as that obtained with structured loadings, but this is only a consequence of the difference in the loadings for GR and RT being cancelled out because of close to zero standardized values in the sandstone for the wireline logs and similar loadings for RHOB, NPHI, and DT resulting in an alignment along the pure sand–shale trend line. The TPC2-PC1_ss plot (Fig. 9b) shows an alignment along a coal–sand–shale–cement trend along the structured PC1_ss vector. The TPC2-PC1_ss plot shows the close correlation between the variables with a marginal difference in the lower values interpreted to be related to GR-rich sandstone records. The sandstone records deviating perpendicular to the trendline are interpreted to be related to larger standardized values of RT.
In the two crossplots, the two modes of each of the PCs, illustrated by the gray lines, express the similarity between the populations and show that the structured PCA modes have a wider separation, even if the gross lithology relation is similar.

The near-perfect correlation between the TPC2 and PC1_ss scores indicates that the largest contribution to the total variability captured by TPC1 comes from overall shale, coal, cement vs. sandstone contrasts and that the residual variability captured by TPC2 reflects the dominant intra-sandstone variability portrayed by PC1_ss. The scatter from the less perfect correlation between TPC1 and PC2_ss is an indicator of the fractal nature of the variability, where the dominant shale, coal, cement vs. sandstone contrasts are represented, in a fractal way, both in the gross 300 m Åre Fm. interval and in the residual sandstone variability, PC2_ss, once the dominant intra-sandstone variability is removed.

In the TPC1-PC2_ss crossplot, the deviating samples perpendicular to the general trendline are related to minor lithology variations interpreted to be caused by extreme GR-enriched sandstone (>200 API units). Even if these points are related to the GR-enriched population of TPC1, they fall within the two modes of the structured PCA, PC2_ss. This indicates that the GR-enriched sandstone variations are entirely captured by the principal structured PC, PC1_ss, whereas for the unstructured PCA both of the first two PCs are needed to express this phenomenon.

A more in-depth analysis of the structured approach where the original log responses are recalculated based upon differences in heterogeneity will be published in a separate paper that is based upon the preliminary results presented in Brandsegg, Hammer, and Sinding-Larsen (2008).

Comparison Between Univariate and Multivariate Overlap

The distance between the mean values of the sub-populations can indicate their separation. An overlap criterion is introduced to evaluate the difference between univariate, unstructured, and structured PCA. The population overlap between each component is determined by the percentage of data which falls between the mean plus two standard deviations of the lower population and the mean minus two standard deviations of the upper population (Stanley and Sinclair, 1988). The information conveyed in Table 4 shows that there is a marked difference in the amount of overlap between the primary and secondary PCs for the unstructured and structured PCA. The unstructured PCA has little overlap between populations because the variability is spanning the full variability space and thereby identifies end-member populations focusing on inter-population variability rather than intra-population variability. The increase in the degree of overlap with higher order PCs reflects the increasing compactness of the variability space and hence the increasing overlap of the lithological populations. The structured PCs show the opposite trend, whereby the first PC displays a large overlap between what are now differences within the lithological population due to an expansion of internal heterogeneity in the respective lithological unit. These observations permit us to break the apparently uniform lithological population defined from the unstructured PCA into sub-populations reflecting the local petrophysical contrast within the lithological unit.
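
One possible reading of this overlap criterion is sketched below in Python. The function names are hypothetical, and two details are assumptions that the text does not settle: the percentage is taken relative to the combined records of the two neighboring populations, and the sample standard deviation is used. For more than two populations, the overlaps of neighboring pairs are summed, as described in the Method section.

```python
from statistics import mean, stdev

def overlap_percent(lower, upper):
    """Percentage of records in the zone where two neighboring populations overlap."""
    zone_top = mean(lower) + 2 * stdev(lower)      # upper reach of the lower population
    zone_bottom = mean(upper) - 2 * stdev(upper)   # lower reach of the upper population
    if zone_bottom >= zone_top:
        return 0.0                                 # the two ranges do not meet
    combined = list(lower) + list(upper)
    inside = sum(zone_bottom <= v <= zone_top for v in combined)
    return 100.0 * inside / len(combined)

def total_overlap(populations):
    """Sum the overlap of each pair of neighboring populations, ordered by mean."""
    ordered = sorted(populations, key=mean)
    return sum(overlap_percent(a, b) for a, b in zip(ordered, ordered[1:]))

# Small hand-checkable example (illustrative scores, not data from the study).
separated = overlap_percent([0, 1, 2, 3, 4], [100, 101, 102, 103, 104])
overlapping = overlap_percent([0, 1, 2, 3, 4], [3, 4, 5, 6, 7])
```

In the second call the overlap zone runs from about 1.84 to 5.16, so six of the ten records fall inside it.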

Table 4 Comparison of Component Population Overlaps Including the Number of Populations of All Initial Wireline Log Variables and the Most Significant PCs of Both Unstructured and Structured PCA

Comparison Between Two Lithofacies Classifications

In the previous analysis, in which four lithofacies classes related to rock types were analyzed separately, an increased separation between each of the lithofacies was achieved. In order to portray the effect of sedimentary features related to depositional environment, a new lithofacies classification was introduced, following the work of Kjærefjord (1999) and Hammer, Mørk, and Næss (2009). The new lithofacies types, predominantly based upon core analysis, were segmented into four sedimentary features related to depositional environment: fluvial channel (FCH), floodplain fines (FF), sandy bay-fill (SBF), and muddy bay-fill (MBF). The RHOB/NPHI crossplot outlines the high and low RHOB values of cemented and coal influenced intervals, in addition to portraying an overlap of the sandy and muddy bay-fill depositional feature populations (Fig. 10a). The overlapping of sandy and muddy bay-fill is related to the highly heterolithic deposition of a bay-fill environment, which is difficult to differentiate (Svela, 2001; Hammer, Mørk, and Næss, 2009). The structured sandstone PCA loadings, indexed by the four sedimentary features, allow considerable additional differentiation to be mapped out, which otherwise would have been missed (Fig. 10b). When applied to all interval records studied, the first PC, PC1_ss, separates two populations of sand and one population of shale, whereas the PC2_ss separates two sandstone populations and the end-members of coal and cemented records. The crossplot of these two PCs points to two sandstone populations related to FCH and SBF, with a more pronounced separation than the initial NPHI/RHOB crossplot. The PC2_ss separates FCH and SBF, whereas PC1_ss explains the internal variations within these records. An increased separation between SBF and MBF is generated when applying the two PCs of the structured shale PCA (Fig. 11), as the SBF records are clustered, surrounded by the MBF records.

The structured PCA crossplot is superior to the initial wireline log responses when the focus is on variations within a specific depositional setting. Calculating and plotting the data using loadings that express variations in the sandstone population has enhanced the differentiation between the depositional environments without interference from the other lithologies, such as coal- and cement-influenced intervals.
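The difference between the two procedures can be sketched in a few lines. The example below is a minimal illustration and not the authors' implementation: it assumes a correlation-matrix PCA, standardizes all records with the statistics of the fitting subset, and uses synthetic stand-in data. The log names, population parameters, and the function names `pca_loadings` and `project` are hypothetical.

```python
import numpy as np

def pca_loadings(X):
    """Correlation-matrix PCA: return mean, std, loadings (columns) and eigenvalues."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder so PC1 has the largest variance
    return mean, std, eigvecs[:, order], eigvals[order]

def project(X, mean, std, loadings):
    """Score any set of records with loadings derived elsewhere."""
    return ((X - mean) / std) @ loadings

# Synthetic stand-in for wireline records (assumed columns: RHOB, NPHI, GR, DT).
rng = np.random.default_rng(0)
sand = rng.normal([2.30, 0.25, 40.0, 90.0], [0.05, 0.03, 8.0, 5.0], size=(300, 4))
shale = rng.normal([2.50, 0.35, 110.0, 100.0], [0.05, 0.03, 10.0, 5.0], size=(200, 4))
logs = np.vstack([sand, shale])
is_sand = np.concatenate([np.ones(300, dtype=bool), np.zeros(200, dtype=bool)])

# Unstructured PCA: loadings derived from all records.
m_u, s_u, L_u, ev_u = pca_loadings(logs)
scores_u = project(logs, m_u, s_u, L_u)

# Structured PCA: loadings derived from the sandstone subset only,
# then applied to every record (sand and shale alike).
m_s, s_s, L_s, ev_s = pca_loadings(logs[is_sand])
scores_ss = project(logs, m_s, s_s, L_s)
```

The first two columns of `scores_ss` play the role of PC1_ss and PC2_ss in the crossplots discussed above. Because the loadings come from the sandstone records alone, the between-lithology contrast does not enter the correlation matrix.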

Discussion

A basic requirement for using multivariate analysis on geologic data is to reflect on the quantification procedure that measures the geologic processes and produces the input data for the analysis (Davis, 2002). PCA analyzes patterns within the data with the aim of translating geologic objects that are described by a set of indirect information (e.g. wireline logs) into categorical information that refers to a given geologic property (e.g. lithology type and porosity). In this study, petrophysical wireline responses in combination with core analysis have outlined four lithofacies types, which were evaluated separately to determine their independent signatures and thereby express geologic processes that operate at lithofacies scale. It should be noted that the lithofacies types used here are based upon manual lithofacies classifications interpreted from cores, supplemented by wireline log analysis in non-cored intervals, and not upon an automatic pattern recognition of lithofacies types. Our aim, however, has been to identify the merit of using specific lithofacies weights to explain variations within specific lithofacies and to show how these weights can be used to enhance lithofacies interpretation. The populations identified from the initial wireline logs do not independently outline specific rock types and/or lithological processes, in contrast to those obtained with the unstructured/structured PCA procedure. The different populations in the probability plot analysis can overlap considerably if the PCs are polymodally distributed with component population means of roughly equal magnitude but with large standard deviations. Ignoring discernible univariate patterns in the design of subsequent multivariate analysis may lead to unnecessary ambiguities and/or complexities in the multivariate results and the subsequent interpretation (Stanley and Sinclair, 1988).
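The effect of close means and large standard deviations on population overlap can be made concrete with a small calculation. The sketch below assumes two normally distributed component populations with equal standard deviation and a threshold placed midway between their means; these are simplifying assumptions for illustration and are not taken from the probability plots of this study.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def midpoint_overlap(mu1, mu2, sigma):
    """Fraction of each population lying on the wrong side of the midpoint
    between the two means, for two normal populations with equal sigma."""
    return normal_cdf(-abs(mu1 - mu2) / (2.0 * sigma))

# Same difference in means, increasing spread: the overlap grows towards 0.5.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(midpoint_overlap(0.0, 1.0, sigma), 3))
```

With a fixed difference in means, the misassigned fraction approaches one half as the standard deviation grows, which is the situation in which a single PC no longer discriminates the populations.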
New variables calculated from structured PCA can be powerful discriminators for visualizing within-lithofacies signatures that are not achievable with a standard PCA approach. Griffiths (1988) stated that a question may be unanswerable within the system in which it is formulated, and that solving such a problem requires enlarging the system, creating a meta-language, and finding the solution, if any, within this enlarged system. In this study, the reasoning of Griffiths is applied by using separate loadings from lithological units as a form of meta-language to explain within-lithology contrasts. This is further exemplified by the principal structured sandstone PCA loading, which was close to identical to the loading of the second unstructured PC: although the unstructured PCA included wireline responses from other lithofacies, the similarity of its second PC to sandstone processes could not be identified prior to the structured PCA.
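One simple way to check whether a structured loading vector resembles one of the unstructured PCs is the absolute cosine similarity between loading vectors. The sketch below is an assumed procedure for illustration only; the study does not state how the similarity was assessed, and the function names and toy loadings are hypothetical.

```python
import numpy as np

def loading_similarity(a, b):
    """Absolute cosine similarity between two loading vectors.
    The absolute value is taken because the sign of a PC is arbitrary."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def best_match(structured_pc, unstructured_loadings):
    """Return the column index of the most similar unstructured PC
    and the similarity to every unstructured PC."""
    sims = np.array([
        loading_similarity(structured_pc, unstructured_loadings[:, j])
        for j in range(unstructured_loadings.shape[1])
    ])
    return int(np.argmax(sims)), sims

# Toy loadings (hypothetical): columns of `unstructured` are PC1, PC2, PC3.
unstructured = np.eye(3)
structured_pc1 = np.array([0.10, -0.99, 0.05])

idx, sims = best_match(structured_pc1, unstructured)
print(idx + 1, np.round(sims, 3))  # 1-based index of the closest unstructured PC
```

A similarity close to 1 for the second column would correspond to the situation described above, where the first structured sandstone PC nearly coincides with the second unstructured PC.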

The scale limitation of this study is related to the sampling interval of the wireline logs. Even though the sample interval is 15 cm, some of the wireline logging tools have a larger distance between transponder and receiver, with the result that small-scale heterogeneities are not captured. Nordahl and Ringrose (2008) concluded, using the representative elementary volume (REV) concept as a basis, that it is important to incorporate lamina scale (mm) and lithofacies scale (dm-m) heterogeneities into full field reservoir scale heterogeneity to reduce uncertainty in reservoir modeling. The structured PCA approach has been applied to wireline logs to enhance the separation of petrophysical contrasts in fluviodeltaic deposits, and it supports the estimated lithofacies REV of around 20 cm length stated by Nordahl and Ringrose (2008).

Conclusions

We have evaluated separate PCAs derived from different lithological subsets of the well records to detect and interpret higher order heterogeneity within the different lithologies. This procedure has given us a clearer and more comprehensive interpretation of the data than traditional PCA procedures. A case study analyzing higher order lithological effects in the fluviodeltaic environment of the Heidrun Field, offshore mid-Norway, has indicated that the ability to map and interpret higher order variability will improve the description of fluviodeltaic reservoir heterogeneity that is important for production scheduling. The structured/unstructured PCA method adds to the standard interpretation of wireline log data by identifying specific intra-lithological processes that are not outlined by traditional approaches. The workflow can easily be applied to isolate other depositional environments and is assumed to be particularly valuable in other studies involving heterolithic deposits. The method permits the effective removal of variability due to gross lithological effects and allows for differential interpretation of heterogeneity. The procedure can further be incorporated into lithofacies classification routines to account for small-scale heterogeneities, which can potentially decrease the number of misclassified records. The use of separate PCAs in the examples given has been effective in portraying the petrophysical variability of reservoir properties within different lithological units. We suggest that this procedure be used to pre-process effective reservoir properties in order to improve the choice of reservoir drainage strategies.

References

Avseth, P., Mukerji, T., and Mavko, G., 2005, Quantitative seismic interpretation: applying rock physics tools to reduce interpretation risk: Cambridge University Press, Cambridge.
Bourquin, S., Rigollet, C., and Bourges, P., 1998, High-resolution sequence stratigraphy of an alluvial fan-fan delta environment: stratigraphic and geodynamic implications—an example from the Keuper Chaunoy sandstones, Paris basin: Sed. Geol., v. 121, no. 3–4, p. 207–237.
Brandsegg, K. B., Hammer, E., and Sinding-Larsen, R., 2008, Quantifying fluvial sandstone heterogeneity by using multivariate analysis, in Sirum, H. J. H., and Haukdal, G. K., eds., NGF Abstracts and Proceedings of the Geological Society of Norway: Stavanger, Norway, p. 7–9.
Bridge, J. S., and Tye, R. S., 2000, Interpreting the dimensions of ancient fluvial channel bars, channels, and channel belts from wireline-logs and cores: AAPG Bull., v. 84, no. 8, p. 1205–1228.
Chernoff, H., 1973, The use of faces to represent statistical association: J. Am. Stat. Assoc., v. 68, p. 361–368.
Corbett, P., Jensen, J., and Sorbie, K., 1998, A review of up-scaling and cross-scaling issues in core and log data interpretation and prediction, in Harvey, P., and Lovell, M., eds., Core-Log Integration. Vol. 136 of Geological Society Special Publication, 136: Springer, London, p. 9–16.
Dalgaard, P., 2008, Introductory Statistics with R (2nd edn.): Springer, London.
Dalland, A., Augedahl, H., Bomstad, K., and Ofstad, K., 1988, The post-Triassic succession of the Mid-Norwegian Shelf, in Dalland, A., Worsley, D., and Ofstad, K., eds., A Lithostratigraphic Scheme for the Mesozoic and Cenozoic Succession Offshore Mid- and Northern Norway. Vol. 4, Norwegian Petroleum Directorate Bulletin: Springer, Stavanger, p. 5–42.
Davis, J. C., 2002, Statistics and data analysis in geology: Wiley, New York.
Doveton, J. H., 1994, Geological log analysis using computer methods. Vol. 2, AAPG Computer Applications in Geology.
Eichenseer, H. T., and Leduc, J. P., 1996, Automated genetic sequence stratigraphy applied to wireline logs: Bulletin Des Centres De Recherches Exploration-Production Elf Aquitaine., v. 20, no. 2, p. 277–307.
Griffiths, J., 1988, Measurement, sampling and interpretation, in Chung, C. F., Fabbi, A. G., and Sinding-Larsen, R., eds., Quantitative Analysis of Mineral and Energy Resources. NATO ASI Series, 82: D. Reidel Publishing Company, Boston, p. 37–56.
Gupta, R., and Johnson, H. D., 2001, Characterization of heterolithic deposits using electrofacies analysis in the tide-dominated Lower Jurassic Cook Formation (Gullfaks Field, offshore Norway): Petrol. Geosci., v. 7, no. 3, p. 321–330.
Hammer, E., Mørk, M. B. E., and Næss, A., 2009, Facies controls on the distribution of diagenesis and compaction in fluvial-deltaic deposits. Marine Petrol. Geol. Corrected proof (in press). doi:10.1016/j.marpetgeo.2009.11.002.
Hohn, M. E., McDowell, R. R., Matchen, D. L., and Vargo, A. G., 1997, Heterogeneity of fluvial-deltaic reservoirs in the Appalachian basin: a case study from a Lower Mississippian oil field in central West Virginia: AAPG Bull., v. 81, no. 6, p. 918–936.
Jolliffe, I., 2002, Principal component analysis (2nd edn.): Springer, New York.
Kjærefjord, J., 1999, Bayfill successions in the lower Jurassic Åre formation, Offshore Norway: sedimentology and heterogeneity based on subsurface data from the Heidrun Field and analog data from the Upper Cretaceous Neslen Formation, eastern Book Cliffs, Utah, in Hentz, T., ed., 19th Annual Research Conference. Advanced Reservoir Characterization for the Twenty-First Century. Gulf Coast Section and Society Economic Paleontologists and Mineralogists Foundation, Special Publication, p. 149–157.
Martinius, A. W., Ringrose, P. S., Brostrøm, C., Elfenbein, C., Næss, A., and Ringås, J. E., 2005, Reservoir challenges of heterolithic tidal sandstone reservoirs in the Halten Terrace, mid-Norway: Petrol. Geosci., v. 11, no. 1, p. 3–16.
Moline, G. R., and Bahr, J. M., 1995, Estimating spatial distributions of heterogeneous subsurface characteristics by regionalized classification of electrofacies: Math. Geol., v. 27, no. 1, p. 3–22.
Nadler, M., and Smith, E. P., 1993, Pattern recognition engineering: Wiley, New York.
Nordahl, K., and Ringrose, P. S., 2008, Identifying the Representative Elementary Volume for permeability in heterolithic deposits using numerical rock models: Math. Geosci., v. 40, no. 7, p. 753–771.
Pereira, H. G., Silva, A. C. E., Soares, A., Ribeiro, L., and Decarvalho, J., 1990, Improving reservoir description by using geostatistical and multivariate data-analysis techniques: Math. Geol., v. 22, no. 8, p. 879–913.
Petrovic, A., Khan, S. D., and Chafetz, H. S., 2008, Remote detection and geochemical studies for finding hydrocarbon-induced alterations in Lisbon Valley, Utah: Marine Petrol. Geol., v. 25, no. 8, p. 696–705.
Serra, O., and Abbott, H. T., 1982, The contribution of logging data to sedimentology and stratigraphy: Soc. Petrol. Eng. J., v. 22, no. 1, p. 117–131.
Singh, Y., 2007, Lithofacies detection through simultaneous inversion and principal component attributes: The Leading Edge, v. 26, no. 12, p. 1568–1575.
Stanley, C. R., and Sinclair, A. J., 1988, Univariate patterns in the design of multivariate analysis techniques for geochemical data evaluation, in Chung, C. F., Fabbi, A. G., and Sinding-Larsen, R., eds., Quantitative Analysis of Mineral and Energy Resources. NATO ASI series, 82: D. Reidel Publishing Company, Boston, p. 113–130.
Svela, K., 2001, Sedimentary facies in the fluvial-dominated Åre formation as seen in the Åre 1 member in the Heidrun Field, in Martinsen, O., and Dreyer, T., eds., Sedimentary Environments Offshore Norway—Paleozoic to Recent. Vol. 10. Norwegian Petroleum Society Special Publication: Elsevier Science B.V., Amsterdam, p. 87–102.
Wegman, E. J., 1990, Hyperdimensional data-analysis using parallel coordinates: J. Am. Stat. Assoc., v. 85, no. 411, p. 664–675.
Zhang, G. F., Shen, X. H., Zou, L. J., Li, C. J., Wang, Y. L., and Lu, S. L., 2007, Detection of hydrocarbon bearing sand through remote sensing techniques in the western slope zone of Songliao basin, China: Int. J. Remote Sens., v. 28, no. 7–8, p. 1819–1833.

Acknowledgments

Heidrun Unit (Statoil Petroleum AS (operator), Petoro AS, ConocoPhillips Skandinavia AS, Eni Norge AS) is acknowledged for providing well data and permission to present case examples from the Heidrun Field. This paper represents a contribution to the GeoEnhance consortium on reservoir characterization at the Norwegian University of Science and Technology (NTNU). The research stems from the post-graduate work of K. B. Brandsegg and E. Hammer at the Faculty of Engineering Science and Technology at NTNU. Special thanks to Arve Næss and Mali Brekken at Statoil and Steinar Ellefmo for helpful comments. Two anonymous reviewers gave valuable comments which greatly improved this paper.

Author information

Kristian Bjarnøe Brandsegg
Present address: Exploro AS, Stiklestadveien 1, N-7041, Trondheim, Norway

Authors and Affiliations

Department of Geology and Mineral Resources Engineering, Norwegian University of Science and Technology, N-7491, Trondheim, Norway
Kristian Bjarnøe Brandsegg, Erik Hammer & Richard Sinding-Larsen

Corresponding author

Correspondence to Kristian Bjarnøe Brandsegg.

About this article

Cite this article

Brandsegg, K.B., Hammer, E. & Sinding-Larsen, R. A Comparison of Unstructured and Structured Principal Component Analyses and their Interpretation. Nat Resour Res 19, 45–62 (2010). https://doi.org/10.1007/s11053-010-9110-4

Received: 17 September 2009
Accepted: 11 January 2010
Published: 04 February 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11053-010-9110-4
