Introduction

There are four characteristics describing the crystallinity of cellulose: the nature of the crystal lattice, percentage of crystalline components, crystal size and relative orientation of crystals (Ward 1950). The percentage of crystalline components is usually represented by the crystallinity index (CI) measured via techniques such as X-ray diffraction (XRD), solid-state 13C NMR (Park et al. 2010; Ahvenainen et al. 2016), FT Raman spectroscopy (Taylor and Zografi 1998) and the recent vibrational sum-frequency-generation (SFG) spectroscopy (Barnette et al. 2012). Crystallinity is strongly correlated to material strength and stiffness (Ward 1950) and is also an indicator of cellulose enzymatic hydrolysis (Bansal 2011; Bansal et al. 2010) and cellulose-water interaction characteristics (Astley et al. 2001; Park et al. 2009; Fang and Catchmark 2014). Although the precise relationships between these diverse phenomena remain elusive, accurate determination of CI and other crystal parameters such as crystal size will provide a basis for deeper investigations.

In cellulose, both crystalline and amorphous domains are present within fibrils (Ioelovich et al. 2010). Chemical reactions are thought to occur in the amorphous regions and on the surface of crystallites (Ciolacu et al. 2011). Ball milling and phosphoric acid treatment convert crystallites into an amorphous phase. A comprehensive examination of the effects of ball milling on cotton cellulose has recently been conducted (Ling et al. 2019). Bates et al. hypothesize using a random close packed (RCP) model that such amorphous regions retain inherent short-range order (Bates et al. 2006). Hermans and Weidinger have shown that ramie and regenerated cellulose have an invariable diffuse background in their XRD data due to amorphous content (Hermans and Weidinger 1946). They further suggest that all cellulose has such an invariable XRD background.

XRD is used to assess molecular spacing and to determine allomorph, crystal size and crystallinity. Three analytical concepts are adopted to calculate cellulose CI from XRD data (Mittemeijer and Welzel 2008): peak intensity analysis such as the peak height method, line profile decomposition such as peak decomposition or deconvolution (Lanson 1997; Park et al. 2010; Ahvenainen et al. 2016), and line profile synthesis such as the Rietveld refinement based methods (Madsen et al. 2011). Qualitative and quantitative comparisons have been presented by researchers (Park et al. 2010; Bansal et al. 2010; Ahvenainen et al. 2016), including the most commonly used peak height, peak deconvolution and amorphous subtraction methods. Molecular dynamics simulations show that the minimum diffraction intensity usually occurs around 18°, where the amorphous diffraction intensities overlap crystalline intensities (French 2014). Hence it is not appropriate to use the Segal Crystallinity Index calculated by the peak height method as an accurate measure of CI (French and Santiago Cintrón 2013). Instead, peak deconvolution, amorphous subtraction and NMR C4 peak separation methods are used for accurate CI determination (Park et al. 2010; Ahvenainen et al. 2016). Transverse crystallite size is determined with the Scherrer equation (Langford and Wilson 1978; French and Santiago Cintrón 2013) using the widths of peaks such as the (200), and crystallite length is calculated with the (11-4) and (004) peaks for cellulose Iα and Iβ, respectively.
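As a rough illustration of the Scherrer calculation mentioned above, the following sketch computes a crystallite size from a peak width. The shape factor K = 0.9 and the example peak (FWHM 1.5° at 22.6° 2θ) are assumptions chosen for illustration only, not values reported in this work; the wavelength is the Cu value quoted in the Methods section.

```python
import math

def scherrer_size_nm(fwhm_deg, two_theta_deg, wavelength_nm=0.15418740, k=0.9):
    """Scherrer estimate L = K * lambda / (beta * cos(theta)).

    fwhm_deg      : peak full width at half maximum, in degrees 2-theta
    two_theta_deg : peak position, in degrees 2-theta
    wavelength_nm : X-ray wavelength in nm (Cu value from the Methods section)
    k             : shape factor; 0.9 is an assumed, commonly used value
    """
    beta = math.radians(fwhm_deg)             # width must be in radians
    theta = math.radians(two_theta_deg / 2)   # Bragg angle is half of 2-theta
    return k * wavelength_nm / (beta * math.cos(theta))

# Hypothetical (200)-like peak, for illustration only.
size = scherrer_size_nm(fwhm_deg=1.5, two_theta_deg=22.6)
```

Because the size is inversely proportional to the fitted peak width, any change in the amorphous model that alters the deconvoluted crystalline peak width changes the reported crystal size directly.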

The peak deconvolution method involves deconvoluting diffraction data into separate peaks associated with crystalline structures and amorphous cellulose according to Bragg’s law. CI is calculated as the ratio of the integrated area of the deconvoluted crystalline peaks to that of the whole diffraction curve. This approach has two issues to be solved: (1) there is no standard XRD data for amorphous cellulose; and (2) common peak functions such as Lorentzian and Gaussian functions do not fit the amorphous profile well. Previous work has recommended that the peak of the amorphous cellulose background be placed at 18° (Segal et al. 1959) and 16° (Azubuike et al. 2012) for cellulose I and II, respectively, and the peak height method usually uses the minimum intensity at around 18.3° (Park et al. 2010). However, the justification for these recommendations is not sufficient. As noted by the International Centre for Diffraction Data (ICDD), the separation of background and amorphous contributions from crystalline contributions remains the primary source of CI variance and error (Fawcett et al. 2013). In addition, inaccurate modeling of the amorphous component in XRD analysis will affect the deconvoluted crystalline peak profiles, resulting in errors in calculated crystal size and crystallinity. Improvement in the modeling of the amorphous component will enable more accurate analysis of cellulose structure using the peak deconvolution method.
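The area-ratio definition of CI can be sketched as follows. This is a minimal illustration that assumes the background has already been removed, so that the whole curve is the sum of the crystalline peak areas and the amorphous area; the function name and the numerical areas are hypothetical.

```python
def crystallinity_index(crystalline_areas, amorphous_area):
    """CI = (sum of crystalline peak areas) / (area of the whole curve).

    Assumes background-subtracted data, so the whole curve is taken as
    crystalline + amorphous contributions (an assumption of this sketch).
    """
    crystalline = sum(crystalline_areas)
    total = crystalline + amorphous_area
    if total <= 0:
        raise ValueError("total integrated area must be positive")
    return crystalline / total

# Hypothetical integrated areas for five crystalline peaks and one amorphous profile.
ci = crystallinity_index([120.0, 80.0, 400.0, 60.0, 40.0], amorphous_area=300.0)
```

With these illustrative numbers the crystalline peaks hold 700 of 1000 area units, so CI is 0.7; assigning more of the curve to the amorphous profile lowers CI accordingly.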

In this work, cellulose from plant and bacterial sources and cello-oligosaccharides are made amorphous by ball milling. It is found that the diffractograms of the amorphous celluloses converge to a common profile. This profile is modeled using a simple Fourier series equation whose coefficients are determined by a nonlinear least-squares method. The new amorphous cellulose profile is then used to calculate cellulose crystallinity and crystal size with the peak deconvolution methodology. Results suggest that previous peak deconvolution analyses using common peak functions and background subtraction approaches have overestimated cellulose crystallinity.

Methods

Materials and equipment

Three types of cellulose were ball milled to obtain amorphous cellulose: Avicel, pulp and bacterial cellulose (BC). Avicel PH101 microcrystalline cellulose is made from wood pulp and is available from commercial vendors. Pulp powder was obtained by ball milling blotting paper pulp (blotting paper, Dick Blick Art Materials, Galesburg, IL) for 10 min. BC was produced by Gluconacetobacter xylinus JCM 9730 (ATCC 700178, provided by the Bioresource Center of American Type Culture Collection) following the procedure described in Guo and Catchmark (2012). BC was ball milled for 10 min to produce a powder sample. In addition, oligosaccharides (cellotriose, cellotetraose and cellopentaose; purchased from Megazyme) were examined as delivered without further processing.

Ball milling was performed on a Retsch CryoMill and specifications and settings are shown in Table 1. The sample and jar remained at room temperature during grinding so the cryogenic function was not used.

Table 1 Retsch CryoMill specifications and settings

One-dimensional powder XRD was performed on a PANalytical X’Pert Pro multi-purpose diffractometer (MPD) in focusing geometry mode, known as Bragg–Brentano geometry (Dinnebier 2008) or powder mode. The MPD was operated at 40 kV and 40 mA, and the Cu Kα X-ray has a wavelength of 1.5418740 Å. The diffraction angle 2θ ranged from 4.997° to 59.987° with a step size of 0.0262606°. The 10° divergence slit and the 1″ anti-scatter slit both worked in programmable slit mode, resulting in a uniform irradiated spot area on the sample. A monochromator and collimator were not used. Soller slits were set to 0.04 rad. All samples were powders contained in a holder of 10 mm diameter and 1 mm thickness.

Range of fitting

The 5–60° 2θ range is used for most of the XRD data presented in this paper. In cellulose XRD, this range covers all the notable crystalline peaks. The intensity at 60° 2θ reaches a local minimum and corresponds to a 1.54 Å d-spacing, which approximates the bonded C–C distances within a glucose molecule. Beyond 60°, the intensities remain almost invariably low, which could be a combination of atom–atom d-spacing in glucose molecules, the amorphous profile tail (Ju et al. 2015) and crystalline peak harmonics, and hence are not used. Previous work has used 10–40°, although there is a visible broad amorphous profile at 40° 2θ (as described in the “Results and discussion” section). To obtain the best results, the range of 5–60° 2θ is used to conserve as much information as possible.
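The correspondence between 60° 2θ and a 1.54 Å d-spacing follows from Bragg’s law, d = λ / (2 sin θ). A minimal sketch, assuming first-order diffraction and the wavelength given in the equipment description (function names are illustrative):

```python
import math

CU_WAVELENGTH_ANGSTROM = 1.5418740  # wavelength quoted in the Methods section

def d_spacing(two_theta_deg, wavelength=CU_WAVELENGTH_ANGSTROM):
    """d-spacing in angstroms from Bragg's law with n = 1."""
    theta = math.radians(two_theta_deg / 2)
    return wavelength / (2 * math.sin(theta))

def two_theta(d_angstrom, wavelength=CU_WAVELENGTH_ANGSTROM):
    """Inverse relation: diffraction angle 2-theta in degrees for a given d."""
    return 2 * math.degrees(math.asin(wavelength / (2 * d_angstrom)))

d_at_60 = d_spacing(60.0)   # sin(30 deg) = 0.5, so d equals the wavelength
```

At 60° 2θ the Bragg angle is 30°, so the d-spacing equals the wavelength itself, about 1.54 Å.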

Background subtraction

XRD background originates from air scattering, thermal agitation of atoms, Compton scattering, diffraction associated with the sample holder (Cullity 1978), and incoherent scattering from the sample holder (McCusker et al. 1999). In this study, the use of slits on the X-ray beam and detector minimizes the effects of air scattering and Compton scattering. A zero-background holder (ZBH), made of single-crystal silicon cut parallel to the Si (510) plane, generates very low background. Figure 1 shows the background from an empty holder and a typical diffractogram from Avicel cellulose.

Fig. 1 XRD data of a crystalline Avicel sample and the sample holder

Normally a two-point linear background is subtracted from XRD data, bringing most of the crystalline peaks to the baseline. This is an empirical approach. As seen in Fig. 1, the holder generates about half of the intensity of the two-point linear background in the 5–60° 2θ range. The holder diffraction data is therefore used as the background for subtraction instead. Subtracting the instrumental background before peak deconvolution ensures a stable baseline fit and reduces fitting error.

XRD intensity correction

Powder XRD instrumentation distorts diffraction data, making diffraction peaks asymmetrical compared with the ideal symmetrical peaks predicted by Bragg’s law (Lanson 1997). The distortion factors include the polarization factor, Lorentz factor, temperature factor and absorption factor (Klug and Alexander 1974; Cullity 1978). The Lorentz factor is due to the non-monochromaticity and divergence of the X-ray beam and the motion of the sample, which are minor factors with the MPD instrument. The temperature effect is caused by thermal agitation of the specimen and impacts measured intensities. This effect is cancelled by the absorption effect in the Bragg–Brentano geometry (Cullity 1978), and thus the temperature and absorption effects are usually safe to ignore. Additionally, thermal agitation causes thermal diffuse scattering (TDS), which is usually separated from Bragg scattering as part of the general background (Suortti 1993). In this work, it is assumed to be contained in amorphous scattering and is not corrected. Only the polarization effect is corrected; the factor is known to be (1 + cos²2θ)/2 (Buerger and Klein 1945), and the measured intensities are divided by it.
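The polarization correction can be sketched in a few lines. This is an illustrative helper rather than the authors’ MATLAB code; the function names are assumptions.

```python
import math

def polarization_factor(two_theta_deg):
    """Polarization factor (1 + cos^2(2-theta)) / 2 for unpolarized X-rays."""
    c = math.cos(math.radians(two_theta_deg))
    return (1 + c * c) / 2

def correct_intensity(two_theta_deg, intensity):
    """Divide a measured intensity by the polarization factor."""
    return intensity / polarization_factor(two_theta_deg)

low_angle = polarization_factor(10.0)
high_angle = polarization_factor(40.0)
```

The factor is about 0.98 at 10° and 0.79 at 40° 2θ, so dividing by it raises intensities progressively more toward higher angles.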

Fourier series

Fourier analysis dates back to the eighteenth century, when mathematicians such as Bernoulli, d’Alembert, Lagrange and Euler sought the possibility of representing an arbitrary function with a trigonometric series. It was Fourier’s book in 1822 (Fourier 1822) that confirmed such a possibility, and such mathematical analyses are now used in many disciplines. Such trigonometric series (as shown in the following sections) are called Fourier series and the coefficients are called Fourier coefficients. Any data set can be described by a Fourier series expansion. Based on this, three methods have been used to fit amorphous cellulose XRD data. The goodness of fit is measured by the Pearson product-moment correlation coefficient (PPMCC) (Rodgers and Nicewander 1988), denoted by “r”:

$$ {\text{r}}\left( {{\text{X}},{\hat{\text{X}}}} \right) = \frac{{{\text{cov}}\left( {{\text{X}},{\hat{\text{X}}}} \right)}}{{{\upsigma }_{{\text{X}}} {\upsigma }_{{{\hat{\text{X}}}}} }} = \frac{{{\text{E}}\left[ {\left( {{\text{X}} - {\upmu }_{{\text{X}}} } \right)\left( {{\hat{\text{X}}} - {\upmu }_{{{\hat{\text{X}}}}} } \right)} \right]}}{{\sqrt {{\text{E}}\left[ {\left( {{\text{X}} - {\upmu }_{{\text{X}}} } \right)^{2} } \right]} \sqrt {{\text{E}}\left[ {\left( {{\hat{\text{X}}} - {\upmu }_{{{\hat{\text{X}}}}} } \right)^{2} } \right]} }} $$

This represents the linear correlation between variables and is used to quantify the goodness of fit of a dataset.
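A minimal sketch of the PPMCC between observed and fitted intensities is given below; the function name and sample values are illustrative.

```python
import math

def pearson_r(observed, fitted):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(observed)
    if n != len(fitted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov / math.sqrt(var_o * var_f)

r_scaled = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Note that r is insensitive to a constant scale or offset between the two series: the doubled series above still gives r = 1. It therefore measures agreement in shape rather than the size of the residuals.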

Fast Fourier transform (FFT)

To obtain a Fourier series representation of the amorphous cellulose XRD data, a Fourier transform is performed. XRD data is a discrete series, and the transform used for such sampled signals is the discrete Fourier transform (DFT) (Stankovic et al. 2013). The fast Fourier transform (FFT) refers to the algorithms most frequently used to compute the DFT; they calculate the DFT coefficients with a reduced number of arithmetic operations. It can be performed using software programs such as MATLAB and PeakFit.

XRD data is not uniformly sampled, so interpolation is applied to obtain data at uniform 2θ intervals before performing the FFT. The output of the FFT is data in the frequency domain (w) that contains the same information as in the 2θ domain. The frequency domain data is then windowed, that is, trimmed to retain only the data within a pre-defined window interval. An inverse FFT is then performed to transform the data back into the 2θ domain, which yields a Fourier series that fits the original data. The procedure is shown in Fig. 2. Raw i(2θ) is the diffractogram data, i*(2θ) is the interpolated data with uniform 2θ intervals, i**(2θ) is the down-sampled data to reduce computational load, I(w) is the Fourier transformed data in the frequency domain, Iw(w) is the windowed data of I(w) and iw(2θ) is the inverse Fourier transformed data.

Fig. 2 FFT fitting procedure

Two parameters in the above procedure are tuned for efficient and accurate fitting: the 2θ sampling interval and the windowing width. Raw XRD data is sampled every 0.026° and contains about 2000 data points in the 5–60° range. Windowing cuts off non-dominant components and high-frequency noise while keeping the integrity of the original data. The windowing width determines how many frequency terms are used in the fitted expression. A smaller sampling interval and a larger windowing width retain more information from the original data and generate better fitting results, yet we found that both have limits beyond which the fitting accuracy measured by PPMCC does not increase. Fitting the Avicel XRD data with a sampling interval of 1.366° and a windowing width of 13 gives r = 0.9955 while reducing the computational load by 98%. This is the best result obtained. All calculations were performed using MATLAB.
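The transform, window and inverse-transform steps can be sketched as below. This is an illustration under stated assumptions, not the MATLAB procedure used in this work: a naive DFT stands in for the FFT so that the example needs only the standard library, the input is a synthetic hump rather than measured data, and the windowing width is interpreted as the number of lowest non-zero frequency bins retained, which is one plausible reading.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (O(n^2)); an FFT gives the same result."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def lowpass_fit(samples, keep):
    """Keep the DC bin and the `keep` lowest frequencies (with their mirror bins)."""
    n = len(samples)
    spectrum = dft(samples)
    windowed = [c if (k <= keep or k >= n - keep) else 0
                for k, c in enumerate(spectrum)]
    return idft(windowed)

# Synthetic, uniformly sampled data: a broad hump plus an alternating ripple.
two_theta = [5.0 + 1.366 * i for i in range(41)]
signal = [100 * math.exp(-((x - 20.6) / 6.0) ** 2) + 3 * (-1) ** i
          for i, x in enumerate(two_theta)]
smoothed = lowpass_fit(signal, keep=13)
```

Retaining mirror bins in pairs keeps the spectrum conjugate-symmetric, so the reconstruction stays real. Keeping every bin reproduces the input exactly, while a narrower window discards the high-frequency ripple.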

The above FFT fitting procedure ultimately generates a finite real form Fourier series representing XRD data. An alternative process is directly fitting XRD data to a real form Fourier series. It is much easier to achieve using the fitting functions in computing tools such as MATLAB, as described below.

Real form Fourier series

The real form of the Fourier series is \({\rm{f}}\left( {\rm{x}} \right) = {{\rm{a}}_0} + \sum\nolimits_{{\rm{k}} = 1}^\infty {\left[ {{{\rm{a}}_{\rm{k}}}\cos \left( {{\rm{k}}{\omega _0}{\rm{x}}} \right) + {{\rm{b}}_{\rm{k}}}{\rm{sin}}\left( {{\rm{k}}{\omega _0}{\rm{x}}} \right)} \right]}\) where \({{\rm{a}}_0}\), \({\omega _0}\), \({{\rm{a}}_{\rm{k}}}\) and \({{\rm{b}}_{\rm{k}}}\) are parameters to be determined by fitting to the original data. The maximum of k, i.e., the order of the series, is set by the user. The higher the order, the more accurate the fit. By examining the Avicel XRD data, it is found that r reaches 0.9955 when a 6th order series is used, with 14 parameters to determine. The fitting is done in MATLAB with the predefined command fit(). This approach is used in the following amorphous XRD data fitting.

A variation on the real form of the Fourier series is the sum of sine functions. Data is fit as \({\text{f}}\left( {\text{x}} \right) = \sum\nolimits_{{{\text{k}} = 1}}^{\infty } {{\text{a}}_{{\text{k}}} {\sin}\left( {{\text{b}}_{{\text{k}}} {\text{x}} + {\text{c}}_{{\text{k}}} } \right)}\) where r is 0.9951 when five sine functions are used, corresponding to 15 unknown parameters to determine. It is similar to the real form Fourier series fitting.
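The parameter counts quoted for the two series forms follow directly from their definitions. As a short check, with n the order of the real form Fourier series and m the number of sine terms:

$$ N_{\text{Fourier}}(n) = \underbrace{1}_{a_0} + \underbrace{1}_{\omega_0} + \underbrace{2n}_{a_k,\,b_k} = 2n + 2, \qquad N_{\text{Fourier}}(6) = 14, $$

$$ N_{\text{sine}}(m) = \underbrace{3m}_{a_k,\,b_k,\,c_k}, \qquad N_{\text{sine}}(5) = 15. $$

By the same count, an 8th order real form series has 18 parameters.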

Amorphous profile fitting

Gaussian, Lorentzian and Voigt peak functions have shortcomings in fitting amorphous XRD data. The fitted shape is quite different from the measured profile, and the peak position and the intensity of the high-2θ tail are compromised to obtain better overall accuracy, as seen in Fig. 3. High-order polynomials have also been explored but “the extent of corrections is difficult to control” (Young 1995). The methods described above produce much better fits and can easily achieve r > 0.99. The fitting is done with MATLAB and the computational time is trivial. The precise Fourier function is provided in the Amorphous equation section for use by the reader. Compared with other functions, it fits diffractogram data with exceptionally high accuracy and is applied in the following work.

Fig. 3 Amorphous Avicel XRD data fit by the Fourier function (r = 0.9992), 10th order polynomial function (r = 0.9875), Gaussian function (r = 0.9323), Lorentzian function (r = 0.9606) and Voigt function (r = 0.9267)

Peak deconvolution

Peak deconvolution is performed in PeakFit software, which uses a Levenberg–Marquardt non-linear engine with built-in peak function constraints. Five crystalline peaks are identified; they are the most commonly observed crystalline peaks for cellulose I. Their shape and amplitude are then tuned by PeakFit’s AutoFit tool. The Voigt function, a convolution of Gaussian and Lorentzian functions, is considered the best choice for crystalline peaks. Previous studies state that the Gaussian function approximates the interference function obtained from a single crystallite column length (Warren 1990) and characterizes microstrain broadening (Mittemeijer and Welzel 2008), while the Lorentzian function characterizes crystal size broadening due to polydisperse crystallite column lengths (Delhez et al. 1982, 1993). In terms of reliability factor, Voigt functions perform better than Gaussian and Lorentzian functions (Wada et al. 1997). PeakFit uses the coefficient of determination r² to evaluate the deconvolution. It is defined as \({\text{r}}^{2} = 1 - \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{n}}} \left( {{\text{X}}_{{\text{i}}} - {\hat{\text{X}}}_{{\text{i}}} } \right)^{2} }}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{n}}} \left( {{\text{X}}_{{\text{i}}} - {\overline{\text{X}}}} \right)^{2} }}\), where \({\text{X}}_{{\text{i}}}\) is the observed ith intensity, \({\hat{\text{X}}}_{i}\) is the fitted ith intensity and \({\overline{\text{X}}}\) is the average of the observed intensities.
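The coefficient of determination defined above can be computed directly from the observed and fitted intensities. A minimal sketch with illustrative values (this is not PeakFit’s internal code):

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    n = len(observed)
    if n != len(fitted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
flat = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
scaled = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Unlike the Pearson coefficient r used for the amorphous profile fits, this quantity penalizes scale and offset errors: a fit equal to the mean gives 0, and a fit that doubles every intensity gives a negative value even though its shape matches.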

This paper follows the procedures described in Fig. 4 and PeakFit settings are shown in Table 2.

Fig. 4 Data collection and peak deconvolution procedures

Table 2 PeakFit settings: (1) width of Gaussian response function used for sharpening peaks to uncover hidden peaks; (2) AI (artificial intelligence) seeks optimum frequency domain filtering level; (3) threshold to add a peak; (4) when the data contains obvious outliers, Medium (Lorentzian err) is recommended; (5) not applicable for Gaussian or Lorentzian peaks

During deconvolution, it is important to: (1) make sure the crystalline peaks stay around the empirical peak positions for cellulose; (2) ensure the 38° 2θ peak does not broaden too much by manually controlling the peak width; (3) ensure that the amorphous profile does not exceed the lowest intensity around 18° 2θ. If any of these issues occur, PeakFit allows the peaks to be repositioned or rescaled manually. When CI is low, the peaks may need to be positioned manually because the crystalline peaks are very small and tend to drift as the fit trades their positions for a higher correlation coefficient.

Results and discussion

All ball-milled cellulose XRD data converge to a common profile. This diffractogram profile is fit with the Fourier series and used in peak deconvolution. The height of the amorphous cellulose Fourier series function is allowed to vary for the best fit. Calculated CIs and crystal sizes are compared with results obtained using the peak height and peak deconvolution methods implementing usual peak functions.

Amorphous cellulose model

Figure 5a–c show XRD data of Avicel, pulp and BC with increasing ball milling time and (d) overlaps the final data. It shows that after sustained ball milling, the XRD profiles converge to a unique shape regardless of sources. The converged XRD profile has a broad peak at 20.6° 2θ with 12.45° FWHM and an even broader peak at 34.6° 2θ.

Fig. 5 Normalized XRD data of ball-milled cellulose: a Avicel; b pulp; c BC; d converged profiles after ball milling

The convergence of the XRD profiles is not surprising. The International Centre for Diffraction Data (ICDD) has recommended cryoground cellulose as an amorphous reference (Fawcett et al. 2013). Its amorphous standard (PDF entry 00-060-1501) has a very similar diffraction curve, shown in Fig. 6. Stubicar et al. observed a similar amorphous profile with two XRD peaks at 20.80° and 36.37° when examining ball-milled pulp and microcrystalline cellulose powder (Nada Stubičar 1998). Ju et al. obtained phosphoric acid treated amorphous Avicel with three broad peaks at 20.5°, 38.8° and 80.9° 2θ (Ju et al. 2015). Schwanninger et al. also found invariable characteristics of ball-milled cellulose and wood through FT-IR analysis (Schwanninger et al. 2004).

Fig. 6 Reference amorphous cellulose XRD data from ICDD’s PDF-4+ 2014 database Version 4.1403, obtained by E. Bucher in 2009

At 20.6° 2θ, there are crystalline peaks, i.e., the cellulose Iα (10-2), cellulose Iβ (102), cellulose II (012) and cellulose III (100) reflections, according to French’s simulated diffraction patterns (French 2014). Nelson and O’Connor, using infrared spectra, also showed that amorphous cellulose was very similar to cellulose II in the four characteristic absorption bands in terms of band intensities and locations (Nelson and O’Connor 1964).

Despite the above observations, we believe that sustained ball milling has dismantled the crystalline structure and that the diffractogram shape can be explained by the atomic spacings within the glucan chain, which persist through ball milling. This can be referred to as ‘short range order’ (SRO, see below). SRO is smaller than the persistence length, which describes the degree of linear persistence of a glucan chain and is usually 4–30 nm (Swenson et al. 1965; Kroon-Batenburg et al. 1997; Dumitriu 2004). The SRO in cellulose may result in the presence of periodic or quasi-periodic atomic features in amorphous cellulose which determine the shape of the diffractogram. Figure 7 shows the 3D structure of a single cellobiose unit of a glucan chain of cellulose Iα (.cif file exported from JADE) viewed in Jmol (an open-source Java viewer for chemical structures in 3D (Jmol)). The six most visible distances between carbon atoms are listed in Table 3. The amorphous profile features may be caused by the spacing between carbon atoms and associated groups. After ball milling, however, the loss of the cellulose crystal structure may result in slight changes in the atomic spacings. For example, the C6 group may transition to a different conformation, or different rotational angles about the β1,4 glycosidic linkage may increase or decrease spacings between atoms located in adjacent glucose molecules. In any case, an examination of the 2θ angles in Table 3 shows good overlap with the observed broad peaks in the amorphous cellulose diffractogram at ~20° and ~35° and the shoulder at ~15°.

Fig. 7 Cellulose Iα cellobiose unit structure viewed in Jmol

Table 3 Distances between residues in Cellulose I (a, b represent adjacent glucose residues)

These findings agree well with the work of Fink et al. (1987), who examined 200-h-milled cellulose I and II and saponified cellulose triacetate as amorphous cellulose models with wide-angle X-ray scattering. They compared the distances indicated by the radial distribution function (RDF) peaks with calculated interatomic distances of one cellobiose unit of the “backbone conformation” and “bent and twisted backbone conformation” models. Calculated and observed RDF results both presented six major peaks, corresponding to interatomic distances of approximately 0.14 nm, 0.24 nm, 0.29 nm, 0.37 nm, 0.41 nm and 0.48 nm. Such results can also be studied and validated through XRD pattern simulations introduced by Zhang et al. (2016).

Short range order

Short range order (SRO) refers to d-spacings less than 20 Å. Bates et al. suggest that solids with SRO present XRD profiles with no distinguishable difference, whatever nanocrystalline form they take (Bates et al. 2006). This inspired us to look for the short range periodicity of amorphous cellulose (Pecharsky and Zavalij 2008) by examining oligosaccharides.

Amorphous cellulose appears to have the same SRO as cellotetraose, as seen in Fig. 8c, where even ball milling does not change the XRD pattern. Figure 8b, d show that the cellotriose XRD profile looks similar and that, after ball milling, cellopentaose also converges to the amorphous BC profile. Cellobiose has peaks at 10.58° and 20.35° 2θ. The latter, corresponding to 0.44 nm, could be the spacing within the cellobiose molecule illustrated in Fig. 7. It is conserved in the other oligosaccharides and amorphous cellulose.

Fig. 8 Normalized XRD data: a cellobiose; b cellotriose; c cellotetraose; d cellopentaose. Control sample materials were tested as provided without further processing

Amorphous equation

An 8th order real form Fourier series model is used for fitting the amorphous cellulose XRD data, with empty holder XRD intensities subtracted and polarization factor corrected as seen below. The XRD data has been attached to this paper as Supplementary Information.

$$\begin{aligned} {\text{f}}\left( {\text{x}} \right) & = {\text{a}}0 + {\text{a1}}*{\cos}\left( {{\text{x}}*{\text{w}}} \right) + {\text{b1}}*{\sin}\left( {{\text{x}}*{\text{w}}} \right) + {\text{a2}}*{\cos}\left( {{2}*{\text{x}}*{\text{w}}} \right) + {\text{b2}}*{\sin}\left( {{2}*{\text{x}}*{\text{w}}} \right) \\ & \quad + {\text{a3}}*{\cos}\left( {{3}*{\text{x}}*{\text{w}}} \right) + {\text{b3}}*{\sin}\left( {{3}*{\text{x}}*{\text{w}}} \right) + {\text{a4}}*{\cos}\left( {{4}*{\text{x}}*{\text{w}}} \right) + {\text{b4}}*{\sin}\left( {{4}*{\text{x}}*{\text{w}}} \right) \, \\ & \quad + {\text{a5}}*{\cos}\left( {{5}*{\text{x}}*{\text{w}}} \right) + {\text{b5}}*{\sin}\left( {{5}*{\text{x}}*{\text{w}}} \right) + {\text{a6}}*{\cos}\left( {{6}*{\text{x}}*{\text{w}}} \right) + {\text{b6}}*{\sin}\left( {{6}*{\text{x}}*{\text{w}}} \right) \, \\ & \quad + {\text{a7}}*{\cos}\left( {{7}*{\text{x}}*{\text{w}}} \right) + {\text{b7}}*{\sin}\left( {{7}*{\text{x}}*{\text{w}}} \right) + {\text{a8}}*{\cos}\left( {{8}*{\text{x}}*{\text{w}}} \right) + {\text{b8}}*{\sin}\left( {{8}*{\text{x}}*{\text{w}}} \right) \\ \end{aligned}$$

where x is 2θ in degrees, sin() and cos() calculations are based on radian inputs and the coefficients (with 95% confidence bounds) are:

  • w = 0.1162 (0.1157, 0.1166)

  • (w = 6.657 if the software trigonometry functions expect inputs in degrees)

  • a0 = 4987 (4973, 5000)

  • a1 = − 3027 (− 3038, − 3016)

  • b1 = 1359 (1332, 1385)

  • a2 = − 328.5 (− 361.6, − 295.4)

  • b2 = − 1797 (− 1808, − 1787)

  • a3 = 603.9 (583.4, 624.4)

  • b3 = 525.4 (509.8, 540.9)

  • a4 = − 323.1 (− 335.5, − 310.7)

  • b4 = − 179.7 (− 194.6, − 164.9)

  • a5 = 281.4 (272.8, 290)

  • b5 = − 97.82 (− 111.7, − 83.95)

  • a6 = − 25.02 (− 39.84, − 10.21)

  • b6 = 258.1 (251.3, 265)

  • a7 = − 75.76 (− 88.19, − 63.34)

  • b7 = − 142.9 (− 149.9, − 136)

  • a8 = 88.23 (76, 100.5)

  • b8 = 77.83 (66.4, 89.26)

The R² of the fit is 0.9984 and the root mean square error is 111.9025.
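For readers who wish to reuse the amorphous profile, the model can be evaluated as in the following sketch. It uses the central coefficient values listed above with w = 0.1162, so that x is 2θ in degrees and the product k·w·x is passed to radian-based trigonometric functions. The function name and the scale argument, which stands in for the adjustable profile height used during deconvolution, are assumptions of this example.

```python
import math

W = 0.1162
A0 = 4987.0
A = [-3027.0, -328.5, 603.9, -323.1, 281.4, -25.02, -75.76, 88.23]   # a1..a8
B = [1359.0, -1797.0, 525.4, -179.7, -97.82, 258.1, -142.9, 77.83]   # b1..b8

def amorphous_profile(two_theta_deg, scale=1.0):
    """8th order real form Fourier series for the amorphous cellulose profile.

    two_theta_deg : diffraction angle 2-theta in degrees (fitted over 5-60)
    scale         : height multiplier (an assumption of this sketch)
    """
    total = A0
    for k, (a, b) in enumerate(zip(A, B), start=1):
        arg = k * W * two_theta_deg          # radians, since W is in rad/degree
        total += a * math.cos(arg) + b * math.sin(arg)
    return scale * total

# Locate the main maximum on a 0.1 degree grid over the fitted range.
grid = [5.0 + 0.1 * i for i in range(551)]
peak_position = max(grid, key=amorphous_profile)
```

On this grid the maximum falls close to the broad peak near 20.6° 2θ described earlier. The series is periodic with period 2π/w, about 54° 2θ, so it should not be extrapolated outside the fitted 5–60° range.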

Deconvolution with Fourier function

Avicel and slightly ball-milled pulp and BC powder XRD data are analyzed with the peak deconvolution method. All analyses used the conventional 10–40° 2θ range, a two-point background and five Voigt peaks to fit the crystalline peaks, and the amorphous profile is fit with the Fourier profile. Figure 9b, d, f present the deconvoluted peaks, while (a, c, e), as a comparison, use Voigt functions to fit the amorphous profile. Both approaches give close and good r² values. The amorphous profile fitted with the proposed Fourier model resembles the Voigt amorphous profile except for the Avicel data, although there is a slight difference in peak position and peak shape. The difference in the crystalline peaks is more obvious in terms of peak height and shape, especially for the Avicel and pulp samples.

Fig. 9 Peak deconvolution of Avicel, pulp and BC XRD data: a, c, e all peaks are fitted with Voigt functions; b, d, f all crystalline peaks are fitted with Voigt functions and the amorphous profile is fitted with the Fourier function determined in Amorphous Cellulose Model section. All the other PeakFit parameters remain the same for a–f

In all plots, the fitting in the range of 27–31° 2θ is not optimal. This is probably because crystalline peaks exist in that range while none is assigned in the deconvolution. These peaks should be very small and assigning additional crystalline peaks increases the complexity and uncertainty of the deconvolution method. The authors thus suggest using the conventional five crystalline peaks.

The choice of amorphous profile function does affect CI and crystal size values. Table 4 shows the calculated CI and (200) crystal size results when using different peak functions, where the cellulose Iβ (200) Miller indices are used for the similar peak on the BC pattern. There is significant variability in the results, which arises from both the selected model shape of the peak and the mathematical process for optimizing fit. Using the proposed method (Voigt + Fourier), the calculated CI is always lower than the Lorentz fit, but always higher than the Gauss fit. There seems to be no relationship to the CI predicted by using the Voigt fit. Similarly, there is no clear relationship between the predicted crystal size and fit selected. In comparison, a Rietveld method calculates Avicel CI to be 63.7% (Laysa Pires de Figueiredo and Fabio Furlan Ferreira 2014) and Table 5 collects reference values in the literature using various peak deconvolution methods. The calculated values are in the range of the reference values. It is the hypothesis of this work that accurately modeling the amorphous cellulose contribution will provide more accurate CI and crystal size data.

Table 4 CI and crystal size calculated with different peak functions (at 95% confidence level)
Table 5 Reference values of cellulose CI and crystal size

Effect of XRD data background

In contrast to the conventional two-point background, the sample holder XRD data is used as the background to be subtracted, as described in the “Methods” section. The new background intensities are approximately half of the two-point background for Avicel, and one third for pulp and BC (as seen in Fig. 1). The peak deconvolution range is 10–40° 2θ. The crystalline peaks are fit with Voigt functions. The amorphous profile is fit with a Fourier function, which fits the amorphous cellulose XRD data after subtraction of the new background. As seen in Fig. 10g, h, i, the long tail of the XRD profile is mostly attributed to the amorphous phase. This is the major difference from using the two-point background, by which a large portion of the intensities are treated as instrumental background. In other words, in the range of 10–40° 2θ the deconvolution of the amorphous profile is the part most influenced by the new background subtraction. On the other hand, the fitting range of 10–40° adds to the overestimation of CI compared with 5–60° or wider ranges. We use the 10–40° range intentionally to compare with previously published data.

Fig. 10 Peak deconvolution of background-subtracted XRD data with Voigt (crystalline) and Fourier (amorphous) peaks

Table 6 presents the calculation results with the new background. Applying the new background reduces the CI predicted by the Voigt + Fourier model in comparison to Lorentz and Voigt, while also improving the goodness of fit of the deconvolution in terms of r². The (200) crystal sizes predicted by the Voigt + Fourier model are larger for pulp and BC and smaller for Avicel in comparison to the values predicted by Lorentz and Voigt.

Table 6 CI and crystal size calculated with new background subtracted (at 95% confidence level)

Effect of polarization correction

In addition to applying the new background, a polarization factor correction is applied to the XRD data before deconvolution in this part. Over 10–40° 2θ, the polarization factor (1 + cos²2θ)/2 ranges from 0.98 to 0.79. Figure 11 presents the deconvolution results of the corrected data. Compared with the uncorrected data, the curves are progressively shifted upwards from 10 to 40° 2θ.
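The quoted range of the polarization factor, (1 + cos²2θ)/2, can be reproduced with a few lines. The sketch below assumes that the correction divides the measured intensity by this factor, which is consistent with the upward shift described above but is not spelled out in this section; the function names are illustrative.

```python
import numpy as np

def polarization_factor(two_theta_deg):
    """Polarization factor (1 + cos^2(2-theta)) / 2 for angles in degrees 2-theta."""
    two_theta = np.radians(np.asarray(two_theta_deg, dtype=float))
    return (1.0 + np.cos(two_theta) ** 2) / 2.0

def correct_polarization(two_theta_deg, intensity):
    """Divide measured intensities by the polarization factor (assumed convention)."""
    return np.asarray(intensity, dtype=float) / polarization_factor(two_theta_deg)

# A flat synthetic profile makes the angle dependence of the correction visible
two_theta = np.linspace(10.0, 40.0, 7)
flat_profile = np.full_like(two_theta, 100.0)
corrected = correct_polarization(two_theta, flat_profile)
```

Since the factor decreases monotonically over this range, the correction raises the high-angle side of the profile more than the low-angle side.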

Fig. 11

Peak deconvolution of polarization-corrected data with Voigt (crystalline) and Fourier (amorphous) peaks

The polarization factor correction improves r² for all samples and all peak functions. This satisfies the expectation that the polarization correction will recover the theoretical XRD curve so that symmetrical peaks (Gauss, Lorentz and Voigt) can fit the crystalline content more realistically. It decreases the calculated CI values for Avicel (relative to Lorentz and Voigt), opposite to the effect on BC and pulp (relative to Lorentz). Table 7 shows all the calculation results.

Table 7 CI and crystal size calculated with polarization factor corrected (at 95% confidence level)

Application to the Rietveld method analysis

Improved diffraction analysis was attempted using the Rietveld method in MAUD software (Luca Lutterotti). However, the amorphous cellulose function developed here could not be implemented in MAUD. MAUD is able to provide models of crystalline cellulose phases that contain all predicted diffraction peaks, but the amorphous phase cannot be modeled to precisely resemble experimental XRD data. Although the amorphous pattern can be produced by a very high order polynomial in MAUD, the refinement calculation cannot then be executed. If such additional amorphous models or additional mathematical functions could be integrated into the MAUD software, improved Rietveld refinement analysis may be possible. At the same time, the Rietveld method still faces the challenge that it requires specifying the structure (or the content of each structure) of the crystalline phase for any given cellulose sample. For natural cellulose samples, this will be difficult.

Conclusions

The XRD results of ball milled Avicel, pulp and bacterial cellulose samples all converge to one steady profile. It favorably matches several external references including phosphoric acid treated amorphous cellulose in the shape and peak positions at 20.6° and 34.6° 2θ. This profile has been considered as the pure amorphous cellulose XRD profile.

This paper then studied the short range order of oligosaccharides and found that cellotriose and cellotetraose have nearly the same XRD profile as amorphous cellulose. After ball milling, the XRD diffractogram of cellopentaose also matches it, while cellobiose exhibits a highly ordered XRD profile even after ball milling. It can be speculated that the SRO of amorphous cellulose exists within chain segments of fewer than five glucose units. To verify this, the 3D structure of the cellobiose residue of cellulose Iα was examined; the d-spacings within a glucose unit and between adjacent units match the amorphous 20.6° and 34.6° peaks well.

The peak deconvolution method calculates the cellulose crystallinity index as the ratio of the crystalline peak areas to the total XRD profile area, after subtracting the background intensity. The major sources of error lie in the determination of the amorphous portion of the XRD profile and in the accuracy of the background intensity. This paper first determined that the background intensity measured with a silicon zero-background holder is much smaller than that given by the two-point approximation commonly used in peak deconvolution. Subtraction of the true holder background intensity is therefore proposed and applied in this work. Common peak functions such as Gauss, Lorentz and Voigt, as well as polynomial functions, do not fit the amorphous XRD profile very well. Instead, we used a Fourier series function, and the fit is very accurate as measured by the correlation coefficient. The analytical expression of this function can be derived by FFT or by nonlinear curve fitting algorithms. Both methods prove to be very accurate, and the latter is easier to apply. A derived amorphous profile function is presented in the Amorphous equation section.
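To illustrate how a Fourier series can describe a broad, two-hump amorphous profile, the sketch below fits a truncated series a0 + Σ[a_k cos(kωx) + b_k sin(kωx)] to synthetic data. Several points are assumptions rather than the procedure used in this work: the exact functional form and number of terms are not given in this section, the humps near 20.6° and 34.6° are synthetic Gaussians, and the fundamental frequency ω is held fixed so that the fit reduces to linear least squares. Treating ω as a free parameter would require a nonlinear optimizer, as mentioned in the text.

```python
import numpy as np

def fourier_design(x, n_terms, omega):
    """Design matrix with columns [1, cos(wx), sin(wx), cos(2wx), sin(2wx), ...]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for k in range(1, n_terms + 1):
        cols.append(np.cos(k * omega * x))
        cols.append(np.sin(k * omega * x))
    return np.column_stack(cols)

def fit_fourier(x, y, n_terms=6, omega=None):
    """Least-squares Fourier coefficients for a fixed fundamental frequency."""
    x = np.asarray(x, dtype=float)
    if omega is None:
        # Assumed default: one fundamental period spans the fitted range
        omega = 2.0 * np.pi / (x.max() - x.min())
    design = fourier_design(x, n_terms, omega)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs, omega

def eval_fourier(x, coeffs, omega):
    """Evaluate the fitted series at angles x."""
    n_terms = (len(coeffs) - 1) // 2
    return fourier_design(x, n_terms, omega) @ coeffs

def r_squared(y, y_model):
    """Coefficient of determination between data and model."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_model) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic amorphous-like profile (illustrative values only)
x = np.linspace(10.0, 40.0, 301)
y = (100.0 * np.exp(-((x - 20.6) ** 2) / (2 * 4.0 ** 2))
     + 15.0 * np.exp(-((x - 34.6) ** 2) / (2 * 3.0 ** 2)))
coeffs, omega = fit_fourier(x, y, n_terms=6)
r2 = r_squared(y, eval_fourier(x, coeffs, omega))
```

Once the coefficients are fixed, the amorphous contribution in a sample pattern can be represented by this curve multiplied by a single scale factor, which is the role the Fourier function plays in the deconvolution.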

With the revised background subtraction and amorphous profile fitting, we performed peak deconvolution using the PeakFit program, assigning variable-shape Voigt functions to the crystalline peaks and the Fourier function to the amorphous profile, whose intensity PeakFit was allowed to scale. The calculated crystallinity indices of the three types of cellulose are consistently smaller than those obtained with the previous peak functions and XRD analysis approaches. This difference can be explained by the fact that common peak functions model only a single peak profile for amorphous cellulose, and that the two-point background contains part of the amorphous intensities. This work provides a better understanding of amorphous cellulose structure and a reliable peak deconvolution method for analyzing cellulose XRD data.
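The crystallinity index defined above, the ratio of the crystalline peak areas to the total background-subtracted profile area, can be computed from fitted component curves as in the following minimal sketch. The function names and the flat synthetic curves are hypothetical; in practice the component curves would come from the deconvolution software.

```python
import numpy as np

def trapezoid_area(x, y):
    """Area under y(x) by the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def crystallinity_index(two_theta, crystalline_peaks, net_profile):
    """CI in percent: summed crystalline peak areas over total profile area.

    crystalline_peaks : iterable of fitted crystalline peak curves
    net_profile       : background-subtracted XRD profile on the same grid
    """
    crystalline_area = sum(trapezoid_area(two_theta, peak)
                           for peak in crystalline_peaks)
    total_area = trapezoid_area(two_theta, net_profile)
    return 100.0 * crystalline_area / total_area

# Synthetic illustration with flat curves so the areas are easy to check by hand
two_theta = np.linspace(10.0, 40.0, 301)
peaks = [np.full_like(two_theta, 1.5), np.full_like(two_theta, 0.5)]
profile = np.full_like(two_theta, 5.0)
ci_percent = crystallinity_index(two_theta, peaks, profile)
```

In this form, any intensity moved from the background into the profile enlarges the denominator, which is why the background choice and the amorphous model both lower the calculated CI.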