All data sets are available on the Springer webpage or at the authors’ home pages. More detailed information on the data sets may be found there.

B.1 Boston Housing Data

The Boston housing data set was collected by Harrison and Rubinfeld (1978). It comprises 506 observations, one for each census district of the Boston metropolitan area. The data set was analyzed in Belsley, Kuh and Welsch (1980).

\(X_{1}\): per capita crime rate

\(X_{2}\): proportion of residential land zoned for large lots

\(X_{3}\): proportion of nonretail business acres

\(X_{4}\): Charles River (1 if tract bounds river, 0 otherwise)

\(X_{5}\): nitric oxides concentration

\(X_{6}\): average number of rooms per dwelling

\(X_{7}\): proportion of owner-occupied units built prior to 1940

\(X_{8}\): weighted distances to five Boston employment centers

\(X_{9}\): index of accessibility to radial highways

\(X_{10}\): full-value property tax rate per $10,000

\(X_{11}\): pupil/teacher ratio

\(X_{12}\): \(1000(B-0.63)^{2}\,\textbf{\textit{I}}(B<0.63)\), where B is the proportion of African American residents

\(X_{13}\): % lower status of the population

\(X_{14}\): median value of owner-occupied homes in $1000
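The definition of the twelfth variable can be sketched in a few lines of Python. This is only an illustration of the formula \(1000(B-0.63)^{2}\,I(B<0.63)\) as stated in the list above; the function name `x12_transform` is an assumption made for this example and does not come from the data files.

```python
def x12_transform(b: float) -> float:
    """Evaluate 1000 * (b - 0.63)**2 * I(b < 0.63) for a proportion b.

    The name and signature are hypothetical; only the formula is taken
    from the variable description.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be a proportion between 0 and 1")
    # Indicator function I(b < 0.63): 1 below the threshold, 0 otherwise.
    indicator = 1.0 if b < 0.63 else 0.0
    return 1000.0 * (b - 0.63) ** 2 * indicator
```

With this definition the variable is zero whenever the proportion is at least 0.63 and grows quadratically as the proportion falls below that threshold.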

B.2 Swiss Bank Notes

Six variables measured on 100 genuine and 100 counterfeit old Swiss 1000-franc bank notes. The data stem from Flury and Riedwyl (1988). The columns correspond to the following 6 variables.

\(X_{1}\): Length of the bank note

\(X_{2}\): Height of the bank note, measured on the left

\(X_{3}\): Height of the bank note, measured on the right

\(X_{4}\): Distance of inner frame to the lower border

\(X_{5}\): Distance of inner frame to the upper border

\(X_{6}\): Length of the diagonal

Observations 1–100 are the genuine bank notes and the other 100 observations are the counterfeit bank notes.
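Because the group membership is encoded only by the position of a row, it can be convenient to build an explicit label vector. The following Python sketch does this under the ordering stated above (observations 1 to 100 genuine, 101 to 200 counterfeit); the function names and the label strings are assumptions chosen for this example.

```python
N_GENUINE = 100      # observations 1-100
N_COUNTERFEIT = 100  # observations 101-200


def bank_note_labels() -> list:
    """Return one label per observation, in the order of the data rows."""
    return ["genuine"] * N_GENUINE + ["counterfeit"] * N_COUNTERFEIT


def label_for_observation(i: int) -> str:
    """Return the label of the i-th observation (1-based numbering)."""
    if not 1 <= i <= N_GENUINE + N_COUNTERFEIT:
        raise ValueError("observation number must lie between 1 and 200")
    return "genuine" if i <= N_GENUINE else "counterfeit"
```

The label list has the same length as the data matrix, so it can be attached as an extra column before running a discriminant or cluster analysis.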

B.3 Car Data

The car data set (Chambers, Cleveland, Kleiner and Tukey, 1983) consists of 13 variables measured for 74 car types. The abbreviations in Table B.3 are as follows:

\(X_{1}\): P, Price

\(X_{2}\): M, Mileage (in miles per gallon)

\(X_{3}\): R78, Repair record 1978 (rated on a 5-point scale; 5 best, 1 worst)

\(X_{4}\): R77, Repair record 1977 (scale as before)

\(X_{5}\): H, Headroom (in inches)

\(X_{6}\): R, Rear seat clearance (distance from front seat back to rear seat, in inches)

\(X_{7}\): Tr, Trunk space (in cubic feet)

\(X_{8}\): W, Weight (in pounds)

\(X_{9}\): L, Length (in inches)

\(X_{10}\): T, Turning diameter (clearance required to make a U-turn, in feet)

\(X_{11}\): D, Displacement (in cubic inches)

\(X_{12}\): G, Gear ratio for high gear

\(X_{13}\): C, Company headquarters (1 for U.S., 2 for Japan, 3 for Europe)

B.4 Classic Blue Pullovers Data

This is a data set consisting of 10 measurements of 4 variables. The story: A textile shop manager is studying the sales of “classic blue” pullovers over 10 periods. He uses three different marketing methods and hopes to understand his sales as a fit of these variables using statistics. The variables measured are

\(X_{1}\): Number of pullovers sold

\(X_{2}\): Price (in EUR)

\(X_{3}\): Advertisement costs in local newspapers (in EUR)

\(X_{4}\): Presence of a sales assistant (in hours per period)

B.5 U.S. Companies Data

The data set consists of measurements for 79 U.S. companies. The abbreviations in Table B.5 are as follows:

\(X_{1}\): A, Assets (USD)

\(X_{2}\): S, Sales (USD)

\(X_{3}\): MV, Market Value (USD)

\(X_{4}\): P, Profits (USD)

\(X_{5}\): CF, Cash Flow (USD)

\(X_{6}\): E, Employees

B.6 French Food Data

The data set consists of the average expenditures on food for several different types of families in France (manual workers = MA, employees = EM, managers = CA) with different numbers of children (2, 3, 4 or 5 children). The data is taken from Lebart, Morineau and Fénelon (1982).

B.7 Car Marks

The data are averaged marks for 24 car types from a sample of 40 persons. The marks range from 1 (very good) to 6 (very bad) like German school marks. The variables are:

\(X_{1}\): A, Economy

\(X_{2}\): B, Service

\(X_{3}\): C, Non-depreciation of value

\(X_{4}\): D, Price (mark 1 for very cheap cars)

\(X_{5}\): E, Design

\(X_{6}\): F, Sporty car

\(X_{7}\): G, Safety

\(X_{8}\): H, Easy handling

B.8 French Baccalauréat Frequencies

The data consist of observations of 202100 baccalauréats from France in 1976 and give the frequencies for different sets of modalities classified into regions. For a reference see Bourouche and Saporta (1980). The variables (modalities) are:

\(X_{1}\): A, Philosophy-Letters

\(X_{2}\): B, Economics and Social Sciences

\(X_{3}\): C, Mathematics and Physics

\(X_{4}\): D, Mathematics and Natural Sciences

\(X_{5}\): E, Mathematics and Techniques

\(X_{6}\): F, Industrial Techniques

\(X_{7}\): G, Economic Techniques

\(X_{8}\): H, Computer Techniques

B.9 Journaux Data

This data set was created from a survey on people’s reading habits, completed in Belgium in the 1980s. Respondents were asked where they live (10 regions, comprising 7 provinces and 3 regions around Brussels) and what kind of newspaper they read on a regular basis. The 15 possible answers belong to 3 classes: Flemish newspapers (first letter v), French newspapers (first letter f) and both languages (first letter b).

\(X_{1}\): WaBr, Walloon Brabant

\(X_{2}\): Brar, Brussels area

\(X_{3}\): Antw, Antwerp

\(X_{4}\): FlBr, Flemish Brabant

\(X_{5}\): OcFl, Occidental Flanders

\(X_{6}\): OrFl, Oriental Flanders

\(X_{7}\): Hain, Hainaut

\(X_{8}\): Lièg, Liège

\(X_{9}\): Limb, Limburg

\(X_{10}\): Luxe, Luxembourg

B.10 U.S. Crime Data

This is a data set consisting of 50 measurements of 11 variables. It states for one year (1985) the reported number of crimes in the 50 states of the U.S., classified according to 7 categories (\(X_{3}\) to \(X_{9}\)).

\(X_{1}\): land area (land)

\(X_{2}\): population 1985 (popu 1985)

\(X_{3}\): murder (murd)

\(X_{4}\): rape

\(X_{5}\): robbery (robb)

\(X_{6}\): assault (assa)

\(X_{7}\): burglary (burg)

\(X_{8}\): larceny (larc)

\(X_{9}\): auto theft (auto)

\(X_{10}\): U.S. states region number (reg)

\(X_{11}\): U.S. states division number (div)

B.11 Plasma Data

In Olkin and Veath (1980), the evolution of citrate concentration in the plasma is observed at 3 different times of day, \(X_{1}\) (8 am), \(X_{2}\) (11 am) and \(X_{3}\) (3 pm), for two groups of patients. Each group follows a different diet.

\(X_{1}\): 8 am

\(X_{2}\): 11 am

\(X_{3}\): 3 pm

B.12 WAIS Data

Morrison (1990b) compares the results of 4 subtests of the Wechsler Adult Intelligence Scale (WAIS) for 2 categories of people: group 1 comprises \(n_{1}=37\) people who do not present a senile factor, and group 2 comprises \(n_{2}=12\) people who do present a senile factor.

WAIS subtests:

\(X_{1}\): information

\(X_{2}\): similarities

\(X_{3}\): arithmetic

\(X_{4}\): picture completion

B.13 ANOVA Data

Wheat yields were measured on 30 parcels that were randomly assigned to 3 lots, each prepared with one of 3 different fertilizers A, B, and C.

\(X_{1}\): fertilizer A

\(X_{2}\): fertilizer B

\(X_{3}\): fertilizer C

B.14 Timebudget Data

In Volle (1985), we can find data on 28 individuals identified according to sex, country where they live, professional activity and matrimonial status. The data indicate the amount of time each person spent on ten categories of activities over 100 days (100 ⋅ 24 h = 2400 hours in total in each row) in the year 1976.

\(X_{1}\): prof, professional activity

\(X_{2}\): tran, transportation linked to professional activity

\(X_{3}\): hous, household occupation

\(X_{4}\): kids, occupation linked to children

\(X_{5}\): shop, shopping

\(X_{6}\): pers, time spent for personal care

\(X_{7}\): eat, eating

\(X_{8}\): slee, sleeping

\(X_{9}\): tele, watching television

\(X_{10}\): leis, other leisures

maus: active men in the U.S.

waus: active women in the U.S.

wnus: nonactive women in the U.S.

mmus: married men in U.S.

wmus: married women in U.S.

msus: single men in U.S.

wsus: single women in U.S.

mawe: active men from Western countries

wawe: active women from Western countries

wnwe: nonactive women from Western countries

mmwe: married men from Western countries

wmwe: married women from Western countries

mswe: single men from Western countries

wswe: single women from Western countries

mayo: active men from Yugoslavia

wayo: active women from Yugoslavia

wnyo: nonactive women from Yugoslavia

mmyo: married men from Yugoslavia

wmyo: married women from Yugoslavia

msyo: single men from Yugoslavia

wsyo: single women from Yugoslavia

maes: active men from Eastern countries

waes: active women from Eastern countries

wnes: nonactive women from Eastern countries

mmes: married men from Eastern countries

wmes: married women from Eastern countries

mses: single men from Eastern countries

wses: single women from Eastern countries
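Since each row is stated to total 100 ⋅ 24 h = 2400 hours, a row-sum check is a simple sanity test after loading the table. The following Python sketch shows one way to write it; the function name `row_total_ok` and the tolerance are assumptions for this example, and no values from the data set are reproduced here.

```python
import math

EXPECTED_TOTAL_HOURS = 100 * 24  # 100 days of 24 hours each
N_ACTIVITIES = 10                # ten activity categories per row


def row_total_ok(hours, tol=1e-6):
    """Check that the ten activity times of one row add up to 2400 hours.

    `hours` is a sequence of ten numbers; `tol` is an absolute tolerance
    (an assumption, useful if the values were rounded).
    """
    if len(hours) != N_ACTIVITIES:
        raise ValueError("each row must contain ten activity times")
    return math.isclose(sum(hours), EXPECTED_TOTAL_HOURS, abs_tol=tol)
```

Applied to every row of the loaded table, the check flags rows whose entries were mistyped or shifted during import.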

B.15 Geopol Data

This data set contains a comparison of 41 countries according to 10 different political and economic parameters.

\(X_{1}\): popu, population

\(X_{2}\): giph, Gross Domestic Product per inhabitant

\(X_{3}\): ripo, rate of increase of the population

\(X_{4}\): rupo, rate of urban population

\(X_{5}\): rlpo, rate of illiteracy in the population

\(X_{6}\): rspo, rate of students in the population

\(X_{7}\): eltp, expected lifetime of people

\(X_{8}\): rnnr, rate of nutritional needs realized

\(X_{9}\): nunh, number of newspapers and magazines per 1000 inhabitants

\(X_{10}\): nuth, number of televisions per 1000 inhabitants

AFS: South Africa

DAN: Denmark

MAR: Morocco

ALG: Algeria

EGY: Egypt

MEX: Mexico

BRD: Germany

ESP: Spain

NOR: Norway

GBR: Great Britain

FRA: France

PER: Peru

ARS: Saudi Arabia

GAB: Gabon

POL: Poland

ARG: Argentina

GRE: Greece

POR: Portugal

AUS: Australia

HOK: Hong Kong

SUE: Sweden

AUT: Austria

HON: Hungary

SUI: Switzerland

BEL: Belgium

IND: India

THA: Thailand

CAM: Cameroon

IDO: Indonesia

URS: USSR

CAN: Canada

ISR: Israel

USA: USA

CHL: Chile

ITA: Italy

VEN: Venezuela

CHN: China

JAP: Japan

YOU: Yugoslavia

CUB: Cuba

KEN: Kenya

B.16 U.S. Health Data

This is a data set consisting of 50 measurements of 13 variables. It states for one year (1985) the reported number of deaths in the 50 states of the U.S. classified according to 7 categories.

\(X_{1}\): land area (land)

\(X_{2}\): population 1985 (popu)

\(X_{3}\): accident (acc)

\(X_{4}\): cardiovascular (card)

\(X_{5}\): cancer (canc)

\(X_{6}\): pulmonary (pul)

\(X_{7}\): pneumonia and flu (pnue)

\(X_{8}\): diabetes (diab)

\(X_{9}\): liver (liv)

\(X_{10}\): Doctors (doc)

\(X_{11}\): Hospitals (hosp)

\(X_{12}\): U.S. states region number (r)

\(X_{13}\): U.S. states division number (d)

B.17 Vocabulary Data

This example of the evolution of the vocabulary of children can be found in Bock (1975). Data are drawn from test results on file in the Records Office of the Laboratory School of the University of Chicago. They consist of scores, obtained from a cohort of pupils from the eighth through eleventh grade levels, on alternative forms of the vocabulary section of the Cooperative Reading Test. The data set gives the scaled scores for the sample of 64 subjects (the origin and units are fixed arbitrarily).

B.18 Athletic Records Data

This data set provides men’s athletic records for 55 countries at the 1984 Olympic Games.

B.19 Unemployment Data

This data set provides unemployment rates in all federal states of Germany in November 2005.

B.20 Annual Population Data

The data show yearly average population figures for the former territory of the Federal Republic of Germany, including West Berlin (given in 1000 inhabitants).

B.21 Bankruptcy Data I

The data are the profitability, leverage, and bankruptcy indicators for 84 companies.

The data set contains information on 42 of the largest companies that filed for protection against creditors under Chapter 11 of the U.S. Bankruptcy Code in 2001–2002 after the stock market crash of 2000. The bankrupt companies were matched with 42 surviving companies with the closest capitalizations and the same US industry classification codes available through the Division of Corporate Finance of the Securities and Exchange Commission (CF SEC, 2004).

The information for each company was collected from the annual reports for 1998–1999 (CF SEC, 2004), i.e., three years prior to the defaults of the bankrupt companies. The following data set contains profitability and leverage ratios calculated, respectively, as the ratio of net income (NI) to total assets (TA) and the ratio of total liabilities (TL) to total assets (TA).
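The two ratios described above are NI/TA and TL/TA. A minimal Python sketch of the computation follows; the function names are assumptions made for this example, and the figures used in the usage note are invented for illustration, not taken from any company in the data set.

```python
def profitability(net_income: float, total_assets: float) -> float:
    """Profitability ratio NI / TA."""
    if total_assets <= 0:
        raise ValueError("total assets must be positive")
    return net_income / total_assets


def leverage(total_liabilities: float, total_assets: float) -> float:
    """Leverage ratio TL / TA."""
    if total_assets <= 0:
        raise ValueError("total assets must be positive")
    return total_liabilities / total_assets
```

For a hypothetical company with NI = 5, TL = 60 and TA = 100 (in the same monetary unit), the profitability ratio is 0.05 and the leverage ratio is 0.6.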

B.22 Bankruptcy Data II

Altman (1968), quoted by Morrison (1990a), reports financial data on 66 banks.

X1 = (working capital)/(total assets)

X2 = (retained earnings)/(total assets)

X3 = (earnings before interest and taxes)/(total assets)

X4 = (market value equity)/(book value of total liabilities)

X5 = (sales)/(total assets)

The first 33 observations correspond to bankrupt banks and the last 33 to solvent banks, as indicated by the values of y in the last column.

Original Data:

Obs.       X1       X2       X3       X4     X5  y
   1    36.70   −62.80   −89.50    54.10   1.70  1
   2    24.00     3.30    −3.50    20.90   1.10  1
   3   −61.60  −120.80  −103.20    24.70   2.50  1
   4    −1.00   −18.10   −28.80    36.20   1.10  1
   5    18.90    −3.80   −50.60    26.40   0.90  1
   6   −57.20   −61.20   −56.60    11.00   1.70  1
   7     3.00   −20.30   −17.40     8.00   1.00  1
   8    −5.10  −194.50   −25.80     6.50   0.50  1
   9    17.90    20.80    −4.30    22.60   1.00  1
  10     5.40  −106.10   −22.90    23.80   1.50  1
  11    23.00   −39.40   −35.70    69.10   1.20  1
  12   −67.60  −164.10   −17.70     8.70   1.30  1
  13  −185.10  −308.90   −65.80    35.70   0.80  1
  14    13.50     7.20   −22.60    96.10   2.00  1
  15    −5.70  −118.30   −34.20    21.70   1.50  1
  16    72.40  −185.90  −280.00    12.50   6.70  1
  17    17.00   −34.60   −19.40    35.50   3.40  1
  18   −31.20   −27.90     6.30     7.00   1.30  1
  19    14.10   −48.20     6.80    16.60   1.60  1
  20   −60.60   −49.20   −17.20     7.20   0.30  1
  21    26.20   −19.20   −36.70    90.40   0.80  1
  22     7.00   −18.10    −6.50    16.50   0.90  1
  23   −53.10   −98.00   −20.80    26.60   1.70  1
  24   −17.20  −129.00   −14.20   267.90   1.30  1
  25    32.70    −4.00   −15.80   177.40   2.10  1
  26    26.70    −8.70   −36.30    32.50   2.80  1
  27    −7.70   −59.20   −12.80    21.30   2.10  1
  28    18.00   −13.10   −17.60    14.60   0.90  1
  29     2.03   −38.00     1.60     7.70   1.20  1
  30   −35.30   −57.90     0.70    13.70   0.80  1
  31     5.10    −8.80    −9.10   100.90   0.90  1
  32     0.01   −64.70    −4.00     0.70   0.10  1
  33    25.20   −11.40     4.80     7.00   0.90  1
  34    35.20    43.00    16.40    99.10   1.30  0
  35    38.80    47.00    16.00   126.50   1.90  0
  36    14.00    −3.30     4.00    91.70   2.70  0
  37    55.10    35.00    20.80    72.30   1.90  0
  38    59.30    46.70    12.60   724.10   0.90  0
  39    33.60    20.80    12.50   152.80   2.40  0
  40    52.80    33.00    23.60   475.90   1.50  0
  41    45.60    26.10    10.40   287.90   2.10  0
  42    47.40    68.60    13.80   581.30   1.60  0
  43    40.00    37.30    33.40   228.80   3.50  0
  44    69.00    59.00    23.10   406.00   5.50  0
  45    34.20    49.60    23.80   126.60   1.90  0
  46    47.00    12.50     7.00    53.40   1.80  0
  47    15.40    37.30    34.10   570.10   1.50  0
  48    56.90    35.30     4.20   240.30   0.90  0
  49    43.80    49.50    25.10   115.00   2.60  0
  50    20.70    18.10    13.50    63.10   4.00  0
  51    33.80    31.40    15.70   144.80   1.90  0
  52    35.30    21.50   −14.40    90.00   1.00  0
  53    24.40     8.50     5.80   149.10   1.50  0
  54    48.90    40.60     5.80    82.00   1.80  0
  55    49.90    34.60    26.40   310.00   1.80  0
  56    54.80    19.90    26.70   239.90   2.30  0
  57    39.00    17.40    12.60    60.50   1.30  0
  58    53.00    54.70    14.60   771.70   1.70  0
  59    20.10    53.50    20.60   307.50   1.10  0
  60    53.70    35.90    26.40   289.50   2.00  0
  61    46.10    39.40    30.50   700.00   1.90  0
  62    48.30    53.10     7.10   164.40   1.90  0
  63    46.70    39.80    13.80   229.10   1.20  0
  64    60.30    59.50     7.00   226.60   2.00  0
  65    17.90    16.30    20.40   105.60   1.00  0
  66    24.70    21.70    −7.80   118.60   1.60  0
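The coding of y (1 for the first 33 observations, 0 for the last 33) can be checked and used to split the data into the two groups. The Python sketch below does this for four rows copied from the listing above (observations 1, 2, 34 and 35); the tuple layout and the function names are assumptions made for this example.

```python
N_BANKRUPT = 33  # observations 1-33 are the bankrupt banks

# (observation, X1, X2, X3, X4, X5, y): four rows copied from the listing.
SAMPLE_ROWS = [
    (1, 36.70, -62.80, -89.50, 54.10, 1.70, 1),
    (2, 24.00, 3.30, -3.50, 20.90, 1.10, 1),
    (34, 35.20, 43.00, 16.40, 99.10, 1.30, 0),
    (35, 38.80, 47.00, 16.00, 126.50, 1.90, 0),
]


def expected_status(obs: int) -> int:
    """Return the y value implied by the row position (1 = bankrupt)."""
    return 1 if obs <= N_BANKRUPT else 0


def split_by_status(rows):
    """Split rows into (bankrupt, solvent) lists using the last entry, y."""
    bankrupt = [r for r in rows if r[-1] == 1]
    solvent = [r for r in rows if r[-1] == 0]
    return bankrupt, solvent


bankrupt_rows, solvent_rows = split_by_status(SAMPLE_ROWS)
# The position-based rule and the recorded y should agree for every row.
consistent = all(expected_status(r[0]) == r[-1] for r in SAMPLE_ROWS)
```

Running the same check on all 66 rows confirms whether the loaded y column matches the ordering described in the text.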