1 Introduction

Timbres and colors have fascinated musicians, artists, and scientists across centuries [9, 20]. In physics, the complexity of timbre is due to the superposition of simple components (sinusoidal waves), which can be separated with Helmholtz resonators [16]. Timbres can be computationally investigated with Fourier transforms and sonograms, which show the strength of each component (partial) of the superposition. Colors are also related to the idea of superposition, as shown by [32] for white light, which can be decomposed into colors through a prism. The physics involved is quite different: sound involves mechanical longitudinal waves, while light is made of electromagnetic transverse waves. However, timbres and colors have a main similarity: they are complex signals, made of simple superposed wave signals [37]. This suggests a correspondence based on the relation between sound frequency and the spatial frequency of light. According to [5], however, absolute correspondences between these domains are difficult to establish; the relativity and the obstructions of this problem could be softened in the categorical context.

In fact, the point of view of precise measurement can be enriched in several ways. Scholars such as Goethe pointed out the importance of perception to understand colors in the framework of nature and the arts [11]. In addition, both colors and timbres can be qualitatively rated as, for example, cold, strong, or delicate. Even though different cultures can associate a different (symbolic) meaning to each color, we can find aspects with a certain universality, related to human perception. Some colors are more instinctively associated with higher or lower tension: red or yellow attract more attention than light blue or gray. Similarly, specific orchestral timbres are more awakening than others: a loud trumpet sound is a more effective alarm than a soft flute melody. Some recent studies point out the importance of a “shared emotion” to associate colors and musical sequences [33], which also occurs in the framework of classical music listening [8]. On the other hand, both colors and timbres can be mixed or shaded, as happens in painting and orchestration, respectively, transforming a delicate sound (or color) into a strong sound (or color). In this way, we can draw upon the idea of superposition and similarity of perception to imagine how we can investigate colors and timbres, focusing on common aspects through abstraction. These aspects are intensities, mixing, and shadows/nuances. In particular, harmonic choices, which influence timbre, are also ruled by the idea of superposition.

In this article we introduce fundamental groupoids of color and timbre spaces and functors between them. These functors could be induced by some classical (possibly) continuous maps suggested in [5]. This categorical framework [22, 23] could be adequate to express the superposition and similarity principles to understand the color/timbre relation, complementing analytical approaches. Categories have already been used to investigate processes and phenomena in the arts from a bird’s-eye perspective [19, 21, 30].

This article is structured as follows. In Sect. 2, we review some color spaces, timbre spaces, and maps between them. In Sect. 3, we offer a categorical enrichment of previous approaches to relate color and timbre. In particular, in Sect. 3.3, we include a computational example of interaction between color and timbre paths. Then, Sect. 4 is devoted to a gestural extension of the previous enrichment. In Sect. 5 some conclusions and further possible applications are discussed. In the Glossary (Sect. 6) we provide definitions of some specialized mathematical concepts that we mention. We use boldface for these terms.

As a general disclaimer: colors, timbres, and their relationships constitute a vast topic. This is a position paper (or rather, a working one) aiming to open the way toward further studies in this field.

2 Spaces and Mappings: An Overview

2.1 The CIE 1931 Color Space

The CIE model [10] connects the visible spectrum with human perception. It assigns to each spectrum wavelength \(\lambda \in [380,780]\), measured in nanometres, three sensitivity level values \(\overline{x}(\lambda )\), \(\overline{y}(\lambda )\), and \(\overline{z}(\lambda )\) corresponding to the kinds of human cone cells under certain standard conditions. Thus, a spectral distribution yields, by integration of its product with each color matching function (\(\overline{x}\), \(\overline{y}\), or \(\overline{z}\)), a triple (X, Y, Z) of color coordinates. All these triples make up the unit cube \([0,1]^3\), after normalizing units. We embed this cube in \(\mathbb {R}^3\), regarding the latter as a vector space and a topological space. The vector sum in the cube, whenever defined, corresponds to color mixing (superposition of light beams). If the sum is not in \([0,1]^3\), one can take an average of vector components to represent a mixture (with average intensity) for computational purposes.
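The mixing rule just described can be sketched in a few lines of Python. This is only one reading of the text: the function name `mix` is hypothetical, and "average of vector components" is interpreted here as the componentwise average of the two color vectors.

```python
def mix(c1, c2):
    """Superpose two colors given as (X, Y, Z) triples in the unit cube.

    Returns the vector sum when it stays inside [0, 1]^3; otherwise falls
    back to the componentwise average (an assumed reading of the text).
    """
    total = tuple(a + b for a, b in zip(c1, c2))
    if all(0.0 <= v <= 1.0 for v in total):
        return total
    # The sum left the cube: represent the mixture with average intensity.
    return tuple(v / 2.0 for v in total)
```

For example, `mix((0.8, 0.5, 0.5), (0.6, 0.1, 0.1))` leaves the cube in its first coordinate, so the average is returned instead of the sum.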

On the other hand, the standard RGB space is used for screens and photography, so we need it for experiments. It has red, green, and blue as primary colors, which give white if superposed. The standard RGB model does not cover the whole CIE gamut; for instance, it cannot represent a spectral violet. However, we can transform CIE to standard RGB by means of an appropriate conversion of CIE to linear RGB followed by electro-optical transfer. The RGB space has already been considered for mathematical modeling [34, 35]. In particular, [35] proposed a three-dimensional space of perceived colors, where equivalence classes correspond to perceptual match.
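A minimal sketch of such a conversion is given below. It assumes the commonly quoted XYZ-to-sRGB matrix for the D65 white point and the piecewise sRGB transfer curve, with XYZ scaled so that white has Y = 1; the function name `xyz_to_srgb` and the clipping of out-of-gamut values are choices made for this illustration, not necessarily those of the original experiments.

```python
def xyz_to_srgb(x, y, z):
    """Convert CIE XYZ (white scaled to Y = 1) to gamma-encoded sRGB in [0, 1]."""
    # Linear RGB via the commonly quoted XYZ -> sRGB matrix (D65 white point).
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def encode(c):
        # Clip out-of-gamut values (an assumption), then apply the sRGB curve.
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(encode(c) for c in linear)
```

With these conventions the D65 white point, roughly (0.9505, 1.0, 1.089), is sent close to (1, 1, 1), while colors outside the RGB gamut are simply clipped.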

2.2 Timbre Space

As a possible representation of timbres, we can consider the space proposed by Grey [13], based on the dissimilarity between pairs of musical instrument sounds.

On the other hand, we have the set of all continuous periodic maps. These maps represent continuous sound waves that can be recovered from their Fourier series according to Fourier’s and Fejér’s theorems [4, Section 2.4]. It is embedded in the space of all continuous maps \(\mathbb {R}\longrightarrow \mathbb {R}\), which has the compact-open topology and is a vector space. Superposition of waves corresponds to addition of the associated periodic functions, although the result need not be periodic. In what follows, we take the topological space of continuous periodic maps as our timbre space, given the structural analogy with the CIE space in the sense that color/wave superpositions correspond to vector sums.

2.3 Maps Between Timbre and Color

According to [5], a possible correspondence between color and sound can be based on the idea that a musical octave should match a color octave. A musical octave is a closed interval of the form [f, 2f], where f is a fixed sound frequency in Hertz. Human vision barely ranges through a color octave, namely the interval of wavelengths in nanometres [380, 760], which corresponds to the interval of spatial frequencies [(1/2)(1/380), 1/380] by means of the assignment \(\lambda \mapsto 1/\lambda \). Thus, the map \(\lambda \mapsto 760f/\lambda \) is a continuous bijection from the color octave [380, 760] to the musical octave [f, 2f]. Note that under this logic, the color order violet-blue-green-yellow-orange-red corresponds to a decreasing pitch frequency.

Since human hearing ranges over frequencies in the Hertz interval [20, 20000], and therefore over several octaves, there is no perfect correspondence between wavelengths and frequencies. This suggests reducing the interval [20, 20000] modulo a chosen octave and then using the previous correspondence. The resulting map is continuous under the assumption that we identify the endpoints of [380, 760].
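The reduction modulo an octave and the inverse of the map \(\lambda \mapsto 760f/\lambda \) can be sketched as follows. The function names are hypothetical, and the choice of the half-open octave [f, 2f) as the set of representatives is an assumption that reflects the identification of the endpoints 380 and 760.

```python
def reduce_to_octave(freq, f):
    """Shift a positive frequency by whole octaves into [f, 2f)."""
    r = freq
    while r >= 2 * f:
        r /= 2
    while r < f:
        r *= 2
    return r


def frequency_to_wavelength(freq, f):
    """Wavelength in nanometres, in (380, 760], matched with freq.

    Inverts the map lambda -> 760 f / lambda on the reduced frequency.
    """
    return 760.0 * f / reduce_to_octave(freq, f)
```

With f = 440, the frequencies 440, 880, and 110 all land on the wavelength 760, since they differ by whole octaves, whereas 660 is sent to about 506.7.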

There are other possibilities for a correspondence between color and sound. Some scholars focus on perceived correspondences of pitch classes with classes of colors [18]. The use of classes can be formalized by means of quotient spaces. Classes take into account perceptive similarities but not perfect one-to-one associations. Other continuous correspondences could associate the transition from violet to red with an increasing pitch frequency.

The following construction is a possible way to get a continuous map from the timbre space to the CIE color space. First, let us consider the case of a timbre given by simple FM synthesis [4, Sect. 8.8], namely a periodic wave corresponding to

$$\begin{aligned} \sin [\omega _c t+I\sin (\omega _m t)], \end{aligned}$$
(1)

where \(\omega _c=2\pi f_c\), \(\omega _m=2\pi f_m\), \(f_c\) is the carrier frequency, \(f_m\) is the modulator frequency, and I is the modulation index. An associated convergent series is

$$\begin{aligned} \sum _{n =-\infty }^{\infty }J_n(I)\sin [(\omega _c+n\omega _m)t], \end{aligned}$$
(2)

where \(J_n\) is the nth Bessel function of the first kind. This series expresses the wave in terms of simple harmonics with frequencies \(f_c+n f_m\) for \(n\in \mathbb {Z}\). By factoring the sign of each negative value of \(f_c + nf_m\) out of \(\sin [2\pi (f_c + nf_m)t]\), using \(\sin (-x)=-\sin (x)\), and reindexing the terms, we obtain:

$$\begin{aligned} \sum _{n=0}^{\infty }a_n\sin (2\pi f_n t). \end{aligned}$$
(3)
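The passage from Eq. (2) to Eq. (3) can be sketched numerically. The code below is an illustration under stated assumptions: the Bessel functions are computed from Bessel's integral \(J_n(x)=\frac{1}{\pi }\int _0^{\pi }\cos (n\tau -x\sin \tau )\,d\tau \) with the trapezoid rule, the series is truncated at a hypothetical bound `n_max`, and the names `bessel_j` and `fm_partials` do not come from the original code.

```python
import math


def bessel_j(n, x, steps=2000):
    """J_n(x) for integer n, via Bessel's integral and the trapezoid rule."""
    def integrand(tau):
        return math.cos(n * tau - x * math.sin(tau))

    h = math.pi / steps
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h / math.pi


def fm_partials(fc, fm, index, n_max=40):
    """Map each non-negative frequency to its amplitude, as in Eq. (3).

    Sidebands fc + n*fm with negative frequency are folded back using
    sin(-x) = -sin(x); a zero frequency contributes nothing and is skipped.
    The truncation at n_max is an assumption of this sketch.
    """
    partials = {}
    for n in range(-n_max, n_max + 1):
        freq = fc + n * fm
        amp = bessel_j(n, index)
        if freq == 0:
            continue
        if freq < 0:
            freq, amp = -freq, -amp
        partials[freq] = partials.get(freq, 0.0) + amp
    return partials
```

For instance, `fm_partials(440, 880, 0.0)` leaves only the carrier at 440 Hz with amplitude close to 1, and increasing the index spreads energy over the other partials.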

Thus, given a continuous map h from frequencies to color wavelengths, we construct (by linearity) the series in the CIE space

$$\begin{aligned} \sum \limits _{n=0}^{\infty }a_n XYZ(h(f_n)), \end{aligned}$$
(4)

where \(XYZ(\lambda )\) gives the CIE coordinates of the wavelength \(\lambda \), whenever the series converges in the CIE space. In general, one could use the Fourier series [30, p. 1019]:

$$\begin{aligned} a_0+\sum \limits _{n=1}^{\infty }a_n \sin (2\pi n f t +\phi _n) \end{aligned}$$
(5)

of the given continuous periodic wave and associate the series (if it converges in the CIE space)

$$\begin{aligned} a_0+\sum \limits _{n=1}^{\infty }a_n XYZ(h(n f)), \end{aligned}$$
(6)

but it is to be determined whether (1) this procedure coincides with that used for FM synthesis and (2) the phase \(\phi _n\) affects the color quality. These are open questions. In Sect. 3.3 we exemplify computationally the procedure for the FM case.
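A finite truncation of the series in Eq. (4) can be sketched as follows. Both the map h and the function XYZ are passed as parameters; `toy_xyz` is a purely hypothetical stand-in made of three Gaussian lobes and is not the CIE data, which would have to be read from the published color matching tables.

```python
import math


def toy_xyz(wavelength_nm):
    """Hypothetical stand-in for XYZ(lambda): three Gaussian lobes, NOT CIE data."""
    def lobe(center, width):
        return math.exp(-0.5 * ((wavelength_nm - center) / width) ** 2)

    return (lobe(600.0, 40.0), lobe(550.0, 40.0), lobe(450.0, 30.0))


def color_of_partials(partials, xyz, h):
    """Finite version of Eq. (4): sum of a_n * XYZ(h(f_n)).

    partials is a list of (amplitude, frequency) pairs, xyz maps a
    wavelength to a coordinate triple, and h maps frequencies to wavelengths.
    """
    total = [0.0, 0.0, 0.0]
    for amp, freq in partials:
        coords = xyz(h(freq))
        for i in range(3):
            total[i] += amp * coords[i]
    return tuple(total)
```

Since the amplitudes can be negative, the result may leave the unit cube; how to bring it back (for example by clipping or rescaling) is a design choice that this sketch leaves open.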

3 Categorical Enrichment

Color and timbre, and their relations, can be recast in a categorical framework, where we emphasize the color and timbre transitions, rather than the objects color and timbre themselves.

Each topological space X (like the CIE and timbre space) has an associated category whose morphisms are invertible, that is, a groupoid. Its objects are the elements of X and its morphisms are homotopy classes of paths in X. The composition \([\tau ]\circ [\sigma ]\) of two classes \([\sigma ]:x\longrightarrow y\) and \([\tau ]:y\longrightarrow z\) is the class of the concatenation \(\sigma \tau \). The identity on x is the class of the associated constant map, and the inverse of the class of a path \(\sigma \) is the class of the path that sends \(t\in [0,1]\) to \(\sigma (1-t)\). This construction can be generalized to yield higher relations between paths as follows.
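At the level of representatives, the groupoid operations can be sketched in Python by treating a path as a function on [0, 1]. The function names are hypothetical, and the halving of the parameter in the concatenation is the usual convention, assumed here.

```python
def concatenate(sigma, tau):
    """Concatenation sigma tau: run sigma on [0, 1/2], then tau on [1/2, 1].

    Assumes sigma(1) == tau(0), so that the result is continuous.
    """
    def path(t):
        return sigma(2 * t) if t <= 0.5 else tau(2 * t - 1)

    return path


def inverse(sigma):
    """Reverse path, sending t to sigma(1 - t)."""
    return lambda t: sigma(1 - t)


def constant(x):
    """Constant path at x, representing the identity on x."""
    return lambda t: x
```

These operations are associative, unital, and invertible only up to homotopy, which is why the groupoid is built from homotopy classes rather than from the paths themselves.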

3.1 Induced Infinity-Groupoids

Let us consider the singular complex \(\text {Sing}(X)\), which is a simplicial set and an \(\infty \)-groupoid, under the definitions in Sect. 6. According to Proposition 1.9 and Remark 1.10 from [14], \(\infty \)-categories have n-morphisms for each \(n\ge 0\) and composition of them, which is associative up to homotopy. Thus, a 1-morphism of \(\text {Sing}(X)\) is a path in X, and a 2-morphism is a homotopy between two paths with the same endpoints. Note that the groupoid of X comes from homotopy classes of 1-morphisms and hence the concatenation of them is associative up to homotopy equivalence. On the other hand, a 2-morphism can be seen as a band of intermediate paths between two given ones that connect the same points. Figure 1 shows examples of 1-morphisms and 2-morphisms in the cases of the CIE and timbre spaces. More generally, we can define n-morphisms of the singular complex, which describe the evolution of a single color (timbre), of a path of colors (timbres), of a homotopy of paths, and so on.

We emphasize the need for higher relations and bands. For example, we can map the transition light blue\(\rightarrow \)dark blue into the transition light green\(\rightarrow \)dark green, creating a band that connects, as different shades, light green with light blue, and dark green with dark blue. If the initial and final points of the band coincide, we can have the situation described in Fig. 1, where the dark blue becomes a light blue through different paths: some paths remain in the blue area, while other ones cross the violet area [27].
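Because the color cube is convex, a band between two paths with common endpoints can always be obtained by straight-line interpolation. The following sketch illustrates this; the coordinates of the two endpoint colors and the shape of the detour are hypothetical values chosen for the example.

```python
def straight_line_homotopy(sigma, tau):
    """Band H(s, t) = (1 - s) * sigma(t) + s * tau(t) between two paths.

    If sigma and tau share their endpoints, every intermediate path
    t -> H(s, t) has the same endpoints, so H represents a 2-morphism.
    """
    def band(s, t):
        return tuple((1 - s) * a + s * b for a, b in zip(sigma(t), tau(t)))

    return band


# Hypothetical coordinates for a dark and a light blue in the unit cube.
DARK, LIGHT = (0.1, 0.1, 0.4), (0.5, 0.6, 0.9)


def direct(t):
    """Straight path from DARK to LIGHT."""
    return tuple((1 - t) * a + t * b for a, b in zip(DARK, LIGHT))


def detour(t):
    """Same endpoints, with a bump in the first coordinate along the way."""
    x, y, z = direct(t)
    return (x + 0.4 * t * (1 - t), y, z)


band = straight_line_homotopy(direct, detour)
```

Here `band(0, t)` follows the direct path, `band(1, t)` follows the detour, and intermediate values of s give the intermediate shades of the band.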

Fig. 1. (a) A 1-morphism in the space of colors, a path between two colors, (b) a 2-morphism in the same space, a band between two color paths, (c) a 1-morphism in the timbre space, and (d) a 2-morphism in the same space.

3.2 Induced Functors

Given two topological spaces X and Y, which can be the timbre and the CIE color space respectively, and a continuous map \(f:X\longrightarrow Y\) there is an induced natural transformation \(F:\text {Sing}(X)\longrightarrow \text {Sing}(Y)\) that sends a singular n-simplex \(\sigma :\varDelta ^n\longrightarrow X\) to \(f\sigma :\varDelta ^n\longrightarrow Y\). According to the definition in Sect. 1.2.7 of [22], which says that a functor between infinity-categories is a natural transformation between the respective simplicial sets, F is a functor from \(\text {Sing}(X)\) to \(\text {Sing}(Y)\).

Note that F coincides with f on objects and sends a 1-morphism \(\sigma \) in X to the path \(f\sigma \) in Y.
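On 1-morphisms, the induced functor is just post-composition with f, which can be sketched as follows. The names `push_forward` and `concatenate` are hypothetical, and the concatenation uses the usual halving of the parameter.

```python
def concatenate(sigma, tau):
    """Concatenation sigma tau of two composable paths on [0, 1]."""
    def path(t):
        return sigma(2 * t) if t <= 0.5 else tau(2 * t - 1)

    return path


def push_forward(f):
    """Action of the induced functor on paths: sigma -> f o sigma."""
    def on_paths(sigma):
        return lambda t: f(sigma(t))

    return on_paths
```

At the level of representatives, post-composition commutes with concatenation on the nose: `push_forward(f)(concatenate(sigma, tau))` and `concatenate(push_forward(f)(sigma), push_forward(f)(tau))` take the same value at every t.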

As any functor between infinity-categories, F preserves the usual categorical structure (up to homotopy), in the sense that

$$\begin{aligned} F([id_x])=[id_{f(x)}] \end{aligned}$$

whenever \(x \in X\) and

$$\begin{aligned} F([\tau ]\circ [\sigma ])=F([\tau ])\circ F([\sigma ]) \end{aligned}$$

whenever \(\sigma :x\longrightarrow y\) and \(\tau :y\longrightarrow z\) are paths in X. More generally, F preserves the compositions of higher morphisms in an appropriate sense, but we omit these technical details. Next, we give a computational sketch of a functor from timbre to color.

3.3 A Computation of Colors from Timbres

As an example of associations between a timbre path and a color path, let us consider the progressive enrichment of a simple 440 Hz sine wave with harmonics, using FM synthesis, and the associated color transition.

More formally, take \(f_c=440\) and \(f_m=2f_c\). By regarding the modulation index I as a parameter in the interval [0, 20], we obtain a continuous path in the timbre space with parametrization (Eq. 1):

$$\begin{aligned} \sin [\omega _c t+I\sin (\omega _m t)] . \end{aligned}$$

The result is a fluctuation in the brilliance of a sort of clarinet sound since only odd harmonics are present. Figure 2 is the corresponding spectrogram of the timbre path.
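The claim about odd harmonics can be checked directly from Eq. (2): with \(f_m=2f_c\), each sideband frequency is

$$\begin{aligned} f_c+nf_m=(2n+1)f_c,\qquad n\in \mathbb {Z}, \end{aligned}$$

so that, after folding the negative frequencies, only the odd multiples \(f_c, 3f_c, 5f_c,\dots \) of the 440 Hz fundamental occur.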

To obtain a color path (Fig. 3) we use the procedure in Sect. 2.3 for each value of I, see Eq. (6). For each new value of the modulation index I, harmonics vary, reaching a new timbre in Fig. 2. For each new value of I, and thus, for each timbre point reached, there is a color point reached in Fig. 3. In fact, each color bar represents a color point in the space of colors. This could mean that we are using the functor induced by any of the continuous maps from timbres to colors (Sect. 2.3), according to Sect. 3.2. In Fig. 3, the color squares correspond to the values n/10 of the modulation index I for integers n from 0 to 200. There, the modulation index increases from left to right and from top to bottom. Then one uses conversion to RGB for screen representation (Sect. 2.1).

The Python code for the FM path and the color path is available at https://github.com/medusamedusa/color_gesture.

The results agree with Caivano’s reflections [5]: the closer the sound to a white noise, the closer the color to white light, with additive color mixing. The inverse choice could associate the richness of harmonics (especially in a low-register orchestral range) with a darker color, more like in painting, with subtractive color mixing. In the first case, primary colors are red, blue, and green, and their sum gives white; in the second case, primary colors are red, yellow, and blue, and their sum gives black. In gestural chromo-similarity (Sect. 4.1), in analogy with painting we may use the second option (subtractive), see an example in [27].

Fig. 2. An example of timbre path. The spectrogram is obtained with SonicVisualiser. The darker the color, the closer the sound to silence.

Fig. 3. Visual color gradient corresponding to the timbre path of Fig. 2. Each color corresponds to a value of the modulation index I.

4 Gestural Considerations

We close this paper with some gestural reflections that may enrich the color and timbre relation theory.

4.1 From Paths to Gestures and Gestural Similarity

Color and timbre paths (or 1-morphisms) are particular cases of gestures [1, 2, 7, 17, 25, 28, 31], which are informally diagrams (shaped by a digraph) of paths in a topological space. Continuous maps induce new ones between respective spaces of gestures, as we explain in Sect. 4.2, so there are correspondences between color gestures and timbre gestures.

We can talk of gestural similarity if musical sequences (auditory domain) and simple sketches (visual domain) appear as being produced by the same generator gesture [24]. This possible definition is supported by the hypothesis of a supramodal brain [36]. Thus, when gestures in the space of colors and gestures in the space of timbres show perceptive analogies, we can talk of chromo-gestural similarity.

4.2 Induced Maps Between Spaces of Gestures

Let \(\varGamma \) be a digraph. A continuous map \(f:X\longrightarrow Y\) induces a new one between topological spaces of \(\varGamma \)-gestures, namely

$$\begin{aligned} \varGamma \pitchfork F:\varGamma \pitchfork S_X\rightarrow \varGamma \pitchfork S_Y: \left( (c_a)_{a\in A},(x_v)_{v\in V}\right) \mapsto \left( (fc_a)_{a\in A},(f(x_v))_{v\in V}\right) , \end{aligned}$$

where \(\varGamma \pitchfork S_X\) and \(\varGamma \pitchfork S_Y\) are the spaces of \(\varGamma \)-gestures in X and Y, respectively.

As an example, the Attack-Decay-Sustain-Release (ADSR) envelope of a sound is a gesture shaped by the digraph \(\bullet \rightarrow \bullet \rightarrow \bullet \rightarrow \bullet \rightarrow \bullet \) in the amplitude-time space. The envelope has a main role in timbre perception. We can transfer the envelope to the color space by regarding it as an intensity gesture of a single color. In fact, this remark may raise new questions regarding color envelopes and transition effects from one color to another.
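The ADSR gesture and its transfer to the color space can be sketched as follows. Everything here is illustrative: the piecewise-linear envelope, the default durations and levels, and the names `adsr_gesture`, `is_gesture`, and `induced` are assumptions of this sketch. Points of the amplitude-time space are written as (time, amplitude) pairs.

```python
# Chain digraph with five vertices and four arrows: arrow -> (tail, head).
ARROWS = {"attack": (0, 1), "decay": (1, 2), "sustain": (2, 3), "release": (3, 4)}


def line(p, q):
    """Straight path from p to q, parametrized by [0, 1]."""
    return lambda t: tuple((1 - t) * a + t * b for a, b in zip(p, q))


def adsr_gesture(attack=0.1, decay=0.2, level=0.6, sustain=0.5, release=0.3):
    """Piecewise-linear ADSR envelope as a pair (curves, points)."""
    times = [0.0, attack, attack + decay, attack + decay + sustain,
             attack + decay + sustain + release]
    amps = [0.0, 1.0, level, level, 0.0]
    points = {v: (times[v], amps[v]) for v in range(5)}
    curves = {a: line(points[tail], points[head])
              for a, (tail, head) in ARROWS.items()}
    return curves, points


def is_gesture(curves, points, tol=1e-12):
    """Check the condition c_a(i) = x_{d_i(a)} for i = 0, 1."""
    for arrow, (tail, head) in ARROWS.items():
        for end, vertex in ((0.0, tail), (1.0, head)):
            pairs = zip(curves[arrow](end), points[vertex])
            if any(abs(a - b) > tol for a, b in pairs):
                return False
    return True


def induced(f, gesture):
    """Induced map on gestures: apply f to every curve and every point."""
    curves, points = gesture
    new_curves = {a: (lambda t, c=c: f(c(t))) for a, c in curves.items()}
    new_points = {v: f(x) for v, x in points.items()}
    return new_curves, new_points
```

As a usage example, taking `f = lambda p: tuple(p[1] * c for c in (0.2, 0.4, 0.8))` turns the envelope into an intensity gesture of the single color (0.2, 0.4, 0.8), and `is_gesture(*induced(f, adsr_gesture()))` confirms that the result is again a gesture.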

Color and timbre ramifications, which are gestures, are interesting objects to study and apply to composition. Shaping the orchestral colors, in particular, is a distinctive mark of a composer’s style, of a genre, of an epoch. Thus, the proposed ideas can be developed in terms of machine learning as exploited in music information retrieval. Vice versa, a creative interface may be developed starting from the proposed theoretic tools.

Note that the objects and 1-morphisms of the \(\infty \)-groupoid \(\text {Sing}(\varGamma \pitchfork S_X)\) are \(\varGamma \)-gestures in X and paths between gestures, respectively. This groupoid allows one to generalize the idea of timbre paths to transformations between timbres with different ADSR envelopes (loudness profile over time). We may, for example, keep the timbre of a musical instrument while changing its envelope, or keep the envelope and change the timbre, thus performing separate transformations of the envelope and timbre in terms of spectral superposition. As a final abstraction, \(\varGamma \pitchfork F\) induces a functor between \(\infty \)-groupoids \(\text {Sing}(\varGamma \pitchfork S_X)\longrightarrow \text {Sing}(\varGamma \pitchfork S_Y)\), which would help transfer envelope transitions between the color and timbre domains.

5 Conclusion

The proposed categorical framework could be a way to understand the relation between color and timbre, complementing classical approaches from physics. This framework is based on structural analogies between the perceptual domains of hearing and vision. It is interesting to ask to what extent categorical models could be independent from perception and classical models, taking into account the computational advantages of the latter.

We also proposed a gestural extension of the categorical framework to capture gestural similarities between the visual and auditory domains.

As a possible alternative structure to look at, we could consider the Moore paths as 1-cells, in order to have a strictly associative composition, taking homotopies of homotopies for the 2-cells [12]. Given that we are interested in invertible arrows, another suitable structure appears to be the bigroupoid [15], that is, a weakly-invertible bicategory. Concerning the spaces, we could also consider the Euclidean space of colors (as RGB) and the Euclidean space of timbres as defined by Grey [13]. In a (bi)groupoid, all arrows are invertible. In this way, the points (single colors, single timbres) are 0-cells; the color gestures and timbre gestures are the paths, the 1-cells; the bands (hypergestures in the sense of [31]) are the 2-cells. Path associativity is verified for equivalence classes of homotopies. The model of bigroupoid for color and timbre gestures is discussed in detail in [27].

However, we stress the fact that \(\infty \)-categories simplify the involved axioms and computations in higher category theory, 2- and bi-categories included.

This research could lead to practical implementations in signal processing, and it could provide a theoretical framework to analyze experiments in the domain of musical timbre. On the creative side, other possible directions may involve the development of interfaces for composers to manipulate timbres through symbols and/or color references, and for visual artists to do the inverse.

The possibility of translating structures from one domain to another one, provided that some cognitive conditions are verified [29], can open scenarios also for disability studies, where people with visual impairment can benefit from auditory-accessible interfaces, and people with auditory impairment can benefit from visually-accessible interfaces [5, pp. 128–129]. The reference to gesture and touch regarding intensity, organization, and time distribution of stimuli can inspire even more audacious applications for touch-based interfaces for deaf-blind people.

Thus, a simple question such as “can we join timbres and colors?” can open the way to striking applications to improve people’s lives.

6 Glossary

Bicategory. In a bicategory, the composition of morphisms need not be strictly associative, but only associative up to an isomorphism. This notion was introduced by Bénabou in 1967 [3]. The objects are the 0-cells, the morphisms are the 1-cells, and the morphisms between morphisms are the 2-cells.

Bigroupoid. A bigroupoid is a bicategory whose “2-cells are strictly invertible, and the 1-cells are invertible up to coherent isomorphism” [15].

Compact-Open Topology. The subbasic opens of the compact-open topology on the space of continuous maps \(\mathbb {R}^{\mathbb {R}}\) are those of the form

$$\begin{aligned} \{f:\mathbb {R}\longrightarrow \mathbb {R}\text { continuous }\ |\ f(K)\subseteq U\}, \end{aligned}$$

where K is compact (closed and bounded) in \(\mathbb {R}\) and U is open in \(\mathbb {R}\). This makes \(\mathbb {R}^{\mathbb {R}}\) an exponential in the category of topological spaces. The fact that Top is not Cartesian closed does not imply the non-existence of \(\mathbb {R}^\mathbb {R}\).

Simplicial Category. Denote by [n] the ordered set (ordinal) \(\{0,1,\dots ,n\}\) for \(n\in \mathbb {N}\). The simplicial category \(\varDelta \) has as objects all [n] for \(n\in \mathbb {N}\) and as morphisms all order-preserving maps between them.

Standard Simplex (Functor). For each \(n\in \mathbb {N}\), we define the standard n-simplex \(\varDelta ^n\) as the set

$$\begin{aligned} \{(t_1,\dots ,t_n)\ | \ 0\le t_1\le \dots \le t_n\le 1\}. \end{aligned}$$

The standard n-simplex is a subspace of \(\mathbb {R}^n\) and this construction defines a standard simplex functor \(\varDelta ^{(-)}\) from the simplicial category to the category of topological spaces, which sends an order-preserving map \(\alpha :[n]\longrightarrow [m]\) to the appropriate continuous map \(\varDelta ^{\alpha }:\varDelta ^n\longrightarrow \varDelta ^m\) sending the ith vertex (with \(n-i\) zeros) to the \(\alpha (i)\)th one. Examples: \(\varDelta ^{0}\) is a singleton, \(\varDelta ^{1}\) is the interval [0, 1] in \(\mathbb {R}\); \(\varDelta ^{2}\) is the triangle with vertices (0, 0), (0, 1), and (1, 1) in \(\mathbb {R}^2\); and \(\varDelta ^{3}\) is the tetrahedron with vertices (0, 0, 0), (0, 0, 1), (0, 1, 1), and (1, 1, 1) in \(\mathbb {R}^3\).
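The ordered-coordinate description of the standard simplex and the action of the functor on an order-preserving map can be sketched as follows. The function names are hypothetical, and the map \(\varDelta ^{\alpha }\) is taken to be the affine extension of the assignment on vertices, which is an assumption about what "appropriate" means above.

```python
def vertex(n, i):
    """The ith vertex of the standard n-simplex: n - i zeros, then i ones."""
    return tuple([0] * (n - i) + [1] * i)


def barycentric(point):
    """Weights (b_0, ..., b_n) with point = sum of b_i * vertex(n, i).

    For 0 <= t_1 <= ... <= t_n <= 1 one gets b_n = t_1,
    b_(n-j+1) = t_j - t_(j-1), and b_0 = 1 - t_n.
    """
    n = len(point)
    padded = [0] + list(point) + [1]
    diffs = [padded[j] - padded[j - 1] for j in range(1, n + 2)]
    return [diffs[n - i] for i in range(n + 1)]


def simplex_map(alpha, m):
    """Affine map induced by an order-preserving alpha: [n] -> [m].

    alpha is given as a list with alpha[i] in {0, ..., m}.
    """
    n = len(alpha) - 1

    def mapped(point):
        weights = barycentric(point)
        return tuple(
            sum(weights[i] * vertex(m, alpha[i])[k] for i in range(n + 1))
            for k in range(m)
        )

    return mapped
```

For example, `[vertex(2, i) for i in range(3)]` lists the vertices (0, 0), (0, 1), and (1, 1) of the triangle, and `simplex_map([0, 2], 2)` embeds the interval as the edge joining (0, 0) and (1, 1).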

Simplicial Set. Functor from the opposite \(\varDelta ^{op}\) of the simplicial category to the category \(\mathbf {Set}\) of sets. Example: The singular complex \(\text {Sing}(X)\) of a topological space X.

Singular Complex. The singular complex of a topological space X, denoted by \(\text {Sing}(X)\), is the simplicial set \(\mathbf {Top}(\varDelta ^{(-)},X)\), where \(\varDelta ^{(-)}\) is the standard simplex functor. Examples: a 0-simplex of \(\text {Sing}(X)\) is a point of X, a 1-simplex of \(\text {Sing}(X)\) is a path in X.

Infinity-Category. A simplicial set S is an \(\infty \)-category if, given \(n\in \mathbb {N}\) and k with \(0<k< n\), for each subset \(\{a_i\ | \ 0\le i\le n;\ i\ne k\}\) of \(S([n-1])\) satisfying the identities

$$\begin{aligned} d_i(a_j)=d_{j-1}(a_i)\ \ (i<j;\ i,j\ne k), \end{aligned}$$

there is an element \(a\in S([n])\) such that \(d_i(a)=a_i\) for \(i\ne k\). If this property also holds for \(k=0\) and \(k=n\), then we say that S is an \(\infty \)-groupoid. Example: The singular complex of a topological space is an \(\infty \)-groupoid.
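As an illustration of the filling condition in the lowest inner case, assuming the usual convention that \(d_i\) omits the ith vertex, take \(n=2\) and \(k=1\). The data are two 1-simplices \(a_2:x\rightarrow y\) and \(a_0:y\rightarrow z\), and the only identity to check is

$$\begin{aligned} d_0(a_2)=d_1(a_0)=y. \end{aligned}$$

A filler \(a\in S([2])\) with \(d_0(a)=a_0\) and \(d_2(a)=a_2\) then provides a third edge \(d_1(a):x\rightarrow z\), which plays the role of a composite of \(a_2\) followed by \(a_0\). For \(S=\text {Sing}(X)\), the concatenation of the two paths is one such composite, and any other one is homotopic to it relative to the endpoints.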

Topological Space of Gestures. Let \(\varGamma \) be a digraph \((A,V,d_0,d_1)\) and X a topological space. The space of \(\varGamma \)-gestures in X, denoted by \(\varGamma \pitchfork S_X\) (where \(\pitchfork \) stands for transversality), is the subspace of the product space (compact-open topology on \(X^I\))

$$\begin{aligned} \left( X^I\right) ^A\times X^V \end{aligned}$$

consisting of all sequences \(\left( (c_a)_{a\in A},(x_v)_{v\in V}\right) \) such that \(c_a(i)=x_{d_i(a)}\) for \(i=0,1\). We say that such a sequence is a \(\varGamma \)-gesture in X.