1 Introduction

Analysts such as Marc Andreessen claim that “software is eating the world”, stressing the growing weight of software-centered models in the economy and the transition of traditional businesses into software-based organizations [7]. This trend is permeating all areas of IT (Information Technologies), including the multimedia industry. In the last few years, we have witnessed how multimedia technologies have been evolving toward software-centered paradigms, embracing cloud concepts through different types of XaaS (Everything as a Service) models [17].

More recently, another turn of the screw is taking place thanks to the emergence and popularization of APIs (Application Programming Interfaces). This is perfectly summarized by Steven Willmott with his claim that “software is eating the world and APIs are eating software” [81]. Software developers worldwide are growing accustomed to creating their applications as a composition of capabilities exposed through different APIs. These APIs are typically accessible through SDKs (Software Development Kits) and expose, in an abstract way, all kinds of capabilities including device hardware, owned resources and remote third-party infrastructures. This model, applied to cloud concepts, is quite convenient for individual developers and small companies, which now have the opportunity to compete with large market stakeholders without huge investments and without needing to acquire hardware infrastructure or software licenses. Thanks to this, in the last few years we have been experiencing an explosion of innovation, with thousands of new applications and services for both WWW and smartphone platforms being catalyzed by the rich and wide ecosystems of APIs made available to developers.

This trend towards “APIfication” is also reaching the multimedia arena and, very particularly, the RTC (Real-Time multimedia Communications) area. Initiatives such as WebRTC [44] are bringing audiovisual RTC in a standard and universal way to WWW users. The main difference between WebRTC and other popular video-conferencing applications is that WebRTC is not a service but a set of APIs enabling WWW developers to create their customized applications using standard WWW development techniques.

WebRTC belongs to the HTML5 ecosystem and has awakened significant interest among the most important Internet and telecommunication companies. Unlike previous proprietary WWW multimedia technologies, it has been conceived to be open in a broad sense, both by being based on open standards and by providing open source software implementations. Currently, a huge standardization effort on WebRTC protocols is taking place at different IETF working groups (WGs), the RTCWeb WG being the most remarkable one [41]. In turn, WebRTC APIs are being defined and consolidated at the W3C WebRTC WG [82]. WebRTC standards are still maturing and may take some time to consolidate. In spite of this, most major browsers in the market already support WebRTC, which is currently available in billions of devices providing interoperable multimedia communications.

Hence, WebRTC is an opportunity for the creation of a next generation of disruptive and innovative multimedia services catalyzed worldwide through those emerging APIs. However, to reach this goal, the WebRTC ecosystem needs to evolve further. Relying only on WebRTC browser capabilities, services can provide just peer-to-peer communications, which restricts use cases to simple person-to-person calls involving a few users. To enhance this model, server-side infrastructures need to be involved. This is not new: as is well known, the traditional WWW architecture is based on a three-tier model [31] involving an application server layer and a service layer, the latter typically reserved for databases. In the same way, rich media applications rely on an equivalent three-tier model where the service layer provides advanced media capabilities. The media component in charge of providing such capabilities is typically called a media server in the jargon.

There is no formal definition of what a media server is, and different authors use the term with different meanings. In this paper, we understand a media server to be simply the server side of a client-server architected media system. We concentrate our attention on RTC media servers, which are specialized in RTC media problems. Commonly, RTC media server capabilities consist of the following [50]:

  • Group communication capabilities: These include mixing and forwarding. This type of media server is called an MCU (Multipoint Control Unit) [69] following the H.323 terminology and usually takes the form of a Media Mixing Mixer or a Selective Forwarding Unit (SFU) [80].

  • Media archiving capabilities: These are related to recording audiovisual streams into structured or unstructured repositories and to the ability to recover them later for visualization.

  • Media bridging capabilities: These refer to attaining interoperability among networks or domains having incompatible media formats or protocols. Transcoders and IMS (IP Multimedia Subsystem) Gateways [34] are among the most popular elements in this area.

Media servers are a critical ingredient for transforming WebRTC into the next wave of multimedia communications, and the availability of mature solutions exposing simple-to-use yet powerful APIs is a necessary requirement in that area. However, most standardization and implementation efforts are still concentrated on the client side, and server-side technologies remain quite fragmented. Although a relevant number of WebRTC media servers are available, they do not provide coherent APIs compatible with WWW development models. Developing solutions with them typically requires expertise in low-level protocols such as SIP [63], XMPP [64] or MGCP [6], with which average WWW developers have no experience. In addition, most state-of-the-art WebRTC media servers provide just the three basic capabilities specified above and are extremely hard to extend with further features. However, nowadays many RTC services involve person-to-machine and machine-to-machine communication models and require richer multimedia processing capabilities such as computer vision, augmented reality, speech analysis and synthesis, etc.

In this paper we propose an evolution of current state-of-the-art RTC media servers by presenting a new type of RTC API for media server control, which has been designed for usability. This API addresses many current state-of-the-art limitations, such as the ones described above, and is aligned with WWW development principles, architectures and methodologies. The contributions of this paper are threefold. First, we introduce the main concepts of the above-mentioned API. Second, we present how developers may leverage it to create applications providing transparent interoperability among heterogeneous formats and protocols through a modular and extensible architecture. Third, we present an evaluation of the proposed API’s usability based on the Cognitive Dimensions of Notations (CDs) [35], a lightweight framework created for describing and analyzing the usability of notational systems such as user interfaces, programming languages and APIs.

The remainder of this paper is organized as follows. Section 2 summarizes RTC media server approaches and APIs available in the literature. Section 3 presents the proposed RTC Media API and illustrates how to create applications with it. Section 4 describes a survey in which our API is evaluated by means of a research questionnaire following the CDs framework. The last section concludes this research with a discussion, the contributions of the study and suggestions for further work.

2 Related work

2.1 RTC media server control APIs

Media server technologies emerged in the 90’s, catalyzed by the popularization of digital video services. Initial media servers were specialized in specific functions such as streaming [48], transcoding [71] and RTC for audio and video conferencing [7]. In this paper we concentrate on the latter category.

The popularization of video and audio conferencing made RTC media servers evolve through different standards. These include H.323 [73], where the media server role is played by elements such as the MCU (Multipoint Control Unit), and the IMS (IP Multimedia Subsystem), where media servers are generically called MRF (Media Resource Function) [46]. These standards were conceived by operators and corporate communications solution vendors, who concentrated on the specificities of their infrastructures and not on the needs of developers. As a consequence, the involved media control interfaces were designed around low-level protocols and not around high-level, developer-friendly APIs. Among such protocols we can find the IETF MGCP [6], which later evolved into the ITU-T H.248 [72] recommendation. These are based on binary formats, which are hard to understand, implement, debug and extend. Probably due to this, these protocols did not have much impact outside telecommunication providers.

More recently, the commoditization of RTC media server technologies brought increasing interest in more flexible mechanisms for media control. Several IETF WGs emerged with the objective of democratizing them among common developers. As a result, further protocols such as MSCML [75] and MSML [65] emerged, providing the ability to control media server resources through technologies familiar and understandable to average developers, such as XML [16].

Although these protocols are simpler to understand and integrate, developing applications on top of them is still a cumbersome, complex and error-prone process. Due to this, many stakeholders noticed that the natural tools used by developers are not protocols but APIs and SDKs. Hence, a number of initiatives emerged trying to transform the protocol-based development methodology into an API-based development experience, providing seamless media server control through interfaces adapted to programming language specificities rather than to infrastructure characteristics. In particular, the Java platform was one of the first to integrate this philosophy by trying to reproduce the WWW development experience and methodology for the creation of RTC media-enabled applications. A relevant activity in this area is JAIN (Java API for Integrated Networks), which issued several APIs for the signaling, control and orchestration of media capabilities. These include the JAIN SIP API [53], the JAIN SLEE API [29] and the JAIN MEGACO API [9], the latter being specifically devoted to controlling media servers through the H.248 protocol. JAIN APIs did not permeate much beyond operators, but their ideas inspired more popular developments such as the SIP Servlet API [47] for the signaling plane and the Media Server Control API (aka JSR 309) [27] for the media plane, which have been more widely used for the development of RTC solutions for voice and video.

Among all these APIs, this paper is especially interested in JSR 309. JSR 309 concepts were quite revolutionary at the time because the API tried to fully abstract the low-level media server control protocols and media format details. The objective was to enable developers to concentrate on application logic. JSR 309 defined both a programming model and an object model for media server control through a northbound interface, independent of media server control protocols and hence not requiring any specific southbound protocol driver. JSR 309 does not make any kind of assumption about the signaling protocol or the call flow, which are left to the application logic.

From a developer’s perspective, probably the most innovative concept of JSR 309 was the introduction of a mechanism for defining the media processing logic in terms of a topology. This mechanism is based on an interface called Joinable. In JSR 309, all objects having the ability to manipulate media (e.g. send, receive, process, archive, etc.) implement this interface, whose join method enables interconnecting such objects following arbitrary dynamic topologies. Hence, a specific media processing logic can be implemented by developers just by joining the appropriate objects. As an example, consider an application mixing two RTP (Real-time Transport Protocol) streams and recording the resulting composite into a file. Taking into consideration that in JSR 309 NetworkConnection is the class of objects capable of receiving RTP streams, that MediaMixer is the class of objects with mixing capability and that MediaGroup is the class with the ability to record, the above-mentioned media topology can be achieved just by joining two NetworkConnection instances to a MediaMixer instance which, in turn, is joined to a recording MediaGroup. This approach makes it possible for developers to conceive their media processing logic as graphs of “black-box” joinables, which is a quite modular and intuitive mechanism for working in abstract terms with the complex concepts involved in RTC multimedia applications.
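To make this mechanism concrete, the following minimal sketch builds the topology just described using the standard JSR 309 interfaces (javax.media.mscontrol). It is only an illustration under simplifying assumptions: SDP negotiation, event listeners and error handling are omitted, and the recording URI is a placeholder.

```java
import java.net.URI;
import javax.media.mscontrol.MediaSession;
import javax.media.mscontrol.MsControlException;
import javax.media.mscontrol.MsControlFactory;
import javax.media.mscontrol.Parameters;
import javax.media.mscontrol.join.Joinable.Direction;
import javax.media.mscontrol.mediagroup.MediaGroup;
import javax.media.mscontrol.mixer.MediaMixer;
import javax.media.mscontrol.networkconnection.NetworkConnection;
import javax.media.mscontrol.resource.RTC;

public class MixAndRecord {

  public void buildTopology(MsControlFactory factory) throws MsControlException {
    MediaSession session = factory.createMediaSession();

    // Two RTP-capable endpoints, one per participant.
    NetworkConnection ncA = session.createNetworkConnection(NetworkConnection.BASIC);
    NetworkConnection ncB = session.createNetworkConnection(NetworkConnection.BASIC);

    // The mixer composes the incoming streams into a single one.
    MediaMixer mixer = session.createMediaMixer(MediaMixer.AUDIO);

    // The media group contributes the recording capability.
    MediaGroup recorder = session.createMediaGroup(MediaGroup.PLAYER_RECORDER_SIGNALDETECTOR);

    // Define the processing logic as a graph of Joinables:
    // both connections feed the mixer, and the mixer feeds the recorder.
    ncA.join(Direction.DUPLEX, mixer);
    ncB.join(Direction.DUPLEX, mixer);
    mixer.join(Direction.SEND, recorder);

    // Record the composite stream (destination URI is illustrative).
    recorder.getRecorder().record(URI.create("file:///recordings/composite.wav"),
        RTC.NO_RTC, Parameters.NO_PARAMETER);
  }
}
```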

Another relevant innovation of JSR 309 is the introduction of media events. Thanks to this mechanism, the media processing logic held by a media server can fire events to applications through a publish/subscribe mechanism. This is very convenient for enabling applications to become media-aware, meaning that complex processing algorithms at the media server can provide asynchronous information about things happening inside the media, for instance DTMF (Dual-Tone Multi-Frequency) tones being detected, voice activity being present, and so on.
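The following sketch illustrates this publish/subscribe mechanism with JSR 309’s standard SignalDetector, which notifies the application when DTMF tones are detected inside the stream. Again, this is only an outline: error handling is omitted and the exact detector configuration may vary across drivers.

```java
import javax.media.mscontrol.MediaEventListener;
import javax.media.mscontrol.MsControlException;
import javax.media.mscontrol.Parameters;
import javax.media.mscontrol.mediagroup.MediaGroup;
import javax.media.mscontrol.mediagroup.signals.SignalDetector;
import javax.media.mscontrol.mediagroup.signals.SignalDetectorEvent;
import javax.media.mscontrol.resource.RTC;

public class DtmfSubscriber {

  public void subscribe(MediaGroup mediaGroup) throws MsControlException {
    SignalDetector detector = mediaGroup.getSignalDetector();

    // Subscribe: the media server pushes events asynchronously to the application.
    detector.addListener(new MediaEventListener<SignalDetectorEvent>() {
      @Override
      public void onEvent(SignalDetectorEvent event) {
        if (event.getEventType() == SignalDetectorEvent.SIGNAL_DETECTED) {
          // The application is "media-aware": it reacts to something
          // happening inside the media stream itself.
          System.out.println("DTMF tone detected: " + event.getSignalString());
        }
      }
    });

    // Ask the detector to report incoming tones, one at a time.
    detector.receiveSignals(1, null, RTC.NO_RTC, Parameters.NO_PARAMETER);
  }
}
```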

JSR 309 permeated into mainstream developer audiences as a suitable API for media server control following the typical three-tier model [27]. However, in the last few years, the emergence of novel technologies and computation paradigms has revealed relevant limitations in JSR 309. For example, group videoconferencing services are nowadays evolving from Media Mixing models, which require significant media processing, towards SFU (Selective Forwarding Unit) models, which are based on media routing [80]. JSR 309 is heavily adapted to Media Mixing and, due to this, most of its APIs assume that participants send/receive only one media stream to/from the media server. As a consequence, SFU models do not fit nicely into JSR 309 APIs. This is particularly a problem when all the streams of a group videoconference are multiplexed into a single RTP session, as typically happens on modern WebRTC SFU media servers supporting RTP bundle [43], because JSR 309 APIs do not provide any mechanism for demultiplexing streams from a NetworkConnection. Moreover, the JSR 309 API specification explicitly forbids joining several input NetworkConnections to a single output NetworkConnection, as an SFU router would require. Instead, they need to be joined first to a MediaMixer which, in turn, can be joined to the output NetworkConnection.

When looking at other modern RTC technologies, we notice again that the JSR 309 design has limitations. For example, if we consider the WebRTC W3C APIs [82], we observe that they split endpoint capabilities into different functional blocks, each of which is exposed through an abstract interface (e.g. RtpSender, RtpReceiver, PeerConnection, etc.) However, if we want to expose WebRTC media server capabilities through JSR 309, we need to accept that endpoints can only be represented through the NetworkConnection interface, which is too limited to support rich WebRTC capabilities such as DataChannels [11], Trickle ICE [42], simulcast [79], etc.

JSR 309 also shows drawbacks in relation to extensibility. In JSR 309 it is possible to support new media object types using MediaGroups; however, these new types have to be configured through media-server-specific descriptions encoded as strings, which cannot be validated by the compiler. It is important to note that these new media object types can only be MediaGroups, never NetworkConnections. This is a hard limitation because no network protocol other than RTP (negotiated through SDP) can be incorporated. Ideally, new object types would be created in the same way as the core types, through factory methods in MediaSession (e.g. createNetworkConnection, createMediaGroup, etc.), but this is not possible because MediaSession is an interface defined in the JSR 309 API and hence cannot be modified by the API user.

Further limitations of JSR 309 include:

  • A counter-intuitive asynchronous development model based on an obscure joinInitiate primitive, which is incompatible with modern Java mechanisms for managing asynchrony such as futures, continuations or lambdas (see the sketch after this list). This lack of a clean asynchronous programming model makes JSR 309 difficult to adapt to the reactive programming frameworks and languages in high demand among developers today, such as Node.js or Scala.

  • A complete lack of mechanisms for monitoring and gathering quality stats on media sessions. This is an essential ingredient for production systems.

  • JSR 309 is designed specifically for the Java language. A portable API that can be used in as many languages as possible would be desirable.

  • The API is specifically designed to control media servers for phone communications because it exposes concepts like Dialogs (prompt and record, DTMF, VoiceXML dialogs, etc.). For example, it is mandatory for an implementation to provide a player with the capability to detect DTMF audio signals, but this kind of functionality is not very useful in web applications.
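To make the first of these limitations concrete, the sketch below shows the kind of adapter a developer must write by hand to use JSR 309 joins in a future-based style. The wrapper is our own illustration, not part of JSR 309: the API offers either a blocking join or the listener-based joinInitiate, neither of which composes with futures or lambdas.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import javax.media.mscontrol.MsControlException;
import javax.media.mscontrol.join.Joinable;
import javax.media.mscontrol.join.Joinable.Direction;

public final class Jsr309Futures {

  private Jsr309Futures() {}

  // Hypothetical adapter: wraps the blocking JSR 309 join() into a
  // CompletableFuture so it can be composed in the modern async style.
  public static CompletableFuture<Void> joinAsync(Joinable source, Joinable sink) {
    return CompletableFuture.runAsync(() -> {
      try {
        source.join(Direction.SEND, sink); // blocks until the join completes
      } catch (MsControlException e) {
        throw new CompletionException(e);
      }
    });
  }
}
```

With such a wrapper, topology changes can at least be chained, e.g. joinAsync(ncA, mixer).thenCompose(v -> joinAsync(ncB, mixer)), a composition style that the joinInitiate listener model does not support directly.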

2.2 Foundations of API evaluation and characterization

APIs are critical, non-optional and cross-cutting in the construction of modern software systems [39]. Programming is hard mental work and developers need to deal with large amounts of information to write satisfactory code. In that task, APIs are the most critical ingredient, especially when dealing with distributed systems and enterprise frameworks. For example, recent works [40] show that API misuse is the single most prevalent cause of software defects.

Designing APIs consists of conceiving abstractions through types and interfaces so that they can be consumed seamlessly, efficiently and safely by application developers. This is quite a complex topic about which very little is known and which requires interdisciplinary knowledge combining cognitive psychology and software engineering. However, the responsibility for API design is typically assigned to development team members who often do not have expertise or training in this area and who are typically more concerned with implementation details than with usability.

In spite of the well-known importance of APIs, API design and evaluation have not been mainstream research topics, and only recently has some light been shed on this area. Early attempts at investigating APIs typically followed unstructured and ad-hoc approaches concentrating on the specificities of given technologies. For example, works have been published with guidelines and recommendations for API design in C# [78], Java [15] or C++ [60] and for ad-hoc evaluation of new programming languages [18].

From another perspective, some authors concentrated on specific problems transversal to all APIs, independently of their underlying technologies. Some remarkable efforts in this area made it possible to understand, for instance, that the factory pattern tends to generate usability problems [26] and that there is a systematic set of questions that developers ask when learning new APIs [24]. All these efforts are relevant thanks to the talent of their authors for detecting and isolating common patterns and practices, but they do not make it possible to build a consistent and reusable methodology for the area.

During the last decades, other authors have tried to systematize the problem of API design and usability evaluation from a holistic perspective. Different approaches have been created for this [2, 12, 21, 28]. However, the one that has gained the highest popularity in this area is the Cognitive Dimensions of Notations (CDs) framework [13, 36]. CDs is a framework for describing the usability of notational systems. In this context, a notational system typically consists of a collection of symbols made on some medium which define a behavior (i.e. meaning) through some kind of structured interactions. Examples of notational systems include English text on paper, buttons on a WWW GUI or programming with API calls in an IDE. CDs allow designers of notational systems to evaluate their designs with respect to the impact they have on the users of those designs.

The CDs framework is not an analytic method. Rather, it is a set of discussion tools for use by designers and people evaluating designs, whose main aim is to improve the quality of discussion. CDs emerged because, at the end of the day, API design is more of an engineering craft than a scientific discipline. It is subject to elements of affect, fashion and social acceptance, in addition to technical considerations. For these reasons, we can learn from studies of other design disciplines where the same craft elements apply. For example, a study comparing knitwear designers and helicopter designers [25] observed that designers’ communities develop their own vocabulary for design criteria, created through practice and tradition. The CDs framework aims to provide the same kind of vocabulary for API designers.

As a result, the main objective of CDs is to enable API designers to reason consistently about how well an API supports the intended activities of its users. Simply stated, CDs make it possible to discuss coherently the extent to which an API supports application developers when performing typical activities such as API learning and understanding, application design and creation, application maintenance and evolution, etc. For this, the framework considers a set of dimensions, each of which describes an aspect of API usability. These dimensions constitute a vocabulary of terms that can be used to characterize cognitive artifacts, making it possible to establish comparisons and to discuss and investigate the implications of design decisions on those artifacts. It is important to remark that these dimensions are not good or bad in themselves; they simply describe properties of the system with respect to developers’ activities.

In the context of API evaluation, the CDs framework is a powerful tool because it allows comparing users’ expectations and designers’ views of the APIs with what the system actually provides. For example, early users of the HTML notation probably expected to be able to modify their page headings easily, whereas the language required one action per heading to do so. This signaled an imperfection in the usability of HTML which probably drove the introduction of CSS (Cascading Style Sheets). In CDs terms, such resistance to change is characterized through a dimension called viscosity. Hence, in this case, we would say that CSS decreased the viscosity of HTML. A comprehensive description of the CDs dimensions can be found in Blackwell et al. [13]. For the sake of completeness, here we introduce a brief (and incomplete) description of the 13 main dimensions of the CDs framework:

  • Viscosity: resistance to change.

    A viscous system needs many user actions to accomplish one goal. Changing all headings to upper-case may need one action per heading. (Environments containing suitable abstractions can reduce viscosity.) We distinguish repetition viscosity, many actions of the same type, from knock-on viscosity, where further actions are required to restore consistency.

  • Visibility: ability to view components easily.

    Systems that bury information in encapsulations reduce visibility. Since examples are important for problem-solving, such systems are to be deprecated for exploratory activities; likewise, if consistency of transcription is to be maintained, high visibility may be needed.

  • Premature commitment: constraints on the order of doing things.

    Self-explanatory. Examples: being forced to declare identifiers too soon; choosing a search path down a decision tree; having to select your cutlery before you choose your food.

  • Hidden dependencies: important links between entities are not visible.

    If one entity cites another entity, which in turn cites a third, changing the value of the third entity may have unexpected repercussions. Examples: cells of spreadsheets; style definitions in Word; complex class hierarchies; HTML links. There are sometimes actions that cause dependencies to get frozen, e.g. soft figure numbering can be frozen when changing platforms; these interactions with changes over time are still problematic in the framework.

  • Role-expressiveness: the purpose of an entity is readily inferred.

    Role-expressive notations make it easy to discover why the author has built the structure in a particular way; in other notations each entity looks much the same and discovering their relationships is difficult. Assessing role-expressiveness requires a reasonable conjecture about cognitive representations.

  • Error-proneness: the notation invites mistakes and the system gives little protection.

    Enough is known about the cognitive psychology of slips and errors to predict that certain notations will invite them. Prevention (e.g. check digits, declarations of identifiers, etc.) can redeem the problem.

  • Abstraction: types and availability of abstraction mechanisms.

    Abstractions (redefinitions) change the underlying notation. Macros, data structures, global find-and-replace commands, quick-dial telephone codes, and word-processor styles are all abstractions. Some are persistent, some are transient. Abstractions, if the user is allowed to modify them, always require an abstraction manager (i.e. a redefinition sub-device). It will sometimes have its own notation and environment (e.g. the Word style sheet manager) but not always (for example, a class hierarchy can be built in a conventional text editor). Systems that allow many abstractions are potentially difficult to learn.

  • Closeness of mapping: closeness of representation to domain.

    How closely related is the notation to the result it is describing?

  • Consistency: similar semantics are expressed in similar syntactic forms.

    Users often infer the structure of information artifacts from patterns in notation. If similar information is obscured by presenting it in different ways, usability is compromised.

  • Diffuseness: verbosity of language.

    Some notations can be annoyingly long-winded, or occupy too much valuable “real-estate” within a display area. Big icons and long words reduce the available working area.

  • Hard mental operations: high demand on cognitive resources.

    A notation can make things complex or difficult to work out in your head, by making inordinate demands on working memory, or requiring deeply nested goal structures.

  • Provisionality: degree of commitment to actions or marks.

    Even if there are hard constraints on the order of doing things (premature commitment), it can be useful to make provisional actions such as recording potential design options, sketching, or playing “what-if” games. Not all notational systems allow users to fool around or make sketchy markings.

  • Progressive evaluation: work-to-date can be checked at any time.

    Evaluation is an important part of a design process, and notational systems can facilitate evaluation by allowing users to stop in the middle to check work so far, find out how much progress has been made, or check what stage in the work they are up to. A major advantage of interpreted programming environments such as BASIC is that users can try out partially-completed versions of the product program, perhaps leaving type information or declarations incomplete.

The CDs framework has been criticized for its theoretical and practical limitations. For example, Moody et al. [51] claim that CDs do not provide a scientific basis for several reasons:

  • The dimensions are vaguely defined, which often leads to misinterpretation when applying them.

  • The theoretical and empirical foundations of the dimensions are poorly defined.

  • The dimensions lack clear operationalization (i.e. evaluation procedures and metrics), which means they can only be applied in a subjective manner.

  • It does not support evaluation, as the dimensions simply define properties of notations and are not meant to be either “good” or “bad”.

  • It does not support design: the dimensions are not design guidelines and issues of effectiveness are excluded from its scope.

  • Its level of generality precludes specific predictions, meaning that it is unfalsifiable and hence cannot be considered to provide a scientific basis for evaluating anything.

In spite of these criticisms, most authors accept that, although CDs need further evolution and improvement, they are today the most suitable tool for performing comparative API evaluation, and that their methodological principles allow analyzing real-world development problems in controlled lab studies in quite an efficient and lightweight manner [36]. The main reasons why many authors prefer CDs over other usability techniques include the following:

  • They offer a comprehensive, broad-brush evaluation mechanism which does not suffer the ‘death by details’ symptom of other techniques.

  • They offer a set of discussion tools and a common vocabulary helpful for evaluating designs.

  • They are based on terms that are comprehensible by non-specialists.

  • They are directly applicable, without requiring customizations or reinterpretations, to all types of notations including APIs.

  • Although they are not theoretically complete, they are theoretically coherent, which makes it possible for analysts to generate consistent analyses.

  • They describe a set of necessary, though not sufficient, conditions for usability, which enable deriving usability predictions from the structural properties of a notation, the properties and resources of an environment and the type of activity.

2.3 Quantitative evaluation of API usability

CDs are used by designers to perform quantitative evaluations of API usability. The common practice is to use a questionnaire [14] requesting users to evaluate, through a Likert scale [4], how they experience the CDs dimensions when performing their development activities. There is a broad literature illustrating how to create such reliable questionnaires [58]. When questionnaires target unsupervised and open audiences through the WWW, as is the case in this paper, a critical aspect for attaining reasonable answer rates and acceptable accuracy is simplicity [33]. Without a full and complete understanding of the questions, developers under evaluation might not be willing to provide any information on API usability at all, or might give incomplete or mistaken answers.

As stated above, the CDs framework defines dimensions as a vocabulary that can be used by designers when investigating the cognitive implications of their design decisions, so that designers are able to express any property of their information artifacts as a composition of these basic dimensions. As an analogy, this is somewhat similar to the way vector spaces work: any vector in the space can be expressed as a composition of the base vectors. From this perspective, the base CDs dimensions are designed for independence (i.e. they do not overlap) and not for clarity and simplicity. As a result, questionnaires addressing the complete set of CDs dimensions in the context of all common development activities are too complex, long and impractical for our objectives [14]. Using them might decrease the willingness of the target population to provide answers, as well as the overall usefulness of the resulting research.

Due to this, some authors propose an adaptation of the CDs framework based on transforming the dimensions into another base that is more meaningful for developers and compatible with shorter and simpler questionnaires [57]. These new dimensions are called Clarke’s dimensions and concentrate on five specific aspects of API usability: understandability, abstraction, expressiveness, reusability and learnability. All these high-level dimensions can be expressed in terms of the original CDs dimensions, but their operationalization for quantitative research is more practical for a number of reasons. First, Clarke’s dimensions reflect the user’s perspective (i.e. the developer’s) and not the designer’s, as the plain CDs dimensions do. As a result, they allow optimizing the questionnaire for API users rather than for API designers. Second, Clarke’s dimensions are simpler to understand, as they are fewer and refer to intuitive and positive usability properties (i.e. the higher the evaluation of each dimension, the better the API usability). This is in opposition to the CDs dimensions, which are not associated with a specific notion of goodness. Thanks to this, we are able to scale down the number of questions and to state them in a simpler and more straightforward way. Third, each of Clarke’s dimensions refers to a specific developer activity, which further simplifies the questionnaire and enables a more direct analysis of the results. For illustration, these activities include exploratory learning (i.e. learning how to use the API for creating applications), exploratory design (i.e. the process of using the API for designing and creating applications) and maintenance (i.e. corrective modifications, evolutionary modifications, etc.) Clearly, understandability and learnability are applicable to exploratory learning activities, abstraction and expressiveness to exploratory design activities, and reusability to maintenance activities. Let us explain in detail the meaning and value of each of Clarke’s high-level dimensions (see Table 1 for further details).

Table 1 This table shows the relation of Clarke’s dimensions of API usability to the CDs dimensions and illustrates the meaning of each of these dimensions for developers. As can be seen, Clarke’s dimensions are, in all cases, more intuitive and simpler to understand than the original CDs dimensions

Understandability deals with evaluating the effort required to understand how to use the API to achieve a desired functionality. This dimension encompasses aspects such as whether API names are descriptive and whether the relations among API types and constructs are clear and unambiguous. This relates to the base CDs dimension called closeness of mapping. It also includes the ability of the API to spare developers from managing hidden information not explicitly represented in the API, which is called hidden dependencies in terms of the base CDs dimensions. In addition, the base CDs dimension called hard mental operations also affects understandability. In brief, this dimension addresses how simple it is to access API features through object creation, primitive invocations or other means.

Abstraction, which is itself a base CDs dimension, relates to the ability of the API to guarantee that programmers can use it proficiently without requiring specific knowledge of, or assumptions about, its implementation details. Abstractions should match the conventions and practices of programmers, without being elegantly abstract at the expense of understandability or other practical concerns. Abstraction is typically correlated with the degree of comfort developers feel when using the API. Summarizing with a slogan, this dimension asks whether the API “makes simple things simple, and complex things possible”.

Expressiveness can be seen as the ability to readily infer the purpose of an entity. This is related to the base CDs dimension called role-expressiveness. Expressiveness is also related to how easy it is for programmers to build their code without needing to assume any specific cognitive model about API use. Intuitively, code written using expressive APIs tends to be simpler to read, and transforming requirements into code is typically more efficient with expressive APIs. In terms of base CDs dimensions, these properties are related to visibility and consistency. Moreover, expressive APIs impose constraints neither on the order of creation nor on the definiteness of the components comprising the code, which relates to the CDs dimensions called premature commitment and provisionality. We also consider the base CDs dimension called error-proneness to be part of the expressiveness properties of our API.

Reusability determines whether the client code is maintainable and extensible. In particular, this dimension addresses the typical concern of how hard it is to modify pre-existing code and adapt it to slightly different, extended or more general requirements. The main related base CDs dimension is viscosity, understood as resistance to change, but it also involves other base dimensions such as diffuseness (i.e. the verbosity of the notation).

Learnability addresses the ability of the API learning process to be incremental. Learnable APIs enable developers to understand them gradually without requiring disproportionate initial effort, which is related to the base CDs dimension called progressive evaluation. Learnability also deals with whether performing a certain programming task using the API has a positive impact on performing other related but different tasks. This dimension may overlap somewhat with understandability, but it emphasizes specifically the learning process rather than its practical outcomes.

2.4 Contributions of this paper: the RTC Media API requirements

APIs are always designed to satisfy requirements that are implicitly or explicitly assumed by the designer. The creation of the API proposed in this paper, which we unsurprisingly call the RTC Media API, was also founded on a set of commonly accepted implicit requirements [15] plus a number of explicit ones. Among the former we have simplicity, usability, security, self-documentation and consistency. The latter were identified in the course of several large research projects devoted to RTC media [30, 52] as essential needs that should be provided by any modern RTC media API but that, as discussed above, are not available in any state-of-the-art technology. The creation of an API complying with these requirements, and the validation of its usability properties based on the CDs framework, are the main contributions of this paper. The explicit requirements include the following:

  • Seamless API extensibility through custom modules

    We want developers to be able to plug additional capabilities into the API (e.g. processing algorithms, protocols, etc.) and to consume them as if they were native API capabilities (i.e. without requiring different syntax or language constructs). The mechanism we require for this is based on modules, in the sense that every extension takes the form of a module artifact (e.g. a .jar file in the Java language, a .js file in the JavaScript language, etc.) and that developers may plug in the modules they wish at development time without requiring any further modification or configuration. Note that, for the reasons specified in the sections above, JSR 309 does not comply with this requirement.

  • Adaptation to WWW technologies and methodologies

    This requirement has two aspects. The first, and most important, is the need for our API to be adapted to novel RTC WWW technologies and, very particularly, to WebRTC [44]. The WebRTC architecture, based on heavy use of RTP bundle [43] and RTCP demultiplexing mechanisms [56] and requiring complex ICE [62] management techniques such as Trickle ICE [42], makes this requirement complex to satisfy. As specified in the sections above, JSR 309 is not compatible with this, as its NetworkConnection is based on plain RTP. The second is the need for the API to adapt to the typical WWW three-tier development model [73]. This means that the RTC Media API should be usable by WWW developers with their common development, deployment and debugging techniques and tools. To some extent, this means that the RTC Media API should be perceived by WWW developers as just another of the APIs consumed in the application logic, such as database APIs or ESB (Enterprise Service Bus) APIs.

  • Full abstraction of media details (i.e. codecs and protocols)

    Media representation and transport technologies are complex and require specialized knowledge that is typically not available to common developers. To maximize productivity and minimize development and debugging complexity, the RTC Media API should hide all the low-level details of such technologies behind appropriate abstractions. In doing so, these abstractions must maintain enough expressiveness for the API semantics to give developers the ability to perform the required operations on protocols and formats, including payloading, depayloading, decoding, encoding, re-scaling, etc.

  • Programming language agnostic

    In today’s Internet, developers use a multiplicity of programming languages for creating their applications. In fact, the majority of applications are called “polyglot” because they use different languages. The specific choice depends on factors such as previous experience, personal preferences, the tasks to be accomplished, the target platform or the required scalability. In this context, tying developers to a specific programming language may be perceived as inflexible and unfriendly. For this reason, the RTC Media API needs to be language agnostic and to adapt to the most common programming languages used nowadays. Of course, the specific syntax of the API calls may differ depending on language specificities. However, this requirement indicates that the constructs, basic mechanisms and programming experience need to be the same across different languages. This means, for example, that a developer having the appropriate expertise for creating applications with a Java RTC Media API implementation should be able to do so with a JavaScript implementation, as long as the subtleties of the two languages are known.

  • RTC media topology agnostic

    One of the main objectives of RTC media servers is to provide group communication capabilities to applications. Due to this, any useful RTC media API must consider this as a central aspect of its design by exposing the appropriate constructs for group communications. When looking at how RTC group communications are technically implemented, we can notice that they are based on a set of well-known RTP interconnection topologies [80], among which the most common ones are Media Mixing Mixers (MMM), Media Switching Mixers (MSM) and Selective Forwarding Units (SFU). In short, MMMs are based on the principle of composing a single output media stream out of N input media streams, so that the final composite stream represents the addition of the N input streams. MMMs require decoding the N input streams, generating the composite (e.g. linear addition for audio or a matrix layout for video) and encoding the output stream. Due to the performance cost of these operations, MMMs do not scale nicely. On the other hand, MSMs and SFUs do not perform any heavyweight processing: they just forward and route N incoming streams to M outgoing streams, which is why they have better scalability properties. Their only difference is that MSMs enable the N-to-M mapping to change dynamically, while in SFUs it is static and the only possible operation is switching forwarding on or off for any of the M output streams.

    Understanding the differences among these topologies and their appropriate usage scenarios is a source of extra complexity for application developers. Due to this, we include a requirement for our RTC Media API to manage all the subtleties of this problem so that the most appropriate solution is provided transparently by the API. Note that JSR 309 also tried to comply with this requirement through the “Joinable” mechanism, making it possible for developers to establish topologies just by joining sources with sinks. However, as explained above, both JSR 309 and, equivalently, JSR 79 are only compatible with MMM topologies and cannot manage the nowadays most popular MSM and SFU models.

  • Advanced media QoS information gathering

    QoS is critical in multimedia services. A few milliseconds of latency or jitter can be the difference between successful and unsuccessful applications [77]. For this reason, RTC media developers need appropriate instrumentation mechanisms enabling seamless debugging, monitoring and optimization of applications. This requirement guarantees that RTC Media API developers are able to access advanced QoS metrics of the streams, including relevant information such as packet loss, bandwidth, latency or jitter. Note that none of the above-mentioned RTC media server APIs, including JSR 309, provides this kind of capability.

  • Compatibility with advanced media processing capabilities

    So far, most RTC media technologies and APIs have concentrated on the problem of transport (i.e. taking media information from one place and moving it to other places). This happened because the most prevalent use case for RTC is person-to-person communication, where end-users expect technology to eliminate distance barriers (i.e. to maintain a conversation as if it were face-to-face). However, during the last decade, novel use cases involving person-to-machine and machine-to-machine communications have been gaining popularity in different verticals such as video surveillance, smart cities, smart environments, etc. In all these verticals, going beyond plain transport is a relevant requirement. As an example, the number of low-latency RTC video applications being used in security scenarios is skyrocketing. In all these applications, the ability to integrate Video Content Analysis (VCA) capabilities through different types of computer vision algorithms is an unavoidable requirement [37]. In addition, modern media applications in areas such as gaming or entertainment complement VCA with another trending technology, Augmented Reality (AR), which is also in high demand among users [84]. As a result, we require our RTC Media API to provide full compatibility with these advanced processing techniques, enabling their seamless integration and use.

  • Context awareness

    In RTC media services, as in other types of services, context is becoming a relevant ingredient for providing added value to applications [1]. Context is a somewhat ambiguous concept for which there is not yet a formal definition. However, most authors accept context as any kind of information that can be used to characterize the situation of an entity [22]. The OMA (Open Mobile Alliance) has generated a formal definition of context through the NGSI standard [10] as a set of attributes that can be associated with an entity. When working with RTC media, the entity is most typically an RTC media session (e.g. a media call).

    Considering this context definition, this requirement means that our RTC Media API needs to be capable of consuming context for customizing and adapting the end-user experience but, most importantly, it needs to be capable of extracting context attributes from the media communication itself. In other words, the part of the context dealing with the media itself (i.e. what the media content is and what it represents at any time) needs to be manageable by the proposed API.

  • Adapted to multisensory multimedia

    Traditionally, RTC media has referred to simple audiovisual streams typically comprising one video track and one or two (i.e. stereo) audio tracks. However, modern trends and technologies extend this to a new multisensory notion [55], where multisensory streams may comprise several audio and video tracks (e.g. multi-view and 3D video) but may also integrate additional sensor information beyond cameras and microphones (e.g. thermometers, accelerometers, etc.) [70]. Hence, we establish a requirement for our RTC Media API to be capable of managing such multisensory multimedia in a seamless and natural way.

  • Adaptation to cloud media servers

    Cloud computing is permeating all IT domains, including multimedia, as the de-facto standard for system deployment and management [85]. This trend is also reaching the RTC media server arena, which is why we need to consider it in the definition of our API. Adapting the RTC Media API to cloud environments basically means making it compatible with how a PaaS (Platform as a Service) media server works [76]. In other words, our API needs to be compatible with a new notion of distributed media server which, in opposition to traditional monolithic media servers, is distributed across a cloud environment and can scale elastically to adapt to the load generated by end-users.

3 Description of the proposed API: the RTC Media API

3.1 API specification

3.1.1 MediaObjects: MediaElements and MediaPipelines

Before providing a formal description of the RTC Media API, which can be arid reading, let us introduce some simple initial concepts that may be helpful for understanding the basic mechanisms and philosophy behind our API. The RTC Media API is built on top of an object-oriented model where the root of the inheritance hierarchy is the MediaObject. The MediaObject is only a holder providing utility members (it is abstract and cannot be instantiated). The two main types inheriting from MediaObject are MediaElement and MediaPipeline.

The MediaElement is the main abstraction of the RTC Media API. Intuitively, a MediaElement can be seen as a black box implementing a specific media capability. In general, MediaElements receive media streams through sinks, send media streams through sources and, in the middle, do “something” with the media. There are two main subclasses of MediaElement: Endpoints and Filters. An Endpoint is always a MediaElement with the ability to communicate media with the external world. All media streams coming into an Endpoint sink are sent out of the MediaElement through some kind of external interface (e.g. network interface, file system interface, etc.) In the same way, all media streams received from the external interface are published and made available to other MediaElements through the Endpoint source. Filters, on the other hand, do not communicate media streams with the external world. Their only function is to implement some kind of media processing. This can be simple transport (e.g. a pass-through filter) or may involve complex processing algorithms including computer vision or augmented reality.

MediaElements can be connected to each other by means of a connect primitive. When a MediaElement (let’s call it A) is connected to another MediaElement (say B), the media streams available at A’s source are fed to B’s sink. The connectivity of MediaElements follows quite intuitive and natural rules. First, a MediaElement source can be connected to as many MediaElement sinks as desired (i.e. a MediaElement can provide media to many MediaElements). Second, a MediaElement sink can only receive media from a single connected source. Hence, connecting a source to an already-connected sink first disconnects that sink from its previous source before connecting it to the new one. In this way, application developers create their media processing logic just by connecting media elements following the desired topology.

Another interesting feature of MediaElements is that the connect primitive is overloaded to provide the ability to connect just one of the tracks available in a media stream. The RTC Media API distinguishes three types of tracks: AUDIO, VIDEO and DATA. The first two correspond to the typical audiovisual components of a stream. The latter represents arbitrary sensor data whose semantics are application-dependent. The DATA component makes it possible to integrate any kind of sensor data into media applications. Both full-stream and per-track connections are illustrated in the code sketch further below.

Just for illustration, some examples of MediaElements follow:

  • RtpEndpoint: it represents an Endpoint with the capability of sending and receiving media streams based on standards such as the RTP protocol [68], the AVP and AVPF RTP profiles [54, 67], and the SDP media session negotiation mechanisms [38].

  • WebRtcEndpoint: it represents an Endpoint with the capability of sending and receiving WebRTC streams, complying with the appropriate standards and drafts [5].

  • PlayerEndpoint: it represents an Endpoint with the ability to read streams from different sources, such as a file system, an HTTP resource or an RTSP server [66].

  • RecorderEndpoint: it represents an Endpoint with the ability to store media out of the pipeline, typically in the media server file system or in a media repository through HTTP.

  • FaceOverlayFilter: it consists of a Filter using the Haar [49] computer vision algorithm to detect faces in a stream and overlay on top of them images with customized scales and offsets.

MediaPipelines, in turn, are just containers of MediaElement graphs. A MediaPipeline holds MediaElements that can connect among each other following an arbitrary and dynamic topology. MediaElements owned by one MediaPipeline cannot connect to MediaElements owned by another MediaPipeline. Hence, the MediaPipeline represents an isolated multimedia session from the perspective of the application.
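As an illustration of the connection rules described above, the following minimal sketch uses the Kurento Java client (named in Table 2 below as our reference implementation of the RTC Media API). The media server URL, media URIs and file names are placeholders.

```java
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.MediaType;
import org.kurento.client.PlayerEndpoint;
import org.kurento.client.RecorderEndpoint;
import org.kurento.client.WebRtcEndpoint;

public class ConnectRules {

  public static void main(String[] args) {
    // Connect to the media server and create an isolated pipeline.
    KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
    MediaPipeline pipeline = kurento.createMediaPipeline();

    // Three MediaElements living in the same pipeline.
    PlayerEndpoint player =
        new PlayerEndpoint.Builder(pipeline, "http://example.com/video.mp4").build();
    WebRtcEndpoint webRtc = new WebRtcEndpoint.Builder(pipeline).build();
    RecorderEndpoint recorder =
        new RecorderEndpoint.Builder(pipeline, "file:///tmp/audio-only.webm").build();

    // Rule 1: one source may feed many sinks.
    player.connect(webRtc);                    // full stream (all tracks)
    player.connect(recorder, MediaType.AUDIO); // only the AUDIO track

    // Rule 2: a sink accepts a single source; connecting a new source
    // implicitly disconnects the previous one.
    webRtc.connect(recorder, MediaType.AUDIO); // recorder audio now comes from webRtc

    player.play();
  }
}
```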

To illustrate these concepts, let’s create a simple application. This application performs a full-duplex back-to-back call between two users and records their streams into a repository. Figure 1 shows the corresponding pipeline.

Fig. 1 Architectural diagram of an example application performing a back-to-back call between two users where their corresponding streams are recorded

This pipeline can be implemented in Java with the code shown in Table 2.

Table 2 Code snippet for developing the application specified in Fig. 1 in Java with the Kurento Client API, our reference implementation of the RTC Media API. Media from each WebRtcEndpoint is recorded in the file system of the media server (files videoUserA.webm and videoUserB.webm respectively)
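Since the body of Table 2 is not reproduced here, the following sketch approximates that code with the Kurento Java client. Signaling and SDP negotiation with the browsers, as well as error handling, are omitted, and the recording paths are illustrative (the file names follow the table caption).

```java
import org.kurento.client.KurentoClient;
import org.kurento.client.MediaPipeline;
import org.kurento.client.RecorderEndpoint;
import org.kurento.client.WebRtcEndpoint;

public class BackToBackCallWithRecording {

  public static void main(String[] args) {
    // One isolated pipeline for this call.
    KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
    MediaPipeline pipeline = kurento.createMediaPipeline();

    // One WebRTC endpoint per participant.
    WebRtcEndpoint webRtcA = new WebRtcEndpoint.Builder(pipeline).build();
    WebRtcEndpoint webRtcB = new WebRtcEndpoint.Builder(pipeline).build();

    // One recorder per participant, writing to the media server file system.
    RecorderEndpoint recorderA =
        new RecorderEndpoint.Builder(pipeline, "file:///tmp/videoUserA.webm").build();
    RecorderEndpoint recorderB =
        new RecorderEndpoint.Builder(pipeline, "file:///tmp/videoUserB.webm").build();

    // Back-to-back call: A's media goes to B and vice versa...
    webRtcA.connect(webRtcB);
    webRtcB.connect(webRtcA);

    // ...while each stream is also forked into its recorder.
    webRtcA.connect(recorderA);
    webRtcB.connect(recorderB);

    recorderA.record();
    recorderB.record();
  }
}
```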

3.1.2 RTC Media API IDL specification

One of the main requirements of the RTC Media API is that it should be available in different programming languages. Due to this, RTC Media API capabilities are specified through a language-agnostic IDL (Interface Definition Language). From an implementation perspective, that IDL is later compiled to different programming languages in order to generate the appropriate SDKs. In this way, RTC Media API capabilities are defined only once, but the corresponding implementations can be generated for a variety of languages.

For simplicity, we have decided to base the RTC Media API IDL on JSON notation. An RTC Media API IDL file has four sections: remoteClasses, complexTypes, events and code:

  • The remoteClasses section is used to define the interface to media server objects. We call them “remote” because these objects are remote from the perspective of the API consumer, as they are hosted in the RTC media server. For example, PlayerEndpoint and ImageOverlayFilter are defined in this section of their corresponding IDL files.

  • The complexTypes section is used to define enumerated types and registers used by remote classes or events. For example, the enumerated type MediaType with possible values AUDIO, DATA or VIDEO may be defined in this section.

  • The events section is used to define the events that can be fired when using the RTC Media API. For example, EndOfStream may be defined in the events section of the IDL file describing a PlayerEndpoint, so that the event is fired when the player reaches the end of the stream.

  • The code section is used to define properties to control the code generation phase for different programming languages. For example, in this section we can specify the package name in which all artifacts are generated for the Java language.

The code snippet shown in Table 3 outlines an example of an IDL file. For the sake of simplicity, we have replaced some parts of it with dots (…).

Table 3 Example of an RTC Media API IDL file defining a PlayerEndpoint media element capability
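The content of Table 3 is likewise elided; the fragment below is a hypothetical reconstruction illustrating the four sections just described. The key names packageName and nodeName follow the compiler properties mentioned later in this section, but the overall structure should be taken as an assumption rather than the normative format.

{
  "code": {
    "api": {
      "java": { "packageName": "org.example.rtcmedia" },
      "js": { "nodeName": "rtc-media-elements" }
    }
  },
  "remoteClasses": [
    {
      "name": "PlayerEndpoint",
      "extends": "Endpoint",
      "constructor": {
        "params": [
          { "name": "mediaPipeline", "type": "MediaPipeline" },
          { "name": "uri", "type": "String" },
          { "name": "useEncodedMedia", "type": "boolean", "optional": true }
        ]
      },
      "methods": [ { "name": "play", "params": [] } ],
      "events": [ "EndOfStream" ]
    }
  ],
  "complexTypes": [
    { "name": "MediaType", "typeFormat": "ENUM", "values": [ "AUDIO", "DATA", "VIDEO" ] }
  ],
  "events": [
    { "name": "EndOfStream", "properties": [] }
  ]
}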

As can be observed, to define a remote class in the RTC Media API IDL it is mandatory to assign it a name. In addition, the following fields can be incorporated:

  • Extends: A remote class may extend another remote class. In this case, all properties, methods and events of the superclass are available in objects of the subclass. Note that constructors of the superclass are not inherited. That is, they cannot be used to create objects of the subclass.

  • Constructor: A remote class constructor is defined with a parameter list. Every parameter has a name and a type. The available types are: primitive types (String, boolean, float, double, int and int64), remote classes or complex types. Parameters can be defined as optional.

  • Properties: A property is a value associated with a name. To define a remote class property it is necessary to specify its name and type. Properties can be defined as “read only”.

  • Methods: Methods are named procedures that can be invoked with or without parameters. Every parameter is specified by its name and type. Parameters can be defined as optional. A return type can be specified if the method returns a value.

  • Events: If a remote class declares an event, it means that events of this type can be fired by objects of this remote class. How these events are processed depends on the target programming language.

Remote classes are used mainly to define the MediaElements of the RTC Media API. To define a new MediaElement, the only requirement is to define a new remote class that extends the built-in MediaElement remote class. This superclass defines the properties, methods and events of all MediaElements. The MediaElement class extends the MediaObject class, creating the class hierarchy represented in Fig. 2.

Fig. 2

MediaObject UML (Unified Modeling Language) inheritance diagram as defined in the RTC Media API IDL specification

To define an event, it is mandatory to assign it a name. In addition, an event can have properties. Every property must be defined with a name and a type. In the same way as remote classes, events can also extend a parent event type, inheriting all its properties.

Regarding complex types, they can have two formats: enumerated or register. If a property or parameter is defined with an enumerated complex type, it can only hold a value from the list of specified values. For example, properties based on the enumerated complex type MediaType of Table 3 must have the value AUDIO, DATA or VIDEO. On the other hand, register complex types can hold objects with several properties. For example, the register complex type Fraction has two int properties: numerator and denominator.

To conclude, the code section is used to specify language-dependent configurations for the IDL compiler. Every programming language has its own section to avoid collisions. For example, the Java package name of the generated code only makes sense in Java, while the name of the node module only makes sense in JavaScript.

3.1.3 Compiling the RTC Media API IDL

The IDL format described above makes it possible to define the RTC Media API modules in a language-agnostic way. However, this needs to be translated into programming-language-dependent interfaces in order to obtain the real APIs to be used by application developers. The IDL compiler performs that task. Hence, we need to specify how this compilation happens so that all compiler implementations maintain compatibility in the generated code. For illustration, we have created such a specification as well as the compilers for the two most popular programming languages in the WWW: Java and JavaScript.

The Java IDL compiler works in the following way:

  • Package: all artifacts (i.e. classes, interfaces and enums) are generated in the package specified in the code.api.java.packageName section of the JSON IDL file.

  • Remote classes: For every remote class there are two generated artifacts: an interface and a builder class:

    • Interface: For every remote class a Java interface is generated. This interface has the remote class methods defined in the IDL. In addition, for every property, a getter method is also included. The name of the method is the string “get” followed by the property name. If the property is not read only, a setter method is also generated following the same approach. Finally, for every event declared in the remote class, a method to subscribe listeners to it is generated. For example, the PlayerEndpoint has the event EndOfStream declared in the IDL, so the method String addEndOfStreamListener(Listener<EndOfStream> listener) is generated. The complementary method to remove the subscription is also generated. Listener<E> is a generic interface with only one method: onEvent(E event).

    • Builder class: We use the builder pattern [32] to create new remote class instances. A Builder is generated for each remote class. All mandatory parameters of the remote class constructor are mapped to parameters of the builder class’s single constructor. In this way, the compiler enforces that all mandatory parameters have a value. Optional constructor parameters are generated in the builder class as fluent setter methods (prefixed with “with” instead of “set”, or left unprefixed when the parameter name already starts with “use”). The builder class is generated as an internal type of the above-mentioned interface to easily associate the class and the interface. The code snippet in Table 4 shows the creation of a PlayerEndpoint with the optional constructor parameter useEncodedMedia set to true. A consolidated sketch of these generation rules is provided after this list.

      Table 4 Code snippet showing how to instantiate a PlayerEndpoint in Java
  • Complex types: Depending on the complex type format (enum or register) the code generation is different:

    • Enumerated complex type: A Java enum class is generated.

    • Register complex type: A basic Java bean class is created. For every property, getter and setter methods are generated. In addition, a constructor with all properties as parameters is also generated. The code snippet in Table 5 shows sample code using a register complex type (WindowParam) as a constructor parameter of the PointerDetectorFilter remote class.

      Table 5 Example illustrating how to instantiate a register complex type (WindowParam) as a Java bean
  • Events: For each event defined in an RTC Media API IDL file, a new Java class is generated, with “Event” appended to the name of the class. This class is very similar to the classes generated for register complex types. That is, a getter and a setter method are included for each property. In addition, all event classes extend from the RaiseBaseEvent base class. This base class contains properties for holding the source of the event (source) and the timestamp at which the event was generated (timestamp). The code snippet in Table 6 shows an example illustrating how to work with events.

    Table 6 Example illustrating how to work with events both in Java 7 and Java 8
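To make the Java generation rules above concrete, the following fragment consolidates them in a single hypothetical sketch written against the generated SDK. The package org.example.rtcmedia is the one configured in the code section of the IDL sketch above, and the Listener interface follows the description given earlier in this list; exact names may vary in concrete implementations.

import org.example.rtcmedia.EndOfStreamEvent;
import org.example.rtcmedia.Listener;
import org.example.rtcmedia.MediaPipeline;
import org.example.rtcmedia.PlayerEndpoint;

public class JavaGenerationRulesSketch {
  static void sketch(MediaPipeline pipeline) {
    // Builder pattern (cf. Table 4): mandatory constructor parameters go in
    // the builder constructor; the optional useEncodedMedia parameter keeps
    // its own name as a fluent method because it already starts with "use"
    PlayerEndpoint player = new PlayerEndpoint
        .Builder(pipeline, "http://example.com/clip.webm")
        .useEncodedMedia()
        .build();

    // Event subscription (cf. Table 6): the EndOfStream event declared in
    // the IDL yields an addEndOfStreamListener method and an event class
    // with "Event" appended to its name
    player.addEndOfStreamListener(new Listener<EndOfStreamEvent>() {
      @Override
      public void onEvent(EndOfStreamEvent event) {
        System.out.println("Stream finished at " + event.getTimestamp());
      }
    });

    player.play();
  }
}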

For the JavaScript IDL compiler, equivalent rules have been created:

  • Package: We rely on the NPM (Node Package Manager) [74] JavaScript packaging system. NPM mandates that a package.json file is generated. The following values are used:

    • package name: code.api.js.nodeName

    • package description: code.api.js.npmDescription

  • Remote classes: For every remote class, a new JavaScript prototype-based class is generated. This class has all the methods defined in the IDL file. In addition, for every property, a getter method is generated. Setter methods are also generated for non-read-only properties. All generated methods have the parameters defined in the IDL plus a callback function. That callback parameter is used to implement the asynchronous execution of the method, given that the API primitive may require communicating with the RTC media server and, hence, cannot be synchronous. To create an object from a remote class, a factory method called create, available at the pipeline object, needs to be executed. The first parameter of the method is the name of the remote class to create, as a string. The second is an options bag used when constructor parameters are required. The third, and last, is the async callback that receives the new object handler or an error. The code snippet in Table 7 shows the creation of a PlayerEndpoint with the mandatory parameter uri and the optional constructor parameter useEncodedMedia set to true. As can be observed, media element creation is an async operation.

    Table 7 Code snippet showing how to instantiate a PlayerEndpoint in JavaScript
  • Complex types: For enumerated complex types, there is no code generation: enum values are simply strings. On the other hand, register complex types are generated as JavaScript prototype-based classes. Also, for every register complex type, a factory function is generated to allow the creation of objects. The code snippet in Table 8 shows the creation of a PointerDetectorFilter using a complex type WindowParam as a parameter.

    Table 8 Example illustrating how to instantiate a register complex type (WindowParam) as a JavaScript object
  • Events: No classes are generated for events in JavaScript. When an event is raised, a new object is created and populated with all relevant information as properties. In Table 9, a PlayerEndpoint is created and a listener is registered for its EndOfStream event. When this event is generated, a function is executed with the event as a parameter. This event parameter can be used to obtain the relevant information, such as the timestamp, the source of the event, etc.

    Table 9 Example illustrating how to work with events in JavaScript

3.1.4 Creation and deletion of media capabilities

Java and JavaScript have notable differences in media object creation. This is due to the differences in the type safety of both languages. Java is strongly typed. Hence, it is important that the compiler enforces typing in several contexts: mandatory parameters, optional parameters, media object signatures, etc. In JavaScript, on the other hand, there is no type checking until runtime, which is why we do not enforce any kind of protection.

Releasing media objects is simple. We consider that a media object is released when the release method is invoked. In Java, the release method can be executed in a synchronous way, blocking the invoking thread until a response is received. That response can indicate success or failure; in the latter case, an exception is thrown. In JavaScript, the method is executed asynchronously. For this reason, a callback parameter is necessary so that failures can be notified. Table 10 shows the release of a media object in Java and JavaScript.

Table 10 Code snippets showing how to release a PlayerEndpoint both in Java and JavaScript
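As a minimal sketch of the Java behavior just described (assuming a player object created as in the previous examples):

try {
  player.release(); // blocks until the media server confirms the release
} catch (Exception e) {
  // a failed release is notified through an exception
  System.err.println("Release failed: " + e.getMessage());
}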

3.1.5 Synchronous and asynchronous programming models in the RTC Media API

One of the most critical decisions when designing APIs is how they behave in relation to threads. When performing I/O (Input/Output) operations, there is common agreement that asynchronous APIs are more scalable than synchronous ones [8]. Synchronous I/O typically blocks threads until a response is received or a timeout is reached. Hence, given that there is a practical limit on the number of threads in a system (mainly due to memory constraints), synchronous API models tend to generate thread starvation and decrease performance due to the overload they impose on the operating system task scheduler. To solve this problem, many modern APIs provide asynchronous I/O operations. In this case, the thread executing the I/O is not blocked after the invocation and can be used to execute other tasks. However, asynchronous APIs are more complex to use and are susceptible to a problem called “callback hell” [45]. This is a well-known problem that arises when asynchronous calls are invoked in the callbacks of other asynchronous calls, creating a deep nesting of callbacks.

When we designed the RTC Media API, we decided to give developers the flexibility of choosing between the synchronous and the asynchronous models so that they would not be limited by the drawbacks of either. Due to this decision, our Java IDL compiler generates two methods for each I/O operation: a synchronous and an asynchronous version. Synchronous methods block the calling thread until a response is received, as can be appreciated, for example, in the code snippet shown in Table 4. After that, the execution continues. The asynchronous primitives, in turn, include a continuation as their last parameter, that is, an object that has two methods: onSuccess, which is executed when the response is received, and onError, which is executed when an error or timeout occurs. The code snippet in Table 11 shows an example.

Table 11 Example illustrating the creation of a PlayerEndpoint using the asynchronous Java API
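A sketch of this asynchronous style is shown below. The Continuation interface follows the description above (onSuccess and onError); the buildAsync method name is an assumption for illustration.

new PlayerEndpoint.Builder(pipeline, "http://example.com/clip.webm")
    .buildAsync(new Continuation<PlayerEndpoint>() {
      @Override
      public void onSuccess(PlayerEndpoint player) {
        player.play(); // executed when the media server confirms creation
      }

      @Override
      public void onError(Throwable cause) {
        System.err.println("Creation failed: " + cause.getMessage());
      }
    });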

Things are more complex in JavaScript. Due to the characteristics of the JavaScript language, both in the browser and in Node.js [74], only asynchronous I/O operations are possible. Due to this, and as can be seen in the code snippets shown in Table 12, our IDL compiler includes a callback as the last parameter, which is executed asynchronously when the operation is resolved. However, providing only this mechanism reduces the flexibility developers have to avoid the callback hell. This is why we designed additional mechanisms for simplifying developers’ work. The first one is based on Promises [45]. A Promise represents an operation that has not completed yet, but is expected to do so in the future. Hence, an asynchronous method can return a promise object instead of expecting a callback as its last parameter. The developer specifies the code to be executed when the promise is fulfilled by calling a method named “then” with the callback as a parameter. Table 12 shows JavaScript code creating a player and invoking the play method on it, comparing the traditional implementation based on callbacks with an implementation using promises.

Table 12 Different forms of PlayerEndpoint creation in JavaScript (with callbacks, with promises, and with ES6 arrow functions)

Moreover, if promises are combined with generators [59], a new ES6 (ECMAScript 6) feature, asynchronous code can look like synchronous code. For illustration, note that the code snippet in Table 13 implements the same logic as the one in Table 12, but using generators. As can be observed, the improvement in code readability is noticeable.

Table 13 Creation of a PlayerEndpoint using generators in JavaScript. As can be observed, the code readability improves significantly and the callback hell is fully avoided

The next version of JavaScript, ES7, which is still under standardization, includes a proposal to simplify this further: the async/await keywords, which mark a call as asynchronous while allowing a synchronous-looking syntax. Using them, the code in Table 13 can be written as shown in Table 14. As can be observed, the yield keyword is replaced by await and the co function is no longer necessary.

Table 14 Creation of a PlayerEndpoint in ES7

3.1.6 RTC Media API capabilities

Once we have presented the formal aspects of the RTC Media API, we can switch to a more practical perspective and introduce its media capabilities. These capabilities comprise the specific media objects that are made available to application developers to create their RTC-media-enabled applications following the above-described API guidelines. These capabilities can be grouped into two main categories: media elements, which inherit from the MediaElement class and manage a single media stream, and hubs, which inherit from the Hub class and have been specifically designed for the management of groups of streams.

As specified in sections above, media elements have two flavors: Endpoints and Filters. Endpoints are in charge of the I/O media operations in the media pipeline. Figure 3 shows the RTC Media API endpoint inheritance hierarchy, which comprises the following capabilities:

Fig. 3

UML class diagram of the Endpoints specified by the RTC Media API

  • The WebRtcEndpoint is an I/O endpoint that provides full-duplex WebRTC media communications compatible with the corresponding protocol standards [5]. It is important to remark that, among the WebRtcEndpoint capabilities, the RTC Media API defines DataChannel support as mandatory. DataChannels are a mechanism for exchanging media information beyond audio and video, given their ability to accommodate arbitrary sensor data. This data is transported in the same ICE connection as the audio and the video and, hence, may maintain synchronization with them.

  • The RtpEndpoint is equivalent, but uses the plain RTP protocol.

  • The HttpPostEndpoint is an input-only endpoint that accepts media using HTTP POST requests. This capability needs to support HTTP multipart and chunked encodings, so that it is compatible with the HTTP file upload function exposed by WWW browsers. This endpoint must support the MP4 and WebM media formats.

  • The PlayerEndpoint is an input-only endpoint that retrieves content from the local file system, HTTP URLs or RTSP URLs and injects it into the media pipeline. This endpoint must support the MP4 and WebM media formats for all input mechanisms as well as RTP/AVP/H.264 for RTSP streams.

  • The RecorderEndpoint is an output-only endpoint that provides the ability to store content in reliable mode (i.e. without discarding data). This endpoint may write media streams to the local file system, or to HTTP URLs using POST messages. This endpoint must support the MP4 and WebM media formats.

Filters, in turn, are used for processing media streams. Filters are useful for integrating different types of capabilities such as Video Content Analysis (VCA), Augmented Reality (AR) or custom media adaptation mechanisms. The RTC Media API does not specify any kind of mandatory filter: it is left to API implementers to define their filters following the RTC Media API extensibility mechanisms.

To conclude, hubs follow the inheritance scheme depicted in Fig. 4. Hubs work in coordination with HubPorts: a special type of media element that provides sinks and sources to hubs. The only hub type the RTC Media API defines as mandatory is the Composite, which implements an MMM media topology, as described in previous sections. Developing with Composites is simple as long as the following rules are taken into account (a usage sketch is provided after the rules).

Fig. 4

UML class diagram of main Hub types in the RTC Media API

  • Composites, as all hubs, act as factories of HubPorts. This means that on a Composite instance we can create as many HubPorts as we want. These HubPorts are media elements having sources and sinks, which makes it possible to connect other media elements to them and get media into and out of the hub.

  • A Composite mixes all the streams received at its HubPorts’ sinks and exposes the resulting mixed stream at their sources. The audio of the mixed stream obtained at a HubPort’s source includes all the inputs except the one from its own HubPort’s sink. The video, on the other hand, combines all HubPorts’ sinks into the resulting composite matrix.
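Following these rules, a group session can be sketched in Java as shown below (names follow the Kurento Client API; the participants collection is assumed to contain one WebRtcEndpoint per user):

Composite composite = new Composite.Builder(pipeline).build();

for (WebRtcEndpoint participant : participants) {
  // Each participant gets its own HubPort created from the Composite
  HubPort port = new HubPort.Builder(composite).build();
  participant.connect(port); // participant stream into the mixer
  port.connect(participant); // mixed stream back to the participant
}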

3.1.7 Extending the API

One of the main requirements of the RTC Media API is extensibility: developers should be able to include new MediaElements into the API so that they maintain compatibility with other MediaElements defined natively by the API or by third parties. In order to support extensibility, we have created the notion of the RTC Media Module. A module is a bundle composed of:

  • A module definition: the MediaElement interfaces and related types defined in the RTC Media API IDL.

  • The corresponding software libraries: the specific language-dependent SDK enabling developers to use the module in their software projects.

According to this, imagine that you have created a new capability in your RTC media server involving some kind of computer vision algorithm that processes a video stream and marks some relevant regions on it. The details about how this is implemented are out of the scope of this paper. The point is that, to expose this new feature through the RTC Media API, the best choice is to create a Filter. Without loss of generality, imagine we call it CompuVisionFilter. This filter interface needs to be defined in an RTC Media Module Definition file, which contains the RTC Media API IDL. If we suppose that the filter requires an int parameter upon construction for tuning the behaviour of the algorithm, and that it has a method that can be invoked at any time to enable or disable the processing (the “enable” method), the resulting module definition is the one shown in Table 15.

Table 15 Example of module definition for a new Filter called CompuVisionFilter
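The content of Table 15 is not reproduced here; a hypothetical module definition matching the description above could look as follows. The parameter names (sensitivity, enabled) are assumptions for illustration.

{
  "remoteClasses": [
    {
      "name": "CompuVisionFilter",
      "extends": "Filter",
      "constructor": {
        "params": [
          { "name": "mediaPipeline", "type": "MediaPipeline" },
          { "name": "sensitivity", "type": "int" }
        ]
      },
      "methods": [
        {
          "name": "enable",
          "params": [ { "name": "enabled", "type": "boolean" } ]
        }
      ]
    }
  ]
}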

From this file, the Java IDL compiler should be able to generate the CompuVisionFilter SDK library. Once generated, the filter can be incorporated into any pipeline and interoperate with the rest of the RTC Media API capabilities. Table 16 shows a code snippet illustrating how to create an application that processes a media clip with the filter and exposes the resulting stream in real time to a WebRTC-capable browser. Remark that the example requires the filter to interoperate with built-in RTC Media API capabilities such as the PlayerEndpoint or the WebRtcEndpoint.

Table 16 Using the filter CompuVisionFilter previously defined
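A sketch of such an application is shown below, combining the hypothetical CompuVisionFilter with built-in capabilities. The kurento variable is assumed to be a KurentoClient as in the earlier examples, and SDP negotiation with the browser is omitted.

MediaPipeline pipeline = kurento.createMediaPipeline();

PlayerEndpoint player =
    new PlayerEndpoint.Builder(pipeline, "http://example.com/clip.mp4").build();
CompuVisionFilter filter =
    new CompuVisionFilter.Builder(pipeline, 5).build(); // 5: algorithm tuning parameter
WebRtcEndpoint webRtc = new WebRtcEndpoint.Builder(pipeline).build();

// Pipeline topology: player -> filter -> browser
player.connect(filter);
filter.connect(webRtc);
filter.enable(true);
player.play();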

3.1.8 Implementing the RTC Media API: the Kurento Client API

In order to implement the RTC Media API and make it expose useful capabilities to developers, we just need two ingredients. The first is an RTC media server. This media server needs to expose at its northbound some kind of control interface or protocol enabling the management of RTC media capabilities in a way compatible with the semantic requirements of the RTC Media API. The details about how to create such an RTC media server and control protocol are out of the scope of this paper. The second is to implement an RTC Media IDL compiler suitable for translating the RTC Media IDL into the corresponding programming-language-dependent SDKs. Remark that this compiler is not protocol agnostic, in the sense that it needs to translate the RTC Media API invocations into the appropriate messages of the RTC media server control protocol. In other words, each specific media server control protocol needs its own custom IDL compiler.

In the context of the Kurento open source software project (http://www.kurento.org), we have created an example implementation of the RTC Media API. This implementation follows the architectural scheme depicted in Fig. 5. As can be observed, our implementation provides the two above-mentioned ingredients. The Kurento Media Server plays the role of the RTC media server. Observe that the Kurento Media Server exposes its capabilities through a JSON-RPC over WebSocket control protocol called the Kurento Control Protocol. This protocol has been designed to be compatible with the RTC Media API semantics. In addition, we have created an RTC Media IDL compiler capable of translating the IDL specifications into the appropriate API implementations both in Java and JavaScript. The resulting programming-language-dependent SDKs are called Kurento Client APIs in the Kurento jargon. Remark that the Kurento Client API is just a specific implementation of the RTC Media API, suitable for interoperating with the Kurento Media Server.
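For illustration, a Kurento Control Protocol request invoking the play operation on a previously created PlayerEndpoint can look like the following sketch. The message shape follows the JSON-RPC 2.0 conventions of the protocol; the object and session identifiers are made up.

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "invoke",
  "params": {
    "object": "1a2b3c4d/PlayerEndpoint_1",
    "operation": "play",
    "sessionId": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
  }
}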

Fig. 5

Architecture of a Kurento application. As can be observed, the Kurento IDL compiler generates the Java and JavaScript Kurento Client SDKs from the Kurento API IDL following the RTC Media API specifications described in this paper. Using these SDKs, developers can create their applications following the traditional three-tiered WWW architecture, just using the Kurento Client API as any other of their service APIs. The Kurento Client API implementation provides semantics to the API invocations by issuing the appropriate Kurento Control Protocol messages

Kurento is a complex technological stack and the interested reader can check the community documentation (http://www.kurento.org/documentation) and source code repositories (https://github.com/kurento) for full information about the project. For the objectives of this paper, the interesting aspects are twofold. First, the Kurento Client API provides a full implementation of the RTC Media API as described in this paper. This implementation is specified through the RTC Media IDL and stored in files with the .kmd.json extension (KMD for Kurento Module Description). Second, the Kurento project provides a number of extensions to the RTC Media API in the form of custom filters, hubs and endpoints. These extensions have been created following the RTC Media Module mechanism described above. Just for illustration, some of these extensions are presented here:

  • The ZBarFilter detects QR and bar codes in video streams. When a code is found, the filter publishes a CodeFoundEvent. Application developers can add a listener to this event to execute some logic.

  • The ImageOverlayFilter inserts still images into the video stream. The filter makes it possible to select the position, scaling and rotation coordinates of the image.

  • The FaceOverlayFilter detects faces in a video stream and overlays custom images onto the face coordinates. The filter makes it possible to select specific scaling and offsets for the image position.

  • The CrowdDetectorFilter implements a computer vision algorithm suitable for detecting crowds of people in video streams. The level of crowdedness is published through a custom event that contains information about the direction and speed of movement of the crowd.

  • The PlateDetectorFilter detects European car plates and publishes the detected plate number as a custom event.

  • The AugmentedRealityFilter wraps the Alvar library [3] to provide marker and markerless Augmented Reality capabilities.

  • The AlphaBlending hub makes it possible to mix different video streams using alpha transparency. This hub is useful for producing chroma-blended videos in real time.

Thanks to all these capabilities, the Kurento software stack has been used for creating hundreds of applications combining different types of features, including WebRTC and RTP transports, media recording, Video Content Analysis, and Augmented Reality. All in all, Kurento provides a fully working test-bed where the RTC Media API constructs described in this paper can be used, evaluated and improved.

3.1.9 Matching the RTC Media API requirements

To conclude the RTC Media API presentation, we come back to the list of requirements exposed in Section 2.4 to validate that they are fulfilled:

  • Seamless API extensibility through custom modules: As can be seen in Section 3.1.7, the RTC Media API can be extended in a seamless way by using the RTC Media Module mechanism, which provides full flexibility with no restrictions other than extending from the base RTC Media API classes.

  • Adaptation to WWW technologies and methodologies: As shown in Section 3.1.8, RTC Media API implementations fully comply with the traditional WWW three-tiered development model and enable developers to create applications leveraging novel WWW RTC media technologies such as WebRTC in a seamless and direct way.

  • Full abstraction of media details (i.e. codecs and protocols): As can be appreciated in the discussions and code examples in Sections 3.1.1, 3.1.3 and 3.1.7, the connect primitive exposed by all MediaElements makes it possible to fully abstract codecs, protocols and formats. None of our examples contain explicit references to codecs or formats, even though many of them require specific transcodings to work. This is because the semantics of the connect primitive mandate the underlying media server capabilities to perform all the appropriate adaptations in a fully transparent way.

  • Programming language agnostic: The discussions in Section 3.1.2 demonstrate the full agnosticism of the RTC Media API IDL in relation to programming languages. The only requirement for supporting a given programming language is to specify how the IDL is transformed into it and to implement the appropriate compiler following that specification. In Sections 3.1.3 and 3.1.8 we provide such specifications and describe their implementations in Java and JavaScript in the context of the Kurento open source software project.

  • RTC media topology agnostic: Following the discussions in Sections 3.1.1 and 3.1.6, one can appreciate that the RTC Media API makes it possible to interconnect media elements following arbitrary and dynamic topologies thanks to the connect primitive. This means that developers do not need to be aware of the low-level details of MMM, MSM or SFU technologies: they just need to interconnect their endpoints, filters and hubs according to their needs. The RTC Media API semantics shall translate these interconnections into the appropriate low-level mechanisms using MMMs, MSMs or SFUs in a fully transparent way.

  • Advanced media QoS information gathering: As can be observed in the discussions in Section 3.1.2, the RTC Media API IDL does not restrict in any way the information a media object may expose through its properties and methods. We have leveraged such flexibility to create QoS metric gathering mechanisms in all endpoints based on the RTP protocol. In particular, the WebRtcEndpoint exposes primitives fully compliant with the standard WebRTC “inboundrtp” and “outboundrtp” stats [83].

  • Compatibility with advanced media processing capabilities: As can be observed in the discussions in Section 3.1.8, the Kurento software project has created a number of modules providing advanced capabilities such as Video Content Analysis, Augmented Reality, Computer Vision, etc. This demonstrates the ability of the RTC Media API Filter concept to hold all kinds of extensions for advanced media processing.

  • Context awareness: The notion of context emerges quite seamlessly from the discussions in Section 3.1.2. As can be observed, the RTC Media API event mechanism makes it possible for media capabilities to publish events to applications. These events may contain semantic information about the media content itself as shown, for example, in the CrowdDetectorFilter mentioned in Section 3.1.8. Hence, creating multimedia context-aware applications is straightforward: the application logic just needs to subscribe to the relevant events and publish them into a context database based on NGSI or any other equivalent standard.

  • Adapted to multisensory multimedia: The RTC Media API can seamlessly manage arbitrary sensor data beyond audio and video. This is achieved through the combination of two features. The first is the support for DataChannels which, as specified in Section 3.1.6, makes it possible for any media pipeline to exchange multisensory multimedia with the external world using the WebRTC protocol stack. The second is the fact that, as described in Section 3.1.1, all streams exchanged among MediaElements may have a DATA track. In particular, any information received through DataChannels at a WebRtcEndpoint is published to the rest of the pipeline through the endpoint’s source DATA track. In the same way, any information received through the DATA track at a WebRtcEndpoint’s sink is sent to the network using DataChannels. As the MediaElement interface enables all the information received through the DATA track to be used by the element’s internal logic, this mechanism makes it possible, for example, to create Augmented Reality filters that leverage sensor information for customizing the augmentation logic.

  • Adaptation to cloud media servers: As can be observed in the discussion of Section 3.1.2, the RTC Media API does not specify how media pipelines are placed onto media server instances. The API implementer has full freedom in selecting how newly created media pipelines are scheduled. This flexibility can be leveraged by API implementers to adapt their code to all kinds of cloud architectures. For example, as shown in the code snippet in Table 2, in the Kurento Client RTC Media API Java implementation we decided that the RTC Media API is represented by a specific class (i.e. KurentoClient) that is built through a static create factory method. This method may accept as a parameter a single IP, in which case all pipelines are instantiated in the media server listening at that IP; a list of IPs, which causes media pipelines to be round-robin distributed over the corresponding media servers; or a media server scheduling interface, which can provide arbitrary logic for scheduling media pipeline creation on media servers. It may also accept no parameters and let the developer specify the behavior in a configuration file. All this flexibility makes it possible for our RTC Media API to work seamlessly in cloud clusters of Kurento Media Server instances. As an example, this scheme is currently used in the NUBOMEDIA [52] and FIWARE [30] clouds. The complex details of how this happens are out of the scope of this paper. The point is that the RTC Media API does not constrain the API implementer in any way when adapting to complex cloud scheduling and placement logic.
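A brief sketch of this factory method in Java follows; the single-URI form matches our implementation, while the no-argument form reads the target media server(s) from configuration:

// All pipelines are instantiated in the media server listening at this URI
KurentoClient kurento = KurentoClient.create("ws://media1.example.com:8888/kurento");

// No-argument form: the target media server(s) come from configuration
KurentoClient fromConfig = KurentoClient.create();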

3.2 Some real-world example applications

The RTC Media API is currently being used in the context of the Kurento open source software community by hundreds of developers for creating RTC applications. For illustration, we briefly describe two of these applications here.

The first implements a functionality that we call “Crowd Detection” and is useful in Smart Cities scenarios. This application is convenient for public safety given that, when there are problems in a public space (e.g. a person having a heart attack, a robbery, an accident, etc.), a crowd of people (i.e. a group of people who do not move or move slowly) tends to gather around quite quickly. To this aim, we use the CrowdDetectorFilter mentioned above. Upon reception of a video stream, the CrowdDetectorFilter generates events indicating the degree of crowdedness in specific areas of the scene (e.g. NONE, LOW, MEDIUM, HIGH). Assuming that we have video feeds coming from RTSP street cameras and that, upon an alarm, policemen receive the streams on a WebRTC-capable device, the required pipeline for our application is the one shown in Figs. 6 and 7. The application logic can simply subscribe to the crowdedness level and, when a HIGH level is received, the application may generate an alarm by sending, for example, an instant message to a policeman’s device providing a URL where the camera video can be seen. The notion of multimedia context awareness, as mentioned above, emerges quite naturally, as the crowdedness can be seen as a context attribute characterizing the camera’s video status. In the same way, the alarm generation logic may be influenced by other context attributes (e.g. the closest policemen, the time of day, etc.)

Fig. 6

Conceptual representation of the crowd detector application pipeline. An RTSP camera injects a street video scene into the pipeline through a PlayerEndpoint Media Element, which passes the media to the CrowdDetectorFilter. The application code can then subscribe to the crowdedness events and generate alarms accordingly. Upon reception of an alarm, the application logic can take different actions such as, for example, sending an instant message to a policeman, who can connect to a WebRtcEndpoint serving the processed stream. Given the RTC Media API flexibility, further actions could be taken to dynamically modify the pipeline in order to, for example, record the stream through a RecorderEndpoint whenever the crowdedness level is over a threshold

Fig. 7

Web user interface of the “Crowd detector” application. It shows a real-time video stream from an IP camera augmented with colors that depend on the detected level of crowdedness

The second application shows how to implement a typical WebRTC group videoconferencing service. We call this the “room application” because every group of communicating users is connected to a virtual shared space called a “room”. As shown in Fig. 8, the media pipeline for this application can be created in a seamless way using the RTC Media API. Each participant maintains a WebRTC session with the media server through a WebRtcEndpoint. All the incoming WebRTC streams are connected to an MMM Hub Media Element called Composite. This Composite creates an outgoing media flow mixing all the incoming video and audio streams following the scheme described above in this paper. This resulting flow is fed to the sinks of the WebRtcEndpoints, which send it back to the participants’ browsers. Figure 9 shows the resulting user interface of this application.

Fig. 8

Media pipeline associated with the “room” application, assuming the presence of 4 participants in the videoconferencing session. All participants connect a WebRTC feed to the media server through a WebRtcEndpoint. The received WebRTC streams are mixed in a Composite Hub that generates a single output stream. This is fed to the WebRtcEndpoint sinks, which in turn send it back to the end-users’ browsers

Fig. 9

Web user interface of the “room” application. As can be observed, all the incoming streams are mixed into a grid by the Composite, which detects the dominant speaker and depicts it in a larger size

4 API evaluation

In the sections above we have presented the RTC Media API and introduced a specific implementation of it. The rest of this paper is devoted to describing a study we performed to evaluate the RTC Media API usability in the context of the Kurento open source software community.

4.1 Study design: methodology and hypotheses

To study the usability of the RTC Media API, we decided to follow the methodology presented in Section 2.3 above. Hence, our study is based on a questionnaire that evaluates developers’ experiences in terms of Clarke’s dimensions on a Likert scale. The final objective of the study is to validate the main hypothesis of this paper, namely:

  1. H1:

    The RTC Media API herein presented enables the creation of rich RTC applications consuming advanced media capabilities with full abstraction of low-level details and in a programming language agnostic way.

The term creation must be understood here in a wide sense, covering all the activities developers perform in relation to the API. As also stated in Section 2.3, these activities include exploratory learning (i.e. the process of learning how to use the API), exploratory design (i.e. the process of creating application code consuming the API) and maintenance (i.e. the process of debugging and evolving the application code after it has first been created).

Following this, the validation of this hypothesis requires finding answers to questions such as the following:

  • Do developers feel that the API can be learnt in a simple, incremental and seamless way?

  • Do developers feel that the API is helpful for the creation of clean and error-free application code without needing to manage low-level complexities?

  • Do developers feel that maintaining and evolving code consuming the API is smooth and uncomplicated?

  • Do developers have the same perception of the API usability independently of their demographic characteristics (i.e. years of experience, nationality, etc.) and of the types of applications they create?

  • Do developers have the same perception of the API usability independently of their programming language?

4.1.1 Research questionnaire

Following the methodology presented in Section 2.3 above, we created a questionnaire comprising 28 assertions that characterize the 5 target dimensions (understandability, abstraction, expressiveness, reusability, and learnability). For every assertion, users provide their degree of agreement or disagreement on a Likert scale from 1 (I fully disagree) to 5 (I fully agree). To assess the internal consistency of the data, and following common practices in psychological research, some of the assertions are formulated in negative terms. For example, if a respondent expressed agreement with the claim “I feel this API is simple” and disagreement with “I find it’s hard programming with this API”, this would be an indication of internal consistency. We call these N-assertions. For the statistical analysis, the answers to N-assertions are inverted (i.e. 1 is transformed into 5, 2 into 4, 4 into 2 and 5 into 1), so that consistency and coherence are maintained. The complete list of assertions is provided in Table 17.

Table 17 Research questionnaire used for evaluating developers’ perception of API usability on the 5 target dimensions. Every assertion in the questionnaire is identified with a unique ID for further reference (e.g. U.1 refers to the first question of the Understandability dimension). Assertions formulated in negative terms (i.e. N-assertions) start with an (N) mark. These assertions are useful for evaluating the consistency of the research. Participants are asked to provide their degree of agreement with every assertion on a scale from 1 (I fully disagree) to 5 (I fully agree). For the statistical analysis, N-assertions are inverted so that the coherence of the questionnaire is maintained

In addition to this, and with the objective of characterizing different aspects of the participants, a number of questions were included for profiling demographic data and for evaluating their degree of experience. These questions are shown in Table 18. As can be observed, most of the questions are self-explanatory, with these exceptions:

  • For the question “Type of application being developed”, we gave participants the possibility of selecting multiple items among the following options (the corresponding encoding token is provided between <> signs):

    • I’m creating video applications with recording capability (<Recording>)

    • I’m creating my own filters and extending Kurento APIs (<Filter>)

    • I’m creating video surveillance applications (<Video surveillance>)

    • I’m creating videoconferencing applications (<Videoconferencing>)

    • I’m creating broadcasting applications for distributing media among large groups of receivers (<Broadcasting>)

    • I’m using Kurento for integrating with other types of technologies beyond WebRTC (<Integration>)

    • I’m creating other types of applications (<Other>)

  • All questions dealing with “self-assessment of expertise” were expressed in the poll as assertions of the form “I’m an expert in …”, where answers are in the above-mentioned 1 (I fully disagree) to 5 (I fully agree) format.

  • The question dealing with “Learning stage on Kurento technologies” allowed selecting one option among the following items:

    • I tried to install Kurento unsuccessfully

    • I installed Kurento and executed some of the provided demos

    • I developed a simple application

    • I developed a complex application

    • I developed a complex application which is in production

Table 18 This table shows the additional questions asked to participants in order to characterize them. The first column shows the type of data to be gathered through the question. The second column shows the question itself (summarized for the sake of readability), and the third column shows the value type accepted by the web form. The mark [] indicates users are given the choice of choosing one item in a list. The mark []* indicates that multiple items may be selected. In this table, list items are tokenized for simplicity

4.1.2 Participants and protocol

Most API usability studies are performed by recruiting students or researchers who are trained on the API through lectures or exercises and who are later interviewed for the evaluation [57]. These types of protocols are sensitive to many different types of bias that may affect the study’s reliability. In particular, their main weakness is that participants are typically not professional developers and are not faced with real-world programming tasks. Hence, their perception of the API limitations and usability problems can be severely biased by their own background and by the nature and contents of the training materials and proposed exercises. In addition, those materials and exercises are typically created by the API designers, which significantly increases the risk of introducing the designers’ cognitive models and of hiding API limitations that might not be known even by the designers themselves. Many API evaluation research works are aware of these limitations, but solving them is not trivial given the difficulty of reaching a statistically significant population of professional developers who are independent of the designers and have the time to learn the API concepts and work on solving real-world tasks with them.

In order to avoid these problems, we leverage the fact that the RTC Media API has been implemented as part of the Kurento project. More specifically, the Kurento Client API is an almost complete implementation of it. This is a significant advantage because Kurento has been released as Open Source Software and a community of developers has emerged around it. The size of the community is unknown, but its main communication channel, the Kurento Public mailing list, has, at the time of this writing, 432 subscribers, most of whom are professional developers at different stages of the API learning process.

In this context, the survey protocol is simple. The questionnaire is designed for Kurento community members (i.e. in all assertions the API is referred to as the “Kurento API”). This questionnaire is published as a web form and participants are invited through a neutral e-mail invitation sent to the Kurento Public mailing list. This mail is written trying to avoid any kind of bias on participants, so it just presents the survey objectives and exposes a privacy policy guaranteeing that no personal data is to be disclosed or used for objectives other than those of the survey. Participants are restricted to participating only once by requiring them to log into the web form system with a valid e-mail. The form makes it mandatory to answer all assertions for submission to be possible (i.e. partially answered questionnaires are not considered). The web form system stores each participant’s answers in a persistent database and makes it possible to edit them during the survey period, which is limited to two weeks.

4.2 Results and analysis

4.2.1 Analysis of participants

The survey was activated following the protocol described above. One week after the initial e-mail invitation, a total of 17 participants had answered. Several reminders were sent to the Kurento Public mailing list and the announcement was also published through different social channels, such as the Kurento Twitter account. In two weeks, 42 answers were received, which represents 9.7 % of the Kurento Public mailing list subscribers. This is aligned with typical response rates in surveys.

From a demographic perspective, participants’ ages were distributed between 20 and 50 years, the most numerous group being participants in their thirties, who account for 50 % of the total, as can be seen in Fig. 10. It is worth noting that 100 % of participants were male, and that their main programming language, the language in which the API was used, was JavaScript, totaling 62 % of respondents.

Fig. 10

Figures showing the total number of participants for different demographic data including Age (left), Gender (middle) and Main Programming Language (right)

In relation to nationality, and as can be seen in Fig. 11, the poll was answered by developers of 20 nationalities on 4 different continents, with the USA being the country with the most participants.

Fig. 11

Total number of participants per nationality

In Fig. 12 we show the types of applications being developed. WebRTC video broadcasting applications (for distributing a media stream among a group of receivers) and videoconferencing services are the most popular, with 27 % and 26 % respectively. Applications involving recording and WebRTC integration are also quite popular (19 % and 15 %). Media processing services, such as video surveillance applications and services requiring custom filters, are less popular, accounting for only 7 % and 5 % respectively.

Fig. 12

Types of applications being developed with the Kurento API. The classification is based on the types of consumed features. In the poll, developers were able to select several classes of features for their applications

Coming to the expertise evaluation, as shown in Table 19, participants accumulate, on average, 9.7 years of development experience, but with considerable diversity (i.e. 1 year minimum and 30 maximum). Participants also declare having invested, on average, more than 61 h in learning and programming with Kurento technologies, which provides a reasonable guarantee of their ability to evaluate the API. Regarding the self-assessment of competences, relevant expertise in WebRTC technologies is clearly declared (mean 2.6 and median 3). However, participants seem to have more uncertainty in relation to their knowledge of general video technologies (mean 2.6 and median 2) and Kurento technologies (mean 2.5 and median 2). On the latter, no participant claims to be a fully fledged expert (the maximum is 4).

Table 19 Summary of answers in relation to participants’ expertise as developers and in the different technological areas involved

Regarding the stage of learning on Kurento technologies, an interesting surprise emerges given that 38 % of participants declare to already have an application in production, while 19 % and 33 % claim to have developed a complex and a simple application respectively. Only 3 % of participants have not been able to install and test Kurento and its APIs. These results are shown in Fig. 13.

Fig. 13

Pie chart showing the participants’ level of expertise in Kurento technologies

4.2.2 Analysis of dimensions

The results of the poll are summarized in Table 20, where the main statistics for each of the assertions and their corresponding dimensions are depicted. As specified above, all N-assertions were inverted prior to the statistical analysis. Hence, the magnitudes represent perceptions of API usability in positive terms (i.e. the higher the magnitude, the better the developer’s impression of the API). As can be seen in Fig. 14, on average participants feel the API usability properties are adequate, with reusability being the dimension with the highest rank and expressiveness the one with the lowest score. A detailed analysis for each dimension is presented in the following paragraphs.

Table 20 Results of the research showing, for each assertion of the poll, the main statistics of the provided answers. Notice that N-assertions have their results inverted to maintain coherence. For each dimension, the statistics are computed on the average values over all assertions of that dimension for each user
Fig. 14

Radar chart showing average rankings on the 5 target dimensions of our questionnaire. The scale is set between 3 and 3.5 to highlight the differences among dimensions

To complete our discussion, we performed an additional analysis to assess how participant experience might influence API perception. In this sense, we computed the average of all the scores provided by participants over all assertions and calculated its correlation coefficient with the available experience-related variables. Table 21 shows the numerical results of this analysis, which evidence that higher development experience tends to be associated with better usability perception.

Table 21 Correlation of the different parameters captured through the questionnaire with the scores of API usability perception averaged across all assertions

For completeness, we also evaluated the correlation of the perception of the different dimensions with other demographic data, including nationality (consolidated per continent) and type of application being created. The corresponding results are illustrated in Figs. 16 and 17. As can be observed, there are no significant dependencies of API usability on these variables.

We also evaluated the correlation between the API usability perception and the main programming language being used. The results are summarized in Fig. 18. As can be observed, JavaScript developers have a better perception of the API usability than Java developers.

4.3 Validity of the analyses

Following commonly accepted techniques for evaluating assessment data [23], we discuss the main threats to the validity of our research as well as the measures we deployed to minimize their impact.

4.3.1 Construct validity

Construct validity is the degree to which a test measures what it claims to, which in our case is the degree of usability of our RTC Media API. As introduced in the sections above, evaluating API usability is a very complex, multifaceted problem where it is difficult to obtain objective measures. To minimize threats, we carefully designed our research protocol, which included the following protection mechanisms:

  • We used a well-established methodology based on the CDs framework, the most widely accepted technique for this objective, which has already been used successfully in a number of usability studies worldwide.

  • We carefully designed the questionnaire based on high-level usability dimensions adapted to participants’ needs rather than to API designers’ needs. Each of the high-level dimensions was measured through groups of 5 to 6 assertions approaching the problem from different perspectives, which minimizes the effects of assertion misinterpretation.

  • The questionnaire contained complementary questions digging into the different components of each of the high-level dimensions and combining positively and negatively formulated assertions. This should enhance the consistency guarantees of the answers.

  • The protocol avoided introducing any kind of bias by letting participants answer assertions based only on their own knowledge of the API artifacts (i.e. documentation, code, etc.) and not on previous information provided by the designers (e.g. training courses) or on specific artificial exercises that could be associated with specific cognitive models of the API.

4.3.2 Internal validity

Internal validity is a property associated with the extent to which a study minimizes systematic errors and avoids introducing bias into measurements. To enhance our internal validity, we tried to avoid any kind of selection bias by enabling Kurento Open Source Community members to answer the poll freely. This strategy was clearly successful given the wide spectrum of participants we had, comprising developers of different ages, expertise degrees, nationalities and cultures. This significantly enhances our internal validity in relation to previous similar studies [57] where API designers and participants have tight relationships (e.g. professors and students, workers of the same company, etc.). The risk of small-sample statistical effects in the data is also low given that the poll was answered by 42 participants, a population sample significantly larger than those of other similar studies [57].

To formalize our internal validity analysis, we performed an additional test based on Cronbach’s alpha [19, 20], which is the most commonly used estimate for assessing the reliability of psychometric tests in the social sciences. As Cronbach’s alpha is a measure of the internal consistency of data, it needs all test items to measure the same construct. Due to this, its computation needs to be performed for each of our high-level dimensions separately. The Cronbach’s alpha evaluation for our data is shown in Table 22, while the thresholds for interpreting it are depicted in Table 23.
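For reference, Cronbach’s alpha for a dimension composed of k assertions follows the standard definition:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)

where \sigma^{2}_{Y_i} is the variance of the answers to assertion i and \sigma^{2}_{X} is the variance of the total score of the dimension.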

Table 22 Cronbach’s alpha computed for all the high-level dimensions of our test
Table 23 Commonly accepted rule of thumb for describing internal consistency in terms of Cronbach’s alpha

As can be observed, the reliability of the obtained data is within acceptable margins, which is reasonable for our type of questionnaire and open research methodology, where there is no control over who participates, how, or why.

4.3.3 External validity

External validity refers to the extent to which a study can be generalized to other situations or populations. In this regard, the main threats to external validity come from the research protocol, which was designed around the specificities of our API implementation. In particular, the fact that we leveraged the Kurento Open Source software community for obtaining the test population is quite a strong restriction on the generalizability of our findings, given that most newly designed APIs might not be Open Source and, even if they are, they need not have an active international community of more than 400 developers. Apart from this, the rest of our methodology, as well as the analysis we performed and the conclusions we drew from it, does not assume any specific requirements. This suggests that the gist of our findings is also applicable to different contexts and populations.

4.4 Discussion

Based on the analysis shown above, we come back to hypothesis H1, as stated in Section 4.1, and analyze its degree of fulfillment. For this, we use the results of our study to answer the questions stated there, namely:

Do developers feel that the API can be learnt in a simple, incremental and seamless way?

Developers’ ability to learn the API emerges mainly from two of the dimensions under analysis: understandability and learnability.

Understandability.

As illustrated in Table 20, the general perception of participants is that the API understandability is fine, with an average of 3.46 over all answers in this dimension. Through U.1 (3.33) we find a general declaration that the API is easy to understand. In particular, as shown by the answers to assertion U.2, participants rate as outstanding (4.26) how descriptive the object and primitive names are. On the other hand, through the U.3 N-assertion (2.69), we find that developers detect the presence of hidden dependencies that make the API more complex to understand.

Learnability.

As also illustrated in Table 20, the learnability of the API is evaluated positively by participants (3.43 on average). The improvement area in this topic emerges from L.2 (3.07) and L.4 (2.86), which evidence developers’ impression of needing to learn many API constructs and to read a significant amount of documentation before being able to use the API for anything useful. On the other hand, as L.1 (3.88) evidences, the learning process seems to be compatible with an incremental approach in which complexity is introduced in progressive steps.

Based on this, we can confirm that the API can be understood and learnt by developers in a seamless and incremental way. The main areas for improvement are the initial learning curve, which seems too steep, and the presence of hidden information and dependencies among the API constructs. Both issues are probably related and caused by the inherent complexity of RTC technologies. Our guess is that better and more complete documentation might help minimize both problems.

Do developers feel that the API is helpful for the creation of clean and error-free application code without needing to manage low-level complexities?

The process of exploratory design, understood as the creative activity of writing application code that consumes the API, is mainly related to two of our target dimensions: abstraction and expressiveness.

Abstraction.

As shown in Table 20, when coming to abstraction we also find a generally positive evaluation (the overall average is 3.39). Answers to all questions are quite uniform, with A.5 being the top-ranked one (3.69), which shows that developers find the API approach appealing, and A.2 the bottom-ranked one (3.24), indicating that some developers feel the need to adapt the API to their needs. It is remarkable that A.2 has the largest standard deviation (1.28) among the A assertions, which evidences some degree of controversy. This is confirmed by looking closely at the answers: in A.2, 12 % of the answers are ones and 19 % are fives, while over all Abstraction answers the ratios of ones and fives are 4.3 % and 12.6 % respectively.

Expressiveness.

The expressiveness analysis also reflects a positive evaluation but reveals improvement areas. This is the least successful dimension, with an overall average ranking of 3.17. Expressiveness limitations seem to emerge from assertions E.5 and E.6 (2.76 and 2.90 respectively). In particular, E.5 reveals that developers miss features that are relevant for their applications. E.6, in turn, shows that the API does not give enough protection against failures. On the other hand, as demonstrated through E.3 and E.4, our API is easy to read (3.43) and is consistent when explaining code logic in terms of the API constructs (3.45).

Hence, our API is suitable for use in the process of creating application code. However, as Fig. 14 illustrates, abstraction and, more significantly, expressiveness are the two dimensions with the lowest usability scores. This evidences that, although the API ideas are appealing and intuitive (learnability and understandability get very high scores), leveraging them to create real-world RTC applications still presents some difficulties. These seem to be related to the lack of further desirable features (i.e. richer extensions to the API might be necessary) and to the lack of protection against failures. The latter is a pervasive problem in most RTC media APIs due to their distributed and real-time nature and, to the best of our knowledge, there are no simple solutions for it.

Do developers feel that maintaining and evolving code consuming the API is smooth and uncomplicated?

Corrective maintenance and the evolution of the code relate to the dimension we call reusability:

Reusability.

As shown in Table 20, reusability is the dimension with the highest ranking (3.47 on average). This is illustrated by the results for assertions R.2, R.3 and R.6, which average 3.67, 3.31 and 3.14 respectively. The API also demonstrates nice properties in relation to verbosity, as shown by the 3.48 exhibited by R.1: our API is considered concise and not overly verbose.

Hence, we may conclude that, once the application code using the API has been created, it can be modified, maintained and evolved without much effort.

Do developers have the same perception of the API usability independently of their demographic characteristics (i.e. years of experience, nationality, etc.) and of the types of applications they create?

In our main hypothesis H1, as stated in Section 4.1, we assume that the perception of API usability is fine for all developers, independently of their origin, culture or experience. To validate this assertion, we have performed several statistical analyses whose outcomes are the following:

Correlation between API usability and programming experience.

As Table 21 illustrates, there is a tendency toward positive correlation between the API usability scores and developers’ previous experience and knowledge. This is particularly true for the self-assessed expertise in Kurento technologies: developers with a better perception of their own Kurento expertise find the Kurento API more usable for their objectives. However, as Fig. 15 shows, the negative effect is concentrated on users with very low expertise in Kurento technologies. In other words: as soon as developers feel they have some initial knowledge of the API, their perception of its usability increases to match that of experts, which is a good sign. Interestingly enough, this effect does not seem to be correlated with the number of hours declared learning or programming with Kurento. This means that the time invested by developers in learning or programming with the API does not have a strong influence on their perception of API usability. This may be related to the fact that the self-assessed expertise in WebRTC technologies and in video technologies is highly correlated with the self-assessed expertise in Kurento (the correlation coefficients are 0.55 and 0.34 respectively). This evidences that a previous understanding of WebRTC and video technologies enables developers to improve their perception of API usability in a faster and more seamless way. In addition, this effect might also be related to the excessively steep initial learning curve mentioned above.
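
For reference, and assuming the standard Pearson product-moment estimator, the correlation between two self-assessment score vectors x and y over the n = 42 participants is computed as

    r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

so that, under the usual interpretation thresholds, 0.55 corresponds to a moderately strong positive linear relationship and 0.34 to a weaker one.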

Fig. 15 Radar chart showing the average API usability for each of the analyzed dimensions, particularized for the 4 levels of expertise declared for the question “Self-assessment of Kurento expertise” (notice that no participant declared a value of 5). As can be observed, users declaring a very low level of expertise (i.e. 1) tend to perceive the API as less usable in all dimensions. However, users declaring 2 or more perceive the API in quite a similar way

Hence, we can conclude that, as long as developers acquire some initial knowledge of the API and its foundations, their perception of API usability does not depend significantly on their proficiency.

Relation between API usability and nationality/culture.

To analyze this, we have consolidated developers’ nationalities into continents. The results are illustrated in Fig. 16. Surprisingly, there is a clear tendency among USA developers to give the API usability lower scores. We do not have a consistent explanation for this, but we believe it might be caused by some specific cultural bias, possibly related to the fact that the creators of the API (and its documentation) are not native English speakers, which may decrease the perception of “quality” among native English speakers. In any case, this tendency is not quantitatively significant and we do not consider that it invalidates our independence-of-culture hypothesis.

Fig. 16 Radar chart showing the average API usability perception as a function of developers’ nationality, consolidated per continent. As can be observed, there are no relevant differences, with USA developers being the only ones showing a subtle tendency toward a lower usability perception in all dimensions

Relation between API usability and the type of application being created.

As shown in Fig. 17, the API usability scores do not exhibit, in any of the analyzed dimensions, a clear dependency on the type of application being created. Hence, we may conclude that the API usability perception does not depend on the specificities of the features a given application consumes.

Fig. 17 Radar chart showing the average API usability perception as a function of the type of application being created by developers. As can be observed, the type of application does not have a relevant impact on any of the usability dimensions under evaluation

Do developers have the same perception of the API usability independently of their programming language?

This is a relevant question that deserves a separate analysis, given that one of the main novelties of our API is that it is programming-language-agnostic. To address it, we can observe Fig. 18, where the average API usability scores are represented as a function of the programming language used. As can be seen, developers using programming languages other than Java and JavaScript (i.e. “Other”) have clearly under-scored the API usability and, very particularly, its understandability. This is natural, as the only official implementations of the Kurento Client API are the Java and JavaScript ones, which means that “Other” developers are either using non-official API implementations or directly consuming the JSON-RPC over WebSocket protocol exposed by Kurento Media Server. This protocol gives access to the same functionality as the Java and JavaScript SDKs, but it is clearly more complex and cumbersome to use, as developers first need to understand the protocol and then create the appropriate SDKs for accessing it.
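
To make this asymmetry concrete, the following sketch illustrates the very first step of consuming the raw protocol. It is illustrative only: it assumes a Kurento Media Server reachable at ws://localhost:8888/kurento, the JSON-RPC “create” method of the Kurento Protocol, and the standard javax.websocket client API (a runtime implementation such as Tyrus is required); responses, session tracking and error handling are omitted.

    import java.net.URI;
    import javax.websocket.*;

    // Minimal sketch: without the SDK, every operation is a hand-crafted
    // JSON-RPC 2.0 message sent over a WebSocket, and every response must
    // be parsed manually (object ids, sessionId, events, errors...).
    @ClientEndpoint
    public class RawProtocolClient {

        @OnMessage
        public void onMessage(String msg) {
            // The answer carries the id of the created object plus a sessionId
            // that has to be tracked by hand in all subsequent requests.
            System.out.println("KMS answered: " + msg);
        }

        public static void main(String[] args) throws Exception {
            WebSocketContainer container = ContainerProvider.getWebSocketContainer();
            Session session = container.connectToServer(RawProtocolClient.class,
                    URI.create("ws://localhost:8888/kurento"));

            // Ask the media server to create a MediaPipeline:
            session.getBasicRemote().sendText(
                    "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"create\","
                  + "\"params\":{\"type\":\"MediaPipeline\"}}");
        }
    }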

Fig. 18 Radar chart showing the average API usability perception as a function of the main programming language used: Java, JavaScript or Other. The axes are re-scaled for clarity. Note that the Kurento Client API is only available in Java and JavaScript; hence, users in the “Other” category are directly consuming the JSON-RPC over WebSocket protocol exposed by Kurento Media Server, which is significantly more complex

What is more surprising, however, is that JavaScript developers clearly over-score the API usability with respect to Java ones. The difference is not quantitatively significant, and we believe it does not break our claim of programming-language-agnosticism, but it is remarkable enough to deserve our attention. We have several hypotheses for explaining this effect. The first is that it might be caused by the greater simplicity of the JavaScript language. This would imply that the effect should be noticeable in all types of APIs, and not only in our RTC Media API, which is plausible given the increasing success of JavaScript-derived technologies such as node.js, which are steadily drawing developers away from other programming languages. The second is that this effect might be related to the RTC Media API extensibility mechanism, which becomes much more complex in strongly typed languages such as Java. This makes Java require the use of builders, while dynamically typed languages such as JavaScript can instantiate and manipulate the RTC Media API objects in a more seamless way. This is illustrated in Table 24: in order to create, for example, a PlayerEndpoint using the Java flavor of the RTC Media API, it is necessary to instantiate a Builder with the mandatory constructor parameters and then use its fluent API to configure the optional ones (like useEncodedMedia in the example). When the builder is fully configured, the “build” method may be invoked to create the PlayerEndpoint with the appropriate configuration. Using the JavaScript RTC Media API is more streamlined because the builder design pattern is not necessary: in JavaScript, object literals are often used as option bags when calling methods. This technique is simpler than the builder pattern. The drawback is that JavaScript lacks type safety, but some developers do not miss this extra protection when programming.

Table 24 PlayerEndpoint creation using RTC Media API for Java and JavaScript
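
As an illustration of the comparison in Table 24, the following sketch shows the Java flavor; the connection URL and media URI are placeholder values rather than values taken from the table:

    import org.kurento.client.KurentoClient;
    import org.kurento.client.MediaPipeline;
    import org.kurento.client.PlayerEndpoint;

    // Minimal sketch of builder-based creation in the Java flavor of the API.
    public class PlayerEndpointExample {
        public static void main(String[] args) {
            KurentoClient kurento = KurentoClient.create("ws://localhost:8888/kurento");
            MediaPipeline pipeline = kurento.createMediaPipeline();

            // Mandatory parameters go into the Builder constructor; optional
            // ones are set through its fluent API before calling build().
            PlayerEndpoint player = new PlayerEndpoint.Builder(
                    pipeline, "http://example.com/video.webm")
                .useEncodedMedia()
                .build();

            player.play();
        }
    }

In the JavaScript flavor, the same element is created without a builder by passing the optional parameters in an object literal, along the lines of pipeline.create('PlayerEndpoint', {uri: uri, useEncodedMedia: true}, callback).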

To finish, and based on the answers to these questions, we can conclude that the main hypothesis of the paper is validated by our study and that, with the exception of some collateral effects that are not quantitatively relevant, the API usability is confirmed to be good across all application creation activities, and the high usability scores are robust with respect to developers’ profiles, cultures, experience and preferred programming languages.

5 Conclusions

In this paper we have presented the RTC Media API: a new type of application programming interface complying with a number of stringent requirements that answer the latest trends and needs in the area of real-time multimedia. We have introduced a specification of the API, showing how it can be formally defined in a programming-language-agnostic way through an IDL. We have also demonstrated how the IDL can be compiled into different programming languages, including Java and JavaScript, while complying with the latest trends in API usability and performance. We have presented a specific RTC Media API implementation, the Kurento Client API, which includes a number of advanced processing capabilities such as Video Content Analysis, Augmented Reality, Media Blending, etc. We have evaluated the API usability through a research study based on the CDs framework to demonstrate the API’s suitability in terms of dimensions such as understandability, abstraction, expressiveness and learnability. Thanks to this analysis, we have detected the main strengths and weaknesses of our RTC Media API and have been able to adjust the API’s future roadmap to address the specific issues that decrease developers’ usability perception. These include the creation of improved documentation enabling a gentler initial learning curve, and the need to better understand real-world application requirements in order to add the missing features that developers demand.

Throughout the paper, we have tried to stress the importance of listening to developers’ needs and solving developers’ problems. We maintain that software is eating RTC multimedia technologies; hence, to push them to the next level, we need to create novel APIs and SDKs suitable for their democratization among wider developer audiences. In the current state of the art, there is a huge number of algorithms and technologies for transporting, analyzing and enriching media, but there are very few APIs and SDKs that make it possible for average WWW and smartphone developers to use them in a seamless and effortless way. Our RTC Media API brings a whole new concept by incorporating WWW development methodologies into the multimedia arena.

The RTC Media API in general, and the Kurento Client API implementation in particular, are still research artifacts under maturation and miss many relevant ingredients. In particular, adapting to the latest trends in WebRTC technologies, including the incorporation of ORTC (http://ortc.org/) concepts, would significantly improve the API’s flexibility and its ability to evolve with WebRTC. The API would also benefit from richer support for complex media streams containing multiple audio and video tracks, so that 3D or MVC multimedia can be supported. Improvements are also possible in the development tools surrounding the API: seamless mechanisms for debugging, diagnosing and optimizing applications would be more than welcomed by developers. To conclude, further efforts should be invested in the future to perform a consistent and complete evaluation of the API performance, suitable for illustrating the main QoS metrics of the different media elements and of the media pipeline mechanism under real-world operational conditions, following the scheme of previous research in this area [61].