The Java Sound API is a low-level API for effecting and controlling input and output of audio media. It provides explicit control over the capabilities commonly required for audio input and output in a framework that promotes extensibility and flexibility.
Because sound is so fundamental, Java Sound fulfills the needs of a wide range of customers. Potential application areas include:
Java Sound enables applications to extend audio support with specialized capabilities, and integrates well with architectures that provide higher-level interfaces and integration with other media types.
Java Sound provides the lowest level of audio support on the Java platform. It provides a high degree of control over audio-specific functionality. For example, it provides mechanisms for installing, accessing, and manipulating system resources such as digital audio and MIDI (Musical Instrument Digital Interface) devices. It does not include sophisticated sound editors and GUI tools; rather, it provides a set of capabilities upon which such applications can be built. It emphasizes low-level control beyond that commonly expected by the end user, who benefits from higher-level interfaces built on top of Java Sound.
Note: Throughout this document, the word "application" refers generically to Java applets as well as Java applications.
The Java Sound API includes support for both digital audio and MIDI data. These two major modules of functionality are provided in separate packages:
javax.sound.sampled: This package specifies interfaces for capture, mixing, and playback of digital (sampled) audio.
javax.sound.midi: This package provides interfaces for MIDI synthesis, sequencing, and event transport.
Two other packages, javax.sound.sampled.spi and javax.sound.midi.spi, permit service providers (as opposed to application developers) to create custom components that can be installed on the system.
The next section of this document discusses the sampled-audio system, including an overview of the javax.sound.sampled API. The final section covers the MIDI system and the javax.sound.midi API.
The javax.sound.sampled package handles digital audio data, also referred to as sampled audio. ("Samples" refer to successive snapshots of a signal, which in the case of digital audio is a sound wave. For example, the audio recorded for storage on compact discs is sampled 44100 times per second. Typically, sampled audio comes from a sound recording, but the sound could instead be synthetically generated. The term "sampled audio" refers to the type of data, not its origin. Sampled audio can be thought of as the sound itself, whereas MIDI data can be thought of as a recipe for creating musical sound.)
Java Sound does not assume a specific audio hardware
configuration; it is designed to allow different sorts of audio
components to be installed on a system and accessed by the API.
Java Sound supports common functionality such as input and output
from a sound card (for example, for recording and playback of sound
files) as well as mixing of multiple streams of audio. Here is one
example of a typical audio architecture for which Java Sound might be used:
A Typical Audio Architecture
In this example, a device such as a sound card has various input and output ports, and mixing is provided in software. The MIDI synthesizer shown as one of the mixer's audio inputs might also be a feature of the sound card, or it might be implemented in software. (The javax.sound.midi package, discussed later, supplies a Java interface for synthesizers.)
The major concepts used in the javax.sound.sampled package are described in the sections below.
A line is an element of the digital audio "pipeline," such as an audio input or output port, a mixer, or an audio data path into or out of a mixer. The audio data flowing through a line can be mono or multichannel (for example, stereo). Each type of line will be described shortly, but first some of their functional relationships will be illustrated, showing the flow of audio through the "pipeline." The following diagram shows different types of lines in a simple audio-output system:
A Possible Configuration of Lines for Audio Output
In this example, an application has asked a mixer for one or more available clips and source data lines. A clip is a mixer input into which you can load audio data prior to playback; a source data line is a mixer input that accepts a real-time stream of audio data. The application preloads audio data from a sound file into the clips, and then pushes other audio data into the source data lines. The mixer reads data from these lines, each of which may have its own reverberation, gain, and pan controls, and uses the reverb settings to mix the "dry" audio signals with the reverberated ("wet") mix. The mixer delivers its final output to one or more output ports, such as a speaker, a headphone jack, and a line-out jack.
Although the various lines are depicted as separate rectangles in the diagram, they are all "owned" by the mixer, and can be considered integral parts of the mixer. The reverb, gain, and pan rectangles represent processing controls (rather than lines) that can be applied by the mixer to data flowing through the lines. (Note that this is just one example of a possible audio system that is supported by the API. Not all audio configurations will have all the features illustrated. An individual source data line might not support panning, a mixer might not implement reverb, and so on.)
A simple audio-input system might be similar:
A Possible Configuration of Lines for Audio Input
Here, data flows into the mixer from one or more input ports, commonly the microphone or the line-in jack. Gain and pan are applied, and the mixer delivers the captured data to an application via the mixer's target data line. A target data line is a mixer output, containing the mixture of the streamed input sounds. The simplest mixer has just one target data line, but some mixers can deliver captured data to multiple target data lines simultaneously.
The different types of line will now be examined more closely. Several types of line are defined by subinterfaces of the basic Line interface. The interface hierarchy is shown below.
The base interface, Line, describes the minimal functionality common to all lines:
Each line has an information object (an instance of Line.Info) that indicates what mixer (if any) sends its mixed audio data as output directly to the line, and what mixer (if any) gets audio data as input directly from the line. Subinterfaces of Line may have corresponding subclasses of Line.Info that provide other kinds of information specific to the particular types of line.
Closing a line indicates that any resources used by the line may now be released. To free up resources, applications should close lines whenever they are not in use, and must close all opened lines when exiting.
Mixers are assumed to be shared system resources, and can be opened and closed repeatedly. Other lines may or may not support re-opening once they have been closed. Mechanisms for opening lines vary with the different sub-types and are documented where they are defined.
Line events are instances of the LineEvent class. Two types of event, OPEN and CLOSE, are defined for lines in general, but subinterfaces of Line can introduce other types of events.
When a line generates an OPEN or CLOSE event, the event is sent to all objects that have registered to "listen" for events on that line. Such objects must implement the LineListener interface. An application can create these objects, register them to listen for line events, and react to the events as desired.
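The listener mechanism described above can be sketched as follows. This is a minimal illustration written against the javax.sound.sampled API as it ships with current Java SE releases, which may differ in detail from the version this document describes. The class name LineEventLog and its helper methods are hypothetical.

```java
import javax.sound.sampled.LineEvent;
import javax.sound.sampled.LineListener;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener that records the type of every event a line reports. */
public class LineEventLog implements LineListener {
    private final List<LineEvent.Type> seen = new ArrayList<>();

    /** Called by the line for each event (OPEN, CLOSE, and any subinterface-specific types). */
    @Override
    public synchronized void update(LineEvent event) {
        seen.add(event.getType());
    }

    /** Returns a copy of the event types received so far, in arrival order. */
    public synchronized List<LineEvent.Type> seen() {
        return new ArrayList<>(seen);
    }

    /** True once the line has reported that it was closed. */
    public synchronized boolean sawClose() {
        return seen.contains(LineEvent.Type.CLOSE);
    }
}
```

An application would register the listener with a call such as line.addLineListener(new LineEventLog()) before opening the line.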
Ports are simple lines for input or output of audio to or from audio devices. The Port interface has an inner class, Port.Info, that specifies the type of port. Some common types are the microphone, line input, CD-ROM drive, speaker, headphone, and line output.
The Mixer interface represents a hardware or
software device that has one or more input lines and one or more
output lines. This definition means that a mixer need not actually
mix data; it might have only a single input. The Mixer
API is intended to encompass a variety of devices, but the typical
case supports mixing.
The Mixer interface provides methods for obtaining
a mixer's lines. These can include target data lines, from which an
application can read captured audio data, source data lines, to
which an application can write audio data for playback (rendering),
and clips, in which an application can preload sound data for
playback. The mixer can lock these resources. For example, if the
mixer has only one target data line and it is already in use, an
attempt by an application to obtain a target data line will cause
an exception to be thrown.
You can query a mixer for lines of different types, by passing
the appropriate type of
Line.Info. You can also ask
the mixer how many lines of a particular type it supports.
A mixer maintains textual information about its specific device
type in an inner class called Mixer.Info. This
information includes the product's name, version, and vendor, along
with a textual description.
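As an illustration of querying mixers and their descriptive information, the following sketch lists every installed mixer together with the number of line types it reports. It assumes the javax.sound.sampled API as shipped with current Java SE releases; the class name MixerLister is hypothetical, and on a machine without audio hardware the list may simply be empty.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that summarizes the installed mixers. */
public class MixerLister {
    /** Returns one line of text per installed mixer. */
    public static List<String> describeMixers() {
        List<String> out = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            Line.Info[] sources = mixer.getSourceLineInfo(); // line types that feed audio into the mixer
            Line.Info[] targets = mixer.getTargetLineInfo(); // line types that receive audio from the mixer
            out.add(info.getName() + " (" + info.getVendor() + ", version " + info.getVersion() + "): "
                    + sources.length + " source line type(s), "
                    + targets.length + " target line type(s)");
        }
        return out;
    }

    public static void main(String[] args) {
        describeMixers().forEach(System.out::println);
    }
}
```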
Notice that the generic Line interface does not
provide a means to start and stop playback or recording. For that
you need a data line. The DataLine interface
supplies the following additional media-related features beyond
those of a Line:
A data line's audio format (an instance of AudioFormat) specifies the arrangement of the bytes in the audio stream. Some of the format's properties are the number of channels, the sample rate, the sample size, and the encoding technique. Common encoding techniques include linear pulse-code modulation (PCM), mu-law encoding, and a-law encoding.
START and STOP events are produced when active presentation or capture of data from or to the data line starts or stops.
An application can obtain a data line from a mixer. If the data line cannot be allocated because of resource constraints (for example, if the mixer supports only one target data line and it is already in use), an exception is thrown.
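The following sketch shows how an audio format can be described and how an application might check for a matching capture line before requesting one, rather than relying on the exception alone. It uses the javax.sound.sampled API as shipped with current Java SE releases; the class name FormatProbe and its methods are hypothetical.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical helper for describing formats and probing for data lines. */
public class FormatProbe {
    /** Compact-disc format: 44100 Hz, 16-bit, stereo, signed linear PCM, little-endian. */
    public static AudioFormat cdQuality() {
        return new AudioFormat(44100f, 16, 2, true, false);
    }

    /** Bytes of audio per second: frame size (bytes per frame) times frame rate. */
    public static int bytesPerSecond(AudioFormat format) {
        return Math.round(format.getFrameSize() * format.getFrameRate());
    }

    /** Asks whether any installed mixer offers a target data line in this format. */
    public static boolean canCapture(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        AudioFormat cd = cdQuality();
        System.out.println(cd + " -> " + bytesPerSecond(cd) + " bytes/s, capture supported: " + canCapture(cd));
    }
}
```

For the compact-disc format, each frame holds two 16-bit samples (4 bytes), so one second of audio occupies 4 x 44100 = 176400 bytes.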
A TargetDataLine receives audio data from a
mixer. Commonly, the mixer has captured audio data from a port such
as a microphone; it might process or mix this captured audio before
placing the data in the target data line's buffer. The
TargetDataLine interface provides methods for reading
the data from the target data line's buffer and for determining how
much data is currently available for reading. If an application
attempts to read more data than is available, the read method
blocks until the requested amount of data is available. This
applies even if the amount of data requested is greater than the
line's buffer size. The read method returns if the line is closed,
paused, or flushed.
Applications recording audio should read data from the target data line fast enough to avoid overflow of the buffer, which results in discontinuities in the captured data. If the buffer does overflow, the oldest queued data is discarded and replaced by new data.
A SourceDataLine receives audio data for
playback. It provides methods for writing data to the source data
line's buffer for playback, and for determining how much data the
line is prepared to receive without blocking. If an application
attempts to write more data than the line can currently accept, the
write method blocks until the requested amount of data can be
written. This applies even if the amount of data requested is
greater than the line's buffer size. The write method also returns
if the line is closed, paused, or flushed.
Applications playing audio should write data to the source data
line fast enough to avoid underflow (emptying) of the buffer, which
may result in discontinuities in audio playback. If audio playback
stops due to underflow, a STOP event is generated. A START event is
generated when presentation resumes.
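A playback loop that keeps a source data line supplied with data might look like the sketch below. It is written against the javax.sound.sampled API as shipped with current Java SE releases. The class name TonePlayer, the 4096-byte chunk size, and the sine-tone test signal are assumptions chosen for illustration. If no suitable line is installed, the play method returns 0 instead of playing.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Hypothetical example: synthesize a tone and stream it to a source data line. */
public class TonePlayer {
    /** Generates 16-bit signed little-endian mono PCM samples of a sine tone. */
    public static byte[] sine(float sampleRate, double hz, int frames) {
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            short s = (short) (Math.sin(2 * Math.PI * hz * i / sampleRate) * 0.3 * Short.MAX_VALUE);
            pcm[2 * i] = (byte) s;            // low byte first (little-endian)
            pcm[2 * i + 1] = (byte) (s >> 8); // then the high byte
        }
        return pcm;
    }

    /** Streams the PCM data to a line; returns bytes written, or 0 if no line is available. */
    public static int play(byte[] pcm, float sampleRate) {
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            return 0;
        }
        try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
            line.open(format);
            line.start();
            int written = 0;
            while (written < pcm.length) {
                // write() blocks until the chunk has been queued in the line's buffer
                int n = line.write(pcm, written, Math.min(4096, pcm.length - written));
                if (n <= 0) {
                    break; // the line was stopped, flushed, or closed
                }
                written += n;
            }
            line.drain(); // wait for the queued data to finish playing
            return written;
        } catch (LineUnavailableException e) {
            return 0; // the line exists but is currently in use
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + play(sine(44100f, 440.0, 22050), 44100f));
    }
}
```

Writing in modest chunks keeps the buffer topped up; an application that also produces the data in real time needs to generate each chunk faster than the line consumes it to avoid underflow.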
A Clip is a data line into which audio data
can be loaded prior to playback. Because the data is pre-loaded
rather than streamed, the clip's duration is known before playback,
and you can choose any starting position in the media. Clips can be
looped, meaning that upon playback, all the data between two
specified loop points will repeat a specified number of times, or
indefinitely.
A GroupLine is a synchronized group of data lines. If a mixer supports group lines, you can specify which data lines should be treated as a group. Then you can start, stop, or close all those data lines by sending a single message to the group, instead of having to control each line individually.
Data lines and ports often have a set of controls that affect
the audio signal passing through the line. The way in which the
signal is affected depends on the type of control. The Java Sound
API defines the following subclasses of Control:
A GainControl object can be queried for its resolution and for the minimum and maximum gain values it permits. The resolution is expressed as the number of increments over which the range of possible values is distributed.
As with a GainControl, the change can be made gradually instead of immediately, and the control can be queried for its resolution and its minimum and maximum possible values.
Programmatically, you obtain a particular control object from a line through a reference to the control's class. You can also obtain an array of all the controls for that line.
The AudioSystem class serves as an application's entry point for accessing the installed sampled-audio resources. You can query the AudioSystem to learn what sorts of audio components have been installed, and then you can obtain access to them. For example, an application might start out by asking the AudioSystem class whether there is a mixer that has a certain configuration, such as one of the input or output configurations illustrated earlier in the discussion of lines. From the mixer, the application would then obtain data lines, and so on.
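Here are some of the resources an application can obtain from the AudioSystem:
Mixers. Each installed mixer is described by a Mixer.Info object. You can learn what mixers are available by invoking the getMixerInfo method, which returns an array of Mixer.Info objects.
Lines. An application can obtain a line directly from the AudioSystem, without dealing explicitly with mixers.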
An application can use format conversions to translate audio
data from one format to another. (See the discussion of AudioFormat above.) Format conversions
are often used to compress and decompress audio data. An
application can query the
AudioSystem to learn what
translations are supported. It can then pass the
AudioSystem a stream of audio data and get back a
translated stream in a particular format.
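A format conversion of the kind described above might be sketched as follows, here compressing 16-bit linear PCM to mu-law. The sketch uses the javax.sound.sampled API as shipped with current Java SE releases. The class name UlawConverter is hypothetical, and the code returns null if the running system offers no such conversion.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical example: translate PCM audio data to mu-law encoding. */
public class UlawConverter {
    /** Converts 16-bit signed little-endian mono PCM to mu-law; null if unsupported. */
    public static byte[] toUlaw(byte[] pcm, float sampleRate) {
        AudioFormat source = new AudioFormat(sampleRate, 16, 1, true, false);
        // First ask the AudioSystem whether the translation is supported at all.
        if (!AudioSystem.isConversionSupported(AudioFormat.Encoding.ULAW, source)) {
            return null;
        }
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), source, pcm.length / source.getFrameSize());
        // Then hand over a stream and get back a translated stream.
        try (AudioInputStream converted =
                     AudioSystem.getAudioInputStream(AudioFormat.Encoding.ULAW, in)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = converted.read(buf, 0, buf.length)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ulaw = toUlaw(new byte[16000], 8000f); // one second of silence
        System.out.println(ulaw == null ? "conversion not supported" : ulaw.length + " mu-law bytes");
    }
}
```

Mu-law stores each sample in 8 bits, so the converted data occupies half the space of the 16-bit source.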
An audio stream is an input stream with a specified audio data
format (AudioFormat) and data length. The
AudioInputStream class represents such a stream, from
which you can read bytes. Some audio input streams permit you to
remember positions in the stream and skip around in it. The
AudioSystem class provides methods for translating
between audio files and audio streams. The AudioSystem
can also report the file format of a sound file and can write files
in the different formats. A file format is represented by the
AudioFileFormat class. An AudioFileFormat object includes an
AudioFormat, the file's length, and its
type (WAV, AIFF, AU, etc.).
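The translation between audio streams and audio files can be illustrated with an in-memory round trip: PCM data is wrapped in an AudioInputStream, written out in WAV format, and the resulting bytes are then inspected. The sketch uses the javax.sound.sampled API as shipped with current Java SE releases; the class name WavRoundTrip and its methods are hypothetical.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical example: write an audio stream as a WAV file image and read its file format. */
public class WavRoundTrip {
    /** Wraps raw PCM bytes in an audio stream and writes them out in WAV format. */
    public static byte[] toWav(byte[] pcm, AudioFormat format) {
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream stream =
                     new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports the file format (type and audio format) of a sound file held in memory. */
    public static AudioFileFormat inspect(byte[] fileBytes) {
        try {
            // The stream must support mark/reset, which ByteArrayInputStream does.
            return AudioSystem.getAudioFileFormat(new ByteArrayInputStream(fileBytes));
        } catch (IOException | UnsupportedAudioFileException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, false);
        AudioFileFormat fileFormat = inspect(toWav(new byte[16000], format));
        System.out.println(fileFormat.getType() + ": " + fileFormat.getFormat());
    }
}
```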
Service provider interfaces for the sampled audio system are
defined in the javax.sound.sampled.spi package.
Service providers can extend the classes defined here so that their
own audio devices, sound file parsers and writers, and format
converters can be installed and made available by a Java Sound
implementation.
Interfaces describing MIDI event transport, synthesis, and
sequencing are defined in the javax.sound.midi
package. The major concepts used in the package are described in
the sections below.
The diagram below illustrates the functional relationships
between the major components in a typical Java Sound MIDI
configuration. (Java Sound permits a variety of devices to be
installed and interconnected. The system shown here is just one
possible scenario.) The flow of data between components is
indicated by arrows. The data can be in a standard file format, or
(as indicated by the key in the lower right corner of the diagram),
it can be audio, raw MIDI bytes, or Java Sound's MidiEvent objects.
A Typical MIDI Configuration
In this example, the application prepares a musical performance
by loading a musical score that is stored as a Standard MIDI File
on a disk (lower left corner of the diagram). Standard MIDI files
contain tracks, each of which is a list of time-tagged MIDI events.
This MIDI file is read into a Sequence object, whose
data structure reflects the file. A Sequence contains
a set of Track objects, each of which contains a set
of MidiEvent objects. The Sequence is
then "performed" by a Sequencer. A
Sequencer performs its music by sending
MidiEvents to some other device, such as an internal
or external synthesizer.
MidiEvents must be translated into
raw (non-time-tagged) MIDI before being sent through a MIDI output
port to an external synthesizer. This conversion is accomplished by
a MIDI-output device called a StreamGenerator.
Similarly, raw MIDI data coming into the computer from an external
MIDI source is translated into MidiEvents by a
StreamParser.
The internal synthesizer (the rectangle marked "Synthesizer" in
the diagram) accepts MidiEvents directly from the
Sequencer or StreamParser. It parses each
event and usually dispatches a corresponding command (such as
noteOn) to one of its MidiChannel
objects, according to the MIDI channel number specified in the
event. (The MIDI Specification calls for 16 MIDI channels, so a
Synthesizer typically has 16 MidiChannel objects.) The
MidiChannel uses the note information in these
messages to synthesize music. For example, a noteOn
message specifies the note's pitch and "velocity" (volume).
However, the note information is insufficient; the synthesizer also
requires precise instructions on how to create the audio signal for
each note. These instructions are represented by an
Instrument. Each Instrument typically
emulates a different real-world musical instrument or sound effect.
The Instruments might come as presets with the
synthesizer, or they might be loaded from soundbank files. In the
synthesizer, the Instruments are arranged by bank
number (the rows in the diagram) and program number (the columns).
An Instrument can make use of stored digital audio,
represented by Sample objects in the soundbank. For
example, to play the sound of a trumpet playing a 5-second-long
note, the synthesizer might loop (cycle) through a half-second
snippet of a recording of a trumpet.
Now that the components have been introduced from a functional perspective, we will take a brief look at the API from a programmatic perspective.
A MidiEvent object specifies the type, data length,
and status byte of the raw MIDI message for which it serves as a
wrapper. In addition, it provides a tick value that is used by
devices involved in MIDI timing, such as sequencers.
There are three categories of events, each represented by a
MidiEvent subclass:
ShortEvents are the most common and have at most two data bytes following the status byte.
SysexEvents contain system-exclusive MIDI messages. They may have many bytes, and generally contain manufacturer-specific instructions.
MetaEvents occur in MIDI files, but not in raw MIDI data streams. Meta events contain data, such as lyrics or tempo settings, that might be useful to sequencers but are usually meaningless for synthesizers.
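The relationship between a status byte, its data bytes, and a tick value can be illustrated with a short sketch. Note the assumption: the javax.sound.midi API that ships with current Java SE releases models the raw message as a ShortMessage object wrapped in a time-tagged MidiEvent, rather than as the event subclasses named above, and the sketch uses that shipped form. The class name OneNoteSequence is hypothetical.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

/** Hypothetical example: a one-track sequence holding a single quarter note. */
public class OneNoteSequence {
    public static Sequence build() {
        try {
            // Tempo-based timing with 480 ticks per quarter note.
            Sequence sequence = new Sequence(Sequence.PPQ, 480);
            Track track = sequence.createTrack();
            // Status: note-on, channel 0. Data bytes: pitch 60 (middle C), velocity 93.
            ShortMessage on = new ShortMessage(ShortMessage.NOTE_ON, 0, 60, 93);
            ShortMessage off = new ShortMessage(ShortMessage.NOTE_OFF, 0, 60, 0);
            track.add(new MidiEvent(on, 0));    // tick 0: the note starts
            track.add(new MidiEvent(off, 480)); // tick 480: one quarter note later
            return sequence;
        } catch (InvalidMidiDataException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Track track = build().getTracks()[0];
        // The track also holds an End of Track meta event that it manages automatically.
        System.out.println(track.size() + " events, length " + track.ticks() + " ticks");
    }
}
```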
The base interface for devices is MidiDevice. All
devices provide methods for listing the set of MIDI modes that they
support, and for querying and setting the current mode. (The mode
is a combination of MIDI's Omni mode and Mono/Poly mode.) Devices
can be opened and closed, and they provide descriptions of
themselves through a MidiDevice.Info object.
The following diagram illustrates the MidiDevice
interface hierarchy. Also depicted are two classes, connected by
dashed lines to the MidiDevice interfaces they
implement.
Devices are generally either transmitters or receivers of
MidiEvents. The Transmitter subinterface of
MidiDevice includes methods for setting and
querying the receivers to which the transmitter is sending
MidiEvents. From the perspective of a transmitter,
these receivers fall into two categories: MIDI Out and MIDI Thru.
The transmitter sends events that it generates itself to its MIDI
Out receivers. If the transmitter is itself also a receiver, it
passes along events that it has received from elsewhere to its MIDI
Thru receivers. The Receiver subinterface of
MidiDevice consists of a single method for receiving
MidiEvents. Typically, this method is invoked by a
Transmitter.
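A receiver can be very small, as the following sketch suggests. It is written against the javax.sound.midi API as shipped with current Java SE releases, in which the receiving method takes a MidiMessage plus a time stamp and a close method is also required; this differs from the single-method interface described above. The class name NoteCounter and its helper method are hypothetical.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

/** Hypothetical receiver that counts the note-on messages delivered to it. */
public class NoteCounter implements Receiver {
    private int noteOns;

    /** Invoked by whatever transmitter this receiver is attached to. */
    @Override
    public synchronized void send(MidiMessage message, long timeStamp) {
        if (message instanceof ShortMessage) {
            ShortMessage sm = (ShortMessage) message;
            // A note-on with velocity 0 is conventionally treated as a note-off.
            if (sm.getCommand() == ShortMessage.NOTE_ON && sm.getData2() > 0) {
                noteOns++;
            }
        }
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    public synchronized int noteOns() {
        return noteOns;
    }

    /** Convenience factory so callers need not handle the checked exception. */
    public static ShortMessage noteOn(int channel, int pitch, int velocity) {
        try {
            return new ShortMessage(ShortMessage.NOTE_ON, channel, pitch, velocity);
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

In the shipped API, such a receiver would be attached with a call like transmitter.setReceiver(new NoteCounter()).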
Java Sound includes concrete classes for converting between
MidiEvent objects and the raw byte stream used in the MIDI
wire protocol. A StreamGenerator is a
Receiver that accepts MidiEvents from a
Transmitter and writes out a raw MIDI byte
stream. Similarly, a StreamParser is a
Transmitter that accepts a raw MIDI byte stream and
writes the corresponding MidiEvent objects to its
Receivers.
A Synthesizer is a type of MidiDevice
that generates sound. The Synthesizer interface
provides methods for manipulating soundbanks and instruments. In
addition, it provides access to a set of MIDI channels through
which sound is actually produced. A Synthesizer receives
MidiEvents and invokes corresponding methods on its
MidiChannels. MidiChannels have methods representing the common
MIDI voice messages such as "note on" and "control change." They
also permit queries of the channel's current state.
The Sequencer interface extends both Transmitter and
Receiver (and therefore MidiDevice).
Sequencer adds methods for basic MIDI sequencing
operations. A sequencer can load and play back a sequence, query
and set the tempo, and control the master and slave sync modes. An
application can register to be notified when the sequencer
processes MetaEvents and controller events. (A
controller event occurs when a MIDI controller, such as a
pitch-bend wheel or a data slider, changes its value. These events
are not MidiEvents, but are created when the
Sequencer encounters certain events.)
A Sequence object represents a MIDI sequence as
one or more tracks and associated timing information. A track
contains a list of time-stamped MIDI events. Sequences can be read
from MIDI files, or created from scratch and edited by adding
Tracks to the Sequence (or removing them). Similarly,
MidiEvents can be added to or removed from the
Tracks in the Sequence.
It is not necessary to load a MIDI file into a
Sequence object before playing the file. The
setSequence(java.io.InputStream) method of
Sequencer lets you read a MIDI file directly into a
Sequencer, without creating a Sequence object.
MidiSystem acts as an application's entry point to
the MIDI music system. It provides information about, and access
to, the set of installed devices, including transmitters,
receivers, synthesizers, and sequencers.
The MidiSystem class provides methods for reading
MIDI files to create
Sequence objects, and for writing
Sequences to MIDI files. A MIDI Type 0 file contains
only one track, while a Type 1 file may contain any number.
MidiSystem also provides methods to create
Soundbank objects by parsing soundbank files.
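The reading and writing of MIDI files through MidiSystem can be sketched with an in-memory round trip. The code uses the javax.sound.midi API as shipped with current Java SE releases. The class name MidiFileRoundTrip and its methods are hypothetical, and the choice of file type simply follows the rule stated above (Type 0 for a single track, Type 1 otherwise).

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical example: write a Sequence as a Standard MIDI File image and read it back. */
public class MidiFileRoundTrip {
    /** Builds a sequence with the given number of tracks, one quarter note per track. */
    public static Sequence build(int trackCount) {
        try {
            Sequence sequence = new Sequence(Sequence.PPQ, 480);
            for (int i = 0; i < trackCount; i++) {
                Track track = sequence.createTrack();
                track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_ON, 0, 60 + i, 93), 0));
                track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_OFF, 0, 60 + i, 0), 480));
            }
            return sequence;
        } catch (InvalidMidiDataException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the sequence as Type 0 if it has one track, otherwise as Type 1. */
    public static byte[] write(Sequence sequence) {
        int type = sequence.getTracks().length == 1 ? 0 : 1;
        if (!MidiSystem.isFileTypeSupported(type, sequence)) {
            throw new IllegalStateException("no installed writer for MIDI file type " + type);
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            MidiSystem.write(sequence, type, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses a Standard MIDI File image back into a Sequence. */
    public static Sequence read(byte[] fileBytes) {
        try {
            return MidiSystem.getSequence(new ByteArrayInputStream(fileBytes));
        } catch (InvalidMidiDataException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sequence copy = read(write(build(2)));
        System.out.println(copy.getTracks().length + " tracks, resolution " + copy.getResolution());
    }
}
```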
Configuration of the MIDI system is handled in the
javax.sound.midi.spi package. The abstract classes in
this package allow service providers to supply and install their
own MIDI devices, MIDI file readers and writers, and soundbank file
readers.