Sampled Sound as a Data Structure
In Java sampled sounds are stored as a series of bytes. A byte is a grouped series of 8 bits, where a bit is a binary number (either 1 or 0). This series is also stored as an array, where an array represents a series of values at specific indexes denoted by brackets. For example if array a[] contains all numbers 1 through 3 starting with 1, a[0] = 1, a[1] = 2, and a[2] = 3. The AudioFormat object in Java encompasses all relevant data about a sound such as encoding technique, number of channels, sample rate, number of bits per channel, frame rate, frame size in bytes, and byte-order. The only encoding technique this paper is concerned with is pulse code modulation (PCM). The number of channels denotes whether a sampled sound is mono or stereo and the number of channels involved. The sample rate is the number of samples per second per channel. The number of bits per channel denotes how many binary numbers are stored per channel. The frame rate in PCM is equal to the sample rate, and the frame size in bytes is the number of channels multiplied by the sample size in bits, divided by the number of bits in a byte. The byte-order is the order in which bytes are stored.1
Byte-order can be Big Endian or Little Endian. Little Endian means that the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address.2 For example the Little Endian 4-byte integer Byte3 Byte2 Byte1 Byte0 would be arranged in memory as Byte0 Byte1 Byte2 Byte3. Big Endian means that the high-order byte of the number is stored in memory at the lowest address, and the low-order byte at the highest address.3 For example the Big Endian 4-byte integer Byte3 Byte2 Byte1 Byte0 would be arranged in memory as Byte3 Byte2 Byte1 Byte0.
Basics of Java Sound Manipulation
/* This code is based on the example found at: http://www.jsresources.org/examples/ClipPlayer.java.html */
//Load the sound file File soundFile = new File("sin300.wav"); //Object that will hold the audio input stream for the sound Object currentSound; //Object that will play the audio input stream of the sound Clip clip; //Object that contains format information about the audio input stream AudioFormat format; //Load the audio input stream try { currentSound = AudioSystem.getAudioInputStream(soundFile); } catch(Exception e1) { System.out.println(“Error loading file”); } try { //Get the format information from the Audio Input Stream AudioInputStream stream = (AudioInputStream) currentSound; format = stream.getFormat(); //Get information about the line DataLine.Info info = new DataLine.Info(Clip.class, stream.getFormat(), ((int) stream.getFrameLength() * format.getFrameSize())); //Load clip information from the line information clip = (Clip) AudioSystem.getLine(info); //Write the stream onto the clip clip.open(stream); //make the current sound the clip currentSound = clip; //start the clip clip.start(); //loop the clip continuously clip.loop(Clip.LOOP_CONTINUOUSLY); } catch (Exception ex) { ex.printStackTrace(); currentSound = null; }
4
This code plays a given sound file over and over again by looping through the byte array of audio data that can be obtained through the AudioFormat object, but this code by itself is of little use since no operations are performed on the sampled sound. In order to perform operations on the sampled sound the FloatControl object has to be used. The FloatControl object provides control over a range of floating-point values,5 which are the types of data found in a sampled sound. The most straightforward method for frequency manipulation of a sampled sound is changing the sample rate. The sample rate is the rate at which periodic measurements are taken of instantaneous amplitudes of an electric signal.6 Doubling the sample rate of a sampled sound causes the frequency to double and the duration to be halved. Halving the sample rate of a sampled sound causes the frequency to be halved and the duration to double. Manipulating the sample rate using the FloatControl object can be done in the following way:
//Get the sampling rate from the AudioFormat object in the code above and double it float newRate = format.getSampleRate() * 2; //Create the FloatControl object with the clip object from the code above FloatControl rateControl = (FloatControl) clip.getControl(FloatControl.Type.SAMPLE_RATE); //change the sampling rate rateControl.setValue(newRate);
7
If the Clip and AudioFormat objects are made global, this code can be called as a thread while the sampled sound is playing to change its frequency in real-time. The other sampled sound manipulation needed is the ability to change sound level. Sound level manipulation can also be done through using the FloatControl object:
//specify a new sound level from 0 to 100 double gain = 80.0; //create the gain control object using the clip object from the code above FloatControl control = (FloatControl) clip.getControl(FloatControl.Type.MASTER_GAIN); //convert the given sound level to decibel gain = gain / 100; float dB = (float) (Math.log(gain == 0.0 ? 0.0001 :gain) / Math.log(10.0)*20.0); //change the sound level control.setValue(dB);
8
Sample Rate Manipulation Issues
Changing the frequency of a sampled sound by modifying the sample rate can cause some potential problems though, especially when dealing with sounds that have complex harmonic spectrums. A single sound such as that of musical instrument contains a multitude of additional frequencies that make up its harmonic spectrum. The problem is that a sampled sound system can only handle sound data where the maximum frequency content is half that of the sample rate, which is called the Nyquist limit. Frequencies above the Nyquist limit create unwanted signals in the band of interest creating an effect called aliasing.9 An example of aliasing can be seen in almost any car commercial if you stare at the wheels of a car that is moving at a constant velocity. You will notice that the wheels rotate forward for a short time, slow down, and then appear to move backwards. This effect is the result of the frequency of wheel rotation being higher than half that of the frame rate (sampling rate) of the camera.
The solution to the aliasing problem is to use an anti-aliasing filter. Anti-aliasing works by using the Fast Fourier Transform (FFT) to break a digital signal into its basic waves (harmonics) and their corresponding amplitudes, and blocking out those harmonics that are at frequencies above the Nyquist limit.10
The Discrete Fourier Transform
Fourier transforms are used to convert digital signals from the time domain into the frequency domain by converting the signal information to a magnitude and phase component of each frequency.11 Magnitude describes a number assigned to the ratio of two quantities,12 while phase describes a measure of relative time between corresponding points on two identical waves.13 For my purposes though I am only interested in the amplitudes of all harmonics within a sound, so I can use a Discrete Fourier Transform (DFT). Signal analysis using DFT works in the following way:
Let x(t) represent the function of a digital signal
Where t is a time interval ranging [0,T], meaning that t can be any value from 0 to T
The digital signal represented by the function x(t) is sampled at n equidistant points in time
The function x(t) will be calculated for each time interval up to the sampling rate – 1
With values of t being represented by h where h = T/ n
For x0 = x(0), x1 = x(h), x2 = x(2h), … , xn-1 = x( (n-1)h )
A DFT, represented by fj, is then calculated for each value of xj
14
By using Euler’s formula on the DFT equation:
The DFT equation can be simplified to:
The DFT Algorithm
Before expressing DFT as an algorithm data storage and representation must be handled. Also the digital signal first must be stored in an array of bytes. Getting a sound file in this format is similar to what was done in the previous example, with a few major changes:
/* This code is based on the example found at: http://www.jsresources.org/examples/SimpleAudioPlayer.java.html */
//Create a global buffer size private static final int EXTERNAL_BUFFER_SIZE = 128000; //Get the location of the sound file File soundFile = new File("sin300.wav"); //Load the Audio Input Stream from the file AudioInputStream audioInputStream = null; try { audioInputStream = AudioSystem.getAudioInputStream(soundFile); } catch (Exception e) { e.printStackTrace(); System.exit(1); } //Get Audio Format information AudioFormat audioFormat = audioInputStream.getFormat(); //Handle opening the line SourceDataLine line = null; DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat); try { line = (SourceDataLine) AudioSystem.getLine(info); line.open(audioFormat); } catch (LineUnavailableException e) { e.printStackTrace(); System.exit(1); } catch (Exception e) { e.printStackTrace(); System.exit(1); } //Start playing the sound line.start(); //Write the sound to an array of bytes int nBytesRead = 0; byte[] abData = new byte[EXTERNAL_BUFFER_SIZE]; while (nBytesRead != -1) { try { nBytesRead = audioInputStream.read(abData, 0, abData.length); } catch (IOException e) { e.printStackTrace(); } if (nBytesRead >= 0) { int nBytesWritten = line.write(abData, 0, nBytesRead); } } //close the line line.drain(); line.close();
19
With the sound objects such as AudioFormat and AudioInputStream initialized, some of the variables needed for DFT computations can be calculated:
//Calculate the sample rate float sample_rate = audioFormat.getSampleRate(); System.out.println("sample rate = "+sample_rate); //Calculate the length in seconds of the sample float T = audioInputStream.getFrameLength() / audioFormat.getFrameRate(); System.out.println("T = "+T+ " (length of sampled sound in seconds)"); //Calculate the number of equidistant points in time int n = (int) (T * sample_rate) / 2; System.out.println("n = "+n+" (number of equidistant points)"); //Calculate the time interval at each equidistant point float h = (T / n); System.out.println("h = "+h+" (length of each time interval in seconds)");
20
When this code is run using the example sampled sound (sqr-10Hz.wav) it prints out the DFT variables calculated so far:
sample rate = 44100.0 T = 1.7329932 (length of sampled sound in seconds) n = 38212 (number of equidistant points) h = 4.535207E-5 (length of each time interval in seconds)
This array of bytes though does not directly contain signal data per time interval, which is needed in order to perform DFT. The array of bytes must be converted to Endian format where the specific type of Endian is dependant on the original encoding of the sampled sound. Each value in the byte array is 8 bits or 1 byte, and Endian values are stored in the format of 16 bits or 2 bytes, meaning that it takes 2 byte values from the byte array to represent the signal data at a given point in time, where the signal data at a given point in time is a single Endian value. The conversion can be done in the following way:
//Determine the original Endian encoding format boolean isBigEndian = audioFormat.isBigEndian(); //this array is the value of the signal at time i*h int x[] = new int[n]; //convert each pair of byte values from the byte array to an Endian value for (int i = 0; i < n*2; i+=2) { int b1 = abData[i]; int b2 = abData[i + 1]; if (b1 < 0) b1 += 0x100; if (b2 < 0) b2 += 0x100; int value; //Store the data based on the original Endian encoding format if (!isBigEndian) value = (b1 << 8) + b2; else value = b1 + (b2 << 8); x[i/2] = value; }
21
The array x now contains the signal values at each time interval, so the DFT can be performed on each value of x.
It should also be noted that only the first half of the equidistant points are useable due to the Nyquist Criterion, because data will only be reliable for half of the sampling frequency (sample rate). The frequency is also a combination of placement along the time domain, sample rate, the number of equidistant points, and the length of each equidistant point in seconds.
//do the DFT for each value of x sub j and store as f sub j double f[] = new double[n/2]; for (int j = 0; j < n/2; j++) { double firstSummation = 0; double secondSummation = 0; for (int k = 0; k < n; k++) { double twoPInjk = ((2 * Math.PI) / n) * (j * k); firstSummation += x[k] * Math.cos(twoPInjk); secondSummation += x[k] * Math.sin(twoPInjk); } f[j] = Math.abs( Math.sqrt(Math.pow(firstSummation,2) + Math.pow(secondSummation,2)) ); double amplitude = 2 * f[j]/n; double frequency = j * h / T * sample_rate; System.out.println("frequency = "+frequency+", amp = "+amplitude); }
22
Testing the DFT Algorithm
Given a sampled sound in which the frequency is known (sqr-10Hz.wav), a sound containing a 10 Hz square wave, the validity of this process can be tested.
Because this process generated a lot of data, it is easier to view the result as a plot instead of a 38,212 line long list of frequencies and amplitudes. I used a free plotting package called JFreeChart (www.jfree.org) to plot the resulting data.
Time Domain:
Frequency Domain:
Frequency Domain Zoomed In:
20 comments:
Hi John, I just read this article. It seems interesting because I am doing sound programming with java. By the way, I just want to count up the amplitude and frequency level only. I need the full source of this article. Can you help me to send it through my email (ichiwan@gmail.com)?
Thank you very much :)
It has been a while since I wrote this, but all of the code for performing DFT (minus the plotting) is here. I will see if I can locate the old code for this and send it your way though.
Hi, John.
Some of the code is deleted - I am guessing because it is confused to be html (or xhtml) code.
Especially, in the sixth block of code (in the for and if).
Could you provide the original code.
Thank you, sincerely.
Thanks for the code. It really helped me get going with a project i have regarding DSP in java for speech signals (for school). I still have a question that I sincerely hope you'll have the time and disposition to answer. Does the code work for any wav file? I tried it for an 8-bit 22KHz wav and worked fine, but on 16 bit or 24 it seems it doesn't. I'm guessing it has something to do with the conversion from byte to endian values. If you could help me on this, I'd really appreciate it. Anyhow, thank you and I await your reply
The code should pick up the sampling rate, but you might be exceeding the buffer size when using larger sound files.
Hello:
I have a doubt of my understanding of your code.
If I want to plot the frequency response for each sample, how i do?
Like Spectrum Analyzer of Winamp.
If you play a Sound File then Spectrum Analyzer of Winamp varies according to position of reader of Sound File, in other words, the Spectrum varies of Bytes read (in theory).
I need to know if your code cumply with that, or I need to overwrite it.
Thanks...
Hi John!
First and the last internet poster, I found yours very helpful and useful
I'm trying to understand the whole concept of frequency/amplitude exporting to the file so I can display wave form audio spectrum.
Would you be able to send full source code as a great example for me. I appreciate your help.
jekyjaw@live.com
thanks for the tutorial. But may i have the original code? This is my email (cell_st@hotmail.com)
As i am working on a program to retrieve the frequency for my assignment. But i can't run the program as the code is not complete.
Many thanks!
Hi John, I've read this article a while ago and I'm really interested for I am doing some digital signal processing using java. However, some of the equation where missing or has been blocked, so if you don't mind could I also have the full article and code. Here is my email account(naturejan08@gmail.com)
Thank you very much for the code John.But there is some part of code that is not visible properly or is deleted, can you please mail me the code in article? My mail id is pankajjagdale90@gmail.com
Thank you very much for the code John.Its a very nice article, but some part of the code is not visible or is deleted, so can you please mail me the code in this article? My mail id is pankajjagdale90@gmail.com
Great job man. So I have a couple questions, first in this part of your code here...
//convert each pair of byte values from the byte array to an Endian value
for (int i = 0; i < style="color: rgb(0, 0, 255);">int b1 = abData[i];
int b2 = abData[i + 1];
if (b1 < style="color: rgb(0, 0, 255);">if (b2 < style="color: rgb(0, 0, 255);">int value;
What the h*** is going on there? Is that some sort of pseudo code? Because I sort of understand what you're trying to say but I've never seen anything like that in Java. I'd just like to make sure I have a thorough understanding but more importantly, I'm trying to implement an equalizer, hence why I'm researching frequencies and amplitudes.
For example with a 16 band EQ there would be 16 points between the lowest and highest frequencies at which you could change the amplitude of a specific frequency range without affect the amplitude of the other frequency ranges, well more or less. Is some sort of filter necessary to do this to split it up into separate AudioInputStreams? How would you go about implementing something like an equalizer? Thanks for all the great information.
Sorry, it has been a while since I have had the time to respond to the questions on this post.
1. The equation images should again show in the post. They were hosted on another site that stopped allowing external linking. They now are hosted on this blog.
2. Those RGB statements were leftovers from when I converted to blogspot that were used to do CSS coloring of the Java code. I removed all of the color formatting so that this won't be in issue anymore.
3. I no longer have the source code to this project. It was from 10 years ago and it was probably lost when I had a hard drive die around that time. Al of the important parts are here though. The original code only put these examples inside of a JFrame and rendered the data to JFreeChart.
4. It is not possible to use this code to break out individual frequencies in order to increase the amplitude of a specific frequency. The DFT application here takes a single sampled sound in a WAV file format and converts the frequency domain to the time domain. The longer the time of the sampled sound, the higher the frequency resolution of the resulting transform.
While it is possible to do what you are asking, DFT would only be a small part of it. There are other transforms that may be better suited for moving back and forth as well.
Thanks for the reply John. Now there is no issue with the visibility of the code and the equations.
Hi John, i just read the article, it's really very much informational.. I was doing something similar too, so if possible could you please help me with the complete source?
My email id is: thakurvibhu@yahoo.com
Thank you... :)
Hi john, It's awesome, but could you tell me what library should I import there beside java.io?
thank you dude!
Hi john
you are great.
thank you for your post i am working with my project can you send me the complete codes??because when i run your codes their is a lot of error and i can't run it..this is my email
(jessicatamayo1989@yahoo.com)
thank you so much in advance.
can anybody here send me the complete code on saopayne@gmail.com
Can you send the complete code please javihashtag@gmail.com
Unfortunately I no longer have the source code to this project. It was from 10+ years ago and it was probably lost when I had a hard drive die around that time. All of the important parts are here though. The original code only put these examples inside of a JFrame and rendered the data to JFreeChart.
Post a Comment