Abstract: In this paper we apply time-domain and frequency-domain approaches to pitch extraction from Indian folk music with a pitched instrument in the background. After extracting the pitch, we compare the results. We observe that several melody extraction algorithms are based on pitch extraction or fundamental frequency estimation, and that much more advanced algorithms are now used for the same purpose. Considerable research has been done on a variety of music, but we found few articles on Indian folk music. In our work we therefore apply traditional, well-established algorithms for pitch extraction and fundamental frequency estimation to Indian folk music.
Abstract: In this paper we design a robust method for processing speech in cellphone communication using linear predictive analysis and synthesis. A set of speech parameters (pitch, gain, and prediction filter coefficients) is computed for every 30 ms of input speech and transmitted to the receiver. At the receiver, speech is produced from the transmitted parameters. Linear predictive synthesis at the receiver is performed with the original fundamental frequency, or pitch. Pitch at the transmitter is estimated using the cepstrum method. We implemented the method on speech samples of both male and female speakers and showed that it is robust. Both subjective and objective analyses were performed on the experimental results.
is comparable to a traditional wide-band spectrogram, except that the reduction in variance afforded by the multitaper procedure results in a smoother image quality. Typically, spectral details are sharpened but also broadened by the rectangular spectral kernel, so formant frequencies are clearer but formant bandwidths may be artificially enlarged. Thus, multitaper spectral estimates are not usually suitable for locating precise peaks, such as formants or harmonics. On the other hand, the multitaper F-test spectrogram resembles a traditional narrow-band spectrogram, but the harmonics are much sharper and the unvoiced portions of speech do not appear; consequently, the F-test statistic is usually better for locating harmonics than a direct spectral estimate. Figure 9 shows the F0 contour and the amplitudes of the first five harmonics estimated using the multitaper harmonic analysis procedure for the same phrase; tests were constructed at the 10% significance level for 10 harmonic components of a fundamental frequency lying in the range 0-200 Hz, at intervals of 0.5 Hz. Individual harmonics are tracked throughout each consonant, and localized perturbations in frequency and amplitude are clearly visible during periods of oral constriction. The envelope of the harmonic spectrum is determined by the transfer function of the vocal tract and the glottal source spectrum; higher harmonics often appear to cut off before lower harmonics at fricative boundaries, which may indicate a change in the spectral slope of the source as vocal fold vibration is inhibited. Information about individual harmonics is not available when using traditional pitch-tracking algorithms, but can be obtained automatically using multitaper analysis.
As for the heterosexual-homosexual evaluation presented in Figure 4, the raters were more accurate in identifying sexual orientation from the original, unmanipulated voice recordings (HEMO, HEFO, HOMO). In recordings with manipulated fundamental frequency (F0) (HEMP, HEFP, HOMP), the raters generally identified the female voices that had been converged towards male voices as homosexual. The raters identified the homosexual voices that had been converged towards male voices almost equally often as homosexual (N = 13) and heterosexual (N = 17). In the recordings in which the homosexual voice had been converged towards the male voice in both F0 and pitch range (HOMPR), the number of raters who identified the voices as homosexual decreased dramatically (Figure 4).
In two typical cases, the extra estimated pitches can be removed based on the above assumptions. In the first case, the extra pitch estimate is caused by a noise peak in the preliminary pitch estimation. In the second case, the harmonic components of an extra estimated pitch are partly overlapped by the harmonic components of the true pitches. In such a case, the non-overlapped harmonic components become important clues for checking whether the extra estimated pitch exists. If a polyphonic set of notes contains two concurrent notes C5 and G5, for example, the fundamental frequency ratio of the two notes is nearly 2/3. It is then probable that there is an extra pitch estimate on the C4 note, because its even harmonics are overlapped by the harmonics of C5, and its third, sixth, ninth, and so forth, harmonics are nearly overlapped by the harmonics of G5. However, the first, fifth, and seventh harmonics of C4 are not overlapped, so the extra C4 estimate can be easily identified by checking for the existence of the first harmonic component based on the above assumption.
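The C4/C5/G5 example can be checked numerically. The sketch below is only an illustration, not the authors' procedure: it assumes equal-tempered tuning with A4 = 440 Hz, a 1% frequency tolerance for "nearly overlapped", and hypothetical helper names (`midi_to_hz`, `free_harmonics`).

```python
def midi_to_hz(note, a4=440.0):
    """Equal-tempered frequency of a MIDI note number (assumption: A4 = 440 Hz)."""
    return a4 * 2.0 ** ((note - 69) / 12.0)


def free_harmonics(candidate_hz, true_hz, n_harmonics=9, tol=0.01):
    """Return the harmonic numbers of the candidate pitch that are NOT within
    a relative tolerance `tol` of any harmonic of the true pitches."""
    free = []
    for h in range(1, n_harmonics + 1):
        f = h * candidate_hz
        overlapped = any(
            abs(f - m * t) <= tol * f
            for t in true_hz
            for m in range(1, n_harmonics + 1)
        )
        if not overlapped:
            free.append(h)
    return free


c4, c5, g5 = midi_to_hz(60), midi_to_hz(72), midi_to_hz(79)
# Harmonics of the suspected extra pitch C4 that neither C5 nor G5 can explain.
print(free_harmonics(c4, [c5, g5]))
```

Under these assumptions the non-overlapped harmonics among the first nine are the first, fifth, and seventh, so the absence of energy at the C4 fundamental is enough to reject the extra estimate.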
One of the oldest methods for pitch estimation is the comb filtering method [21,22], which is based on the following ideas. Mathematically, we can express periodicity as x(n) ≈ x(n - D), where D is the repetition or pitch period. From this observation it follows that we can measure the extent to which a waveform is periodic using a metric on the error e(n), defined as e(n) = x(n) - a x(n - D). The Z-transform of this is E(z) = X(z)(1 - a z^{-D}). This shows that matching a signal with a delayed version of itself can be seen as a filtering process, where the output of the filter is the modeling error e(n). It can also be seen as a prediction problem in which the unknowns are not just the filter coefficient a but also the lag D. If the pitch period is exactly D, the output error is just the observation noise. Usually, however, the comb filter is not used in this form, as it is restricted to integer pitch periods and is rather inefficient in several ways. Instead, one can derive more efficient methods based on notch filters. Notch filters cancel out or, more correctly, attenuate signal components at certain frequencies. Periodic signals can be composed of a number of harmonics, for which reason we use L_k such filters having notches at frequencies
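The integer-lag comb filter idea, e(n) = x(n) - a x(n - D), can be sketched as a brute-force search over D. This is a minimal illustration rather than the method of [21,22]: the least-squares choice of a for each lag, the lag search range, and the function names are assumptions made here.

```python
import math
import random


def comb_filter_error(x, lag):
    """Mean squared error of e(n) = x(n) - a*x(n - lag), with the
    coefficient a chosen by least squares for this lag."""
    cur = x[lag:]
    past = x[:len(x) - lag]
    denom = sum(p * p for p in past)
    a = sum(c * p for c, p in zip(cur, past)) / denom if denom > 0 else 0.0
    err = sum((c - a * p) ** 2 for c, p in zip(cur, past)) / len(cur)
    return err, a


def estimate_pitch_period(x, min_lag, max_lag):
    """Integer lag D in [min_lag, max_lag] that minimises the comb filter error."""
    return min(range(min_lag, max_lag + 1),
               key=lambda d: comb_filter_error(x, d)[0])


# Synthetic test signal: 160 Hz fundamental plus one harmonic and light noise.
fs = 8000
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 160 * n / fs)
          + 0.5 * math.sin(2 * math.pi * 320 * n / fs)
          + rng.gauss(0.0, 0.01)
          for n in range(800)]
lag = estimate_pitch_period(signal, 20, 80)
print(lag, fs / lag)  # estimated period in samples and pitch in Hz
```

The search range matters: any integer multiple of the true period also yields a small error, so the upper lag bound should stay below twice the longest expected period. The sketch also makes the integer-period restriction mentioned above visible, since only whole-sample lags are tried.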
were inserted with a 4 Hz modulation rate (unexpected), percentage-correct scores decreased by 10% compared with the target-only conditions at every modulation rate. When the expected modulation rate was 32 Hz or 256 Hz, mean performance decreased by 0-10% for probe rates at and above 16 Hz. Scores decreased considerably for modulation rates below 16 Hz. Mean performance for the 4 Hz rate was 58% correct when the 32 Hz rate was ‘expected’. When the 256 Hz rate was expected, performance was 62%, compared with 91% when the 4 kHz rate was presented alone. The findings indicated that modulation at unexpected rates at or above 16 Hz was detected only slightly more poorly than at expected modulation rates, regardless of the expected rate. The results could not be attributed to the idea that listeners hear both expected and unexpected amplitude-modulated signals equally well but reject the unexpected signals if they do not sound sufficiently like the expected tone (Scharf et al., 1987). The pattern of results depended on the modulation rate of the target. It was reported that it was difficult to see how the listener could reject the probe rate because it differed from the target rate in one condition but not in the other. This is especially so because the standard in the 2IFC task was an unmodulated noise, so any sound different from that standard could have been used as the detection cue. The results indicate that listeners may use two different cues for the detection of modulation: an individual fluctuation cue at low rates and a roughness or pitch cue at higher rates. The pitch or roughness explanation could be consistent with the results obtained for modulation rates of 32 or 256 Hz. Here mean performance was best for the unexpected rate of 64 Hz, a value close to the 70 Hz rate that produces the most roughness for broadband carriers (Fastl, 1977).
In the past decades, detection and tracking of the fundamental frequency (F0) has been an essential part of the Blind Signal Separation (BSS) and Music Information Retrieval (MIR) fields. Firstly, it is a basic building block at the semantic level, and many features are based on it; for example, pitch-based features make retrieval easier because pitch can be applied directly to music. Secondly, pitch tracking can be used in many applications, such as humming detection and polyphonic music identification. Thirdly, pitch is generally independent of other music research directions (timbre, beat, rhythm, chord, melody), so pitch methods can be combined with methods from those directions. At present, F0 tracking can be achieved using many methods, such as probabilistic latent component analysis (PLCA), Non-negative Matrix Factorization (NMF), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Hidden Markov Models (HMM).
We investigated three scenarios. In scenario one, we applied all algorithms to the raw speech data. In scenario two, the datasets were band-pass filtered using a sixth-order Butterworth filter with a lower corner frequency of 50 Hz and an upper corner frequency of 500 Hz. This frequency range contains all typical fundamental frequencies of the human voice. In scenario three, frequency shaping was used in addition to the band-pass filter. This frequency shaping is a simple low-pass filter above 50 Hz that attenuates higher-frequency components at 6 dB/oct. To measure the performance of the pitch detection algorithms, the gross error is calculated. This error measures how often the estimate deviates by more than 20 % from the reference fundamental frequency and is frequently used in the literature [11, 18, 19]. Consistent with other studies [11, 19], this investigation considered only the voiced parts of the speech signals; unvoiced sequences were not taken into account.
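The gross error measure described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the evaluation code of the study: frames are assumed to be aligned one-to-one, unvoiced frames are assumed to be marked by a reference value of 0, and the function name is hypothetical.

```python
def gross_error_rate(estimated_f0, reference_f0, threshold=0.20):
    """Fraction of voiced frames whose estimate deviates from the reference
    F0 by more than `threshold` (20 % by default).

    Assumption: a reference value of 0 marks an unvoiced frame, which is skipped.
    """
    if len(estimated_f0) != len(reference_f0):
        raise ValueError("estimate and reference must have the same number of frames")
    voiced = [(e, r) for e, r in zip(estimated_f0, reference_f0) if r > 0]
    if not voiced:
        return 0.0
    gross = sum(1 for e, r in voiced if abs(e - r) > threshold * r)
    return gross / len(voiced)


reference = [100.0, 100.0, 0.0, 200.0, 200.0]   # third frame is unvoiced
estimate = [100.0, 125.0, 300.0, 210.0, 100.0]  # one octave error in the last frame
print(gross_error_rate(estimate, reference))
```

In this toy input, two of the four voiced frames exceed the 20 % bound (125 Hz against 100 Hz, and 100 Hz against 200 Hz), while the estimate in the unvoiced frame is ignored.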
A previous study investigated tone realization in continuous utterances in Yorùbá, examining the features that influence syllable pitch targets in a small speech corpus of 4 speakers. It found that the pitch level of the previous syllable is strongly correlated with the pitch change between syllables, and it evaluated a number of approaches and features in this context. The resulting models were used to predict utterance pitch targets for speech synthesizers.
0°), the fundamental frequency is changed from 50 to 45 Hz at 0.15 s. The results are presented in Fig. 8. Both the proposed method and the DDSRF-PLL quickly and accurately detect the positive-sequence component, even when the grid frequency deviates from its nominal value. The DSOGI-PLL, on the other hand, generates significant fluctuations in the frequency and d-axis voltage signals. In fact, it takes more than 1 s for the DSOGI-PLL to reach zero steady-state error. The DSOGI-PLL is therefore shown to have very weak robustness against frequency deviations.
The cry analysis presented in this paper falls within research on the relationship between disease and the characteristics of the cry. Newborns do not have phonatory control because of neurological immaturity at an early age. The main objective of this work is to analyze the cries of healthy newborns and of newborns with different categories of disease in order to evaluate the fundamental frequency of these cries. We also establish quantitative relationships between the different modes of cry and the pathologies studied. The SIFT (Simplified Inverse Filter Tracking) algorithm is used to estimate the fundamental frequency because its performance has been tested on a real database of cries by [4,5].
In this experiment, the effect of frequency on the synthesis of vowels was investigated. Five vowels, ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’, were synthesized with glottal wave frequencies ranging from 60 Hz to 520 Hz. The synthesized vowels were analyzed using the Praat tool. Speech parameters such as intensity, maximum and minimum pitch, and first formant frequency were computed. Table 2 shows these speech parameters as a function of glottal wave frequency for the vowel ‘a’. The table shows that the speech parameters vary irregularly as the glottal wave frequency increases. The maximum pitch is seen at a glottal wave frequency of 478 Hz. The maximum intensity of 88 dB is observed at 520 Hz. The highest first formant frequency, 683 Hz, is observed at 359 Hz.
1.1 Example of utilization of the Goertzel algorithm: DTMF
The Goertzel algorithm is typically used for frequency detection in telephone tone dialing (dual-tone multi-frequency, DTMF), where the meaning of the signaling is determined by two out of a total of eight frequencies being simultaneously present. The frequencies of each of the two groups of four signaling tones were chosen such that the frequencies of their higher harmonics or intermodulation products were sufficiently distant. The frequencies chosen for DTMF have a large least common multiple. Hence, using a digital receiver with a sampling frequency of 8 kHz, the period of the DTMF signal amounts to several tens of thousands of samples. In practice, however, the transform length N must be much smaller, so the effect of spectral leakage will naturally appear. For example, with N = 205, instead of the accurate frequency 770 Hz, the modulus at approximately 780.5 Hz (= 20·8000/205) is computed. This situation is illustrated in Figure 1, where it is evident that the maximum occurs at a non-integer multiple of the fundamental frequency.
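The bin mismatch for N = 205 can be reproduced with a short Goertzel sketch. This is a minimal illustration of the standard second-order Goertzel recursion, not code from the paper; the function names and the pure 770 Hz test tone are assumptions introduced here.

```python
import math


def nearest_bin(freq_hz, fs, n):
    """Index k of the DFT bin closest to freq_hz for transform length n."""
    return round(n * freq_hz / fs)


def bin_frequency(k, fs, n):
    """Centre frequency in Hz of DFT bin k."""
    return k * fs / n


def goertzel_power(samples, k):
    """Squared magnitude |X(k)|^2 of DFT bin k via the Goertzel recursion."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


fs, n = 8000, 205
k = nearest_bin(770.0, fs, n)
print(k, bin_frequency(k, fs, n))  # bin index and the frequency actually evaluated

# A pure 770 Hz tone: compare the power in its nearest bin with a neighbouring bin.
tone = [math.sin(2.0 * math.pi * 770.0 * i / fs) for i in range(n)]
print(goertzel_power(tone, k), goertzel_power(tone, k - 2))
```

Because 770 Hz does not fall on an integer bin, part of the tone's energy leaks into neighbouring bins; the detector still works because the nearest bin receives far more power than the others.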
Abstract. The fundamental frequencies of variable-thickness truncated conical shells with different boundary conditions are studied by combining vibration theory with the generalized differential quadrature method, which is applied to discretize the derivatives in the governing equations. The discretization of the system leads to a standard linear eigenvalue problem. The coefficients of the governing equations are obtained by theoretical derivation, and different boundary conditions are considered. The work provides a theoretical basis for designing conical shells with good structural performance.
Distributed Generation systems. For a perfectly matched load condition, the frequency deviation during islanding is so small that it falls inside the non-detection zone (NDZ). Injecting a small-magnitude disturbance signal at a frequency other than the fundamental frequency is reflected in the frequency at the point of common coupling (PCC), so that islanding can be detected. The frequency deviation is measured using an average absolute frequency deviation value (AFDV_avg). It
unstable zones, the roll response was almost zero. In the case of the pitch mode, we saw a predominantly first-order pitch response in regular waves for all wave excitation frequencies, except at and around the tuning factor of 2, where the pitch responded mainly at the pitch natural frequency because of parametric motion. In the case of long-crested waves, we see that most of the pitch motion energy is concentrated at the pitch natural frequency, not just at the tuning factor of 2 but also at tuning factors of 4.7059 and 1.7665. As with the roll mode, there will always be some level of excitation at twice the pitch natural frequency because of broadband excitation, and hence there will always be some level of parametric pitch. A time-frequency analysis of test T34 is shown in Fig. 7.8 to illustrate how the response spectra of the different modes evolve over time. The tuning factor for this test is 2, and we can see that the roll motion responds mainly at its own natural frequency for the duration of the test. For the first 200 s, the pitch motion responds with a low amplitude at both the wave excitation frequency and the pitch natural frequency. After 200 s, the largest pitch response amplitude is at the pitch natural frequency, which is typical of parametric motion. The heave motion in this test responds at the wave peak frequency (ω_p = 2.137 rad/s) and at the heave natural frequency of 2.199 rad/s.
The comparison of pitch angle with flapping angle at varying lag angles and at varying link lengths is plotted for reference, and the results, obtained from the code FMAV1 developed by the authors in MATLAB, are discussed. The link-length dimensions best suited to Grashof's criterion and a rotary input speed of 2000 rpm were used as inputs.
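For readers unfamiliar with the link-length condition mentioned here, the standard Grashof condition for a planar four-bar linkage states that the shortest and longest links together must not be longer than the other two together (s + l ≤ p + q) for at least one link to rotate fully. The sketch below is a generic check, not part of FMAV1; the function name and the example link lengths are hypothetical.

```python
def satisfies_grashof(link_lengths):
    """Return True if four link lengths satisfy s + l <= p + q, where s is the
    shortest link, l the longest, and p and q the two remaining links."""
    if len(link_lengths) != 4:
        raise ValueError("a four-bar linkage needs exactly four link lengths")
    s, p, q, l = sorted(link_lengths)
    return s + l <= p + q


# Hypothetical link lengths in consistent units.
print(satisfies_grashof([2.0, 7.0, 9.0, 6.0]))   # 2 + 9 <= 6 + 7
print(satisfies_grashof([3.0, 5.0, 6.0, 10.0]))  # 3 + 10 > 5 + 6
```

A crank driven by a continuously rotating input, such as the 2000 rpm rotary input above, needs a linkage that passes this check with the crank as the shortest link.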
Variation of natural frequencies for steel and composite materials is shown in Figure 5. HS carbon epoxy composite shows excellent material properties for the design of a single-piece composite drive shaft that meets the stringent design requirements for heavy vehicles. To avoid whirling or resonance vibration, the bending natural frequency should be higher than 2400-4000 rpm for trucks and vans, and the torque transmission capability should be higher than 154 Nm. The HS carbon epoxy composite fulfills these technical requirements. Its bending natural frequency is 10930 rpm, much higher than 2400 rpm, which reduces the chance of whirling or resonance. The torque transmission capability of the single-piece drive shaft was taken as 245 Nm.
The right-half-plane zeros arising from the interaction between the drive-train dynamics and the tower, at above-rated wind speeds and at frequencies close to the tower frequency, can be removed by a control scheme called power coordinated control (PCC); see Figure 3. The control action of the PCC is achieved through a combination of pitch and torque demand. The element Y is designed as a low-pass filter or a notch filter centred at the tower frequency to reduce pitch activity in the vicinity of that frequency. The element X is applied to the torque demand such that the transmittance from its input to Ω_G is similar to the transmittance from β_d to Ω_G and the speed controller remains unchanged. For wind speeds just above rated in particular, the generator speed obtained using PCC is the same as that obtained using the speed controller alone. However, there can be large power fluctuations because the gain from T_d to Ω_G is much weaker than that from β_d to Ω_G. These fluctuations have a direct impact on drive-train components such as the gearbox and generator. They can be reduced by replacing the speed control loop with a power control loop. Since the power converter is relatively fast acting, torque fluctuations ΔT_G about T_G0 are relatively small compared to fluctuations ΔΩ_G about Ω_G0; thus, if P is well controlled then so is Ω_G, and the power control loop from Figure 3 is similar to the speed