Abstract: In this paper we apply time-domain and frequency-domain approaches to pitch extraction from Indian folk music with a pitched instrument in the background. After extracting the pitch, we also carry out a comparative study of the results. We observe that several melody extraction algorithms are based on pitch extraction or fundamental frequency estimation, and that much more advanced algorithms are now used for the same task. A great deal of research has been done on a variety of music, but we found few articles on Indian folk music. In our work we therefore apply traditional, well-established algorithms for pitch extraction and fundamental frequency estimation to Indian folk music.
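To make the time-domain side of this comparison concrete, the sketch below estimates the F0 of a single frame from the strongest autocorrelation peak inside an assumed 60 to 500 Hz search range. It is a textbook autocorrelation estimator, not the specific algorithm used in the paper; `autocorr_pitch` is a hypothetical helper, and real folk recordings with a pitched accompaniment would additionally need voicing decisions and octave-error handling.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the highest autocorrelation peak
    between lags fs/fmax and fs/fmin (the search range is an assumption)."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # r[lag] for lag >= 0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

# Synthetic 40 ms frame (640 samples at 16 kHz): 200 Hz fundamental plus its second harmonic.
fs = 16000
t = np.arange(640) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(autocorr_pitch(frame, fs))
```

Because the biased autocorrelation shrinks with lag, the peak at one period (80 samples here) beats the peaks at two or three periods, which is what keeps this simple picker from jumping an octave down on clean signals.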

Abstract: In this paper we design a robust method for processing speech in cellphone communication using linear predictive analysis and synthesis. A set of speech parameters, such as pitch, gain, and prediction filter coefficients, is computed for every 30 ms of input speech and transmitted to the receiver. At the receiver, speech is reproduced from the transmitted parameters. Linear predictive synthesis at the receiver is performed with the original fundamental frequency (pitch). Pitch at the transmitter is estimated using the cepstrum method. We implemented the method on speech samples from both male and female speakers and showed that it is robust. Both subjective and objective analyses are performed on the experimental results.
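A minimal sketch of cepstrum-based pitch estimation on one 30 ms frame, assuming an 8 kHz sampling rate and a search range of roughly 60 to 400 Hz (neither is stated in the abstract): the real cepstrum is the inverse FFT of the log magnitude spectrum, and a periodic excitation appears as a peak at the quefrency equal to the pitch period. `cepstrum_pitch` is a hypothetical helper and omits the voicing decision that a complete LPC vocoder would need.

```python
import numpy as np

def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) from the peak of the real cepstrum within the
    quefrency range fs/fmax .. fs/fmin samples (capped at half the frame)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)   # small floor avoids log(0)
    ceps = np.fft.irfft(log_mag, n=len(x))              # real cepstrum; index = quefrency in samples
    q_min = int(fs / fmax)
    q_max = min(int(fs / fmin), len(x) // 2)            # cepstrum is symmetric beyond len/2
    q = q_min + int(np.argmax(ceps[q_min:q_max + 1]))
    return fs / q

# Synthetic 30 ms voiced frame (240 samples at 8 kHz): 19 equal-amplitude harmonics of 200 Hz.
fs = 8000
t = np.arange(240) / fs
frame = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 20))
print(cepstrum_pitch(frame, fs))
```

The harmonics produce a ripple in the log spectrum with a spacing of 200 Hz, and the inverse FFT turns that ripple into a peak at 8000/200 = 40 samples.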

is comparable to a traditional wide-band spectrogram, except that the reduction in variance afforded by the multitaper procedure results in a smoother image. Typically, spectral details are sharpened but also broadened by the rectangular spectral kernel, so formant frequencies are clearer but formant bandwidths may be artificially enlarged. Thus, multitaper spectral estimates are not usually suitable for locating precise peaks, such as formants or harmonics. On the other hand, the multitaper F-test spectrogram resembles a traditional narrow-band spectrogram, but the harmonics are much sharper and the unvoiced portions of speech do not appear; consequently, the F-test statistic is usually better for locating harmonics than a direct spectral estimate. Figure 9 shows the F0 contour and the amplitudes of the first five harmonics estimated using the multitaper harmonic analysis procedure for the same phrase; tests were constructed at the 10% significance level for 10 harmonic components of a fundamental frequency lying in the range 0-200 Hz, at intervals of 0.5 Hz. Individual harmonics are tracked throughout each consonant, and localized perturbations in frequency and amplitude are clearly visible during periods of oral constriction. The envelope of the harmonic spectrum is determined by the transfer function of the vocal tract and the glottal source spectrum; higher harmonics often appear to cut off before lower harmonics at fricative boundaries, which may indicate a change in the spectral slope of the source as vocal fold vibration is inhibited. Information about individual harmonics is not available from traditional pitch-tracking algorithms, but it can be obtained automatically using multitaper analysis.
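For reference, a direct multitaper estimate (not the harmonic F-test) can be sketched with NumPy alone: the Slepian (DPSS) tapers are taken from the standard tridiagonal eigenproblem and the eigenspectra are averaged, which gives the variance reduction and the roughly rectangular spectral kernel of half-width $W = NW \cdot f_s / N$ discussed in the paragraph. The helper names, `nw = 3`, and `k = 5` tapers are assumptions chosen for illustration.

```python
import numpy as np

def dpss_tapers(n, nw, k):
    """First k Slepian (DPSS) tapers of length n with time-half-bandwidth nw,
    taken as the top eigenvectors of the standard tridiagonal matrix."""
    w = nw / n                                           # half-bandwidth in cycles/sample
    i = np.arange(n)
    diag = ((n - 1 - 2 * i) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = i[1:] * (n - i[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                        # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T                        # shape (k, n), unit-energy rows

def multitaper_psd(x, fs, nw=3.0, k=5):
    """Average of k eigenspectra (two-sided density, not doubled for one-sided use)."""
    x = np.asarray(x, dtype=float)
    tapers = dpss_tapers(len(x), nw, k)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, eigenspectra.mean(axis=0) / fs

fs = 1000
t = np.arange(500) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 100 * t) + 0.1 * rng.standard_normal(t.size)
freqs, psd = multitaper_psd(x, fs)
print(freqs[np.argmax(psd)])     # expected near 100 Hz, within the kernel half-width
```

Averaging five nearly uncorrelated eigenspectra is what smooths the image; the price is that a line component is spread over about ±6 Hz here, which is why peak locations are better taken from the F-test.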

As for the heterosexual-homosexual evaluation presented in Figure 4, the raters were more accurate in identifying sexual orientation in the original, unmanipulated voice recordings (HEMO, HEFO, HOMO). In voice recordings with a manipulated fundamental frequency (F0) (HEMP, HEFP, HOMP), the raters generally identified the female voices that had been converged towards male voices as homosexual. The homosexual voices that had been converged towards male voices were identified almost equally often as homosexual (N = 13) and heterosexual (N = 17). In recordings in which the homosexual voice had been converged towards the male voice in both F0 and pitch range (HOMPR), the number of raters who identified the voice as homosexual decreased dramatically (Figure 4).

In two typical cases, the extra estimated pitches can be removed based on the above assumptions. In the first case, the extra pitch estimate is caused by a noise peak in the preliminary pitch estimation. In the second case, the harmonic components of an extra estimated pitch are partly overlapped by the harmonic components of the true pitches. In such a case, the non-overlapped harmonic components become important clues for checking whether the extra estimated pitch really exists. If a polyphonic set of notes contains two concurrent notes, C5 and G5, for example, the fundamental frequency ratio of the two notes is nearly 2/3. Then it is probable that there is an extra pitch estimate on the C4 note, because its even harmonics are overlapped by the harmonics of C5, and its third, sixth, ninth, and so forth, harmonic components are nearly overlapped by the harmonics of G5. However, C4's first, fifth, and seventh harmonic components are not overlapped, so the extra C4 estimate can easily be identified by checking for the presence of the first harmonic component based on the above assumption.
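The C4/C5/G5 overlap argument can be checked numerically. The sketch below uses equal-tempered note frequencies (A4 = 440 Hz) and an assumed 3 % relative tolerance for "overlapped"; `note_freq` and `overlapped_harmonics` are hypothetical helpers, not part of the original method.

```python
def note_freq(midi):
    """Equal-tempered frequency (Hz) of a MIDI note number, A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def overlapped_harmonics(f0, others, n_harm=12, tol=0.03):
    """Harmonic numbers of f0 lying within a relative tolerance `tol`
    of some harmonic of any note in `others` (tol is an assumption)."""
    hits = []
    for h in range(1, n_harm + 1):
        fh = h * f0
        for g in others:
            m = round(fh / g)                 # nearest harmonic number of the other note
            if m >= 1 and abs(fh - m * g) / fh < tol:
                hits.append(h)
                break
    return hits

C4, C5, G5 = note_freq(60), note_freq(72), note_freq(79)
covered = overlapped_harmonics(C4, [C5, G5])
free = [h for h in range(1, 13) if h not in covered]
print(covered)   # [2, 3, 4, 6, 8, 9, 10, 12]
print(free)      # [1, 5, 7, 11]
```

Every even harmonic of C4 lands on a harmonic of C5 and every multiple of three lands (within about 0.1 %) on a harmonic of G5, so only harmonics 1, 5, 7, 11, ... remain as evidence for or against a real C4.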

One of the oldest methods for pitch estimation is the comb filtering method [21,22], which is based on the following ideas. Mathematically, we can express periodicity as $x(n) \approx x(n - D)$, where $D$ is the repetition or pitch period. From this observation it follows that we can measure the extent to which a certain waveform is periodic using a metric on the error $e(n)$, defined as $e(n) = x(n) - a\,x(n - D)$. The Z-transform of this is $E(z) = X(z)\left(1 - a z^{-D}\right)$. This shows that matching a signal with a delayed version of itself can be seen as a filtering process, where the output of the filter is the modeling error $e(n)$. This can of course also be seen as a prediction problem, except that the unknowns are not just the filter coefficient $a$ but also the lag $D$. If the pitch period is exactly $D$, the output error is just the observation noise. Usually, however, the comb filter is not used in this form, as it is restricted to integer pitch periods and is rather inefficient in several ways. Instead, one can derive more efficient methods based on notch filters [23]. Notch filters are filters that cancel out, or, more correctly, attenuate signal components at certain frequencies. Periodic signals can be composed of a number of harmonics, for which reason we use $L$ such filters having notches at frequencies
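A direct, integer-lag version of this comb filter can be sketched as follows: for every candidate $D$ the gain $a$ is fitted by least squares and the normalized energy of $e(n)$ is compared. This is an illustrative sketch (the helper name and search range are assumptions), and it shares the limitations mentioned in the paragraph: only integer periods are tested, and multiples of the true period give equally small errors, so the search range has to exclude them.

```python
import numpy as np

def comb_filter_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Integer-lag comb-filter pitch estimate: for each candidate period D,
    fit the gain a by least squares, form e(n) = x(n) - a x(n - D), and
    keep the D with the smallest normalized error energy."""
    x = np.asarray(x, dtype=float)
    d_min, d_max = int(np.ceil(fs / fmax)), int(fs / fmin)
    best_d, best_err = None, np.inf
    for d in range(d_min, d_max + 1):
        cur, past = x[d:], x[:-d]                     # x(n) and x(n - D) on the overlap
        a = np.dot(cur, past) / np.dot(past, past)    # least-squares comb gain
        e = cur - a * past                            # comb-filter output e(n)
        err = np.dot(e, e) / np.dot(cur, cur)         # normalized error energy
        if err < best_err:
            best_d, best_err = d, err
    return fs / best_d, best_d

fs = 8000
t = np.arange(400) / fs                               # 50 ms frame
x = sum(np.sin(2 * np.pi * 250 * h * t) / h for h in range(1, 6))
print(comb_filter_pitch(x, fs, fmin=150.0, fmax=400.0))   # period of 32 samples
```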

were inserted with a 4 Hz modulation rate (unexpected), percentage-correct scores decreased by 10% compared with the target-only conditions at every modulation rate. When the expected modulation rate was 32 Hz or 256 Hz, mean performance decreased by 0-10% for probe rates at and above 16 Hz, but decreased considerably for modulation rates below 16 Hz. Mean performance for the 4 Hz rate was 58% correct when the 32 Hz rate was 'expected' and 62% when the 256 Hz rate was expected, compared with 91% when the 4 Hz rate was presented alone. These findings indicate that modulation at unexpected rates at or above 16 Hz was detected only slightly more poorly than modulation at the expected rate, regardless of the expected rate. The results could not be attributed to the idea that listeners hear both expected and unexpected amplitude-modulated signals equally well but reject the unexpected signals if they do not sound sufficiently like the expected tone (Scharf et al., 1987). The pattern of results was found to depend on the modulation rate of the target. It was reported to be difficult to see how the listener could reject the probe rate for being different from the target rate in one condition but not in the other. This is especially the case because in the 2IFC task the standard was an unmodulated noise, so any sound different from that standard could have been used as the detection cue. The results indicate that listeners may use two different cues for the detection of modulation: an individual fluctuation cue at low rates and a roughness or pitch cue at higher rates. The pitch or roughness explanation could be consistent with the results obtained for modulation rates of 32 or 256 Hz. Here, mean performance was best for the unexpected rate of 64 Hz, a value close to the 70 Hz rate that produces the most roughness for broadband carriers (Fastl, 1977).

Over the past decades, detecting and tracking the fundamental frequency (F0) has been an essential part of the Blind Signal Separation (BSS) and Music Information Retrieval (MIR) fields. First, pitch is a basic semantic-level feature on which many other features are built; for example, pitch-based features make retrieval easier because pitch can be applied directly to music. Second, pitch tracking is used in many applications, such as humming detection and polyphonic music identification. Third, pitch is a research direction largely independent of the others (timbre, beat, rhythm, chord, melody), so pitch methods can be combined with methods from those directions. At present, F0 tracking can be achieved with many methods [1], such as probabilistic latent component analysis (PLCA) [2], non-negative matrix factorization (NMF) [3], support vector machines (SVM), Gaussian mixture models (GMM), and hidden Markov models (HMM) [4].

We investigated three scenarios. In scenario one, we applied all algorithms to the raw speech data. In scenario two, the datasets were band-pass filtered using a sixth-order Butterworth filter with a lower corner frequency of 50 Hz and an upper corner frequency of 500 Hz. This frequency range contains all typical fundamental frequencies of the human voice. In scenario three, frequency shaping was used in addition to the band-pass filter. This frequency shaping is a simple low-pass filter with a corner frequency of 50 Hz that attenuates higher-frequency components by 6 dB/oct. To measure the performance of the pitch detection algorithms, the gross error is calculated. This error measures how often the estimate deviates by more than 20 % from the reference fundamental frequency, and it is frequently used in the literature [11, 18, 19]. Consistent with other studies [11, 19], this investigation considered only the voiced parts of the speech signals; unvoiced sequences were not taken into account.
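The gross error measure can be sketched as follows, assuming that unvoiced reference frames are marked with an F0 of 0 so that only voiced frames are scored. The helper name and the frame values are illustrative, not data from the study.

```python
import numpy as np

def gross_error_rate(f0_est, f0_ref, tol=0.20):
    """Fraction of voiced reference frames whose estimate deviates from the
    reference F0 by more than `tol` (20 %). Unvoiced frames are assumed to be
    marked with f0_ref == 0 and are skipped."""
    est = np.asarray(f0_est, dtype=float)
    ref = np.asarray(f0_ref, dtype=float)
    voiced = ref > 0
    rel_dev = np.abs(est[voiced] - ref[voiced]) / ref[voiced]
    return float(np.mean(rel_dev > tol))

ref = [100, 100, 0, 200, 200]      # 0 marks an unvoiced frame
est = [105, 210, 50, 150, 199]     # second frame: octave error; fourth frame: 25 % low
print(gross_error_rate(est, ref))  # 2 of 4 voiced frames -> 0.5
```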

Reference [3] investigated tone realization in continuous Yorùbá utterances, examining the features that influence syllable pitch targets in a small speech corpus of four speakers. It found that the pitch level of the previous syllable is strongly correlated with pitch changes between syllables, and a number of approaches and features were evaluated in this context. The resulting models were used to predict utterance pitch targets for speech synthesizers.

0°), the fundamental frequency is changed from 50 to 45 Hz at 0.15 s. The results are presented in Fig. 8. Both the proposed method and the DDSRF-PLL quickly and accurately detect the positive-sequence component signals, even when the grid frequency deviates from its nominal value. The DSOGI-PLL, on the other hand, generates significant fluctuations in the frequency and d-axis voltage signals; in fact, it takes more than 1 s for the DSOGI-PLL to reach zero steady-state error. The robustness of the DSOGI-PLL against frequency deviations is therefore shown to be very weak.

The cry analysis presented in this paper falls within research on the relationship between disease and the characteristics of the cry. Newborns do not have phonatory control because of neurological immaturity at an early age [3]. The main objective of this work is to analyze the cries of healthy newborns and of newborns with different categories of disease in order to evaluate the fundamental frequency of these cries. We also establish quantitative relationships between the different modes of cries and the pathologies studied. The SIFT (Simple Inverse Filtering Tracking) algorithm is used to estimate the fundamental frequency because its performance has been tested on a real database of cries [4,5].
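The core idea of SIFT (inverse filtering with a low-order LPC model, then autocorrelation of the residual) can be sketched as below. This simplified version omits the low-pass filtering, decimation, peak interpolation, and voicing decision of the full algorithm; the LPC order of 4, the 200 to 1000 Hz search range, and the helper names are assumptions for illustration.

```python
import numpy as np

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC; returns A = [1, a1, ..., ap] of A(z)."""
    w = x * np.hamming(len(x))
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]   # r[0..p]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def sift_like_pitch(x, fs, order=4, fmin=200.0, fmax=1000.0):
    """Inverse-filter with a low-order LPC model, then pick the residual's
    autocorrelation peak between lags fs/fmax and fs/fmin."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    A = lpc_coeffs(x, order)
    resid = np.convolve(x, A)[:len(x)]                 # e(n) = sum_k A[k] x(n - k)
    r = np.correlate(resid, resid, mode="full")[len(resid) - 1:]
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(x) - 1)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag

fs = 8000
t = np.arange(320) / fs                                # 40 ms frame
x = sum(np.sin(2 * np.pi * 400 * h * t) / h for h in range(1, 10))
print(sift_like_pitch(x, fs))
```

Flattening the spectral envelope before the autocorrelation is the point of the inverse filter: it stops strong formant resonances from creating autocorrelation peaks that are unrelated to the glottal period.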

In this experiment, the effect of frequency on the synthesis of vowels was investigated. Five vowels, ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’, were synthesized with glottal wave frequencies ranging from 60 Hz to 520 Hz. The synthesized vowels were analyzed using the Praat tool, and various speech parameters, such as intensity, maximum and minimum pitch, and first formant frequency, were computed. Table 2 shows these speech parameters as a function of glottal wave frequency for the vowel ‘a’. The table shows that, as the glottal wave frequency increases, the speech parameters vary irregularly. The maximum pitch occurs at a glottal wave frequency of 478 Hz, the maximum intensity of 88 dB at 520 Hz, and the highest first formant frequency of 683 Hz at 359 Hz.

1.1 Example of utilization of the Goertzel algorithm: DTMF

The Goertzel algorithm is typically used for frequency detection in telephone tone dialing (dual-tone multi-frequency, DTMF), where the meaning of the signaling is determined by two out of a total of eight frequencies being simultaneously present [5]. The frequencies of each of the two groups of four signaling tones were chosen such that the frequencies of their higher harmonics or intermodulation products are sufficiently distant. The periods of the DTMF frequencies have a large least common multiple; hence, with a digital receiver using a sampling frequency of 8 kHz, the period of a DTMF signal amounts to several tens of thousands of samples. In practice, however, the transform length N must be much smaller, so spectral leakage naturally appears. For example, with N = 205, instead of the exact frequency of 770 Hz, the modulus at approximately 780.5 Hz (= 20·8000/205) is computed. This situation is illustrated in Figure 1, where it is evident that the maximum occurs at a non-integer multiple of the fundamental frequency of the transform (8000/205 ≈ 39 Hz).
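The Goertzel recursion and the bin arithmetic of this example can be sketched in pure Python. `goertzel_power` is a hypothetical helper that returns the squared magnitude of a single DFT bin k without computing the full transform; the test tone is a synthetic 770 Hz sine.

```python
import math

def goertzel_power(samples, k, n):
    """|X[k]|^2 of the n-point DFT of `samples`, via the Goertzel recursion
    s[i] = x[i] + 2 cos(2*pi*k/n) s[i-1] - s[i-2]."""
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples[:n]:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

FS, N = 8000, 205
k = round(770 * N / FS)                 # 19.73 rounds to bin 20
print(k, k * FS / N)                    # bin 20 is centred at about 780.49 Hz

tone = [math.sin(2 * math.pi * 770 * i / FS) for i in range(N)]
powers = {b: goertzel_power(tone, b, N) for b in (18, 19, 20, 21, 22)}
print(max(powers, key=powers.get))      # the 770 Hz tone peaks in bin 20
```

Each bin costs one multiply-add loop over N samples, so checking the eight DTMF bins is far cheaper than a full 205-point DFT.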

Abstract. The fundamental frequencies of variable-thickness truncated conical shells with different boundary conditions are studied by combining vibration theory with the generalized differential quadrature (GDQ) method, which is used to discretize the derivatives in the governing equations. The discretization of the system leads to a standard linear eigenvalue problem. The coefficients of the governing equations are obtained by theoretical derivation, and different boundary conditions are considered. The work can provide a theoretical basis for designing conical shells with good structural performance.
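As an illustration of the discretization step only (not of the shell equations themselves), the sketch below builds first-order GDQ weighting coefficients with Shu's explicit formula on Chebyshev-Gauss-Lobatto points; second-order weights follow by matrix multiplication. The grid choice, the number of points, and the helper name are assumptions.

```python
import numpy as np

def gdq_first_derivative(x):
    """First-order GDQ weighting matrix (Shu's explicit formula):
    (D @ f)[i] approximates f'(x[i]) for samples f of a smooth function."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]            # x_i - x_j
    np.fill_diagonal(diff, 1.0)               # placeholder so products/divisions skip i == j
    m = np.prod(diff, axis=1)                 # M(x_i) = prod_{k != i} (x_i - x_k)
    D = (m[:, None] / m[None, :]) / diff      # a_ij = M(x_i) / ((x_i - x_j) M(x_j))
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))       # a_ii = -sum_{j != i} a_ij
    return D

n = 9
xs = 0.5 * (1.0 - np.cos(np.pi * np.arange(n) / (n - 1)))   # Chebyshev-Gauss-Lobatto points on [0, 1]
D1 = gdq_first_derivative(xs)
D2 = D1 @ D1                                                  # second-order weights
print(np.max(np.abs(D1 @ xs**5 - 5 * xs**4)))                 # round-off level: exact for degree <= n - 1
print(np.max(np.abs(D2 @ xs**3 - 6 * xs)))
```

Replacing every derivative in the governing equations by such weight matrices, and imposing the boundary conditions on the edge rows, is what turns the problem into the algebraic eigenvalue problem mentioned in the abstract.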

Distributed Generation systems. For a perfectly matched load condition, the frequency deviation during islanding is so small that it falls inside the non-detection zone (NDZ). A small-magnitude disturbance signal injected at a frequency other than the fundamental frequency is reflected in the PCC frequency, so islanding can be detected. The frequency deviation is measured using the average absolute frequency deviation value ($\mathrm{AFDV}_{\mathrm{avg}}$). It

unstable zones, the roll response was almost zero. In the case of the pitch mode, the pitch response in regular waves was predominantly first order for all wave excitation frequencies, except at and around a tuning factor of 2, where the pitch responded mainly at the pitch natural frequency due to parametric motion. In long-crested waves, most of the pitch motion energy is concentrated at the pitch natural frequency, not only at a tuning factor of 2 but also at tuning factors of 4.7059 and 1.7665. As with the roll mode, broadband excitation always provides some energy at twice the pitch natural frequency, so some level of parametric pitch will always take place. A time-frequency analysis of test T34 is shown in Figs. 7.8 to illustrate how the response spectra of the different modes evolve over time. The tuning factor for this test is 2, and the roll motion responds mainly at its own natural frequency for the duration of the test. For the first 200 s, the pitch motion responds with a low amplitude at both the wave excitation frequency and the pitch natural frequency. After 200 s, the largest pitch response amplitude is at the pitch natural frequency, which is typical of parametric motion. The heave motion in this test responds at the wave peak frequency ($\omega_p$ = 2.137 rad/s) and at the heave natural frequency of 2.199 rad/s.
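Why a tuning factor of 2 leads to pitch at its own natural frequency can be illustrated with a single-degree-of-freedom Mathieu-type model, $\ddot\theta + 2\zeta\omega_n\dot\theta + \omega_n^2\,[1 + h\cos(\omega_e t)]\,\theta = 0$, taking the tuning factor as $\omega_e/\omega_n$. This is a generic textbook model, not the vessel model behind test T34, and the damping ratio, modulation depth $h$, and initial condition below are assumed values.

```python
import math

def late_peak_pitch(tuning, h=0.2, zeta=0.01, wn=1.0, t_end=200.0, dt=0.01, theta0=0.01):
    """Integrate theta'' + 2*zeta*wn*theta' + wn^2 (1 + h cos(we t)) theta = 0
    with RK4 and return the peak |theta| over the last quarter of the run."""
    we = tuning * wn                                  # excitation (encounter) frequency
    def acc(t, th, om):
        return -2.0 * zeta * wn * om - wn**2 * (1.0 + h * math.cos(we * t)) * th
    t, th, om, peak = 0.0, theta0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1t, k1o = om, acc(t, th, om)
        k2t, k2o = om + 0.5 * dt * k1o, acc(t + 0.5 * dt, th + 0.5 * dt * k1t, om + 0.5 * dt * k1o)
        k3t, k3o = om + 0.5 * dt * k2o, acc(t + 0.5 * dt, th + 0.5 * dt * k2t, om + 0.5 * dt * k2o)
        k4t, k4o = om + dt * k3o, acc(t + dt, th + dt * k3t, om + dt * k3o)
        th += dt / 6.0 * (k1t + 2 * k2t + 2 * k3t + k4t)
        om += dt / 6.0 * (k1o + 2 * k2o + 2 * k3o + k4o)
        t += dt
        if t > 0.75 * t_end:
            peak = max(peak, abs(th))
    return peak

for tuning in (1.5, 2.0):
    print(tuning, late_peak_pitch(tuning))
```

With these values the motion decays at a tuning factor of 1.5 but grows steadily at 2, where the stiffness modulation pumps energy into oscillation at the natural frequency; real hulls add nonlinear restoring and damping terms that eventually limit the amplitude.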

Pitch angle is compared with flapping angle for varying lag angles and for varying link lengths; these comparisons are plotted for reference, and the results, obtained from FMAV1, a code developed by the authors in MATLAB, are discussed. The best-suited link lengths satisfying Grashof's criterion and a rotary input speed of 2000 rpm have been used as inputs.
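The Grashof check applied to the link lengths can be sketched as follows. The helper names and link lengths are hypothetical, and the crank-rocker test assumes that the input crank is a link adjacent to the ground link.

```python
def grashof_class(links):
    """Classify a four-bar linkage by Grashof's criterion: compare s + l
    (shortest + longest) with p + q (the other two links)."""
    s, p, q, l = sorted(links)
    if s + l < p + q:
        return "Grashof"
    if s + l == p + q:
        return "change-point"
    return "non-Grashof"

def is_crank_rocker(ground, crank, coupler, rocker):
    """Grashof linkage whose strictly shortest link is the input crank,
    so the crank can rotate fully while the output link rocks."""
    links = [ground, crank, coupler, rocker]
    return grashof_class(links) == "Grashof" and crank < min(ground, coupler, rocker)

# Hypothetical link lengths in mm.
print(grashof_class([7, 2, 9, 6]))                         # 2 + 9 < 6 + 7
print(is_crank_rocker(ground=7, crank=2, coupler=9, rocker=6))
```

For a flapping mechanism driven by a continuously rotating motor, the crank-rocker case is the one of interest: the 2000 rpm input turns fully while the rocker produces the reciprocating flapping motion.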

Variation of natural frequencies for steel and composite materials is shown in Figure 5. The HS carbon/epoxy composite shows excellent material properties for the design of a single-piece composite drive shaft that meets the stringent design requirements of heavy vehicles. To avoid whirling or resonance vibration, the bending natural frequency should be higher than 2400-4000 rpm for trucks and vans, and the torque transmission capability should be higher than 154 Nm. The HS carbon/epoxy composite fulfills these technical requirements. Its bending natural frequency of 10930 rpm is much higher than 2400 rpm, which reduces the chance of whirling or resonance. The torque transmission capability of the single-piece drive shaft was taken as 245 Nm.
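A first estimate of the bending critical speed can be sketched with the Euler-Bernoulli formula for a simply supported hollow shaft, $f_1 = \frac{\pi}{2L^2}\sqrt{EI/m}$, where $m$ is the mass per unit length. This ignores shear deformation, gyroscopic effects, and laminate coupling, so it is not the model behind Figure 5; for a composite shaft $E$ would be an effective axial modulus, and the dimensions below are hypothetical.

```python
import math

def critical_speed_rpm(E, rho, length, d_out, d_in):
    """First bending critical speed (rpm) of a simply supported hollow shaft,
    Euler-Bernoulli beam: f1 = pi / (2 L^2) * sqrt(E I / m)."""
    I = math.pi * (d_out**4 - d_in**4) / 64.0        # second moment of area, m^4
    m = rho * math.pi * (d_out**2 - d_in**2) / 4.0   # mass per unit length, kg/m
    f1 = math.pi / (2.0 * length**2) * math.sqrt(E * I / m)   # Hz
    return 60.0 * f1

# Hypothetical steel tube: E = 207 GPa, rho = 7600 kg/m^3, L = 1.25 m, 90 mm OD, 86.8 mm ID.
n_cr = critical_speed_rpm(207e9, 7600.0, 1.25, 0.090, 0.0868)
print(round(n_cr), n_cr > 2400)
```

Since $f_1$ scales with $\sqrt{E/\rho}$ for a given geometry, the high specific stiffness of carbon/epoxy is what allows a single-piece shaft to stay well above the required speed.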

The right-half-plane zeros arising from the interaction between the drive-train dynamics and the tower, at above-rated wind speeds and at frequencies close to the tower frequency, can be removed by a control scheme called power coordinated control (PCC) [7], see Figure 3. The control action of the PCC is achieved through a combination of pitch and torque demand. The element $Y$ is designed as a low-pass filter or a notch filter centred at the tower frequency to reduce pitch activity in the vicinity of that frequency. The element $X$ is applied to the torque demand such that the transmittance from its input to $\Omega_G$ is similar to the transmittance from $\beta_d$ to $\Omega_G$, and the speed controller remains unchanged. For wind speeds, particularly just above rated, the generator speed obtained using PCC is the same as that obtained using the speed controller alone. However, there can be large power fluctuations because the gain from $T_d$ to $\Omega_G$ is much weaker than that from $\beta_d$ to $\Omega_G$. These fluctuations have a direct impact on drive-train components such as the gearbox and generator [7]. They can be reduced by replacing the speed control loop with a power control loop. Since the power converter is relatively fast acting, the torque fluctuations $\Delta T_G$ about $T_{G0}$ are relatively small compared with the fluctuations $\Delta\Omega_G$ about $\Omega_{G0}$; thus, if $P$ is well controlled, then so is $\Omega_G$, and the power control loop from Figure 3 is similar to the speed
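A second-order notch of the kind that could serve as element $Y$ is $N(s) = (s^2 + \omega_0^2)/(s^2 + (\omega_0/Q)s + \omega_0^2)$. The sketch below evaluates its gain across frequency; the 0.3 Hz tower frequency, $Q = 2$, and the helper name are assumed values for illustration, and the actual design of $Y$ in the PCC scheme may differ.

```python
import math

def notch_gain(f, f_notch, q=2.0):
    """|N(j 2 pi f)| for N(s) = (s^2 + w0^2) / (s^2 + (w0/q) s + w0^2)."""
    w0 = 2.0 * math.pi * f_notch
    s = 1j * 2.0 * math.pi * f
    return abs((s**2 + w0**2) / (s**2 + (w0 / q) * s + w0**2))

F_TOWER = 0.3   # Hz, assumed tower frequency
for f in (0.03, 0.15, 0.3, 0.6, 3.0):
    print(f, round(notch_gain(f, F_TOWER), 3))
```

The gain is zero at the tower frequency and close to one a decade away on either side; a larger $Q$ narrows the notch, which limits phase lag in the speed loop but makes the filter more sensitive to errors in the assumed tower frequency.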
