o �J�h�� @sHddlZddlZddlmZddlmZmZddlmZddl Z ddl Z ddl Z ddl mmZdd�ZdZdZd Zd ZeeZeee�Zed Zeee�Zeee�Zefd eee jfd ede jfdd�Zefdd�dedefdd�Zedd�dede j fdd��Z! d deee je j fdededeeee j"ffdd�Z#dS)!�N)� lru_cache)�Optional�Union)�writecCs||dksJ�||S)Nr�)�x�yrr�DC:\pinokio\api\whisper-webui.git\app\modules\diarize\audio_loader.py� exact_div sr i�>i�����file�sr�returncCs:t|tj�r:|jtjkr|�tj�}|jdkrtj|dd�}tj ddd�}t |j t |d�tj ��|j }|��n|}zDzddd d d |d d dddddt|�dg}tj|ddd�j}Wntjys}z td|j�����|�d}~wwWt|tj�r�t�|�n t|tj�r�t�|�wwt�|tj ����tj�dS)a� Open an audio file or process a numpy array containing audio data as mono waveform, resampling as necessary. Parameters ---------- file: Union[str, np.ndarray] The audio file to open or a numpy array containing the audio data. sr: int The sample rate to resample the audio if necessary. Returns ------- A NumPy array containing the audio waveform, in float32 dtype. ���axisFz.wav)�delete�suffixi��ffmpegz-nostdinz-threads�0z-iz-f�s16lez-ac�1z-acodec� pcm_s16lez-ar�-T)�capture_output�checkzFailed to load audio: Ng�@)� isinstance�np�ndarray�dtype�float32�astype�ndim�mean�tempfile�NamedTemporaryFiler�name� SAMPLE_RATE�int16�close�str� subprocess�run�stdout�CalledProcessError� RuntimeError�stderr�decode�os�remove� frombuffer�flatten)rr� temp_file�temp_file_path�cmd�out�errr � load_audiosP     ����  � � �r=�����r�lengthrcCs�t�|�rC|j||kr|j|tj||jd�d�}|j||krAdg|j}d||j|f||<t�|dd�|ddd�D��}|S|j||krS|j t |�|d �}|j||krqdg|j}d||j|f||<t �||�}|S) zO Pad or trim the audio array to N_SAMPLES, as expected by the encoder. )�device)�dim�index)rrrcSsg|] }|D]}|�qqSrr)�.0�sizes�padrrr � <listcomp>eszpad_or_trim.<locals>.<listcomp>Nr>)�indicesr) �torch� is_tensor�shape� index_select�aranger@r$�FrE�take�ranger)�arrayr?r� pad_widthsrrr � pad_or_trimXs" �   �  rR)�maxsize�n_melscCsr|dvs Jd|����t�tj�tj�t�dd���}t�|d|���� |�Wd�S1s2wYdS)a load the mel filterbank matrix for projecting STFT into a Mel spectrogram. 
Allows decoupling librosa dependency; saved using: np.savez_compressed( "mel_filters.npz", mel_80=librosa.filters.mel(sr=16000, n_fft=400, n_mels=80), ) )�P�zUnsupported n_mels: �assetszmel_filters.npz�mel_N) r�loadr4�path�join�dirname�__file__rH� from_numpy�to)r@rT�frrr � mel_filtersrs �$�ra�audio�paddingr@c Cs�t�|�st|t�rt|�}t�|�}|dur|�|�}|dkr(t�|d|f�}t� t ��|j �}tj |t t |dd�}|ddd�f��d}t|j |�}||}tj|dd ���} t�| | ��d �} | d d } | S) ap Compute the log-Mel spectrogram of Parameters ---------- audio: Union[str, np.ndarray, torch.Tensor], shape = (*) The path to audio or either a NumPy array or Tensor containing the audio waveform in 16 kHz n_mels: int The number of Mel-frequency filters, only 80 is supported padding: int Number of zero samples to pad to the right device: Optional[Union[str, torch.device]] If given, the audio tensor is moved to this device before STFT Returns ------- torch.Tensor, shape = (80, n_frames) A Tensor that contains the Mel spectrogram NrT)�window�return_complex.r>r g�����|�=)�ming @g@)rHrIrr,r=r^r_rMrE� hann_window�N_FFTr@�stft� HOP_LENGTH�absra�clamp�log10�maximum�max) rbrTrcr@rdri� magnitudes�filters�mel_spec�log_specrrr �log_mel_spectrogram�s"      rt)rN)$r4r-� functoolsr�typingrrZscipy.io.wavfilerr&�numpyrrHZtorch.nn.functional�nn� functionalrMr r)rhrj� CHUNK_LENGTH� N_SAMPLES�N_FRAMES�N_SAMPLES_PER_TOKEN�FRAMES_PER_SECOND�TOKENS_PER_SECONDr,r �intr=rR�Tensorrar@rtrrrr �<module>sD     &:�����