# transformers/image_utils.py — readable reconstruction recovered from a compiled-bytecode
# (.pyc) dump. Docstrings and error messages are restored from the dump; implementations
# follow the structure visible there and the public transformers API.

import base64
import os
from io import BytesIO
from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union

import numpy as np
import requests
from packaging import version

from .utils import (
    ExplicitEnum,
    TensorType,
    is_jax_tensor,
    is_numpy_array,
    is_tf_tensor,
    is_torch_available,
    is_torch_tensor,
    is_torchvision_available,
    is_vision_available,
    logging,
    requires_backends,
    to_numpy,
)
from .utils.constants import (  # noqa: F401
    IMAGENET_DEFAULT_MEAN,
    IMAGENET_DEFAULT_STD,
    IMAGENET_STANDARD_MEAN,
    IMAGENET_STANDARD_STD,
    OPENAI_CLIP_MEAN,
    OPENAI_CLIP_STD,
)

if is_vision_available():
    import PIL.Image
    import PIL.ImageOps

    if version.parse(version.parse(PIL.__version__).base_version) >= version.parse("9.1.0"):
        PILImageResampling = PIL.Image.Resampling
    else:
        PILImageResampling = PIL.Image

    if is_torchvision_available():
        from torchvision.transforms import InterpolationMode

        pil_torch_interpolation_mapping = {
            PILImageResampling.NEAREST: InterpolationMode.NEAREST,
            PILImageResampling.BOX: InterpolationMode.BOX,
            PILImageResampling.BILINEAR: InterpolationMode.BILINEAR,
            PILImageResampling.HAMMING: InterpolationMode.HAMMING,
            PILImageResampling.BICUBIC: InterpolationMode.BICUBIC,
            PILImageResampling.LANCZOS: InterpolationMode.LANCZOS,
        }

if TYPE_CHECKING and is_torch_available():
    import torch

logger = logging.get_logger(__name__)

ImageInput = Union[
    "PIL.Image.Image", np.ndarray, "torch.Tensor", List["PIL.Image.Image"], List[np.ndarray], List["torch.Tensor"]
]

VideoInput = Union[
    List["PIL.Image.Image"],
    np.ndarray,
    "torch.Tensor",
    List[np.ndarray],
    List["torch.Tensor"],
    List[List["PIL.Image.Image"]],
    List[List[np.ndarray]],
    List[List["torch.Tensor"]],
]

class ChannelDimension(ExplicitEnum):
    FIRST = "channels_first"
    LAST = "channels_last"

class AnnotationFormat(ExplicitEnum):
    COCO_DETECTION = "coco_detection"
    COCO_PANOPTIC = "coco_panoptic"

class AnnotionFormat(ExplicitEnum):
    # Kept for backwards compatibility with the original (misspelled) name.
    COCO_DETECTION = AnnotationFormat.COCO_DETECTION.value
    COCO_PANOPTIC = AnnotationFormat.COCO_PANOPTIC.value

AnnotationType = Dict[str, Union[int, str, List[Dict]]]

def is_pil_image(img):
    return is_vision_available() and isinstance(img, PIL.Image.Image)

class ImageType(ExplicitEnum):
    PIL = "pillow"
    TORCH = "torch"
    NUMPY = "numpy"
    TENSORFLOW = "tensorflow"
    JAX = "jax"

def get_image_type(image):
    if is_pil_image(image):
        return ImageType.PIL
    if is_torch_tensor(image):
        return ImageType.TORCH
    if is_numpy_array(image):
        return ImageType.NUMPY
    if is_tf_tensor(image):
        return ImageType.TENSORFLOW
    if is_jax_tensor(image):
        return ImageType.JAX
    raise ValueError(f"Unrecognised image type {type(image)}")

def is_valid_image(img):
    return is_pil_image(img) or is_numpy_array(img) or is_torch_tensor(img) or is_tf_tensor(img) or is_jax_tensor(img)

def valid_images(imgs):
    # If we have a list or tuple of images, make sure every image in it is valid.
    if isinstance(imgs, (list, tuple)):
        for img in imgs:
            if not valid_images(img):
                return False
    # Not a list or tuple, so this must be a single image or a batched tensor of images.
    elif not is_valid_image(imgs):
        return False
    return True

def is_batched(img):
    if isinstance(img, (list, tuple)):
        return is_valid_image(img[0])
    return False

def is_scaled_image(image: np.ndarray) -> bool:
    """
    Checks to see whether the pixel values have already been rescaled to [0, 1].
    """
    if image.dtype == np.uint8:
        return False
    # It's possible the image has pixel values in [0, 255] but is of floating type.
    return np.min(image) >= 0 and np.max(image) <= 1

def make_list_of_images(images, expected_ndims: int = 3) -> List[ImageInput]:
    """
    Ensure that the input is a list of images. If the input is a single image, it is converted to a list of
    length 1. If the input is a batch of images, it is converted to a list of images.

    Args:
        images (`ImageInput`):
            Image or images to turn into a list of images.
        expected_ndims (`int`, *optional*, defaults to 3):
            Expected number of dimensions for a single input image. If the input image has a different number of
            dimensions, an error is raised.
    """
    if is_batched(images):
        return images

    # PIL images are never batched.
    if isinstance(images, PIL.Image.Image):
        return [images]

    if is_valid_image(images):
        if images.ndim == expected_ndims + 1:
            # Batch of images
            images = list(images)
        elif images.ndim == expected_ndims:
            # Single image
            images = [images]
        else:
            raise ValueError(
                f"Invalid image shape. Expected either {expected_ndims + 1} or {expected_ndims} dimensions, but got"
                f" {images.ndim} dimensions."
            )
        return images
    raise ValueError(
        "Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or "
        f"jax.ndarray, but got {type(images)}."
    )

def to_numpy_array(img) -> np.ndarray:
    if not is_valid_image(img):
        raise ValueError(f"Invalid image type: {type(img)}")

    if is_vision_available() and isinstance(img, PIL.Image.Image):
        return np.array(img)
    return to_numpy(img)

def infer_channel_dimension_format(
    image: np.ndarray, num_channels: Optional[Union[int, Tuple[int, ...]]] = None
) -> ChannelDimension:
    """
    Infers the channel dimension format of `image`.

    Args:
        image (`np.ndarray`):
            The image to infer the channel dimension of.
        num_channels (`int` or `Tuple[int, ...]`, *optional*, defaults to `(1, 3)`):
            The number of channels of the image.

    Returns:
        The channel dimension of the image.
    """
    num_channels = num_channels if num_channels is not None else (1, 3)
    num_channels = (num_channels,) if isinstance(num_channels, int) else num_channels

    if image.ndim == 3:
        first_dim, last_dim = 0, 2
    elif image.ndim == 4:
        first_dim, last_dim = 1, 3
    else:
        raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")

    if image.shape[first_dim] in num_channels and image.shape[last_dim] in num_channels:
        logger.warning(
            f"The channel dimension is ambiguous. Got image shape {image.shape}. "
            "Assuming channels are the first dimension."
        )
        return ChannelDimension.FIRST
    elif image.shape[first_dim] in num_channels:
        return ChannelDimension.FIRST
    elif image.shape[last_dim] in num_channels:
        return ChannelDimension.LAST
    raise ValueError("Unable to infer channel dimension format")

def get_channel_dimension_axis(
    image: np.ndarray, input_data_format: Optional[Union[ChannelDimension, str]] = None
) -> int:
    """
    Returns the channel dimension axis of the image.

    Args:
        image (`np.ndarray`):
            The image to get the channel dimension axis of.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format of the image. If `None`, will infer the channel dimension from the image.

    Returns:
        The channel dimension axis of the image.
    """
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(image)
    if input_data_format == ChannelDimension.FIRST:
        return image.ndim - 3
    elif input_data_format == ChannelDimension.LAST:
        return image.ndim - 1
    raise ValueError(f"Unsupported data format: {input_data_format}")

def get_image_size(image: np.ndarray, channel_dim: Optional[ChannelDimension] = None) -> Tuple[int, int]:
    """
    Returns the (height, width) dimensions of the image.

    Args:
        image (`np.ndarray`):
            The image to get the dimensions of.
        channel_dim (`ChannelDimension`, *optional*):
            Which dimension the channel dimension is in. If `None`, will infer the channel dimension from the image.

    Returns:
        A tuple of the image's height and width.
    """
    if channel_dim is None:
        channel_dim = infer_channel_dimension_format(image)

    if channel_dim == ChannelDimension.FIRST:
        return image.shape[-2], image.shape[-1]
    elif channel_dim == ChannelDimension.LAST:
        return image.shape[-3], image.shape[-2]
    raise ValueError(f"Unsupported data format: {channel_dim}")

def is_valid_annotation_coco_detection(annotation: Dict[str, Union[List, Tuple]]) -> bool:
    if (
        isinstance(annotation, dict)
        and "image_id" in annotation
        and "annotations" in annotation
        and isinstance(annotation["annotations"], (list, tuple))
        # An image can have no annotations.
        and (len(annotation["annotations"]) == 0 or isinstance(annotation["annotations"][0], dict))
    ):
        return True
    return False

def is_valid_annotation_coco_panoptic(annotation: Dict[str, Union[List, Tuple]]) -> bool:
    if (
        isinstance(annotation, dict)
        and "image_id" in annotation
        and "segments_info" in annotation
        and "file_name" in annotation
        and isinstance(annotation["segments_info"], (list, tuple))
        # An image can have no segments.
        and (len(annotation["segments_info"]) == 0 or isinstance(annotation["segments_info"][0], dict))
    ):
        return True
    return False

def valid_coco_detection_annotations(annotations: Iterable[Dict[str, Union[List, Tuple]]]) -> bool:
    return all(is_valid_annotation_coco_detection(ann) for ann in annotations)

def valid_coco_panoptic_annotations(annotations: Iterable[Dict[str, Union[List, Tuple]]]) -> bool:
    return all(is_valid_annotation_coco_panoptic(ann) for ann in annotations)

def load_image(image, timeout: Optional[float] = None) -> "PIL.Image.Image":
    """
    Loads `image` to a PIL Image.

    Args:
        image (`str` or `PIL.Image.Image`):
            The image to convert to the PIL Image format.
        timeout (`float`, *optional*):
            The timeout value in seconds for the URL request.

    Returns:
        `PIL.Image.Image`: A PIL Image.
    """
    requires_backends(load_image, ["vision"])
    if isinstance(image, str):
        if image.startswith("http://") or image.startswith("https://"):
            image = PIL.Image.open(BytesIO(requests.get(image, timeout=timeout).content))
        elif os.path.isfile(image):
            image = PIL.Image.open(image)
        else:
            if image.startswith("data:image/"):
                image = image.split(",")[1]
            # Try to load as a base64-encoded string.
            try:
                b64 = base64.decodebytes(image.encode())
                image = PIL.Image.open(BytesIO(b64))
            except Exception as e:
                raise ValueError(
                    "Incorrect image source. Must be a valid URL starting with `http://` or `https://`, a valid "
                    f"path to an image file, or a base64 encoded string. Got {image}. Failed with {e}"
                )
    elif isinstance(image, PIL.Image.Image):
        image = image
    else:
        raise ValueError(
            "Incorrect format used for image. Should be an url linking to an image, a base64 string, a local "
            "path, or a PIL image."
        )
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

def load_images(
    images: Union[List, Tuple, str, "PIL.Image.Image"], timeout: Optional[float] = None
) -> Union["PIL.Image.Image", List["PIL.Image.Image"], List[List["PIL.Image.Image"]]]:
    """Loads images, handling different levels of nesting.

    Args:
        images: A single image, a list of images, or a list of lists of images to load.
        timeout: Timeout for loading images.

    Returns:
        A single image, a list of images, or a list of lists of images.
    """
    if isinstance(images, (list, tuple)):
        if len(images) and isinstance(images[0], (list, tuple)):
            return [[load_image(image, timeout=timeout) for image in image_group] for image_group in images]
        return [load_image(image, timeout=timeout) for image in images]
    return load_image(images, timeout=timeout)

def validate_preprocess_arguments(
    do_rescale: Optional[bool] = None,
    rescale_factor: Optional[float] = None,
    do_normalize: Optional[bool] = None,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    do_pad: Optional[bool] = None,
    size_divisibility: Optional[int] = None,
    do_center_crop: Optional[bool] = None,
    crop_size: Optional[Dict[str, int]] = None,
    do_resize: Optional[bool] = None,
    size: Optional[Dict[str, int]] = None,
    resample: Optional["PILImageResampling"] = None,
):
    """
    Checks validity of typically used arguments in an `ImageProcessor` `preprocess` method.
    Raises `ValueError` if arguments incompatibility is caught.
    Many incompatibilities are model-specific. `do_pad` sometimes needs `size_divisor`, sometimes
    `size_divisibility`, and sometimes `size`. New models and processors added should follow existing
    arguments when possible.
    """
    if do_rescale and rescale_factor is None:
        raise ValueError("`rescale_factor` must be specified if `do_rescale` is `True`.")

    if do_pad and size_divisibility is None:
        raise ValueError(
            "Depending on the model, `size_divisibility`, `size_divisor`, `pad_size` or `size` must be specified "
            "if `do_pad` is `True`."
        )

    if do_normalize and (image_mean is None or image_std is None):
        raise ValueError("`image_mean` and `image_std` must both be specified if `do_normalize` is `True`.")

    if do_center_crop and crop_size is None:
        raise ValueError("`crop_size` must be specified if `do_center_crop` is `True`.")

    if do_resize and (size is None or resample is None):
        raise ValueError("`size` and `resample` must be specified if `do_resize` is `True`.")

def validate_fast_preprocess_arguments(
    do_rescale: Optional[bool] = None,
    rescale_factor: Optional[float] = None,
    do_normalize: Optional[bool] = None,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    do_pad: Optional[bool] = None,
    size_divisibility: Optional[int] = None,
    do_center_crop: Optional[bool] = None,
    crop_size: Optional[Dict[str, int]] = None,
    do_resize: Optional[bool] = None,
    size: Optional[Dict[str, int]] = None,
    resample: Optional["PILImageResampling"] = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
    data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
):
    """
    Checks validity of typically used arguments in an `ImageProcessorFast` `preprocess` method.
    Raises `ValueError` if arguments incompatibility is caught.
    """
    validate_preprocess_arguments(
        do_rescale=do_rescale,
        rescale_factor=rescale_factor,
        do_normalize=do_normalize,
        image_mean=image_mean,
        image_std=image_std,
        do_resize=do_resize,
        size=size,
        resample=resample,
    )
    # Extra checks specific to the fast image processors.
    if return_tensors != "pt":
        raise ValueError("Only returning PyTorch tensors is currently supported.")

    if data_format != ChannelDimension.FIRST:
        raise ValueError("Only channel first data format is currently supported.")

class ImageFeatureExtractionMixin:
    """
    Mixin that contains utilities for preparing image features.
    """

    def _ensure_format_supported(self, image):
        if not isinstance(image, (PIL.Image.Image, np.ndarray)) and not is_torch_tensor(image):
            raise ValueError(
                f"Got type {type(image)} which is not supported, only `PIL.Image.Image`, `np.array` and "
                "`torch.Tensor` are."
            )

    def to_pil_image(self, image, rescale=None):
        """
        Converts `image` to a PIL Image. Optionally rescales it and puts the channel dimension back as the last
        axis if needed. `rescale` defaults to `True` if the image type is a floating type, `False` otherwise.
        """
        self._ensure_format_supported(image)

        if is_torch_tensor(image):
            image = image.numpy()

        if isinstance(image, np.ndarray):
            if rescale is None:
                # Rescale by default if the array is of floating type.
                rescale = isinstance(image.flat[0], np.floating)
            # If the channel has been moved to the first dim, put it back at the end.
            if image.ndim == 3 and image.shape[0] in [1, 3]:
                image = image.transpose(1, 2, 0)
            if rescale:
                image = image * 255
            image = image.astype(np.uint8)
            return PIL.Image.fromarray(image)
        return image

    def convert_rgb(self, image):
        """Converts `PIL.Image.Image` to RGB format."""
        self._ensure_format_supported(image)
        if not isinstance(image, PIL.Image.Image):
            return image
        return image.convert("RGB")

    def rescale(self, image: np.ndarray, scale: Union[float, int]) -> np.ndarray:
        """Rescale a numpy image by scale amount."""
        self._ensure_format_supported(image)
        return image * scale

    def to_numpy_array(self, image, rescale=None, channel_first=True):
        """
        Converts `image` to a numpy array. Optionally rescales it (to floats between 0 and 1) and puts the
        channel dimension as the first dimension.
        """
        self._ensure_format_supported(image)

        if isinstance(image, PIL.Image.Image):
            image = np.array(image)

        if is_torch_tensor(image):
            image = image.numpy()

        rescale = isinstance(image.flat[0], np.integer) if rescale is None else rescale

        if rescale:
            image = self.rescale(image.astype(np.float32), 1 / 255.0)

        if channel_first and image.ndim == 3:
            image = image.transpose(2, 0, 1)

        return image

    def expand_dims(self, image):
        """Expands 2-dimensional `image` to 3 dimensions."""
        self._ensure_format_supported(image)

        # Do nothing if the input is a PIL image.
        if isinstance(image, PIL.Image.Image):
            return image

        if is_torch_tensor(image):
            image = image.unsqueeze(0)
        else:
            image = np.expand_dims(image, axis=0)
        return image

    def normalize(self, image, mean, std, rescale=False):
        """
        Normalizes `image` with `mean` and `std`. Note that this will trigger a conversion of `image` to a
        NumPy array if it's a PIL Image. If a PIL image is provided, scaling will happen automatically.
        """
        self._ensure_format_supported(image)

        if isinstance(image, PIL.Image.Image):
            image = self.to_numpy_array(image, rescale=True)
        # A PIL image is automatically rescaled above; other types may still need rescaling.
        elif rescale:
            if isinstance(image, np.ndarray):
                image = self.rescale(image.astype(np.float32), 1 / 255.0)
            elif is_torch_tensor(image):
                image = self.rescale(image.float(), 1 / 255.0)

        if isinstance(image, np.ndarray):
            if not isinstance(mean, np.ndarray):
                mean = np.array(mean).astype(image.dtype)
            if not isinstance(std, np.ndarray):
                std = np.array(std).astype(image.dtype)
        elif is_torch_tensor(image):
            import torch

            if not isinstance(mean, torch.Tensor):
                mean = torch.from_numpy(mean) if isinstance(mean, np.ndarray) else torch.tensor(mean)
            if not isinstance(std, torch.Tensor):
                std = torch.from_numpy(std) if isinstance(std, np.ndarray) else torch.tensor(std)

        if image.ndim == 3 and image.shape[0] in [1, 3]:
            return (image - mean[:, None, None]) / std[:, None, None]
        return (image - mean) / std

    def resize(self, image, size, resample=None, default_to_square=True, max_size=None):
        """
        Resizes `image`. Enforces conversion of input to PIL.Image. If `size` is an int and `default_to_square`
        is `False`, the smaller edge of the image is matched to `size`; `max_size` then caps the longer edge, so
        the smaller edge may end up shorter than `size`.
        """
        resample = resample if resample is not None else PILImageResampling.BILINEAR

        self._ensure_format_supported(image)

        if not isinstance(image, PIL.Image.Image):
            image = self.to_pil_image(image)

        if isinstance(size, list):
            size = tuple(size)

        if isinstance(size, int) or len(size) == 1:
            if default_to_square:
                size = (size, size) if isinstance(size, int) else (size[0], size[0])
            else:
                width, height = image.size
                # `size` only applies to the smallest edge.
                short, long = (width, height) if width <= height else (height, width)
                requested_new_short = size if isinstance(size, int) else size[0]

                if short == requested_new_short:
                    return image

                new_short, new_long = requested_new_short, int(requested_new_short * long / short)

                if max_size is not None:
                    if max_size <= requested_new_short:
                        raise ValueError(
                            f"max_size = {max_size} must be strictly greater than the requested size for the "
                            f"smaller edge size = {size}"
                        )
                    if new_long > max_size:
                        new_short, new_long = int(max_size * new_short / new_long), max_size

                size = (new_short, new_long) if width <= height else (new_long, new_short)

        return image.resize(size, resample=resample)

    def center_crop(self, image, size):
        """
        Crops `image` to the given size using a center crop. Note that if the image is too small to be cropped to
        the size given, it will be padded (so the returned result has the size asked).
        """
        self._ensure_format_supported(image)

        if not isinstance(size, tuple):
            size = (size, size)

        # PIL Image.size is (width, height) but NumPy arrays and torch tensors are (height, width).
        if is_torch_tensor(image) or isinstance(image, np.ndarray):
            if image.ndim == 2:
                image = self.expand_dims(image)
            image_shape = image.shape[1:] if image.shape[0] in [1, 3] else image.shape[:2]
        else:
            image_shape = (image.size[1], image.size[0])

        top = (image_shape[0] - size[0]) // 2
        bottom = top + size[0]  # In case size is odd, (image_shape[0] + size[0]) // 2 won't give the proper result.
        left = (image_shape[1] - size[1]) // 2
        right = left + size[1]  # In case size is odd, (image_shape[1] + size[1]) // 2 won't give the proper result.

        # For PIL images we can crop directly.
        if isinstance(image, PIL.Image.Image):
            return image.crop((left, top, right, bottom))

        # Check if the image is channels-first or channels-last and move channels first if needed.
        channel_first = image.shape[0] in [1, 3]
        if not channel_first:
            if isinstance(image, np.ndarray):
                image = image.transpose(2, 0, 1)
            if is_torch_tensor(image):
                image = image.permute(2, 0, 1)

        # If the crop fits inside the image, slice it out directly.
        if top >= 0 and bottom <= image_shape[0] and left >= 0 and right <= image_shape[1]:
            return image[..., top:bottom, left:right]

        # Otherwise pad the image so the requested crop fits, then crop.
        new_shape = image.shape[:-2] + (max(size[0], image_shape[0]), max(size[1], image_shape[1]))
        if isinstance(image, np.ndarray):
            new_image = np.zeros_like(image, shape=new_shape)
        elif is_torch_tensor(image):
            new_image = image.new_zeros(new_shape)

        top_pad = (new_shape[-2] - image_shape[0]) // 2
        bottom_pad = top_pad + image_shape[0]
        left_pad = (new_shape[-1] - image_shape[1]) // 2
        right_pad = left_pad + image_shape[1]
        new_image[..., top_pad:bottom_pad, left_pad:right_pad] = image

        top += top_pad
        bottom += top_pad
        left += left_pad
        right += left_pad

        new_image = new_image[
            ..., max(0, top) : min(new_image.shape[-2], bottom), max(0, left) : min(new_image.shape[-1], right)
        ]
        return new_image

    def flip_channel_order(self, image):
        """
        Flips the channel order of `image` from RGB to BGR, or vice versa. Note that this will trigger a
        conversion of `image` to a NumPy array if it's a PIL Image; the channel dimension should then be first.
        """
        self._ensure_format_supported(image)

        if isinstance(image, PIL.Image.Image):
            image = self.to_numpy_array(image)

        return image[::-1, :, :]

    def rotate(self, image, angle, resample=None, expand=0, center=None, translate=None, fillcolor=None):
        """
        Returns a rotated copy of `image`, rotated the given number of degrees counter clockwise around its
        centre. If `np.ndarray` or `torch.Tensor`, the input is converted to `PIL.Image.Image` before rotating.
        """
        resample = resample if resample is not None else PIL.Image.NEAREST

        self._ensure_format_supported(image)

        if not isinstance(image, PIL.Image.Image):
            image = self.to_pil_image(image)

        return image.rotate(
            angle, resample=resample, expand=expand, center=center, translate=translate, fillcolor=fillcolor
        )

def validate_annotations(
    annotation_format: AnnotationFormat,
    supported_annotation_formats: Tuple[AnnotationFormat, ...],
    annotations: List[Dict],
) -> None:
    if annotation_format not in supported_annotation_formats:
        raise ValueError(
            f"Unsupported annotation format: {annotation_format} must be one of {supported_annotation_formats}"
        )

    if annotation_format is AnnotationFormat.COCO_DETECTION:
        if not valid_coco_detection_annotations(annotations):
            raise ValueError(
                "Invalid COCO detection annotations. Annotations must be a dict (single image) or list of dicts "
                "(batch of images) with the following keys: `image_id` and `annotations`, with the latter being "
                "a list of annotations in the COCO format."
            )

    if annotation_format is AnnotationFormat.COCO_PANOPTIC:
        if not valid_coco_panoptic_annotations(annotations):
            raise ValueError(
                "Invalid COCO panoptic annotations. Annotations must be a dict (single image) or list of dicts "
                "(batch of images) with the following keys: `image_id`, `file_name` and `segments_info`, with "
                "the latter being a list of annotations in the COCO format."
            )

def validate_kwargs(valid_processor_keys: List[str], captured_kwargs: List[str]):
    unused_keys = set(captured_kwargs).difference(set(valid_processor_keys))
    if unused_keys:
        unused_key_str = ", ".join(unused_keys)
        logger.warning(f"Unused or unrecognized kwargs: {unused_key_str}.")
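# Minimal sketch (not part of the original module) of how the legacy ImageFeatureExtractionMixin
# methods are typically chained: resize -> center crop -> to array -> normalize. The helper name,
# the 256/224 sizes and the IMAGENET_STANDARD_* statistics are illustrative assumptions, not
# defaults defined in this file.
def _example_legacy_pipeline(image):
    """Resize, center-crop and normalize a single image the way older feature extractors did."""
    extractor = ImageFeatureExtractionMixin()
    image = extractor.resize(image, size=256, default_to_square=False)  # smallest edge -> 256
    image = extractor.center_crop(image, size=224)                      # 224 x 224 center crop
    array = extractor.to_numpy_array(image)                             # channels-first float32 in [0, 1]
    return extractor.normalize(array, mean=IMAGENET_STANDARD_MEAN, std=IMAGENET_STANDARD_STD)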
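# Minimal usage sketch (not part of the original module) for the loading and inspection helpers:
# load an image, check its type and channel layout, and wrap it into a list for batched processing.
# The URL is an illustrative assumption; any reachable image URL or local file path works.
if __name__ == "__main__":
    pil_image = load_image(
        "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
        timeout=10.0,
    )
    array = to_numpy_array(pil_image)                    # (height, width, 3) uint8 array
    print(get_image_type(pil_image))                     # ImageType.PIL
    print(infer_channel_dimension_format(array))         # ChannelDimension.LAST
    print(get_image_size(array, ChannelDimension.LAST))  # (height, width)
    batch = make_list_of_images(array)                   # single image becomes a list of length 1
    print(len(batch), is_scaled_image(array))            # 1 False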