SimpleCNN for Text Image Orientation Detection

Model Overview

This is a SimpleCNN model designed to detect whether an image containing text (e.g. a scan of a document) is correctly oriented or rotated. It takes grayscale images as input, resizes them to 128x128 pixels, and outputs a prediction indicating whether the image is "Rotated" or "Normal."

  • Model type: Convolutional Neural Network (CNN)
  • Task: Binary classification (Rotated / Normal)
  • Input: Grayscale images (128x128)
  • Output: Label indicating if the image is "Rotated" or "Normal"
  • Framework: PyTorch
  • Model architecture: A simple 3-layer CNN followed by two fully connected layers

Model Description

The model consists of three convolutional layers followed by max-pooling operations. These convolutional layers extract features from the input image. The feature maps are then flattened and passed through two fully connected layers, where the final layer outputs a prediction between two classes:

  • Class 0 (Normal): The image is correctly oriented.
  • Class 1 (Rotated): The image is rotated and needs adjustment.

The model is trained on a dataset of rotated and correctly oriented grayscale images. It is capable of accurately distinguishing between the two classes and can be used in applications that involve automatic image processing or document scanning.
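The layer layout described above can be sketched in PyTorch as follows. This is a minimal illustration, not the released architecture: the class name SimpleCNNSketch, the channel widths (8, 16, 32), the 3x3 kernels, the ReLU activations, and the hidden size of 32 are all assumptions, so the weights and parameter count will not match the published checkpoint.

```python
import torch
import torch.nn as nn


class SimpleCNNSketch(nn.Module):
    """Illustrative stand-in: three conv + max-pool stages, then two linear layers."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Channel widths, kernel sizes, and activations are assumptions.
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 128 -> 64
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                            # 32 channels x 16 x 16 pixels
            nn.Linear(32 * 16 * 16, 32), nn.ReLU(),  # hidden size is an assumption
            nn.Linear(32, num_classes),              # one score per class (Normal, Rotated)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


sketch = SimpleCNNSketch()
logits = sketch(torch.zeros(1, 1, 128, 128))  # batch of one grayscale 128x128 image
print(logits.shape)  # torch.Size([1, 2])
```

Each max-pooling stage halves the spatial size, so a 128x128 input reaches the first fully connected layer as a 16x16 feature map per channel.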

Usage

Inference

To use this model for inference, you can load it using Hugging Face's from_pretrained functionality and pass in an image for orientation prediction.

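from PIL import Image
import numpy as np
import torch

# NOTE: the transformers library does not provide a SimpleCNN class. Import the
# class from the code that defines it; "model" is a placeholder module name.
from model import SimpleCNN

# Load the model (assumes SimpleCNN exposes from_pretrained, for example by
# inheriting huggingface_hub.PyTorchModelHubMixin)
model = SimpleCNN.from_pretrained("path_to_model")
model.eval()  # Put dropout / batch-norm layers, if any, in inference mode

# Function to predict orientation
def predict_orientation(image_path, model):
    img = Image.open(image_path).convert('L')  # Load image in grayscale
    img = img.resize((128, 128))               # Resize to 128x128
    # Scale to [0, 1] as float32 (the default dtype of PyTorch weights),
    # then add batch and channel dimensions: (1, 1, 128, 128)
    pixels = np.asarray(img, dtype=np.float32) / 255.0
    img_tensor = torch.from_numpy(pixels).unsqueeze(0).unsqueeze(0)
    with torch.no_grad():
        output = model(img_tensor)
    is_rotated = torch.argmax(output, dim=1).item() == 1
    return "Rotated" if is_rotated else "Normal"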

# Example usage
result = predict_orientation("example_image.jpg", model)
print(f"Image Orientation: {result}")

Training

The model was trained using standard binary cross-entropy loss and an Adam optimizer. It was trained on grayscale images resized to 128x128 pixels.
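A single optimization step of this kind might look like the sketch below. The original training script is not shown, so several things here are assumptions: because the inference code takes an argmax over two outputs, the sketch uses nn.CrossEntropyLoss over two logits (the two-class form of cross-entropy) rather than a single-logit binary loss; the learning rate, batch size, and the stand-in linear model are placeholders; and random tensors replace real images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in classifier; the real SimpleCNN would be used here instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(128 * 128, 2))
criterion = nn.CrossEntropyLoss()  # assumption: two-logit cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate is an assumption

# Random tensors stand in for a batch of grayscale 128x128 images and their labels.
images = torch.rand(8, 1, 128, 128)
labels = torch.randint(0, 2, (8,))  # 0 = Normal, 1 = Rotated

model.train()
optimizer.zero_grad()                    # clear gradients from any previous step
loss = criterion(model(images), labels)  # forward pass and loss
loss.backward()                          # backpropagate
optimizer.step()                         # Adam parameter update
print(f"loss: {loss.item():.4f}")
```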

Model Performance

The model performs well when used to check automatically whether images are correctly oriented. However, performance can vary with image quality, input resolution, and the types of rotations present in the dataset.

Limitations:

  • The model is trained only on 90-degree rotations, so performance might degrade with other types of rotations (e.g., slight tilts or partial rotations).
  • It is designed to work on grayscale images, so it might not perform optimally on colored or highly textured images.

Intended Use

The primary use case for this model is in scenarios where the orientation of images needs to be detected or corrected, such as:

  • Document scanning systems: Automatically detecting if scanned documents are oriented correctly.
  • Image processing pipelines: Ensuring that images are not accidentally rotated during preprocessing or ingestion.
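As a rough illustration of the pipeline use case, the sketch below collects the files that a predictor labels "Rotated" so they can be reviewed or corrected. The helper flag_rotated and the dummy fake_predict are hypothetical names introduced only for this example; the dummy predictor does not inspect image content.

```python
from typing import Callable, Iterable, List


def flag_rotated(paths: Iterable[str], predict: Callable[[str], str]) -> List[str]:
    """Return the paths whose predicted label is "Rotated"."""
    return [p for p in paths if predict(p) == "Rotated"]


# Dummy predictor for demonstration only: treats file names containing "rot" as rotated.
def fake_predict(path: str) -> str:
    return "Rotated" if "rot" in path else "Normal"


flagged = flag_rotated(["a.png", "b_rot.png", "c.png"], fake_predict)
print(flagged)  # ['b_rot.png']
```

In practice, the predictor could wrap the predict_orientation function from the Usage section, for example lambda p: predict_orientation(p, model).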

Ethical Considerations

The model does not process or output sensitive information. However, users should be aware of potential biases that could be introduced by the training dataset (e.g., specific types of images or orientations might be overrepresented).

Citation

If you use this model, please cite the following:

@misc{simplecnn_orientation,
  author = {Francesco Crescioli},
  title = {SimpleCNN for Image Orientation Detection},
  year = {2024},
  howpublished = {\url{https://huggingface.co/fcrescio/rotdet}},
}

License

This model is licensed under the Creative Commons Attribution 4.0 International (CC-BY-4.0) License.

Attribution

This model was trained using the Docmatix dataset, which is licensed under the MIT license. As such, the MIT license applies to the data used in training this model.
