Meta has released Automated Sensitive Document Classification, an open-source AI tool originally developed for internal use. The tool automatically identifies sensitive information in documents and applies security labels to protect them from unauthorized access. Its classification rules are customizable. It works with text-based files: Apache Tika extracts the text from Google Docs, Sheets, and Slides, and Llama analyzes the content. The initiative addresses data loss prevention, a problem made harder by the diverse file types Meta manages. Traditional methods such as regular expressions proved inadequate, which led Meta to a large language model (LLM)-based approach that improves accuracy and scalability. The tool produces detailed classification output, including file enumerations stored in a SQLite database. Meta aims to support other organizations by providing the tool, which can be deployed through Docker containers and Python packages, and plans to extend its compatibility to additional platforms. The tool is available on GitHub.
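The flow described above (extract text, classify it against customizable rules with an LLM, store the result in a database) can be illustrated with a minimal Python sketch. This is not Meta's implementation or the tool's API: the rule set, the table schema, and every function name here are hypothetical, the Apache Tika step is skipped by assuming the text has already been extracted, and `stub_llm` is a keyword-based stand-in for a real Llama call so that the example runs on its own.

```python
import sqlite3
from typing import Callable

# Hypothetical rule set: label -> plain-language description shown to the model.
RULES = {
    "confidential": "Contains credentials, personal data, or financial details.",
    "public": "Contains nothing sensitive.",
}


def build_prompt(text: str, rules: dict[str, str]) -> str:
    """Compose a classification prompt from the rules and the extracted text."""
    rule_lines = "\n".join(f"- {label}: {desc}" for label, desc in rules.items())
    return (
        "Classify the document with exactly one label.\n"
        f"{rule_lines}\n\nDocument:\n{text}\n\nLabel:"
    )


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: a real deployment would
    send the prompt to a Llama endpoint and return its text answer)."""
    document = prompt.split("Document:\n", 1)[1]
    return "confidential" if "password" in document.lower() else "public"


def classify(text: str, rules: dict[str, str], llm: Callable[[str], str]) -> str:
    """Return one of the rule labels for the given text."""
    answer = llm(build_prompt(text, rules)).strip().lower()
    # Design choice of this sketch: an unrecognized answer falls back to the
    # most restrictive label rather than leaving the file unlabeled.
    return answer if answer in rules else "confidential"


def run(files: dict[str, str], llm: Callable[[str], str] = stub_llm) -> list[tuple[str, str]]:
    """Classify each file's text and record the labels in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE classifications (file_name TEXT PRIMARY KEY, label TEXT NOT NULL)"
    )
    for name, text in files.items():
        conn.execute(
            "INSERT INTO classifications (file_name, label) VALUES (?, ?)",
            (name, classify(text, RULES, llm)),
        )
    conn.commit()
    rows = conn.execute(
        "SELECT file_name, label FROM classifications ORDER BY file_name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = {
        "notes.txt": "The admin password is hunter2",
        "menu.txt": "Lunch is served at noon",
    }
    for file_name, label in run(sample):
        print(file_name, label)
```

Passing the model as a plain callable keeps the rules and storage logic independent of any particular model client, which also hints at why an LLM can replace a growing list of regular expressions: the rules are stated in natural language rather than as patterns.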
Source: Meta Releases Open-Source AI Tool for Automated Sensitive Document Classification
