This project uses AWS CDK to deploy AWS infrastructure that processes PDFs. The pipeline splits a PDF into chunks, processes them through AWS Step Functions, and merges the chunks using ECS tasks, with monitoring via CloudWatch. Prerequisites are access to Amazon Bedrock, an Adobe API account, Python 3.7 or later, the AWS CLI, npm (to install the AWS CDK), and Docker. The repository is expected to follow a set project structure that includes the Lambda functions for splitting and merging PDFs as well as the Docker images. To set up, users clone the repository, configure the AWS CLI, bootstrap the CDK environment, and store their Adobe credentials in AWS Secrets Manager. After installing the requirements and deploying the CDK stack, users upload PDFs to an S3 bucket to start processing. The solution supports scanned PDFs (with roughly 80% accuracy), but its limitations include not handling corrupted files or fillable forms. Troubleshooting guidance is available for deployment issues.
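The splitting step can be illustrated with a minimal sketch of how page ranges for chunks might be computed. The function name `chunk_page_ranges` and the `pages_per_chunk` parameter are hypothetical; the summary does not state the chunk size or the logic the project's split function actually uses.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Return 1-indexed, inclusive (first_page, last_page) ranges covering a PDF.

    Hypothetical helper: the chunk size is an assumption, not a value
    taken from the project.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        # The last chunk is clipped so it never runs past the final page.
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]
```

For example, `chunk_page_ranges(10, 4)` yields `[(1, 4), (5, 8), (9, 10)]`. Each range could then be handed to a separate processing task, and the ordered list gives the merge step the sequence in which to reassemble the chunks.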
Source link
Unlocking Digital Accessibility: ASU AI Cloud Innovation Center’s Cutting-Edge PDF Remediation Tool for WCAG 2.1 Compliance
