Hello world. This is the monthly Natural Language Processing (NLP) newsletter covering everything related to NLP at AWS. This is our third newsletter on Dev.to. If you missed our earlier episodes, here are Ep01 and Ep02. Feel free to leave comments and share it on your social networks to celebrate this new launch with us!
Service updates about NLP on AWS
-
Amazon Lex launches progress updates for fulfillment
You can now configure your Amazon Lex bots to provide periodic updates to users while their requests are processed. Customer support conversations often require execution of business logic that can take some time to complete. For example, updating an itinerary on an airline reservation system may take a couple of minutes during peak hours. Typically, support agents put the call on hold and provide periodic updates (e.g., “We are still processing your request; thank you for your patience”) until the request is fulfilled. Now, you can easily configure your bot to automatically provide such periodic updates in a conversation. With the progress updates capability, bot builders can quickly enhance the ability of virtual contact center agents and smart assistants.
-
New AWS Solution: AWS QnABot, a self-service conversational chatbot built on Amazon Lex
The AWS QnABot has now been released as an official AWS Solution Implementation. The AWS QnABot is an open source, multichannel, multi-language conversational chatbot built on Amazon Lex that responds to your customers’ questions, answers, and feedback. Without programming, the AWS QnABot solution allows customers to quickly deploy self-service conversational AI on multiple channels including their contact centers, websites, social media channels, SMS text messaging, or Amazon Alexa.
-
Amazon Transcribe now supports custom language models for streaming transcription
Amazon Transcribe now supports custom language models (CLM) for streaming transcription. Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for you to add speech-to-text capabilities to your applications. CLM allows you to leverage pre-existing data to build a custom speech engine tailored for your transcription use case. No prior machine learning experience is required. AWS ML Blog, Transcribe Documentation.
-
You already know how to use Amazon Redshift to transform data using simple SQL commands and built-in functions. Now you can also use Amazon Redshift to translate, analyze, and redact text fields, thanks to Amazon Translate, Amazon Comprehend, and Amazon Redshift support for AWS Lambda user-defined functions (UDFs).
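To give a feel for what such a Lambda UDF might look like, here is a minimal sketch in Python. It assumes the batch interface that Redshift uses for Lambda UDFs as we understand it (an `arguments` list with one entry per row in the request, and a JSON response carrying `success`, `num_records`, and `results`); please check the Redshift documentation before relying on it. The `make_handler` and `redact_digits` names are hypothetical, and `redact_digits` is only a local stand-in for a real call to Amazon Comprehend or Amazon Translate.

```python
import json
import re
from typing import Callable


def make_handler(transform: Callable[[str], str]):
    """Build a Lambda-style handler that applies `transform` to every row.

    Assumption: Redshift sends rows in event["arguments"], one list of
    argument values per row, and expects one result per row, in order.
    """

    def handler(event, context=None):
        try:
            results = []
            for row in event["arguments"]:
                text = row[0]
                # Pass SQL NULLs through untouched.
                results.append(None if text is None else transform(text))
            return json.dumps(
                {"success": True, "num_records": len(results), "results": results}
            )
        except Exception as exc:
            # Report the failure back to the caller instead of raising.
            return json.dumps({"success": False, "error_msg": str(exc)})

    return handler


def redact_digits(text: str) -> str:
    """Hypothetical stand-in for a Comprehend or Translate call."""
    return re.sub(r"\d", "#", text)


lambda_handler = make_handler(redact_digits)
```

Keeping the text transformation separate from the batch handling makes it easy to swap in a real service call later while testing the row handling locally.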
-
Amazon Comprehend adds two Trusted Advisor checks
Amazon Comprehend now supports two new AWS Trusted Advisor checks to help customers optimize the cost and security of Amazon Comprehend endpoints. These checks are currently available with the AWS Business Support and AWS Enterprise Support plans. The new checks are:
- Underutilized endpoints: Checks the throughput configuration of your endpoints and generates a warning when they are not actively used for any real-time inference requests;
- Endpoint permissions: Checks the KMS key permissions for an endpoint whose underlying model was encrypted using customer managed keys. If the customer managed key has been disabled or the key policy has been changed to alter the granted permissions for Amazon Comprehend for any reason, the endpoint availability might be impacted.
-
Amazon Textract now supports Tag Image File Format (TIFF) documents in addition to the PNG, JPEG, and PDF formats. Customers can now process TIFF documents either synchronously or asynchronously using any of the following Amazon Textract APIs - DetectDocumentText, StartDocumentAnalysis, StartDocumentTextDetection, AnalyzeDocument, and AnalyzeExpense. Amazon Textract is a machine learning service that automatically extracts printed and handwritten text and data from any document.
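If you want to check locally whether a file is in one of the four formats listed above before sending it to Textract, one option is to look at its leading bytes. The sketch below is only an illustration: `sniff_document_format` is a hypothetical helper, not part of any AWS SDK, it checks file signatures only (it does not validate the whole file), and it does not cover variants such as BigTIFF.

```python
from typing import Optional

# Leading "magic" bytes for the formats named in the announcement.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
)


def sniff_document_format(data: bytes) -> Optional[str]:
    """Return the format name for the leading bytes, or None if unknown."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

In practice you would read the first few bytes of the file and pass them in, for example `sniff_document_format(open(path, "rb").read(8))`.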
NLP on SageMaker
-
With this new release, you can use the new set of multimodal financial analysis tools within Amazon SageMaker JumpStart. With these new tools, you can enhance your tabular ML workflows with new insights from financial text documents and potentially save weeks of development time. Using the new SageMaker JumpStart Industry SDK, you can easily retrieve common public financial documents, including SEC filings, and further process financial text documents with features such as summarization and scoring for sentiment, litigiousness, risk, and readability. In addition, you can access pre-trained language models trained on financial text for transfer learning, and use example notebooks for data retrieval, text feature engineering, and multimodal classification and regression models. AWS ML Blog #1, AWS ML Blog #2, AWS ML Blog #3, JumpStart Documentation
-
Organize product data to your taxonomy with Amazon SageMaker
When companies deal with data that comes from various sources or the collection of this data has changed over time, the data often becomes difficult to organize. Perhaps you have product category names that are similar but don’t match, and on your website you want to surface these products as a group. Therefore, you need to go through the tedious work of manually creating a map from source to target to be able to transform the data into your own taxonomy. In these cases, we’re not talking about a few hundred rows of data, but more often many hundreds of thousands of rows, with new data flowing in regularly. In this post, we discuss how to organize product data to your classification needs with Amazon SageMaker.
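To make the "map from source to target" idea concrete, here is a toy sketch. Note that this is not the approach from the post: it swaps in plain string similarity from Python's standard library (`difflib`) purely to illustrate the mapping step on a handful of names. The `map_to_taxonomy` function, the `min_score` threshold, and the sample category names are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Sequence


def _similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_to_taxonomy(
    source_names: Iterable[str],
    taxonomy: Sequence[str],
    min_score: float = 0.6,
) -> Dict[str, Optional[str]]:
    """Map each source category to its closest target category.

    Names whose best match scores below `min_score` map to None so they
    can be reviewed by hand instead of being silently misfiled.
    """
    mapping: Dict[str, Optional[str]] = {}
    for name in source_names:
        best = max(taxonomy, key=lambda target: _similarity(name, target))
        mapping[name] = best if _similarity(name, best) >= min_score else None
    return mapping


if __name__ == "__main__":
    print(map_to_taxonomy(["laptop", "Head phones", "Bikes"], ["Laptops", "Headphones"]))
```

A character-level comparison like this breaks down quickly on real catalogs with hundreds of thousands of rows and categories that are similar in meaning rather than spelling, which is where a learned model becomes useful.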
-
From application forms, to identity documents, recent utility bills, and bank statements, many business processes today still rely on exchanging and analyzing human-readable documents—particularly in industries like financial services and law. In this post, we show how you can use Amazon SageMaker, an end-to-end platform for machine learning (ML), to automate especially challenging document analysis tasks with advanced ML models.
AWS Blog posts, papers, and more
-
Create a dashboard with SEC text for financial NLP in Amazon SageMaker JumpStart
In this post, the author shows how to curate a dataset of U.S. Securities and Exchange Commission (SEC) filings, use NLP for feature engineering on the dataset, and present the features in a dashboard.
To get started, you can refer to the example notebook in JumpStart titled Dashboarding SEC Filings. You can also refer to the example notebook in JumpStart titled Create a TabText Dataset of SEC Filings in a Single API Call, which contains more details of SEC forms retrieval, summarization, and NLP scoring.
-
Building supervised targeted sentiment analysis models for a new target domain requires substantial annotation effort since most datasets for this task are domain-specific. Domain adaptation for this task has two dimensions: the nature of targets and the opinion words used to describe sentiment towards the target. We present a data sampling strategy informed by domain differences across these two dimensions with the goal of selecting a small number of examples, thereby minimizing annotation effort. The strategy reaches 86-100% of the performance of the fully supervised model while using only about 4-15% of the full training data.
-
YouTube demo video "Amazon Transcribe video snacks: Using vocabulary filters"
Amazon Transcribe is an automatic speech recognition service that can be used when you have audio and video that contains speech you want to convert to text. You can mask, remove, or tag words you don't want in your transcription results with vocabulary filtering. For example, you can use vocabulary filtering to prevent the display of offensive or profane terms. In the demo, we will customize Transcribe to mask swear words that we recently encountered in a famous play written by William Shakespeare.
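To illustrate what the three filtering methods do to a transcript, here is a small local sketch. It does not call Amazon Transcribe; `apply_vocabulary_filter` is a hypothetical function that imitates the effect on a plain string. The `***` mask follows how we understand Transcribe renders masked words, while the square brackets for `tag` are our own stand-in (the service flags matches in its output rather than adding brackets).

```python
import re
from typing import Iterable


def apply_vocabulary_filter(text: str, words: Iterable[str], method: str = "mask") -> str:
    """Imitate mask / remove / tag filtering on a transcript string."""
    if method not in {"mask", "remove", "tag"}:
        raise ValueError(f"unknown method: {method}")
    vocabulary = {word.lower() for word in words}

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word.lower() not in vocabulary:
            return word  # not a filtered word, keep it as is
        if method == "mask":
            return "***"
        if method == "remove":
            return ""
        return f"[{word}]"  # "tag": stand-in marker, see note above

    filtered = re.sub(r"[A-Za-z']+", replace, text)
    # Collapse the gaps left behind by removed words.
    return re.sub(r"\s{2,}", " ", filtered).strip()
```

For example, calling it with the text "Thou art a knave, sir" and the filtered word "knave" yields "Thou art a ***, sir" with the default `mask` method.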
-
4 ways conversational AI and Amazon Lex help the public sector transform customer engagement
Conversational artificial intelligence (AI) and chatbots can be used to transform the customer experience, enhance engagement, improve services, and help scale more simply. Learn how conversational AI and chatbots help public sector organizations.
Community content
Workshop: Getting started with Amazon SageMaker: Train a Hugging Face Transformers model and deploy it
Learn how to use Amazon SageMaker to train a Hugging Face Transformers model and deploy it afterward. Prepare and upload a test dataset to S3, prepare a fine-tuning script to be used with Amazon SageMaker Training jobs, launch a training job and store the trained model in S3, and deploy the model after successful training. GitHub Repository
-
October “HuggingFace Blog” entries:
- Showcase Your Projects in Spaces using Gradio
- Hosting your Models and Datasets on Hugging Face Spaces using Streamlit
- Fine-tuning CLIP with Remote Sensing (Satellite) images and captions
- The Age of Machine Learning As Code Has Arrived
- Train a Sentence Embedding Model with 1B Training Pairs
- Large Language Models: A New Moore’s Law?
- Course Launch Community Event
Upcoming NLP events
Both community events and AWS events
-
Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models | Hugging Face
- Tuesday, November 2, 2021
- 5:00 PM to 6:00 PM CEST
-
Pie & AI Suisse - Trustworthiness of AI models: Improving NLP with Causality | Meetup
- Wednesday, November 3, 2021
- 6:30 PM to 8:00 PM CEST
-
NLP inference optimization on Amazon SageMaker at the NDR conference
- Tuesday, November 9, 2021
- 11:40 AM to 12:20 PM CEST
-
Analysing Politeness: Can NLP Tools Help? | Meetup
- Wednesday, November 17, 2021
- 8:00 PM to 9:30 PM CEST
Stay in touch with NLP on AWS
Our contact: aws-nlp@amazon.com
Email us about (1) your awesome NLP on AWS project, (2) which post in the newsletter helped your NLP journey, and (3) anything else you would like us to cover in the newsletter. Talk to you soon.