Introduction
In this blog post, I demonstrate how to generate replies with Huggingface Inference and the Mistral 7B model. Buyers can provide ratings and comments on sales transactions on auction sites such as eBay. When the feedback is negative, the seller must reply promptly to resolve the dispute. This demo generates a response in the same language as the buyer's feedback, based on its tone (positive, neutral, or negative) and topics. The chatbot and the user engage in a multi-turn conversation to obtain the language, sentiment, and topics of the feedback. Finally, the model generates the reply to keep customers happy.
Generate Huggingface Token
Log in to your Huggingface account at https://huggingface.co. Click Access Token in the menu to generate a new token.
Create a new NestJS Project
nest new nestjs-huggingface-customer-feedback
Install dependencies
npm i --save-exact @nestjs/swagger @nestjs/throttler dotenv compression helmet class-validator class-transformer @huggingface/inference
Generate a Feedback Module
nest g mo advisoryFeedback
nest g co advisoryFeedback/presenters/http/advisoryFeedback --flat
nest g s advisoryFeedback/application/advisoryFeedback --flat
nest g s advisoryFeedback/application/advisoryFeedbackPromptChainingService --flat
These commands create an AdvisoryFeedbackModule module, a controller, a service for the API, and another service to build chained prompts.
Define Huggingface environment variables
# .env.example
PORT=3003
HUGGINGFACE_API_KEY=<huggingface api key>
HUGGINGFACE_MODEL=mistralai/Mistral-7B-Instruct-v0.2
Copy .env.example to .env, and replace HUGGINGFACE_API_KEY and HUGGINGFACE_MODEL with the actual access token and the Mistral model.
- PORT - port number of the NestJS application
- HUGGINGFACE_API_KEY - Access token of Huggingface
- HUGGINGFACE_MODEL - the Huggingface model; I used Mistral 7B in this demo
Add .env to the .gitignore file to prevent accidentally committing the Huggingface Access Token to the GitHub repo.
Add configuration files
The project has 3 configuration files. validate.config.ts validates the payload before a request is routed to the controller for execution.
// validate.config.ts
import { ValidationPipe } from '@nestjs/common';
export const validateConfig = new ValidationPipe({
whitelist: true,
stopAtFirstError: true,
forbidUnknownValues: false,
});
env.config.ts extracts the environment variables from process.env and stores the values in the env object.
// env.config.ts
import dotenv from 'dotenv';
dotenv.config();
export const env = {
PORT: parseInt(process.env.PORT || '3003'),
HUGGINGFACE: {
API_KEY: process.env.HUGGINGFACE_API_KEY || '',
MODEL_NAME: process.env.HUGGINGFACE_MODEL || 'google/gemma-2b',
},
};
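With the fallbacks above, a missing HUGGINGFACE_API_KEY silently becomes an empty string, and the problem only surfaces when the first request reaches Huggingface. The following is a minimal sketch of a stricter reader that fails at startup instead. The readEnv helper and its types are hypothetical names introduced for illustration; they are not part of the original project.

```typescript
// Hypothetical helper, not part of the original project: it reads the same
// three variables as env.config.ts but fails fast on missing or invalid values.
type EnvSource = Record<string, string | undefined>;

type AppEnv = {
  PORT: number;
  HUGGINGFACE: { API_KEY: string; MODEL_NAME: string };
};

function readEnv(source: EnvSource): AppEnv {
  const port = Number.parseInt(source.PORT || '3003', 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, received "${source.PORT}"`);
  }

  const apiKey = source.HUGGINGFACE_API_KEY || '';
  if (!apiKey) {
    throw new Error('HUGGINGFACE_API_KEY is required');
  }

  return {
    PORT: port,
    HUGGINGFACE: {
      API_KEY: apiKey,
      // Assumption: default to the model named in .env.example.
      MODEL_NAME: source.HUGGINGFACE_MODEL || 'mistralai/Mistral-7B-Instruct-v0.2',
    },
  };
}
```

In the project, this could be called as readEnv(process.env) right after dotenv.config().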
throttler.config.ts defines the rate limit of the API.
// throttler.config.ts
import { ThrottlerModule } from '@nestjs/throttler';
export const throttlerConfig = ThrottlerModule.forRoot([
{
ttl: 60000,
limit: 10,
},
]);
Each route allows ten requests in 60,000 milliseconds or 1 minute.
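To make the meaning of ttl and limit concrete, here is a small fixed-window counter written from scratch. It is only an illustration of the idea under the same two settings; it is not how @nestjs/throttler is implemented, and the FixedWindowLimiter class and the key format are hypothetical.

```typescript
// Illustrative only: a fixed-window counter with the same ttl/limit meaning.
// This is NOT the implementation used by @nestjs/throttler.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; resetAt: number }>();
  private readonly ttlMs: number;
  private readonly limit: number;

  constructor(ttlMs: number, limit: number) {
    this.ttlMs = ttlMs;
    this.limit = limit;
  }

  // Returns true when the request is allowed and false when it should be rejected.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now >= entry.resetAt) {
      // First request of a new window: start counting again.
      this.hits.set(key, { count: 1, resetAt: now + this.ttlMs });
      return true;
    }
    if (entry.count >= this.limit) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}
```

With new FixedWindowLimiter(60000, 10), the first ten calls to allow for the same key within a minute return true and the eleventh returns false. In the real application, ThrottlerGuard answers such a request with HTTP 429.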
Bootstrap the application
// bootstrap.ts
// Omit the import statements
export class Bootstrap {
private app: NestExpressApplication;
async initApp() {
this.app = await NestFactory.create(AppModule);
}
enableCors() {
this.app.enableCors();
}
setupMiddleware() {
this.app.use(express.json({ limit: '1000kb' }));
this.app.use(express.urlencoded({ extended: false }));
this.app.use(compression());
this.app.use(helmet());
}
setupGlobalPipe() {
this.app.useGlobalPipes(validateConfig);
}
async startApp() {
await this.app.listen(env.PORT);
}
setupSwagger() {
const config = new DocumentBuilder()
.setTitle('ESG Advisory Feedback with Huggingface')
.setDescription(
'Integrate with HuggingFace and Mistral model to improve ESG advisory feedback by prompt chaining',
)
.setVersion('1.0')
.addTag('Huggingface, Mistral, Conversation')
.build();
const document = SwaggerModule.createDocument(this.app, config);
SwaggerModule.setup('api', this.app, document);
}
}
The Bootstrap class sets up Swagger, middleware, global validation, and CORS, and finally starts the application.
// main.ts
import { env } from '~configs/env.config';
import { Bootstrap } from '~core/bootstrap';
async function bootstrap() {
const bootstrap = new Bootstrap();
await bootstrap.initApp();
bootstrap.enableCors();
bootstrap.setupMiddleware();
bootstrap.setupGlobalPipe();
bootstrap.setupSwagger();
await bootstrap.startApp();
}
bootstrap()
.then(() => console.log(`The application starts successfully at port ${env.PORT}`))
.catch((error) => console.error(error));
The bootstrap function enables CORS, registers middleware, validates payloads using a global pipe, and sets up the Swagger documentation before starting the application.
I have laid down the groundwork, and the next step is to add an endpoint to receive a payload for generating replies with chatbot conversations.
Define Feedback DTO
// feedback.dto.ts
import { IsNotEmpty, IsString } from 'class-validator';
export class FeedbackDto {
@IsString()
@IsNotEmpty()
prompt: string;
}
FeedbackDto accepts a prompt, which is customer feedback.
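The effect of the two decorators, combined with the whitelist option of the global ValidationPipe, can be approximated by hand. The sketch below is an assumption-laden illustration, not the class-validator implementation: validateFeedback and FeedbackPayload are hypothetical names, and the error messages are only examples.

```typescript
// Hand-written approximation of the checks on FeedbackDto, for illustration.
// The real project relies on class-validator decorators and ValidationPipe.
type FeedbackPayload = { prompt: string };

function validateFeedback(body: Record<string, unknown>): FeedbackPayload {
  const prompt = body.prompt;

  // Corresponds to @IsString()
  if (typeof prompt !== 'string') {
    throw new Error('prompt must be a string');
  }

  // Corresponds to @IsNotEmpty(): an empty string is rejected,
  // but a whitespace-only string still passes.
  if (prompt === '') {
    throw new Error('prompt should not be empty');
  }

  // Corresponds to whitelist: true, which drops properties without decorators.
  return { prompt };
}
```

For example, validateFeedback({ prompt: 'Great service', extra: 1 }) returns an object that contains only prompt, while validateFeedback({ prompt: '' }) throws. In the NestJS application, a failed validation results in an HTTP 400 response.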
Construct Huggingface Model
// huggingface.constant.ts
export const HUGGINGFACE_INFERENCE = 'HUGGINGFACE_INFERENCE';
// huggingface.provider.ts
import { HfInference } from '@huggingface/inference';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { HUGGINGFACE_INFERENCE } from '../constants/huggingface.constant';
export const HuggingFaceProvider: Provider<HfInference> = {
provide: HUGGINGFACE_INFERENCE,
useFactory: () => new HfInference(env.HUGGINGFACE.API_KEY),
};
HuggingFaceProvider is a provider that creates an HfInference client, which calls the Mistral 7B model to write a short reply in the same language as the feedback.
Implement Reply Service
// chat-message.type.ts
export type ChatMessage = {
role: string;
content: string;
};
// conversation.type.ts
export type Conversation = {
previousAnswer?: string;
query: string;
};
// advisory-feedback-prompt-chaining.service.ts
// Omit the import statements
@Injectable()
export class AdvisoryFeedbackPromptChainingService {
private readonly logger = new Logger(AdvisoryFeedbackPromptChainingService.name);
constructor(@Inject(HUGGINGFACE_INFERENCE) private hfInference: HfInference) {}
async generateReply(feedback: string): Promise<string> {
try {
const messages: ChatMessage[] = [];
this.appendMessages(messages, {
query: `What is the language used to write the feedback? Give me the language name, no explanation, no formal response.
When the feedback is written in Traditional Chinese, return Traditional Chinese. When the feedback is written in
Simplified Chinese, return Simplified Chinese.`,
});
this.appendMessages(messages, { previousAnswer: 'What is the feedback?', query: feedback });
const response = await this.chat(messages);
const language = response.replace('.', '');
this.logger.log(`language -> ${language}`);
this.appendMessages(messages, {
previousAnswer: language,
query: `Identify the sentiment of the feedback (positive, neutral, negative).
When the sentiment is positive, return 'POSITIVE', is neutral, return 'NEUTRAL', is negative, return 'NEGATIVE'.
Do not provide explanation.`,
});
const sentiment = await this.chat(messages);
this.logger.log(`sentiment -> ${sentiment}`);
this.appendMessages(messages, {
previousAnswer: sentiment,
query: `Identify the topic of the feedback. Keep the number of sub-topics to 3 or less. Do not provide explanation.`,
});
const topic = await this.chat(messages);
this.logger.log(`topic -> ${topic}`);
this.appendMessages(messages, {
previousAnswer: topic,
query: `The customer wrote a ${sentiment} feedback about ${topic} in ${language}. Please give a short reply in the same language. Do not do more and provide English translation.`,
});
const reply = await this.chat(messages);
this.logger.log(reply);
return reply;
} catch (ex) {
console.error(ex);
throw ex;
}
}
private appendMessages(messages: ChatMessage[], { previousAnswer = '', query }: Conversation): void {
if (previousAnswer) {
messages.push({
role: 'assistant',
content: previousAnswer,
});
}
messages.push({
role: 'user',
content: query,
});
}
private async chat(messages: ChatMessage[]): Promise<string> {
const response = await this.hfInference.chatCompletion({
accessToken: env.HUGGINGFACE.API_KEY,
model: env.HUGGINGFACE.MODEL_NAME,
temperature: 0.5,
top_p: 0.5,
max_tokens: 1024,
messages,
});
return (response.choices?.[0]?.message?.content || '').trim();
}
}
AdvisoryFeedbackPromptChainingService injects the Huggingface Inference client in the constructor.
- hfInference - Huggingface Inference API to make calls to the large language model.
- generateReply - In this method, the user asks the chat model about the language, sentiment, and topics of the feedback, and the assistant answers according to the context of the prompts. I manually append each query and answer to the messages array to update the chat history. This is important because the chatbot refers to the previous turns to derive the correct context for later questions. Finally, the chatbot generates a reply in the same language based on the sentiment and topics.
private async chat(messages: ChatMessage[]): Promise<string> {
const response = await this.hfInference.chatCompletion({
accessToken: env.HUGGINGFACE.API_KEY,
model: env.HUGGINGFACE.MODEL_NAME,
temperature: 0.5,
top_p: 0.5,
max_tokens: 1024,
messages,
});
return (response.choices?.[0]?.message?.content || '').trim();
}
private appendMessages(messages: ChatMessage[], { previousAnswer = '', query }: Conversation): void {
if (previousAnswer) {
messages.push({
role: 'assistant',
content: previousAnswer,
});
}
messages.push({
role: 'user',
content: query,
});
}
The chat method accepts the conversation and returns the answer. The appendMessages method appends the assistant's previous answer and the user's next query to the messages array.
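One consequence of these two helpers is worth tracing. The chat method falls back to an empty string when the response has no choices, and appendMessages skips the assistant entry when previousAnswer is empty. The sketch below reuses the same logic as free functions and feeds it a hand-built empty response (an assumption for illustration, not an observed API response). The history then holds two consecutive user messages, which some chat templates may reject.

```typescript
type ChatMessage = { role: string; content: string };
type Conversation = { previousAnswer?: string; query: string };

// Minimal structural type covering only the fields the chat method reads.
type ChatCompletionLike = {
  choices?: Array<{ message?: { content?: string } }>;
};

// Same logic as appendMessages in the service, written as a free function.
function appendMessages(messages: ChatMessage[], { previousAnswer = '', query }: Conversation): void {
  if (previousAnswer) {
    messages.push({ role: 'assistant', content: previousAnswer });
  }
  messages.push({ role: 'user', content: query });
}

// Same fallback expression as the return statement of the chat method.
function extractContent(response: ChatCompletionLike): string {
  return (response.choices?.[0]?.message?.content || '').trim();
}

const chatHistory: ChatMessage[] = [];
appendMessages(chatHistory, { query: 'first question' });

// Hand-built response without choices, standing in for an empty model answer.
const emptyAnswer = extractContent({ choices: [] });
appendMessages(chatHistory, { previousAnswer: emptyAnswer, query: 'second question' });

// roles is ['user', 'user'] because the empty answer was never appended.
const roles = chatHistory.map((message) => message.role);
```

A possible safeguard, if this matters for the chosen model, is to throw or retry when chat returns an empty string instead of continuing the chain.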
The process for generating replies ends by producing the text output from generateReply. The method asks questions iteratively and writes a descriptive prompt for the LLM to draft a reply that is polite and addresses the needs of the customer.
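The flow of the chain can be tried offline by swapping the Huggingface call for a stub. The sketch below is a simplified version of the idea, not the service itself: runChain, ChatFn, stubChat, and the canned answers are hypothetical, and the questions are fixed strings rather than the prompts that interpolate earlier answers.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };
type ChatFn = (messages: ChatMessage[]) => Promise<string>;

// Hypothetical helper: asks each question in turn and feeds every answer
// back into the history before the next question is asked.
async function runChain(
  chat: ChatFn,
  feedback: string,
  questions: string[],
): Promise<{ answers: string[]; history: ChatMessage[] }> {
  // Same opening turns as generateReply: question, assistant prompt, feedback.
  const history: ChatMessage[] = [
    { role: 'user', content: questions[0] },
    { role: 'assistant', content: 'What is the feedback?' },
    { role: 'user', content: feedback },
  ];
  const answers: string[] = [await chat(history)];

  for (const question of questions.slice(1)) {
    history.push({ role: 'assistant', content: answers[answers.length - 1] });
    history.push({ role: 'user', content: question });
    answers.push(await chat(history));
  }
  return { answers, history };
}

// Stub standing in for the Huggingface call; it picks a canned answer
// from the number of messages it receives (3, 5, 7, ...).
const cannedAnswers = ['English', 'POSITIVE', 'ESG reporting'];
const stubChat: ChatFn = async (messages) => cannedAnswers[(messages.length - 3) / 2] ?? '';
```

Calling runChain(stubChat, 'Great service', ['language?', 'sentiment?', 'topic?']) resolves with the three canned answers and a history of seven messages, which shows how each answer becomes context for the next question.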
// advisory-feedback.service.ts
// Omit the import statements to save space
@Injectable()
export class AdvisoryFeedbackService {
constructor(private promptChainingService: AdvisoryFeedbackPromptChainingService) {}
generateReply(prompt: string): Promise<string> {
return this.promptChainingService.generateReply(prompt);
}
}
AdvisoryFeedbackService injects AdvisoryFeedbackPromptChainingService and delegates to it. The prompt chaining service asks the chat model a chain of questions to generate a reply.
Implement Advisory Feedback Controller
// advisory-feedback.controller.ts
// Omit the import statements to save space
@Controller('esg-advisory-feedback')
export class AdvisoryFeedbackController {
constructor(private service: AdvisoryFeedbackService) {}
@Post()
generateReply(@Body() dto: FeedbackDto): Promise<string> {
return this.service.generateReply(dto.prompt);
}
}
The AdvisoryFeedbackController injects AdvisoryFeedbackService, which uses Huggingface Inference chatCompletion and the Mistral 7B model. The endpoint invokes the service's generateReply method to generate a reply from the prompt.
- /esg-advisory-feedback - generate a reply from a prompt
Module Registration
The AdvisoryFeedbackModule provides AdvisoryFeedbackPromptChainingService, AdvisoryFeedbackService, and HuggingFaceProvider. The module has one controller, AdvisoryFeedbackController.
// advisory-feedback.module.ts
// Omit the import statements for brevity
@Module({
controllers: [AdvisoryFeedbackController],
providers: [AdvisoryFeedbackPromptChainingService, AdvisoryFeedbackService, HuggingFaceProvider],
})
export class AdvisoryFeedbackModule {}
Import AdvisoryFeedbackModule into AppModule.
// app.module.ts
@Module({
imports: [throttlerConfig, AdvisoryFeedbackModule],
controllers: [AppController],
providers: [
{
provide: APP_GUARD,
useClass: ThrottlerGuard,
},
],
})
export class AppModule {}
Test the endpoints
I can test the endpoints with cURL, Postman or Swagger documentation after launching the application.
npm run start:dev
The URL of the Swagger documentation is http://localhost:3003/api.
In cURL
curl --location 'http://localhost:3003/esg-advisory-feedback' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Looking ahead, the needs of our customers will increasingly be defined by sustainable choices. ESG reporting through diginex has brought us uniformity, transparency and direction. It provides us with a framework to be able to demonstrate to all stakeholders - customers, employees, and investors - what we are doing and to be open and transparent."
}'
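The same request can be sent from TypeScript. This is a minimal sketch that assumes Node.js 18 or later (for the global fetch) and the application running locally on port 3003; buildRequest and requestReply are hypothetical helper names.

```typescript
// Hypothetical client helpers; the URL matches the cURL example.
const ENDPOINT = 'http://localhost:3003/esg-advisory-feedback';

type RequestInitLike = {
  method: string;
  headers: Record<string, string>;
  body: string;
};

// Builds the request without sending it, which keeps this part easy to test.
function buildRequest(prompt: string): { url: string; init: RequestInitLike } {
  return {
    url: ENDPOINT,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    },
  };
}

// Sends the feedback to the running application and returns the reply.
async function requestReply(prompt: string): Promise<string> {
  const { url, init } = buildRequest(prompt);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // The controller returns a string, so the body is read as text.
  return response.text();
}
```

For example, requestReply('The onboarding was smooth.').then(console.log) prints the generated reply once the application is running.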
Dockerize the application
# .dockerignore
.git
.gitignore
node_modules/
dist/
Dockerfile
.dockerignore
npm-debug.log
Create a .dockerignore file for Docker to ignore some files and directories.
# Dockerfile
# Use an official Node.js runtime as the base image
FROM node:20-alpine
# Set the working directory in the container
WORKDIR /app
# Copy package.json and package-lock.json to the working directory
COPY package*.json ./
# Install the dependencies
RUN npm install
# Copy the rest of the application code to the working directory
COPY . .
# Expose a port (if your application listens on a specific port)
EXPOSE 3003
# Define the command to run your application
CMD [ "npm", "run", "start:dev"]
I added the Dockerfile that installs the dependencies and starts the NestJS application in development mode (npm run start:dev) at port 3003.
# docker-compose.yaml
version: '3.8'
services:
backend:
build:
context: .
dockerfile: Dockerfile
environment:
- PORT=${PORT}
- HUGGINGFACE_API_KEY=${HUGGINGFACE_API_KEY}
- HUGGINGFACE_MODEL=${HUGGINGFACE_MODEL}
ports:
- "${PORT}:${PORT}"
networks:
- ai
restart: unless-stopped
networks:
ai:
I added the docker-compose.yaml in the current folder, which is responsible for creating the NestJS application container.
Launch the Docker application
docker-compose up
Navigate to http://localhost:3003/api to read and execute the API.
This concludes my blog post about using Huggingface Inference chatCompletion and the Mistral 7B model to generate replies regardless of the language the feedback is written in. Generating replies with Generative AI reduces the effort a writer needs to compose a polite reply to any customer. I hope you like the content and continue to follow my learning experience in Angular, NestJS, Generative AI, and other technologies.
Resources:
- Github Repo: https://github.com/railsstudent/fullstack-genai-prompt-chaining-customer-feedback/tree/main/nestjs-huggingface-customer-feedback
- Huggingface JS: https://huggingface.co/docs/huggingface.js/inference/README
- Huggingface Inference Chat Completion: https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion