Introduction
In this blog post, I create several Generative AI examples using NestJS and the Gemini API. The examples generate text from 1) a text prompt, 2) a prompt and an image, and 3) a prompt and two images to analyze. The Google team provided the examples in Node.js, and I ported several of them to NestJS, my favorite framework, which builds on top of the popular Express framework.
Generate Gemini API Key
Go to https://aistudio.google.com/app/apikey to generate an API key on a new or an existing Google Cloud project.
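To sanity-check the key before writing any code, you can call the Gemini REST API directly. This assumes the key is exported as GEMINI_API_KEY in your shell:
curl -H 'Content-Type: application/json' \
  -d '{ "contents": [{ "parts": [{ "text": "Hello" }] }] }' \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${GEMINI_API_KEY}"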
Create a new NestJS Project
nest new nestjs-gemini-api-demo
Install dependencies
npm i --save-exact @google/generative-ai @nestjs/swagger class-transformer class-validator dotenv compression
npm i --save-exact --save-dev @types/multer
Generate a Gemini Module
nest g mo gemini
nest g co gemini/presenters/http/gemini --flat
nest g s gemini/application/gemini --flat
These commands create a Gemini module, a controller, and a service for the API.
Define Gemini environment variables
The .env.example file has environment variables for the Gemini API key, the Gemini Pro model, the Gemini Pro Vision model, and the port number.
// .env.example
GEMINI_API_KEY=<google_gemini_api_key>
GEMINI_PRO_MODEL=gemini-pro
GEMINI_PRO_VISION_MODEL=gemini-pro-vision
PORT=3000
Copy .env.example to .env and replace the placeholder of GEMINI_API_KEY with the real API key.
Add .env to the .gitignore file to ensure we don't accidentally commit the Gemini API key to the GitHub repo.
// .gitignore
.env
Add configuration files
The project has three configuration files. validate.config.ts validates that the request payload is valid before the request is routed to the controller. The whitelist option strips unknown properties from the payload (for example, { "prompt": "hi", "foo": 1 } reaches the controller as { "prompt": "hi" }), and stopAtFirstError reports only the first failed constraint for each property.
// validate.config.ts
import { ValidationPipe } from '@nestjs/common';
export const validateConfig = new ValidationPipe({
whitelist: true,
stopAtFirstError: true,
});
env.config.ts extracts the environment variables from process.env and stores the values in the env object.
// env.config.ts
import dotenv from 'dotenv';
dotenv.config();
export const env = {
PORT: parseInt(process.env.PORT || '3000'),
GEMINI: {
KEY: process.env.GEMINI_API_KEY || '',
PRO_MODEL: process.env.GEMINI_PRO_MODEL || 'gemini-pro',
PRO_VISION_MODEL: process.env.GEMINI_PRO_VISION_MODEL || 'gemini-pro-vision',
},
};
gemini.config.ts defines the generation and safety options for the Gemini models.
// gemini.config.ts
import { GenerationConfig, HarmBlockThreshold, HarmCategory, SafetySetting } from '@google/generative-ai';
// maxOutputTokens caps the response length; temperature, topK, and topP control sampling randomness
export const GENERATION_CONFIG: GenerationConfig = { maxOutputTokens: 1024, temperature: 1, topK: 32, topP: 1 };
// Block any content rated medium or above in each harm category
export const SAFETY_SETTINGS: SafetySetting[] = [
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
];
Bootstrap the application
// main.ts
import compression from 'compression';
import express from 'express';
import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import { AppModule } from './app.module';
import { env } from '~configs/env.config';
import { validateConfig } from '~configs/validate.config';
function setupSwagger(app: NestExpressApplication) {
const config = new DocumentBuilder()
.setTitle('Gemini example')
.setDescription('The Gemini API description')
.setVersion('1.0')
.addTag('google gemini')
.build();
const document = SwaggerModule.createDocument(app, config);
SwaggerModule.setup('api', app, document);
}
async function bootstrap() {
const app = await NestFactory.create<NestExpressApplication>(AppModule);
app.enableCors();
app.useGlobalPipes(validateConfig);
app.use(express.json({ limit: '1000kb' }));
app.use(express.urlencoded({ extended: false }));
app.use(compression());
setupSwagger(app);
await app.listen(env.PORT);
}
bootstrap();
The bootstrap function registers middleware with the application, sets up the Swagger documentation, and applies a global pipe to validate payloads.
I have laid the groundwork, and the next step is to add routes that receive Generative AI inputs and generate text.
Example 1: Generate text from a prompt
// generate-text.dto.ts
import { ApiProperty } from '@nestjs/swagger';
import { IsNotEmpty, IsString } from 'class-validator';
export class GenerateTextDto {
@ApiProperty({
name: 'prompt',
description: 'prompt of the question',
type: 'string',
required: true,
})
@IsNotEmpty()
@IsString()
prompt: string;
}
The DTO carries the text prompt and validates that it is a non-empty string.
// gemini.constant.ts
export const GEMINI_PRO_MODEL = 'GEMINI_PRO_MODEL';
export const GEMINI_PRO_VISION_MODEL = 'GEMINI_PRO_VISION_MODEL';
// gemini.provider.ts
import { GenerativeModel, GoogleGenerativeAI } from '@google/generative-ai';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { GENERATION_CONFIG, SAFETY_SETTINGS } from '~configs/gemini.config';
import { GEMINI_PRO_MODEL, GEMINI_PRO_VISION_MODEL } from './gemini.constant';
export const GeminiProModelProvider: Provider<GenerativeModel> = {
provide: GEMINI_PRO_MODEL,
useFactory: () => {
const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
return genAI.getGenerativeModel({
model: env.GEMINI.PRO_MODEL,
generationConfig: GENERATION_CONFIG,
safetySettings: SAFETY_SETTINGS,
});
},
};
export const GeminiProVisionModelProvider: Provider<GenerativeModel> = {
provide: GEMINI_PRO_VISION_MODEL,
useFactory: () => {
const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
return genAI.getGenerativeModel({
model: env.GEMINI.PRO_VISION_MODEL,
generationConfig: GENERATION_CONFIG,
safetySettings: SAFETY_SETTINGS,
});
},
};
I define two providers that supply the Gemini Pro model and the Gemini Pro Vision model respectively. Then I can inject these models into the Gemini service.
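The two providers also need to be registered in the Gemini module before the service can inject them. Here is a minimal sketch of the module; the import paths are assumptions based on the folders passed to the generate commands:
// gemini.module.ts - a minimal sketch; adjust the import paths to your layout
import { Module } from '@nestjs/common';
import { GeminiController } from './presenters/http/gemini.controller';
import { GeminiService } from './application/gemini.service';
import { GeminiProModelProvider, GeminiProVisionModelProvider } from './application/gemini.provider';

@Module({
  controllers: [GeminiController],
  providers: [GeminiService, GeminiProModelProvider, GeminiProVisionModelProvider],
})
export class GeminiModule {}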
// content.helper.ts
import { Content, Part } from '@google/generative-ai';
export function createContent(text: string, ...images: Express.Multer.File[]): Content[] {
const imageParts: Part[] = images.map((image) => {
return {
inlineData: {
mimeType: image.mimetype,
data: image.buffer.toString('base64'),
},
};
});
return [
{
role: 'user',
parts: [
...imageParts,
{
text,
},
],
},
];
}
createContent is a helper function that builds the Content array for the model: each uploaded image becomes a base64-encoded inline part, followed by the text prompt.
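For illustration, a text-only call produces a single user turn whose parts contain just the prompt:
// Illustration: the Content array for a text-only prompt
const contents = createContent('Tell me a joke');
// => [{ role: 'user', parts: [{ text: 'Tell me a joke' }] }]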
// gemini.service.ts
// ... omit the import statements to save space
@Injectable()
export class GeminiService {
constructor(
@Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
@Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
) {}
async generateText(prompt: string): Promise<GenAiResponse> {
const contents = createContent(prompt);
const { totalTokens } = await this.proModel.countTokens({ contents });
const result = await this.proModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
}
  // ... other methods ...
}
The generateText method accepts a prompt and calls the Gemini API to generate text. It returns the total number of tokens and the generated text to the controller.
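The GenAiResponse type is not shown in this post; here is a minimal sketch inferred from the return value of generateText (the file name is hypothetical):
// gen-ai-response.interface.ts - an assumed shape, inferred from the service's return value
export interface GenAiResponse {
  totalTokens: number;
  text: string;
}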
// gemini.controller.ts
// omit the import statements to save space
@ApiTags('Gemini')
@Controller('gemini')
export class GeminiController {
constructor(private service: GeminiService) {}
@ApiBody({
description: 'Prompt',
required: true,
type: GenerateTextDto,
})
@Post('text')
generateText(@Body() dto: GenerateTextDto): Promise<GenAiResponse> {
return this.service.generateText(dto.prompt);
}
  // ... other routes ...
}
Example 2: Generate text from a prompt and an image
This example needs both a prompt and an image file.
// gemini.service.ts
// ... omit the import statements to save space
@Injectable()
export class GeminiService {
constructor(
@Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
@Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
) {}
  // ... other methods ...
async generateTextFromMultiModal(prompt: string, file: Express.Multer.File): Promise<GenAiResponse> {
try {
const contents = createContent(prompt, file);
const { totalTokens } = await this.proVisionModel.countTokens({ contents });
const result = await this.proVisionModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
} catch (err) {
if (err instanceof Error) {
throw new InternalServerErrorException(err.message, err.stack);
}
throw err;
}
}
}
// file-validator.pipe.ts
import { FileTypeValidator, MaxFileSizeValidator, ParseFilePipe } from '@nestjs/common';
export const fileValidatorPipe = new ParseFilePipe({
  validators: [
    // Reject files larger than 1MB
    new MaxFileSizeValidator({ maxSize: 1 * 1024 * 1024 }),
    // Accept JPEG and PNG images only
    new FileTypeValidator({ fileType: /image\/(jpeg|png)/ }),
  ],
});
Define fileValidatorPipe to validate that the uploaded file is either a JPEG or a PNG image, and that the file does not exceed 1MB.
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
schema: {
type: 'object',
properties: {
prompt: {
type: 'string',
description: 'Prompt',
},
file: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
},
},
})
@Post('text-and-image')
@UseInterceptors(FileInterceptor('file'))
async generateTextFromMultiModal(
@Body() dto: GenerateTextDto,
@UploadedFile(fileValidatorPipe)
file: Express.Multer.File,
): Promise<GenAiResponse> {
return this.service.generateTextFromMultiModal(dto.prompt, file);
}
file is the key that provides the binary file in the form data.
Example 3: Analyze two images
This example is similar to example 2, except that it needs a prompt and two images to compare and contrast.
// gemini.service.ts
async analyzeImages({ prompt, firstImage, secondImage }: AnalyzeImage): Promise<GenAiResponse> {
try {
const contents = createContent(prompt, firstImage, secondImage);
const { totalTokens } = await this.proVisionModel.countTokens({ contents });
const result = await this.proVisionModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
} catch (err) {
if (err instanceof Error) {
throw new InternalServerErrorException(err.message, err.stack);
}
throw err;
}
}
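The AnalyzeImage type is not shown in this post either; here is a minimal sketch inferred from the destructuring in analyzeImages and the controller call below (the file name is hypothetical):
// analyze-image.interface.ts - an assumed shape, inferred from its usage
export interface AnalyzeImage {
  prompt: string;
  firstImage: Express.Multer.File;
  secondImage: Express.Multer.File;
}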
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
schema: {
type: 'object',
properties: {
prompt: {
type: 'string',
description: 'Prompt',
},
first: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
second: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
},
},
})
@Post('analyse-the-images')
@UseInterceptors(
FileFieldsInterceptor([
{ name: 'first', maxCount: 1 },
{ name: 'second', maxCount: 1 },
]),
)
async analyseImages(
@Body() dto: GenerateTextDto,
@UploadedFiles()
files: {
first?: Express.Multer.File[];
second?: Express.Multer.File[];
},
): Promise<GenAiResponse> {
if (!files.first?.length) {
throw new BadRequestException('The first image is missing');
}
if (!files.second?.length) {
throw new BadRequestException('The second image is missing');
}
return this.service.analyzeImages({ prompt: dto.prompt, firstImage: files.first[0], secondImage: files.second[0] });
}
first is the key that provides the first binary file in the form data, and second is the key that provides the second binary file.
Test the endpoints
I can test the endpoints with Postman or the Swagger documentation after starting the application:
npm run start:dev
The Swagger documentation is available at http://localhost:3000/api.
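The endpoints can also be exercised from the command line. The routes and field names come from the controller above; the prompts and file paths below are placeholders:
curl -X POST http://localhost:3000/gemini/text \
  -H 'Content-Type: application/json' \
  -d '{ "prompt": "Tell me a joke" }'

curl -X POST http://localhost:3000/gemini/text-and-image \
  -F 'prompt=Describe this image' \
  -F 'file=@./some-image.png'

curl -X POST http://localhost:3000/gemini/analyse-the-images \
  -F 'prompt=Compare and contrast these two images' \
  -F 'first=@./first-image.png' \
  -F 'second=@./second-image.png'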
(Bonus) Deploy to Google Cloud Run
Install the gcloud CLI on the machine according to the official documentation. On my machine, the installation path is ~/google-cloud-sdk.
Then, I open a new terminal and change to the root of the project. On the command line, I deploy the application and set the environment variables in the same command:
$ ~/google-cloud-sdk/bin/gcloud run deploy \
--update-env-vars GEMINI_API_KEY=<replace with your own key>,GEMINI_PRO_MODEL=gemini-pro,GEMINI_PRO_VISION_MODEL=gemini-pro-vision
If the deployment is successful, the NestJS application runs on Google Cloud Run. Cloud Run supplies the PORT environment variable at runtime, and env.config.ts already reads it, so no extra port configuration is needed.
This is the end of the blog post on building Generative AI examples with NestJS and the Gemini API. I hope you like the content and continue to follow my learning experience in Angular, NestJS, and other technologies.
Resources:
1. GitHub Repo: https://github.com/railsstudent/nestjs-gemini-api-demo
2. Node.js Gemini tutorial: https://ai.google.dev/tutorials/node_quickstart
3. Cloud Run deployment documentation: https://cloud.google.com/run/docs/deploying-source-code