Dennis Alund - Founder of Oddbit
Sep 30, 2024 · 14 min read

Building an AI ghostwriter with Firebase App Hosting and Genkit

At Oddbit we built a self-hosted CMS with a built-in AI ghostwriter that helps you generate articles in your own writing style. It provides a speech-to-text AI feature that turns your thoughts on voice notes into well-structured articles. As a content creator, you provide the draft or voice transcript, and the ghostwriter turns it into a well-written article for you.

Using AI to write content, whether from a short prompt or a voice note, is nothing new. But using an LLM to generate text often results in synthetic-sounding articles: the generated content feels bland and out of character with our own voice.

The content creation tool that we built uses a database of your own written content as a source for the AI to learn your writing style: your choice of words, and whether your language is casual, formal, serious or has a comical touch.

The purpose of the project was to provide a free, self-hosted CMS where you own your content and are free to build and extend around it.

The CMS is built on Firebase App Hosting with Next.js and Nx (read our two linked articles about each topic).

The CMS publishes its data to a Firestore collection, where you can easily read and consume it in your own web application, such as a blogging platform.
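
As an illustration, a consuming blog could read the published articles with the Firebase web SDK; the collection and field names below are hypothetical placeholders:

import { initializeApp } from 'firebase/app';
import { collection, getDocs, getFirestore, orderBy, query } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

// 'articles' and 'publishedAt' are hypothetical names for illustration.
const snapshot = await getDocs(
  query(collection(db, 'articles'), orderBy('publishedAt', 'desc')),
);
const articles = snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));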

The ghostwriter

Content generation is probably one of the most frequent use cases for LLMs, and this is yet another one.

But contrary to pasting a rough draft into an AI chat application and asking for an article, the CMS ghostwriter learns from your style of writing and tries to help you express your thoughts the way you would when writing an article yourself.

By capturing your thoughts in a voice recording, it is still very much your own thoughts and creative content that is the source of the article. The LLM is instructed to not fill out the article with empty words, but instead to just clarify what you expressed in the voice recording.

Recording voice notes can be a very effective way to quickly and freely capture ideas and thoughts. But in my own experience, voice recordings can become very long and unstructured, with a lot of jumping between points of interest and repetition.

At the same time, typing drafts on a keyboard is slower than thought, and the structure of paragraphs, sentences and auto-correction discourages chaotic or confusing thoughts and ideas, making it difficult to jump between points.

Capturing the concept in a voice note, on the other hand, lets you detach from the medium in which the thought is captured, and the recording becomes a good representation of how you reason and think around a subject.

The LLM is instructed to try to understand the essence of your expressed ideas and summarize them in a way that is easy for a reader to consume. The written content is merely syntax for delivering an idea to your readers. The focus becomes the storytelling rather than the craft of writing.

The ghostwriter feature is particularly effective when it comes to thought pieces and expressing stories and ideas.

Generating an article from voice recording with AI

Use AI and LLMs to enhance your app in delightful, subtle ways

Not all AI-powered applications need to provide a chat interface or a prompt. As impressive as it is to talk directly to a highly intelligent and helpful agent, it can also cause decision fatigue or confusion for users when there are no rules of engagement. And we already have plenty of chat-based AI applications to choose from. What we want to accomplish here is to enhance an already familiar workflow, namely writing. We want to make the AI integration subtle and seemingly magical by hiding as much of its presence as possible from the user.

The user stays focused on the task: they are only asked to provide the ideas and material for an article, and our AI does all the heavy lifting of finding the source articles to learn from and structuring a prompt that instructs the LLM to write articles in the style of the user's "voice".

Integrating AI into your applications doesn't always need to be highlighted for the user. Making the AI work behind the scenes to provide a seemingly magical experience can be very delightful, almost like having an invisible human collaborator.

The concept of subtle AI integration is about creating a user experience where the AI acts as an invisible assistant, enhancing existing workflows without drawing unnecessary attention to itself.

Consider these aspects of subtle AI integration:

  • Seamless Assistance: The AI should smoothly integrate into the user’s tasks, anticipating needs and providing support without requiring explicit instructions.
  • Contextual Awareness: The AI should understand the user’s context, the task at hand, and past interactions to offer relevant and timely assistance.
  • Adaptive Learning: The AI should continuously learn from the user’s actions and preferences, tailoring its support to become more effective over time.
  • Minimal Disruption: The AI’s presence should not disrupt the user’s flow. It should provide enhancements without introducing unnecessary complexity or distractions.

This approach allows users to focus on their primary goals while the AI works in the background, making their tasks easier and more efficient.

Structuring your app code

You can choose to deploy your Genkit implementation as part of your app code or on Cloud Functions.

When building with Nx devtools, it is really convenient to place your Genkit code in a lib of its own, since the same code can then be deployed either on Cloud Functions or as part of your app code.

repo/
├─ libs/
│  ├─ genkit/
│  │  ├─ src/
│  │  │  ├─ article/ 
│  │  │  │  ├─ flow.ts
│  │  │  │  ├─ index.ts
│  │  │  │  ├─ prompt.ts 
│  │  │  │  ├─ schema.ts
│  │  │  ├─ index.ts
│  │  ├─ project.json

When you deploy your Genkit code on Cloud Functions, you declare an onFlow cloud function, which is a wrapper for a callable cloud function. Later, in your app code, you can call it the same way that you call any callable cloud function.

The benefit of deploying your Genkit code on Cloud Functions is that you do not depend on your client platform having a supported Genkit SDK. As long as your app platform has a Firebase SDK that can call callable cloud functions, you can use Genkit in your app.
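
For example, assuming the flow is deployed as a callable function named generateArticleFlow (a hypothetical name), a web client could invoke it like this:

import { getFunctions, httpsCallable } from 'firebase/functions';

const functions = getFunctions();
// 'generateArticleFlow' is a hypothetical deployed flow name.
const generateArticle = httpsCallable(functions, 'generateArticleFlow');
const { data } = await generateArticle({ recordingUrl: 'gs://…', readingTimeMinutes: 5 });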

Deploying your Genkit code on Cloud Functions does, however, require you to spin up the Firebase emulator and deploy the functions there during development and testing.

If you deploy your Genkit code as part of your app, invoking flows is much more straightforward: you simply call runFlow in your application code to make the AI call, and there are no particular considerations during development and testing.
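
As a sketch, assuming a flow named generateArticleFlow exported from the Genkit lib (both names are illustrative), the in-app invocation could look like this:

import { runFlow } from '@genkit-ai/flow';
import { generateArticleFlow } from '@myorg/genkit'; // hypothetical Nx lib import path

// Invoke the flow directly from server-side app code.
const article = await runFlow(generateArticleFlow, {
  recordingUrl: 'gs://…',
  readingTimeMinutes: 5,
});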

The good thing about separating the Genkit code into its own lib is that you can maintain all of your Genkit code in one place. Apart from the caching and encapsulation that are the general benefits of using Nx devtools, you also get additional benefits particular to implementing your Genkit code as an Nx library module:

  • You can easily experiment with where your LLM integration is deployed, moving it between Cloud Functions and application code.
  • You can maintain all Genkit code, prompts, schemas and configuration in one place, and choose to deploy some flows on Cloud Functions and others in your app.

The Genkit SDK is supported both on Cloud Functions and in client app code, so you do not need to change your lib code if you move the access point between the two. Just make sure the lib contains only your flows, prompts and schemas.

Genkit conceptually divides your implementation of AI integration into flows and prompts.

Configure genkit

The first step in using Genkit is to configure it with the necessary plugins and settings. This configuration is typically done in the top-level index.ts file within your Genkit lib.

The configureGenkit function in Genkit takes a number of configuration options to customize Genkit’s behavior and integrate with various services.

Let’s take a look at the core configuration options and best practices when setting them up:

Plugins

The plugins option is an array that allows you to specify the Genkit plugins to load. These plugins extend Genkit’s functionality by providing support for different AI models, vector stores, cloud services, and other integrations. For this project, we are using the googleAI and firebase plugins.

  • googleAI: This plugin provides access to Google’s Gemini family of large language models. It requires an API key, which should be stored securely as an environment variable (GEMINI_API_KEY).
  • firebase: This plugin enables integration with various Firebase services, such as Cloud Firestore for storing flow states and traces, Cloud Functions for deploying Genkit flows as serverless functions, and Firebase Authentication for securing your API.

Flow State and Trace Storage

Genkit keeps track of the state of your AI workflows and generates traces to help you understand and debug their execution. You can choose where to store this data using the flowStateStore and traceStore options.

For both options, we’ve chosen “firebase”, which configures Genkit to store this data in Cloud Firestore. This provides a convenient and scalable solution for managing this data, especially if you are already using Firebase for other parts of your application.

Tracing and Metrics

Genkit uses OpenTelemetry for tracing and metrics collection, providing insights into the performance of your AI workflows.

Setting the enableTracingAndMetrics option to true enables OpenTelemetry instrumentation. Genkit will automatically generate traces and metrics data for your AI workflows, which can then be exported to a monitoring system for analysis and visualization.

Log Level

The logLevel option controls the verbosity of Genkit’s logging.

We’ve set this to "info", which provides a good balance between logging useful information and avoiding excessive logging output. However, for troubleshooting or debugging purposes, you might consider setting this to "debug" to get more detailed logs.

Additional Configuration Options

Genkit offers other configuration options as well:

  • secureMode: If set to true, Genkit will require an authPolicy to be defined for every flow. This enforces a security-first approach, ensuring that all your AI workflows are protected from unauthorized access.
  • defaultAuthPolicy: When secureMode is enabled, this option allows you to define a default authentication policy that will be applied to all flows unless overridden by a flow-specific authPolicy.
  • defaultLocation: This option specifies the default location for cloud resources, such as Cloud Functions, when using cloud-based plugins.

This configuration ensures that all AI operations are logged, flow states and traces are stored in Firestore for persistence and analysis, and a reasonable level of logging detail is provided.
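
Putting it together, the configuration in the lib's top-level index.ts could look roughly like this sketch:

import { configureGenkit } from '@genkit-ai/core';
import { firebase } from '@genkit-ai/firebase';
import { googleAI } from '@genkit-ai/googleai';

configureGenkit({
  // The API key is read from the GEMINI_API_KEY environment variable.
  plugins: [firebase(), googleAI({ apiKey: process.env.GEMINI_API_KEY })],
  // Persist flow states and traces in Cloud Firestore.
  flowStateStore: 'firebase',
  traceStore: 'firebase',
  // Emit OpenTelemetry traces and metrics.
  enableTracingAndMetrics: true,
  logLevel: 'info',
});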

Prompt

The prompt is the part that implements the instructions telling the LLM how to interpret your request.

Prompt Configuration and Parameters

  • name: A unique identifier for the prompt.
  • inputSchema: A Zod schema defining the expected structure of the input data.
  • output: An object defining the output format and schema. format can be json or text. schema is a Zod schema defining the expected structure of the output data.
  • config: An object defining configuration options for the LLM.
  • messages: An array of messages defining the conversation with the LLM. Each message has a role (such as system, user, or model) and content.

The temperature controls how much creativity you allow the model. Adjust the number to your liking and experiment to see what creates the best outcome in variety and creativity. When writing articles, we don't want the LLM to simply generate a transcript of the recording; we want the articles to build on the conceptual ideas and reasoning expressed in the voice recording.

The following prompt instructs the LLM to act as a ghostwriter, adapting to the user's writing style and expanding on the provided voice recording to create an engaging article. It also provides specific rules and guidelines for title generation, article length, HTML markup usage, summary generation, and tag creation.
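
The full prompt isn't reproduced here, but a condensed sketch, with hypothetical schema and field names, could look like this:

import { definePrompt } from '@genkit-ai/ai';
import { z } from 'zod';

// Hypothetical input: a storage URL for the voice recording, sample
// articles to learn the writing style from, and a target reading time.
const ArticleInputSchema = z.object({
  recordingUrl: z.string(),
  sampleArticles: z.array(z.string()),
  readingTimeMinutes: z.number(),
});

export const articlePrompt = definePrompt(
  { name: 'articlePrompt', inputSchema: ArticleInputSchema },
  async (input) => ({
    messages: [
      {
        role: 'system',
        content: [{
          text:
            'You are a ghostwriter. Study the sample articles below and mimic ' +
            'their tone, vocabulary and structure. Clarify and expand on the ' +
            'ideas in the recording without inventing new claims. Target a ' +
            `reading time of about ${input.readingTimeMinutes} minutes.\n\n` +
            input.sampleArticles.join('\n---\n'),
        }],
      },
      {
        // The recording is passed to the model as a media part; adjust
        // contentType to match your recording format.
        role: 'user',
        content: [{ media: { url: input.recordingUrl, contentType: 'audio/mp4' } }],
      },
    ],
    config: { temperature: 0.8 },
  }),
);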

Prompt engineering

Although prompt engineering isn’t traditional engineering, it still requires considerable tinkering and tweaking to get right. It’s important to be very explicit in what you expect and to leave as few creative gaps as possible to prevent the language model from “hallucinating”.

Hallucination, in the context of language models, is when the LLM outputs plausible but incorrect or nonsensical information. This happens because the model predicts text based on patterns in the data it was trained on, and without precise prompts, it may fill in missing details with fabricated content.

You can reduce the likelihood of hallucinations by:

  • Providing Explicit Rules: Clearly define the expected output format, content restrictions, and any specific guidelines the LLM should follow.
  • Avoiding Vagueness: Use precise language and avoid ambiguous instructions that could be interpreted in multiple ways.
  • Preventing Contradictions: Ensure that different parts of the prompt do not contradict each other, leading to confusion for the LLM.
  • Giving Concrete Examples: Provide sample outputs that demonstrate the desired format and style you expect from the LLM.

If you want the response in a specific JSON structure, for example, you can provide an example output that the response should follow. By using Zod, a TypeScript schema definition library, you can easily provide well-defined instructions in your prompt for how you want the LLM to respond.

With the prompt above, we occasionally observed the LLM getting confused about the expected output structure. Explicitly defining the expected JSON structure in the prompt, using the zodToJsonSchema library, helped address this issue.
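
As a sketch, assuming an ArticleSchema along these lines, the explicit JSON schema can be rendered straight into the prompt text:

import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Hypothetical output schema for a generated article.
export const ArticleSchema = z.object({
  title: z.string(),
  content: z.string().describe('The article body as HTML markup'),
  summary: z.string(),
  tags: z.array(z.string()),
});

// Embed an explicit JSON schema in the prompt so the LLM knows
// exactly what structure to respond with.
const outputInstructions =
  'Respond with JSON matching this schema:\n' +
  JSON.stringify(zodToJsonSchema(ArticleSchema), null, 2);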

Flow

The flow is a level of abstraction and logic between your app’s interaction with the LLM and the prompt. Your flow can implement any necessary logic to prepare your prompt, such as fetching data, calling APIs before calling the prompt, or combining one or more prompts before providing a final response to your application.

In this case, we're using the articlePrompt that we previously defined. The input schema defines the URL to the recording that we previously uploaded to Cloud Storage. We also provide an input parameter that tells the LLM how long we want the article to be, in minutes of reading time.

Flow Configuration and Parameters

  • name: A unique identifier for the flow.
  • inputSchema: A Zod schema defining the expected structure of the input data for the flow.
  • outputSchema: A Zod schema defining the expected structure of the output data from the flow.
  • authPolicy: A function that defines the authentication policy for the flow. Requests that do not satisfy the policy are rejected with an error.
  • httpsOptions: An object defining options for the HTTPS server when the flow is deployed as a cloud function.

This flow takes the input data, renders the articlePrompt, calls the LLM using generate, parses the output using the ArticleSchema, and returns the parsed article data. The authPolicy ensures that only authenticated users can invoke this flow.
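
A condensed sketch of such a flow, deployed with the Firebase plugin's onFlow wrapper and reusing the hypothetical articlePrompt and ArticleSchema from the earlier sketches, could look like this:

import { generate, renderPrompt } from '@genkit-ai/ai';
import { firebaseAuth } from '@genkit-ai/firebase/auth';
import { onFlow } from '@genkit-ai/firebase/functions';
import { gemini15Flash } from '@genkit-ai/googleai';
import { z } from 'zod';

export const generateArticleFlow = onFlow(
  {
    name: 'generateArticleFlow',
    inputSchema: z.object({
      recordingUrl: z.string(),
      readingTimeMinutes: z.number(),
    }),
    outputSchema: ArticleSchema,
    // Reject requests that do not come from a signed-in Firebase user.
    authPolicy: firebaseAuth((user) => {
      if (!user) throw new Error('Authenticated user required');
    }),
    httpsOptions: { memory: '1GiB' },
  },
  async (input) => {
    const response = await generate(
      await renderPrompt({
        prompt: articlePrompt,
        // Fetching the user's previous articles is elided here.
        input: { ...input, sampleArticles: [] },
        model: gemini15Flash,
      }),
    );
    return ArticleSchema.parse(response.output());
  },
);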

Ghostwriting echo chamber

As we use our AI ghostwriter to help us write more and more articles, it becomes interesting to think about what happens to our writing style over time, since the AI learns from the articles that we provide. It is not yet clear how it will behave when it starts learning from articles that were written by the AI itself.

As LLMs are trained on larger and larger datasets, there is a risk that they will start to mimic their own imitations, creating an echo chamber of AI-generated content.

The key is to ensure that our AI ghostwriter remains a tool for enhancing our writing, rather than replacing it. We can achieve this by treating the AI generation as a teacher of good storytelling, and the generated content as a draft that we further customize and correct for a personal touch.

By staying actively involved in the writing process and using the AI as a collaborative partner, we can prevent our writing style from becoming stagnant and ensure that our voice remains distinct and engaging. The goal is to create a symbiotic relationship where both human and machine contribute to the creation of compelling and original content.

We remain curious about this concern and interested to see how it evolves over time. We would also love to hear from the community, in discussions and anecdotes, about how you experience it.

Get involved

Tanam CMS is open source on GitHub, and we welcome anyone to get involved: to play and learn, or to help build the project in a direction that you would find useful.