AI-Powered Developer Docs @ your fingertips!

Waking up to 2023 with warm greetings – “Hi, how can I help you today?” from a human-like assistant. Wow! Welcome to the AI era…

The hype around Artificial Intelligence continues to grow, and AI is revolutionizing the ecommerce sector, creating unprecedented user engagement and experience. Shopware is at the forefront of this transformation, continuously introducing the latest AI features and advancements to stay ahead of the curve.

"There is a lot of movement in the AI area, and it’s getting very interesting for many businesses."

– Stefan Hamann, CEO, Shopware

Developers strive to improve productivity and efficiency through technological advancements, and accurate documentation is crucial for this. At our organization, we prioritize developer experience and its impact on product success. To enhance this experience, we took part in a "Machine Learning project week" workshop that explored leveraging AI and ML.

The workshop

The goal was to generate concrete ideas and develop an MVP in eight working days. We checked into the workshop virtually and introduced ourselves: a cross-functional group from diverse backgrounds. We then established the workshop agenda and the leading question: How can we improve the developer documentation experience? We took the user's perspective by considering which processes developers follow when reading documentation, and what they spend a lot of time on or struggle with. We then brainstormed ideas in breakout rooms, clustered them, and presented them in group discussions. We voted on the ideas, created idea profiles for the most-voted ones, and selected the final idea with the help of predefined rating criteria.

The winning idea was advanced search on holistic documentation, which provides search recommendations and streamlines the user’s learning path while increasing their productivity.

After careful deliberation, we finally settled on a rough architecture to start with the implementation process.

Behind the scenes

The architecture of this AI search engine covers uploading files, ingesting data, obtaining embeddings, storing them in a vector store, and querying the data. We built the entire process for this NLP task from scratch, including data ingestion and the data pipeline, because we wanted to explore the space ourselves and get better control over the components. Here is the process:

  • Data upload: Initially, the document base (a set of markdown files or URLs) is collected and prepared for ingestion.

  • Ingestion: The prepared data is then ingested into the neural search system using the API endpoints. The API accepts a zip file containing the data, which is then split and classified into relevant clusters and stored as vector embeddings.

  • Index creation: The system creates a new index for each collection of ingested documents. The index is created using the vector embeddings generated in the previous step. 

  • Search: Once the documents are ingested and indexed, users can search for the documents using the API endpoint. The search endpoint accepts a search query and collection name as input and returns the 5 closest documents based on the vector embeddings.

  • Nearest neighbors: In addition to the search functionality, users can also find the nearest neighbors of a document. This is done by passing the id of a document to the API endpoint.
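
The steps above can be sketched in a few lines of Python. This is an illustrative toy, not the project's implementation: the bag-of-words `embed` function stands in for a real embedding model, the brute-force `ToyVectorStore` stands in for a vector store such as FAISS, and every name and document in it is hypothetical. Zip upload, splitting, and clustering are left out.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalized word counts (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are normalized, so the dot product is the cosine similarity.
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class ToyVectorStore:
    """Hypothetical in-memory store: one index per collection, brute-force search."""

    def __init__(self) -> None:
        # collection name -> {document id -> embedding}
        self.indexes: dict[str, dict[str, dict[str, float]]] = defaultdict(dict)

    def ingest(self, collection: str, docs: dict[str, str]) -> None:
        # Ingestion and index creation: embed each document and store the vector.
        for doc_id, text in docs.items():
            self.indexes[collection][doc_id] = embed(text)

    def search(self, collection: str, query: str, k: int = 5) -> list[str]:
        # Search: embed the query and return the ids of the k closest documents.
        return self._closest(collection, embed(query), k)

    def neighbors(self, collection: str, doc_id: str, k: int = 5) -> list[str]:
        # Nearest neighbors: reuse the stored vector of an existing document.
        vector = self.indexes[collection][doc_id]
        return self._closest(collection, vector, k, exclude=doc_id)

    def _closest(self, collection, vector, k, exclude=None):
        scored = [
            (cosine(vector, other), other_id)
            for other_id, other in self.indexes[collection].items()
            if other_id != exclude
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [doc_id for _, doc_id in scored[:k]]


store = ToyVectorStore()
store.ingest("docs", {
    "plugin-guide": "create a plugin for the shop",
    "plugin-config": "configure a plugin for the shop",
    "api-auth": "authenticate against the admin api with oauth tokens",
})
print(store.search("docs", "plugin", k=2))
print(store.neighbors("docs", "plugin-guide", k=1))
```

A real deployment would swap `embed` for calls to an embedding model and `_closest` for a vector-store lookup; the shape of the search and nearest-neighbor calls stays the same.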

This project provides a centralized, queryable data structure, so users can easily get contextually related recommendations for every search via the navigation panel or search bar. Additionally, we experimented with two other features: one that searches for relevant information using keywords or URLs, and one that answers users’ questions quickly and efficiently.

The first feature will be fully implemented with the launch of our holistic developer documentation platform, which is currently a work in progress. This platform will bring together all our developer-related documentation under a single hood, making it easier for users to access and find the information they need. Keep an eye out!

Thoughts and assumptions during implementation

When choosing a tool or approach for a specific purpose, it's important to consider factors such as ease of use, cost, efficiency, and limitations. Here are the decisions we made in this project:

  • Implementation language: The AI/ML ecosystem is very Python-oriented, and Python is well suited for tasks like tokenization and clustering.

  • Web framework: We chose FastAPI over web frameworks like Ginger, Flask, and Django because it focuses on building APIs rather than classic websites and comes with autogenerated Swagger and ReDoc documentation for its endpoints.

  • Deployment platform: We opted for fly.io for app deployment as it offers greater computation power than Vercel, which is more suited to static pages with limited dynamic processing. Additionally, Vercel's 50 MB limit (AWS-enforced) per serverless function and 4.5 MB limit per request body make it unsuitable for our needs, given our large dependency sizes of around 3 GB and the potential need to upload more than 10 MB of documents at once.

  • Vector stores: Among the vector stores FAISS, Weaviate, and Pinecone, we preferred FAISS because it performs effective searches and is cost-friendly.

  • Embedding technique: When choosing between TensorFlow and OpenAI for embeddings, it is important to consider that TensorFlow is optimized for high-performance and large-scale computations, which can result in higher memory usage due to additional features like deep learning. OpenAI, on the other hand, focuses on language modeling and natural language processing, which can be more memory efficient. Ultimately, the choice depends on the project's specific needs.
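
The deployment decision in the list above comes down to simple arithmetic, which can be written out as a quick check. The limits and sizes are the figures quoted above (in decimal megabytes); the function and constant names are hypothetical and only serve the illustration.

```python
# Limits quoted in the deployment bullet above, in megabytes.
VERCEL_FUNCTION_LIMIT_MB = 50
VERCEL_REQUEST_BODY_LIMIT_MB = 4.5


def vercel_violations(dependency_size_mb: float, upload_size_mb: float) -> list[str]:
    """Return a description of every quoted limit that the workload exceeds."""
    problems = []
    if dependency_size_mb > VERCEL_FUNCTION_LIMIT_MB:
        factor = dependency_size_mb / VERCEL_FUNCTION_LIMIT_MB
        problems.append(f"dependencies exceed the function limit {factor:.0f}x over")
    if upload_size_mb > VERCEL_REQUEST_BODY_LIMIT_MB:
        problems.append("upload exceeds the request body limit")
    return problems


# Roughly 3 GB of dependencies and a 10 MB document upload, as described above.
for problem in vercel_violations(dependency_size_mb=3000, upload_size_mb=10):
    print(problem)
```

With these numbers both limits are exceeded, the function size limit by a factor of about 60.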

Analysis of embeddings and techniques

TensorFlow and OpenAI

We have chosen OpenAI's embeddings as our default option because they build on OpenAI's large language models and associated models. For offline access, we have included TensorFlow as an alternative. To compare their effectiveness, we created a similarity matrix using a small set of articles from our developer documentation. The diagonal line shows a 100% match, since each article is compared with itself. Darker cells indicate greater article similarity, while white horizontal and vertical lines indicate articles with low content similarity. Blue islands represent closely linked sections or groups with similar topics.

Additionally, we compared the article similarities computed with TensorFlow and with OpenAI. Both detected similar articles accurately, though the similarity scores vary because the underlying models differ.
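
To make the similarity matrix more concrete, here is a minimal sketch of how such a matrix can be computed with cosine similarity. It is an assumption-laden toy: the word-count `embed` function is a stand-in for the OpenAI or TensorFlow embeddings, and the three sample articles are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalized word counts (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Vectors are normalized, so the dot product is the cosine similarity.
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def similarity_matrix(articles: list[str]) -> list[list[float]]:
    """Pairwise cosine similarity: cell [i][j] compares article i with article j."""
    vectors = [embed(text) for text in articles]
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Hypothetical article snippets, not taken from the real documentation.
articles = [
    "create a plugin for the shop",
    "configure a plugin for the shop",
    "authenticate against the admin api",
]
matrix = similarity_matrix(articles)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Each article matches itself fully, which produces the diagonal, and the two plugin articles form a small "island" of high similarity while the API article stays far from both.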

Our goal is to provide a range of options for users to access and explore our developer documentation with accurate and efficient search capabilities.

Take a glimpse!

Check out this short video showcasing our AI feature, which is currently in the beta stage. Stay tuned for its official release!

Time to say bye…

In conclusion, we are excited to embark on this journey towards creating AI-powered developer documentation that will enhance the developer’s experience and provide them with the information and tools they need to build better products faster.

