<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Generative AI Archives - Creospan</title>
	<atom:link href="https://creospan.com/tag/generative-ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://creospan.com/tag/generative-ai/</link>
	<description>Digital Transformation Consultancy</description>
	<lastBuildDate>Mon, 16 Feb 2026 21:55:31 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Why Model Context Protocol Matters: Building Real-World Workflows</title>
		<link>https://creospan.com/why-model-context-protocol-matters-building-real-world-workflows/</link>
		
		<dc:creator><![CDATA[Donna Mathew]]></dc:creator>
		<pubDate>Thu, 22 Jan 2026 17:59:42 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Transformation]]></category>
		<category><![CDATA[AI Workflows]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Artificial intelligence]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[GitHub Copilot]]></category>
		<category><![CDATA[IDE]]></category>
		<category><![CDATA[Large Language Models (LLMs)]]></category>
		<category><![CDATA[Linear]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[MCP]]></category>
		<category><![CDATA[Model Context Protocol]]></category>
		<category><![CDATA[Notion]]></category>
		<category><![CDATA[Prompt Engineering]]></category>
		<guid isPermaLink="false">https://creospan.com/?p=1452</guid>

					<description><![CDATA[<p>When large language models (LLMs) first became accessible, most of our interactions with them were bound within a single prompt-response cycle. You asked, they answered. But as developers began embedding AI into real systems, such as IDE copilots, it became clear that prompts alone couldn’t sustain meaningful workflows. AI needed context, memory, and the ability to act, not just chat. That’s where the Model Context Protocol (MCP) enters the picture, addressing both the context and the ability to act.</p>
<p>The post <a href="https://creospan.com/why-model-context-protocol-matters-building-real-world-workflows/">Why Model Context Protocol Matters: Building Real-World Workflows</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>When large language models (LLMs) first became accessible, most of our interactions with them were bound within a single prompt-response cycle. You asked, they answered. But as developers began embedding AI into real systems, such as IDE copilots, it became clear that prompts alone couldn’t sustain meaningful workflows. AI needed context, memory, and the ability to act, not just chat. That’s where the Model Context Protocol (MCP) enters the picture, addressing both the context and the ability to act.</p>



<p>At its core, MCP is an open standard that lets AI models connect to external systems in a structured, context-aware way. Think of it as the connective tissue between an AI and the tools it depends on: databases, project trackers, and code environments. Rather than reinventing integrations for each tool, MCP solves the integration bottleneck for agentic systems and enables real-time, context-aware automation.</p>



<p><strong>Why Not Just Call APIs Directly?</strong></p>



<p>Why not let the model talk directly to the tool’s API?</p>



<p>The short answer is control and security.</p>

<p>MCP defines a client-server pattern that allows AI systems to interact with real-world applications through a common interface. This allows models to securely call external tools, fetch structured data, and perform actions without the LLM needing to know every detail of the API behind them. It standardizes how models “see” tools, what they can access, and how they act, keeping everything modular, secure, and interoperable.</p>

<p><strong>How it Works</strong></p>

<p>In a typical MCP architecture, an LLM communicates through an MCP client, which routes requests to one or more MCP servers. The client handles translation between the model’s natural-language intent and the technical request schema, while the server executes the actual tool actions, such as storing data, fetching content, or performing updates. Some IDE environments, such as Cursor, already act as an MCP client under the hood, enabling seamless communication with compatible servers. This design separates the language model from the tool’s raw APIs.</p>
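<p>To make the client-server split concrete, here is a minimal, self-contained Python sketch of the pattern described above. It is not the official MCP SDK; the class and tool names (<code>ToyMCPServer</code>, <code>fetch_story</code>) are hypothetical stand-ins showing how a client routes a model's structured request to a server that executes the tool action.</p>

```python
# Toy illustration (NOT the official MCP SDK) of the client-server split:
# the client translates model intent into a structured request; the server
# executes the tool action behind a common interface. All names hypothetical.
import json
from typing import Callable, Dict


class ToyMCPServer:
    """Registers tool handlers and executes structured requests."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[dict], dict]] = {}

    def register_tool(self, name: str, handler: Callable[[dict], dict]) -> None:
        self._tools[name] = handler

    def handle(self, request: dict) -> dict:
        tool = self._tools.get(request["tool"])
        if tool is None:
            return {"error": f"unknown tool: {request['tool']}"}
        return {"result": tool(request["arguments"])}


class ToyMCPClient:
    """Routes a model's intent to a server without exposing the raw API."""

    def __init__(self, server: ToyMCPServer) -> None:
        self._server = server

    def call_tool(self, tool: str, arguments: dict) -> dict:
        # The model only ever sees this schema, never the API behind it.
        request = {"tool": tool, "arguments": arguments}
        return self._server.handle(request)


# Hypothetical "project tracker" tool, standing in for something like Linear.
server = ToyMCPServer()
server.register_tool(
    "fetch_story", lambda args: {"id": args["id"], "title": "Add login page"}
)

client = ToyMCPClient(server)
print(json.dumps(client.call_tool("fetch_story", {"id": "STO-42"})))
```

<p>The point of the sketch is the separation of concerns: swapping the tracker behind <code>fetch_story</code> changes nothing on the client side.</p>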



<p><strong>Our Workflow: IDE-Centered Intelligence with MCP</strong></p>

<p>At Creospan, we deliberately designed our MCP-based workflow around a simple but important belief: meaningful engineering decisions require code-level context. While large language models can reason over user stories and tickets in isolation, real prioritization, dependency analysis, and implementation planning only become reliable when the model understands the actual code it is going to change. This is precisely the gap MCP helps us bridge.</p>



<p>This is why our workflow places the IDE, not the task tracker or project planning tool, at the center.</p>



<p>At Creospan, Linear serves as our project management tool: a high-performance platform designed to streamline software development workflows through a minimalist interface. It holds user stories, priorities, and labels. However, instead of treating Linear as the place where decisions are made, we treat it as a structured input source. Through an MCP connection, stories flow from Linear directly into the coding environment, where they can be evaluated with full visibility into the codebase using the AI-assisted IDE’s context engine.</p>



<p>Once inside the AI-assisted IDE (Cursor, GitHub Copilot, Augment Code, etc.), the LLM operates with two critical forms of context. The first is project management context, fetched from Linear via MCP. The second is implementation context, derived from the code repository itself using the IDE’s context engine, which maintains a live understanding of the stack across repositories, services, and code history.</p>



<p>This combination enables a class of reasoning that is difficult to achieve elsewhere. As stories are loaded into the IDE, the LLM can reason across them to surface overlaps, shared implementation paths, and implicit relationships. Similar stories can be grouped not just by description but by the parts of the codebase they affect. Common work emerges naturally when multiple tickets map to the same components or abstractions. Ordering concerns surface by inspecting dependencies in code rather than relying solely on ticket-level links.</p>
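<p>A toy sketch of that grouping idea, assuming a hypothetical mapping from stories to the files they touch (the ticket IDs and paths below are invented for illustration):</p>

```python
# Hypothetical sketch: group tickets by the code paths they touch, so shared
# implementation work surfaces across stories. Stories and files are invented.
from collections import defaultdict

stories = {
    "STO-12": {"title": "Add OAuth login", "touches": ["auth/session.py", "ui/login.tsx"]},
    "STO-15": {"title": "Fix session timeout", "touches": ["auth/session.py"]},
    "STO-21": {"title": "Redesign dashboard", "touches": ["ui/dashboard.tsx"]},
}

by_component = defaultdict(list)
for story_id, story in stories.items():
    for path in story["touches"]:
        component = path.split("/")[0]  # crude component = top-level directory
        by_component[component].append(story_id)

# Stories sharing a component are candidates for grouping or ordering together.
for component, ids in sorted(by_component.items()):
    print(component, sorted(set(ids)))
```

<p>In practice the "touches" mapping would come from the IDE's context engine rather than a hand-written dictionary, but the grouping logic is the same.</p>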



<p>Importantly, this reasoning is not fully automated or opaque. The LLM proposes insights and prioritization suggestions, but developers remain in the loop. Engineers validate, adjust, or override decisions with a clear understanding of why a particular ordering or grouping was suggested. MCP makes this possible by ensuring that product intent from Linear and technical reality from the codebase, via the context engine, are available together inside the IDE.</p>



<p>Once decisions are validated, the workflow completes its loop. Updates, refinements, and execution outcomes are pushed back into Linear via MCP, keeping the product view synchronized without forcing developers to leave their editor. Developers can then pick up a story, begin implementation, and update its status directly from the IDE. Every change, discussion, and update stays synchronized, giving stakeholders a live view of progress while preserving developer flow.</p>

<p><strong>Notion as the Learning Layer</strong></p>



<p>If Linear captures what we plan to build, Notion captures how we build it. Notion is an all-in-one workspace that blends note-taking, document collaboration, and database management into a single, highly customizable platform. Through a separate MCP server, we log meaningful AI interactions from the IDE into Notion. This includes prompts that led to better architectural decisions, reasoning traces behind prioritization choices, and patterns that repeat across projects. Over time, these logs have evolved into a knowledge dataset, a reflection of how our team collaborates with AI. By analyzing them, we uncover which prompts drive faster development or cleaner code. The most effective ones become shared templates, enabling the entire team to improve collectively rather than individually.</p>
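<p>The learning layer can be sketched as a simple interaction log that is mined for recurring prompts. The record fields and helper names below are hypothetical; a real setup would write these entries to Notion through its MCP server rather than an in-memory list.</p>

```python
# Hypothetical sketch of a "learning layer": log notable AI interactions as
# structured records, then mine them for prompts that recur across projects.
from collections import Counter
from datetime import datetime, timezone

log: list = []

def record_interaction(prompt: str, outcome: str, tags: list) -> None:
    # A real implementation would push this record to Notion via MCP.
    log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "outcome": outcome,
        "tags": tags,
    })

record_interaction("Summarize open tickets by component", "faster triage", ["prioritization"])
record_interaction("Propose a migration plan for the auth module", "cleaner design", ["architecture"])
record_interaction("Summarize open tickets by component", "faster triage", ["prioritization"])

# Prompts that repeat with good outcomes become candidates for shared templates.
counts = Counter(entry["prompt"] for entry in log)
templates = [prompt for prompt, n in counts.items() if n > 1]
print(templates)  # → ['Summarize open tickets by component']
```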



<p>The result is a connected system where planning, implementation, and learning reinforce each other through shared context. MCP’s value lies not in tool integration itself, but in enabling intelligence to operate within the IDE, where code and product intent converge.</p>



<p>At Creospan, we see this as a key step forward for SDLC productivity, where small efficiencies compound across teams and projects. In the end, our implementation shows how AI systems can evolve from reactive to proactive. Tools like Notion and Linear are not just endpoints; they are contexts. With MCP, we give AI the means to understand, navigate, and contribute to those contexts intelligently.</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="880" height="451" src="https://creospan.com/wp-content/uploads/2026/01/image.png" alt="" class="wp-image-1453" srcset="https://creospan.com/wp-content/uploads/2026/01/image.png 880w, https://creospan.com/wp-content/uploads/2026/01/image-300x154.png 300w, https://creospan.com/wp-content/uploads/2026/01/image-768x394.png 768w" sizes="(max-width: 880px) 100vw, 880px" /></figure>



<p><strong>Conclusion</strong></p>



<p>As AI continues to reshape the landscape of software development, MCP stands out as a transformative standard for building agentic, context-aware workflows. By bridging product intent and technical reality within the IDE, MCP empowers both AI and human collaborators to make informed, reliable decisions, driving productivity and innovation across teams. The recent evolution of MCP, with enhanced security, structured tool output, and seamless IDE integrations, positions it not just as a technical solution but as a foundation for the next generation of intelligent engineering systems.</p>



<p>Article written by Dhairya Bhuta.</p>



<p>The post <a href="https://creospan.com/why-model-context-protocol-matters-building-real-world-workflows/">Why Model Context Protocol Matters: Building Real-World Workflows</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What’s Holding You Back from Unlocking AI-Powered Workforce Productivity?</title>
		<link>https://creospan.com/whats-holding-you-back-from-unlocking-ai-powered-workforce-productivity/</link>
		
		<dc:creator><![CDATA[Donna Mathew]]></dc:creator>
		<pubDate>Sat, 24 May 2025 22:40:34 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Adoption]]></category>
		<category><![CDATA[AI Compliance]]></category>
		<category><![CDATA[AI in the Workplace]]></category>
		<category><![CDATA[AI Productivity]]></category>
		<category><![CDATA[AI Workflows]]></category>
		<category><![CDATA[Data Security]]></category>
		<category><![CDATA[Digital Transformation]]></category>
		<category><![CDATA[Enterprise AI]]></category>
		<category><![CDATA[Future of work]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Microsoft 365 Copilot]]></category>
		<category><![CDATA[Workplace AI]]></category>
		<guid isPermaLink="false">https://creospan.com/?p=1225</guid>

					<description><![CDATA[<p>Across industries, individual users are embracing AI as their “digital coworker” - one who’s fast, tireless, and surprisingly helpful. Whether they’re drafting blog posts, crunching data, or writing code, AI can do it all. Yet, many organizations hesitate to fully integrate AI into their workflows.</p>
<p>The post <a href="https://creospan.com/whats-holding-you-back-from-unlocking-ai-powered-workforce-productivity/">What’s Holding You Back from Unlocking AI-Powered Workforce Productivity?</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wpb-content-wrapper"><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>Across industries, individual users are embracing AI as their “digital coworker” &#8211; one who’s fast, tireless, and surprisingly helpful. Whether they’re drafting blog posts, crunching data, or writing code, AI can do it all. Yet, many organizations hesitate to fully integrate AI into their workflows.</p>
<p>Why the disconnect?</p>
<p>Their concerns are valid. Worries about data privacy, fears of misinformation, and uncertainty about how to scale initiatives responsibly make organizations reluctant to extend AI into their workflows. However, a well-structured AI adoption strategy can address and overcome these challenges.</p>
<p>In this article, we walk through a 7-stage roadmap for introducing Microsoft 365 Copilot across your organization, helping you accelerate productivity while staying secure and compliant.</p>
<h2>Stage 1: Adopting Microsoft 365 – Laying the Foundation</h2>
<p>The journey begins with Microsoft 365, a comprehensive platform designed to power productivity and collaboration. Many organizations stop at the basic, familiar tools such as Teams, Excel, Word, and Outlook, while missing the AI capabilities embedded in the ecosystem: predictive text suggestions, summarization, smart content-creation templates, real-time collaboration enhancements, and process automation.</p>
<p><strong>Pro Tip:</strong> If you’ve already deployed Microsoft 365, you’re halfway there. The next step is unlocking its AI-enhanced features.</p>
<h2>Stage 2: Introducing Microsoft Copilot – The Productivity Multiplier</h2>
<p>As familiarity with Microsoft 365 grows, so does awareness of Microsoft Copilot, an AI add-on that can automate repetitive tasks, summarize content, generate insights, and more. However, uncertainty around how Copilot fits into daily workflows can slow its adoption.</p>
<p><strong>Pro Tip:</strong> Host internal demos or lunch-and-learn sessions showcasing real-world use cases tailored to finance, HR, or sales roles.</p>
<h2>Stage 3: Addressing Security, Privacy &amp; Compliance</h2>
<p>AI adoption must be built on trust. At this stage, organizations are asking:</p>
<ul>
<li>What data does Copilot access?</li>
<li>Can access be role-based?</li>
<li>How is sensitive information protected?</li>
<li>Is the solution compliant with our regulatory standards?</li>
<li>What safeguards are in place to prevent misuse?</li>
</ul>
<p><strong>Pro Tip:</strong> Partner with IT and compliance teams early in the adoption and integration process. Establish clear documentation on data access, protection protocols, and AI risk mitigation.</p>
<h2>Stage 4: Establishing AI Policies &amp; Governance</h2>
<p>Without a strong governance framework, organizations risk inconsistent adoption and exposure to compliance risks. Key policy areas include:</p>
<ul>
<li>Responsible use guidelines</li>
<li>Data retention and sharing protocols</li>
<li>Alignment with internal and external regulatory standards</li>
<li>Ethical use policies, including bias mitigation</li>
</ul>
<p><strong>Pro Tip:</strong> Create a cross-functional AI Governance Council to steer strategy, policy, and education.</p>
<h2>Stage 5: Prototyping &amp; Piloting for Proof of Value</h2>
<p>Rather than jumping straight to full deployment, many successful organizations begin with targeted pilots. A focused rollout enables teams to:</p>
<ul>
<li>Experiment with real use cases</li>
<li>Identify integration or cultural challenges</li>
<li>Measure productivity uplift</li>
<li>Build internal champions</li>
</ul>
<p><strong>Pro Tip:</strong> Choose a pilot team with measurable KPIs and a high volume of knowledge work for maximum impact.</p>
<h2>Stage 6: Scaling Across the Enterprise</h2>
<p>Once early wins are documented, scaling can begin. This phase is about:</p>
<ul>
<li>Delivering role-specific training</li>
<li>Embedding Copilot into standard workflows</li>
<li>Ensuring executive sponsorship</li>
<li>Managing resistance and change with empathy</li>
</ul>
<p><strong>Pro Tip:</strong> Track usage analytics and feedback to tailor your training and adoption campaigns.</p>
<h2>Stage 7: Measuring ROI and Driving Continuous Improvement</h2>
<p>Implementation is just the beginning. Leading organizations continuously monitor:</p>
<ul>
<li>Time saved per task or team</li>
<li>Increase in throughput or decision quality</li>
<li>Employee satisfaction and Copilot adoption</li>
<li>Opportunities for new use cases or advanced integration</li>
</ul>
<p><strong>Pro Tip:</strong> Treat this as a feedback loop &#8211; measure, learn, adapt. The path to AI-powered productivity isn’t linear, but with the right plan, you can turn uncertainty into action. When deployed thoughtfully, Microsoft Copilot doesn’t just improve workflows; it transforms them.</p>
<h2>How We Can Help</h2>
<p>Choosing the right partner for your AI adoption journey is critical. Here’s why organizations trust Creospan to help them unlock the full potential of Microsoft Copilot:</p>
<ul>
<li><strong>Expertise in AI Productivity Tools:</strong> Our team has deep experience with Microsoft Copilot and other generative AI solutions, ensuring a smooth and effective implementation.</li>
<li><strong>Tailored Solutions:</strong> We understand that every organization is unique. Our strategies are customized to align with your specific needs, workflows, and goals.</li>
<li><strong>End-to-End Support:</strong> From initial education to enterprise-wide rollout and ongoing optimization, we’re with you at every step of your AI journey.</li>
<li><strong>Focus on Security and Compliance:</strong> We prioritize data security, privacy, and adherence to industry standards, giving you peace of mind as you adopt AI tools.</li>
</ul>
<p>Ready to transform your workforce with Microsoft Copilot? Contact us today to start your AI adoption journey.</p>
<p><em>Article written by Davinder Kohli and Shirali Shah.</em></p>

		</div>
	</div>
</div></div></div></div>

<p>&nbsp;</p>
</div><p>The post <a href="https://creospan.com/whats-holding-you-back-from-unlocking-ai-powered-workforce-productivity/">What’s Holding You Back from Unlocking AI-Powered Workforce Productivity?</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Caching Patterns in Retrieval Augmented Generation</title>
		<link>https://creospan.com/caching-patterns-in-retrieval-augmented-generation/</link>
		
		<dc:creator><![CDATA[Donna Mathew]]></dc:creator>
		<pubDate>Sat, 21 Dec 2024 22:36:14 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[Caching in AI]]></category>
		<category><![CDATA[Chunk-based caching]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Knowledge tree caching]]></category>
		<category><![CDATA[Multilevel dynamic caching]]></category>
		<category><![CDATA[RAG caching patterns]]></category>
		<category><![CDATA[RAG performance optimization]]></category>
		<category><![CDATA[RAG system efficiency]]></category>
		<category><![CDATA[RAG systems]]></category>
		<category><![CDATA[Retrieval-Augmented Generation]]></category>
		<category><![CDATA[Semantic caching]]></category>
		<guid isPermaLink="false">https://creospan.com/?p=1187</guid>

					<description><![CDATA[<p>The post <a href="https://creospan.com/caching-patterns-in-retrieval-augmented-generation/">Caching Patterns in Retrieval Augmented Generation</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wpb-content-wrapper"><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>Retrieval-Augmented Generation (RAG) systems are transforming the way we interact with large-scale language models by integrating external knowledge retrieval into the generation process. But as powerful as RAG is, it comes with its own performance challenges, especially when working with massive datasets and high query volumes.</p>
<p>One way to make RAG faster and more efficient? Caching.</p>
<p>By strategically caching data, RAG systems can reduce redundancy, speed up response times, and lower operational costs. Let&#8217;s break down the most effective caching patterns for RAG and the trade-offs you need to be aware of.</p>
<h2>Key RAG Caching Patterns:</h2>
<h3>1. Knowledge Tree Caching:</h3>
<p>Organizes intermediate states of retrieved knowledge in a hierarchical structure, caching them in both GPU and host memory.</p>
<p><strong>Benefits:</strong> Efficiently shares cached knowledge across multiple requests, reducing redundant computations and speeding up response times.</p>
<h3>2. Semantic Caching:</h3>
<p>Identifies and caches similar or identical user requests. When a matching request is found, the system retrieves the corresponding information from the cache. This is the most popular pattern and is readily available from fully managed cloud service providers.</p>
<p><strong>Benefits:</strong> Reduces the need to fetch information from the original source, improving response times.</p>
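<p>A minimal sketch of the idea, using bag-of-words cosine similarity in place of a real embedding model so the example stays self-contained; the similarity threshold is illustrative:</p>

```python
# Minimal semantic-cache sketch. Production systems use embedding models;
# here a bag-of-words cosine similarity stands in so the example runs alone.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold          # illustrative cutoff
        self._entries = []                  # list of (vector, response)

    def get(self, query: str):
        qv = vectorize(query)
        best = max(self._entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]                  # cache hit: similar request seen
        return None                         # cache miss: fall through to RAG

    def put(self, query: str, response: str) -> None:
        self._entries.append((vectorize(query), response))

cache = SemanticCache()
cache.put("what is our refund policy", "Refunds within 30 days.")
print(cache.get("what is our refund policy?"))  # → Refunds within 30 days.
```

<p>On a miss, the system would run the full retrieval pipeline and then <code>put</code> the result for future near-duplicate queries.</p>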
<h3>3. Chunk-Based Caching:</h3>
<p>Breaks down large documents into smaller chunks and caches these chunks individually.</p>
<p><strong>Benefits:</strong> Improves retrieval speed and accuracy by focusing on smaller, relevant sections of the document.</p>
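<p>A stdlib-only sketch of the pattern: documents are split into small chunks, each cached under a content hash so identical chunks are stored once, and retrieval touches only the chunks that match the query. The chunk size and the relevance test are deliberately naive.</p>

```python
# Sketch of chunk-based caching: split documents into fixed-size chunks,
# cache each under a content hash, retrieve only the relevant chunks.
import hashlib

chunk_cache = {}  # content hash -> chunk text

def split_into_chunks(text: str, size: int = 8) -> list:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cache_document(document: str) -> list:
    keys = []
    for piece in split_into_chunks(document):
        key = hashlib.sha256(piece.encode()).hexdigest()
        chunk_cache.setdefault(key, piece)  # identical chunks stored once
        keys.append(key)
    return keys

def retrieve_relevant(keys: list, query: str) -> list:
    # Naive relevance: keep only chunks sharing a term with the query.
    terms = set(query.lower().split())
    return [chunk_cache[k] for k in keys
            if terms & set(chunk_cache[k].lower().split())]

keys = cache_document(
    "Refunds are accepted within 30 days. "
    "Shipping takes 3 to 5 business days for domestic orders."
)
print(retrieve_relevant(keys, "refunds"))
```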
<h3>4. Multilevel Dynamic Caching:</h3>
<p>Implements a multilevel caching system that dynamically adjusts based on the characteristics of the RAG system and the underlying hardware.</p>
<p><strong>Benefits:</strong> Optimizes the use of memory and computational resources, enhancing overall system performance.</p>
<h3>5. Replacement Policies</h3>
<p>Uses intelligent replacement policies to manage the cache, ensuring that the most relevant and frequently accessed data is retained.</p>
<p><strong>Benefits:</strong> Maintains cache efficiency and relevance, reducing the likelihood of cache misses.</p>
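<p>One common intelligent replacement policy is least-recently-used (LRU), sketched below for illustration; real systems may combine recency with frequency or relevance scores.</p>

```python
# Sketch of an LRU replacement policy: evict the least recently used entry
# when the cache is full, keeping hot retrieval results resident.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("q1", "answer one")
cache.put("q2", "answer two")
cache.get("q1")            # touch q1 so q2 becomes the eviction candidate
cache.put("q3", "answer three")  # q2 is evicted
```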
<p>These caching patterns help RAG systems manage and retrieve large volumes of data more efficiently, leading to faster and more accurate responses.</p>
<p>For any RAG implementation, we have to plan for the pitfalls:</p>
<h2>Retrieval-Augmented Generation (RAG) Caching Pattern Pitfalls:</h2>
<p><strong>Consistency Issues:</strong> Ensuring consistency between the cached data and the source data can be challenging, especially in distributed systems.</p>
<p><strong>Complexity:</strong> Implementing RAG caching patterns can be complex due to the need to manage both retrieval and generation components effectively. This complexity can lead to higher development and maintenance costs.</p>
<p><strong>Latency:</strong> While caching can reduce retrieval times, it may introduce latency in scenarios where the cache needs to be updated frequently. This can affect the overall performance of the system.</p>
<p><strong>Storage Overhead:</strong> Caching requires additional storage, which can be significant depending on the size and frequency of the data being cached.</p>
<p><strong>Staleness:</strong> Cached data can become outdated, leading to the generation of responses based on obsolete information. This is particularly problematic in dynamic environments where information changes rapidly.</p>
<h2>Conclusion</h2>
<p>Even though these patterns are effective in reducing costs and improving response times, they have to be thoroughly validated to ensure the objectives of the RAG implementation are met, with effective invalidation techniques to guard against staleness. Implement a semantic pattern first and test the model&#8217;s ability, then try out other options.</p>
<p><em>Article written by Krishnam Raju Bhupathiraju.</em></p>
<p>&nbsp;</p>

		</div>
	</div>
</div></div></div></div>
</div><p>The post <a href="https://creospan.com/caching-patterns-in-retrieval-augmented-generation/">Caching Patterns in Retrieval Augmented Generation</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Power of Generative AI and RAG?</title>
		<link>https://creospan.com/why-generative-ai-and-rag/</link>
		
		<dc:creator><![CDATA[Donna Mathew]]></dc:creator>
		<pubDate>Sat, 12 Oct 2024 22:24:38 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI content generation]]></category>
		<category><![CDATA[AI hallucination]]></category>
		<category><![CDATA[Amazon Bedrock RAG]]></category>
		<category><![CDATA[Amazon Kendra]]></category>
		<category><![CDATA[Amazon SageMaker JumpStart]]></category>
		<category><![CDATA[AWS generative AI services]]></category>
		<category><![CDATA[Fine-tuning AI models]]></category>
		<category><![CDATA[Foundation Models (FMs)]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Large Language Models (LLMs)]]></category>
		<category><![CDATA[RAG pattern]]></category>
		<category><![CDATA[Retrieval-Augmented Generation (RAG)]]></category>
		<guid isPermaLink="false">https://creospan.com/?p=1182</guid>

					<description><![CDATA[<p>The post <a href="https://creospan.com/why-generative-ai-and-rag/">The Power of Generative AI and RAG?</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wpb-content-wrapper"><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>This article explores three key areas: Generative AI and its patterns, the Retrieval-Augmented Generation (RAG) framework, and AWS’s role in supporting this journey.</p>
<h2>What is Generative AI?</h2>
<p>Generative AI is a type of artificial intelligence focused on the ability of computers to use models to create content like images, text, code, and synthetic data.</p>
<p>The foundation of Generative AI applications are large language models (LLMs) and foundation models (FMs).</p>
<p>Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters; this gives them the ability to generate original output for tasks like completing sentences, translating languages, and answering questions.</p>
<p>Foundation models (FMs) are large ML models that are pre-trained with the intention of being fine-tuned for more specific language understanding and generation tasks.</p>
<p>Once these models have completed their learning processes, they generate statistically probable outputs. When prompted (queried), they can be employed to accomplish various tasks like:</p>
<ul>
<li>Image generation based on existing images, or using the style of one image to modify or create another.</li>
<li>Speech-oriented tasks such as translation, question/answer generation, and interpretation of the intent or meaning of text.</li>
</ul>
<h2>Generative AI has the following design patterns:</h2>
<ul>
<li><strong>Prompt Engineering:</strong> Crafting specialized prompts to guide LLM behavior</li>
<li><strong>Retrieval Augmented Generation (RAG):</strong> Combining an LLM with external knowledge retrieval, the best of both capabilities (most recommended).</li>
<li><strong>Fine-tuning:</strong> Adapting a pre-trained LLM to specific datasets or domains, e.g., customer service or healthcare.</li>
<li><strong>Pre-training:</strong> Training an LLM from scratch. Requires significant computing power and time.</li>
</ul>
<h2>Retrieval Augmented Generation (RAG):</h2>

		</div>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div  class="wpb_single_image wpb_content_element vc_align_center">
		
		<figure class="wpb_wrapper vc_figure">
			<div class="vc_single_image-wrapper   vc_box_border_grey"><img decoding="async" width="736" height="566" src="https://creospan.com/wp-content/uploads/2025/05/1721195442713.png" class="vc_single_image-img attachment-large" alt="" title="1721195442713" srcset="https://creospan.com/wp-content/uploads/2025/05/1721195442713.png 736w, https://creospan.com/wp-content/uploads/2025/05/1721195442713-300x231.png 300w" sizes="(max-width: 736px) 100vw, 736px"  data-dt-location="https://creospan.com/why-generative-ai-and-rag/attachment/1721195442713/" /></div>
		</figure>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>&nbsp;</p>
<p>RAG (Retrieval Augmented Generation) is a method to improve LLM response accuracy by giving your LLM access to external data sources.</p>
<p>LLMs are trained on enormous data sets, but they don’t have specific context for your business, industry, or customer-specific needs. RAG adds that crucial layer of information so LLMs can produce grounded, effective answers.</p>
<h2>To understand RAG, we need to explore the limitations of LLMs.</h2>
<h4>Limitations of LLMs:</h4>
<ul>
<li><strong>Hallucination:</strong> LLMs may present false information when they do not have the answer, or when no answer exists.</li>
<li><strong>Outdated Info:</strong> Presenting out-of-date or generic information when the user wants a specific, accurate response.</li>
<li><strong>Tech Confusion:</strong> Generating inaccurate responses due to terminology confusion, where different training sources use similar terminology for different things.</li>
<li><strong>Unauthorized:</strong> Creating a response from non-authoritative sources.</li>
</ul>
<h2>RAG works in three stages:</h2>
<ul>
<li><strong>Retrieval:</strong> When a request reaches the LLM, the system looks for relevant information to inform the final response. It searches an external dataset or document collection for the most relevant pieces of information. This dataset could be a curated knowledge base, any extensive collection of text, images, video, and audio, or even your local database.</li>
<li><strong>Augmentation:</strong> In this step, the query is enhanced with the information retrieved in the previous step.</li>
<li><strong>Generation:</strong> The final augmented response or output is generated. Your LLM uses the additional context provided by the augmented input to produce an answer that is not only relevant to the original query but enriched with information from external sources.</li>
</ul>
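<p>The three stages above can be sketched in a few lines of Python. This is a toy illustration: keyword overlap stands in for the vector search a production retriever would use, and the generation step is a stub where a real LLM call would go. The documents and function names are invented for the example.</p>

```python
import re

# Toy document store standing in for an external knowledge base.
DOCS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Premium support is available 24/7 for enterprise customers.",
    "Shipping takes 3-5 business days within the continental US.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Stage 1: rank documents by word overlap with the query
    (a stand-in for semantic vector search)."""
    q = tokens(query)
    scored = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return scored[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Stage 2: prepend the retrieved context to the user's query."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stage 3: placeholder for the actual LLM call."""
    return f"[LLM answers using]: {prompt.splitlines()[1]}"

query = "What is the return policy?"
print(generate(augment(query, retrieve(query, DOCS))))
```

Swapping the keyword retriever for an embedding model and the stub for a chat-completion call turns this skeleton into a real RAG pipeline.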
<h3>Customer service RAG use cases:</h3>
<p><strong>Personalized recommendations:</strong> Generate personalized product recommendations based on a customer&#8217;s browsing patterns, past interactions, and preferences.</p>
<p><strong>Advanced chatbots:</strong> RAG empowers chatbots to answer complex questions and provide personalized support to customers – improving customer satisfaction and reducing support costs.</p>
<p><strong>Knowledge base search:</strong> Quickly retrieve relevant information from internal knowledge bases to answer customer inquiries faster and more accurately.</p>
<h2>AWS supports RAG in the following ways:</h2>
<p><strong>Amazon Bedrock:</strong> A fully managed service that offers a choice of high-performing foundation models, along with a broad set of capabilities, to build generative AI applications while simplifying development and maintaining privacy and security. With Knowledge Bases for Amazon Bedrock, you can connect FMs to your data sources for RAG in just a few clicks. Vector conversions, retrievals, and improved output generation are all handled automatically.</p>
<p><strong>Amazon Kendra:</strong> For organizations managing their own RAG, Kendra is a highly accurate enterprise search service powered by machine learning. It provides an optimized Retrieve API that you can use with Amazon Kendra&#8217;s high-accuracy semantic ranker as an enterprise retriever for your RAG workflows.</p>
<p><strong>Amazon SageMaker:</strong> SageMaker JumpStart is an ML hub with FMs, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. You can speed up RAG implementation by referring to existing SageMaker notebooks and code examples.</p>
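<p>As a sketch of the Bedrock route: the knowledge-base RAG flow goes through the <code>RetrieveAndGenerate</code> API of the <code>bedrock-agent-runtime</code> client. The knowledge base ID and model ARN below are placeholders, and the actual network call is left commented out since it requires AWS credentials and a provisioned knowledge base.</p>

```python
# Sketch of a Bedrock knowledge-base RAG request. The request payload
# shape follows the RetrieveAndGenerate API; IDs/ARNs are placeholders.

def build_rag_request(query: str, kb_id: str, model_arn: str) -> dict:
    """Assemble a RetrieveAndGenerate request payload."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    "What is our refund policy?",
    kb_id="KB123EXAMPLE",  # placeholder knowledge base ID
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",  # example ARN
)

# With credentials configured, the call would look like:
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# print(response["output"]["text"])
```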
<p><em>Article written by Krishnam Raju Bhupathiraju.</em></p>
<p>&nbsp;</p>

		</div>
	</div>
</div></div></div></div>


<p></p>
</div><p>The post <a href="https://creospan.com/why-generative-ai-and-rag/">The Power of Generative AI and RAG?</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Small Language Models are the New Big Thing in AI</title>
		<link>https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/</link>
		
		<dc:creator><![CDATA[Donna Mathew]]></dc:creator>
		<pubDate>Wed, 24 Jul 2024 22:04:17 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI for mobile devices]]></category>
		<category><![CDATA[Artificial intelligence]]></category>
		<category><![CDATA[Data Privacy]]></category>
		<category><![CDATA[Edge AI]]></category>
		<category><![CDATA[Edge computing AI]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[IoT]]></category>
		<category><![CDATA[Large Language Models (LLMs)]]></category>
		<category><![CDATA[Lightweight AI]]></category>
		<category><![CDATA[On-device AI]]></category>
		<category><![CDATA[Small Language Models (SLMs)]]></category>
		<guid isPermaLink="false">https://creospan.com/?p=1213</guid>

					<description><![CDATA[<p>The post <a href="https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/">Small Language Models are the New Big Thing in AI</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="wpb-content-wrapper"><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>Throughout the history of technology, we’ve witnessed the evolution of software applications—from massive monolithic servers to sleek microservices and miniaturized platforms. History, indeed, has a way of repeating itself, and Generative AI is no exception to this cyclical progression.</p>
<p>Today, when you use ChatGPT, Gemini, or Copilot, the intelligence comes from centralized computing:</p>
<h5>Consuming a Large Language Model</h5>

		</div>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div  class="wpb_single_image wpb_content_element vc_align_center">
		
		<figure class="wpb_wrapper vc_figure">
			<div class="vc_single_image-wrapper   vc_box_border_grey"><img decoding="async" width="1024" height="620" src="https://creospan.com/wp-content/uploads/2025/05/1-1024x620.png" class="vc_single_image-img attachment-large" alt="" title="1" srcset="https://creospan.com/wp-content/uploads/2025/05/1-1024x620.png 1024w, https://creospan.com/wp-content/uploads/2025/05/1-300x182.png 300w, https://creospan.com/wp-content/uploads/2025/05/1-768x465.png 768w, https://creospan.com/wp-content/uploads/2025/05/1.png 1488w" sizes="(max-width: 1024px) 100vw, 1024px"  data-dt-location="https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/attachment/1/" /></div>
		</figure>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>Now that you&#8217;ve delved into the realm of Large Language Models (LLMs), mastering techniques like Prompt Engineering and Retrieval-Augmented Generation (RAG) patterns, it&#8217;s time to shift gears. The focus is no longer on centralized intelligence, but rather on bringing AI-driven capabilities closer to customers and end-user devices.</p>
<p>Device limitations mean we cannot deploy LLM intelligence directly on those devices. Enter Small Language Models (SLMs), which are built for everyday use.</p>
<h2>The Rise of SLMs in Everyday Use</h2>
<p>From powering smarter mobile applications to revolutionizing real-time translations and beyond, small language models are quietly shaping our digital lives. Developers and researchers are increasingly favoring SLMs for their ability to deliver lightweight yet impactful solutions.</p>
<h2>What are Small Language Models?</h2>
<p>Inherently, Small Language Models (SLMs) are smaller counterparts of Large Language Models. They have fewer parameters, are more lightweight, and are faster at inference time. Models with hundreds of billions or even trillions of parameters (GPT-4 is reported to have roughly 1.8 trillion) count as LLMs, demanding resource-heavy training and inference. The exact threshold for a &#8220;small&#8221; language model varies among authors.</p>
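<p>A back-of-the-envelope calculation makes the size gap concrete: weights-only serving memory is roughly parameters times bytes per parameter. The model sizes below are illustrative round numbers, not official figures.</p>

```python
# Approximate weights-only memory needed to serve a model:
# parameters x bytes-per-parameter (2 bytes = fp16 precision).

def inference_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Weights-only memory in GiB at the given precision."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for name, size_b in [("3B-parameter SLM", 3),
                     ("70B-parameter LLM", 70),
                     ("1.8T-parameter LLM", 1800)]:
    print(f"{name}: ~{inference_memory_gb(size_b):.0f} GB at fp16")
```

A 3B-parameter model fits in the memory of a high-end phone or laptop GPU, while the larger figures explain why LLM inference stays in centralized data centers.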
<h2>How are they different from LLMs?</h2>
<p>Unlike large language models (LLMs), whose primary purpose is general-purpose capability across a wide variety of applications, SLMs are optimized for efficiency, making them ideal for deployment in resource-constrained environments such as mobile devices, point-of-sale terminals, IoT, and edge computing systems.</p>
<p>SLMs are compact versions of Language Models, and they excel in two main areas:</p>
<ol>
<li>SLMs are suitable for Edge Devices, offering businesses benefits such as cost reduction, offline usage, or enhanced data privacy.</li>
<li>SLMs accelerate R&amp;D progress: teams can swiftly test new ideas, benchmark at scale, and iterate quickly. Even retraining SLMs from scratch is feasible for small groups with access to home-grade GPUs.</li>
</ol>
<h2>SLM (Small Language Model) vs. LLM (Large Language Model) Comparison:</h2>

		</div>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div  class="wpb_single_image wpb_content_element vc_align_center">
		
		<figure class="wpb_wrapper vc_figure">
			<div class="vc_single_image-wrapper   vc_box_border_grey"><img loading="lazy" decoding="async" width="905" height="240" src="https://creospan.com/wp-content/uploads/2025/05/2.png" class="vc_single_image-img attachment-large" alt="" title="2" srcset="https://creospan.com/wp-content/uploads/2025/05/2.png 905w, https://creospan.com/wp-content/uploads/2025/05/2-300x80.png 300w, https://creospan.com/wp-content/uploads/2025/05/2-768x204.png 768w" sizes="(max-width: 905px) 100vw, 905px"  data-dt-location="https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/attachment/2/" /></div>
		</figure>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<p>Small Language Models (SLMs) are designed for efficiency and specialization, making them ideal for a variety of use cases across industries. Here are some notable applications:</p>
<ol>
<li>Real-Time Mobile Apps
<ul>
<li>Customer Support: SLMs can power chatbots and virtual assistants on websites or apps, providing instant responses to customer queries.</li>
<li>Sentiment Analysis: Analyze customer feedback from social media and integrate insights into customer data platforms.</li>
<li>Personalized Offers: Generate tailored promotions and recommendations based on user profiles and behavior.</li>
<li>Self-Healing Systems: SLMs can enable networks to automatically detect and resolve issues without human intervention.</li>
</ul>
</li>
<li>Edge Computing
<ul>
<li>IoT Devices: SLMs enable smart home devices, like thermostats or speakers, to process commands locally without relying on cloud servers or saturating the network.</li>
<li>Connected Cars: They can assist with navigation, voice commands, and diagnostics directly within the vehicle.</li>
</ul>
</li>
<li>Domain-Specific Applications
<ul>
<li>Retail: SLMs can enhance Point-of-Sale (POS) systems by offering personalized recommendations or promotions.</li>
<li>Finance: Used for fraud detection, transaction analysis, and customer service in banking apps.</li>
</ul>
</li>
<li>Privacy-Sensitive Environments
<ul>
<li>Data Masking: SLMs can anonymize sensitive data, such as personally identifiable information (PII), ensuring compliance with privacy regulations.</li>
<li>On-Device Processing: By running locally, SLMs reduce the need to send data to external servers, enhancing security.</li>
</ul>
</li>
<li>Specialized Content Creation
<ul>
<li>Marketing: SLMs can generate targeted ad copy or social media posts for specific audiences.</li>
<li>Technical Writing: Used to create concise and accurate documentation for niche industries.</li>
</ul>
</li>
</ol>
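<p>To make the data-masking use case concrete, here is a toy masking pass of the kind an on-device pipeline might apply before any text leaves the device. Production systems use trained NER models rather than regexes; the patterns and labels here are invented purely for illustration.</p>

```python
import re

# Regex patterns standing in for a real PII-detection model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Reach Jane at jane.doe@example.com or 312-555-0199."))
```

Because the masking runs locally, only the placeholder-substituted text ever needs to be sent to a remote service.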
<p>Now let&#8217;s redraw the same diagram with SLMs; the compute can be deployed at each endpoint with specific customizations:</p>
<h5>Customized Small Language Models for Specific Use Cases</h5>

		</div>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div  class="wpb_single_image wpb_content_element vc_align_center">
		
		<figure class="wpb_wrapper vc_figure">
			<div class="vc_single_image-wrapper   vc_box_border_grey"><img loading="lazy" decoding="async" width="1024" height="530" src="https://creospan.com/wp-content/uploads/2025/05/3-1024x530.png" class="vc_single_image-img attachment-large" alt="" title="3" srcset="https://creospan.com/wp-content/uploads/2025/05/3-1024x530.png 1024w, https://creospan.com/wp-content/uploads/2025/05/3-300x155.png 300w, https://creospan.com/wp-content/uploads/2025/05/3-768x398.png 768w, https://creospan.com/wp-content/uploads/2025/05/3-1536x795.png 1536w, https://creospan.com/wp-content/uploads/2025/05/3.png 1920w" sizes="(max-width: 1024px) 100vw, 1024px"  data-dt-location="https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/attachment/3/" /></div>
		</figure>
	</div>
</div></div></div></div><div class="vc_row wpb_row vc_row-fluid"><div class="wpb_column vc_column_container vc_col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column wpb_content_element " >
		<div class="wpb_wrapper">
			<h2>Conclusion</h2>
<p>Small Language Models (SLMs) are revolutionizing the way we think about AI—bringing the power of intelligent computation closer to end users. They are compact, efficient, and purpose-built to address specific use cases across industries, from retail and IoT devices to connected vehicles and telecom.</p>
<p>By processing data closer to the edge, SLMs not only reduce latency but greatly improve privacy and accessibility, making them the future of responsive, on-device intelligence. As we continue to innovate and adapt these models, the possibilities for seamless integration, improved customer experiences, and optimized operational efficiencies are boundless.</p>
<p>The future of AI isn’t just large-scale intelligence—it’s small, smart, and specialized. Let’s embrace this next frontier.</p>
<p><em>Article Written by Krishnam Raju Bhupathiraju.</em></p>
<p>&nbsp;</p>

		</div>
	</div>
</div></div></div></div>
</div><p>The post <a href="https://creospan.com/small-language-models-are-the-new-big-thing-in-ai/">Small Language Models are the New Big Thing in AI</a> appeared first on <a href="https://creospan.com">Creospan</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
