
Predictable Verification: "Plausible Forecasting"



The Forecasting Problem

In this article we explore why good resource forecasting is critical to achieving predictable verification for modern semiconductor hardware development projects and programs.

 

Such projects can consume vast amounts of compute hardware and infrastructure, plus EDA design and verification tool licenses. This can equate to a significant part of the overall development cost of the product, and verification is typically the dominant cost in engineering time and resources for complex hardware developments.


Good forecasting allows engineering teams to plan and provision resources so that projects deliver on schedule, and meet quality targets, within expected costs.


It also enables engineering leaders to make data-driven choices about product roadmap delivery, engineering platform investments, and engineering efficiency and effectiveness improvement initiatives.


Some of these decisions can have $multi-million implications.


However, accurate and plausible forecasting may not be the highest priority for stressed semiconductor hardware development teams. They want to access the resources needed to deliver the product on-time and on-quality, plus flexibility in resources to accommodate the various peaks and troughs seen in a typical hardware development lifecycle. Their focus is on good engineering and robust verification to ensure that the delivered product is bug free and functional. Access to the necessary resources may be an assumption.


The cost of forecasting failure for complex SoCs or IP Cores can be significant. Forecasting failure can lead to both underspend and overspend, and both have consequences for the product ROI.

 

Underspend leads to $multi-million time-to-revenue delays and quality problems with the delivered product due to capacity shortfalls, resulting in missed sales or rework costs.

 

Overspend on $multi-million product development platforms and tools makes the product development cycle inefficient due to over-provisioning and erodes product ROI.


The potential cost savings from better forecasting can be significant when developing high-value hardware IP products such as IP Cores or SoCs. We therefore strongly advocate a better approach to forecasting, or, if you are not currently forecasting at all, starting right now!


In many organizations, the engineering platforms are delivered as a service by either the IT team or an engineering services team. In start-ups, resources may be managed by engineering teams themselves, or by a senior individual. These teams should care about the cost of operating the platform as a service and ensure they provision sufficient resources to meet the demands of all teams.


For start-up companies, where investment money is usually limited, it is even more critical to spend engineering platform money wisely and make good right-sized investments. 


Senior engineering leadership, product marketing, and sales teams need assurances that the roadmap can be delivered within the available resources, cost, and schedule constraints. Failing that, they need to be sure the right investment decisions are made early enough so that product development teams are not delayed by shortfalls in platform resources. Equally, they need to be reassured that the engineering platforms are right-sized and deliver the right performance for their teams.

 

Is the engineering team also focused on engineering efficiency and effectiveness? It is sometimes easier to over-consume resources than to divert product development time into efficiency and effectiveness improvement programs, despite the potential longer-term benefits of reduced cost, improved product quality and shorter time to market. Teams need to keep in mind the overall ROI of the products they are developing; over-consumption of resources can erode it.

 

Conversely, poorly performing tools and availability limitations will impair their ability to deliver a product that meets both quality and schedule requirements. See an earlier article “The Cost of Bugs” where we expand on these ideas and discuss the balance between the “cost of finding bugs” versus the “cost of not finding bugs”.

 

So, coming back to forecasting, why is this historically so difficult and what are the blockers for teams to do this better?

 

One possible answer is that it can be time consuming and hard to do well, and if the forecasts turn out to be mostly inaccurate, then the benefit-to-effort ratio is low. The reason for this tends to be a lack of automatically generated analytics that allow for easy scenario generation and decision making. All too often, teams use spreadsheets driven by incomplete data, where re-spins take ages to generate, often creating spurious results nobody can quite believe.

 

This approach also causes other problems. When individual teams are asked to supply forecast data by a central capacity management function for example, the motivation to deliver the forecast is one of securing the right quota for your team. When other teams are doing the same, it’s a foregone conclusion that the aggregate of all project forecasts will exceed the available or planned capacity. Teams are inclined to inflate the demand request, on the assumption that the request will be negotiated downwards.


Over-inflated forecast requests can lead to expensive capacity expansion decisions; therefore they need to be scrutinized before committing to large engineering platform investments. Getting this wrong could mean that the utilization of the expanded platform does not turn out to be that high; resources are wasted.

 

Alternatively, and often the case, teams will opportunistically consume what now appear to be plentiful resources simply because they are available, for very marginal or no improvements in product quality. The net effect is that utilization data shows full usage, and therefore even more resources may be needed to keep up with an expanding roadmap.


It’s a bit like “building another lane on the freeway”. If you add capacity, it will be used, and therefore it appears to have been required. But was it?


Common Forecasting Failures


So, how do teams generate these demand forecasts when asked to? There are several pitfalls and behaviors to watch out for.

 

  1. No Forecasting! Surprisingly, it's not uncommon for engineering teams to do no forecasting of engineering resources at all. The often-heard reasons for this include lack of time ("we need to ship a product urgently"), lack of understanding of how to do this well, lack of concern for development costs ("the team has never run out of resources, so what's the problem?"), or potentially "forecasting is boring!".


  2. Worst-case forecasting. If everyone builds up their forecast using the worst-case peaks in expected consumption, it results in an over-inflated demand that is unrealistic. Yes, there will be peaks of demand which are real, but in a shared capacity these peaks can usually be absorbed by the slack in the overall system. Peak demands of multiple projects don't often align to an absolute worst case, and if a peak cannot be met at a point in time, it usually can be met shortly afterwards with minimal overall delivery impact. Demand peaks should be expected, but it's more important to understand the total area under the curve and how the average demand changes through different stages of the product development lifecycle.


  3. Not understanding forecast confidence. Do forecasters understand worst-case, best-case and typical-case demand, or are they only offering one set of datapoints? Clearly, when forecasting, some uncertainty is to be expected, and the level of confidence in the prediction will decrease over time: the further into the future being forecasted, the bigger the error bar.


  4. Lack of accurate usage data. Historical usage data is one of the best inputs to any forecast prediction process, but if that data has not been captured, forecasts are not much better than guesses based on judgement, experience, and memory recall. To understand the various layers in the data (the same layers required to construct a bottom-up forecast), utilization data must be captured at a sufficient level of detail. Historical usage patterns can then be used to model future projects, scaled according to known differences between the historical product and the forecasted product such as size, complexity, changes in methodology, changes in tool performance, and changes in schedule.


  5. Lack of forecast fidelity. Forecasting in rectangles is the simplest way to build a demand forecast, but it is probably not representative of how usage waxes and wanes several times over the course of a product development lifecycle. For example, usage might be expected to ramp as key delivery milestones are approached, to guarantee there is sufficient effort to achieve sign-off, then wane post-milestone before ramping up again shortly afterwards.

     

  6. Lack of forecast detail. When forecasters try to predict high-level demand for entire workflows, they might miss some of the nuance of different testbenches requiring different volumes of testing or running at different performance points. Best practice is to adopt a bottom-up approach to forecast construction, with top-down sense checks to make sure this does not lead to a worst-case forecast scenario.


  7. Not accounting for rework impact. When projects hit critical bugs either late in the development cycle or post release, there can be an unexpected demand on resources that was not forecasted. Historical data should give some guidance on the typical impact of rework.


  8. Lack of forecast scrutiny may lead to false conclusions from the forecast analysis. There needs to be a level of governance and process where forecasts are scrutinized and can be challenged, rejected, or sent back for further work. This requires that forecasters can explain their "workings-out" satisfactorily and evidence the forecast data with the list of input parameters and the set of assumptions being made. These things should be documented and recorded by a process of forecast review and approval that can be recalled later when teams escalate any resource expansion requests.


Plausible Forecasting


So, what is a “plausible forecast”? Or better still, a “highly plausible forecast”?

 

Plausible means that the consumer of the forecast can rely on the forecast data to make important business decisions about product roadmaps, product delivery commitments and product development investment needs.

 

Plausible literally means believable, likely, probable. Financial forecasting for the engineering platform is only as good as the hardware and tool forecasts. Typically, companies plan on 3–5-year budget cycles, and this is a good horizon to target for engineering platform forecasting purposes. One would expect the 0–12-month forecast (next year's budget) to be accurate to within roughly ±5%. For 12–36 months, a target of ±10% would be desirable. Beyond 36 months it is much more difficult to predict requirements, as product roadmaps are not always committed for that duration.

 

There's a caveat of course: having no data upon which to base a forecast, or only very crude, low-fidelity forecasts, might mean an accuracy range of −50% to more than +100% (it's not unheard of for teams to end up using twice the resources that were predicted at the start of the project). Early forecasts may fall short of these targets, but analysis of historical consumption data patterns leads to refined forecast modelling over time and brings the team closer to the targets mentioned above. The key is to start collecting historical usage data now.
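To make these accuracy bands concrete, here is a minimal Python sketch for scoring a forecast against actual consumption. The function names and the numbers in the example are our own illustrative assumptions, not part of any specific tool or process.

```python
def forecast_error_pct(forecast: float, actual: float) -> float:
    """Signed percentage error of actual consumption relative to the forecast."""
    return 100.0 * (actual - forecast) / forecast


def within_target(forecast: float, actual: float, horizon_months: int) -> bool:
    """Check a forecast against the +/-5% (0-12 months) and +/-10% (12-36 months) bands."""
    if horizon_months <= 12:
        target = 5.0
    elif horizon_months <= 36:
        target = 10.0
    else:
        return False  # beyond 36 months we do not claim an accuracy target
    return abs(forecast_error_pct(forecast, actual)) <= target


# Hypothetical example: 1.0M CPU hours forecast, 1.08M actually consumed (+8%).
print(within_target(1_000_000, 1_080_000, horizon_months=12))  # False: outside +/-5%
print(within_target(1_000_000, 1_080_000, horizon_months=24))  # True: within +/-10%
```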


A Programmatic Approach


In this article we advocate a programmatic approach to demand forecasting based on a bottom-up process. Forecast data can be generated in a programmatic way using a set of input parameters that describe the characteristics of the design and verification environment, versus the traditional approach of writing data directly into spreadsheets.

 

Forecast "scenarios" can be refined and regenerated as input parameters are adjusted over time, and the resulting data can be saved as database tables on which visualization and analytics can be performed. This scenario modelling approach does not require historical usage data, but the model input parameters can be refined based on analysis of usage data, which will lead to more accurate models. Elements of randomization can be applied to scenario generation so that forecasters can perform what-if analysis, changing model input parameters and randomization seeds to see the effect on the final forecast data.
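As an illustration of this idea, the sketch below shows one possible shape for programmatic, seeded scenario generation. The parameter names, phase durations and demand figures are illustrative assumptions of ours, not outputs or inputs of any particular tool.

```python
import random
from dataclasses import dataclass


@dataclass
class ScenarioParams:
    """Illustrative model inputs describing the expected shape of demand."""
    weeks_per_phase: dict          # phase name -> duration in weeks
    peak_cpu_hours_per_week: dict  # phase name -> expected peak weekly demand
    uncertainty: float = 0.10      # +/- fraction applied via randomization


def generate_scenario(params: ScenarioParams, seed: int = 0) -> list:
    """Return (phase, week, cpu_hours) rows for one forecast scenario."""
    rng = random.Random(seed)  # fixed seed -> reproducible what-if runs
    rows, week = [], 0
    for phase, weeks in params.weeks_per_phase.items():
        peak = params.peak_cpu_hours_per_week[phase]
        for w in range(weeks):
            ramp = (w + 1) / weeks  # simple ramp towards the phase peak
            noise = 1.0 + rng.uniform(-params.uncertainty, params.uncertainty)
            rows.append((phase, week, round(peak * ramp * noise)))
            week += 1
    return rows


params = ScenarioParams(
    weeks_per_phase={"bring-up": 8, "alpha": 12, "beta": 16, "release": 6},
    peak_cpu_hours_per_week={"bring-up": 2_000, "alpha": 10_000,
                             "beta": 25_000, "release": 8_000},
)
scenario_a = generate_scenario(params, seed=1)
scenario_b = generate_scenario(params, seed=2)  # same inputs, different what-if roll
```

Changing the seed or any input parameter and regenerating gives a new scenario that can be stored as a table and compared alongside the others.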


Layered Forecasting


We propose a layered approach to forecast creation:


Figure 1. Layered Forecasting.

  1. When (Project phase) is the time-phase layer of the forecast hierarchy. When does the team expect to be in the different development phases of the project? The profile of work normally varies from development phase to development phase, so the forecast characteristics need to be controlled on a phase-by-phase basis. These project phases and quality release milestones are usually set by the project plan.


  2. What (Activity type) describes the type of the activity or a set of activities within a phase. Each activity may encompass several different methodologies but can be characterized in verification terms as the “activity objective”, i.e. what is the purpose of this work? Why are we expecting to run these testing cycles? Examples might include bring-up testing in the earlier phases, weekly regression testing, CICD testing for merge-requests, coverage closure testing, sign-off testing at key release milestones etc.


     Describing the forecast in these terms shows what a typical product development platform usage model looks like and leads to a deeper understanding of the efficiency and effectiveness of each of these activity types.


  3. How (Methodology -> Stimulus -> Tool) describes the details of how we perform the above activities. There will be multiple distinct methodologies (e.g. Simulation, Emulation, Formal Verification, Implementation, Functional Safety, etc.), and each one will deploy different payloads or stimulus (tests, test-cases, testbenches, etc.) and use different tools to run the jobs.


  4. Which (Testbench) finally describes which testbenches or DuTs (Designs-under-Test) will be the target of the jobs identified in the How layer. A minimal data-structure sketch of these layers follows this list.
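The sketch below shows one possible way to key a single row of such a layered, bottom-up forecast. The field names mirror the When/What/How/Which layers above; the example values and tool/testbench names are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastRow:
    """One slice of a layered, bottom-up demand forecast."""
    phase: str        # When  - project phase, e.g. "beta"
    activity: str     # What  - activity objective, e.g. "weekly regression"
    methodology: str  # How   - e.g. "simulation", "formal", "emulation"
    payload: str      # How   - stimulus type, e.g. "constrained-random"
    tool: str         # How   - the tool used to run the jobs
    testbench: str    # Which - the testbench or DuT targeted
    week: int         # time bucket within the forecast
    cpu_hours: float  # forecasted demand for this slice


row = ForecastRow("beta", "weekly regression", "simulation",
                  "constrained-random", "simulator_x", "dma_unit_tb", 34, 18_500.0)
```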


Layer 1: Phase Layering


Building up the forecast in a bottom-up fashion can create the most plausible and realistic forecast when compared to actual utilization data. We propose a layering approach where the forecast is divided into phases that represent key development phases of the product development lifecycle.

 

Each lifecycle phase is likely to have different characteristics in terms of what tools and activities are prevalent in each phase and what the volumes of resource consumption are. For example, in the initial phase (we will call it the “bring-up” phase), the volume of resource demand will be low, and the key activities will be initial bring up of workflows and testbench development.

 

We would not expect to see much regression testing or protracted bug hunting as this will be a more interactive phase where developers are running short bursts of testing and lots of interactive debug for both testbench and RTL code development.

 

In later phases we see a ramp of resource demand which might increase towards a peak at the end of the phase, which may be defined by a key milestone, e.g. alpha, beta or final release. Sign-off testing and regression testing can dominate consumption as the team iterate towards a clean release that is bug free and meets performance targets. Soak testing can dominate the latter phases, where regressions are stable and coverage is complete, but developers are running assurance testing cycles (normally driven by constrained-random testing environments) to build depth of testing and demonstrate bug absence.

 

Here's an example of what phase layering might look like:


Figure 2. Phase Layering.

In this example a product development lifecycle is illustrated where there is heavy activity in the beta phase and much less activity in the release phase. This is a good example of shift-left. Confidence in the final release milestone should be high because the team have done most of the testing and development in the beta phase. It is a high-quality beta.

 

Note that each phase approximates to a normal distribution, but when layered together the overall effect is a more complex curve representing the full product development lifecycle.

 

Note also the application of error bars to capture the uncertainty in the forecast, showing worst-case and best-case scenarios and addressing pitfalls 2 and 3 from earlier. The uncertainty, as a percentage, increases the further out the forecast goes.
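For illustration, the sketch below layers bell-shaped phase curves, sums them into an overall demand profile, and applies an uncertainty band that widens with time. The shapes, peaks and percentages are illustrative assumptions only, not a prescription for how real phases behave.

```python
import math

WEEKS = 52


def phase_curve(peak: float, centre_week: float, spread_weeks: float) -> list:
    """Gaussian-shaped weekly demand contributed by a single phase."""
    return [peak * math.exp(-((w - centre_week) ** 2) / (2 * spread_weeks ** 2))
            for w in range(WEEKS)]


phases = {
    "bring-up": phase_curve(peak=3_000,  centre_week=8,  spread_weeks=4),
    "alpha":    phase_curve(peak=12_000, centre_week=20, spread_weeks=5),
    "beta":     phase_curve(peak=28_000, centre_week=34, spread_weeks=6),
    "release":  phase_curve(peak=9_000,  centre_week=46, spread_weeks=3),
}

# Layer the phases: total weekly demand is the sum of the phase curves.
total = [sum(curve[w] for curve in phases.values()) for w in range(WEEKS)]

# Error bars: uncertainty grows with time, from +/-5% now to +/-25% at week 52.
upper = [d * (1 + 0.05 + 0.20 * w / WEEKS) for w, d in enumerate(total)]
lower = [d * (1 - 0.05 - 0.20 * w / WEEKS) for w, d in enumerate(total)]
```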


Layer 2: Activity Layering


Once the project phases for your forecast have been identified, it is useful to consider the set of activities that will be necessary to deliver the product and meet the product's verification objectives.

In the example below, we see a typical profile where the dominant activity in the earlier phases is “bring-up” while “deliverables testing” appears only towards the end of the project.


Figure 3. Activity Layering.

Layer 3: Methodology, Payload & Tool Layering


With a bottom-up approach, it is possible to build layers of activity by Methodology, Payload and Tool. For example, if the methodology is simulation, the payloads would typically be a mixture of random and deterministic tests, and the tools could be one or several simulation tools.

 

It’s useful to be able to look at the overall picture by methodology to understand the relative weightings of simulation versus formal verification, versus functional safety for example. In our example forecast below we can see that simulation is the dominant methodology, which is typical of many real-world development lifecycles.


Figure 4. Methodology Layering.

When looking at the forecast through the payload (or stimulus) dimension we can see that a typical project might focus more simulation effort on directed testing in the early stages and transition to more random testing cycles as the project progresses.


Figure 5. Stimulus (Payload) Layering.

Looking at the EDA tool level of detail, we can see what the ratio is between CPU hours spent on simulation versus formal verification tools, for example.


Figure 6. EDA Tool Layering.

Layer 4: Testbench Layering


The final level of detail is the testbench or the DuT that will be the target of the verification activity. In this example there is a range of different testbenches: typically unit-level UVM testbenches, sub-system level and top-level testbenches. Different testbenches will have different rates of consumption, as some are simple and some are complex; some will perform faster or slower than others, and some will require more or fewer cycles of testing to meet the verification targets for the DuT.


Figure 7. Testbench Layering.

If the team cannot forecast the details of individual testbenches, it's best to think in terms of testbench abstraction levels, which are typically unit, sub-system, top and system.
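A minimal sketch of rolling a bottom-up forecast up to these abstraction levels might look like the following; the testbench names and CPU-hour figures are illustrative assumptions.

```python
from collections import defaultdict

# (testbench, abstraction level, forecast CPU hours) -- illustrative values.
forecast_rows = [
    ("uart_tb",       "unit",       40_000),
    ("dma_tb",        "unit",       65_000),
    ("memss_tb",      "sub-system", 180_000),
    ("cpu_top_tb",    "top",        320_000),
    ("soc_system_tb", "system",     150_000),
]

by_level = defaultdict(float)
for testbench, level, cpu_hours in forecast_rows:
    by_level[level] += cpu_hours

for level, hours in by_level.items():
    print(f"{level:11s} {hours:>10,.0f} CPU hours")
```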


Figure 8. Testing Level Layering.

Forecast Scaling


Since forecasting is not an exact science and depends on some judgement about future demands, an approach is needed that will lead to plausible forecasts based upon the analysis of available historical usage data. Each product on the roadmap may be different in terms of size, complexity/degree of difficulty, amount of reuse, experience of the team, and methodologies used. The following aspects need to be considered when constructing a bottom-up model that is founded on historical datasets; a simple scaling sketch follows this list.

 

  1. Size: The size of the project in terms of the physical size of the implemented design in gates is not necessarily a good indicator of the effort levels that will be required for successful verification. It depends more on the complexity in general. Code that replicates structures, or an architecture that scales by multiplying components of the SoC, for example, does not imply more effort and resources for simulation-based verification, as coverage can largely be obtained from single-core testing.


    However, in the case of emulation or FPGA based verification, there is an impact when the footprint of the testbench/DuT scales up and more emulation or FPGA prototyping gates are required to simulate the full build. It is likely most of the verification effort for emulation and FPGA prototyping can be done on smaller multi-core builds and only a limited amount of checking is needed for the largest possible build (if the available platform is big enough to accommodate that of course).

     

  2. Complexity can be harder to measure when considering the effect this will have on demand. In general, higher complexity leads to higher resource demand as the verification task is more difficult and the number of scenarios to test increases with higher probability of corner-case bugs.

     

    Design teams should aim to mitigate complexity wherever possible with good coding practice and well-structured, well-partitioned architectures and design implementations. However, complexity is often unavoidable where the design is being highly optimized to meet challenging performance and functionality targets, for example.

     

    Highly optimized RTL code, for performance or power tuning for example, can become obfuscated and increase the risk of hidden corner cases, and the design architecture itself may be inherently complex. Architects and design and verification engineers need to assess complexity and its impact on the predictability of product delivery and, in some cases, simplify the architecture accordingly.

     

    When a more scientific approach is needed to measure complexity on a historic project for example, there are multiple measures to consider such as logic-cone depth, lines-of-code, cyclomatic-complexity etc., but this is a subject for a separate article! If the team know their next project is going to be challenging complexity-wise, it might be necessary to scale-up the demand forecast accordingly.

     

  3. Reuse: Some projects are easier to predict, especially derivatives from a former project, where the effort may vary, depending on the degree of reuse. A high degree of reuse points towards less effort if entire sub-blocks or third-party IP blocks are un-changed. So, focus on the areas of the design that are changing and where focused verification time is required. If the design is implementing a new architecture for the first time, then it may be necessary to factor that into the scaling and predict a significantly higher demand than previous projects.

     

  4. Engineering capability: Variance in the levels of experience in the product development team might need to be taken into account when considering the amount of effort and time required to deliver the product. Less experienced teams may fare worse than more experienced ones and need more time to get the design and verification right. If using the same team as on a previous project, then it can be assumed that effort levels recorded for that project will be commensurate with the future one.

     

  5. Methodology: If the design and verification methodologies for the forecasted project are the same as the previous one, then it can be assumed that the profiles of the various payloads, tools and testbenches will be highly similar. If planning a methodology shift or a shift-left initiative to bring forward verification activity peaks, then this will need to be factored into the bottom-up forecast model. For example, a shift of effort from one methodology to another might mean an increase in formal verification and a reduction in simulation. Alternatively, the team might be looking to profile and tune slow simulation testbenches so that more verification cycles are run within the same timeframe, finding bugs more quickly.

     

    If considering an upgrade of major verification platforms or EDA tools to a higher-performance version, this will shrink forecasted resource demands, e.g. an upgrade to compute-grid hardware, or an increase in the performance and effectiveness of a mainstream verification tool by the EDA supplier. In some cases, it is better to build the forecast assuming that methodologies and platform performance remain the same, unless there is existing data that supports the expected improvements.
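Pulling these scaling aspects together, a simple sketch of scaling a historical baseline might look like the following. The baseline and the factor values are illustrative judgements, not derived data.

```python
historical_cpu_hours = 2_400_000  # total consumption of the reference project

scaling_factors = {
    "size_and_complexity": 1.25,  # next design judged ~25% harder to verify
    "reuse":               0.85,  # significant sub-blocks carried over unchanged
    "team_experience":     1.00,  # same team as the reference project
    "methodology":         1.00,  # assume tools and flows unchanged unless data says otherwise
}

forecast_cpu_hours = historical_cpu_hours
for factor in scaling_factors.values():
    forecast_cpu_hours *= factor

print(f"Scaled forecast: {forecast_cpu_hours:,.0f} CPU hours")  # 2,550,000
```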


Summary


With accurate on-time resource forecasting, engineering teams and engineering leaders will be more able to predict the successful delivery of their projects and programs. Teams need to evolve a culture of good forecasting practice backed up with a good understanding and analysis of historical data.

 

If there is no historical data, teams should start collecting it without delay. Even without historical data it is possible to make plausible forecasts using a bottom-up approach and by investing some time and effort in thinking about all the various layers of the forecast that need to be predicted, and then what the scaling factors might be.

 

Forecasts are unlikely to ever be 100% correct on the first iteration and will need to be checked and revised as new data comes in and the project evolves. This actuals->forecast feedback loop helps forecasters get better at forecast modelling over time and eventually leads to highly trusted forecasts that senior leaders and product managers can rely on, with a higher probability of hitting schedule, quality, performance, cost and revenue targets for their products and programs.

 

Forecast modelling and usage analytics need to become established as normal practice for high-performing hardware development teams, like any other aspect of project and program planning. Forecasts should be reviewed and scrutinized before being approved and rolled up into business or departmental program planning. Forecasters need to be held accountable for their forecasts and be able to explain and justify forecast data values before any platform investment decisions are made or commitments to project delivery dates are approved. A bottom-up approach as described above enables this level of scrutiny and leads to increased predictability for high-complexity product delivery.

 

Finally, a forecast modelling approach such as this can support three distinct use cases:

 

  1. Modelling of missing datasets – where historical data is missing or incomplete, a forecast model can be used to backfill missing data (a minimal sketch follows this list). This is reasonable if there is evidence of actual consumption that can be used as an anchor point for the model. For example, knowing that a certain payload takes a certain amount of time on a certain platform can be used to gauge the relative consumption of missing payloads in the data.


  2. Forecasting future projects based on historical datasets, or on models of missing datasets, applying scaling principles to the modelling.


  3. Forecast "what-if" analysis, where forecasters can change forecast model parameters and visualize the effect on the forecast for an individual project, or across an entire program. For example, switching to a new and faster verification platform or tool might reduce the area under the curve while achieving the same volume of testing and the same quality level. Alternatively, verification effort could be increased to address shortfalls in quality.
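As an illustration of use case 1 above, the sketch below backfills missing payload consumption by anchoring to a payload whose actual runtime is known. The payload names, weights and anchor figure are illustrative assumptions only.

```python
# Known anchor: a nightly regression payload measured at 6,000 CPU hours per run.
anchor_cpu_hours = 6_000.0

# Relative weights estimated by the team for payloads missing from the data,
# expressed as multiples of the anchor payload.
relative_weight = {
    "smoke_tests":      0.05,
    "weekly_soak":      4.0,
    "coverage_closure": 2.5,
}

backfilled = {name: weight * anchor_cpu_hours for name, weight in relative_weight.items()}
print(backfilled)  # {'smoke_tests': 300.0, 'weekly_soak': 24000.0, 'coverage_closure': 15000.0}
```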


Further Information


Joe Convey and Bryan Dickman of Silicon Insights Limited have many years of combined experience in the challenges of delivering complex semiconductor hardware products. Our forecast modelling ideas use this knowledge and experience to formulate methods and tools to support the above bottom-up forecast modelling approach. If you want to know more about this, please contact joe.convey@siliconinsights.co.uk  or bryan.dickman@siliconinsights.co.uk and we would love to discuss your forecasting requirements in more detail.

 

Note that all scenarios are fabricated models for the purposes of illustration.

There is no intended vendor bias on any EDA tooling product names that may appear in our fabricated forecasting data.


Copyright ©Silicon Insights 2025


 
 
 
