CUPTI

What's New

CUPTI contains a number of changes and new features as part of the CUDA Toolkit 5.0 release.
  • Using CUPTI to profile an application no longer disables concurrent kernel execution within that application.
  • CUDA peer-to-peer memory copies are reported via the activity API, but for this release are not identified explicitly as peer-to-peer (this issue will be resolved in a future release). A peer-to-peer memory copy is reported as a single DtoD memcpy if the memcpy is performed using the copy engine of one of the devices. A peer-to-peer memory copy is reported as a DtoH memcpy followed by an HtoD memcpy if the memcpy is performed using a staging buffer on the host.
  • Several new activity kinds are introduced to support the NVIDIA Tools Extension library. These new activity kinds are used to record resource naming, markers, and timed regions inserted by the developer.
  • Several new activity kinds are introduced to enable source-level metrics. When enabled, these activity kinds cause collection of kernel metrics that can be attributed to specific kernel source lines.
  • A new activity kind that records several types of overheads introduced by the CUPTI profiling infrastructure.
  • Activity kinds can now be enabled and disabled for individual contexts using cuptiActivityEnableContext() and cuptiActivityDisableContext().
  • A new callback domain is introduced for the NVIDIA Tools Extension Library. You can use this callback domain to register functions that should be invoked when the application calls NVTX functions.
  • The cuptiGetCallbackName() function can be used to get the textual name for any callback ID in any callback domain.
  • The thread-safety of the CUPTI event API has been clarified. See the documentation of each function for details.

Introduction

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.

CUPTI Compatibility and Requirements

New versions of the CUDA driver are backwards compatible with older versions of CUPTI. For example, a developer using a profiling tool based on CUPTI 4.1 can update to a more recently released CUDA driver. However, new versions of CUPTI are not backwards compatible with older versions of the CUDA driver. For example, a developer using a profiling tool based on CUPTI 4.1 must have a version of the CUDA driver released with CUDA Toolkit 4.1 (or later) installed as well. CUPTI calls will fail with CUPTI_ERROR_NOT_INITIALIZED if the CUDA driver version is not compatible with the CUPTI version.

CUPTI Initialization

CUPTI initialization occurs lazily the first time you invoke any CUPTI function. For the Event, Metric, and Callback APIs there are no requirements on when this initialization must occur (i.e. you can invoke the first CUPTI function at any point). For correct operation, the Activity API does require that CUPTI be initialized before any CUDA driver or runtime API is invoked. See the CUPTI Activity API section for more information on CUPTI initialization requirements for the activity API.

CUPTI Activity API

The CUPTI Activity API allows you to asychronously collect a trace of an application's CPU and GPU CUDA activity. The following terminology is used by the activity API.

Activity Record
CPU and GPU activity is reported in C data structures called activity records. There is a different C structure type for each activity kind (e.g. CUpti_ActivityMemcpy). Records are generically referred to using the CUpti_Activity type. This type contains only a kind field that indicates the kind of the activity record. Using this kind, the object can be cast from the generic CUpti_Activity type to the specific type representing the activity. See the printActivity function in the activity_trace sample for an example.
Activity Buffer
CUPTI fills activity buffers with activity records as the corresponding activities occur on the CPU and GPU. The CUPTI client is responsible for providing activity buffers as necessary to ensure that no records are dropped.
Activity Queue
CUPTI maintains queues of activity buffers. There are three types of queues: global, context, and stream.
Global Queue
The global queue collects all activity records that are not associated with a valid context. All device, context, and API activity records are collected in the global queue. A buffer is enqueued in the global queue by specifying NULL for the context argument.
Context Queue
Each context queue collects activity records associated with that context that are not associated with a specific stream or that are associated with the default stream. A buffer is enqueued in a context queue by specifying 0 for the streamId argument and a valid context for the context argument.
Stream Queue
Each stream queue collects memcpy, memset, and kernel activity records associated with the stream. A buffer is enqueued in a stream queue by specifying a non-zero value for the streamId argument and a valid context for the context argument. A streamId can be obtained from a CUstream object by using the cuptiGetStreamId function.

CUPTI must be initialized in a specific manner to ensure that activity records are collected correctly. Most importantly, CUPTI must be initialized before any CUDA driver or runtime API is invoked. Initialization can be done by enqueuing one or more buffers in the global queue, as shown in the initTrace function of the activity_trace sample. Also, to ensure that device activity records are collected, you must enable device records before CUDA is initialized (also shown in the initTrace function).

The other important requirement for correct activity API operation is the need to enqueue at least one buffer in the context queue of each context as it is created. Thus, as shown in the activity_trace example, the CUPTI client should use the resource callback to enqueue at least one buffer when context creation is indicated by CUPTI_CBID_RESOURCE_CONTEXT_CREATED. Using the stream queues is optional, but may be useful to reduce or eliminate application perturbations caused by the need to process or save the activity records returned in the buffers. For example, if a stream queue is used, that queue can be flushed when the stream is synchronized.

Each activity buffer must be allocated by the CUPTI client, and passed to CUPTI using the cuptiActivityEnqueueBuffer function. Enqueuing a buffer passes ownership to CUPTI, and so the client should not read or write the contents of a buffer once it is enqueued. Ownership of a buffer is regained by using the cuptiActivityDequeueBuffer function.

As the application executes, the activity buffers will fill. It is the CUPTI client's responsibility to ensure that a sufficient number of appropriately sized buffers are enqueued to avoid dropped activity records. Activity buffers can be enqueued and dequeued at the following points. Enqueuing and dequeuing activity buffers at any other point may result in corrupt activity records.

Before CUDA initialization
Buffers can be enqueued and dequeued to/from the global queue before any CUDA driver or runtime API is called.
In synchronization or resource callbacks
At context creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding context queue, and from any stream queues associated with streams in that context. At stream creation, destruction, or synchronization, buffers may be enqueued or dequeued to/from the corresponding stream queue. The global queue may also be enqueued or dequeued at this time.
After device synchronization
After a CUDA device is synchronized or reset (with cudaDeviceSynchronize or cudaDeviceReset), and before any subsequent CUDA driver or runtime API is invoked, buffers can enqueued and dequeued to/from any activity queue.

The activity_trace sample shows how to use global, context, and stream queues to collect a trace of CPU and GPU activity for a simple application.

CUPTI Callback API

The CUPTI Callback API allows you to register a callback into your own code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver. The following terminology is used by the callback API.

Callback Domain
Callbacks are grouped into domains to make it easier to associate your callback functions with groups of related CUDA functions or events. There are currently four callback domains, as defined by CUpti_CallbackDomain: a domain for CUDA runtime functions, a domain for CUDA driver functions, a domain for CUDA resource tracking, and a domain for CUDA synchronization notification.
Callback ID
Each callback is given a unique ID within the corresponding callback domain so that you can identify it within your callback function. The CUDA driver API IDs are defined in cupti_driver_cbid.h and the CUDA runtime API IDs are defined in cupti_runtime_cbid.h. Both of these headers are included for you when you include cupti.h. The CUDA resource callback IDs are defined by CUpti_CallbackIdResource and the CUDA synchronization callback IDs are defined by CUpti_CallbackIdSync.
Callback Function
Your callback function must be of type CUpti_CallbackFunc. This function type has two arguments that specify the callback domain and ID so that you know why the callback is occurring. The type also has a cbdata argument that is used to pass data specific to the callback.
Subscriber
A subscriber is used to associate each of your callback functions with one or more CUDA API functions. There can be at most one subscriber initialized with cuptiSubscribe() at any time. Before initializing a new subscriber, the existing subscriber must be finalized with cuptiUnsubscribe().

Each callback domain is described in detail below. Unless explicitly stated, it is not supported to call any CUDA runtime or driver API from within a callback function. Doing so may cause the application to hang.

Driver and Runtime API Callbacks

Using the callback API with the CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API domains, you can associate a callback function with one or more CUDA API functions. When those CUDA functions are invoked in the application, your callback function is invoked as well. For these domains, the cbdata argument to your callback function will be of the type CUpti_CallbackData.

It is legal to call cudaThreadSynchronize(), cudaDeviceSynchronize(), cudaStreamSynchronize(), cuCtxSynchronize(), and cuStreamSynchronize() from within a driver or runtime API callback function.

The following code shows a typical sequence used to associate a callback function with one or more CUDA API functions. To simplify the presentation error checking code has been removed.

  CUpti_SubscriberHandle subscriber;
  MyDataStruct *my_data = ...;
  ...
  cuptiSubscribe(&subscriber, 
                 (CUpti_CallbackFunc)my_callback , my_data);
  cuptiEnableDomain(1, subscriber, 
                    CUPTI_CB_DOMAIN_RUNTIME_API);

First, cuptiSubscribe is used to initialize a subscriber with the my_callback callback function. Next, cuptiEnableDomain is used to associate that callback with all the CUDA runtime API functions. Using this code sequence will cause my_callback to be called twice each time any of the CUDA runtime API functions are invoked, once on entry to the CUDA function and once just before exit from the CUDA function. CUPTI callback API functions cuptiEnableCallback and cuptiEnableAllDomains can also be used to associate CUDA API functions with a callback (see reference below for more information).

The following code shows a typical callback function.

void CUPTIAPI
my_callback(void *userdata, CUpti_CallbackDomain domain,
            CUpti_CallbackId cbid, const void *cbdata)
{
  const CUpti_CallbackData *cbInfo = (CUpti_CallbackData *)cbdata;
  MyDataStruct *my_data = (MyDataStruct *)userdata;
      
  if ((domain == CUPTI_CB_DOMAIN_RUNTIME_API) &&
      (cbid == CUPTI_RUNTIME_TRACE_CBID_cudaMemcpy_v3020))  { 
    if (cbInfo->callbackSite == CUPTI_API_ENTER) {
        cudaMemcpy_v3020_params *funcParams = 
             (cudaMemcpy_v3020_params *)(cbInfo->
                 functionParams);

        size_t count = funcParams->count;
        enum cudaMemcpyKind kind = funcParams->kind;
        ...
      }
  ...

In your callback function, you use the CUpti_CallbackDomain and CUpti_CallbackID parameters to determine which CUDA API function invocation is causing this callback. In the example above, we are checking for the CUDA runtime cudaMemcpy function. The cbdata parameter holds a structure of useful information that can be used within the callback. In this case we use the callbackSite member of the structure to detect that the callback is occurring on entry to cudaMemcpy, and we use the functionParams member to access the parameters that were passed to cudaMemcpy. To access the parameters we first cast functionParams to a structure type corresponding to the cudaMemcpy function. These parameter structures are contained in generated_cuda_runtime_api_meta.h, generated_cuda_meta.h, and a number of other files. When possible these files are included for you by cupti.h.

The callback_event and callback_timestamp samples described on the samples page both show how to use the callback API for the driver and runtime API domains.

Resource Callbacks

Using the callback API with the CUPTI_CB_DOMAIN_RESOURCE domain, you can associate a callback function with some CUDA resource creation and destruction events. For example, when a CUDA context is created, your callback function will be invoked with a callback ID equal to CUPTI_CBID_RESOURCE_CONTEXT_CREATED. For this domain, the cbdata argument to your callback function will be of the type CUpti_ResourceData.

The activity_trace sample shows how to use the resource callback.

Synchronization Callbacks

Using the callback API with the CUPTI_CB_DOMAIN_SYNCHRONIZE domain, you can associate a callback function with CUDA context and stream synchronizations. For example, when a CUDA context is synchronized, your callback function will be invoked with a callback ID equal to CUPTI_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED. For this domain, the cbdata argument to your callback function will be of the type CUpti_SynchronizeData.

The activity_trace sample shows how to use the synchronization callback.

NVIDIA Tools Extension Callbacks

Using the callback API with the CUPTI_CB_DOMAIN_NVTX domain, you can associate a callback function with NVIDIA Tools Extension (NVTX) API functions. When an NVTX function is invoked in the application, your callback function is invoked as well. For these domains, the cbdata argument to your callback function will be of the type CUpti_NvtxData.

The following code shows a typical sequence used to associate a callback function with one or more NVTX functions. To simplify the presentation error checking code has been removed.

  CUpti_SubscriberHandle subscriber;
  MyDataStruct *my_data = ...;
  ...
  cuptiSubscribe(&subscriber, 
                 (CUpti_CallbackFunc)my_callback , my_data);
  cuptiEnableDomain(1, subscriber, 
                    CUPTI_CB_DOMAIN_NVTX);

First, cuptiSubscribe is used to initialize a subscriber with the my_callback callback function. Next, cuptiEnableDomain is used to associate that callback with all the NVTX functions. Using this code sequence will cause my_callback to be called once each time any of the NVTX functions are invoked. CUPTI callback API functions cuptiEnableCallback and cuptiEnableAllDomains can also be used to associate NVTX API functions with a callback (see reference below for more information).

The following code shows a typical callback function.

void CUPTIAPI
my_callback(void *userdata, CUpti_CallbackDomain domain,
            CUpti_CallbackId cbid, const void *cbdata)
{
  const CUpti_NvtxData *nvtxInfo = (CUpti_NvtxData *)cbdata;
  MyDataStruct *my_data = (MyDataStruct *)userdata;
      
  if ((domain == CUPTI_CB_DOMAIN_NVTX) &&
      (cbid == NVTX_CBID_CORE_NameOsThreadA))  { 
    nvtxNameOsThreadA_params *params = (nvtxNameOsThreadA_params *)nvtxInfo->
             functionParams;
    ...
  }
  ...

In your callback function, you use the CUpti_CallbackDomain and CUpti_CallbackID parameters to determine which NVTX API function invocation is causing this callback. In the example above, we are checking for the nvtxNameOsThreadA function. The cbdata parameter holds a structure of useful information that can be used within the callback. In this case, we use the functionParams member to access the parameters that were passed to nvtxNameOsThreadA. To access the parameters we first cast functionParams to a structure type corresponding to the nvtxNameOsThreadA function. These parameter structures are contained in generated_nvtx_meta.h.

CUPTI Event API

The CUPTI Event API allows you to query, configure, start, stop, and read the event counters on a CUDA-enabled device. The following terminology is used by the event API.

Event
An event is a countable activity, action, or occurrence on a device.
Event ID
Each event is assigned a unique identifier. A named event will represent the same activity, action, or occurence on all device types. But the named event may have different IDs on different device families. Use cuptiEventGetIdFromName to get the ID for a named event on a particular device.
Event Category
Each event is placed in one of the categories defined by CUpti_EventCategory. The category indicates the general type of activity, action, or occurrence measured by the event.
Event Domain
A device exposes one or more event domains. Each event domain represents a group of related events available on that device. A device may have multiple instances of a domain, indicating that the device can simultaneously record multiple instances of each event within that domain.
Event Group
An event group is a collection of events that are managed together. The number and type of events that can be added to an event group are subject to device-specific limits. At any given time, a device may be configured to count events from a limited number of event groups. All events in an event group must belong to the same event domain.
Event Group Set
An event group set is a collection of event groups that can be enabled at the same time. Event group sets are created by cuptiEventGroupSetsCreate and cuptiMetricCreateEventGroupSets.

You can determine the events available on a device using the cuptiDeviceEnumEventDomains and cuptiEventDomainEnumEvents functions. The cupti_query sample described on the samples page shows how to use these functions. You can also enumerate all the CUPTI events available on any device using the cuptiEnumEventDomains function.

Configuring and reading event counts requires the following steps. First, select your event collection mode. If you want to count events that occur during the execution of a kernel, use cuptiSetEventCollectionMode to set mode CUPTI_EVENT_COLLECTION_MODE_KERNEL. If you want to continuously sample the event counts, use mode CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS. Next determine the names of the events that you want to count, and then use the cuptiEventGroupCreate, cuptiEventGetIdFromName, and cuptiEventGroupAddEvent functions to create and initialize an event group with those events. If you are unable to add all the events to a single event group then you will need to create multiple event groups. Alternatively, you can use the cuptiEventGroupSetsCreate function to automatically create the event group(s) required for a set of events.

To begin counting a set of events, enable the event group or groups that contain those events by using the cuptiEventGroupEnable function. If your events are contained in multiple event groups you may be unable to enable all of the event groups at the same time, due to device limitations. In this case, you will need to gather the events across multiple executions of the application.

Use the cuptiEventGroupReadEvent and/or cuptiEventGroupReadAllEvents functions to read the event values. When you are done collecting events, use the cuptiEventGroupDisable function to stop counting of the events contained in an event group. The callback_event sample described on the samples page shows how to use these functions to create, enable, and disable event groups, and how to read event counts.

Collecting Kernel Execution Events

A common use of the event API is to count a set of events during the execution of a kernel (as demonstrated by the callback_event sample). The following code shows a typical callback used for this purpose. Assume that the callback was enabled only for a kernel launch using the CUDA runtime (i.e. by cuptiEnableCallback(1, subscriber, CUPTI_CB_DOMAIN_RUNTIME_API, CUPTI_RUNTIME_TRACE_CBID_cudaLaunch_v3020). To simplify the presentation error checking code has been removed.

static void CUPTIAPI
getEventValueCallback(void *userdata,
                      CUpti_CallbackDomain domain,
                      CUpti_CallbackId cbid,
                      const void *cbdata)
{
  const CUpti_CallbackData *cbData = 
                (CUpti_CallbackData *)cbdata;
     
  if (cbData->callbackSite == CUPTI_API_ENTER) {
    cudaThreadSynchronize();
    cuptiSetEventCollectionMode(cbInfo->context, 
                                CUPTI_EVENT_COLLECTION_MODE_KERNEL);
    cuptiEventGroupEnable(eventGroup);
  }
    
  if (cbData->callbackSite == CUPTI_API_EXIT) {
    cudaThreadSynchronize();
    cuptiEventGroupReadEvent(eventGroup, 
                             CUPTI_EVENT_READ_FLAG_NONE, 
                             eventId, 
                             &bytesRead, &eventVal);
      
    cuptiEventGroupDisable(eventGroup);
  }
}

Two synchronization points are used to ensure that events are counted only for the execution of the kernel. If the application contains other threads that launch kernels, then additional thread-level synchronization must also be introduced to ensure that those threads do not launch kernels while the callback is collecting events. When the cudaLaunch API is entered (that is, before the kernel is actually launched on the device), cudaThreadSynchronize is used to wait until the GPU is idle. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_KERNEL so that the event counters are automatically started and stopped just before and after the kernel executes. Then event collection is enabled with cuptiEventGroupEnable.

When the cudaLaunch API is exited (that is, after the kernel is queued for execution on the GPU) another cudaThreadSynchronize is used to cause the CPU thread to wait for the kernel to finish execution. Finally, the event counts are read with cuptiEventGroupReadEvent.

Sampling Events

The event API can also be used to sample event values while a kernel or kernels are executing (as demonstrated by the event_sampling sample). The sample shows one possible way to perform the sampling. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS so that the event counters run continuously. Two threads are used in event_sampling: one thread schedules the kernels and memcpys that perform the computation, while another thread wakes periodically to sample an event counter. In this sample there is no correlation of the event samples with what is happening on the GPU. To get some coarse correlation, you can use cuptiDeviceGetTimestamp to collect the GPU timestamp at the time of the sample and also at other interesting points in your application.

CUPTI Metric API

The CUPTI Metric API allows you to collect application metrics calculated from one or more event values. The following terminology is used by the metric API.

Metric
An characteristic of an application that is calculated from one or more event values.
Metric ID
Each metric is assigned a unique identifier. A named metric will represent the same characteristic on all device types. But the named metric may have different IDs on different device families. Use cuptiMetricGetIdFromName to get the ID for a named metric on a particular device.
Metric Category
Each metric is placed in one of the categories defined by CUpti_MetricCategory. The category indicates the general type of the characteristic measured by the metric.
Metric Value
Each metric has a value that represents one of the kinds defined by CUpti_MetricValueKind. For each value kind, there is a corresponding member of the CUpti_MetricValue union that is used to hold the metric's value.

The tables included in this section list the metrics available for each device, as determined by the device's compute capability. You can also determine the metrics available on a device using the cuptiDeviceEnumMetrics function. The cupti_query sample described on the samples page shows how to use this function. You can also enumerate all the CUPTI metrics available on any device using the cuptiEnumMetrics function.

Configuring and calculating metric values requires the following steps. First, determine the name of the metric that you want to collect, and then use the cuptiMetricGetIdFromName to get the metric ID. Use cuptiMetricEnumEvents to get the events required to calculate the metric and follow instructions in the CUPTI Event API section to create the event groups for those events. Alternatively, you can use the cuptiMetricCreateEventGroupSets function to automatically create the event group(s) required for metric's events.

Collect event counts as described in the CUPTI Event API section, and then use cuptiMetricGetValue to calculate the metric value from the collected event values. The callback_metric sample described on the samples page shows how to use these functions to calculate event values. Note that, as shown in the example, you should collect event counts from all domain instances and normalize the counts to get the most accurate metric values. It is necessary to normalize the event counts because the number of event counter instances varies by device and by the event being counted.

For example, a device might have 8 multiprocessors but only have event counters for 4 of the multiprocessors, and might have 3 memory units and only have events counters for one memory unit. When calculating a metric that requires a multiprocessor event and a memory unit event, the 4 multiprocessor counters should be summed and multiplied by 2 to normalize the event count across the entire device. Similarly, the one memory unit counter should be multiplied by 3 to normalize the event count across the entire device. The normalized values can then be passed to cuptiMetricGetValue to calculate the metric value.

As described, the normalization assumes the kernel executes a sufficient number of blocks to completely load the device. If the kernel has only a small number of blocks, normalizing across the entire device may skew the result.

Metric Reference - Compute Capability 1.x

Devices with compute capability less than 2.0 implement the metrics shown in the following table.

Table 1. Capability 1.x Metrics
Metric Name Description Formula
branch_efficiency Ratio of non-divergent branches to total branches 100 * (branch - divergent_branch) / branch
gld_efficiency Ratio of requested global memory load transactions to actual global memory load transactions

For CC 1.2 & 1.3: (gld_request / ((gld_32 + gld_64 + gld_128) / (2 * #SM)))

For CC 1.0 & 1.1: gld_coherent / (gld_coherent + gld_incoherent)

gst_efficiency Ratio of requested global memory store transactions to actual global memory store transactions

For CC 1.2 & 1.3: (gst_request / ((gst_32 + gst_64 + gst_128) / (2 * #SM)))

For CC 1.0 & 1.1: gst_coherent / (gst_coherent + gst_incoherent)

gld_requested_throughput Requested global memory load throughput (gld_32 * 32 + gld_64 * 64 + gld_128 * 128) / gputime
gst_requested_throughput Requested global memory store throughput (gst_32 * 32 + gst_64 * 64 + gst_128 * 128) / gputime

Metric Reference - Compute Capability 2.x

Devices with compute capability between 2.0, inclusive, and 3.0 implement the metrics shown in the following table.

Table 2. Capability 2.x Metrics
Metric Name Description Formula
sm_efficiency The ratio of the time at least one warp is active on a multiprocessor to the total time 100 * (active_cycles / #SM) / elapsed_clocks
achieved_occupancy Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor 100 * (active_warps / active_cycles) / max_warps_per_sm
ipc Instructions executed per cycle (inst_executed / #SM) / elapsed_clocks
branch_efficiency Ratio of non-divergent branches to total branches 100 * (branch - divergent_branch) / branch
warp_execution_efficiency Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor thread_inst_executed / (inst_executed * warp_size)
inst_replay_overhead Percentage of instruction issues due to memory replays 100 * (inst_issued - inst_executed) / inst_issued
shared_replay_overhead Percentage of instruction issues due to replays for shared memory conflicts 100 * l1_shared_bank_conflict / inst_issue
global_cache_replay_overhead Percentage of instruction issues due to replays for global memory cache misses 100 * global_load_miss / inst_issued
local_replay_overhead Percentage of instruction issues due to replays for local memory cache misses 100 * (local_load_miss + local_store_miss) / inst_issued
gld_efficiency Ratio of requested global memory load throughput to actual global memory load throughput 100 * gld_requested_throughput/ gld_throughput
gst_efficiency Ratio of requested global memory store throughput to actual global memory store throughput 100 * gst_requested_throughput / gst_throughput
gld_throughput Global memory load throughput ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gst_throughput Global memory store throughput (l2_subp0_write_requests + l2_subp1_write_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gld_requested_throughput Requested global memory load throughput (gld_inst_8bit + 2 * gld_inst_16bit + 4 * gld_inst_32bit + 8 * gld_inst_64bit + 16 * gld_inst_128bit) / gputime
gst_requested_throughput Requested global memory store throughput (gst_inst_8bit + 2 * gst_inst_16bit + 4 * gst_inst_32bit + 8 * gst_inst_64bit + 16 * gst_inst_128bit) / gputime
dram_read_throughput DRAM read throughput (fb_subp0_read + fb_subp1_read) * 32 / gputime
dram_write_throughput DRAM write throughput (fb_subp0_write + fb_subp1_write) * 32 / gputime
l1_cache_global_hit_rate Hit rate in L1 cache for global loads 100 * l1_global_ld_hit / (l1_global_ld_hit + l1_global_ld_miss)
l1_cache_local_hit_rate Hit rate in L1 cache for local loads and stores 100 * (l1_local_ld_hit + l1_local_st_hit)/(l1_local_ld_hit + l1_local_ld_miss + l1_local_st_hit + l1_local_st_miss)
tex_cache_hit_rate Texture cache hit rate 100 * (tex0_cache_sector_queries - tex0_cache_misses) / tex0_cache_sector_queries
tex_cache_throughput Texture cache throughput tex_cache_sector_queries * 32 / gputime
sm_efficiency_instance The ratio of the time at least one warp is active on a multiprocessor to the total time 100 * active_cycles / elapsed_clocks
ipc_instance Instructions executed per cycle inst_executed / elapsed_clocks
l2_l1_read_hit_rate Hitrate at L2 cache for read requests from L1 cache 100 * (l2_subp0_read_hit_sectors + l2_subp1_read_hit_sectors) / (l2_subp0_read_sector_queries + l2_subp1_read_sector_queries)
l2_tex_read_hit_rate Hitrate at L2 cache for read requests from texture cache 100 * (l2_subp0_read_tex_hit_sectors + l2_subp1_read_tex_hit_sectors) / (l2_subp0_read_tex_sector_queries + l2_subp1_read_tex_sector_queries)
l2_l1_read_throughput Memory read throughput at L2 cache for read requests from L1 cache (l2_subp0_read_sector_queries + l2_subp1_read_sector_queries) * 32 / gputime
l2_tex_read_throughput Memory read throughput at L2 cache for read requests from texture cache (l2_subp0_read_tex_sector_queries + l2_subp1_read_tex_sector_queries) * 32 / gputime
local_memory_overhead Ratio of local memory traffic to total memory traffic between L1 and L2 100 * (2 * l1_local_load_miss * 128) / ((l2_subp0_read_requests + l2_subp1_read_requests +l2_subp0_write_requests + l2_subp1_write_requests) * 32)

Metric Reference - Compute Capability 3.x

Devices with compute capability greater than or equal to 3.0 implement the metrics shown in the following table.

Table 3. Capability 3.x Metrics
Metric Name Description Formula
sm_efficiency The ratio of the time at least one warp is active on a multiprocessor to the total time 100 * (active_cycles / #SM) / elapsed_clocks
achieved_occupancy Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor 100 * (active_warps / active_cycles) / max_warps_per_sm
ipc Instructions executed per cycle (inst_executed / #SM) / elapsed_clocks
branch_efficiency Ratio of non-divergent branches to total branches 100 * (branch - divergent_branch) / branch
warp_execution_efficiency Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor (not available for compute capability 3.0) thread_inst_executed / (inst_executed * warp_size)
inst_replay_overhead Percentage of instruction issues due to memory replays 100 * ((inst_issued_1 + inst_issued_2 * 2) - inst_executed) / (inst_issued_1 + inst_issued_2 * 2)
shared_replay_overhead Percentage of instruction issues due to replays for shared memory conflicts 100 * (shared_load_bank_conflict + shared_store_bank_conflict) / (inst_issued_1 + inst_issued_2 * 2)
global_replay_overhead Percentage of instruction issues due to replays for non-coherent global memory accesses 100 * (global_ld_mem_divergence_replays + global_st_mem_divergence_replays) / (inst_issued_1 + inst_issued_2 * 2)
global_cache_replay_overhead Percentage of instruction issues due to replays for global memory cache misses 100 * global_load_miss / (inst_issued_1 + inst_issued_2 * 2)
local_replay_overhead Percentage of instruction issues due to replays for local memory cache misses 100 * (local_load_miss + local_store_miss) / (inst_issued_1 + inst_issued_2 * 2)
gld_efficiency Ratio of requested global memory load throughput to actual global memory load throughput 100 * gld_requested_throughput / gld_throughput
gst_efficiency Ratio of requested global memory store throughput to actual global memory store throughput 100 * gst_requested_throughput / gst_throughput
shared_efficiency Ratio of shared memory loads and stores executed to shared memory transactions required for those loads and stores 100 * (shared_load + shared_store) / (shared_ld_transactions + shared_st_transactions)
gld_throughput Global memory load throughput ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests + l2_subp2_read_requests + l2_subp3_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gst_throughput Global memory store throughput (l2_subp0_write_requests + l2_subp1_write_requests + l2_subp2_write_requests + l2_subp3_write_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gld_requested_throughput Requested global memory load throughput (gld_inst_8bit + 2 * gld_inst_16bit + 4 * gld_inst_32bit + 8 * gld_inst_64bit + 16 * gld_inst_128bit) / gputime
gst_requested_throughput Requested global memory store throughput (gst_inst_8bit + 2 * gst_inst_16bit + 4 * gst_inst_32bit + 8 * gst_inst_64bit + 16 * gst_inst_128bit) / gputime
nc_gld_requested_throughput Requested throughput for global memory loaded via non-coherent texture cache (not available for compute capability 3.0) (ldg_inst_8bit + 2 * ldg_inst_16bit + 4 * ldg_inst_32bit + 8 * ldg_inst_64bit + 16 * ldg_inst_128bit) / gputime
dram_read_throughput DRAM read throughput (fb_subp0_read + fb_subp1_read) * 32 / gputime
dram_write_throughput DRAM write throughput (fb_subp0_write + fb_subp1_write) * 32 / gputime
l1_cache_global_hit_rate Hit rate in L1 cache for global loads 100 * l1_global_ld_hit / (l1_global_ld_hit + l1_global_ld_miss)
l1_cache_local_hit_rate Hit rate in L1 cache for local loads and stores 100 * (l1_local_ld_hit + l1_local_st_hit) / (l1_local_ld_hit + l1_local_ld_miss + l1_local_st_hit + l1_local_st_miss)
tex_cache_hit_rate Texture cache hit rate 100 * (tex0_cache_sector_queries + tex1_cache_sector_queries + tex2_cache_sector_queries + tex3_cache_sector_queries - tex0_cache_misses - tex1_cache_misses - tex2_cache_misses - tex3_cache_misses) / (tex0_cache_sector_queries + tex1_cache_sector_queries + tex2_cache_sector_queries + tex3_cache_sector_queries)
tex_cache_throughput Texture cache throughput (tex0_cache_sector_queries + tex1_cache_sector_queries + tex2_cache_sector_queries + tex3_cache_sector_queries) * 32 / gputime
sm_efficiency_instance The ratio of the time at least one warp is active on a multiprocessor to the total time 100 * active_cycles / elapsed_clocks
ipc_instance Instructions executed per cycle inst_executed / elapsed_clocks
l2_l1_read_hit_rate Hitrate at L2 cache for read requests from L1 cache 100 * (l2_subp0_read_hit_sectors + l2_subp1_read_hit_sectors + l2_subp2_read_hit_sectors + l2_subp3_read_hit_sectors) / (l2_subp0_read_sector_queries + l2_subp1_read_sector_queries + l2_subp2_read_sector_queries + l2_subp3_read_sector_queries)
l2_tex_read_hit_rate Hitrate at L2 cache for read requests from texture cache 100 * (l2_subp0_read_tex_hit_sectors + l2_subp1_read_tex_hit_sectors + l2_subp2_read_tex_hit_sectors + l2_subp3_read_tex_hit_sectors) / (l2_subp0_read_tex_sector_queries + l2_subp1_read_tex_sector_queries + l2_subp2_read_tex_sector_queries + l2_subp3_read_tex_sector_queries)
l2_l1_read_throughput Memory read throughput at L2 cache for read requests from L1 cache (l2_subp0_read_sector_queries + l2_subp1_read_sector_queries + l2_subp2_read_sector_queries + l2_subp3_read_sector_queries) * 32 / gputime
l2_tex_read_throughput Memory read throughput at L2 cache for read requests from texture cache (l2_subp0_read_tex_sector_queries + l2_subp1_read_tex_sector_queries + l2_subp2_read_tex_sector_queries + l2_subp3_read_tex_sector_queries) * 32 / gputime
local_memory_overhead Ratio of local memory traffic to total memory traffic between L1 and L2 100 * (2 * l1_local_load_miss * 128) / ((l2_subp0_read_requests + l2_subp1_read_requests+l2_subp2_read_requests + l2_subp3_read_requests + l2_subp0_write_requests + l2_subp1_write_requests + l2_subp2_write_requests + l2_subp3_write_requests) * 32)

Samples

The CUPTI installation includes several samples that demonstrate the use of the CUPTI APIs.The samples are:

activity_trace
This sample shows how to collect a trace of CPU and GPU activity.
callback_event
This sample shows how to use both the callback and event APIs to record the events that occur during the execution of a simple kernel. The sample shows the required ordering for synchronization, and for event group enabling, disabling and reading.
callback_metric
This sample shows how to use both the callback and metric APIs to record the metric's events during the execution of a simple kernel, and then use those events to calculate the metric value.
callback_timestamp
This sample shows how to use the callback API to record a trace of API start and stop times.
cupti_query
This sample shows how to query CUDA-enabled devices for their event domains, events, and metrics.
event_sampling
This sample shows how to use the event API to sample events using a separate host thread.

Modules

CUPTI Version

Description

Function and macro to determine the CUPTI version.

Defines

#define CUPTI_API_VERSION 3
The API version for this implementation of CUPTI.

Functions

CUptiResult cuptiGetVersion ( uint32_t* version )
Get the CUPTI API version.

Defines

#define CUPTI_API_VERSION 3

The API version for this implementation of CUPTI. The API version for this implementation of CUPTI. This define along with cuptiGetVersion can be used to dynamically detect if the version of CUPTI compiled against matches the version of the loaded CUPTI library.

v1 : CUDAToolsSDK 4.0 v2 : CUDAToolsSDK 4.1 v3 : CUDA Toolkit 5.0

Functions

CUptiResult cuptiGetVersion ( uint32_t* version )

Get the CUPTI API version. Return the API version in *version.

See also:

CUPTI_API_VERSION

Parameters
version
Returns the version

CUPTI Result Codes

Description

Error and result codes returned by CUPTI functions.

Enumerations

enum CUptiResult
CUPTI result codes.

Functions

CUptiResult cuptiGetResultString ( CUptiResult result, const char** str )
Get the descriptive string for a CUptiResult.

Enumerations

enum CUptiResult

CUPTI result codes. Error and result codes returned by CUPTI functions.

Values
CUPTI_SUCCESS = 0
No error.
CUPTI_ERROR_INVALID_PARAMETER = 1
One or more of the parameters is invalid.
CUPTI_ERROR_INVALID_DEVICE = 2
The device does not correspond to a valid CUDA device.
CUPTI_ERROR_INVALID_CONTEXT = 3
The context is NULL or not valid.
CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID = 4
The event domain id is invalid.
CUPTI_ERROR_INVALID_EVENT_ID = 5
The event id is invalid.
CUPTI_ERROR_INVALID_EVENT_NAME = 6
The event name is invalid.
CUPTI_ERROR_INVALID_OPERATION = 7
The current operation cannot be performed due to dependency on other factors.
CUPTI_ERROR_OUT_OF_MEMORY = 8
Unable to allocate enough memory to perform the requested operation.
CUPTI_ERROR_HARDWARE = 9
The performance monitoring hardware could not be reserved or some other hardware error occurred.
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT = 10
The output buffer size is not sufficient to return all requested data.
CUPTI_ERROR_API_NOT_IMPLEMENTED = 11
API is not implemented.
CUPTI_ERROR_MAX_LIMIT_REACHED = 12
The maximum limit is reached.
CUPTI_ERROR_NOT_READY = 13
The object is not yet ready to perform the requested operation.
CUPTI_ERROR_NOT_COMPATIBLE = 14
The current operation is not compatible with the current state of the object
CUPTI_ERROR_NOT_INITIALIZED = 15
CUPTI is unable to initialize its connection to the CUDA driver.
CUPTI_ERROR_INVALID_METRIC_ID = 16
The metric id is invalid.
CUPTI_ERROR_INVALID_METRIC_NAME = 17
The metric name is invalid.
CUPTI_ERROR_QUEUE_EMPTY = 18
The queue is empty.
CUPTI_ERROR_INVALID_HANDLE = 19
Invalid handle (internal?).
CUPTI_ERROR_INVALID_STREAM = 20
Invalid stream.
CUPTI_ERROR_INVALID_KIND = 21
Invalid kind.
CUPTI_ERROR_INVALID_EVENT_VALUE = 22
Invalid event value.
CUPTI_ERROR_DISABLED = 23
CUPTI is disabled due to conflicts with other enabled profilers
CUPTI_ERROR_INVALID_MODULE = 24
Invalid module.
CUPTI_ERROR_UNKNOWN = 999
An unknown internal error has occurred.
CUPTI_ERROR_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiGetResultString ( CUptiResult result, const char** str )

Get the descriptive string for a CUptiResult. Return the descriptive string for a CUptiResult in *str.

Note:

Thread-safety: this function is thread safe.

Parameters
result
The result to get the string for
str
Returns the string

CUPTI Activity API

Description

Functions, types, and enums that implement the CUPTI Activity API.

Classes

struct 
The base activity record.
struct 
The activity record for a driver or runtime API invocation.
struct 
The activity record for source level result branch.
struct 
The activity record for a context.
struct 
The activity record for a device.
struct 
The activity record for a CUPTI event.
struct 
The activity record for source-level global access.
struct 
The activity record for kernel.
struct 
The activity record providing a marker which is an instantaneous point in time.
struct 
The activity record providing detailed information for a marker.
struct 
The activity record for memory copies.
struct 
The activity record for memset.
struct 
The activity record for a CUPTI metric.
struct 
The activity record providing a name.
union 
Identifiers for object kinds as specified by CUpti_ActivityObjectKind.
struct 
The activity record for CUPTI and driver overheads.
struct 
The activity record for source locator.

Defines

#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0

Enumerations

enum CUpti_ActivityComputeApiKind
The kind of a compute API.
enum CUpti_ActivityFlag
Flags associated with activity records.
enum CUpti_ActivityKind
The kinds of activity records.
enum CUpti_ActivityMemcpyKind
The kind of a memory copy, indicating the source and destination targets of the copy.
enum CUpti_ActivityMemoryKind
The kinds of memory accessed by a memory copy.
enum CUpti_ActivityObjectKind
The kinds of activity objects.
enum CUpti_ActivityOverheadKind
The kinds of activity overhead.

Functions

CUptiResult cuptiActivityDequeueBuffer ( CUcontext context, uint32_t streamId, uint8_t** buffer, size_t* validBufferSizeBytes )
Dequeue a buffer containing activity records.
CUptiResult cuptiActivityDisable ( CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record.
CUptiResult cuptiActivityDisableContext ( CUcontext context, CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnable ( CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record.
CUptiResult cuptiActivityEnableContext ( CUcontext context, CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnqueueBuffer ( CUcontext context, uint32_t streamId, uint8_t* buffer, size_t bufferSizeBytes )
Queue a buffer for activity record collection.
CUptiResult cuptiActivityGetNextRecord ( uint8_t* buffer, size_t validBufferSizeBytes, CUpti_Activity** record )
Iterate over the activity records in a buffer.
CUptiResult cuptiActivityGetNumDroppedRecords ( CUcontext context, uint32_t streamId, size_t* dropped )
Get the number of activity records that were dropped from a queue because of insufficient buffer space.
CUptiResult cuptiActivityQueryBuffer ( CUcontext context, uint32_t streamId, size_t* validBufferSizeBytes )
Query the status of the buffer at the head of a queue.
CUptiResult cuptiGetDeviceId ( CUcontext context, uint32_t* deviceId )
Get the ID of a device.
CUptiResult cuptiGetStreamId ( CUcontext context, CUstream stream, uint32_t* streamId )
Get the ID of a stream.
CUptiResult cuptiGetTimestamp ( uint64_t* timestamp )
Get the CUPTI timestamp.

Defines

#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0

The source-locator ID that indicates an unknown source location. There is not an actual CUpti_ActivitySourceLocator object corresponding to this value.

Enumerations

enum CUpti_ActivityComputeApiKind

The kind of a compute API.

Values
CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN = 0
The compute API is not known.
CUPTI_ACTIVITY_COMPUTE_API_CUDA = 1
The compute APIs are for CUDA.
CUPTI_ACTIVITY_COMPUTE_API_FORCE_INT = 0x7fffffff
enum CUpti_ActivityFlag

Flags associated with activity records. Activity record flags. Flags can be combined by bitwise OR to associated multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.

Values
CUPTI_ACTIVITY_FLAG_NONE = 0
Indicates the activity record has no flags.
CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS = 1<<0
Indicates the activity represents a device that supports concurrent kernel execution. Valid for CUPTI_ACTIVITY_KIND_DEVICE.
CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC = 1<<0
Indicates the activity represents an asychronous memcpy operation. Valid for CUPTI_ACTIVITY_KIND_MEMCPY.
CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS = 1<<0
Indicates the activity represents an instantaneous marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_START = 1<<1
Indicates the activity represents a region start marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_END = 1<<2
Indicates the activity represents a region end marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE = 1<<0
Indicates the activity represents a marker that does not specify a color. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB = 1<<1
Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF<<0
The number of bytes requested by each thread Valid for CUpti_ActivityGlobalAccess.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD = 1<<8
If bit in this flag is set, the access was load, else it is a store access. Valid for CUpti_ActivityGlobalAccess.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED = 1<<9
If this bit in flag is set, the load access was cached else it is uncached. Valid for CUpti_ActivityGlobalAccess.
CUPTI_ACTIVITY_FLAG_FORCE_INT = 0x7fffffff
enum CUpti_ActivityKind

The kinds of activity records. Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with a activity record structure that holds the information associated with the kind.

See also:

CUpti_Activity

CUpti_ActivityAPI

CUpti_ActivityContext

CUpti_ActivityDevice

CUpti_ActivityEvent

CUpti_ActivityKernel

CUpti_ActivityMemcpy

CUpti_ActivityMemset

CUpti_ActivityMetric

CUpti_ActivityName

CUpti_ActivityMarker

CUpti_ActivityMarkerData

CUpti_ActivitySourceLocator

CUpti_ActivityGlobalAccess

CUpti_ActivityBranch

CUpti_ActivityOverhead

Values
CUPTI_ACTIVITY_KIND_INVALID = 0
The activity record is invalid.
CUPTI_ACTIVITY_KIND_MEMCPY = 1
A host<->host, host<->device, or device<->device memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy.
CUPTI_ACTIVITY_KIND_MEMSET = 2
A memory set executing on the GPU. The corresponding activity record structure is CUpti_ActivityMemset.
CUPTI_ACTIVITY_KIND_KERNEL = 3
A kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityKernel.
CUPTI_ACTIVITY_KIND_DRIVER = 4
A CUDA driver API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_RUNTIME = 5
A CUDA runtime API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_EVENT = 6
An event value. The corresponding activity record structure is CUpti_ActivityEvent.
CUPTI_ACTIVITY_KIND_METRIC = 7
A metric value. The corresponding activity record structure is CUpti_ActivityMetric.
CUPTI_ACTIVITY_KIND_DEVICE = 8
Information about a device. The corresponding activity record structure is CUpti_ActivityDevice.
CUPTI_ACTIVITY_KIND_CONTEXT = 9
Information about a context. The corresponding activity record structure is CUpti_ActivityContext.
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL = 10
A (potentially concurrent) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityKernel.
CUPTI_ACTIVITY_KIND_NAME = 11
Thread, device, context, etc. name. The corresponding activity record structure is CUpti_ActivityName.
CUPTI_ACTIVITY_KIND_MARKER = 12
Instantaneous, start, or end marker.
CUPTI_ACTIVITY_KIND_MARKER_DATA = 13
Extended, optional, data about a marker.
CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR = 14
Source information about source level result. The corresponding activity record structure is CUpti_ActivitySourceLocator.
CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS = 15
Results for source-level global acccess. The corresponding activity record structure is CUpti_ActivityGlobalAccess.
CUPTI_ACTIVITY_KIND_BRANCH = 16
Results for source-level branch. The corresponding activity record structure is CUpti_ActivityBranch.
CUPTI_ACTIVITY_KIND_OVERHEAD = 17
Overhead activity records. The corresponding activity record structure is CUpti_ActivityOverhead.
CUPTI_ACTIVITY_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityMemcpyKind

The kind of a memory copy, indicating the source and destination targets of the copy. Each kind represents the source and destination targets of a memory copy. Targets are host, device, and array.

Values
CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN = 0
The memory copy kind is not known.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOD = 1
A host to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOH = 2
A device to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOA = 3
A host to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOH = 4
A device array to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOA = 5
A device array to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOD = 6
A device array to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOA = 7
A device to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOD = 8
A device to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOH = 9
A host to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityMemoryKind

The kinds of memory accessed by a memory copy. Each kind represents the type of the source or destination memory accessed by a memory copy.

Values
CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN = 0
The source or destination memory kind is unknown.
CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE = 1
The source or destination memory is pageable.
CUPTI_ACTIVITY_MEMORY_KIND_PINNED = 2
The source or destination memory is pinned.
CUPTI_ACTIVITY_MEMORY_KIND_DEVICE = 3
The source or destination memory is on the device.
CUPTI_ACTIVITY_MEMORY_KIND_ARRAY = 4
The source or destination memory is an array.
CUPTI_ACTIVITY_MEMORY_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityObjectKind

The kinds of activity objects.

See also:

CUpti_ActivityObjectKindId

Values
CUPTI_ACTIVITY_OBJECT_UNKNOWN = 0
The object kind is not known.
CUPTI_ACTIVITY_OBJECT_PROCESS = 1
A process.
CUPTI_ACTIVITY_OBJECT_THREAD = 2
A thread.
CUPTI_ACTIVITY_OBJECT_DEVICE = 3
A device.
CUPTI_ACTIVITY_OBJECT_CONTEXT = 4
A context.
CUPTI_ACTIVITY_OBJECT_STREAM = 5
A stream.
CUPTI_ACTIVITY_OBJECT_FORCE_INT = 0x7fffffff
enum CUpti_ActivityOverheadKind

The kinds of activity overhead.

Values
CUPTI_ACTIVITY_OVERHEAD_UNKNOWN = 0
The overhead kind is not known.
CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER = 1
Compiler(JIT) overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH = 1<<16
Activity buffer flush overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION = 2<<16
CUPTI instrumentation overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE = 3<<16
CUPTI resource creation and destruction overhead.
CUPTI_ACTIVITY_OVERHEAD_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiActivityDequeueBuffer ( CUcontext context, uint32_t streamId, uint8_t** buffer, size_t* validBufferSizeBytes )

Dequeue a buffer containing activity records. Remove the buffer from the head of the specified queue. See cuptiActivityEnqueueBuffer() for description of queues. Calling this function transfers ownership of the buffer from CUPTI. CUPTI will no add any activity records to the buffer after it is dequeued.

Parameters
context
The context, or NULL to dequeue from the global queue
streamId
The stream ID
buffer
Returns the dequeued buffer
validBufferSizeBytes
Returns the number of bytes in the buffer that contain activity records
CUptiResult cuptiActivityDisable ( CUpti_ActivityKind kind )

Disable collection of a specific kind of activity record. Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters
kind
The kind of activity record to stop collecting
CUptiResult cuptiActivityDisableContext ( CUcontext context, CUpti_ActivityKind kind )

Disable collection of a specific kind of activity record for a context. Disable collection of a specific kind of activity record for a context. This setting done by this API will supercede the global settings for activity records. Multiple kinds can be enabled by calling this function multiple times.

Parameters
context
The context for which activity is to be disabled
kind
The kind of activity record to stop collecting
CUptiResult cuptiActivityEnable ( CUpti_ActivityKind kind )

Enable collection of a specific kind of activity record. Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters
kind
The kind of activity record to collect
CUptiResult cuptiActivityEnableContext ( CUcontext context, CUpti_ActivityKind kind )

Enable collection of a specific kind of activity record for a context. Enable collection of a specific kind of activity record for a context. This setting done by this API will supercede the global settings for activity records enabled by cuptiActivityEnable Multiple kinds can be enabled by calling this function multiple times.

Parameters
context
The context for which activity is to be enabled
kind
The kind of activity record to collect
CUptiResult cuptiActivityEnqueueBuffer ( CUcontext context, uint32_t streamId, uint8_t* buffer, size_t bufferSizeBytes )

Queue a buffer for activity record collection. Queue a buffer for activity record collection. Calling this function transfers ownership of the buffer to CUPTI. The buffer should not be accessed or modified until ownership is regained by calling cuptiActivityDequeueBuffer().

There are three types of queues:

Global Queue: The global queue collects all activity records that are not associated with a valid context. All device and API activity records are collected in the global queue. A buffer is enqueued in the global queue by specifying context == NULL.

Context Queue: Each context queue collects activity records associated with that context that are not associated with a specific stream or that are associated with the default stream. A buffer is enqueued in a context queue by specifying the context and a streamId of 0.

Stream Queue: Each stream queue collects memcpy, memset, and kernel activity records associated with the stream. A buffer is enqueued in a stream queue by specifying a context and a non-zero stream ID.

Multiple buffers can be enqueued on each queue, and buffers can be enqueue on multiple queues.

When a new activity record needs to be recorded, CUPTI searches for a non-empty queue to hold the record in this order: 1) the appropriate stream queue, 2) the appropriate context queue. If the search does not find any queue with a buffer then the activity record is dropped. If the search finds a queue containing a buffer, but that buffer is full, then the activity record is dropped and the dropped record count for the queue is incremented. If the search finds a queue containing a buffer with space available to hold the record, then the record is recorded in the buffer.

At a minimum, one or more buffers must be queued in the global queue and context queue at all times to avoid dropping activity records. Global queue will not store any activity records for gpu activity(kernel, memcpy, memset). It is also necessary to enqueue at least one buffer in the context queue of each context as it is created. The stream queues are optional and can be used to reduce or eliminate application perturbations caused by the need to process or save the activity records returned in the buffers. For example, if a stream queue is used, that queue can be flushed when the stream is synchronized.

Parameters
context
The context, or NULL to enqueue on the global queue
streamId
The stream ID
buffer
The pointer to user supplied buffer for storing activity records.The buffer must be at least 8 byte aligned, and the size of the buffer must be at least 1024 bytes.
bufferSizeBytes
The size of the buffer, in bytes. The size of the buffer must be at least 1024 bytes.
CUptiResult cuptiActivityGetNextRecord ( uint8_t* buffer, size_t validBufferSizeBytes, CUpti_Activity** record )

Iterate over the activity records in a buffer. This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by using the cuptiActivityDequeueBuffer() function.

An example of typical usage:

CUpti_Activity *record = NULL;
 CUptiResult status = CUPTI_SUCCESS;
   do {
      status = cuptiActivityGetNextRecord(buffer, validSize, &record);
      if(status == CUPTI_SUCCESS) {
           // Use record here...
      }
      else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED)
          break;
      else {
          goto Error;
      }
    } while (1);

Parameters
buffer
The buffer containing activity records
validBufferSizeBytes
The number of valid bytes in the buffer.
record
Inputs the previous record returned by cuptiActivityGetNextRecord and returns the next activity record from the buffer. If input value if NULL, returns the first activity record in the buffer.
CUptiResult cuptiActivityGetNumDroppedRecords ( CUcontext context, uint32_t streamId, size_t* dropped )

Get the number of activity records that were dropped from a queue because of insufficient buffer space. Get the number of records that were dropped from a queue because all the buffers in the queue are full. See cuptiActivityEnqueueBuffer() for description of queues. Calling this function does not transfer ownership of the buffer. The dropped count maintained for the queue is reset to zero when this function is called.

Parameters
context
The context, or NULL to get dropped count from global queue
streamId
The stream ID
dropped
The number of records that were dropped since the last call to this function.
CUptiResult cuptiActivityQueryBuffer ( CUcontext context, uint32_t streamId, size_t* validBufferSizeBytes )

Query the status of the buffer at the head of a queue. Query the status of buffer at the head in the queue. See cuptiActivityEnqueueBuffer() for description of queues. Calling this function does not transfer ownership of the buffer.

Parameters
context
The context, or NULL to query the global queue
streamId
The stream ID
validBufferSizeBytes
Returns the number of bytes in the buffer that contain activity records
CUptiResult cuptiGetDeviceId ( CUcontext context, uint32_t* deviceId )

Get the ID of a device. If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device which contains that context. Operates in a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.

Parameters
context
The context, or NULL to indicate the current context.
deviceId
Returns the ID of the device that is current for the calling thread.
CUptiResult cuptiGetStreamId ( CUcontext context, CUstream stream, uint32_t* streamId )

Get the ID of a stream. Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

See also:

cuptiActivityEnqueueBuffer

cuptiActivityDequeueBuffer

Parameters
context
If non-NULL then the stream is checked to ensure that it belongs to this context. Typically this parameter should be null.
stream
The stream
streamId
Returns a context-unique ID for the stream
CUptiResult cuptiGetTimestamp ( uint64_t* timestamp )

Get the CUPTI timestamp. Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.

Parameters
timestamp
Returns the CUPTI timestamp

CUPTI Callback API

Description

Functions, types, and enums that implement the CUPTI Callback API.

Classes

struct 
Data passed into a runtime or driver API callback function.
struct 
Data passed into a NVTX callback function.
struct 
Data passed into a resource callback function.
struct 
Data passed into a synchronize callback function.

Typedefs

typedef void  ( *CUpti_CallbackFunc )( void* userdata,  CUpti_CallbackDomain domain,  CUpti_CallbackId cbid, const void* cbdata )
Function type for a callback.
typedef uint32_t  CUpti_CallbackId
An ID for a driver API, runtime API, resource or synchronization callback.
typedef CUpti_CallbackDomain* CUpti_DomainTable
Pointer to an array of callback domains.
typedef CUpti_Subscriber_st *  CUpti_SubscriberHandle
A callback subscriber.

Enumerations

enum CUpti_ApiCallbackSite
Specifies the point in an API call that a callback is issued.
enum CUpti_CallbackDomain
Callback domains.
enum CUpti_CallbackIdResource
Callback IDs for resource domain.
enum CUpti_CallbackIdSync
Callback IDs for synchronization domain.

Functions

CUptiResult cuptiEnableAllDomains ( uint32_t enable, CUpti_SubscriberHandle subscriber )
Enable or disable all callbacks in all domains.
CUptiResult cuptiEnableCallback ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Enable or disabled callbacks for a specific domain and callback ID.
CUptiResult cuptiEnableDomain ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain )
Enable or disabled all callbacks for a specific domain.
CUptiResult cuptiGetCallbackName ( CUpti_CallbackDomain domain, uint32_t cbid, const char** name )
Get the name of a callback for a specific domain and callback ID.
CUptiResult cuptiGetCallbackState ( uint32_t* enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Get the current enabled/disabled state of a callback for a specific domain and function ID.
CUptiResult cuptiSubscribe ( CUpti_SubscriberHandle* subscriber, CUpti_CallbackFunc callback, void* userdata )
Initialize a callback subscriber with a callback function and user data.
CUptiResult cuptiSupportedDomains ( size_t* domainCount, CUpti_DomainTable* domainTable )
Get the available callback domains.
CUptiResult cuptiUnsubscribe ( CUpti_SubscriberHandle subscriber )
Unregister a callback subscriber.

Typedefs

void ( *CUpti_CallbackFunc )( void* userdata,  CUpti_CallbackDomain domain,  CUpti_CallbackId cbid, const void* cbdata )

Function type for a callback. Function type for a callback. The type of the data passed to the callback in cbdata depends on the domain. If domain is CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API the type of cbdata will be CUpti_CallbackData. If domain is CUPTI_CB_DOMAIN_RESOURCE the type of cbdata will be CUpti_ResourceData. If domain is CUPTI_CB_DOMAIN_SYNCHRONIZE the type of cbdata will be CUpti_SynchronizeData. If domain is CUPTI_CB_DOMAIN_NVTX the type of cbdata will be CUpti_NvtxData.

Parameters
userdata
User data supplied at subscription of the callback
CUpti_CallbackDomain domain
CUpti_CallbackId cbid
cbdata
Data passed to the callback.
typedef uint32_t CUpti_CallbackId

An ID for a driver API, runtime API, resource or synchronization callback. An ID for a driver API, runtime API, resource or synchronization callback. Within a driver API callback this should be interpreted as a CUpti_driver_api_trace_cbid value (these values are defined in cupti_driver_cbid.h). Within a runtime API callback this should be interpreted as a CUpti_runtime_api_trace_cbid value (these values are defined in cupti_runtime_cbid.h). Within a resource API callback this should be interpreted as a CUpti_CallbackIdResource value. Within a synchronize API callback this should be interpreted as a CUpti_CallbackIdSync value.

typedef CUpti_CallbackDomain* CUpti_DomainTable

Pointer to an array of callback domains.

typedef CUpti_Subscriber_st * CUpti_SubscriberHandle

A callback subscriber.

Enumerations

enum CUpti_ApiCallbackSite

Specifies the point in an API call that a callback is issued. Specifies the point in an API call that a callback is issued. This value is communicated to the callback function via CUpti_CallbackData::callbackSite.

Values
CUPTI_API_ENTER = 0
The callback is at the entry of the API call.
CUPTI_API_EXIT = 1
The callback is at the exit of the API call.
CUPTI_API_CBSITE_FORCE_INT = 0x7fffffff
enum CUpti_CallbackDomain

Callback domains. Callback domains. Each domain represents callback points for a group of related API functions or CUDA driver activity.

Values
CUPTI_CB_DOMAIN_INVALID = 0
Invalid domain.
CUPTI_CB_DOMAIN_DRIVER_API = 1
Domain containing callback points for all driver API functions.
CUPTI_CB_DOMAIN_RUNTIME_API = 2
Domain containing callback points for all runtime API functions.
CUPTI_CB_DOMAIN_RESOURCE = 3
Domain containing callback points for CUDA resource tracking.
CUPTI_CB_DOMAIN_SYNCHRONIZE = 4
Domain containing callback points for CUDA synchronization.
CUPTI_CB_DOMAIN_NVTX = 5
Domain containing callback points for NVTX API functions.
CUPTI_CB_DOMAIN_SIZE = 6
CUPTI_CB_DOMAIN_FORCE_INT = 0x7fffffff
enum CUpti_CallbackIdResource

Callback IDs for resource domain. Callback IDs for resource domain, CUPTI_CB_DOMAIN_RESOURCE. This value is communicated to the callback function via the cbid parameter.

Values
CUPTI_CBID_RESOURCE_INVALID = 0
Invalid resource callback ID.
CUPTI_CBID_RESOURCE_CONTEXT_CREATED = 1
A new context has been created.
CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING = 2
A context is about to be destroyed.
CUPTI_CBID_RESOURCE_STREAM_CREATED = 3
A new stream has been created.
CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING = 4
A stream is about to be destroyed.
CUPTI_CBID_RESOURCE_SIZE
CUPTI_CBID_RESOURCE_FORCE_INT = 0x7fffffff
enum CUpti_CallbackIdSync

Callback IDs for synchronization domain. Callback IDs for synchronization domain, CUPTI_CB_DOMAIN_SYNCHRONIZE. This value is communicated to the callback function via the cbid parameter.

Values
CUPTI_CBID_SYNCHRONIZE_INVALID = 0
Invalid synchronize callback ID.
CUPTI_CBID_SYNCHRONIZE_STREAM_SYNCHRONIZED = 1
Stream synchronization has completed for the stream.
CUPTI_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED = 2
Context synchronization has completed for the context.
CUPTI_CBID_SYNCHRONIZE_SIZE
CUPTI_CBID_SYNCHRONIZE_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiEnableAllDomains ( uint32_t enable, CUpti_SubscriberHandle subscriber )

Enable or disable all callbacks in all domains. Enable or disable all callbacks in all domains.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, *) and cuptiEnableAllDomains(sub) are called concurrently, the results are undefined.

Parameters
enable
New enable state for all callbacks in all domain. Zero disables all callbacks, non-zero enables all callbacks.
subscriber
- Handle to callback subscription
CUptiResult cuptiEnableCallback ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )

Enable or disabled callbacks for a specific domain and callback ID. Enable or disabled callbacks for a subscriber for a specific domain and callback ID.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

Parameters
enable
New enable state for the callback. Zero disables the callback, non-zero enables the callback.
subscriber
- Handle to callback subscription
domain
The domain of the callback
cbid
The ID of the callback
CUptiResult cuptiEnableDomain ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain )

Enable or disabled all callbacks for a specific domain. Enable or disabled all callbacks for a specific domain.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackEnabled(sub, d, *) and cuptiEnableDomain(sub, d) are called concurrently, the results are undefined.

Parameters
enable
New enable state for all callbacks in the domain. Zero disables all callbacks, non-zero enables all callbacks.
subscriber
- Handle to callback subscription
domain
The domain of the callback
CUptiResult cuptiGetCallbackName ( CUpti_CallbackDomain domain, uint32_t cbid, const char** name )

Get the name of a callback for a specific domain and callback ID. Returns a pointer to the name c_string in **name.

Note:

Names are available only for the DRIVER and RUNTIME domains.

Parameters
domain
The domain of the callback
cbid
The ID of the callback
name
Returns pointer to the name string on success, NULL otherwise
CUptiResult cuptiGetCallbackState ( uint32_t* enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )

Get the current enabled/disabled state of a callback for a specific domain and function ID. Returns non-zero in *enable if the callback for a domain and callback ID is enabled, and zero if not enabled.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

Parameters
enable
Returns non-zero if callback enabled, zero if not enabled
subscriber
Handle to the initialize subscriber
domain
The domain of the callback
cbid
The ID of the callback
CUptiResult cuptiSubscribe ( CUpti_SubscriberHandle* subscriber, CUpti_CallbackFunc callback, void* userdata )

Initialize a callback subscriber with a callback function and user data. Initializes a callback subscriber with a callback function and (optionally) a pointer to user data. The returned subscriber handle can be used to enable and disable the callback for specific domains and callback IDs.

Note:
  • Only a single subscriber can be registered at a time.

  • This function does not enable any callbacks.

  • Thread-safety: this function is thread safe.

Parameters
subscriber
Returns handle to initialize subscriber
callback
The callback function
userdata
A pointer to user data. This data will be passed to the callback function via the userdata paramater.
CUptiResult cuptiSupportedDomains ( size_t* domainCount, CUpti_DomainTable* domainTable )

Get the available callback domains. Returns in *domainTable an array of size *domainCount of all the available callback domains.

Note:

Thread-safety: this function is thread safe.

Parameters
domainCount
Returns number of callback domains
domainTable
Returns pointer to array of available callback domains
CUptiResult cuptiUnsubscribe ( CUpti_SubscriberHandle subscriber )

Unregister a callback subscriber. Removes a callback subscriber so that no future callbacks will be issued to that subscriber.

Note:

Thread-safety: this function is thread safe.

Parameters
subscriber
Handle to the initialize subscriber

CUPTI Event API

Description

Functions, types, and enums that implement the CUPTI Event API.

Classes

struct 
A set of event groups.
struct 
A set of event group sets.

Defines

#define CUPTI_EVENT_OVERFLOW
The overflow value for a CUPTI event.

Typedefs

typedef uint32_t  CUpti_EventDomainID
ID for an event domain.
typedef void *  CUpti_EventGroup
A group of events.
typedef uint32_t  CUpti_EventID
ID for an event.

Enumerations

enum CUpti_DeviceAttribute
Device attributes.
enum CUpti_EventAttribute
Event attributes.
enum CUpti_EventCategory
An event category.
enum CUpti_EventCollectionMode
Event collection modes.
enum CUpti_EventDomainAttribute
Event domain attributes.
enum CUpti_EventGroupAttribute
Event group attributes.
enum CUpti_ReadEventFlags
Flags for cuptiEventGroupReadEvent an cuptiEventGroupReadAllEvents.

Functions

CUptiResult cuptiDeviceEnumEventDomains ( CUdevice device, size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains for a device.
CUptiResult cuptiDeviceGetAttribute ( CUdevice device, CUpti_DeviceAttribute attrib, size_t* valueSize, void* value )
Read a device attribute.
CUptiResult cuptiDeviceGetEventDomainAttribute ( CUdevice device, CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
CUptiResult cuptiDeviceGetNumEventDomains ( CUdevice device, uint32_t* numDomains )
Get the number of domains for a device.
CUptiResult cuptiDeviceGetTimestamp ( CUcontext context, uint64_t* timestamp )
Read a device timestamp.
CUptiResult cuptiEnumEventDomains ( size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains available on any device.
CUptiResult cuptiEventDomainEnumEvents ( CUpti_EventDomainID eventDomain, size_t* arraySizeBytes, CUpti_EventID* eventArray )
Get the events in a domain.
CUptiResult cuptiEventDomainGetAttribute ( CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
CUptiResult cuptiEventDomainGetNumEvents ( CUpti_EventDomainID eventDomain, uint32_t* numEvents )
Get number of events in a domain.
CUptiResult cuptiEventGetAttribute ( CUpti_EventID event, CUpti_EventAttribute attrib, size_t* valueSize, void* value )
Get an event attribute.
CUptiResult cuptiEventGetIdFromName ( CUdevice device, const char* eventName, CUpti_EventID* event )
Find an event by name.
CUptiResult cuptiEventGroupAddEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Add an event to an event group.
CUptiResult cuptiEventGroupCreate ( CUcontext context, CUpti_EventGroup* eventGroup, uint32_t flags )
Create a new event group for a context.
CUptiResult cuptiEventGroupDestroy ( CUpti_EventGroup eventGroup )
Destroy an event group.
CUptiResult cuptiEventGroupDisable ( CUpti_EventGroup eventGroup )
Disable an event group.
CUptiResult cuptiEventGroupEnable ( CUpti_EventGroup eventGroup )
Enable an event group.
CUptiResult cuptiEventGroupGetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t* valueSize, void* value )
Read an event group attribute.
CUptiResult cuptiEventGroupReadAllEvents ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t* numEventIdsRead )
Read the values for all the events in an event group.
CUptiResult cuptiEventGroupReadEvent ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, CUpti_EventID event, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer )
Read the value for an event in an event group.
CUptiResult cuptiEventGroupRemoveAllEvents ( CUpti_EventGroup eventGroup )
Remove all events from an event group.
CUptiResult cuptiEventGroupRemoveEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Remove an event from an event group.
CUptiResult cuptiEventGroupResetAllEvents ( CUpti_EventGroup eventGroup )
Zero all the event counts in an event group.
CUptiResult cuptiEventGroupSetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t valueSize, void* value )
Write an event group attribute.
CUptiResult cuptiEventGroupSetsCreate ( CUcontext context, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of events, get the grouping that indicates the number of passes and the event groups necessary to collect the events.
CUptiResult cuptiEventGroupSetsDestroy ( CUpti_EventGroupSets* eventGroupSets )
Destroy a CUpti_EventGroupSets object.
CUptiResult cuptiGetNumEventDomains ( uint32_t* numDomains )
Get the number of event domains available on any device.
CUptiResult cuptiSetEventCollectionMode ( CUcontext context, CUpti_EventCollectionMode mode )
Set the event collection mode.

Defines

#define CUPTI_EVENT_OVERFLOW

The overflow value for a CUPTI event. The CUPTI event value that indicates an overflow.

Value

((uint64_t)0xFFFFFFFFFFFFFFFFULL)

Typedefs

typedef uint32_t CUpti_EventDomainID

ID for an event domain. ID for an event domain. An event domain represents a group of related events. A device may have multiple instances of a domain, indicating that the device can simultaneously record multiple instances of each event within that domain.

typedef void * CUpti_EventGroup

A group of events. An event group is a collection of events that are managed together. All events in an event group must belong to the same domain.

typedef uint32_t CUpti_EventID

ID for an event. An event represents a countable activity, action, or occurrence on the device.

Enumerations

enum CUpti_DeviceAttribute

Device attributes. CUPTI device attributes. These attributes can be read using cuptiDeviceGetAttribute.

Values
CUPTI_DEVICE_ATTR_MAX_EVENT_ID = 1
Number of event IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_MAX_EVENT_DOMAIN_ID = 2
Number of event domain IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_GLOBAL_MEMORY_BANDWIDTH = 3
Get global memory bandwidth in Kbytes/sec. Value is a uint64_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_PER_CYCLE = 4
Get theoretical instructions per cycle. Value is a uint32_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISION = 5
Get theoretical number of single precision instructions that can be executed per second. Value is a uint64_t.
CUPTI_DEVICE_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventAttribute

Event attributes. Event attributes. These attributes can be read using cuptiEventGetAttribute.

Values
CUPTI_EVENT_ATTR_NAME = 0
Event name. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_SHORT_DESCRIPTION = 1
Short description of event. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_LONG_DESCRIPTION = 2
Long description of event. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_CATEGORY = 3
Category of event. Value is CUpti_EventCategory.
CUPTI_EVENT_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventCategory

An event category. Each event is assigned to a category that represents the general type of the event. A event's category is accessed using cuptiEventGetAttribute and the CUPTI_EVENT_ATTR_CATEGORY attribute.

Values
CUPTI_EVENT_CATEGORY_INSTRUCTION = 0
An instruction related event.
CUPTI_EVENT_CATEGORY_MEMORY = 1
A memory related event.
CUPTI_EVENT_CATEGORY_CACHE = 2
A cache related event.
CUPTI_EVENT_CATEGORY_PROFILE_TRIGGER = 3
A profile-trigger event.
CUPTI_EVENT_CATEGORY_FORCE_INT = 0x7fffffff
enum CUpti_EventCollectionMode

Event collection modes. The event collection mode determines the period over which the events within the enabled event groups will be collected.

Values
CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS = 0
Events are collected for the entire duration between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. This is the default mode.
CUPTI_EVENT_COLLECTION_MODE_KERNEL = 1
Events are collected only for the durations of kernel executions that occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. Event collection begins when a kernel execution begins, and stops when kernel execution completes. If multiple kernel executions occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls then the event values must be read after each kernel launch if those events need to be associated with the specific kernel launch.
CUPTI_EVENT_COLLECTION_MODE_FORCE_INT = 0x7fffffff
enum CUpti_EventDomainAttribute

Event domain attributes. Event domain attributes. Except where noted, all the attributes can be read using either cuptiDeviceGetEventDomainAttribute or cuptiEventDomainGetAttribute.

Values
CUPTI_EVENT_DOMAIN_ATTR_NAME = 0
Event domain name. Value is a null terminated const c-string.
CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT = 1
Number of instances of the domain for which event counts will be collected. The domain may have additional instances that cannot be profiled (see CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT). Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.
CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT = 3
Total number of instances of the domain, including instances that cannot be profiled. Use CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT to get the number of instances that can be profiled. Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.
CUPTI_EVENT_DOMAIN_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventGroupAttribute

Event group attributes. Event group attributes. These attributes can be read using cuptiEventGroupGetAttribute. Attributes marked [rw] can also be written using cuptiEventGroupSetAttribute.

Values
CUPTI_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID = 0
The domain to which the event group is bound. This attribute is set when the first event is added to the group. Value is a CUpti_EventDomainID.
CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES = 1
[rw] Profile all the instances of the domain for this eventgroup. This feature can be used to get load balancing across all instances of a domain. Value is an integer.
CUPTI_EVENT_GROUP_ATTR_USER_DATA = 2
[rw] Reserved for user data.
CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS = 3
Number of events in the group. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_EVENTS = 4
Enumerates events in the group. Value is a pointer to buffer of size sizeof(CUpti_EventID) * num_of_events in the eventgroup. num_of_events can be queried using CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS.
CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT = 5
Number of instances of the domain bound to this event group that will be counted. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_ReadEventFlags

Flags for cuptiEventGroupReadEvent an cuptiEventGroupReadAllEvents. Flags for cuptiEventGroupReadEvent an cuptiEventGroupReadAllEvents.

Values
CUPTI_EVENT_READ_FLAG_NONE = 0
No flags.
CUPTI_EVENT_READ_FLAG_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiDeviceEnumEventDomains ( CUdevice device, size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )

Get the event domains for a device. Returns the event domains IDs in domainArray for a device. The size of the domainArray buffer is given by *arraySizeBytes. The size of the domainArray buffer must be at least numdomains * sizeof(CUpti_EventDomainID) or else all domains will not be returned. The value returned in *arraySizeBytes contains the number of bytes returned in domainArray.

Note:

Thread-safety: this function is thread safe.

Parameters
device
The CUDA device
arraySizeBytes
The size of domainArray in bytes, and returns the number of bytes written to domainArray
domainArray
Returns the IDs of the event domains for the device
CUptiResult cuptiDeviceGetAttribute ( CUdevice device, CUpti_DeviceAttribute attrib, size_t* valueSize, void* value )

Read a device attribute. Read a device attribute and return it in *value.

Note:

Thread-safety: this function is thread safe.

Parameters
device
The CUDA device
attrib
The attribute to read
valueSize
Size of buffer pointed by the value, and returns the number of bytes written to value
value
Returns the value of the attribute
CUptiResult cuptiDeviceGetEventDomainAttribute ( CUdevice device, CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )

Read an event domain attribute. Returns an event domain attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.

Parameters
device
The CUDA device
eventDomain
ID of the event domain
attrib
The event domain attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
CUptiResult cuptiDeviceGetNumEventDomains ( CUdevice device, uint32_t* numDomains )

Get the number of domains for a device. Returns the number of domains in numDomains for a device.

Note:

Thread-safety: this function is thread safe.

Parameters
device
The CUDA device
numDomains
Returns the number of domains
CUptiResult cuptiDeviceGetTimestamp ( CUcontext context, uint64_t* timestamp )

Read a device timestamp. Returns the device timestamp in *timestamp. The timestamp is reported in nanoseconds and indicates the time since the device was last reset.

Note:

Thread-safety: this function is thread safe.

Parameters
context
A context on the device from which to get the timestamp
timestamp
Returns the device timestamp
CUptiResult cuptiEnumEventDomains ( size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )

Get the event domains available on any device. Returns all the event domains available on any CUDA-capable device. Event domain IDs are returned in domainArray. The size of the domainArray buffer is given by *arraySizeBytes. The size of the domainArray buffer must be at least numDomains * sizeof(CUpti_EventDomainID) or all domains will not be returned. The value returned in *arraySizeBytes contains the number of bytes returned in domainArray.

Note:

Thread-safety: this function is thread safe.

Parameters
arraySizeBytes
The size of domainArray in bytes, and returns the number of bytes written to domainArray
domainArray
Returns all the event domains
CUptiResult cuptiEventDomainEnumEvents ( CUpti_EventDomainID eventDomain, size_t* arraySizeBytes, CUpti_EventID* eventArray )

Get the events in a domain. Returns the event IDs in eventArray for a domain. The size of the eventArray buffer is given by *arraySizeBytes. The size of the eventArray buffer must be at least numdomainevents * sizeof(CUpti_EventID) or else all events will not be returned. The value returned in *arraySizeBytes contains the number of bytes returned in eventArray.

Note:

Thread-safety: this function is thread safe.

Parameters
eventDomain
ID of the event domain
arraySizeBytes
The size of eventArray in bytes, and returns the number of bytes written to eventArray
eventArray
Returns the IDs of the events in the domain
CUptiResult cuptiEventDomainGetAttribute ( CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )

Read an event domain attribute. Returns an event domain attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.

Parameters
eventDomain
ID of the event domain
attrib
The event domain attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
CUptiResult cuptiEventDomainGetNumEvents ( CUpti_EventDomainID eventDomain, uint32_t* numEvents )

Get number of events in a domain. Returns the number of events in numEvents for a domain.

Note:

Thread-safety: this function is thread safe.

Parameters
eventDomain
ID of the event domain
numEvents
Returns the number of events in the domain
CUptiResult cuptiEventGetAttribute ( CUpti_EventID event, CUpti_EventAttribute attrib, size_t* valueSize, void* value )

Get an event attribute. Returns an event attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.

Parameters
event
ID of the event
attrib
The event attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
CUptiResult cuptiEventGetIdFromName ( CUdevice device, const char* eventName, CUpti_EventID* event )

Find an event by name. Find an event by name and return the event ID in *event.

Note:

Thread-safety: this function is thread safe.

Parameters
device
The CUDA device
eventName
The name of the event to find
event
Returns the ID of the found event or undefined if unable to find the event
CUptiResult cuptiEventGroupAddEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )

Add an event to an event group. Add an event to an event group. The event add can fail for a number of reasons:

  • The event group is enabled

  • The event does not belong to the same event domain as the events that are already in the event group

  • Device limitations on the events that can belong to the same group

  • The event group is full

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
event
The event to add to the group
CUptiResult cuptiEventGroupCreate ( CUcontext context, CUpti_EventGroup* eventGroup, uint32_t flags )

Create a new event group for a context. Creates a new event group for context and returns the new group in *eventGroup.

Note:
  • flags are reserved for future use and should be set to zero.

  • Thread-safety: this function is thread safe.

Parameters
context
The context for the event group
eventGroup
Returns the new event group
flags
Reserved - must be zero
CUptiResult cuptiEventGroupDestroy ( CUpti_EventGroup eventGroup )

Destroy an event group. Destroy an eventGroup and free its resources. An event group cannot be destroyed if it is enabled.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group to destroy
CUptiResult cuptiEventGroupDisable ( CUpti_EventGroup eventGroup )

Disable an event group. Disable an event group. Disabling an event group stops collection of events contained in the group.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
CUptiResult cuptiEventGroupEnable ( CUpti_EventGroup eventGroup )

Enable an event group. Enable an event group. Enabling an event group zeros the value of all the events in the group and then starts collection of those events.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
CUptiResult cuptiEventGroupGetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t* valueSize, void* value )

Read an event group attribute. Read an event group attribute and return it in *value.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.).

Parameters
eventGroup
The event group
attrib
The attribute to read
valueSize
Size of buffer pointed by the value, and returns the number of bytes written to value
value
Returns the value of the attribute
CUptiResult cuptiEventGroupReadAllEvents ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t* numEventIdsRead )

Read the values for all the events in an event group. Read the values for all the events in an event group. The event values are returned in the eventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of eventValueBuffer. The buffer must be at least (sizeof(uint64) * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the events. The buffer must be at least (sizeof(uint64) * number of domain instances * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

The data format returned in eventValueBuffer is:

  • domain instance 0: event0 event1 ... eventN

  • domain instance 1: event0 event1 ... eventN

  • ...

  • domain instance M: event0 event1 ... eventN

The event order in eventValueBuffer is returned in eventIdArray. The size of eventIdArray is specified in eventIdArraySizeBytes. The size should be at least (sizeof(CUpti_EventID) * number of events in group).

If any instance of any event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading events from a disabled event group is not allowed. After being read, an event's value is reset to zero.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.

Parameters
eventGroup
The event group
flags
Flags controlling the reading mode
eventValueBufferSizeBytes
The size of eventValueBuffer in bytes, and returns the number of bytes written to eventValueBuffer
eventValueBuffer
Returns the event values
eventIdArraySizeBytes
The size of eventIdArray in bytes, and returns the number of bytes written to eventIdArray
eventIdArray
Returns the IDs of the events in the same order as the values return in eventValueBuffer.
numEventIdsRead
Returns the number of event IDs returned in eventIdArray
CUptiResult cuptiEventGroupReadEvent ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, CUpti_EventID event, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer )

Read the value for an event in an event group. Read the value for an event in an event group. The event value is returned in the eventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of the eventValueBuffer buffer. The buffer must be at least sizeof(uint64) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the event. The buffer must be at least (sizeof(uint64) * number of domain instances) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

If any instance of an event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading an event from a disabled event group is not allowed. After being read, an event's value is reset to zero.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.

Parameters
eventGroup
The event group
flags
Flags controlling the reading mode
event
The event to read
eventValueBufferSizeBytes
The size of eventValueBuffer in bytes, and returns the number of bytes written to eventValueBuffer
eventValueBuffer
Returns the event value(s)
CUptiResult cuptiEventGroupRemoveAllEvents ( CUpti_EventGroup eventGroup )

Remove all events from an event group. Remove all events from an event group. Events cannot be removed if the event group is enabled.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
CUptiResult cuptiEventGroupRemoveEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )

Remove an event from an event group. Remove event from the an event group. The event cannot be removed if the event group is enabled.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
event
The event to remove from the group
CUptiResult cuptiEventGroupResetAllEvents ( CUpti_EventGroup eventGroup )

Zero all the event counts in an event group. Zero all the event counts in an event group.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.).

Parameters
eventGroup
The event group
CUptiResult cuptiEventGroupSetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t valueSize, void* value )

Write an event group attribute. Write an event group attribute.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroup
The event group
attrib
The attribute to write
valueSize
The size, in bytes, of the value
value
The attribute value to write
CUptiResult cuptiEventGroupSetsCreate ( CUcontext context, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, CUpti_EventGroupSets** eventGroupPasses )

For a set of events, get the grouping that indicates the number of passes and the event groups necessary to collect the events. The number of events that can be collected simultaneously varies by device and by the type of the events. When events can be collected simultaneously, they may need to be grouped into multiple event groups because they are from different event domains. This function takes a set of events and determines how many passes are required to collect all those events, and which events can be collected simultaneously in each pass.

The CUpti_EventGroupSets returned in eventGroupPasses indicates how many passes are required to collect the events with the numSets field. Within each event group set, the sets array indicates the event groups that should be collected on each pass.

Note:

Thread-safety: this function is thread safe, but client must guard against another thread simultaneously destroying context.

Parameters
context
The context for event collection
eventIdArraySizeBytes
Size of eventIdArray in bytes
eventIdArray
Array of event IDs that need to be grouped
eventGroupPasses
Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass
CUptiResult cuptiEventGroupSetsDestroy ( CUpti_EventGroupSets* eventGroupSets )

Destroy a CUpti_EventGroupSets object. Destroy a CUpti_EventGroupSets object.

Note:

Thread-safety: this function is thread safe.

Parameters
eventGroupSets
The object to destroy
CUptiResult cuptiGetNumEventDomains ( uint32_t* numDomains )

Get the number of event domains available on any device. Returns the total number of event domains available on any CUDA-capable device.

Note:

Thread-safety: this function is thread safe.

Parameters
numDomains
Returns the number of domains
CUptiResult cuptiSetEventCollectionMode ( CUcontext context, CUpti_EventCollectionMode mode )

Set the event collection mode. Set the event collection mode for a context. The mode controls the event collection behavior of all events in event groups created in the context.

Note:

Thread-safety: this function is thread safe.

Parameters
context
The context
mode
The event collection mode

CUPTI Metric API

Description

Functions, types, and enums that implement the CUPTI Metric API.

Classes

union 
A metric value.

Typedefs

typedef uint32_t  CUpti_MetricID
ID for a metric.

Enumerations

enum CUpti_MetricAttribute
Metric attributes.
enum CUpti_MetricCategory
A metric category.
enum CUpti_MetricEvaluationMode
A metric evaluation mode.
enum CUpti_MetricValueKind
Kinds of metric values.

Functions

CUptiResult cuptiDeviceEnumMetrics ( CUdevice device, size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get the metrics for a device.
CUptiResult cuptiDeviceGetNumMetrics ( CUdevice device, uint32_t* numMetrics )
Get the number of metrics for a device.
CUptiResult cuptiEnumMetrics ( size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get all the metrics available on any device.
CUptiResult cuptiGetNumMetrics ( uint32_t* numMetrics )
Get the total number of metrics available on any device.
CUptiResult cuptiMetricCreateEventGroupSets ( CUcontext context, size_t metricIdArraySizeBytes, CUpti_MetricID* metricIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.
CUptiResult cuptiMetricEnumEvents ( CUpti_MetricID metric, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray )
Get the events required to calculating a metric.
CUptiResult cuptiMetricGetAttribute ( CUpti_MetricID metric, CUpti_MetricAttribute attrib, size_t* valueSize, void* value )
Get a metric attribute.
CUptiResult cuptiMetricGetIdFromName ( CUdevice device, const char* metricName, CUpti_MetricID* metric )
Find an metric by name.
CUptiResult cuptiMetricGetNumEvents ( CUpti_MetricID metric, uint32_t* numEvents )
Get number of events required to calculate a metric.
CUptiResult cuptiMetricGetValue ( CUdevice device, CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, uint64_t timeDuration, CUpti_MetricValue* metricValue )
Calculate the value for a metric.

Typedefs

typedef uint32_t CUpti_MetricID

ID for a metric. A metric provides a measure of some aspect of the device.

Enumerations

enum CUpti_MetricAttribute

Metric attributes. Metric attributes describe properties of a metric. These attributes can be read using cuptiMetricGetAttribute.

Values
CUPTI_METRIC_ATTR_NAME = 0
Metric name. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_SHORT_DESCRIPTION = 1
Short description of metric. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_LONG_DESCRIPTION = 2
Long description of metric. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_CATEGORY = 3
Category of the metric. Value is of type CUpti_MetricCategory.
CUPTI_METRIC_ATTR_VALUE_KIND = 4
Value type of the metric. Value is of type CUpti_MetricValueKind.
CUPTI_METRIC_ATTR_EVALUATION_MODE = 5
Metric evaluation mode. Value is of type CUpti_MetricEvaluationMode.
CUPTI_METRIC_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_MetricCategory

A metric category. Each metric is assigned to a category that represents the general type of the metric. A metric's category is accessed using cuptiMetricGetAttribute and the CUPTI_METRIC_ATTR_CATEGORY attribute.

Values
CUPTI_METRIC_CATEGORY_MEMORY = 0
A memory related metric.
CUPTI_METRIC_CATEGORY_INSTRUCTION = 1
An instruction related metric.
CUPTI_METRIC_CATEGORY_MULTIPROCESSOR = 2
A multiprocessor related metric.
CUPTI_METRIC_CATEGORY_CACHE = 3
A cache related metric.
CUPTI_METRIC_CATEGORY_TEXTURE = 4
A texture related metric.
CUPTI_METRIC_CATEGORY_FORCE_INT = 0x7fffffff
enum CUpti_MetricEvaluationMode

A metric evaluation mode. A metric can be evaluated per hardware instance to know the load balancing across instances of a domain or the metric can be evaluated in aggregate mode when the events involved in metric evaluation are from different event domains. It might be possible to evaluate some metrics in both modes for convenience. A metric's evaluation mode is accessed using CUpti_MetricEvaluationMode and the CUPTI_METRIC_ATTR_EVALUATION_MODE attribute.

Values
CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE = 1
If the metric evaluation mode is per instance, then the event value passed to cuptiMetricGetValue should contain value for an instance of the domain. Also in this mode, cuptiMetricGetValue should be called for all available instances of the domain to get overall status.
CUPTI_METRIC_EVALUATION_MODE_AGGREGATE = 1<<1
If the metric evaluation mode is aggregate, then the event value passed to cuptiMetricGetValue should be aggregated value of an event for all instances of the domain. In this mode, cuptiMetricGetValue should be called only once.
CUPTI_METRIC_EVALUATION_MODE_FORCE_INT = 0x7fffffff
enum CUpti_MetricValueKind

Kinds of metric values. Metric values can be one of several different kinds. Corresponding to each kind is a member of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue should be accessed using the appropriate member of that union based on its value kind.

Values
CUPTI_METRIC_VALUE_KIND_DOUBLE = 0
The metric value is a 64-bit double.
CUPTI_METRIC_VALUE_KIND_UINT64 = 1
The metric value is a 64-bit unsigned integer.
CUPTI_METRIC_VALUE_KIND_PERCENT = 2
The metric value is a percentage represented by a 64-bit double. For example, 57.5% is represented by the value 57.5.
CUPTI_METRIC_VALUE_KIND_THROUGHPUT = 3
The metric value is a throughput represented by a 64-bit integer. The unit for throughput values is bytes/second.
CUPTI_METRIC_VALUE_KIND_INT64 = 4
The metric value is a 64-bit signed integer.
CUPTI_METRIC_VALUE_KIND_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiDeviceEnumMetrics ( CUdevice device, size_t* arraySizeBytes, CUpti_MetricID* metricArray )

Get the metrics for a device. Returns the metric IDs in metricArray for a device. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID) or else all metric IDs will not be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray.

Parameters
device
The CUDA device
arraySizeBytes
The size of metricArray in bytes, and returns the number of bytes written to metricArray
metricArray
Returns the IDs of the metrics for the device
CUptiResult cuptiDeviceGetNumMetrics ( CUdevice device, uint32_t* numMetrics )

Get the number of metrics for a device. Returns the number of metrics available for a device.

Parameters
device
The CUDA device
numMetrics
Returns the number of metrics available for the device
CUptiResult cuptiEnumMetrics ( size_t* arraySizeBytes, CUpti_MetricID* metricArray )

Get all the metrics available on any device. Returns the metric IDs in metricArray for all CUDA-capable devices. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID) or all metric IDs will not be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray.

Parameters
arraySizeBytes
The size of metricArray in bytes, and returns the number of bytes written to metricArray
metricArray
Returns the IDs of the metrics
CUptiResult cuptiGetNumMetrics ( uint32_t* numMetrics )

Get the total number of metrics available on any device. Returns the total number of metrics available on any CUDA-capable devices.

Parameters
numMetrics
Returns the number of metrics
CUptiResult cuptiMetricCreateEventGroupSets ( CUcontext context, size_t metricIdArraySizeBytes, CUpti_MetricID* metricIdArray, CUpti_EventGroupSets** eventGroupPasses )

For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics. For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.

See also:

cuptiEventGroupSetsCreate for details on event group set creation.

Parameters
context
The context for event collection
metricIdArraySizeBytes
Size of the metricIdArray in bytes
metricIdArray
Array of metric IDs
eventGroupPasses
Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass
CUptiResult cuptiMetricEnumEvents ( CUpti_MetricID metric, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray )

Get the events required to calculating a metric. Gets the event IDs in eventIdArray required to calculate a metric. The size of the eventIdArray buffer is given by *eventIdArraySizeBytes and must be at least numEvents * sizeof(CUpti_EventID) or all events will not be returned. The value returned in *eventIdArraySizeBytes contains the number of bytes returned in eventIdArray.

Parameters
metric
ID of the metric
eventIdArraySizeBytes
The size of eventIdArray in bytes, and returns the number of bytes written to eventIdArray
eventIdArray
Returns the IDs of the events required to calculate metric
CUptiResult cuptiMetricGetAttribute ( CUpti_MetricID metric, CUpti_MetricAttribute attrib, size_t* valueSize, void* value )

Get a metric attribute. Returns a metric attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Parameters
metric
ID of the metric
attrib
The metric attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
CUptiResult cuptiMetricGetIdFromName ( CUdevice device, const char* metricName, CUpti_MetricID* metric )

Find an metric by name. Find a metric by name and return the metric ID in *metric.

Parameters
device
The CUDA device
metricName
The name of metric to find
metric
Returns the ID of the found metric or undefined if unable to find the metric
CUptiResult cuptiMetricGetNumEvents ( CUpti_MetricID metric, uint32_t* numEvents )

Get number of events required to calculate a metric. Returns the number of events in numEvents that are required to calculate a metric.

Parameters
metric
ID of the metric
numEvents
Returns the number of events required for the metric
CUptiResult cuptiMetricGetValue ( CUdevice device, CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, uint64_t timeDuration, CUpti_MetricValue* metricValue )

Calculate the value for a metric. Use the events collected for a metric to calculate the metric value. Metric value evaluation depends on the evaluation mode CUpti_MetricEvaluationMode that the metric supports. If a metric has evaluation mode as CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, then it assumes that the input event value is for one domain instance. If a metric has evaluation mode as CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, it assumes that input event values are normalized to represent all domain instances on a device. For the most accurate metric collection, the events required for the metric should be collected for all profiled domain instances. For example, to collect all instances of an event, set the CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES attribute on the group containing the event to 1. The normalized value for the event is then: (sum_event_values * totalInstanceCount) / instanceCount, where sum_event_values is the summation of the event values across all profiled domain instances, totalInstanceCount is obtained from querying CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT and instanceCount is obtained from querying CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT (or CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT).

Parameters
device
The CUDA device that the metric is being calculated for
metric
The metric ID
eventIdArraySizeBytes
The size of eventIdArray in bytes
eventIdArray
The event IDs required to calculate metric
eventValueArraySizeBytes
The size of eventValueArray in bytes
eventValueArray
The normalized event values required to calculate metric. The values must be order to match the order of events in eventIdArray
timeDuration
The duration over which the events were collected, in ns
metricValue
Returns the value for the metric

Data Structures

Here are the data structures with brief descriptions:

CUpti_Activity
The base activity record
CUpti_ActivityAPI
The activity record for a driver or runtime API invocation
CUpti_ActivityBranch
The activity record for source level result branch
CUpti_ActivityContext
The activity record for a context
CUpti_ActivityDevice
The activity record for a device
CUpti_ActivityEvent
The activity record for a CUPTI event
CUpti_ActivityGlobalAccess
The activity record for source-level global access
CUpti_ActivityKernel
The activity record for kernel
CUpti_ActivityMarker
The activity record providing a marker which is an instantaneous point in time
CUpti_ActivityMarkerData
The activity record providing detailed information for a marker
CUpti_ActivityMemcpy
The activity record for memory copies
CUpti_ActivityMemset
The activity record for memset
CUpti_ActivityMetric
The activity record for a CUPTI metric
CUpti_ActivityName
The activity record providing a name
CUpti_ActivityObjectKindId
Identifiers for object kinds as specified by CUpti_ActivityObjectKind
CUpti_ActivityOverhead
The activity record for CUPTI and driver overheads
CUpti_ActivitySourceLocator
The activity record for source locator
CUpti_CallbackData
Data passed into a runtime or driver API callback function
CUpti_EventGroupSet
A set of event groups
CUpti_EventGroupSets
A set of event group sets
CUpti_MetricValue
A metric value
CUpti_NvtxData
Data passed into a NVTX callback function
CUpti_ResourceData
Data passed into a resource callback function
CUpti_SynchronizeData
Data passed into a synchronize callback function

CUpti_Activity Struct Reference

[CUPTI Activity API]

Description

The base activity record. The activity API uses a CUpti_Activity as a generic representation for any activity. The 'kind' field is used to determine the specific activity kind, and from that the CUpti_Activity object can be cast to the specific activity record type appropriate for that kind.

Note that all activity record types are padded and aligned to ensure that each member of the record is naturally aligned.

See also:

CUpti_ActivityKind

Public Variables

CUpti_ActivityKind kind

Variables

CUpti_ActivityKindCUpti_Activity::kind [inherited]

The kind of this activity.

CUpti_ActivityAPI Struct Reference

[CUPTI Activity API]

Description

The activity record for a driver or runtime API invocation. This activity record represents an invocation of a driver or runtime API (CUPTI_ACTIVITY_KIND_DRIVER and CUPTI_ACTIVITY_KIND_RUNTIME).

Public Variables

CUpti_CallbackId cbid
uint32_t  correlationId
uint64_t  end
CUpti_ActivityKind kind
uint32_t  processId
uint32_t  returnValue
uint64_t  start
uint32_t  threadId

Variables

CUpti_CallbackIdCUpti_ActivityAPI::cbid [inherited]

The ID of the driver or runtime function.

uint32_t CUpti_ActivityAPI::correlationId [inherited]

The correlation ID of the driver or runtime CUDA function. Each function invocation is assigned a unique correlation ID that is identical to the correlation ID in the memcpy, memset, or kernel activity record that is associated with this function.

uint64_t CUpti_ActivityAPI::end [inherited]

The end timestamp for the function, in ns.

CUpti_ActivityKindCUpti_ActivityAPI::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_DRIVER or CUPTI_ACTIVITY_KIND_RUNTIME.

uint32_t CUpti_ActivityAPI::processId [inherited]

The ID of the process where the driver or runtime CUDA function is executing.

uint32_t CUpti_ActivityAPI::returnValue [inherited]

The return value for the function. For a CUDA driver function with will be a CUresult value, and for a CUDA runtime function this will be a cudaError_t value.

uint64_t CUpti_ActivityAPI::start [inherited]

The start timestamp for the function, in ns.

uint32_t CUpti_ActivityAPI::threadId [inherited]

The ID of the thread where the driver or runtime CUDA function is executing.

CUpti_ActivityBranch Struct Reference

[CUPTI Activity API]

Description

The activity record for source level result branch. This activity record the locations of the branches in the source (CUPTI_ACTIVITY_KIND_BRANCH).

Public Variables

uint32_t  correlationId
uint32_t  diverged
uint32_t  executed
CUpti_ActivityKind kind
uint32_t  pcOffset
uint32_t  sourceLocatorId
uint64_t  threadsExecuted

Variables

uint32_t CUpti_ActivityBranch::correlationId [inherited]

The correlation ID of the kernel to which this result is associated.

uint32_t CUpti_ActivityBranch::diverged [inherited]

Number of times this branch diverged

uint32_t CUpti_ActivityBranch::executed [inherited]

The number of times this branch was executed

CUpti_ActivityKindCUpti_ActivityBranch::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_BRANCH.

uint32_t CUpti_ActivityBranch::pcOffset [inherited]

The pc offset for the branch.

uint32_t CUpti_ActivityBranch::sourceLocatorId [inherited]

The ID for source locator.

uint64_t CUpti_ActivityBranch::threadsExecuted [inherited]

This increments each time when this instruction is executed by number of threads that executed this instruction

CUpti_ActivityContext Struct Reference

[CUPTI Activity API]

Description

The activity record for a context. This activity record represents information about a context (CUPTI_ACTIVITY_KIND_CONTEXT).

Public Variables

CUpti_ActivityComputeApiKind computeApiKind
uint32_t  contextId
uint32_t  deviceId
CUpti_ActivityKind kind

Variables

CUpti_ActivityComputeApiKindCUpti_ActivityContext::computeApiKind [inherited]

The compute API kind.

See also:

CUpti_ActivityComputeApiKind

uint32_t CUpti_ActivityContext::contextId [inherited]

The context ID.

uint32_t CUpti_ActivityContext::deviceId [inherited]

The device ID.

CUpti_ActivityKindCUpti_ActivityContext::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_CONTEXT.

CUpti_ActivityDevice Struct Reference

[CUPTI Activity API]

Description

The activity record for a device. This activity record represents information about a GPU device (CUPTI_ACTIVITY_KIND_DEVICE).

Public Variables

uint32_t  computeCapabilityMajor
uint32_t  computeCapabilityMinor
uint32_t  constantMemorySize
uint32_t  coreClockRate
CUpti_ActivityFlag flags
uint64_t  globalMemoryBandwidth
uint64_t  globalMemorySize
uint32_t  id
CUpti_ActivityKind kind
uint32_t  l2CacheSize
uint32_t  maxBlockDimX
uint32_t  maxBlockDimY
uint32_t  maxBlockDimZ
uint32_t  maxBlocksPerMultiprocessor
uint32_t  maxGridDimX
uint32_t  maxGridDimY
uint32_t  maxGridDimZ
uint32_t  maxIPC
uint32_t  maxRegistersPerBlock
uint32_t  maxSharedMemoryPerBlock
uint32_t  maxThreadsPerBlock
uint32_t  maxWarpsPerMultiprocessor
const char * name
uint32_t  numMemcpyEngines
uint32_t  numMultiprocessors
uint32_t  numThreadsPerWarp

Variables

uint32_t CUpti_ActivityDevice::computeCapabilityMajor [inherited]

Compute capability for the device, major number.

uint32_t CUpti_ActivityDevice::computeCapabilityMinor [inherited]

Compute capability for the device, minor number.

uint32_t CUpti_ActivityDevice::constantMemorySize [inherited]

The amount of constant memory on the device, in bytes.

uint32_t CUpti_ActivityDevice::coreClockRate [inherited]

The core clock rate of the device, in kHz.

CUpti_ActivityFlagCUpti_ActivityDevice::flags [inherited]

The flags associated with the device.

See also:

CUpti_ActivityFlag

uint64_t CUpti_ActivityDevice::globalMemoryBandwidth [inherited]

The global memory bandwidth available on the device, in kBytes/sec.

uint64_t CUpti_ActivityDevice::globalMemorySize [inherited]

The amount of global memory on the device, in bytes.

uint32_t CUpti_ActivityDevice::id [inherited]

The device ID.

CUpti_ActivityKindCUpti_ActivityDevice::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_DEVICE.

uint32_t CUpti_ActivityDevice::l2CacheSize [inherited]

The size of the L2 cache on the device, in bytes.

uint32_t CUpti_ActivityDevice::maxBlockDimX [inherited]

Maximum allowed X dimension for a block.

uint32_t CUpti_ActivityDevice::maxBlockDimY [inherited]

Maximum allowed Y dimension for a block.

uint32_t CUpti_ActivityDevice::maxBlockDimZ [inherited]

Maximum allowed Z dimension for a block.

uint32_t CUpti_ActivityDevice::maxBlocksPerMultiprocessor [inherited]

Maximum number of blocks that can be present on a multiprocessor at any given time.

uint32_t CUpti_ActivityDevice::maxGridDimX [inherited]

Maximum allowed X dimension for a grid.

uint32_t CUpti_ActivityDevice::maxGridDimY [inherited]

Maximum allowed Y dimension for a grid.

uint32_t CUpti_ActivityDevice::maxGridDimZ [inherited]

Maximum allowed Z dimension for a grid.

uint32_t CUpti_ActivityDevice::maxIPC [inherited]

The maximum "instructions per cycle" possible on each device multiprocessor.

uint32_t CUpti_ActivityDevice::maxRegistersPerBlock [inherited]

Maximum number of registers that can be allocated to a block.

uint32_t CUpti_ActivityDevice::maxSharedMemoryPerBlock [inherited]

Maximum amount of shared memory that can be assigned to a block, in bytes.

uint32_t CUpti_ActivityDevice::maxThreadsPerBlock [inherited]

Maximum number of threads allowed in a block.

uint32_t CUpti_ActivityDevice::maxWarpsPerMultiprocessor [inherited]

Maximum number of warps that can be present on a multiprocessor at any given time.

const char * CUpti_ActivityDevice::name [inherited]

The device name. This name is shared across all activity records representing instances of the device, and so should not be modified.

uint32_t CUpti_ActivityDevice::numMemcpyEngines [inherited]

Number of memory copy engines on the device.

uint32_t CUpti_ActivityDevice::numMultiprocessors [inherited]

Number of multiprocessors on the device.

uint32_t CUpti_ActivityDevice::numThreadsPerWarp [inherited]

The number of threads per warp on the device.

CUpti_ActivityEvent Struct Reference

[CUPTI Activity API]

Description

The activity record for a CUPTI event. This activity record represents the collection of a CUPTI event value (CUPTI_ACTIVITY_KIND_EVENT). This activity record kind is not produced by the activity API but is included for completeness and ease-of-use. Profile frameworks built on top of CUPTI that collect event data may choose to use this type to store the collected event data.

Public Variables

uint32_t  correlationId
CUpti_EventDomainID domain
CUpti_EventID id
CUpti_ActivityKind kind
uint64_t  value

Variables

uint32_t CUpti_ActivityEvent::correlationId [inherited]

The correlation ID of the event. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the event was gathered.

CUpti_EventDomainIDCUpti_ActivityEvent::domain [inherited]

The event domain ID.

CUpti_EventIDCUpti_ActivityEvent::id [inherited]

The event ID.

CUpti_ActivityKindCUpti_ActivityEvent::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_EVENT.

uint64_t CUpti_ActivityEvent::value [inherited]

The event value.

CUpti_ActivityGlobalAccess Struct Reference

[CUPTI Activity API]

Description

The activity record for source-level global access. This activity records the locations of the global accesses in the source (CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS).

Public Variables

uint32_t  correlationId
uint32_t  executed
CUpti_ActivityFlag flags
CUpti_ActivityKind kind
uint64_t  l2_transactions
uint32_t  pcOffset
uint32_t  sourceLocatorId
uint64_t  threadsExecuted

Variables

uint32_t CUpti_ActivityGlobalAccess::correlationId [inherited]

The correlation ID of the kernel to which this result is associated.

uint32_t CUpti_ActivityGlobalAccess::executed [inherited]

The number of times this instruction was executed

CUpti_ActivityFlagCUpti_ActivityGlobalAccess::flags [inherited]

The properties of this global access.

CUpti_ActivityKindCUpti_ActivityGlobalAccess::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS.

uint64_t CUpti_ActivityGlobalAccess::l2_transactions [inherited]

The total number of 32 bytes transactions to L2 cache generated by this access

uint32_t CUpti_ActivityGlobalAccess::pcOffset [inherited]

The pc offset for the access.

uint32_t CUpti_ActivityGlobalAccess::sourceLocatorId [inherited]

The ID for source locator.

uint64_t CUpti_ActivityGlobalAccess::threadsExecuted [inherited]

This increments each time when this instruction is executed by number of threads that executed this instruction

CUpti_ActivityKernel Struct Reference

[CUPTI Activity API]

Description

The activity record for kernel. This activity record represents a kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL).

Public Variables

int32_t  blockX
int32_t  blockY
int32_t  blockZ
uint8_t  cacheConfigExecuted
uint8_t  cacheConfigRequested
uint32_t  contextId
uint32_t  correlationId
uint32_t  deviceId
int32_t  dynamicSharedMemory
uint64_t  end
int32_t  gridX
int32_t  gridY
int32_t  gridZ
CUpti_ActivityKind kind
uint32_t  localMemoryPerThread
uint32_t  localMemoryTotal
const char * name
uint32_t  pad
uint16_t  registersPerThread
void * reserved0
uint32_t  runtimeCorrelationId
uint64_t  start
int32_t  staticSharedMemory
uint32_t  streamId

Variables

int32_t CUpti_ActivityKernel::blockX [inherited]

The X-dimension block size for the kernel.

int32_t CUpti_ActivityKernel::blockY [inherited]

The Y-dimension block size for the kernel.

int32_t CUpti_ActivityKernel::blockZ [inherited]

The Z-dimension grid size for the kernel.

uint8_t CUpti_ActivityKernel::cacheConfigExecuted [inherited]

The cache configuration used for the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h.

uint8_t CUpti_ActivityKernel::cacheConfigRequested [inherited]

The cache configuration requested by the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h.

uint32_t CUpti_ActivityKernel::contextId [inherited]

The ID of the context where the kernel is executing.

uint32_t CUpti_ActivityKernel::correlationId [inherited]

The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the kernel.

uint32_t CUpti_ActivityKernel::deviceId [inherited]

The ID of the device where the kernel is executing.

int32_t CUpti_ActivityKernel::dynamicSharedMemory [inherited]

The dynamic shared memory reserved for the kernel, in bytes.

uint64_t CUpti_ActivityKernel::end [inherited]

The end timestamp for the kernel execution, in ns.

int32_t CUpti_ActivityKernel::gridX [inherited]

The X-dimension grid size for the kernel.

int32_t CUpti_ActivityKernel::gridY [inherited]

The Y-dimension grid size for the kernel.

int32_t CUpti_ActivityKernel::gridZ [inherited]

The Z-dimension grid size for the kernel.

CUpti_ActivityKindCUpti_ActivityKernel::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_KERNEL or CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.

uint32_t CUpti_ActivityKernel::localMemoryPerThread [inherited]

The amount of local memory reserved for each thread, in bytes.

uint32_t CUpti_ActivityKernel::localMemoryTotal [inherited]

The total amount of local memory reserved for the kernel, in bytes.

const char * CUpti_ActivityKernel::name [inherited]

The name of the kernel. This name is shared across all activity records representing the same kernel, and so should not be modified.

uint32_t CUpti_ActivityKernel::pad [inherited]

Undefined. Reserved for internal use.

uint16_t CUpti_ActivityKernel::registersPerThread [inherited]

The number of registers required for each thread executing the kernel.

void * CUpti_ActivityKernel::reserved0 [inherited]

Undefined. Reserved for internal use.

uint32_t CUpti_ActivityKernel::runtimeCorrelationId [inherited]

The runtime correlation ID of the kernel. Each kernel execution is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the kernel.

uint64_t CUpti_ActivityKernel::start [inherited]

The start timestamp for the kernel execution, in ns.

int32_t CUpti_ActivityKernel::staticSharedMemory [inherited]

The static shared memory allocated for the kernel, in bytes.

uint32_t CUpti_ActivityKernel::streamId [inherited]

The ID of the stream where the kernel is executing.

CUpti_ActivityMarker Struct Reference

[CUPTI Activity API]

Description

The activity record providing a marker which is an instantaneous point in time. The marker is specified with a descriptive name and unique id (CUPTI_ACTIVITY_KIND_MARKER).

Public Variables

CUpti_ActivityFlag flags
uint32_t  id
CUpti_ActivityKind kind
const char * name
union CUpti_ActivityObjectKindId objectId
CUpti_ActivityObjectKind objectKind
uint64_t  timestamp

Variables

CUpti_ActivityFlagCUpti_ActivityMarker::flags [inherited]

The flags associated with the marker.

See also:

CUpti_ActivityFlag

uint32_t CUpti_ActivityMarker::id [inherited]

The marker ID.

CUpti_ActivityKindCUpti_ActivityMarker::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER.

const char * CUpti_ActivityMarker::name [inherited]

The marker name for an instantaneous or start marker. This will be NULL for an end marker.

union CUpti_ActivityObjectKindIdCUpti_ActivityMarker::objectId [inherited]

The identifier for the activity object associated with this marker. 'objectKind' indicates which ID is valid for this record.

CUpti_ActivityObjectKindCUpti_ActivityMarker::objectKind [inherited]

The kind of activity object associated with this marker.

uint64_t CUpti_ActivityMarker::timestamp [inherited]

The timestamp for the marker, in ns.

CUpti_ActivityMarkerData Struct Reference

[CUPTI Activity API]

Description

The activity record providing detailed information for a marker. The marker data contains color, payload, and category. (CUPTI_ACTIVITY_KIND_MARKER_DATA).

Public Variables

uint32_t  category
uint32_t  color
CUpti_ActivityFlag flags
uint32_t  id
CUpti_ActivityKind kind
union CUpti_MetricValue payload
CUpti_MetricValueKind payloadKind

Variables

uint32_t CUpti_ActivityMarkerData::category [inherited]

The category for the marker.

uint32_t CUpti_ActivityMarkerData::color [inherited]

The color for the marker.

CUpti_ActivityFlagCUpti_ActivityMarkerData::flags [inherited]

The flags associated with the marker.

See also:

CUpti_ActivityFlag

uint32_t CUpti_ActivityMarkerData::id [inherited]

The marker ID.

CUpti_ActivityKindCUpti_ActivityMarkerData::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER_DATA.

union CUpti_MetricValueCUpti_ActivityMarkerData::payload [inherited]

The payload value.

CUpti_MetricValueKindCUpti_ActivityMarkerData::payloadKind [inherited]

Defines the payload format for the value associated with the marker.

CUpti_ActivityMemcpy Struct Reference

[CUPTI Activity API]

Description

The activity record for memory copies. This activity record represents a memory copy (CUPTI_ACTIVITY_KIND_MEMCPY).

Public Variables

uint64_t  bytes
uint32_t  contextId
uint8_t  copyKind
uint32_t  correlationId
uint32_t  deviceId
uint8_t  dstKind
uint64_t  end
uint8_t  flags
CUpti_ActivityKind kind
void * reserved0
uint32_t  runtimeCorrelationId
uint8_t  srcKind
uint64_t  start
uint32_t  streamId

Variables

uint64_t CUpti_ActivityMemcpy::bytes [inherited]

The number of bytes transferred by the memory copy.

uint32_t CUpti_ActivityMemcpy::contextId [inherited]

The ID of the context where the memory copy is occurring.

uint8_t CUpti_ActivityMemcpy::copyKind [inherited]

The kind of the memory copy, stored as a byte to reduce record size.

See also:

CUpti_ActivityMemcpyKind

uint32_t CUpti_ActivityMemcpy::correlationId [inherited]

The correlation ID of the memory copy. Each memory copy is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory copy.

uint32_t CUpti_ActivityMemcpy::deviceId [inherited]

The ID of the device where the memory copy is occurring.

uint8_t CUpti_ActivityMemcpy::dstKind [inherited]

The destination memory kind read by the memory copy, stored as a byte to reduce record size.

See also:

CUpti_ActivityMemoryKind

uint64_t CUpti_ActivityMemcpy::end [inherited]

The end timestamp for the memory copy, in ns.

uint8_t CUpti_ActivityMemcpy::flags [inherited]

The flags associated with the memory copy.

See also:

CUpti_ActivityFlag

CUpti_ActivityKindCUpti_ActivityMemcpy::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMCPY.

void * CUpti_ActivityMemcpy::reserved0 [inherited]

Undefined. Reserved for internal use.

uint32_t CUpti_ActivityMemcpy::runtimeCorrelationId [inherited]

The runtime correlation ID of the memory copy. Each memory copy is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the memory copy.

uint8_t CUpti_ActivityMemcpy::srcKind [inherited]

The source memory kind read by the memory copy, stored as a byte to reduce record size.

See also:

CUpti_ActivityMemoryKind

uint64_t CUpti_ActivityMemcpy::start [inherited]

The start timestamp for the memory copy, in ns.

uint32_t CUpti_ActivityMemcpy::streamId [inherited]

The ID of the stream where the memory copy is occurring.

CUpti_ActivityMemset Struct Reference

[CUPTI Activity API]

Description

The activity record for memset. This activity record represents a memory set operation (CUPTI_ACTIVITY_KIND_MEMSET).

Public Variables

uint64_t  bytes
uint32_t  contextId
uint32_t  correlationId
uint32_t  deviceId
uint64_t  end
CUpti_ActivityKind kind
void * reserved0
uint32_t  runtimeCorrelationId
uint64_t  start
uint32_t  streamId
uint32_t  value

Variables

uint64_t CUpti_ActivityMemset::bytes [inherited]

The number of bytes being set by the memory set.

uint32_t CUpti_ActivityMemset::contextId [inherited]

The ID of the context where the memory set is occurring.

uint32_t CUpti_ActivityMemset::correlationId [inherited]

The correlation ID of the memory set. Each memory set is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory set.

uint32_t CUpti_ActivityMemset::deviceId [inherited]

The ID of the device where the memory set is occurring.

uint64_t CUpti_ActivityMemset::end [inherited]

The end timestamp for the memory set, in ns.

CUpti_ActivityKindCUpti_ActivityMemset::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMSET.

void * CUpti_ActivityMemset::reserved0 [inherited]

Undefined. Reserved for internal use.

uint32_t CUpti_ActivityMemset::runtimeCorrelationId [inherited]

The runtime correlation ID of the memory set. Each memory set is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the memory set.

uint64_t CUpti_ActivityMemset::start [inherited]

The start timestamp for the memory set, in ns.

uint32_t CUpti_ActivityMemset::streamId [inherited]

The ID of the stream where the memory set is occurring.

uint32_t CUpti_ActivityMemset::value [inherited]

The value being assigned to memory by the memory set.

CUpti_ActivityMetric Struct Reference

[CUPTI Activity API]

Description

The activity record for a CUPTI metric. This activity record represents the collection of a CUPTI metric value (CUPTI_ACTIVITY_KIND_METRIC). This activity record kind is not produced by the activity API but is included for completeness and ease-of-use. Profile frameworks built on top of CUPTI that collect metric data may choose to use this type to store the collected metric data.

Public Variables

uint32_t  correlationId
CUpti_MetricID id
CUpti_ActivityKind kind
uint32_t  pad
union CUpti_MetricValue value

Variables

uint32_t CUpti_ActivityMetric::correlationId [inherited]

The correlation ID of the metric. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the metric was gathered.

CUpti_MetricIDCUpti_ActivityMetric::id [inherited]

The metric ID.

CUpti_ActivityKindCUpti_ActivityMetric::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_METRIC.

uint32_t CUpti_ActivityMetric::pad [inherited]

Undefined. Reserved for internal use.

union CUpti_MetricValueCUpti_ActivityMetric::value [inherited]

The metric value.

CUpti_ActivityName Struct Reference

[CUPTI Activity API]

Description

The activity record providing a name. This activity record provides a name for a device, context, thread, etc. (CUPTI_ACTIVITY_KIND_NAME).

Public Variables

CUpti_ActivityKind kind
const char * name
union CUpti_ActivityObjectKindId objectId
CUpti_ActivityObjectKind objectKind

Variables

CUpti_ActivityKindCUpti_ActivityName::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_NAME.

const char * CUpti_ActivityName::name [inherited]

The name.

union CUpti_ActivityObjectKindIdCUpti_ActivityName::objectId [inherited]

The identifier for the activity object. 'objectKind' indicates which ID is valid for this record.

CUpti_ActivityObjectKindCUpti_ActivityName::objectKind [inherited]

The kind of activity object being named.

CUpti_ActivityObjectKindId Union Reference

[CUPTI Activity API]

Description

Identifiers for object kinds as specified by CUpti_ActivityObjectKind.

See also:

CUpti_ActivityObjectKind

Public Variables

CUpti_ActivityObjectKindId::@1  dcs
CUpti_ActivityObjectKindId::@0  pt

Variables

CUpti_ActivityObjectKindId::@1 CUpti_ActivityObjectKindId::dcs [inherited]

A device object requires that we identify the device ID. A context object requires that we identify both the device and context ID. A stream object requires that we identify device, context, and stream ID.

CUpti_ActivityObjectKindId::@0 CUpti_ActivityObjectKindId::pt [inherited]

A process object requires that we identify the process ID. A thread object requires that we identify both the process and thread ID.

CUpti_ActivityOverhead Struct Reference

[CUPTI Activity API]

Description

The activity record for CUPTI and driver overheads. This activity record provides CUPTI and driver overhead information (CUPTI_ACTIVITY_OVERHEAD).

Public Variables

uint64_t  end
CUpti_ActivityKind kind
union CUpti_ActivityObjectKindId objectId
CUpti_ActivityObjectKind objectKind
CUpti_ActivityOverheadKind overheadKind
uint64_t  start

Variables

uint64_t CUpti_ActivityOverhead::end [inherited]

The end timestamp for the overhead, in ns.

CUpti_ActivityKindCUpti_ActivityOverhead::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_OVERHEAD.

union CUpti_ActivityObjectKindIdCUpti_ActivityOverhead::objectId [inherited]

The identifier for the activity object. 'objectKind' indicates which ID is valid for this record.

CUpti_ActivityObjectKindCUpti_ActivityOverhead::objectKind [inherited]

The kind of activity object that the overhead is associated with.

CUpti_ActivityOverheadKindCUpti_ActivityOverhead::overheadKind [inherited]

The kind of overhead, CUPTI, DRIVER, COMPILER etc.

uint64_t CUpti_ActivityOverhead::start [inherited]

The start timestamp for the overhead, in ns.

CUpti_ActivitySourceLocator Struct Reference

[CUPTI Activity API]

Description

The activity record for source locator. This activity record represents a source locator (CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR).

Public Variables

const char * fileName
uint32_t  id
CUpti_ActivityKind kind
uint32_t  lineNumber

Variables

const char * CUpti_ActivitySourceLocator::fileName [inherited]

The path for the file.

uint32_t CUpti_ActivitySourceLocator::id [inherited]

The ID for the source path, will be used in all the source level results.

CUpti_ActivityKindCUpti_ActivitySourceLocator::kind [inherited]

The activity record kind, must be CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR.

uint32_t CUpti_ActivitySourceLocator::lineNumber [inherited]

The line number in the source .

CUpti_CallbackData Struct Reference

[CUPTI Callback API]

Description

Data passed into a runtime or driver API callback function. Data passed into a runtime or driver API callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API. The callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data. For example, if you make a shallow copy of CUpti_CallbackData within a callback, you cannot dereference functionParams outside of that callback to access the function parameters. functionName is an exception: the string pointed to by functionName is a global constant and so may be accessed outside of the callback.

Public Variables

CUpti_ApiCallbackSite callbackSite
CUcontext  context
uint32_t  contextUid
uint64_t * correlationData
uint32_t  correlationId
const char * functionName
const void * functionParams
void * functionReturnValue
const char * symbolName

Variables

CUpti_ApiCallbackSiteCUpti_CallbackData::callbackSite [inherited]

Point in the runtime or driver function from where the callback was issued.

CUcontext CUpti_CallbackData::context [inherited]

Driver context current to the thread, or null if no context is current. This value can change from the entry to exit callback of a runtime API function if the runtime initializes a context.

uint32_t CUpti_CallbackData::contextUid [inherited]

Unique ID for the CUDA context associated with the thread. The UIDs are assigned sequentially as contexts are created and are unique within a process.

uint64_t * CUpti_CallbackData::correlationData [inherited]

Pointer to data shared between the entry and exit callbacks of a given runtime or drive API function invocation. This field can be used to pass 64-bit values from the entry callback to the corresponding exit callback.

uint32_t CUpti_CallbackData::correlationId [inherited]

The activity record correlation ID for this callback. For a driver domain callback (i.e. domain CUPTI_CB_DOMAIN_DRIVER_API) this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA driver function call. For a runtime domain callback (i.e. domain CUPTI_CB_DOMAIN_RUNTIME_API) this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA runtime function call. Within the callback, this ID can be recorded to correlate user data with the activity record. This field is new in 4.1.

const char * CUpti_CallbackData::functionName [inherited]

Name of the runtime or driver API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

const void * CUpti_CallbackData::functionParams [inherited]

Pointer to the arguments passed to the runtime or driver API call. See generated_cuda_runtime_api_meta.h and generated_cuda_meta.h for structure definitions for the parameters for each runtime and driver API function.

void * CUpti_CallbackData::functionReturnValue [inherited]

Pointer to the return value of the runtime or driver API call. This field is only valid within the exit::CUPTI_API_EXIT callback. For a runtime API functionReturnValue points to a cudaError_t. For a driver API functionReturnValue points to a CUresult.

const char * CUpti_CallbackData::symbolName [inherited]

Name of the symbol operated on by the runtime or driver API function which issued the callback. This entry is valid only for driver and runtime launch callbacks, where it returns the name of the kernel.

CUpti_EventGroupSet Struct Reference

[CUPTI Event API]

Description

A set of event groups. A set of event groups. When returned by cuptiEventGroupSetsCreate and cuptiMetricCreateEventGroupSets a set indicates that event groups that can be enabled at the same time (i.e. all the events in the set can be collected simultaneously).

Public Variables

CUpti_EventGroup*eventGroups
uint32_t  numEventGroups

Variables

CUpti_EventGroup* * CUpti_EventGroupSet::eventGroups [inherited]

An array of numEventGroups event groups.

uint32_t CUpti_EventGroupSet::numEventGroups [inherited]

The number of event groups in the set.

CUpti_EventGroupSets Struct Reference

[CUPTI Event API]

Description

A set of event group sets. A set of event group sets. When returned by cuptiEventGroupSetsCreate and cuptiMetricCreateEventGroupSets a CUpti_EventGroupSets indicates the number of passes required to collect all the events, and the event groups that should be collected during each pass.

Public Variables

uint32_t  numSets
CUpti_EventGroupSetsets

Variables

uint32_t CUpti_EventGroupSets::numSets [inherited]

Number of event group sets.

CUpti_EventGroupSet * CUpti_EventGroupSets::sets [inherited]

An array of numSets event group sets.

CUpti_MetricValue Union Reference

[CUPTI Metric API]

Description

A metric value. Metric values can be one of several different kinds. Corresponding to each kind is a member of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue should be accessed using the appropriate member of that union based on its value kind.

CUpti_NvtxData Struct Reference

[CUPTI Callback API]

Description

Data passed into a NVTX callback function. Data passed into a NVTX callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_NVTX. Unless otherwise notes, the callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.

Public Variables

const char * functionName
const void * functionParams

Variables

const char * CUpti_NvtxData::functionName [inherited]

Name of the NVTX API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

const void * CUpti_NvtxData::functionParams [inherited]

Pointer to the arguments passed to the NVTX API call. See generated_nvtx_meta.h for structure definitions for the parameters for each NVTX API function.

CUpti_ResourceData Struct Reference

[CUPTI Callback API]

Description

Data passed into a resource callback function. Data passed into a resource callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_RESOURCE. The callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.

Public Variables

CUcontext  context
void * resourceDescriptor
CUstream  stream

Variables

CUcontext CUpti_ResourceData::context [inherited]

For CUPTI_CBID_RESOURCE_CONTEXT_CREATED and CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, the context being created or destroyed. For CUPTI_CBID_RESOURCE_STREAM_CREATED and CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING, the context containing the stream being created or destroyed.

void * CUpti_ResourceData::resourceDescriptor [inherited]

Reserved for future use.

CUstream CUpti_ResourceData::stream [inherited]

For CUPTI_CBID_RESOURCE_STREAM_CREATED and CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING, the stream being created or destroyed.

CUpti_SynchronizeData Struct Reference

[CUPTI Callback API]

Description

Data passed into a synchronize callback function. Data passed into a synchronize callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_SYNCHRONIZE. The callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.

Public Variables

CUcontext  context
CUstream  stream

Variables

CUcontext CUpti_SynchronizeData::context [inherited]

The context of the stream being synchronized.

CUstream CUpti_SynchronizeData::stream [inherited]

The stream being synchronized.

Data Fields

Here is a list of all documented struct and union fields with links to the struct/union documentation for each field:

L

l2_transactions
CUpti_ActivityGlobalAccess
l2CacheSize
CUpti_ActivityDevice
lineNumber
CUpti_ActivitySourceLocator
localMemoryPerThread
CUpti_ActivityKernel
localMemoryTotal
CUpti_ActivityKernel

M

maxBlockDimX
CUpti_ActivityDevice
maxBlockDimY
CUpti_ActivityDevice
maxBlockDimZ
CUpti_ActivityDevice
maxBlocksPerMultiprocessor
CUpti_ActivityDevice
maxGridDimX
CUpti_ActivityDevice
maxGridDimY
CUpti_ActivityDevice
maxGridDimZ
CUpti_ActivityDevice
maxIPC
CUpti_ActivityDevice
maxRegistersPerBlock
CUpti_ActivityDevice
maxSharedMemoryPerBlock
CUpti_ActivityDevice
maxThreadsPerBlock
CUpti_ActivityDevice
maxWarpsPerMultiprocessor
CUpti_ActivityDevice

Notices

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication of otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.