Pandas arrays

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system.

Kind of Data | Pandas Data Type | Scalar | Array
TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetime data
Timedeltas | (none) | Timedelta | Timedelta data
Period (time spans) | PeriodDtype | Period | Timespan data
Intervals | IntervalDtype | Interval | Interval data
Nullable Integer | Int64Dtype, … | (none) | Nullable integer
Categorical | CategoricalDtype | (none) | Categorical data
Sparse | SparseDtype | (none) | Sparse data

Pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series or Index, or as a column in a DataFrame.

array(data[, dtype, copy])

Create an array.
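As a minimal sketch of how array() picks the array class from the requested dtype, the example below builds two extension arrays and stores them in a Series and a DataFrame. The variable names are illustrative only, and a reasonably recent pandas release is assumed.

```python
import pandas as pd

# dtype="Int64" (capital I) requests the nullable integer extension type.
int_arr = pd.array([1, 2, None], dtype="Int64")

# dtype="category" requests a Categorical.
cat_arr = pd.array(["a", "b", "a"], dtype="category")

# Either array can back a Series or a DataFrame column.
ser = pd.Series(int_arr)
df = pd.DataFrame({"ints": int_arr, "cats": cat_arr})

print(type(int_arr).__name__, type(cat_arr).__name__)
print(ser.dtype, df.dtypes.tolist())
```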

Datetime data

NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.
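The following sketch illustrates the naive versus aware distinction on a single Timestamp. The specific date and the "America/New_York" zone are arbitrary choices for illustration, and a time zone database is assumed to be available.

```python
import datetime

import pandas as pd

# A timezone-naive Timestamp: no tz attached.
naive = pd.Timestamp("2019-01-01 12:00")

# tz_localize attaches a zone without shifting the wall-clock time.
aware = naive.tz_localize("UTC")

# tz_convert keeps the same instant but expresses it in another zone.
eastern = aware.tz_convert("America/New_York")

print(isinstance(naive, datetime.datetime))  # Timestamp subclasses datetime
print(naive.tz, aware.tz, eastern)
```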

Timestamp

Pandas replacement for python datetime.datetime object.

Properties

Timestamp.asm8

Return numpy datetime64 format in nanoseconds.

Timestamp.day

Timestamp.dayofweek

Return day of the week.

Timestamp.dayofyear

Return the day of the year.

Timestamp.days_in_month

Return the number of days in the month.

Timestamp.daysinmonth

Return the number of days in the month.

Timestamp.fold

Timestamp.hour

Timestamp.is_leap_year

Return True if year is a leap year.

Timestamp.is_month_end

Return True if date is last day of month.

Timestamp.is_month_start

Return True if date is first day of month.

Timestamp.is_quarter_end

Return True if date is last day of the quarter.

Timestamp.is_quarter_start

Return True if date is first day of the quarter.

Timestamp.is_year_end

Return True if date is last day of the year.

Timestamp.is_year_start

Return True if date is first day of the year.

Timestamp.max

Timestamp.microsecond

Timestamp.min

Timestamp.minute

Timestamp.month

Timestamp.nanosecond

Timestamp.quarter

Return the quarter of the year.

Timestamp.resolution

Return the resolution describing the smallest difference between two times that can be represented by a Timestamp object.

Timestamp.second

Timestamp.tz

Alias for tzinfo

Timestamp.tzinfo

Timestamp.value

Timestamp.week

Return the week number of the year.

Timestamp.weekofyear

Return the week number of the year.

Timestamp.year

Methods

Timestamp.astimezone

Convert tz-aware Timestamp to another time zone.

Timestamp.ceil

Return a new Timestamp ceiled to this resolution.

Timestamp.combine(date, time)

date, time -> datetime with same date and time fields

Timestamp.ctime

Return ctime() style string.

Timestamp.date

Return date object with same year, month and day.

Timestamp.day_name

Return the day name of the Timestamp with specified locale.

Timestamp.dst

Return self.tzinfo.dst(self).

Timestamp.floor

Return a new Timestamp floored to this resolution.

Timestamp.freq

Timestamp.freqstr

Return the frequency of the Timestamp as a string.

Timestamp.fromordinal(ordinal[, freq, tz])

Construct a Timestamp from a proleptic Gregorian ordinal. Note that, by definition, the ordinal itself cannot carry any timezone information.

Timestamp.fromtimestamp(ts)

timestamp[, tz] -> tz’s local time from POSIX timestamp.

Timestamp.isocalendar

Return a 3-tuple containing ISO year, week number, and weekday.

Timestamp.isoformat

Timestamp.isoweekday

Return the day of the week represented by the date.

Timestamp.month_name

Return the month name of the Timestamp with specified locale.

Timestamp.normalize

Normalize Timestamp to midnight, preserving tz information.

Timestamp.now([tz])

Return new Timestamp object representing current time local to tz.

Timestamp.replace

Implements datetime.replace and also handles nanoseconds.

Timestamp.round

Round the Timestamp to the specified resolution

Timestamp.strftime

format -> strftime() style string.

Timestamp.strptime(string, format)

Function is not implemented.

Timestamp.time

Return time object with same time but with tzinfo=None.

Timestamp.timestamp

Return POSIX timestamp as float.

Timestamp.timetuple

Return time tuple, compatible with time.localtime().

Timestamp.timetz

Return time object with same time and tzinfo.

Timestamp.to_datetime64

Return a numpy.datetime64 object with ‘ns’ precision.

Timestamp.to_numpy

Convert the Timestamp to a NumPy datetime64.

Timestamp.to_julian_date

Convert Timestamp to a Julian Date.

Timestamp.to_period

Return a Period of which this Timestamp is an observation.

Timestamp.to_pydatetime

Convert a Timestamp object to a native Python datetime object.

Timestamp.today(cls[, tz])

Return the current time in the local timezone.

Timestamp.toordinal

Return proleptic Gregorian ordinal.

Timestamp.tz_convert

Convert tz-aware Timestamp to another time zone.

Timestamp.tz_localize

Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.

Timestamp.tzname

Return self.tzinfo.tzname(self).

Timestamp.utcfromtimestamp(ts)

Construct a naive UTC datetime from a POSIX timestamp.

Timestamp.utcnow()

Return a new Timestamp representing UTC day and time.

Timestamp.utcoffset

Return self.tzinfo.utcoffset(self).

Timestamp.utctimetuple

Return UTC time tuple, compatible with time.localtime().

Timestamp.weekday

Return the day of the week represented by the date.

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are tz-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy])

Pandas ExtensionArray for tz-naive or tz-aware datetime data.

DatetimeTZDtype([unit, tz])

An ExtensionDtype for timezone-aware datetime data.
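To make the dtype distinction concrete, the sketch below pulls the backing array out of a naive and an aware DatetimeIndex. It relies on pd.date_range and the .array attribute rather than calling the DatetimeArray constructor directly, which is a choice made here for brevity.

```python
import pandas as pd

# The .array attribute exposes the extension array backing an Index or Series.
naive_arr = pd.date_range("2019-01-01", periods=3, freq="D").array
aware_arr = pd.date_range("2019-01-01", periods=3, freq="D", tz="UTC").array

# Naive data carries a NumPy datetime64 dtype; aware data a DatetimeTZDtype.
print(type(naive_arr).__name__, naive_arr.dtype)
print(type(aware_arr).__name__, aware_arr.dtype)

# All values of a tz-aware array share the single zone stored on the dtype.
print(aware_arr.dtype.tz)
```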

Timedelta data

NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.
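As a small illustration of that symmetry, subtracting two Timestamp objects yields a Timedelta. The dates below are arbitrary.

```python
import datetime

import pandas as pd

# The difference between two Timestamps is a Timedelta.
td = pd.Timestamp("2019-01-03") - pd.Timestamp("2019-01-01 12:00")

# days and seconds are components; total_seconds() is the whole duration.
print(td.days, td.seconds, td.total_seconds())

# Conversion to the standard-library type.
py_td = td.to_pytimedelta()
print(py_td == datetime.timedelta(days=1, hours=12))
```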

Timedelta

Represents a duration, the difference between two dates or times.

Properties

Timedelta.asm8

Return a numpy timedelta64 array scalar view.

Timedelta.components

Return a namedtuple-like object holding the components of the Timedelta.

Timedelta.days

Number of days.

Timedelta.delta

Return the timedelta in nanoseconds (ns), for internal compatibility.

Timedelta.freq

Timedelta.is_populated

Timedelta.max

Timedelta.microseconds

Number of microseconds (>= 0 and less than 1 second).

Timedelta.min

Timedelta.nanoseconds

Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.

Timedelta.resolution

Return a string representing the lowest timedelta resolution.

Timedelta.seconds

Number of seconds (>= 0 and less than 1 day).

Timedelta.value

Timedelta.view

Array view compatibility.

Methods

Timedelta.ceil

Return a new Timedelta ceiled to this resolution.

Timedelta.floor

Return a new Timedelta floored to this resolution.

Timedelta.isoformat

Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n] s are replaced by the values.

Timedelta.round

Round the Timedelta to the specified resolution

Timedelta.to_pytimedelta

Convert a pandas Timedelta object into a python timedelta object.

Timedelta.to_timedelta64

Return a numpy.timedelta64 object with ‘ns’ precision.

Timedelta.to_numpy

Convert the Timedelta to a NumPy timedelta64.

Timedelta.total_seconds

Total duration of timedelta in seconds (to ns precision).

A collection of timedeltas may be stored in a TimedeltaArray.

arrays.TimedeltaArray(values[, dtype, freq, …])

Pandas ExtensionArray for timedelta data.
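A short sketch of obtaining a TimedeltaArray follows. It goes through pd.to_timedelta and .array instead of the constructor, which is an assumption of convenience rather than the only route.

```python
import pandas as pd

# Parse strings into a TimedeltaIndex, then take its backing array.
td_arr = pd.to_timedelta(["1 days", "12 hours", "30 min"]).array

# The array can be stored in a Series like any other extension array.
ser = pd.Series(td_arr)

print(type(td_arr).__name__, td_arr.dtype)
print(ser.dt.total_seconds().tolist())
```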

Timespan data

Pandas represents spans of times as Period objects.
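For illustration, the sketch below creates a monthly Period and inspects the span it covers. The month chosen is arbitrary.

```python
import pandas as pd

# A Period is a span of time with a frequency, here the month of January 2019.
p = pd.Period("2019-01", freq="M")

# The span has a start, a length in days, and supports integer arithmetic.
print(p.start_time, p.days_in_month)
next_month = p + 1

# asfreq re-expresses the period at another frequency, at its start or end.
last_day = p.asfreq("D", how="end")
print(next_month, last_day)
```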

Period

Period

Represents a period of time

Properties

Period.day

Get day of the month that a Period falls on.

Period.dayofweek

Day of the week the period lies in, with Monday=0 and Sunday=6.

Period.dayofyear

Return the day of the year.

Period.days_in_month

Get the total number of days in the month that this period falls on.

Period.daysinmonth

Get the total number of days of the month that the Period falls in.

Period.end_time

Period.freq

Period.freqstr

Period.hour

Get the hour of the day component of the Period.

Period.is_leap_year

Period.minute

Get minute of the hour component of the Period.

Period.month

Period.ordinal

Period.quarter

Period.qyear

Fiscal year the Period lies in according to its starting-quarter.

Period.second

Get the second component of the Period.

Period.start_time

Get the Timestamp for the start of the period.

Period.week

Get the week of the year on the given Period.

Period.weekday

Day of the week the period lies in, with Monday=0 and Sunday=6.

Period.weekofyear

Period.year

Methods

Period.asfreq

Convert Period to desired frequency, either at the start or end of the interval

Period.now

Period.strftime

Returns the string representation of the Period, depending on the selected fmt.

Period.to_timestamp

Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period

A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.

arrays.PeriodArray(values[, freq, dtype, copy])

Pandas ExtensionArray for storing Period data.

PeriodDtype

An ExtensionDtype for Period data.
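The sketch below shows one way to obtain a PeriodArray and confirms that the shared frequency lives on its dtype. Using pd.period_range and .array is a convenience assumed here.

```python
import pandas as pd

# Three consecutive monthly periods, taken as the backing extension array.
period_arr = pd.period_range("2019-01", periods=3, freq="M").array

# The common frequency is part of the PeriodDtype.
print(type(period_arr).__name__, period_arr.dtype)

ser = pd.Series(period_arr)
print(ser.dt.month.tolist())
```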

Interval data

Arbitrary intervals can be represented as Interval objects.
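As a brief illustration, the following sketch builds a right-closed interval and checks membership and overlap. The bounds are arbitrary.

```python
import pandas as pd

# (0, 5]: open on the left, closed on the right.
iv = pd.Interval(0, 5, closed="right")

# Membership respects which sides are closed.
print(0 in iv, 5 in iv)
print(iv.length, iv.mid)

# Two intervals overlap only if they share at least one point.
shares_points = iv.overlaps(pd.Interval(4, 6))
touches_only = iv.overlaps(pd.Interval(5, 6, closed="right"))
print(shares_points, touches_only)
```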

Interval

Immutable object implementing an Interval, a bounded slice-like interval.

Properties

Interval.closed

Whether the interval is closed on the left-side, right-side, both or neither

Interval.closed_left

Check if the interval is closed on the left side.

Interval.closed_right

Check if the interval is closed on the right side.

Interval.is_empty

Indicates if an interval is empty, meaning it contains no points.

Interval.left

Left bound for the interval

Interval.length

Return the length of the Interval

Interval.mid

Return the midpoint of the Interval

Interval.open_left

Check if the interval is open on the left side.

Interval.open_right

Check if the interval is open on the right side.

Interval.overlaps

Check whether two Interval objects overlap.

Interval.right

Right bound for the interval

A collection of intervals may be stored in an arrays.IntervalArray.

arrays.IntervalArray

Pandas array for interval data that are closed on the same side.

IntervalDtype

An ExtensionDtype for Interval data.
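One way to build an IntervalArray is from a sequence of break points, sketched below with the from_breaks constructor. The break values are illustrative.

```python
import pandas as pd

# Breaks 0, 1, 2, 3 produce the intervals (0, 1], (1, 2], (2, 3].
interval_arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])

# Every interval in the array is closed on the same side.
print(interval_arr.closed, len(interval_arr))
print(interval_arr.left.tolist(), interval_arr.right.tolist())
```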

Nullable integer

numpy.ndarray cannot natively represent integer-data with missing values. Pandas provides this through arrays.IntegerArray.

arrays.IntegerArray(values, mask[, copy])

Array of integer (optional missing) values.

Int8Dtype

An ExtensionDtype for int8 integer data.

Int16Dtype

An ExtensionDtype for int16 integer data.

Int32Dtype

An ExtensionDtype for int32 integer data.

Int64Dtype

An ExtensionDtype for int64 integer data.

UInt8Dtype

An ExtensionDtype for uint8 integer data.

UInt16Dtype

An ExtensionDtype for uint16 integer data.

UInt32Dtype

An ExtensionDtype for uint32 integer data.

UInt64Dtype

An ExtensionDtype for uint64 integer data.
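The values-plus-mask layout of IntegerArray can be sketched as follows: a True entry in the mask marks the corresponding value as missing. The arrays below are made up for illustration, and the explicit int64 dtype is chosen so the result does not depend on the platform's default integer size.

```python
import numpy as np
import pandas as pd

# Without an extension dtype, a missing value forces a cast to float64.
as_float = pd.Series([1, 2, None])

# IntegerArray keeps integers and tracks missing entries in a boolean mask.
values = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, False, True])  # True marks a missing value
int_arr = pd.arrays.IntegerArray(values, mask)

print(as_float.dtype, int_arr.dtype)
print(int_arr.isna().tolist())
```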

Categorical data

Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.

CategoricalDtype([categories, ordered])

Type for categorical data with the categories and orderedness.

CategoricalDtype.categories

An Index containing the unique categories allowed.

CategoricalDtype.ordered

Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical.

Categorical(values[, categories, ordered, …])

Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Categorical.from_codes(codes[, categories, …])

Make a Categorical type from codes and categories or dtype.
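A minimal sketch of from_codes, assuming the codes index into the given categories by position (with made-up category labels):

```python
import pandas as pd

# Each code is a position in `categories`.
codes = [0, 1, 0, 2]
cat = pd.Categorical.from_codes(codes, categories=["a", "b", "c"])

print(list(cat))
print(cat.codes.tolist(), cat.categories.tolist())
```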

The dtype information is available on the Categorical:

Categorical.dtype

The CategoricalDtype for this instance

Categorical.categories

The categories of this categorical.

Categorical.ordered

Whether the categories have an ordered relationship.

Categorical.codes

The category codes of this categorical.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and ordering information are not preserved.

Categorical.__array__([dtype])

The numpy array interface.
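The loss of category information can be seen in a small sketch like the one below. The categorical values are illustrative.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"], ordered=True)

# Conversion yields a plain ndarray of the values themselves.
arr = np.asarray(cat)

# The unused category "c" and the ordering exist only on the Categorical.
print(type(arr).__name__, arr.tolist())
print(cat.categories.tolist(), cat.ordered)
```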

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either

  • the string 'category'

  • an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
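Both ways of specifying the dtype can be sketched as follows. The labels "low", "mid", and "high" are made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["low", "high", "mid", "low"])

# The string 'category' infers unordered categories from the data.
by_string = s.astype("category")

# A CategoricalDtype instance fixes the categories and their order explicitly.
dtype = CategoricalDtype(categories=["low", "mid", "high"], ordered=True)
by_dtype = s.astype(dtype)

# The .cat accessor exposes the categorical-specific attributes.
print(by_string.cat.categories.tolist(), by_string.cat.ordered)
print(by_dtype.cat.categories.tolist(), by_dtype.cat.ordered)
print(by_dtype.cat.codes.tolist())
```

With an ordered dtype, comparisons and reductions such as min() and max() follow the category order rather than lexical order.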

Sparse data

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.

SparseArray(data[, sparse_index, index, …])

An ExtensionArray for storing sparse data.

SparseDtype([dtype, fill_value])

Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
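As a rough sketch, the example below stores a mostly-zero sequence sparsely and reads a few attributes through the accessor. It assumes a pandas version that exposes SparseArray under pd.arrays; the data are illustrative.

```python
import pandas as pd

# Only values different from fill_value are physically stored.
sparse_arr = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)
print(sparse_arr.dtype, sparse_arr.sp_values.tolist())

# A Series holding sparse values gains the .sparse accessor.
ser = pd.Series(sparse_arr)
print(ser.sparse.npoints, ser.sparse.density, ser.sparse.fill_value)

# Converting back to a dense Series restores the repeated values.
dense = ser.sparse.to_dense()
print(dense.tolist())
```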
