Performant Customizing
This topic contains explanations of performant and non-performant customizing.
Python API
Efficient Use of search_record()/get_children()
Information
- The ppms.search_record Python API function and DtpRecord.get_children both possess a di_list parameter.
- This parameter restricts the number of data items loaded by the API function.
Non-performant
- If an empty list is passed, all data items from the respective data table are loaded, even if you do not access these attributes.
- This leads to a considerably longer run time when loading the respective records.
- The more data items the table contains, the longer the loading takes.
from ppms import ppms

project_record = ppms.search_record(461, ['000041'], [])  # No restriction on the data items to load
project_name = project_record.project_name.get_value()

total_remaining_effort = 0
for task_record in project_record.get_children(463):  # No restriction on the data items to load
    remaining_effort = task_record.effort_rem.get_value()
    total_remaining_effort += remaining_effort

ppms.ui_message_box(project_name, 'The remaining effort is {}'.format(total_remaining_effort))
Performant
- The run time of the macro can be reduced by restricting the query to the relevant data items.
from ppms import ppms

project_record = ppms.search_record(461, ['000041'], ['project_name'])  # The data items to load are restricted to "project_name"
project_name = project_record.project_name.get_value()

total_remaining_effort = 0
for task_record in project_record.get_children(463, ['effort_rem']):  # The data items to load are restricted to "effort_rem"
    remaining_effort = task_record.effort_rem.get_value()
    total_remaining_effort += remaining_effort

ppms.ui_message_box(project_name, 'The remaining effort is {}'.format(total_remaining_effort))
Value Ranges
Database Access in computeOutput()
Subqueries in value ranges
- A classic example of non-performant customizing is a database query within a computeOutput() value range.
- The PLANTA server loads modules as follows (simplified):
- The module customizing is analyzed in order to determine which data must be loaded.
- The data is fetched from the database restricted to the filter criteria.
- The value ranges are calculated.
- If there is a computeOutput() value range on one of the DIs in the module, it must be calculated.
- The calculation is done for each record of this table in the data area.
- A ppms.search_record() issues a database query from the server.
- Since this must happen for each record, there are now n queries for the data area, which increase the run time (see the sketch after the note below).
Note
- Many computeOutput value ranges can easily be converted into computeSqlValueRanges, which can have a positive effect on the run time.
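For illustration, such a non-performant value range might look like the following sketch. This is only schematic: the exact signature of a value range function and the DI name resource_id are assumptions for this example, not the actual customizing.
from ppms import ppms

def computeOutput(di, dtp_record):
    # One ppms.search_record() per record: with n records in the data area,
    # the server issues n separate database queries.
    resource_record = ppms.search_record(467, [dtp_record.resource_id.get_value()], ['start_period'])
    di.set_value(resource_record.start_period.get_value())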
Correct Definition of Dependencies
Brief overview
- The computeOutput(), processInput(), and checkInput() value range functions require a defined list of dependencies.
- When the value range is executed, the dependencies are loaded into the DtpRecord of the data item as well.
- In the case of computeOutput(), the calculation is done each time one of the dependencies changes.
Dependency "asterisk" (*)
- A computeOutput() value range with "asterisk" dependency attempts to calculate itself as often as possible.
- As long as a data item with "asterisk" dependency has been loaded, each save, reset, and filter leads to a recalculation, regardless of whether the data item exists in the current module.
- If no module in which the data item exists is open anymore, it is no longer recalculated.
- In the case of complex value ranges, this continuous recalculation can lead to significantly worse run times.
Alternative
- If a data item is to recalculate itself but there are no dependencies that could be entered, there is a simple workaround:
- Create a virtual data item in the same table as a Yes/No field.
- This data item is entered as a dependency in the actual data item with value range.
- If you want the data item to recalculate itself, you simply invert the value of the virtual DI (di.set_value(not di.get_value())), as in the sketch below.
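For example (a sketch: trigger_recalc stands for the virtual Yes/No DI and is a hypothetical name):
from ppms import ppms

# 'trigger_recalc' is the hypothetical virtual Yes/No DI entered as the dependency
record = ppms.search_record(461, ['000041'], ['trigger_recalc'])
trigger_di = record.trigger_recalc
trigger_di.set_value(not trigger_di.get_value())  # inverting the value triggers the recalculation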
Customizing
Customize Data Areas
Note
- In order to prevent runtime issues, we recommend that you always use data fields from the same data table per data area.
Minimize Database Queries
The latency between the database and the application server is particularly important for performance.
- For each database query, a round trip between the application and database server is required:
- The PLANTA server sends a query to the database.
- The database answers with the result.
- Hence, for any customizing, you should always minimize the number of database queries.
The problem as well as the solution can be illustrated with the following example. For this purpose,
- two systems were compared:
- “local” system: Database and PLANTA server are installed on the same machine. The latency between the two is less than 1 ms.
- “remote” system: Database and PLANTA server are installed on different machines. There is a latency of about 6 ms between them.
- two different Python macros were run in order to fetch data from the database and measure the time:
- 1 query per input value: The macro iterates over a list of resources and sends a separate query to the database for each resource.
- 1 query with n input values: The list of resources is formatted into a single query, which is sent to the database.
The run time grows with each additional query, as the following comparison shows:
1 query per input value
The following example shows “normal” code that somebody might write without a guilty conscience:
import time

from ppms import ppms

RESOURCES = ['R1', 'R8', 'R41']

def get_start_and_end_period_of_resource(resource_id):
    """Retrieve the start and end period of a single resource"""
    resource_record = ppms.search_record(467, [resource_id], ['start_period', 'end_period'])
    start_period = resource_record.start_period.get_value()
    end_period = resource_record.end_period.get_value()
    return start_period, end_period

start = time.time()
for resource in RESOURCES:
    start_period, end_period = get_start_and_end_period_of_resource(resource_id=resource)
end = time.time()

execution_time = round(end - start, ndigits=6)
ppms.ui_message_box('Executing the code took {} seconds!'.format(execution_time))
- For each resource, a database query (ppms.search_record) must be sent, which is much slower in a system with greater latency.
- If the number of queried resources increases, the run time increases significantly with each resource.
1 query with n input values
A more efficient version of the above code:
import time

from ppms import ppms

RESOURCES = ['R1', 'R8', 'R41']
RAW_QUERY = """SELECT
    DI001218 "resource",
    DI001230 "start period",
    DI001231 "end period"
FROM
    DT467
WHERE
    DI001218 IN ({resources})
"""

def get_start_and_end_period_of_resources(resources):
    """Retrieve the start and end period of a list of resources"""
    formatted_resources = "'" + "', '".join(resources) + "'"
    query = RAW_QUERY.format(resources=formatted_resources)
    result = ppms.db_select(query)

    periods = {}
    for resource_id, start_period, end_period in result:
        periods[resource_id] = start_period, end_period

    return periods

start = time.time()
periods = get_start_and_end_period_of_resources(resources=RESOURCES)
end = time.time()

execution_time = round(end - start, ndigits=6)
ppms.ui_message_box('Executing the code took {} seconds!'.format(execution_time))
- Now, all start/end periods are fetched by means of a single database query, regardless of how many resources are queried.
- If the number of queried resources increases, the run time hardly increases.
Avoid Filtering Twice
In modules, filtering is often done twice, which increases the run time unnecessarily.
- This is often caused by unclean module macros.
- If a module is opened initially, the on_load() and on_initial_focus() functions are run.
- If both of these methods contain a Module.menu(12), the module is filtered twice.
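Schematically, the anti-pattern looks like this (a sketch only; Base stands for the system-specific module subclass base class):
class ModuleClass(Base):  # Base: name assumed for the module subclass base class
    def on_load(self):
        self.menu(12)  # filters the module once on loading ...

    def on_initial_focus(self):
        self.menu(12)  # ... and a second time on first focus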
Delay Module Loading
Loading data in a module always takes up runtime, which is why this should only be done if the user really wants to see the data.
- A module is loaded as soon as it is filtered with Module.menu(12).
- To delay the loading, filtering must be called as late as possible.
- Here, the on_initial_focus() module method is suitable since it is called when the user focuses a module for the first time.
- Whether the module data still needs to be loaded can be checked with the Module.is_stub() module method (see the sketch after this list).
A submodule which is already opened at module start calls the on_initial_focus() method only when the user clicks into the module!
- In this case, you have to filter in on_load().
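For the normal case (not the submodule exception above), a minimal sketch of the delayed variant combines on_initial_focus() with is_stub() (Base again stands for the system-specific module subclass base class):
class ModuleClass(Base):  # Base: name assumed for the module subclass base class
    def on_initial_focus(self):
        if self.is_stub():  # module data not loaded yet?
            self.menu(12)  # filter only when the user focuses the module for the first time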
Filter on Virtual Data Items
Filtering on virtual data items is always slower than filtering on real data items:
- A filter on a real data item is inserted directly in the database query by the PLANTA software in order to restrict the number of results.
- For virtual data items, all data of the data table must be fetched from the database and then filtered within PLANTA.
Use of Caches
If a value hardly ever changes but is read often, you should consider whether it makes sense to cache the value.
- You should ask yourself the following questions:
- How often is the value retrieved?
- A value which is only retrieved once or twice per session is not a good caching candidate.
- How long does it take to determine the value?
- It is not worth the effort to cache simple calculations that are completed in a fraction of a second, since here the effort of setting up and invalidating the cache exceeds the benefit.
- How often does the value change?
- Values that change frequently are poor cache candidates, unless the calculation takes a long time and the values are often required at different positions anyway.
- When do I have to empty the cache?
- Often the cache is emptied when the value changes. In some cases it may also be sufficient to reset the cache at certain positions in the code if you know that only the code that follows uses the cache.
- From where do I have to be able to empty the cache?
- If I offer a function with cache in a module, there has to be a function to empty this cache as well.
LRU cache
Tripping hazards
- The parameters of the cached function and the return value should be primitive data types (numbers, texts, ...), not PLANTA data types (DtpRecord, DataItem, ...).
- If I-texts are to be cached across sessions, the user language must be part of the parameters; otherwise the I-text is cached in the language of the user who calls the function first.
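For the I-text case, the following sketch shows how the user language becomes part of the cache key; the lookup table and function names are hypothetical stand-ins for the real I-text retrieval:
import functools

# Hypothetical lookup table standing in for the real I-text retrieval
ITEXTS = {('greeting', 'en'): 'Hello', ('greeting', 'de'): 'Hallo'}

@functools.lru_cache(maxsize=None)
def get_itext(itext_id, user_language):
    """user_language is part of the parameters and thus of the cache key,
    so each language is cached separately."""
    return ITEXTS[(itext_id, user_language)]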
Example code
import functools

from ppms import ppms

@functools.lru_cache(maxsize=None)
def get_dataitem_attributes(di_id):
    """Return the attributes of a DI or None if the DI is not valid"""
    di_record = ppms.search_record(412, [di_id], ['column_type', 'df_length', 'dt', 'di_name', 'format_id'])
    if di_record is None:
        return None

    column_type = di_record.column_type.get_value()
    df_length = di_record.df_length.get_value()
    dt = di_record.dt.get_value()
    di_name = di_record.di_name.get_value()
    format_id = di_record.format_id.get_value()

    return column_type, df_length, dt, di_name, format_id

def cache_clear():
    """Clear all caches used in this module"""
    get_dataitem_attributes.cache_clear()
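Usage then looks like this; repeated calls with the same DI ID are served from the cache instead of querying the database again (the DI ID is just an example value):
attributes = get_dataitem_attributes('001218')  # first call: one database query
attributes = get_dataitem_attributes('001218')  # repeated call: answered from the cache
cache_clear()  # empty the cache, e.g. after the DI customizing has changed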
Database
Use of Indexes
Information
- If queries are often filtered via the same search criteria, a performance improvement can be achieved by creating indexes on these search criteria.
- The appropriate type of index depends on the database.
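For example, if DT467 is frequently filtered via the resource ID DI001218 (as in the query further above), an index on this column could help. This is only a sketch; the index name is arbitrary, and the exact syntax and suitable index type depend on the database:
CREATE INDEX idx_dt467_resource ON DT467 (DI001218);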