The smallest operation unit supported by SCT memory database is a table, that is, there is no library concept;SCT memory database is based on columnar storage and supports C++ basic data structure types, including integer(Including u08, i08, u16, i16, u32, i32, u64, i64)、 float(Including float, double)、string(string)、date(date)、time(time)、datetime(datetime).
Any field of SCT memory database implies an index based on a balanced tree, see create,The save and load statements in the sql statement support the saving and loading of tables. Explanation:SCT memory database query language (hereinafter abbreviated as sql),similar to standard query language (hereinafter abbreviated as SQL),The basic syntax structure is compatible, and the overall processing flow is not compatible with SQL.
Note that the following SQL operation methods are different from standard SQL:
Features supported by SCT memory database:
SCT memory database contains two modes of executable programs and development SDK and examples.
Run the corresponding sct_local_mem.exe directly in the local mode; after entering the execution environment, enter the "help" command to view the help information or directly enter the sql statement to execute. For the corresponding sql specifications, see the sql quick guide section .
The network mode server is sct_s_mem.exe, and the server uses sct_mdb.toml as the configuration file. The current configuration content is only the binding port and user group information. For the toml file specification, see Related specifications (Note that the configuration file used by SCT does not support toml table array).
The sct_s_mem.exe will auto execute "init.sql" in work directory when start, this is useful for auto load data into memory.
The network mode client is sct_c_mem.exe, the client must specify the user name, password and optional IP port to run. The following is a valid running example. sct_c_mem.exe user user_pwd sct_c_mem.exe user user_pwd 192.168.1.11 55555
The default network mode verification policy is that the user name and password must be specified for the first login, 3 errors stop for 1 minute, the next 2 errors stop for 30 minutes, the next error stop for 15 days, and the next 1 error stop forever until The service restarts.
The corresponding system file maybe not include the help document, and the complete lastest help reference file is the content of this page on website.
The actual memory storage is columnar storage
The SCT memory database data processing flow is as follows:
The content of the where statement must be the retrieval condition using the index (different from the standard SQL statement here), and the condition is applied to filter the data rows.
The row information takes out the column content and performs expression calculation to generate a data set.
The data set can be expanded horizontally and vertically (join/union) to expand into a larger single data set.
Sort, filter, group and aggregate, project, rename and recalculate several post-operations in a single set.
sort
There are five operations for filtering:
filter inner row
filter between rows, complexity n*n
filter adjacent rows
filter row range
existence/subquery filter
group
project
rename
recompute
This article is used as a quick guide to write simple and effective SCT memory databasesql statements. The sql statement is composed of DDL and DML, where DML is composed of select, insert, delete, update, rows_of, cols_of, fields_of, show tables, etc.; DDL consists of create, alter, drop, save, load, exec and other components. Each sql statement non-constant expression or comment content can be connected with any number of blank or comment characters, the blank characters are 0x09 to 0x0c (note that the blank characters do not contain 0x0d, which means that the end of the SQL statement execution line defaults to unix format) or 0x20 ASCII characters. All SQL statements are strictly case sensitive, and the character encoding is the only encoding UTF-8. The terminator of each sql statement is ";", which can be omitted if it does not affect the parsing of the statement.
//This is a comment
/*This is a comment */
0 //Signed or unsigned integer 1 //Elliptical integer 100 //Elliptical integer +1 //signed integer -100 //signed integer -1000 //signed integer +0b000 //signed binary integer -0xa0b3 //signed hexadecimal integer 0xA0cD //Unsigned mixed case hexadecimal integer -847u2 //signed two-byte long integer 56u1 //Unsigned one-byte long integer 0x0au1 //Unsigned hexadecimal one-byte integer
+0. //Positive floating point number without fractional part -0.01 //Only decimal part negative floating point number -100.0 //Only the integer part of the negative floating point number -100e20 //Only integers and negative floating-point numbers with positive exponents 1.25E-32 //decimal and negative exponent part positive floating point number 1.25E+20 //decimal and positive exponent part positive floating point number 1.00E20f //strictly four-byte floating point number 1.00E20d //strict eight-byte floating point number
"a和中文bc" //Contains valid strings in Chinese and English (UTF-8 encoding) "ab和中文c\"\'\\\a\b\f\n\r\t\v" //Valid string containing control escape "ab和中文c\x0a\x0001\uaabb\Uaabbccdd" //Valid string containing hexadecimal escape and Unicode escape @"aabbcc(018d7##@!@^_""AA6)aabbcc" //Raw string separated by aabbcc
2020-02-03 //February 3, 2020 0020-01-31 //January 31, 2020
18:59:31 //18:59:31 00:00:01.100 //0 hours 0 minutes 1.1 seconds
2020-02-03 18:59:31.123 //February 3, 2020 18:59:31.123
The expression uses the C language expression basic structure and retains the priority of the corresponding operation symbol id //Domain expression represents the corresponding field value (id) //The bracket expression indicates the evaluation of the priority bracket content id+10 //additive expression represents the sum of the domain field and the constant value 10 id+max(grade) //additive expression represents the sum of the maximum value of the field field and the grade field id*sin(id) //Integral expression represents the product of the sin value of id domain and id domain id << 10 //The shift expression represents the value of the id field shifted by 10 bits to the left id&&sin(id)>0 //The logical AND expression represents the logical AND result of id and sin(id)>0 id/sin(id) //Integral expression represents the quotient of id and sin(id) id&min(id) //bits and expressions represent id and min(id) bits and results id!=0||id==10 //Logical OR expression means the logical OR result of id not equal to 0 and id equal to 10 Expression type conversion and function processing follow the following principles If the data type can be implicitly converted to the target type according to the C++ language, then it is implicitly converted to the target type according to C++. The function processing process is: if the function parameter is a field, the function acts on the value of each row of the field; if the function parameter is a constant value, the constant value is used as the function parameter; if the number of rows of the two parameters is not equal, the longer number of rows is used As an output result, short parameters that are insufficient are filled continuously to the same length with the last bit.
Function explanation and usage example after the function Type conversion function x2i08 //Convert any data type to a one-byte signed integer x2i08(type), x2i08(2.0) x2i16 //Convert any data type to a two-byte signed integer x2i16(type), x2i16(2.0) x2i32 //Convert any data type to a four-byte signed integer x2i32(type), x2i32(2.0) x2i64 //Convert any data type to an eight-byte signed integer x2i64(type), x2i64(2.0) x2u08 //Convert any data type to a one-byte unsigned integer x2u08(type), x2u08(2.0) x2u16 //Convert any data type to a two-byte unsigned integer x2u16(type), x2u16(2.0) x2u32 //Convert any data type to a four-byte unsigned integer x2u32(type), x2u32(2.0) x2u64 //Convert any data type to an eight-byte unsigned integer x2u64(type), x2u64(2.0) x2flt //Convert any data type to a four-byte floating point number x2flt(id), x2flt(10) x2dbl //Convert any data type to an eight-byte floating point number x2dbl(id), x2dbl(10) x2str //Convert any data type to a string x2str(id), x2str(10) x2date //Convert any data type to date x2date(b_date),x2date("2012-03-05") x2time //Convert any data type to time x2time(b_time),x2time("18:35:24") x2datetime //Convert any data type to date and time x2datetime(b_dt),x2datetime("2012-03-05 18:35:24") Aggregate function sum //Find the sum of a column sum(id) product //Find the product of a certain column product(id) min //Find the minimum value of a column min(id) max //Find the maximum value of a column max(id) avg //Find the mean of a column avg(id) count //Find the total number of a column count(id) sort //Sort a column sort(id) unique //Unique a column unique(sort(id)) count_unique //Find the unique count of a column count_unique(sort(id)) Range function first //Take the first element of a column first(id) last //Take the last element of a column last(id) first_k //Take the first k elements of a certai first_k(id,10) last_k //Take the last k elements of a certain co last_k(id,20) range //Take a column of interval elements range(id,1,5) nth //Take the nth element of a certain column nth(id,10) String function lcase //Convert the string to lowercase lcase("Abd") ucase //Convert the string to uppercase ucase("abc") concat //Connect two strings concat("abc","012") length //Find the string length length("NAME") all_case //Output all case string all_case("Insert") starts //Output is string start with starts(name, "Jim") contain //Determine whether the string contains the target contain(desc, "error") ends //Output is string end with ends(val, "es") substr //Take the string starting from the index to a fixed number of substrings substr(name, 1, 4) replace //Replace the substring of the string with the new string replace(name, 2, 4, "abc") trim_l //Move out consecutive spaces on the left side of the string trim_l(name) trim_r //Move out consecutive spaces on the right side of the string trim_r(name) trim //Move out consecutive spaces on both sides of the string trim(name) reg_match //Regular expression matching (using ECMA-262 (2011) standard) reg_match(id,"[1-9]+.*") fuzzy_match //Fuzzy Retrieval of Differences in Matching Characters fuzzy_match(id,"what"),fuzzy_match(type,"system", 5) Mathematical function abs //Find the absolute value of the value abs(value) ceil //Find the smallest integer that is not less than the given valu ceil(value) floor //Find the largest integer that is not greater than the give floor(value) round //Find the rounded value of the value round(3.4) exp //Find the definite power of e exp(value) pow //The power of the value pow(id,2.0) pi //Constant value Π pi() sqrt //Find the square root of the value sqrt(value) log //Find the natural logarithm of the value log(value) log2 //Find the base-2 logarithm of the value log2(value) log10 //Find the base-10 logarithm of the value log10(value) log1p //Find the natural (base e) logarithm of 1+value log1p(value) sin //Find the sine of the value sin(value) cos //Find the cosine of the value cos(value) tan //Find the tangent of the value tan(value) asin //Find the arc sine of the value asin(value) acos //Find the arc cosine of the value acos(value) atan //Find the arc tangent of the value atan(value) e //The mathematical constant e e() Internal function raw2hex //Memory hexadecimal display raw2hex(id) hex2raw //Convert hexadecimal to memory data representation hex2raw("00aabb") row_id //Output line number row_id() id //Output internal id id() now_id //Output max id that not used. now_id() hash_passwd //Output the string's hash password hash_passwd("mypass") Statistical function var //Output variance var(id) stddev //Standard deviation stddev(id) Sequence function seq //Output sequence 0 to n seq(10), seq(count(id)), seq(3)*3 rand //Output n random numbers from 0 rand(10) constants //Output n constants constants(1,10), constants("sss",10) Date function last_day //Return to the last day of the date month last_day(date()) is_leap //Return whether it is a leap year is_leap(date()) now //Return the current date and time now() date //Return the current date date() time //Return the current time time() add_day //Return the time or date time after n days add_day(now(),100) add_nanosec //Return the time or date time after n nanoseconds add_nanosec(time(),10000000) sub_day //Return the time or date and time n days ago sub_day(now(),100) sub_nanosec //Return the time or date and time before n nanoseconds sub_nanosec(time(),10000000) day_diff //Return the difference between the date or date time day_diff(x2date(d,date()) nanosec_diff //Return the difference in nanoseconds between time or date and time nanosec_diff(dt,now()) year_of //Return the year part of the date or date time year_of(date()) month_of //Return the month part of the date or date time month_of(date()) day_of //Return the day part of the date or date time day_of(date()) hour_of //Return the time part of the time or date time hour_of(time()) minute_of //Returns the minute part of the time or date time minute_of(time()) second_of //Returns the second part of the time or date time second_of(time()) nanosec_of //Return the nanosecond part of the time or datetime nanosec_of(time()) date_of //Return the date part of the date and time date_of(now()) time_of //Return the time part of the date and time time_of(now()) weekday //Return the day of the date or date and time (0 is Monday) weekday(now()) weekday_name //Return the name of the week of the date or date and time weekday_name(now()) weekday_cname //Return the Chinese name of the week of the date or date and time weekday_cname(now())
id>=10 //Greater than or equal to logic operation id>=all (select id from t2) //The logical operation of all results greater than or equal to the subquery id==any (select id from t2) //Logical operation equal to any result of subquery id==any (1,3,5,7) //Logical operation equal to any result of the set id==10 //Equal to logic operation id!=10 //Not equal to logical operation id>=10&&id<=20 //Logic AND operation between relational operations id>=10||(id < 5&& id > 1) //Logical OR operation between relational operations 10 <= id <= 20 //Relational operation is less than or equal to, less than or equal to operation 10 < id <= 20 //Less than, less than or equal to relational operations first //The first of the range operation last //The last of the range operation first 100 //The first 100 of the range operation last 100 //The last 100 of the range operation range 10 100 //The 10th to 100th range operation nth 10 //The 10th number of the range operation id max //The largest value range operation id min //The smallest value range operation id max 100 //The largest first 100 of the value range operation id min 100 //The smallest first 100 of the value range operation id in (1,2,3,4,7) //in operation takes the id of 1, 2, 3, 4, 7 id in (select id from t2); //Retrieve the result of id belonging to the sub-expression id not in (select id from t2); //Remove the result that id does not belong to the sub-expression Note that the following are invalid where conditions 10 < id //The field name id should be placed first
select * from table1; //Select all fields from table1 select id from tab2; //Select the id field from tab2 select id /sin(id) from tab2 where id < 10; //Select the id data with id less than 10 from tab2 and find the value of id divided by sin(id) select max(id) from tab2 where id < 10&&id > 3; //Select the largest id greater than 3 and less than 10 from tab2 select id << 10 from tab2 where id!=3 //Select the id whose id is not equal to 3 from tab2 and shift it to the left by 10 places select sin(id),name into t2 from t1 where id < 10; //Select from t1 with id less than 10 and insert t2 with sin(id) and name as the column select * from t1 where id in (select id2 from t2 where id2 > 10); //Select from t1 when id is in t2 and when id2 is greater than 10, all of t1 in id2 select * from t1 where id < 10 order by id, name asc; //Select from t1 when the id is less than 10 and sort the dictionary in ascending order by id and name select * from t1 where id < 10 group sum(id) by name; //Select from t1 when id is less than 10 and use name to group sum(id) select * from t1 where id < 10 all_each on distinct; //Select from t1 when the id is less than 10 and filter all rows to remove duplicates select * from t1 where id < 10 all_each on id==_id && name==_name; //Select from t1 when the id is less than 10 and filter all rows with the same id and the same name,_ means each of the previous set select * from t1 where id < 10 adj on id>_id && name==_name; //Select from t1 when the id is less than 10, then filter the row neighbor id greater than the previous data with the same id and name,_ means the previous line select * from t1 where id < 10 inner on id==name; //Select from t1 when the id is less than 10 and filter the data with the same id and name in the row select * from t1 where id < 10 first; //Select from t1 when the id is less than 10 and filter to keep the first row select * from t1 where id < 10 first 30; //Select from t1 when the id is less than 10, filter and keep the first 30 rows select * from t1 where id < 10 range 30 40; //Select from t1 when the id is less than 10 to filter and retain the data in the range of 30 to 40 rows select id1,name from t1 where id < 10 exists (select id from t2 where id==id1); //Select from t1 when id is less than 10 and exist in t2 id1 and name select id as nid,name from t1 where id < 10 not exists (select id from t2 where id==nid); //Select from t1 when id is less than 10 and there is no nid and name with id equal to nid in t2 select * from t1 where id < 10 recompute id=id+10; //Select from t1 when id is less than 10 and recalculate id equal to id plus 10. select * from t1 where id < 10 recompute id+=10,name=x2str(id); //Select from t1 when the id is less than 10, recalculate the column id equal to id plus 10, name equal to the id converted into a string select * from t1 where id < 10 project except id; //Select from t1 when the id is less than 10 all post-projection to retain the column data except id select * from t1 where id < 10 project id; //Select all post-projection columns whose id is less than 10 from t1 to retain column id select * from t1 where id < 10 rename id newid; //Select from t1 when the id is less than 10 and rename the id to newid select * from t1 where id < 10 project id range 30 40 group sum(id) by id order by id rand rename id newid; //Select all post-projection id from t1 when id is less than 10, filter data in the range of 30 to 40, use id to group sum(id), sort by id randomly, rename id to newid select * from t1 where id < 10 join (select * from t2 where id2 < 10 on id1==id2); //Select from t1 when id1 is less than 10 and expand horizontally. Select from t2 when id2 is less than 10 and id2 is equal to id1. select sin(seq(count(id))+1), id, name from t1 where id < 10 union (select * from t2 where id < 10); //Select from t1 when id is less than 10 sin(seq(count(id))+1), id, name vertical expansion, select all the id less than 10 from t2 select sin(id) as sinid,id from t1 where id < 10 inner on sinid < id; //Select sin(id) with id less than 10 from t1 as sinid and id filter to keep the data with sinid less than id in the row select * from t1 export as csv split by "," path = "t1.csv"; //Select all fields of t1 to export to t1.csv file separated by commas select * from t1 export as csv split by "," path += "t1.csv"; //Select all fields of t1 to export to or replace the t1.csv file separated by commas
insert into tab1 (name,id) values("Tom",100); //Single insert insert into tab2 (val,type)values(1.00,1)(2.00,2)(3.00,3); //Multiple inserts insert into tab2 (val,type)values(1.00,1),(2.00,2),(3.00,3); //Multiple inserts insert into t2 (id,name) select id,name from t1; //Insert subquery result insert into t2 (id,name) select seq(10)+1,"name"+x2str(seq(10)+1); //Insert custom data Note that SCT memory database does not support null fields, so insert must specify a value for each field
delete from tab1 where id < 10; //Delete all data with id less than 10 from tab1
update tab1 set name="Tom",id=10 where id==5; //Update the name field with id equal to 5 in tab1 to Tom's id to 10. update tab1 set value=10 where __id==5; //Update the value of the internal system id equal to 5 in tab1 to 10 (note that __id is a reserved keyword for the system, which is the result of the id() function output, and is only used for the double equal sign query condition)
create table tab1 (id u08 (false, false)); //Create table tab1, the initial field is id type is one-byte unsigned integer, no repetition is allowed create table tab1 (val float(),name string(true)); //Create table tab1, the initial field val is floating-point type and no repetition is allowed, and the field name is string type and repetition is allowed Note that the second parameter of the current field is a reserved parameter and has no effect in actual use
alter table tab1 add id u08 (false, false); //Add id field in tab1 alter table tab1 modify id u08 (false, false); //Modify the id field type in tab1, the modification will clear the previous field data, please backup alter table tab1 rename id newid; //Rename the id field in tab1 to newid alter table tab1 drop id; //Delete the id field in tab1
drop table tab1 //Delete table tab1
truncate table t1; //Quickly clear t1 and retain data structure
save tab1 as bin path+="tab1.bin" //Save table tab1 as a binary data file, the saved file name is tab1.bin, if there is a file, replace it save tab1 as sql path="tab1.sql" //Save table tab1 as a sql data file, and the saved file name is tab1.sql
Note: The table cannot exist when using load load tab1 from bin path="tab1.bin" //Load the binary data file of table tab1, load table tab1 from the file to the database load tab1 from sql path="tab1.sql" //Load the sql data file of table tab1, and load the table information from the file to the database (note that the table name tab1 is ignored, and the imported table name is determined by the content of the sql file)
Note: The table must exist when using import import t1 from csv split by "\t" path="t1.csv" //Import from t1.csv to t1 with \t as the separator import t1 from csvs path="t1.csvs" //Import t1.csvs into t1 (csvs file indicates that the string is converted to ASCII code hexadecimal characters)
abc abc0123 ABC_abc [Valid non-@string (see string)]
rows_of tab1 //Get the number of rows in tab1
cols_of tab1 //Get the number of columns in tab1
fields_of tab1 //Get the field name of tab1
show tables //Get all table names in the database
exec "data.sql" //Execute all statements in the sql file
atom_exec insert into t1 (id,name) select 1,"sss";delete from t1; commit //Transactional execution of multiple statements with ";" as the interval and ending with commit (the isolation level is the highest level, generally speaking, either all succeed or all fail, multi-statements only support insert, update, delete)
backup as bin path="b1.sbk" //Backup the entire database as binary data and save it in the b1.sbk file backup as sql path+="b1.sbk" //Backup the entire database as sql data and replace the b1.sbk file
restore from sql path="b1.sbk" //Restore the entire database from the sql backup file b1.sbk restore from bin path="b1.sbk" //Restore the entire database from the binary backup file b1.sbk
SCT memory database uses BST as an index internally. When querying for each column, the data is retrieved through the BST tree to construct query results, which is very suitable for small-scale interval queries; the content of the where statement is the first-level filter condition of the search condition, which determines the output row The most important part of the number, the query conditions should give priority to ensuring that the where part is fully filtered.
Where query conditions are given priority to use a <=id <=b instead of id>= a&&id <=b. The former will directly use the internal interval algorithm to complete, the latter will query the two parts of the interval separately for collection Intersection operation.
Use && in preference for where conditions, and try to avoid using || logical operations.
Try not to use != expressions in where conditions.
Use rows_of first to get the number of rows in the table, not the count field.
For the low effective data of the big data table, the query filter in is preferred for the initial filtering.
For large data sets, multiple valid data can be filtered according to conditions using exists.
When sql is parsed, the parsed syntax tree will be cached. It is recommended to merge insert statements as much as possible to avoid multiple parsing.
Use the where statement to filter the data set first, and reduce the use of data set post-operation processing.
The sql statement expression evaluation is slower than the C language, try to avoid using a large amount of expression processing; a large amount of expression processing can be reprocessed in C language after the parsed expression is obtained.
When importing large quantities of data, the load command is preferred, the import statement is second best, and the exec or individual insert statement is used last.
SCT memory database is written in C++, and the network side provides packaged C/C++/Java(include JDBC)/Python interface. The development library is located in the sdk directory and also provide a sample.
The Windows dynamic link library of vs2019 is provided in the installation package, located in the sdk/dll directory, also G++ 8.2.0 library in so directory. The header file is located in the sdk/inc directory.
Basic functions include connecting, exiting the server, executing SQL statements, and obtaining data set row and column information and value information. The following is a complete C language interface example (located in sdk/sql_lib_sample.c):
#include <stdlib.h> #include <stdio.h> #include "sql_lib.h" int main(int argc, char** argv) { if(!connect_to_server("192.168.5.10", 55555, "user", "user")){ return -1; } const char* s = "create table tab1 (id u32(true), name string());"; void* d = exec(s, strlen(s)); if (d == NULL) { printf("execute sql failed\n"); return 0; } del_datas(d); s = "insert into tab1 (id,name) select seq(1000000), x2str(seq(1000000)+10);"; d = exec(s, strlen(s)); if (d == NULL) { printf("execute sql failed\n"); return 0; } del_datas(d); s = "select id,name from tab1;"; d = exec(s, strlen(s)); //d = exec("select id,name from tab1 where id < 10;"); if( d == NULL ){ printf("execute sql failed\n"); return 0; } print(d); //print_bymyself(d); del_datas(d); s = "drop table tab1;"; d = exec(s, strlen(s)); exit_to_server(); return 0; }
Network connection is unsuccessful 1. The local end of SCT does not support network functions. Make sure that the server is turned on when using the client to connect; in the default configuration file, sct_mdb.toml, the server section port specifies the local link port. Make sure that the corresponding port is not occupied. 2. SCT uses login verification strategy. Ensure that the number of operations does not exceed the threshold requirement.
where statement is not supported SCT where statements are not compatible with standard SQL content. Related statements such as inline condition filtering (id==name), exist, group by, etc. need to be converted to data post-processing operations。
Function reg_match is not working properly The SCT regular expression parsing library is an implementation of ECMA-262 (version 2011), compatible with most regular expression grammars, and the addition of \h matches Chinese \H matches non-Chinese boundary characters. In order to avoid performance pitfalls under special circumstances, the regular expression library uses non-backtracking matching by default, that is, .* will match all, and should be avoided when using it.
fuzzy_match is particularly slow The default fuzzy matching will match all matching results with a difference of less than half the length, so the matching algorithm is not suitable for longer strings.
Error feedback SCT memory database has been thoroughly and fully tested, but some unknown errors may still be unavoidable. For related errors, please send the problem description to admin@shucantech.com.